  • "Label define"

    Fellow statalisters,

    I would really appreciate knowing whether there is a smarter way to define labels. I have 10 years of data with roughly 3 million observations and 106 variables each year. Of those 106 variables, I need to define labels for 50, and the labels vary across years. To get an idea, please see the dataex example below.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(B3_q5 B3_q10) int(B5_q1 B10_q1)
    1 1 327 467
    1 1 282 459
    1 1 301 459
    1 1 129 479
    1 1 280 459
    1 1 279 519
    1 1 179 459
    1 1 309 502
    1 1 229 459
    1 1 148 459
    1 1 308 459
    1 1 174 453
    1 1 289 459
    1 1 288 459
    1 1 190 459
    1 1 214 459
    1 1 191 459
    1 1 290 459
    1 1 102 459
    1 1 211 459
    1 1 212 459
    1 1 329 459
    1 1 159 459
    1 1 180 454
    1 1 309 454
    1 1 148 502
    1 1 214 453
    1 1 290 420
    1 1 288 479
    1 1 189 454
    1 1 159 454
    1 1 229 454
    1 1 174 450
    1 1 301 454
    1 1 211 454
    1 1 129 467
    1 1 308 429
    1 1 279 454
    1 1 102 519
    1 1 212 454
    1 1 280 454
    1 1 289 459
    1 1 190 454
    1 1 191 454
    1 1 327 454
    1 1 303 454
    1 1 179 454
    1 1 282 454
    1 1 329 454
    1 1 140 510
    1 1 174 452
    1 1 290 539
    1 1 152 483
    1 1 108 452
    1 1 301 452
    1 1 287 452
    1 1 103 479
    1 1 240 452
    1 1 245 420
    1 1 259 492
    1 1 207 472
    1 1 201 452
    1 1 309 451
    1 1 190 452
    1 1 308 470
    1 1 261 452
    1 1 216 452
    1 1 289 452
    1 1 169 499
    1 1 285 429
    1 1 111 452
    1 1 251 437
    1 1 202 493
    1 1 211 467
    1 1 282 452
    1 1 291 549
    1 1 300 456
    1 1 288 540
    1 1 160 459
    1 1 279 453
    1 1 249 452
    1 1 214 502
    1 1 102 443
    1 1 191 452
    1 1 229 452
    1 1 159 494
    1 1 221 519
    1 1 179 439
    1 1 222 452
    1 1 269 452
    1 1 283 457
    1 1 230 454
    1 1 164 452
    1 1 129 450
    1 1 280 452
    1 1 256 449
    1 1 283 483
    1 1 245 451
    1 1 280 492
    1 1 190 510
    end
    label values B3_q5 B3_q5
    label def B3_q5 1 "Hinduism", modify
    label values B3_q10 B3_q10
    label def B3_q10 1 "pucca", modify

    B3_q5 = religion, with values: Hinduism-1, Islam-2, Christianity-3, Sikhism-4, Jainism-5, Buddhism-6, Zoroastrianism-7, others-9
    B3_q10 = type of house structure: pucca-1, semi-pucca-2, serviceable katcha-3, unserviceable katcha-4, no structure-5

    However, B5_q1 takes values 100, 101, 102, ..., 339, each signifying a different food; for example, 100 signifies rice, 101 potato, 102 radish, etc.
    B10_q1 takes values 420, 421, 422, ..., 549, signifying expenditure on various items; for example, 420 signifies medical expenses, 430 movie expenses, etc.

    Labelling them manually in the manner below would be quite demanding:

    label define B3_q5 1 "Hinduism" 2 "Islam" 3 "Christianity" 4 "Sikhism" 5 "Jainism" 6 "Buddhism" 7 "Zoroastrianism" 9 "others"
    label values B3_q5 B3_q5

    label define B3_q10 1 "pucca" 2 "semi pucca" 3 "serviceable katcha" 4 "unserviceable katcha" 5 "no structures"
    label values B3_q10 B3_q10

    I want to know if there is a smarter way to approach this. For example, can Stata read a codebook somehow and label them accordingly?

  • #2
    Originally posted by Anustup Kundu
    for example, can stata read a codebook somehow and label them accordingly?
    Perhaps; it all depends on the format of this codebook. Stata can process a plain text (ASCII) file pretty well. Can you say more about this?

    Best
    Daniel

    • #3
      Originally posted by daniel klein
      Perhaps; it all depends on the format of this codebook. Stata can process a plain text (ASCII) file pretty well. Can you say more about this?

      Best
      Daniel
      Hi Daniel,
      Thank you for getting back to me. I am afraid they aren't in ASCII format. Most of the definition comes from the questionnaire. But if they were in ASCII format, how would one go about it?

      regards,
      Anustup

      • #4
        I am sorry, but you would really need to show the exact structure of the files to get specific answers. For the generic approach, see

        Code:
        help file
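        As a rough illustration of the generic approach (a minimal sketch only; codebook.txt is a hypothetical file name, not something from this thread), the basic pattern for reading a plain text file line by line with file looks like this:

        Code:
        tempname fh
        file open `fh' using "codebook.txt", read text
        file read `fh' line
        while (r(eof) == 0) {
            display `"`line'"'    // replace with whatever processing each line needs
            file read `fh' line
        }
        file close `fh'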
        Best
        Daniel

        • #5
          Hi Daniel,

          I want to attach the label names below (B10_q1 label name) to the corresponding values (B10_q1 label no) of B10_q1. How would I go about it? There are several other variables for which I need to do the same thing, so I wanted to know if I can tackle this in a smarter way.

          Code:
           
          B10_q1 label no B10_q1 label name
          420 medicine (allopathic)
          421 medicine (homeopathic)
          422 medicine (ayurvedic)
          423 medicine (unani)
          424 medicine (others)
          425 X-ray, ECG, pathological test, etc.
          426 doctor’s/ surgeon’s fee
          427 family planning appliances
          428 other medical expenses
          429 medical – non-institutional: sub-total (420-428)
          430 cinema, theatre
          431 mela, fair, picnic
          432 sports goods, toys, etc.
          433 club fees
          434 goods for recreation and hobbies
          435 photography
          436 video cassette/ VCR / VCP – hire
          437 cable TV
          438 other entertainment
          439 entertainment: sub-total (430-438)
          440 spectacles
          441 torch
          442 lock
          443 umbrella, raincoat
          444 lighter (bidi/ cigarette/ gas stove)
          445 other goods for personal care and effects
          449 goods for personal care and effects: sub-total (440-445)
          450 toilet soap
          451 toothpaste, toothbrush, comb, etc.
          452 powder, snow, cream, lotion
          453 hair oil, shampoo, hair cream
          454 shaving blades, shaving stick, razor
          455 shaving cream
          456 sanitary napkins
          457 other toilet articles
          459 toilet articles: sub-total (450-457)
          460 electric bulb, tubelight
          461 electric batteries
          462 other non-durable electric goods
          463 earthenware
          464 glassware
          465 bucket, water bottle/ feeding bottle & other plastic goods
          466 coir, rope, etc.
          467 washing soap/soda
          468 other washing requisites
          470 incense (agarbatti), room freshener
          471 flower (fresh): all purposes
          472 mosquito mat, insecticide, acid etc.
          473 other petty articles
          479 sundry articles: sub-total (460-473)
          480 domestic servant/cook
          481 attendant
          482 sweeper
          483 barber, beautician, etc.
          484 washerman, laundry, ironing
          485 tailor
          486 priest
          487 legal expenses
          488 telephone charges: landline
          490 telephone charges: mobile
          491 postage & telegram
          492 miscellaneous expenses
          493 grinding charges
          494 repair charges for non-durables
          495 pet animals (incl. birds, fish)
          496 other consumer services excluding conveyance
          499 consumer services excluding conveyance: sub-total (480-496)
          500 air fare
          501 railway fare
          502 bus/tram fare
          503 taxi, auto-rickshaw fare
          504 steamer, boat fare
          505 rickshaw (hand drawn & cycle) fare
          506 horse cart fare
          507 porter charges
          508 diesel for vehicle
          510 petrol, other fuels & lubricants for vehicle
          511 school bus/van
          512 other conveyance expenses
          519 conveyance : sub-total (500-512)
          520 house rent, garage rent (actual)
          521 hotel lodging charges
          522 residential land rent
          523 other consumer rent
          529 rent: sub-total (520-523)
          539 house rent, garage rent (imputed- urban only)
          540 water charges
          541 other consumer taxes & cesses
          549 consumer taxes and cesses: sub-total (540-541)
          Regards,
          Anustup
          Last edited by Anustup Kundu; 02 Jun 2018, 10:41.

          • #6
            What exactly is it that you show here? Is this copied from a text file, asis? Are these (string) variables in Stata?

            Best
            Daniel

            • #7
              Hi,
              I am sorry for not being clearer. I am using household consumer expenditure data. Variable B10_q1 shows expenditure on durable goods. B10_q1 can take multiple values (B10_q1 label no) between 420 and 549. Each of these values signifies expenditure on some durable good (B10_q1 label name).

              What I want now is to have these label names attached to B10_q1 along with their values.

              I have to label a large number of variables apart from B10_q1, so I was wondering if there is a smart way to tackle this. I found labeldatasyntax by Mark Chatfield, which seems pretty similar to what I want to do, but I haven't been able to figure it out completely.
              Last edited by Anustup Kundu; 02 Jun 2018, 11:30.

              • #8
                I understand what you are trying to do.

                What I do not get is where and how the information you show in #5 is stored. That is, which format holds the value-to-text mappings?

                You stated that

                Most of the definition comes from the questionnaire.
                but never explained what this means. Is the questionnaire a pdf document? Is it a Word document? Is it some sort of spreadsheet?

                Best
                Daniel

                • #9
                  Dear Daniel,
                  Thank you for trying to understand my question and I am really sorry for not making things clearer earlier.

                  What I do not get is where and how the information you show in #5 is stored
                  I initially found the information in the survey questionnaire, which is a PDF document. The information in my post #5 comes from

                  http://icssrdataservice.in/datarepos...ata-dictionary

                  where they define the variable 1B10_q1 and all the other variables. The dictionaries are in PDF format.

                  Regards,
                  Anustup

                  • #10
                    Hm, this is not going to be simple. I would ask the data provider whether they have value label information in another format, preferably integrated into Stata datasets.

                    Otherwise, your best option is probably to copy and paste the contents into a plain text file (e.g., using the do-file editor) and read it from there. Assume that you have the contents (i.e., everything inside code delimiters) of #5 saved to labels.txt, then

                    Code:
                    tempname fh
                    tempfile tmp
                    
                        // tab stops to spaces
                    filefilter labels.txt `tmp' , from(\t) to(" ")
                    
                        // open file, read first line
                    file open `fh' using `tmp' , read
                    file read `fh' dump // first line has no value to text mapping
                    file read `fh' line
                        
                        // read successive lines and define label
                    while (!r(eof)) {
                        gettoken value line : line
                        label define B10_q1 `value' `"`line'"' , modify
                        file read `fh' line
                    }
                    file close `fh'
                    
                    label list
                    results in

                    Code:
                    .         // tab stops to spaces
                    . filefilter labels.txt `tmp' , from(\t) to(" ")
                    
                    . 
                    .         // open file, read first line
                    . file open `fh' using `tmp' , read
                    
                    . file read `fh' dump // first line has no value to text mapping
                    
                    . file read `fh' line
                    
                    .         
                    .         // read successive lines and define label
                    . while (!r(eof)) {
                      2.         gettoken value line : line
                      3.         label define B10_q1 `value' `"`line'"' , modify
                      4.         file read `fh' line
                      5. }
                    
                    . file close `fh'
                    
                    . 
                    . label list
                    B10_q1:
                             420  medicine (allopathic)
                             421  medicine (homeopathic)
                             422  medicine (ayurvedic)
                             423  medicine (unani)
                             424  medicine (others)
                             425  X-ray, ECG, pathological test, etc.
                             426  doctor’s/ surgeon’s fee
                    output omitted
                             522  residential land rent
                             523  other consumer rent
                             529  rent: sub-total (520-523)
                             539  house rent, garage rent (imputed- urban only)
                             540  water charges
                             541  other consumer taxes & cesses
                             549  consumer taxes and cesses: sub-total (540-541)
                    
                    . 
                    end of do-file
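                    Note that the loop above only defines the value label; it still has to be attached to the variable with label values. As a sketch of how this might scale to several variables, assume (hypothetically) that the mapping for each variable has been saved to its own file named after the variable, e.g. B5_q1.txt and B10_q1.txt, in the same layout as #5 (one header line, then one value and its text per line). The file names and the variable list are assumptions for illustration only:

                    Code:
                    foreach v in B5_q1 B10_q1 {
                        tempname fh
                        tempfile tmp
                        filefilter `v'.txt `tmp' , from(\t) to(" ")
                        file open `fh' using `tmp' , read
                        file read `fh' dump // skip the header line
                        file read `fh' line
                        while (!r(eof)) {
                            gettoken value line : line
                            label define `v' `value' `"`line'"' , modify
                            file read `fh' line
                        }
                        file close `fh'
                        label values `v' `v' // attach the label to the variable
                    }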
                    Best
                    Daniel

                    • #11
                      Hi,

                      I stumbled upon this post about two hours ago. I fully agree with daniel klein that it would be best to save the metadata (value labels etc) to a plain text file, and parse it afterwards. After I read his solution, I followed the hyperlink Anustup Kundu posted in #9.

                      Sorry, this will get a little excessive now. Surfing around on the data provider's website, I found that they provide a DDI (1.2.2 -- quite an old format) metadata file for download in the section "Study Description --> Download Metadata". This plain text XML file contains all we need, so we can read it into Stata (with -file-, as Daniel demonstrated), and label everything accordingly.

                      So I sat down to produce some example code. It took a while. It's not perfect. It is not thoroughly tested. It's not the most efficient or elegant code snippet I have ever produced, sorry. More or less, it is a quick shot at XML parsing, specialized for this very specific XML DDI file.

                      However, it does the trick for the example data: It can (1) download the DDI XML file, (2) read all labels (including variable labels) from it, and (3) apply these labels to the (example) dataset in memory.

                      Code:
                      clear
                      cls
                      version 14 // this do-file requires Stata 14, as it's using Unicode string functions
                      * input information: URL to fetch DDI metadata from
                      local sourceurl "http://icssrdataservice.in/datarepository/index.php/catalog/ddi/82"
                      local debug 0 // flip this to 1 to see muuuuch output
                      
                      // example data -- replace this with a "use" or "import delimited" statement, if you wish
                      input byte(B3_q5 B3_q10) int(B5_q1 B10_q1)
                      1 1 327 467
                      1 1 282 459
                      1 1 301 459
                      1 1 129 479
                      1 1 280 459
                      1 1 279 519
                      1 1 179 459
                      1 1 309 502
                      1 1 229 459
                      1 1 148 459
                      1 1 308 459
                      1 1 174 453
                      1 1 289 459
                      1 1 288 459
                      1 1 190 459
                      1 1 214 459
                      1 1 191 459
                      1 1 290 459
                      1 1 102 459
                      1 1 211 459
                      1 1 212 459
                      1 1 329 459
                      1 1 159 459
                      1 1 180 454
                      1 1 309 454
                      1 1 148 502
                      1 1 214 453
                      1 1 290 420
                      1 1 288 479
                      1 1 189 454
                      1 1 159 454
                      1 1 229 454
                      1 1 174 450
                      1 1 301 454
                      1 1 211 454
                      1 1 129 467
                      1 1 308 429
                      1 1 279 454
                      1 1 102 519
                      1 1 212 454
                      1 1 280 454
                      1 1 289 459
                      1 1 190 454
                      1 1 191 454
                      1 1 327 454
                      1 1 303 454
                      1 1 179 454
                      1 1 282 454
                      1 1 329 454
                      1 1 140 510
                      1 1 174 452
                      1 1 290 539
                      1 1 152 483
                      1 1 108 452
                      1 1 301 452
                      1 1 287 452
                      1 1 103 479
                      1 1 240 452
                      1 1 245 420
                      1 1 259 492
                      1 1 207 472
                      1 1 201 452
                      1 1 309 451
                      1 1 190 452
                      1 1 308 470
                      1 1 261 452
                      1 1 216 452
                      1 1 289 452
                      1 1 169 499
                      1 1 285 429
                      1 1 111 452
                      1 1 251 437
                      1 1 202 493
                      1 1 211 467
                      1 1 282 452
                      1 1 291 549
                      1 1 300 456
                      1 1 288 540
                      1 1 160 459
                      1 1 279 453
                      1 1 249 452
                      1 1 214 502
                      1 1 102 443
                      1 1 191 452
                      1 1 229 452
                      1 1 159 494
                      1 1 221 519
                      1 1 179 439
                      1 1 222 452
                      1 1 269 452
                      1 1 283 457
                      1 1 230 454
                      1 1 164 452
                      1 1 129 450
                      1 1 280 452
                      1 1 256 449
                      1 1 283 483
                      1 1 245 451
                      1 1 280 492
                      1 1 190 510
                      end
                      
                      * temporary names
                      tempname fh
                      tempfile ddifile
                      
                      * download DDI metadata, copy to local (temporary) file
                      copy "`sourceurl'" "`ddifile'"
                      
                      // open DDI file
                      file open `fh' using `ddifile' , read text
                      file read `fh' line
                      // read DDI file, save variable and value labels
                      while (!r(eof)) {
                          if (missing(`"`varname'"')) {
                              if (ustrregexm(`"`line'"',`"<var ID="[^"]+" name="([^"]+)""')) {
                                  local varname=ustrregexs(1)
                                  if (`"`debug'"'=="1") display as text `"{text}variable {result}`varname'"'
                                  file read `fh' line
                                  continue
                              }
                          }
                          else {
                              // check for end of variable tag
                              if (ustrregexm(`"`line'"',"</var>")) {
                                  local varname
                                  file read `fh' line
                                  continue
                              }
                              // check for value label tag
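                              if (ustrregexm(`"`line'"',`"<catgry>"')) {
                                  // reset per-category state, so that a skipped or incomplete
                                  // category cannot leak its value or label into the next one
                                  local value
                                  local vallab
                                  file read `fh' line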
                                  while (!ustrregexm(`"`line'"',`"</catgry>"')) {
                                      // check for value
                                      if (ustrregexm(`"`line'"',`"<catValu>"')) {
                                          file read `fh' line
                                          local value=ustrtrim(`"`line'"')
                                          file read `fh' line
                                          assert (ustrregexm(`"`line'"',`"</catValu>"'))
                                          capture : numlist `"`value'"'
                                          if (_rc!=0) {
                                              display as error `"non-numeric value {res:`value'} in variable {res:`varname'} can not be labeled in Stata, will be skipped"'
                                              local value
                                              continue
                                          }
                                      }
                                      // check for label
                                      else if (ustrregexm(`"`line'"',`"<labl>"')) {
                                          // special case: CDATA element inline with </labl>
                                          if (ustrregexm(`"`line'"',`"</labl>"')) {
                                              assert (ustrregexm(`"`line'"',`"<labl><!\[CDATA\[([^\]]+)\]\]></labl>"'))
                                              local vallab=ustrtrim(ustrregexs(1))
                                          }
                                          // regular case: no CDATA, next line is value label, line after this is </labl>
                                          else {
                                              file read `fh' line
                                              local vallab=ustrtrim(`"`line'"')
                                              file read `fh' line
                                              assert (ustrregexm(`"`line'"',`"</labl>"'))
                                          }
                                      }
                                      // found value/label pair?
                                      if (!missing(`"`value'"') & !missing(`"`vallab'"')) {
                                          if (`"`debug'"'=="1") display as text `"{text}value label for value {result}`value'{text} of variable {res:`varname'}: "{result}`vallab'""'
                                          if (missing(`"`: label lbl_`varname' `value' , strict'"')) label define lbl_`varname' `value' `"`vallab'"' , add
                                          else if (`"`: label lbl_`varname' `value', strict'"'!=`"`vallab'"') {
                                              display as error `"duplicate, conflicting value labels for variable {res:`varname'} in metadata, latter one will be ignored:"' _newline ///
                                              `"{tab}{res:`: label lbl_`varname' `value', strict'}{tab}vs.{tab}{res:`vallab'}"'
                                          }
                                          local value
                                          local vallab
                                      }
                                      file read `fh' line
                                  }
                                  file read `fh' line
                                  continue    
                              }
                              // check for variable label tag
                              if (ustrregexm(`"`line'"',`"<labl>"')) {
                                  file read `fh' line
                                  local varlab_`varname'=ustrtrim(`"`line'"')
                                  if (`"`debug'"'=="1") display as text `"{text}variable label for variable {res:`varname'}: {result}`varlab_`varname''"'
                                  file read `fh' line
                                  assert (ustrregexm(`"`line'"',`"</labl>"'))
                                  file read `fh' line
                                  continue
                              }
                          }
                          file read `fh' line
                      }
                      * close DDI file
                      file close `fh'
                      
                      * _all_ value and variable labels have been read from the DDI file to memory now;
                      * they can be applied to each variable in the dataset
                      * using the following simple looping mechanism:
                      foreach var of varlist _all {
                          if (`: label lbl_`var' maxlength'>0) label values `var' lbl_`var'
                          if (!missing(`"`varlab_`var''"')) label variable `var' `"`varlab_`var''"'
                      }
                      describe // variables and values have been labeled
                      list // many value labels!
                      Be warned, it is not thoroughly tested, and may terminate in case there are peculiarities in the DDI XML that Stata does not support. Some of these peculiarities, namely value labels for non-numeric values and duplicate, conflicting value labels, are detected and skipped, but others may not.

                      As DDI is more generous in defining names and objects, there could be (in Stata terms) illegal variable names in the DDI file, or illegal value label names (they are constructed on the fly as 'lbl_<VARIABLE>', which is not the best thing to do). There are several other issues that would require more coding to turn this into a more generic program. So use the code with care; I hope it helps.

                      Note that this code does not transcode HTML entities (i.e. the string "&amp;" instead of "&") in the labels.
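
                      If that matters for your labels, one possible workaround (a sketch only, not part of the code above, and covering just three common entities) is to decode them with the subinstr macro function before the label is defined, for example right after vallab is set. The example string below is made up for illustration:

                      Code:
                      local vallab "postage &amp; telegram" // example input
                      local vallab : subinstr local vallab "&lt;" "<", all
                      local vallab : subinstr local vallab "&gt;" ">", all
                      local vallab : subinstr local vallab "&amp;" "&", all // decode &amp; last
                      display `"`vallab'"'
                      Decoding &amp; last avoids turning a doubly escaped sequence such as &amp;lt; into < by accident.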

                      Regards
                      Bela
                      Last edited by Daniel Bela; 04 Jun 2018, 10:37. Reason: added note about HTML entities

                      • #12
                        Copy-pasting into a plain text file is not the best option, and it is complicated. Instead, simply select and copy the relevant information (from your link) and paste it directly into Stata's Data Editor in edit mode. Such a copy-paste gives the result shown below.

                        Then, the next step is "transferring" the value labels into your file with the help of labmask, a wonderful tool created by Nick Cox.
                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input int value str59 category
                        420 "medicine (allopathic)"                                      
                        421 "medicine (homeopathic)"                                    
                        422 "medicine (ayurvedic)"                                      
                        423 "medicine (unani)"                                          
                        424 "medicine (others)"                                          
                        425 "X-ray, ECG, pathological test, etc."                        
                        426 "doctor’s/ surgeon’s fee"                                
                        427 "family planning appliances"                                
                        428 "other medical expenses"                                    
                        429 "medical – non-institutional: sub-total (420-428)"        
                        430 "cinema, theatre"                                            
                        431 "mela, fair, picnic"                                        
                        432 "sports goods, toys, etc."                                  
                        433 "club fees"                                                  
                        434 "goods for recreation and hobbies"                          
                        435 "photography"                                                
                        436 "video cassette/ VCR / VCP – hire"                        
                        437 "cable TV"                                                  
                        438 "other entertainment"                                        
                        439 "entertainment: sub-total (430-438)"                        
                        440 "spectacles"                                                
                        441 "torch"                                                      
                        442 "lock"                                                      
                        443 "umbrella, raincoat"                                        
                        444 "lighter (bidi/ cigarette/ gas stove)"                      
                        445 "other goods for personal care and effects"                  
                        449 "goods for personal care and effects: sub-total (440-445)"  
                        450 "toilet soap"                                                
                        451 "toothpaste, toothbrush, comb, etc."                        
                        452 "powder, snow, cream, lotion"                                
                        453 "hair oil, shampoo, hair cream"                              
                        454 "shaving blades, shaving stick, razor"                      
                        455 "shaving cream"                                              
                        456 "sanitary napkins"                                          
                        457 "other toilet articles"                                      
                        459 "toilet articles: sub-total (450-457)"                      
                        460 "electric bulb, tubelight"                                  
                        461 "electric batteries"                                        
                        462 "other non-durable electric goods"                          
                        463 "earthenware"                                                
                        464 "glassware"                                                  
                        465 "bucket, water bottle/ feeding bottle & other plastic goods"
                        466 "coir, rope, etc."                                          
                        467 "washing soap/soda"                                          
                        468 "other washing requisites"                                  
                        470 "incense (agarbatti), room freshener"                        
                        471 "flower (fresh): all purposes"                              
                        472 "mosquito mat, insecticide, acid etc."                      
                        473 "other petty articles"                                      
                        479 "sundry articles: sub-total (460-473)"                      
                        480 "domestic servant/cook"                                      
                        481 "attendant"                                                  
                        482 "sweeper"                                                    
                        483 "barber, beautician, etc."                                  
                        484 "washerman, laundry, ironing"                                
                        485 "tailor"                                                    
                        486 "priest"                                                    
                        487 "legal expenses"                                            
                        488 "telephone charges: landline"                                
                        490 "telephone charges: mobile"                                  
                        491 "postage & telegram"
                        492 "miscellaneous expenses"                                    
                        493 "grinding charges"                                          
                        494 "repair charges for non-durables"                            
                        495 "pet animals (incl. birds, fish)"                            
                        496 "other consumer services excluding conveyance"              
                        499 "consumer services excluding conveyance: sub-total (480-496)"
                        500 "air fare"                                                  
                        501 "railway fare"                                              
                        502 "bus/tram fare"                                              
                        503 "taxi, auto-rickshaw fare"                                  
                        504 "steamer, boat fare"                                        
                        505 "rickshaw (hand drawn & cycle) fare"
                        506 "horse cart fare"                                            
                        507 "porter charges"                                            
                        508 "diesel for vehicle"                                        
                        510 "petrol, other fuels & lubricants for vehicle"
                        511 "school bus/van"                                            
                        512 "other conveyance expenses"                                  
                        519 "conveyance : sub-total (500-512)"                          
                        520 "house rent, garage rent (actual)"                          
                        521 "hotel lodging charges"                                      
                        522 "residential land rent"                                      
                        523 "other consumer rent"                                        
                        529 "rent: sub-total (520-523)"                                  
                        539 "house rent, garage rent (imputed- urban only)"              
                        540 "water charges"                                              
                        541 "other consumer taxes & cesses"
                        549 "consumer taxes and cesses: sub-total (540-541)"            
                        end
                        
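                        * save the value/label lookup table entered above
                        tempfile temp
                        save `temp'
                        use yourfile, clear
                        * attach the label text to each matching code of B10_q1
                        gen value = B10_q1
                        joinby value using `temp', unm(m)
                        * labmask is user-written (ssc install labutil)
                        labmask B10_q1, value(category)
                        drop value category _merge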



                        • #13
                          Originally posted by Daniel Bela
                          Hi,

                          I stumbled upon this post about two hours ago. I fully agree with daniel klein that it would be best to save the metadata (value labels etc) to a plain text file, and parse it afterwards. After I read his solution, I followed the hyperlink Anustup Kundu posted in #9.

                          Sorry, this will get a little excessive now. Surfing around on the data provider's website, I found that they provide a DDI (1.2.2, which is quite an old format) metadata file for download in the section "Study Description --> Download Metadata". This plain-text XML file contains all we need, so we can read it into Stata (with -file-, as Daniel demonstrated) and label everything accordingly.

                          So I sat down to produce some example code. It took a while. It's not perfect. It is not thoroughly tested. It's not the most efficient or elegant code snippet I have ever produced, sorry. More or less, it is a quick shot for XML parsing, specialized on this very specific XML DDI file.

                          However, it does the trick for the example data: It can (1) download the DDI XML file, (2) read all labels (including variable labels) from it, and (3) apply these labels to the (example) dataset in memory.

                          Code:
                          clear
                          cls
                          version 14 // this do-file requires Stata 14, as it's using Unicode string functions
                          * input information: URL to fetch DDI metadata from
                          local sourceurl "http://icssrdataservice.in/datarepository/index.php/catalog/ddi/82"
                          local debug 0 // flip this to 1 to see muuuuch output
                          
                          // example data -- replace this with a "use" or "import delimited" statement, if you wish
                          input byte(B3_q5 B3_q10) int(B5_q1 B10_q1)
                          1 1 327 467
                          1 1 282 459
                          1 1 301 459
                          1 1 129 479
                          1 1 280 459
                          1 1 279 519
                          1 1 179 459
                          1 1 309 502
                          1 1 229 459
                          1 1 148 459
                          1 1 308 459
                          1 1 174 453
                          1 1 289 459
                          1 1 288 459
                          1 1 190 459
                          1 1 214 459
                          1 1 191 459
                          1 1 290 459
                          1 1 102 459
                          1 1 211 459
                          1 1 212 459
                          1 1 329 459
                          1 1 159 459
                          1 1 180 454
                          1 1 309 454
                          1 1 148 502
                          1 1 214 453
                          1 1 290 420
                          1 1 288 479
                          1 1 189 454
                          1 1 159 454
                          1 1 229 454
                          1 1 174 450
                          1 1 301 454
                          1 1 211 454
                          1 1 129 467
                          1 1 308 429
                          1 1 279 454
                          1 1 102 519
                          1 1 212 454
                          1 1 280 454
                          1 1 289 459
                          1 1 190 454
                          1 1 191 454
                          1 1 327 454
                          1 1 303 454
                          1 1 179 454
                          1 1 282 454
                          1 1 329 454
                          1 1 140 510
                          1 1 174 452
                          1 1 290 539
                          1 1 152 483
                          1 1 108 452
                          1 1 301 452
                          1 1 287 452
                          1 1 103 479
                          1 1 240 452
                          1 1 245 420
                          1 1 259 492
                          1 1 207 472
                          1 1 201 452
                          1 1 309 451
                          1 1 190 452
                          1 1 308 470
                          1 1 261 452
                          1 1 216 452
                          1 1 289 452
                          1 1 169 499
                          1 1 285 429
                          1 1 111 452
                          1 1 251 437
                          1 1 202 493
                          1 1 211 467
                          1 1 282 452
                          1 1 291 549
                          1 1 300 456
                          1 1 288 540
                          1 1 160 459
                          1 1 279 453
                          1 1 249 452
                          1 1 214 502
                          1 1 102 443
                          1 1 191 452
                          1 1 229 452
                          1 1 159 494
                          1 1 221 519
                          1 1 179 439
                          1 1 222 452
                          1 1 269 452
                          1 1 283 457
                          1 1 230 454
                          1 1 164 452
                          1 1 129 450
                          1 1 280 452
                          1 1 256 449
                          1 1 283 483
                          1 1 245 451
                          1 1 280 492
                          1 1 190 510
                          end
                          
                          * temporary names
                          tempname fh
                          tempfile ddifile
                          
                          * download DDI metadata, copy to local (temporary) file
                          copy "`sourceurl'" "`ddifile'"
                          
                          // open DDI file
                          file open `fh' using `ddifile' , read text
                          file read `fh' line
                          // read DDI file, save variable and value labels
                          while (!r(eof)) {
                              if (missing(`"`varname'"')) {
                                  if (ustrregexm(`"`line'"',`"<var ID="[^"]+" name="([^"]+)""')) {
                                      local varname=ustrregexs(1)
                                      if (`"`debug'"'=="1") display as text `"{text}variable {result}`varname'"'
                                      file read `fh' line
                                      continue
                                  }
                              }
                              else {
                                  // check for end of variable tag
                                  if (ustrregexm(`"`line'"',"</var>")) {
                                      local varname
                                      file read `fh' line
                                      continue
                                  }
                                  // check for value label tag
                                  if (ustrregexm(`"`line'"',`"<catgry>"')) {
                                      // reset anything left over from a skipped (non-numeric) category
                                      local value
                                      local vallab
                                      file read `fh' line
                                      while (!ustrregexm(`"`line'"',`"</catgry>"')) {
                                          // check for value
                                          if (ustrregexm(`"`line'"',`"<catValu>"')) {
                                              file read `fh' line
                                              local value=ustrtrim(`"`line'"')
                                              file read `fh' line
                                              assert (ustrregexm(`"`line'"',`"</catValu>"'))
                                              capture : numlist `"`value'"'
                                              if (_rc!=0) {
                                                  display as error `"non-numeric value {res:`value'} in variable {res:`varname'} can not be labeled in Stata, will be skipped"'
                                                  local value
                                                  continue
                                              }
                                          }
                                          // check for label
                                          else if (ustrregexm(`"`line'"',`"<labl>"')) {
                                              // special case: CDATA element inline with </labl>
                                              if (ustrregexm(`"`line'"',`"</labl>"')) {
                                                  assert (ustrregexm(`"`line'"',`"<labl><!\[CDATA\[([^\]]+)\]\]></labl>"'))
                                                  local vallab=ustrtrim(ustrregexs(1))
                                              }
                                              // regular case: no CDATA, next line is value label, line after this is </labl>
                                              else {
                                                  file read `fh' line
                                                  local vallab=ustrtrim(`"`line'"')
                                                  file read `fh' line
                                                  assert (ustrregexm(`"`line'"',`"</labl>"'))
                                              }
                                          }
                                          // found value/label pair?
                                          if (!missing(`"`value'"') & !missing(`"`vallab'"')) {
                                              if (`"`debug'"'=="1") display as text `"{text}value label for value {result}`value'{text} of variable {res:`varname'}: "{result}`vallab'""'
                                              if (missing(`"`: label lbl_`varname' `value' , strict'"')) label define lbl_`varname' `value' `"`vallab'"' , add
                                              else if (`"`: label lbl_`varname' `value', strict'"'!=`"`vallab'"') {
                                                  display as error `"duplicate, conflicting value labels for variable {res:`varname'} in metadata, latter one will be ignored:"' _newline ///
                                                      `"{tab}{res:`: label lbl_`varname' `value', strict'}{tab}vs.{tab}{res:`vallab'}"'
                                              }
                                              local value
                                              local vallab
                                          }
                                          file read `fh' line
                                      }
                                      file read `fh' line
                                      continue
                                  }
                                  // check for variable label tag
                                  if (ustrregexm(`"`line'"',`"<labl>"')) {
                                      file read `fh' line
                                      local varlab_`varname'=ustrtrim(`"`line'"')
                                      if (`"`debug'"'=="1") display as text `"{text}variable label for variable {res:`varname'}: {result}`varlab_`varname''"'
                                      file read `fh' line
                                      assert (ustrregexm(`"`line'"',`"</labl>"'))
                                      file read `fh' line
                                      continue
                                  }
                              }
                              file read `fh' line
                          }
                          * close DDI file
                          file close `fh'
                          
                          * _all_ value and variable labels have been read from the DDI file to memory now;
                          * they can be applied to each variable in the dataset
                          * using the following simple looping mechanism:
                          foreach var of varlist _all {
                          if (`: label lbl_`var' maxlength'>0) label values `var' lbl_`var'
                          if (!missing(`"`varlab_`var''"')) label variable `var' `"`varlab_`var''"'
                          }
                          describe // variables and values have been labeled
                          list // many value labels!
                          Be warned, it is not thoroughly tested, and may terminate in case there are peculiarities in the DDI XML that Stata does not support. Some of these peculiarities, namely value labels for non-numeric values and duplicate, conflicting value labels, are detected and skipped, but others may not.

                          As DDI is more generous in defining names and objects, there could be (in Stata terms) illegal variable names in the DDI file, or illegal value label names (they are constructed on-the-fly as 'lbl_<VARIABLE>', which is not the best thing to do). There are several other issues that would require more coding to turn this into a more generic program. So use the code with care; I hope it helps.
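One way to guard against illegal names is to pass every name read from the DDI file through `strtoname()` before using it. The following is only a sketch, not part of the code above: the DDI name `B3.q5-a` is made up for illustration, and truncating to 28 characters (to leave room for the `lbl_` prefix within the 32-character name limit) can make two long names collide.

```stata
* hypothetical DDI variable name that is not a legal Stata name
local ddiname "B3.q5-a"

* strtoname() replaces characters that are illegal in Stata names with underscores
local varname = strtoname(`"`ddiname'"')

* names are limited to 32 characters, so keep at most 28 characters
* of the variable name after the 4-character prefix "lbl_"
local lblname = "lbl_" + substr("`varname'", 1, 28)

* stop with an error if the result is still not a valid name
confirm name `lblname'
display as text "variable: `varname'   value label: `lblname'"
```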

                          Note that this code does not transcode HTML entities (i.e. the labels will contain the string "&amp;" instead of "&").
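If the entities are unwanted, they could be decoded with `subinstr()` right after a label text has been read. This is a minimal sketch that handles only three common entities; the example label text is hypothetical, and `&amp;` is deliberately decoded last so that a doubly encoded string is unwrapped by one level only.

```stata
* hypothetical label text as it might be read from the DDI file
local vallab `"Jammu &amp; Kashmir"'

* decode a few common HTML entities; "&amp;" comes last on purpose
local vallab = subinstr(`"`vallab'"', "&lt;", "<", .)
local vallab = subinstr(`"`vallab'"', "&gt;", ">", .)
local vallab = subinstr(`"`vallab'"', "&amp;", "&", .)

display as result `"`vallab'"'
```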


                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input byte State int State_District long b3_q2
                          1 105  1112
                          1 105  1112
                          1 105     .
                          1 105  1112
                          1 105  1112
                          1 105  1112
                          1 105 35109
                          1 105     .
                          1 105 41001
                          1 105  1113
                          1 105  1113
                          1 105 56302
                          1 105 41001
                          1 105  1113
                          1 105 41001
                          1 105  1113
                          1 106 41001
                          1 106 41001
                          1 106  1111
                          1 106  1111
                          1 106 41001
                          1 106 41001
                          1 106  1111
                          1 106 41001
                          1 106 41001
                          1 106 41001
                          1 106 42101
                          1 106 42101
                          1 106 42101
                          1 106 41001
                          1 106 41001
                          1 106 41001
                          1 107 85211
                          1 107  1122
                          1 107 47211
                          1 107 47110
                          1 107 41001
                          1 107 41001
                          1 107 41001
                          1 107 41001
                          1 107 84111
                          1 107 31001
                          1 107 84111
                          1 107 75000
                          1 107 56101
                          1 107 35109
                          1 107 47594
                          1 107 47613
                          1 116  1111
                          1 116 42101
                          1 116     .
                          1 116 42101
                          1 116     .
                          1 116 42101
                          1 116  1113
                          1 116 41002
                          1 116 84119
                          1 116  1111
                          1 116 41001
                          1 116 64191
                          1 116 56101
                          1 116 95230
                          1 116 41001
                          1 116     .
                          1 117 49231
                          1 117 85211
                          1 117 42101
                          1 117 41001
                          1 117 42101
                          1 117 41001
                          1 117  1111
                          1 117 79900
                          1 117 47711
                          1 117 42101
                          1 117 42101
                          1 117 84111
                          1 117 85211
                          1 117     .
                          1 117 42101
                          1 117 42101
                          1 118 41002
                          1 118  1113
                          1 118 84230
                          1 118  1113
                          1 118 41002
                          1 118 35109
                          1 118  1113
                          1 118 35109
                          1 119 35109
                          1 119  1111
                          1 119  1111
                          1 119 41002
                          1 119  1111
                          1 119 49231
                          1 119 53100
                          1 119 41002
                          1 119  1111
                          1 119  1111
                          1 119  1111
                          1 119  1111
                          end
                          label values State lbl_State
                          label def lbl_State 1 "Jammu &amp; Kashmir", modify
                          label values State_District lbl_State_District
                          label def lbl_State_District 105 "Punch", modify
                          label def lbl_State_District 106 "Rajouri", modify
                          label def lbl_State_District 107 "Kathua", modify
                          label def lbl_State_District 116 "Doda", modify
                          label def lbl_State_District 117 "Ramban", modify
                          label def lbl_State_District 118 "Kishtwar", modify
                          label def lbl_State_District 119 "Udhampur", modify
                          label values b3_q2 lbl_b3_q2
                          label def lbl_b3_q2 1111 "Growing of wheat", modify
                          label def lbl_b3_q2 1112 "Growing of jowar, bajra and millets", modify
                          label def lbl_b3_q2 1113 "Growing of other cereals", modify
                          label def lbl_b3_q2 1122 "Organic farming of basmati rice", modify
                          label def lbl_b3_q2 31001 "Manufacture of other transport equipments n.e.c. such as pushcarts, handcarts etc", modify
                          label def lbl_b3_q2 35109 "Transmission of electric energy", modify
                          label def lbl_b3_q2 41001 "Remediation activities and other waste management services", modify
                          label def lbl_b3_q2 41002 "Construction of buildings carried out on own-account basis or on a fee or contract basis", modify
                          label def lbl_b3_q2 42101 "Construction of buildings", modify
                          label def lbl_b3_q2 47110 "Other non-specialised wholesale trade n.e.c.", modify
                          label def lbl_b3_q2 47211 "Retail sale in non-specialized stores", modify
                          label def lbl_b3_q2 47594 "Retail sale of gas stoves, cooking/kitchen appliances", modify
                          label def lbl_b3_q2 47613 "Retail sale of newspapers and magazines", modify
                          label def lbl_b3_q2 47711 "Retail sale of games and toys in specialized stores", modify
                          label def lbl_b3_q2 49231 "Other non urban passenger land transport n.e.c.", modify
                          label def lbl_b3_q2 53100 "Warehousing and support activities for transportation", modify
                          label def lbl_b3_q2 56101 "Worker hostels and boarding houses", modify
                          label def lbl_b3_q2 56302 "Bars and Restaurants with bars", modify
                          label def lbl_b3_q2 64191 "Central banking", modify
                          label def lbl_b3_q2 75000 "Other professional, scientific and technical activities n.e.c.", modify
                          label def lbl_b3_q2 79900 "Tour operator activities", modify
                          label def lbl_b3_q2 84111 "Other business support service activities n.e.c", modify
                          label def lbl_b3_q2 84119 "General public service activities relating to legislation", modify
                          label def lbl_b3_q2 84230 "Defence activities", modify
                          label def lbl_b3_q2 85211 "Primary education", modify
                          label def lbl_b3_q2 95230 "Repair and servicing of home and garden equipment such lawn mowers, edgers, trimmers etc.", modify

                          Regards
                          Bela
                          Dear Daniel Bela, thanks to your code and several weeks of work, I finally managed to turn it into a general DDI import routine for my data source. I apologize for not getting back to this thread earlier, but there were some issues that needed to be sorted out before the code could be generalized for the datasets I am using. I really can't stress enough how much your code helped me.

                          Regards,
                          Anustup



                          • #14
                            Originally posted by Romalpa Akzo
                            Copy-paste into a plain text file is not the best option, ... and complicated. Instead, simply select and copy the relevant info (from your link), and paste it directly into the (Stata) data editor in edit mode. Such a copy-paste provides the necessary outcome, as shown below.

                            Then, the next step is "transferring" the label value into your file with the help of labmask, a wonderful tool created by Nick Cox.
                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input int value str59 category
                            420 "medicine (allopathic)"
                            421 "medicine (homeopathic)"
                            422 "medicine (ayurvedic)"
                            423 "medicine (unani)"
                            424 "medicine (others)"
                            425 "X-ray, ECG, pathological test, etc."
                            426 "doctor’s/ surgeon’s fee"
                            427 "family planning appliances"
                            428 "other medical expenses"
                            429 "medical – non-institutional: sub-total (420-428)"
                            430 "cinema, theatre"
                            431 "mela, fair, picnic"
                            432 "sports goods, toys, etc."
                            433 "club fees"
                            434 "goods for recreation and hobbies"
                            435 "photography"
                            436 "video cassette/ VCR / VCP – hire"
                            437 "cable TV"
                            438 "other entertainment"
                            439 "entertainment: sub-total (430-438)"
                            440 "spectacles"
                            441 "torch"
                            442 "lock"
                            443 "umbrella, raincoat"
                            444 "lighter (bidi/ cigarette/ gas stove)"
                            445 "other goods for personal care and effects"
                            449 "goods for personal care and effects: sub-total (440-445)"
                            450 "toilet soap"
                            451 "toothpaste, toothbrush, comb, etc."
                            452 "powder, snow, cream, lotion"
                            453 "hair oil, shampoo, hair cream"
                            454 "shaving blades, shaving stick, razor"
                            455 "shaving cream"
                            456 "sanitary napkins"
                            457 "other toilet articles"
                            459 "toilet articles: sub-total (450-457)"
                            460 "electric bulb, tubelight"
                            461 "electric batteries"
                            462 "other non-durable electric goods"
                            463 "earthenware"
                            464 "glassware"
                            465 "bucket, water bottle/ feeding bottle & other plastic goods"
                            466 "coir, rope, etc."
                            467 "washing soap/soda"
                            468 "other washing requisites"
                            470 "incense (agarbatti), room freshener"
                            471 "flower (fresh): all purposes"
                            472 "mosquito mat, insecticide, acid etc."
                            473 "other petty articles"
                            479 "sundry articles: sub-total (460-473)"
                            480 "domestic servant/cook"
                            481 "attendant"
                            482 "sweeper"
                            483 "barber, beautician, etc."
                            484 "washerman, laundry, ironing"
                            485 "tailor"
                            486 "priest"
                            487 "legal expenses"
                            488 "telephone charges: landline"
                            490 "telephone charges: mobile"
                            491 "postage & telegram"
                            492 "miscellaneous expenses"
                            493 "grinding charges"
                            494 "repair charges for non-durables"
                            495 "pet animals (incl. birds, fish)"
                            496 "other consumer services excluding conveyance"
                            499 "consumer services excluding conveyance: sub-total (480-496)"
                            500 "air fare"
                            501 "railway fare"
                            502 "bus/tram fare"
                            503 "taxi, auto-rickshaw fare"
                            504 "steamer, boat fare"
                            505 "rickshaw (hand drawn & cycle) fare"
                            506 "horse cart fare"
                            507 "porter charges"
                            508 "diesel for vehicle"
                            510 "petrol, other fuels & lubricants for vehicle"
                            511 "school bus/van"
                            512 "other conveyance expenses"
                            519 "conveyance : sub-total (500-512)"
                            520 "house rent, garage rent (actual)"
                            521 "hotel lodging charges"
                            522 "residential land rent"
                            523 "other consumer rent"
                            529 "rent: sub-total (520-523)"
                            539 "house rent, garage rent (imputed- urban only)"
                            540 "water charges"
                            541 "other consumer taxes & cesses"
                            549 "consumer taxes and cesses: sub-total (540-541)"
                            end
                            
                            tempfile temp
                            save `temp'
                            use yourfile, clear
                            gen value = B10_q1
                            joinby value using `temp', unm(m)
                            labmask B10_q1, value(category)
                            drop value category _merge
                            I think a simpler way would be to

                            Code:
                               
                            B10_q1 label no B10_q1 label name
                            420 medicine (allopathic)
                            421 medicine (homeopathic)
                            422 medicine (ayurvedic)
                            423 medicine (unani)
                            424 medicine (others)
                            425 X-ray, ECG, pathological test, etc.
                            426 doctor’s/ surgeon’s fee
                            427 family planning appliances
                            428 other medical expenses
                            429 medical – non-institutional: sub-total (420-428)
                            430 cinema, theatre
                            431 mela, fair, picnic
                            432 sports goods, toys, etc.
                            433 club fees
                            434 goods for recreation and hobbies
                            435 photography
                            436 video cassette/ VCR / VCP – hire
                            437 cable TV
                            438 other entertainment
                            439 entertainment: sub-total (430-438)
                            440 spectacles
                            441 torch
                            442 lock
                            443 umbrella, raincoat
                            444 lighter (bidi/ cigarette/ gas stove)
                            445 other goods for personal care and effects
                            449 goods for personal care and effects: sub-total (440-445)
                            450 toilet soap
                            451 toothpaste, toothbrush, comb, etc.
                            452 powder, snow, cream, lotion
                            453 hair oil, shampoo, hair cream
                            454 shaving blades, shaving stick, razor
                            455 shaving cream
                            456 sanitary napkins
                            457 other toilet articles
                            459 toilet articles: sub-total (450-457)
                            460 electric bulb, tubelight
                            461 electric batteries
                            462 other non-durable electric goods
                            463 earthenware
                            464 glassware
                            465 bucket, water bottle/ feeding bottle & other plastic goods
                            466 coir, rope, etc.
                            467 washing soap/soda
                            468 other washing requisites
                            470 incense (agarbatti), room freshener
                            471 flower (fresh): all purposes
                            472 mosquito mat, insecticide, acid etc.
                            473 other petty articles
                            479 sundry articles: sub-total (460-473)
                            480 domestic servant/cook
                            481 attendant
                            482 sweeper
                            483 barber, beautician, etc.
                            484 washerman, laundry, ironing
                            485 tailor
                            486 priest
                            487 legal expenses
                            488 telephone charges: landline
                            490 telephone charges: mobile
                            491 postage & telegram
                            492 miscellaneous expenses
                            493 grinding charges
                            494 repair charges for non-durables
                            495 pet animals (incl. birds, fish)
                            496 other consumer services excluding conveyance
                            499 consumer services excluding conveyance: sub-total (480-496)
                            500 air fare
                            501 railway fare
                            502 bus/tram fare
                            503 taxi, auto-rickshaw fare
                            504 steamer, boat fare
                            505 rickshaw (hand drawn & cycle) fare
                            506 horse cart fare
                            507 porter charges
                            508 diesel for vehicle
                            510 petrol, other fuels & lubricants for vehicle
                            511 school bus/van
                            512 other conveyance expenses
                            519 conveyance : sub-total (500-512)
                            520 house rent, garage rent (actual)
                            521 hotel lodging charges
                            522 residential land rent
                            523 other consumer rent
                            529 rent: sub-total (520-523)
                            539 house rent, garage rent (imputed- urban only)
                            540 water charges
                            541 other consumer taxes & cesses
                            549 consumer taxes and cesses: sub-total (540-541)
                            add a column containing quotes before and after the description and another column containing "///" in Excel, then copy and paste the whole thing and run it. However, doing this for 70+ variables for 10 years would take a considerable amount of effort. I think the solution Daniel Bela posted was much smarter, but due to my limited coding skills it took me a while. Anyone moderately good with coding should consider taking that approach.
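For illustration, the pasted result of the Excel approach would look roughly like this for the first few categories of B10_q1. This is a sketch under two assumptions: the value label gets the same name as the variable, and the last line carries no "///". Because of the line continuations, it has to be run from the do-file editor rather than typed into the command window.

```stata
* start from a clean slate in case the label already exists
capture label drop B10_q1

* one -label define- call, continued across lines with ///
label define B10_q1 ///
    420 "medicine (allopathic)" ///
    421 "medicine (homeopathic)" ///
    422 "medicine (ayurvedic)" ///
    423 "medicine (unani)" ///
    424 "medicine (others)"

* attach the label only if the variable is in memory
capture confirm numeric variable B10_q1
if (_rc == 0) label values B10_q1 B10_q1

label list B10_q1
```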

                            Regards,
                            Anustup



                            • #15
                              Originally posted by daniel klein
                              Hm, this is not going to be simple. I would ask the data provider whether they have value label information in another format, preferably integrated into Stata datasets.

                              Otherwise, your best option is probably to copy-paste the contents into a plain text file (e.g., do-file editor) and read it from there. Assume that you have the contents (i.e., everything inside code delimiters) of #5 saved to labels.txt, then

                              Code:
                              tempname fh
                              tempfile tmp
                              
                              // tab stops to spaces
                              filefilter labels.txt `tmp' , from(\t) to(" ")
                              
                              // open file, read first line
                              file open `fh' using `tmp' , read
                              file read `fh' dump // first line has no value to text mapping
                              file read `fh' line
                              
                              // read successive lines and define label
                              while (!r(eof)) {
                              gettoken value line : line
                              label define B10_q1 `value' `"`line'"' , modify
                              file read `fh' line
                              }
                              file close `fh'
                              
                              label list
                              results in

                              Code:
                              . // tab stops to spaces
                              . filefilter labels.txt `tmp' , from(\t) to(" ")
                              
                              .
                              . // open file, read first line
                              . file open `fh' using `tmp' , read
                              
                              . file read `fh' dump // first line has no value to text mapping
                              
                              . file read `fh' line
                              
                              .
                              . // read successive lines and define label
                              . while (!r(eof)) {
                              2. gettoken value line : line
                              3. label define B10_q1 `value' `"`line'"' , modify
                              4. file read `fh' line
                              5. }
                              
                              . file close `fh'
                              
                              .
                              . label list
                              B10_q1:
                              420 medicine (allopathic)
                              421 medicine (homeopathic)
                              422 medicine (ayurvedic)
                              423 medicine (unani)
                              424 medicine (others)
                              425 X-ray, ECG, pathological test, etc.
                              426 doctor’s/ surgeon’s fee
                              output omitted
                              522 residential land rent
                              523 other consumer rent
                              529 rent: sub-total (520-523)
                              539 house rent, garage rent (imputed- urban only)
                              540 water charges
                              541 other consumer taxes & cesses
                              549 consumer taxes and cesses: sub-total (540-541)
                              
                              .
                              end of do-file
                              Best
                              Daniel
                              Thanks for all your help daniel klein
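To scale the file-reading approach quoted above to many variables and years, the loop could be wrapped in a small program. This is only a sketch: the program name `labels_from_txt` and the file names in the usage comments are made up, and each text file is assumed to have one header line followed by one value and its label text per line.

```stata
capture program drop labels_from_txt      // hypothetical helper name
program define labels_from_txt
    version 14
    syntax anything(name=lblname) using/
    confirm name `lblname'

    tempname fh
    tempfile tmp

    // tab stops to spaces, as in the code quoted above
    filefilter `"`using'"' `"`tmp'"' , from(\t) to(" ")

    // open file, skip the header line, read the first data line
    file open `fh' using `"`tmp'"' , read
    file read `fh' line
    file read `fh' line

    // read successive lines and define the label
    while (!r(eof)) {
        gettoken value line : line
        local text = strtrim(`"`line'"')
        // skip blank lines and values that real() cannot parse
        if (real(`"`value'"') < . & !missing(`"`text'"')) {
            label define `lblname' `value' `"`text'"' , modify
        }
        file read `fh' line
    }
    file close `fh'
end

* usage (file names are assumptions):
*   labels_from_txt B10_q1 using labels_B10_q1.txt
*   label values B10_q1 B10_q1
```

With one text file per variable and year, the two usage lines can then go inside a `foreach` loop over the variable names.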
