Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Import variable labels and values from .csv

    Hi,

    I have a dataset, variable labels, and variable values in three different .csv-files. The import of the dataset to stata is no problem (works great with import excel). Now there are variable names like "A101_3" and I have over 800 variables, so I would first like to use the .csv-file with the variable labels to label my variables. Second, I would like to add the value labels. How can I ipmort the variable and value labels? (Both incule the corresponding variable names.)

    Thanks for help!
    Florian

  • #2
    Please show us an example of how how the respective .csv files look like.

    Best
    Daniel

    Comment


    • #3
      Thanks for the reply!

      Attached you can find the first rows of the variables and values .csv-files.
      Attached Files

      Comment


      • #4
        Ok, here is the first thing that comes to my mind

        Code:
        // import the file
        cd mydirectory
        import delimited variables_example.csv , delimiter(";") clear
        
        // keep relevant observations
        keep in 2/`= _N-1'
        
        // should look like this
        list v1 v2
        
        tempname fh
        local N = c(N)
        
        // create a new do-file
        file open `fh' using my_labels.do , write replace
        forvalues i = 1/`N' {
            file write `fh' "label variable `= v1[`i']' "
            file write `fh' `""`= v2[`i']'""' _newline
        }
        file close `fh'
        
        // look at new dofile
        type my_labels.do
        The key idea is to create a do-file that can then be run on the dataset that contains the variables. This process makes the labeling reproducible.

        The above assumes that all variable names are spelled correctly (i.e. uppercase etc.). If this is not the case, you can easily change it using Stata's string functions once the dataset is imported. The above also assumes that there are no (nested) double quotes (or other awkward) characters in the value labels. If this is not the case, change it.

        For value labels, the process should in principle be similar, although the details might be bit trickier.

        Here is what the above looks like

        Code:
        . // import the file
        . cd "C:\Users\daniel\Desktop\tmp"
        C:\Users\daniel\Desktop\tmp
        
        . import delimited variables_example.csv , delimiter(";") clear
        (5 vars, 22 obs)
        
        . 
        . // keep relevant observations
        . keep in 2/`= _N-1'
        (2 observations deleted)
        
        . 
        . // should look like this
        . list v1 v2
        
             +---------------------------------------------------------+
             |       v1                                             v2 |
             |---------------------------------------------------------|
          1. |     CASE                 Interview-Nummer (fortlaufend) |
          2. |   SERIAL                Seriennummer (sofern verwendet) |
          3. |      REF            Referenz (sofern im Link angegeben) |
          4. | QUESTNNR   Fragebogen, der im Interview verwendet wurde |
          5. |     MODE                                Interview-Modus |
             |---------------------------------------------------------|
          6. | LANGUAGE                                        Sprache |
          7. |  STARTED    Zeitpunkt zu dem das Interview begonnen hat |
          8. |     A001                    Anzahl verschickter E-Mails |
          9. |     A101                                     Bundesland |
         10. |     A102                                   Hochschulart |
             |---------------------------------------------------------|
         11. |  A102_09                        Hochschulart: Sonstiges |
         12. |     A103                                   Trägerschaft |
         13. |  A103_04                        Trägerschaft: Sonstiges |
         14. |     A104                                 Funktion/Rolle |
         15. |  A104_05                      Funktion/Rolle: Sonstiges |
             |---------------------------------------------------------|
         16. |     A211                                          VW BW |
         17. |  A211_10                               VW BW: Sonstiges |
         18. |     A210                                      VW Bayern |
         19. |  A210_10                           VW Bayern: Sonstiges |
         20. |     A209                                      VW Berlin |
             +---------------------------------------------------------+
        
        . 
        . tempname fh
        
        . local N = c(N)
        
        . 
        . // create a new do-file
        . file open `fh' using my_labels.do , write replace
        
        . forvalues i = 1/`N' {
          2.         file write `fh' "label variable `= v1[`i']' "
          3.         file write `fh' `""`= v2[`i']'""' _newline
          4. }
        
        . file close `fh'
        
        . 
        . // look at new dofile
        . type my_labels.do
        label variable CASE "Interview-Nummer (fortlaufend)"
        label variable SERIAL "Seriennummer (sofern verwendet)"
        label variable REF "Referenz (sofern im Link angegeben)"
        label variable QUESTNNR "Fragebogen, der im Interview verwendet wurde"
        label variable MODE "Interview-Modus"
        label variable LANGUAGE "Sprache"
        label variable STARTED "Zeitpunkt zu dem das Interview begonnen hat"
        label variable A001 "Anzahl verschickter E-Mails"
        label variable A101 "Bundesland"
        label variable A102 "Hochschulart"
        label variable A102_09 "Hochschulart: Sonstiges"
        label variable A103 "Trägerschaft"
        label variable A103_04 "Trägerschaft: Sonstiges"
        label variable A104 "Funktion/Rolle"
        label variable A104_05 "Funktion/Rolle: Sonstiges"
        label variable A211 "VW BW"
        label variable A211_10 "VW BW: Sonstiges"
        label variable A210 "VW Bayern"
        label variable A210_10 "VW Bayern: Sonstiges"
        label variable A209 "VW Berlin"
        
        . 
        end of do-file
        Best
        Daniel
        Last edited by daniel klein; 14 Mar 2018, 06:49.

        Comment


        • #5
          Ok, here is the complete thing

          Code:
          // import the file
          cd "C:\Users\daniel\Desktop\tmp"
          import delimited variables_example.csv , delimiter(";") clear
          
          // keep relevant observations
          keep in 2/`= _N-1'
          
          // should look like this
          list v1 v2
          
          tempname fh
          local N = c(N)
          
          // create a new do-file
          file open `fh' using my_labels.do , write replace
          forvalues i = 1/`N' {
              file write `fh' "label variable `= v1[`i']' "
              file write `fh' `""`= v2[`i']'""' _newline
          }
          file close `fh'
          
          // now get value labels
          import delimited values_example.csv , delimiter(";") clear
          keep in 1/`= _N-1'
          
          local N = c(N)
          
          file open `fh' using my_labels.do , write append
          file write `fh' _newline
          forvalues i = 1/`N' {
              file write `fh' "label define `=var[`i']' `=response[`i']' "
              file write `fh' `""`=meaning[`i']'" , modify"' _newline 
          }
          file close `fh'
          
          type my_labels.do
          After that, load your data and

          Code:
          do my_labels.do
          Best
          Daniel

          Comment


          • #6
            Thanks a lot for the effort! Unfortunately I do not get further than "create a new do-file" because when I type
            file open `fh' using my_labels.do , write replace Stata tells me "varlist not allowed". I ckecked the variable names and they spelled correctly. Any idea?

            Note: I changed the import command to use an excel file because the .csv file wasn't imported correctly. But I also tried to reproduce the example with the same file and syntax you used and got the same error message.

            Comment


            • #7
              Sounds like you are not running the complete piece of code at once. Note that fh is a temporary name and referenced just as a local macro (technically, it is a local macro). You need to run at least the line

              Code:
              tempname fh
              together with the file commands; better yet: run the complete code at once. Executing only pieces of the code leaves local macros undefined and they evaluate to "" (i.e. missing string); if you are lucky this leads to error messages, if you are not so lucky no errors occur but the results you get will be rubbish.

              Best
              Daniel

              Comment


              • #8
                Thanks again for the reply! You're right, I wasn't running the complete code at once. Now the code works on the example files, but not completely on the whole .csv files: I fixed a problem with the variable labels, but I get stucked now when running the code for the values.
                Code:
                . forvalues i = 1/`N' {
                  2.     file write `fh' "label define `=var[`i']' `=response[`i']' "
                  3.     file write `fh' `""`= meaning[`i']'" , modify"' _newline
                  4. }
                invalid syntax
                r(198);
                Because the example file works I assume that there is something wrong with the value labels, which are not always unique on the first 12 characters. Is that an issue here? If not, what could be the problem (you were talking about "awkward characters" before...)?

                Comment


                • #9
                  The fact that not all value labels (text) are unique up to any number of characters should not cause any technical problems; though it might get you into trouble when your interpretation is based on wrong labels. However, that is nothing I can comment on much. You will need to inspect the value labels once they are created and attached and verify that you got what you expected.

                  The characters that I had in mind are double quotes and especially single left quotes, i.e., `. Those might cause problems. Double quotes are easily dealt with, just replace

                  Code:
                  file write `fh' `""`= meaning[`i']'" , modify"' _newline
                  with

                  Code:
                  file write `fh' `"`"`= meaning[`i']'"' , modify"' _newline
                  (note the nested compound quotes). This will sometimes make problems with single left quotes worse. Scan your Excel sheet for single left quotes and remove them.

                  To see where things go wrong type

                  Code:
                  set trace on
                  before you execute the do-file. You can then get a better understanding of when the reported error occurs.

                  To say more, I would need an example that reproduces the problem since, as you say, with the example you gave the code works fine.

                  Best
                  Daniel

                  Comment


                  • #10
                    Thanks a lot! I think I can finally analyze my data...

                    The problem with double quotes occured only one time, but I had another issue with the .csv file of the data: if the value "keine Angabe" appeared at a certain point, "Angabe" and all the following values slipped to the next row, which confused the date. I assume this was produced by line breaks in the .csv file of the value labels (although I don't know how these date were generated...), which also caused problems before.

                    Fortunately only a few cases were affected, so I fixed it manually. Now the data set looks good!

                    Comment


                    • #11
                      My goodness why would they want to make something this basic that complicated?
                      And in 6 years since this post they didn't figure out how to do it sensibly?
                      That's a major black eye on this software in my opinion.
                      Along with several other black eyes, like no "undo".
                      Weird.

                      Comment


                      • #12
                        Name another software with similar features that also handles the issue the way you think it should be handled, please. If you can name one, why not use it?

                        Comment

                        Working...
                        X