  • Using foreach and levelsof to create multiple variables from levels

    Dear All,

    I have a variable that corresponds to different indicators from an OECD database. I would like to create as many variables as there are levels of indicator2.
    For instance, I already created pct_million with this:
    Code:
    g pct_million = patents if indicator2 =="pct_million"
    But since there are a lot of levels, this is quite time-consuming and, let's say, old-fashioned, so I was trying to loop over the levels of indicator2 to create the variables.
    Below is an example of the data, followed by an example of the code I used. I hope you can help.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str19 indicator2 float(pct_million pct_total)
    "pct_ict"        .      .
    "rd_exp_he"      .      .
    "pct_pharma"     .      .
    "pct_million" 79.3      .
    "pct_pharma"     .      .
    "pct_biotech"    .      .
    "pct_total"      . 2032.3
    "pct_medical"    .      .
    "rd_exp_he"      .      .
    "pct_million" 85.5      .
    end
    Code:
    levelsof indicator2, local(levels)
     . foreach l of local levels {
     .    gen pct_`l' if varname == `l'
     . }
    But I get the following error: "foreach command may not result from a macro expansion interactively or in do files".
    I hope I have made myself clear and that someone can help.

    Thanks a lot,
    Dalila
    Last edited by Dalila Rib; 21 Feb 2022, 12:01. Reason: levelsof

  • #2
    I don't get what you want to do. A command like -gen pct_`l' if varname == `l'- is not valid Stata syntax. When you generate a variable, you must specify an expression to say what values it takes on. The command in your loop fails to do that. The command you show before your example data does that, but it sets it equal to patents, a variable that, at least in your example data, does not exist. Even if I assume that in your real data set you have a variable called patents, do you want to also set pct_ict, rd_exp_he, and all those other new variables equal to the value of patents? It seems rather pointless to create a large number of variables that are all the same. So I'm guessing you have something else in mind, but I can't figure out what.

    Please post back with a clearer explanation of what you want the end result to look like. Perhaps even mock up an example of the intended result and show that in your post.
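
    For reference, the syntax point above can be illustrated with a minimal sketch. It assumes a numeric variable patents exists (as in the one-line command quoted before the example data) and that every level of indicator2 is a legal Stata variable name. The prefix new_ is hypothetical, chosen only to avoid clashing with variables that already exist, and the three rows of data are there only to make the sketch self-contained.

    ```stata
    clear
    input str19 indicator2 float patents
    "pct_pharma" 177.31
    "pct_ict"    506.754
    "pct_pharma" 180.453
    end

    * collect the distinct values of indicator2 in the local macro levels
    levelsof indicator2, local(levels)

    foreach l of local levels {
        * generate needs "= expression"; a string comparison needs double quotes
        generate new_`l' = patents if indicator2 == "`l'"
    }

    list
    ```

    Each new variable holds patents for its own level and is missing elsewhere. Whether that is what the original poster intends is exactly the open question in this post.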


    • #3
      Yes, sorry. Patents is the number of patents. But the idea is to create as many variables as there are levels of indicator2, following this logic:
      Code:
       
       g pct_million = patents if indicator2 =="pct_million"
      But instead of writing 10 lines of code, I'd like to know whether I can come up with a foreach/forvalues looping solution. Although it may seem pointless to have so many variables, I need them for some functions that are not compatible with "if" conditions, and because it is much easier to label and refer to a variable than to a condition (in the case at hand, of course).
      In the dataex example above you can see that I already created two variables: pct_million and pct_total. I want to loop over the levels to create the other variables.
      Thanks
      Dalila


      • #4
        I still don't understand. And I think you did not understand my question, which was not stated clearly enough. So let me restate it. I see that you have created pct_million and pct_total. Where did the numbers 79.3 and 85.5 for pct_million and 2032.3 for pct_total in those variables come from? I'm guessing that they came from some other variable(s) in your data set. If that's the case, what are those variables? Please post a new -dataex- example that includes them.
        Last edited by Clyde Schechter; 21 Feb 2022, 12:31.


        • #5
          Hi Clyde,

          I am sorry. Here is (I hope) a clearer data example. The variable patents holds the number of patents in each industrial sector. So in Australia there are 177.31 patents in pharmaceuticals.
          Indicator2 has a lot of levels, and for some commands and functions I want to use I can't apply an if condition, which is cumbersome coding-wise in any case; so I was wondering whether there is a way to loop over indicator2 and obtain the same result as
          g pct_million = patents if indicator2 =="pct_million"

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str24 country str19 indicator2 float(patents pct_ict pct_million pct_total)
          "Australia" "pct_pharma"    177.31       .    . .
          "Australia" "pct_pharma"   180.453       .    . .
          "Australia" "pct_ict"      506.754 506.754    . .
          "Australia" "pct_nanotech"  9.7143       .    . .
          "Australia" "pct_biotech"  217.789       .    . .
          "Australia" "pct_biotech"  180.272       .    . .
          "Australia" "pct_medical"  164.226       .    . .
          "Australia" "pct_ict"      559.837 559.837    . .
          "Australia" "rd_exp_he"     9918.6       .    . .
          "Australia" "pct_million"     83.8       . 83.8 .
          end
          So I want: pct_pharma = patents if indicator2 == "pct_pharma", and so on, but without writing more than 10 lines of code, since a loop might do it in three.

          Thanks again and sorry for being unclear
          Dalila


          • #6
            Code:
            help separate 
            This works with your data example, but it's not clear that lots of new variables will help much with most Stata problems.

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str24 country str19 indicator2 float(patents pct_ict pct_million pct_total)
            "Australia" "pct_pharma"    177.31       .    . .
            "Australia" "pct_pharma"   180.453       .    . .
            "Australia" "pct_ict"      506.754 506.754    . .
            "Australia" "pct_nanotech"  9.7143       .    . .
            "Australia" "pct_biotech"  217.789       .    . .
            "Australia" "pct_biotech"  180.272       .    . .
            "Australia" "pct_medical"  164.226       .    . .
            "Australia" "pct_ict"      559.837 559.837    . .
            "Australia" "rd_exp_he"     9918.6       .    . .
            "Australia" "pct_million"     83.8       . 83.8 .
            end
            
            separate patents, by(indicator2) veryshortlabel 
            
            local newvars `r(varlist)'
            
            foreach v of local newvars {
               local label : var label `v'
               rename `v' patents_`label'
            }
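
            As a rough check on what the code above produces, the sketch below runs the same steps on four rows of the example and compares one of the renamed variables with the one-line generate approach from the original question. The name check_pharma is hypothetical, and the resulting names (for example patents_pct_pharma) rely on every level of indicator2 being usable as part of a variable name.

            ```stata
            clear
            input str19 indicator2 float patents
            "pct_pharma"  177.31
            "pct_ict"     506.754
            "pct_pharma"  180.453
            "pct_million" 83.8
            end

            * one new variable per level; each label is just the level itself
            separate patents, by(indicator2) veryshortlabel
            local newvars `r(varlist)'

            * rename patents1, patents2, ... after their variable labels
            foreach v of local newvars {
                local label : variable label `v'
                rename `v' patents_`label'
            }

            * the one-line approach, for comparison (check_pharma is hypothetical)
            generate check_pharma = patents if indicator2 == "pct_pharma"
            assert patents_pct_pharma == check_pharma
            drop check_pharma

            describe patents_*
            ```

            If the assert passes silently, the two routes agree for that level, including the missing values in the other rows.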


            • #7
              Thanks a lot Nick, that worked perfectly!
              I did not know the -separate- command. Very useful indeed.

              Dalila


              • #8
                As the record shows, I was the original author of separate and so remain positive about its usefulness. But almost always in my experience that's when you need distinct variables for graphical purposes. For most data management or statistical goals, there is a better way to proceed.
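
                A minimal sketch of that graphical use, assuming the auto data set shipped with Stata rather than the data in this thread: separate splits mpg by foreign so that each group can be plotted as its own series, with its own marker and legend entry.

                ```stata
                sysuse auto, clear

                * creates mpg0 (foreign == 0) and mpg1 (foreign == 1)
                separate mpg, by(foreign) veryshortlabel

                * each new variable is drawn as a separate series against weight
                scatter mpg0 mpg1 weight
                ```

                For a summary or a model, by contrast, the original variable with an if qualifier or a by() option usually does the job without any new variables.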


                • #9
                  I basically need to make some maps and graphs, which is why it was better to have separate variables. Normally I'd use the if qualifier instead of creating superfluous variables, so many thanks!
