
  • Generalize loop for all countries

    Hi Statalist,


    I am currently working on my final thesis on whether democracies levy higher tax rates than autocracies. For this I have data for a list of countries on the top tax rate and on how democratic each country was in each year (from 0 = completely autocratic to 4 = liberal democracy).
    Now I want to avoid writing the same commands for each country. How can I generalize the following code?
    Code:
    gen DE_AUTO = .
    gen DE_DEMO = .
    forvalues year = 1900/2023 {
        replace DE_DEMO = DE_PIT if DE_SYS >= 2
        replace DE_AUTO = DE_PIT if DE_SYS < 2
    }
    Thank you!

  • #2
    Code:
    gen DE_AUTO = .
    gen DE_DEMO = .   
    
    forvalues year = 1900/2023 {     
        replace DE_DEMO = DE_PIT if DE_SYS >= 2     
        replace DE_AUTO = DE_PIT if DE_SYS < 2 
    }
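    On the face of it, year is irrelevant: the local macro year is never referred to inside the loop, so the same two replace commands are just run 124 times over. A restriction to 1900 to 2023 would matter only if you had years before 1900, and even then this loop would not impose it.

    The loop is therefore not needed, and the code seems to reduce to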

    Code:
    gen DE_DEMO = DE_PIT if DE_SYS >= 2
    gen DE_AUTO = DE_PIT if DE_SYS < 2
    See also

    Code:
    help separate
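A minimal sketch of the separate route, on made-up values for one country (the numbers and the year variable are hypothetical). Given a logical expression in by(), separate should create one new variable for observations where the expression is false (suffix 0) and one where it is true (suffix 1). The if !missing(DE_SYS) keeps years without a regime score out of both, since a missing value would otherwise count as true for DE_SYS >= 2.

Code:
```stata
* Hypothetical toy data: one country (DE), five years
clear
input year DE_PIT DE_SYS
1899 30 .
1900 40 1
1901 45 2
1902 50 3
1903 35 0
end

* Split DE_PIT into DE_PIT0 (DE_SYS < 2) and DE_PIT1 (DE_SYS >= 2)
* Years with missing DE_SYS go into neither new variable
separate DE_PIT if !missing(DE_SYS), by(DE_SYS >= 2)

* Optional: rename to the names used in this thread
rename (DE_PIT0 DE_PIT1) (DE_AUTO DE_DEMO)
list year DE_PIT DE_SYS DE_AUTO DE_DEMO
```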
    Last edited by Nick Cox; 23 Jan 2024, 06:06.


    • #3
      Thanks for the advice.
      However, I have data before 1900 for the PIT variable but not for SYS, which is why I restricted the period.


      • #4
        That may imply adding an extra condition to the if qualifier.

        It does not, I think, mean using a loop.

        I can't tell whether you mean that you have missing values or just that certain values are absent from your data. But watch out: in Stata a missing value counts as larger than any number, so a missing DE_SYS satisfies DE_SYS >= 2.
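A sketch of the extra condition, assuming the data have one observation per year and a variable called year (that name is an assumption, since the thread never shows it), using made-up values. !missing(DE_SYS) guards against the missing-value trap, and inrange() restricts the period, so no loop is needed.

Code:
```stata
* Hypothetical toy data; the variable name year is an assumption
clear
input year DE_PIT DE_SYS
1899 30 .
1900 40 1
1901 45 2
1902 50 .
end

* A missing DE_SYS would otherwise satisfy DE_SYS >= 2, so exclude it explicitly
* inrange() adds the period restriction without any loop
gen DE_DEMO = DE_PIT if DE_SYS >= 2 & !missing(DE_SYS) & inrange(year, 1900, 2023)
* DE_SYS < 2 is already false for missing DE_SYS
gen DE_AUTO = DE_PIT if DE_SYS < 2 & inrange(year, 1900, 2023)
list
```

Since SYS has no values before 1900 anyway, the inrange() condition is mostly a safeguard here.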


        • #5
          I have no values for _SYS before 1900, but I do have values for _PIT before 1900 (even if there are some missing values).

          Regardless, as far as I can see, the code above gives me the results I need. My question, as written, is whether it's possible to generalize the code: in other words, can I write one set of commands for all the *_PIT and *_SYS variables? I have 42 countries and do not want to write the commands out for each of them.


          • #6
            You could loop over prefixes, something like this, but I can't test this code:

            Code:
            unab prefix : *_SYS 
            local prefix : subinstr local prefix "_SYS" "", all 
            
            foreach p of local prefix { 
                  gen `p'_DEMO = `p'_PIT if `p'_SYS >= 2
                  gen `p'_AUTO = `p'_PIT if `p'_SYS < 2
            }
            84 extra variables!
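A sketch of #6's loop run on a toy dataset with two hypothetical countries (DE and FR, made-up values), so you can see what unab and subinstr produce before running it on the real 42. It also adds the !missing() guard from #4. One caution: *_SYS matches every variable whose name ends in _SYS, so any other such variable would be picked up as a country too.

Code:
```stata
* Hypothetical toy data: two countries (DE, FR), three years
clear
input year DE_PIT DE_SYS FR_PIT FR_SYS
1900 40 1 20 3
1901 45 2 25 .
1902 50 3 30 0
end

* unab expands the wildcard into the full list: DE_SYS FR_SYS
unab prefix : *_SYS
* Strip the suffix to leave the country prefixes: DE FR
local prefix : subinstr local prefix "_SYS" "", all
display "`prefix'"

foreach p of local prefix {
    * A missing SYS value counts as >= 2, so exclude it from the democracy group
    gen `p'_DEMO = `p'_PIT if `p'_SYS >= 2 & !missing(`p'_SYS)
    gen `p'_AUTO = `p'_PIT if `p'_SYS < 2
}
list
```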


            • #7
              Thank you very much! It works perfectly.

              Using a similar method, I should now be able to calculate an average of all the _DEMO and _AUTO variables. I tried it with this loop, but there is an error somewhere:

              Code:
              local varlist *_DEMO
              foreach `var' of local varlist {
              egen AVG_DEMO = mean(`var'_DEMO)
              }


              • #8
                Remove the single quotation marks from the foreach statement.

                Also, each of the 42 mean() results would be a constant: the same value repeated in every observation.
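A small illustration with made-up values of why those results are constants: egen's mean() averages a variable down the observations and writes that one number into every observation. The loop is written as #8 suggests, without quotes around var, and also with of varlist instead of of local, because local varlist *_DEMO stores the literal text *_DEMO rather than the list of variable names. The new names get a _mean suffix (a hypothetical choice) so that they differ from each other and do not themselves end in _DEMO.

Code:
```stata
* Hypothetical toy data: two countries, three years
clear
input year DE_DEMO FR_DEMO
1900 40 20
1901 45 .
1902 50 30
end

* "of varlist" expands the wildcard to DE_DEMO FR_DEMO when the loop starts
foreach var of varlist *_DEMO {
    * mean() averages down the observations, ignoring missing values,
    * so the result is the same number in every observation
    egen `var'_mean = mean(`var')
}
list
```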


                • #9
                  Thanks for your help.

                  With the code
                  Code:
                  unab prefix : *_SYS 
                  local prefix : subinstr local prefix "_SYS" "", all 
                  
                  foreach p of local prefix { 
                      gen `p'_DEMO = `p'_PIT if `p'_SYS >= 2
                      gen `p'_AUTO = `p'_PIT if `p'_SYS < 2
                  }
                  
                  local varlist *_DEMO
                  foreach var of local varlist {
                  egen AVG_DEMO = mean(`var'_DEMO)
                  }
                  Stata gives me the following output:
                  Code:
                  *_DEMO_DEMO invalid name
                  Removing _DEMO from `var'_DEMO results in "*_DEMO invalid name".


                  • #10
                    So, you've specified _DEMO twice: once by implication as part of each variable name and once explicitly. So, fix the loop to read

                    Code:
                    egen AVG_DEMO = mean(`var')
                    except that you can't, or shouldn't, mean what you say there. You are asking for the mean of each variable to be put in AVG_DEMO. The second time round the loop, that will fail because the variable AVG_DEMO already exists. If you really want 42 new variables, each with a mean repeated in every observation, you must give them different names. The other way round, if you want the mean across the 42 variables within each observation, that's not a loop, but
                    Code:
                    egen AVG_DEMO = rowmean(*_DEMO)

                    #8 was a quick reply from my phone. Sorry that I didn't spot the other problems then.
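A sketch of the row-mean approach for both groups, on toy data for two hypothetical countries with made-up values. rowmean() ignores missing values, so each year's average covers only the countries with a non-missing value in that group, and rownonmiss() records how many that was. Because AVG_DEMO itself ends in _DEMO, the variable lists are fixed with unab before any new variable is created; otherwise a later *_DEMO wildcard, such as one in the count of contributing countries, would include the average itself. The N_ variable names are hypothetical.

Code:
```stata
* Hypothetical toy data: two countries, three years
clear
input year DE_DEMO FR_DEMO DE_AUTO FR_AUTO
1900 40 .  .  20
1901 45 25 .  .
1902 .  30 50 .
end

* Fix the variable lists before creating AVG_DEMO, which would
* itself match the wildcard *_DEMO
unab demo : *_DEMO
unab auto : *_AUTO

* Mean across countries within each year; missing values are ignored
egen AVG_DEMO = rowmean(`demo')
egen AVG_AUTO = rowmean(`auto')

* Number of countries contributing to each average
egen N_DEMO = rownonmiss(`demo')
egen N_AUTO = rownonmiss(`auto')
list
```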


                    • #11
                      It now works perfectly. Thank you very much!
