
  • Using the foreach/forval commands for different categories of a categorical variable

    I have a panel dataset with the following key variables: race, wealth, year. Race is a string variable with the following categories: black, Hispanic, white.

    I wish to generate mean wealth for each race by year using the foreach/forval commands. I tried running the following code:

    foreach i in black Hispanic white {
        egen meanwealth_`i' = mean(wealth) if race==`i', by(year)
    }

    Stata returns the following error message: black not found r(111);

    I then encoded the race variable, calling the new variable race1 and ran the following code:

    forval i = 1/3 {
        egen meanwealth_`i' = mean(wealth) if race1==`i', by(year)
    }

    The code worked perfectly but instead of having variables like meanwealth_black I now have variables like meanwealth_1.

    How can I have my variables be named after their categories instead of numbers when using the foreach command?

    Thanks in advance for any and all help.
    Last edited by Siddharth Jamad; 24 Oct 2021, 04:15.

  • #2
    Siddharth:
    welcome to this forum.
    -forval- loops over numbers (not names).
    Your code asks it to calculate -meanwealth_1- if -race1==1- (and then 2 and, eventually, 3), which is why the new variables carry numeric suffixes.
    But the -mean()- function of -egen- is byable.
    So you may want to try:
    Code:
    bysort race1 (year): egen meanwealth=mean(wealth)
    The following toy-example might be helpful:
    Code:
    . set obs 10
    number of observations (_N) was 0, now 10
    
    . g id=_n
    
    . g y=runiform()*10
    
    . g race="black" in 1/4
    
    . replace race="white" in 5/7
    
    . replace race="venusian" in 8/10
    
    . encode race, g(num_race)
    
    
    . label list
    num_race:
               1 black
               2 venusian
               3 white
    
    . forval i = 1/3 {
      2. sum y if num_race==`i'
      3. }
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
               y |          4    1.952401    1.413651   .2855687   3.488717
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
               y |          3    5.848207    2.775075    3.23368   8.759911
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
               y |          3    4.302978    4.048035   .7110509   8.689333
    
    . bysort race: egen wanted=mean(y)
    
    . list
    
         +------------------------------------------------+
         | id          y       race   num_race     wanted |
         |------------------------------------------------|
      1. |  3   1.366463      black      black   1.952401 |
      2. |  4   .2855687      black      black   1.952401 |
      3. |  2   2.668857      black      black   1.952401 |
      4. |  1   3.488717      black      black   1.952401 |
      5. |  9   5.551032   venusian   venusian   5.848207 |
         |------------------------------------------------|
      6. |  8    3.23368   venusian   venusian   5.848207 |
      7. | 10   8.759911   venusian   venusian   5.848207 |
      8. |  6   3.508549      white      white   4.302978 |
      9. |  7   .7110509      white      white   4.302978 |
     10. |  5   8.689333      white      white   4.302978 |
         +------------------------------------------------+
    
    .
    Last edited by Carlo Lazzaro; 24 Oct 2021, 04:42.
    Kind regards,
    Carlo
    (Stata 19.0)
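Since the original question asks for means by race and by year, it may help to see how the position of year in the bysort prefix changes the grouping. The following is only a sketch on made-up data (the variable names mean_race and mean_race_year are hypothetical): a variable placed in parentheses is used for sorting only, while variables before the parentheses define the groups.

```stata
* Toy panel: 2 races x 2 years x 2 observations (all values are made up)
clear
input str8 race year wealth
"black" 2000 10
"black" 2000 20
"black" 2001 30
"black" 2001 50
"white" 2000  1
"white" 2000  3
"white" 2001  5
"white" 2001  9
end

* year in parentheses only sorts within race: one mean per race
bysort race (year): egen mean_race = mean(wealth)

* race and year both define the groups: one mean per race-year cell
bysort race year: egen mean_race_year = mean(wealth)

list, sepby(race year)
```

On this toy data the first variable is constant within each race, while the second varies by year within race, which is what the original loop with by(year) produces.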



    • #3
      Note as a start that


      0. The same results could be held in a single variable:

      Code:
      egen meanwealth = mean(wealth), by(race1 year)
      Otherwise you can get what you want in several ways. Here are some of them.

      1. Apply the separate command to the results of #0.

      2. rename your existing variables. One line

      3. Loop over races.

      Code:
      foreach  i in black Hispanic white   {
          egen meanwealth_`i' = mean(wealth) if race == "`i'", by(year)
      }
      4. A parallel loop

      Code:
      tokenize "black Hispanic white"
      
      forval i = 1/3 {
          egen meanwealth_``i'' = mean(wealth) if race1==`i', by(year)
      }
      5. Another parallel loop.

      Code:
      local names black Hispanic white
      
      forval i = 1/3 {
             gettoken name names : names
             egen meanwealth_`name' = mean(wealth) if race1==`i', by(year)
      }
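Suggestions 1 and 2 are given without code, so here is one way they might look. This is a sketch on made-up data, not necessarily what was intended in the post: the value label name racelbl is hypothetical, and it is defined before encode so that the codes 1, 2, 3 match the order black, Hispanic, white assumed by the loops above. Left to itself, encode assigns codes in the sort order of the strings, so it is worth running label list to confirm which code belongs to which race before relying on a fixed list of names.

```stata
* Toy data (values are made up)
clear
input str8 race year wealth
"black"    2000 10
"black"    2000 20
"Hispanic" 2000  4
"Hispanic" 2000  6
"white"    2000  1
"white"    2000  3
end

* Fix the coding explicitly so that 1 = black, 2 = Hispanic, 3 = white
label define racelbl 1 "black" 2 "Hispanic" 3 "white"
encode race, gen(race1) label(racelbl)

* Suggestion 0: one variable holding all race-year means
egen meanwealth = mean(wealth), by(race1 year)

* Suggestion 1: separate splits it into one variable per value of race1
* (expected names: meanwealth1 meanwealth2 meanwealth3)
separate meanwealth, by(race1)

* Suggestion 2: rename the numbered variables in one line
rename (meanwealth1 meanwealth2 meanwealth3) ///
       (meanwealth_black meanwealth_Hispanic meanwealth_white)

list race year wealth meanwealth_*, sepby(race)
```

Each of the renamed variables is missing outside its own race, just like the variables created by the loops with an if qualifier.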



      • #4
        -men- reads -mean- in post #2. Thanks.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Another typo: the first line of code in #3 should be

          Code:
           
           egen meanwealth = mean(wealth), by(race1 year)
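For readers who already have the encoded variable, a further variant that is not proposed in the posts above is to read each name from the value label, so that the list of names never has to be typed in the same order as the codes. The sketch below uses made-up data and assumes every label is a single word that is legal as part of a variable name; levelsof and the extended macro function label are standard Stata, but the rest is illustrative only.

```stata
* Toy data (values are made up)
clear
input str8 race year wealth
"black"    2000 10
"black"    2000 20
"Hispanic" 2000  4
"Hispanic" 2000  6
"white"    2000  1
"white"    2000  3
end

encode race, gen(race1)

* Collect the distinct numeric codes present in race1
levelsof race1, local(codes)

foreach i of local codes {
    * Look up the label text attached to code `i' of race1
    local name : label (race1) `i'
    egen meanwealth_`name' = mean(wealth) if race1 == `i', by(year)
}

describe meanwealth_*
```

Because each name is looked up from the code itself, the result does not depend on which number encode happened to assign to which race.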



          • #6
            Perfect, the 3rd suggestion by Nick is what I was fussing over. Thanks a lot to you both.
