
  • Generating a mean variable for different years and states

    Hi all,

    I am trying to generate the mean of the abortion rate over the years 1973-1980 for each U.S. state. State IDs take values from 1 to 56; the annual abortion rate for each state is stored in ab1973, ab1974, ..., ab1980. I have tried the code:

    Code:
    . egen mean_abrate = mean(abrate), by(ab1980)
    But that does not work. I am also not sure whether the way I have laid out my data is wrong. Can anyone help me out please?

    I am running a Probit model with the average abortion rate variable for reference. Here's the sample of the data set that I am working with:

    Code:
    clear
    input byte id float(ab1973 ab1974 ab1975 ab1976 ab1977 ab1978 ab1979 ab1980)
     1               6   6.4 7.4  10.1  13.8  17.5 19.9  23.1
     2              14.8  14.2  18.9  18.5    28  26.6 18.6  17.9
     4              6.1  11.8  13.4    15  17.3  23.1 23.9    25
     5              2.7   4.5   5.9   9.3   7.7  10.7 12.4  12.3
     6             30.5  32.7  33.2  37.5  39.6  43.1 44.4  43.7
     8             13.5  19.3  22.3  23.4  28.2  28.8 31.4  31.4
     9              10.4  14.1  17.1    23  24.1  25.4 25.8  25.6
    10             16.3    17  17.9  19.1  22.2  24.9 25.7  25.9
    11             251.4 187.7 180.1 185.2 183.3 179.7  168 168.3
    12             14.7  18.7  27.4  28.8    31    33 35.1  35.5
    13               10  19.2  20.4  21.5  26.3  28.5 29.8  28.4
    15             24.9  27.2  29.5  28.3  37.2  32.3 33.7  34.4
    16              2.3   4.5     6   7.1     9   8.9  9.8  12.7
    17             13.6  19.2  24.4  29.1  28.1  27.1 26.9  25.9
    18               1.5   5.1   6.5   7.2   8.5  12.3   15  15.3
    19                4  10.2  10.4  11.1  11.3  14.5 11.9  14.3
    20            26.5  27.8  29.2  26.1    29  26.2   25  25.6
    21             3.6  10.6  11.4  13.2  14.5  15.5 15.4  15.1
    22               0   5.1     6   9.4  15.7  13.9 16.4  17.6
    23              3.3   8.4   8.8  11.3    14  21.2 18.8  18.6
    24             12.2  18.5  23.1  24.4    24  25.6 27.3  29.2
    25              10.1  22.4  26.4  29.9  30.2  32.2 33.6  33.5
    26             19.1  18.8  21.1  24.2  25.5  26.5 29.1  29.7
    27              8.8  13.7  14.8  17.7  18.9    19 20.2  20.7
    28               .2    .8    .6   3.2   5.2   7.3    9  10.6
    29              3.5   8.3  10.7    13  13.9  14.4 16.1  19.4
    30               3   7.9   9.3  10.1  13.5  19.9 17.8  20.1
    31              7.2    10  14.2  13.9  16.1  19.3 16.2  17.9
    32              7.1  10.3  19.9    24  25.4  33.9 43.7  46.6
    33              3.3   4.7   9.9  13.2  17.3  19.3 19.7  21.1
    34              6.7  14.6  20.6  24.9  27.7  29.2 30.7  32.8
    35            18.9  14.1  19.2  20.8  22.2  21.3 26.5    27
    36            54.5  43.1  43.3    43  46.3  46.4 45.1  45.8
    37            10.2  13.8    16  18.5    19  22.5 22.2  22.8
    38            1.2   6.7  12.1  13.8  15.9 19.6  21.5
    39            7.4    13  18.8  22.3  24.8  23.3 25.9  26.8
    40              1   7.4  11.2  12.2  15.7  17.1 16.1  16.4
    41             23.3  26.3  25.1  23.4  26.6  24.6 29.2  28.3
    42            12.2  16.8  18.9  22.2  24.2  25.1 24.8  26.1
    44            5.8  14.7  16.5  19.1  20.3  21.7 26.5  30.7
    45            3.5   8.1   9.5   8.5  12.8  18.3   18  18.2
    46           12.2  11.9  11.2  11.3   9.7   9.6  9.7     9
    47           5.8  16.9  18.3  23.3  30.2    24 22.3  23.6
    48           6.4  13.5  17.7  20.3  22.8  28.5   26    30
    49          .4   5.4     7   8.7    10  11.5  9.8  12.3
    50           14.8  17.9    21  24.9  20.8  28.1   28  30.4
    51           7.6  14.6  17.5  21.1  22.9  24.2 24.4  24.2
    53          21.8  24.3  24.7  26.2  34.6    37   37  37.5
    54          .1    .1    .3   2.4   5.5   6.8  7.3   6.9
    55          8.6  11.1  11.5  14.4  15.7  16.7   19  20.1
    56            2.2   3.7   5.9     6   9.2  10.3  9.4   9.5
    end
    Last edited by Yen Khor; 15 Apr 2021, 14:52.

  • #2
    Long form may work better:
    Code:
    reshape long ab, i(id) j(year)
    tabstat ab, by(id) stat(mean) format(%5.2f)
    Results:
    Code:
          id |      mean
    ---------+----------
           1 |     13.03
           2 |     19.69
           4 |     16.95
           5 |      8.19
           6 |     38.09
           8 |     24.79
           9 |     20.69
          10 |     21.13
          11 |    187.96
          12 |     28.02
          13 |     23.01
          15 |     30.94
          16 |      7.54
          17 |     24.29
          18 |      8.93
          19 |     10.96
          20 |     26.93
          21 |     12.41
          22 |     10.51
          23 |     13.05
          24 |     23.04
          25 |     27.29
          26 |     24.25
          27 |     16.73
          28 |      4.61
          29 |     12.41
          30 |     12.70
          31 |     14.35
          32 |     26.36
          33 |     13.56
          34 |     23.40
          35 |     21.25
          36 |     45.94
          37 |     18.13
          39 |     20.29
          40 |     12.14
          41 |     25.85
          42 |     21.29
          44 |     19.41
          45 |     12.11
          46 |     10.57
          47 |     20.55
          48 |     20.65
          49 |      8.14
          50 |     23.24
          51 |     19.56
          53 |     30.39
          54 |      3.68
          55 |     14.64
          56 |      7.02
    ---------+----------
       Total |     22.01
    --------------------



    • #3
      You have the right idea, but not the right data layout. Your data are currently in wide layout, with one variable per year. While you can do:

      Code:
      egen wanted= rowmean(ab*)
      it's better to reshape long and work in long layout:

      Code:
      reshape long ab, i(id) j(year)
      bys id: egen wanted= mean(ab)
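
      As a quick check that the two routes give the same answer, here is a minimal sketch on a toy dataset. It assumes only the first three year columns of the first two rows of the posted sample; the names mean_wide and mean_long are illustrative, not from the original posts.

      ```stata
      * Toy data: first two states and first three years from the posted sample
      clear
      input byte id float(ab1973 ab1974 ab1975)
      1  6    6.4  7.4
      2 14.8 14.2 18.9
      end

      * Wide layout: mean across the ab* columns within each row
      egen mean_wide = rowmean(ab1973-ab1975)

      * Long layout: one observation per state-year, then mean within state
      reshape long ab, i(id) j(year)
      bysort id: egen mean_long = mean(ab)

      * The two means should agree up to float precision
      assert reldif(mean_wide, mean_long) < 1e-6
      ```

      Note that mean_wide survives the reshape because it is constant within id, so it is simply repeated on every year's observation.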



      • #4
        Hi Ken and Andrew,

        Thank you for your suggestions. I tried Andrew's code for the wide layout and it worked out.

        I know it is better to reshape my data into a long layout, but I have other variables attached to the state id that take only one value per state (for example, the year abortion was legalized). Is there a way to handle this?



        • #5
          You don't need a workaround. It's fine that some variables are constant for blocks of observations in long layout, just as state will be.
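
          A minimal sketch of what that looks like, assuming a hypothetical state-level variable legal_year whose values here are made up purely for illustration (they are not from the posted data):

          ```stata
          * Toy data: two states, two years, plus one state-level constant
          * legal_year is a hypothetical variable with invented values
          clear
          input byte id float(ab1973 ab1974) int legal_year
          1  6    6.4 1973
          2 14.8 14.2 1970
          end

          * Only the ab* stub is reshaped; legal_year is carried along
          reshape long ab, i(id) j(year)

          * legal_year is now repeated on every state-year observation
          list, sepby(id)

          * A time-varying indicator is then a one-liner
          generate byte post_legal = year >= legal_year
          ```

          Variables not named in the reshape command are just copied to each observation of the same id, provided they really are constant within id; reshape will complain if they are not.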

          The layout you started with was no doubt convenient for the data provider and is easy to read into Stata, but the advantages don't go much further and the serious disadvantages are immediate. For example, consider how you would show some sample graphs of time series.
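
          For instance, once the data are long, a time series graph takes a couple of lines. This is only a sketch on a toy subset of the posted sample (first two states, first three years); the choice of xtline with overlay is one option among several.

          ```stata
          * Toy data: first two states and first three years from the posted sample
          clear
          input byte id float(ab1973 ab1974 ab1975)
          1  6    6.4  7.4
          2 14.8 14.2 18.9
          end

          reshape long ab, i(id) j(year)

          * Declare the panel structure, then plot one line per state
          xtset id year
          xtline ab, overlay
          ```

          The same graph from the wide layout would need each year's variable to be handled separately, which is where that layout starts to get in the way.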

          reshape long is excellent advice.

