  • How can I generate a new variable which is the average of another variable (conditioned on another variable)

    Hi,

    I am trying to create a new variable, pm25_final, from pm25: the mean PM2.5 level for each day, averaged over the hours of that day. The variable time increases by 1 every day. Like so:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str12 station_name byte station_id float pm25 int time byte hour float policy
    "Anand Vihar" 1 443.67 1 10 0
    "Anand Vihar" 1  457.5 1 11 0
    "Anand Vihar" 1 342.83 1 12 0
    "Anand Vihar" 1 152.83 1 13 0
    "Anand Vihar" 1  129.5 1 14 0
    "Anand Vihar" 1 122.83 1 15 0
    "Anand Vihar" 1 151.67 1 16 0
    "Anand Vihar" 1    147 1 17 0
    "Anand Vihar" 1 137.33 1 18 0
    "Anand Vihar" 1 190.33 1 19 0
    "Anand Vihar" 1 327.82 1 20 0
    "Anand Vihar" 1    351 1 21 0
    "Anand Vihar" 1    324 1 22 0
    "Anand Vihar" 1    266 1 23 0
    "Anand Vihar" 1  286.5 1 24 0
    "Anand Vihar" 1 357.67 2  1 0
    "Anand Vihar" 1 440.17 2  2 0
    "Anand Vihar" 1    385 2  3 0
    "Anand Vihar" 1 378.33 2  4 0
    "Anand Vihar" 1 368.33 2  5 0
    "Anand Vihar" 1 369.17 2  6 0
    "Anand Vihar" 1  418.5 2  7 0
    "Anand Vihar" 1 437.33 2  8 0
    "Anand Vihar" 1 433.17 2  9 0
    "Anand Vihar" 1 364.33 2 10 0
    "Anand Vihar" 1 281.18 2 11 0
    "Anand Vihar" 1 179.55 2 12 0
    "Anand Vihar" 1 158.17 2 13 0
    "Anand Vihar" 1    132 2 14 0
    "Anand Vihar" 1 141.67 2 15 0
    "Anand Vihar" 1 128.33 2 16 0
    "Anand Vihar" 1 129.83 2 17 0
    "Anand Vihar" 1  180.5 2 18 0
    "Anand Vihar" 1  192.5 2 19 0
    "Anand Vihar" 1 224.83 2 20 0
    "Anand Vihar" 1 280.33 2 21 0
    "Anand Vihar" 1 309.83 2 22 0
    "Anand Vihar" 1 294.33 2 23 0
    "Anand Vihar" 1  270.5 2 24 0
    "Anand Vihar" 1 264.83 3  1 0
    "Anand Vihar" 1 295.09 3  3 0
    "Anand Vihar" 1 254.91 3  4 0
    "Anand Vihar" 1 292.27 3  5 0
    "Anand Vihar" 1 295.27 3  6 0
    "Anand Vihar" 1 368.18 3  7 0
    "Anand Vihar" 1 391.18 3  8 0
    "Anand Vihar" 1 418.36 3  9 0
    "Anand Vihar" 1 452.73 3 10 0
    "Anand Vihar" 1 431.64 3 11 0
    "Anand Vihar" 1  377.4 3 12 0
    "Anand Vihar" 1 267.25 3 13 0
    "Anand Vihar" 1 185.09 3 14 0
    "Anand Vihar" 1  192.6 3 15 0
    "Anand Vihar" 1  148.2 3 16 0
    "Anand Vihar" 1    208 3 17 0
    "Anand Vihar" 1  275.1 3 18 0
    "Anand Vihar" 1 583.33 3 19 0
    "Anand Vihar" 1 381.83 3 20 0
    "Anand Vihar" 1    485 3 21 0
    "Anand Vihar" 1 582.33 3 22 0
    "Anand Vihar" 1 600.67 3 23 0
    "Anand Vihar" 1 640.83 3 24 0
    "Anand Vihar" 1 686.27 4  1 0
    "Anand Vihar" 1 548.09 4  3 0
    "Anand Vihar" 1 469.09 4  4 0
    "Anand Vihar" 1 460.36 4  5 0
    "Anand Vihar" 1  504.6 4  6 0
    "Anand Vihar" 1 591.73 4  7 0
    "Anand Vihar" 1 542.82 4  8 0
    "Anand Vihar" 1 561.36 4  9 0
    "Anand Vihar" 1 590.36 4 10 0
    "Anand Vihar" 1    473 4 11 0
    "Anand Vihar" 1  304.2 4 12 0
    "Anand Vihar" 1  299.5 4 13 0
    "Anand Vihar" 1 221.33 4 14 0
    "Anand Vihar" 1 164.55 4 15 0
    "Anand Vihar" 1 145.83 4 16 0
    "Anand Vihar" 1    136 4 17 0
    "Anand Vihar" 1 177.17 4 18 0
    "Anand Vihar" 1  249.5 4 19 0
    "Anand Vihar" 1  343.5 4 20 0
    "Anand Vihar" 1 394.17 4 21 0
    "Anand Vihar" 1 371.33 4 22 0
    "Anand Vihar" 1  419.5 4 23 0
    "Anand Vihar" 1  386.5 4 24 0
    "Anand Vihar" 1    434 5  1 0
    "Anand Vihar" 1  414.2 5  2 0
    "Anand Vihar" 1 341.67 5  3 0
    "Anand Vihar" 1 380.17 5  4 0
    "Anand Vihar" 1 405.17 5  5 0
    "Anand Vihar" 1 370.17 5  6 0
    "Anand Vihar" 1 212.17 5  7 0
    "Anand Vihar" 1    187 5  8 0
    "Anand Vihar" 1 148.17 5  9 0
    "Anand Vihar" 1    103 5 10 0
    "Anand Vihar" 1  98.67 5 11 0
    "Anand Vihar" 1    100 5 12 0
    "Anand Vihar" 1  86.33 5 13 0
    "Anand Vihar" 1   64.5 5 14 0
    "Anand Vihar" 1  50.67 5 15 0
    end
    I want to create a variable that gives the average of pm25, with only ONE observation per value of time, so that each observation is the mean pm25 level averaged over every hour of the day (each day being represented by time).

    So it will look like (just an example):


    station_id   pm25_final   time
    1            258          1
    2            290          2

    Thank you,
    Anisha

  • #2
    Does this work?

    Code:
    collapse (mean) mean_pm25=pm25, by(station_id time)
    -or-

    Code:
    bys station_id time: egen pm25_final = mean(pm25)
    bys station_id time: keep if _n==1
    keep station_id pm25_final time
    order station_id pm25_final time
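
    If other variables need to survive the collapse, they can be listed with their own statistic. A minimal sketch, assuming station_name and policy are constant within each station and day (true in the example data above, but worth checking in the full dataset); hour is still dropped, since it varies within a day:

    ```stata
    * (mean) averages pm25 within each station-day group;
    * (first) keeps the first value in each group for the listed variables
    collapse (mean) pm25_final=pm25 (first) station_name policy, by(station_id time)
    ```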

    • #3
      Sadly, both of these approaches make all the other variables disappear. Is there a way around this?

      • #4
        Try:

        Code:
        bys station_id time: egen pm25_final = mean(pm25)
        This adds the daily mean to every observation and drops nothing.

        • #5
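          In addition to

          Code:
          bys station_id time: egen pm25_final = mean(pm25)
          note

          Code:
          egen tag = tag(station_id time)
          which marks exactly one observation in each station_id-time group with 1 and all the others with 0. See the help for egen for an explanation.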
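
          A minimal sketch of how the two could be combined, starting from the dataex example in the first post (the variable name tag is only illustrative). The full dataset is kept, and the if tag qualifier restricts a command to one observation per station and day:

          ```stata
          * mean pm25 within each station and day, repeated on every hourly row
          bysort station_id time: egen pm25_final = mean(pm25)
          * flag exactly one observation per station-day group (1 for that row, 0 otherwise)
          egen tag = tag(station_id time)
          * show one row per station and day; no observations are dropped
          list station_id time pm25_final if tag, noobs
          ```

          The same if tag qualifier should work with other commands, such as summarize or a graph command, whenever each day should count only once.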
