  • Panel data regression

    Hi,

    My data looks like:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int time byte station_id str10 date float(pm25 policy)
      1 1 "2015-11-01" 255.38733 0
      2 1 "2015-11-02" 285.64792 0
      3 1 "2015-11-03"  364.8735 0
      4 1 "2015-11-04"  393.0765 0
      5 1 "2015-11-05" 204.98445 0
      6 1 "2015-11-06" 215.31944 0
      7 1 "2015-11-07"  483.1946 0
      8 1 "2015-11-08" 258.02063 0
      9 1 "2015-11-09" 306.41415 0
     10 1 "2015-11-10"  349.3691 0
     11 1 "2015-11-11"  242.0024 0
     12 1 "2015-11-12" 346.17285 0
     13 1 "2015-11-13" 453.81625 0
     14 1 "2015-11-14"  332.8383 0
     15 1 "2015-11-15"  325.8029 0
     16 1 "2015-11-16" 289.93582 0
     17 1 "2015-11-17" 186.54706 0
     18 1 "2015-11-18"  308.7605 0
     19 1 "2015-11-19"  362.0068 0
     20 1 "2015-11-20"    298.77 0
     21 1 "2015-11-21"  468.6359 0
     22 1 "2015-11-22"  337.9147 0
     23 1 "2015-11-23"   310.935 0
     24 1 "2015-11-24"  341.2752 0
     25 1 "2015-11-25" 207.26666 0
     26 1 "2015-11-26"   294.845 0
     27 1 "2015-11-27" 290.82584 0
     28 1 "2015-11-28" 228.25706 0
     29 1 "2015-11-29"  355.5382 0
     30 1 "2015-11-30"  448.9146 0
     31 1 "2015-12-01"  259.2621 0
     32 1 "2015-12-02" 227.75783 0
     33 1 "2015-12-03" 271.53583 0
     34 1 "2015-12-04" 273.63434 0
     35 1 "2015-12-05"  426.3975 0
     37 1 "2015-12-07" 338.68335 0
     38 1 "2015-12-08"  476.0822 0
     39 1 "2015-12-09"  331.6991 0
     40 1 "2015-12-10" 272.02832 0
     41 1 "2015-12-11" 269.66824 0
     42 1 "2015-12-12" 268.65543 0
     43 1 "2015-12-13" 213.85167 0
     44 1 "2015-12-14" 192.96727 0
     53 1 "2015-12-23"  397.8218 0
     54 1 "2015-12-24" 209.17546 0
     55 1 "2015-12-25" 188.85374 0
     56 1 "2015-12-26"  209.6796 0
     57 1 "2015-12-27"  215.4125 0
     58 1 "2015-12-28" 204.12695 0
     59 1 "2015-12-29"  193.9854 0
     60 1 "2015-12-30" 280.76434 0
     61 1 "2015-12-31"   293.795 0
     62 1 "2016-01-01"  329.5895 1
     63 1 "2016-01-02"  292.3737 1
     64 1 "2016-01-03"  339.5119 1
     65 1 "2016-01-04"  463.6259 1
     66 1 "2016-01-05" 418.11285 1
     67 1 "2016-01-06"  365.9205 1
     68 1 "2016-01-07"   431.468 1
     72 1 "2016-01-11"  442.4733 1
     73 1 "2016-01-12"    592.19 1
     74 1 "2016-01-13" 289.71167 1
     75 1 "2016-01-14"  274.2575 1
     76 1 "2016-01-15"  182.7229 1
     77 1 "2016-01-16" 143.23666 0
     78 1 "2016-01-17"  201.5161 0
     79 1 "2016-01-18"    335.98 0
     80 1 "2016-01-19"  322.1179 0
     81 1 "2016-01-20"   307.585 0
     82 1 "2016-01-21" 234.40916 0
     83 1 "2016-01-22" 301.30435 0
     84 1 "2016-01-23"  361.8912 0
     85 1 "2016-01-24"  388.8046 0
     86 1 "2016-01-25" 274.71957 0
     87 1 "2016-01-26"  372.9309 0
     88 1 "2016-01-27"  339.7796 0
     89 1 "2016-01-28"  361.7968 0
     90 1 "2016-01-29" 387.95935 0
     91 1 "2016-01-30"  346.5737 0
     95 1 "2016-02-03"   114.052 0
     96 1 "2016-02-04"    144.55 0
     97 1 "2016-02-05" 213.89067 0
     98 1 "2016-02-06"   298.728 0
     99 1 "2016-02-07" 170.50786 0
    100 1 "2016-02-08" 117.76826 0
    101 1 "2016-02-09" 172.35167 0
    102 1 "2016-02-10" 261.45782 0
    103 1 "2016-02-11" 218.81667 0
    104 1 "2016-02-12"   193.488 0
    105 1 "2016-02-13"  153.3725 0
    106 1 "2016-02-14"  89.75692 0
    107 1 "2016-02-15" 120.46435 0
    108 1 "2016-02-16" 137.67166 0
    109 1 "2016-02-17" 133.53696 0
    110 1 "2016-02-18" 160.53833 0
    111 1 "2016-02-19"   213.855 0
    112 1 "2016-02-20"  193.9894 0
    113 1 "2016-02-21"    70.775 0
    114 1 "2016-02-22" 104.34834 0
    116 1 "2016-02-24"        79 0
    end
    I am trying to run the regression: $\text{PM25}_{st} = \alpha + \gamma_s + \beta \, \text{Policy}_{st} + \epsilon_{st}$

    In this model $\alpha$ is the constant term, $\gamma_s$ is a fixed effect for each station in Delhi, $s$ stands for station $s$, $t$ stands for time, and $\text{Policy}_{st}$ is a dummy that is 1 between January 1 and January 15 (both days included) and zero everywhere else.

    I just want to check that the way I am approaching this (given the panel nature of the data and observations on multiple stations) is correct:

    Code:
    xtset station_id time
    Panel variable: station_id (unbalanced)
     Time variable: time, 1 to 181, but with gaps
             Delta: 1 unit
    
    . xtreg pm25 policy, fe robust
    
    Fixed-effects (within) regression               Number of obs     =        992
    Group variable: station_id                      Number of groups  =          7
    
    R-squared:                                      Obs per group:
         Within  = 0.0672                                         min =         73
         Between = 0.0830                                         avg =      141.7
         Overall = 0.0593                                         max =        159
    
                                                    F(1, 6)           =      16.07
    corr(u_i, Xb) = 0.0041                          Prob > F          =     0.0070
    
                                 (Std. err. adjusted for 7 clusters in station_id)
    ------------------------------------------------------------------------------
                 |               Robust
            pm25 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          policy |   80.04457   19.96843     4.01   0.007     31.18358    128.9056
           _cons |   169.9105   1.570098   108.22   0.000     166.0686    173.7524
    -------------+----------------------------------------------------------------
         sigma_u |  34.883837
         sigma_e |   80.56265
             rho |  .15788854   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . regress pm25 policy
    
          Source |       SS           df       MS      Number of obs   =       992
    -------------+----------------------------------   F(1, 990)       =     62.41
           Model |   465993.26         1   465993.26   Prob > F        =    0.0000
        Residual |  7391431.64       990  7466.09257   R-squared       =    0.0593
    -------------+----------------------------------   Adj R-squared   =    0.0584
           Total |   7857424.9       991  7928.78396   Root MSE        =    86.407
    
    ------------------------------------------------------------------------------
            pm25 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          policy |   80.52398   10.19253     7.90   0.000     60.52253    100.5254
           _cons |   169.8728   2.858075    59.44   0.000     164.2642    175.4814
    ------------------------------------------------------------------------------
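
The policy dummy described in the post (1 from January 1 through January 15, both included) can be derived from the string `date` variable. The following is a minimal sketch, not the poster's actual code: the four rows are copied from the data example above, and `ddate` is a hypothetical variable name introduced here for the numeric date.

```stata
* Four rows copied from the -dataex- example in the first post
clear
input str10 date float pm25
"2015-12-31"   293.795
"2016-01-01"  329.5895
"2016-01-15"  182.7229
"2016-01-16" 143.23666
end

* ddate is a hypothetical name for the numeric Stata daily date
generate long ddate = daily(date, "YMD")
format ddate %td

* 1 from 1 Jan 2016 through 15 Jan 2016 (both included), 0 otherwise
generate byte policy = inrange(ddate, td(01jan2016), td(15jan2016))
list date ddate policy
```

Building the dummy from a true date variable, rather than from the running `time` counter, keeps the definition readable and makes it easy to check against the posted data.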

  • #2
    Hi
    I am not a guru by any means, but here are some things to think about (and reasons why you have had several views but no replies).
    1. The data example is not what you use for your regression. Specifically, the data example has 100 observations with time values from 1 to 116, but with gaps:
    Code:
    xtset station_id time
    gives:
    Panel variable: station_id (strongly balanced)
    Time variable: time, 1 to 116, but with gaps
    Delta: 1 unit
    which is not what you show in your output (n=100, not 992 as you show, and time runs from 1 to 116, not 1 to 181).
    2. The example data is not panel data - there is only one value for your panel identifier (station_id) and there is only one observation for each time (date).
    3. The output you show is not from the data example. When I try to run your model, because of 1 and 2 above, it returns an error:
    Code:
     xtreg pm25 policy, fe robust
    returns the error r(2000) no observations
    4. Putting all that aside, your model suggests the only variable associated with (or less strictly, causing) pm25 is a dummy (policy). This is clearly mis-specified - there must be omitted variables.
    I hope that helps.
    Laurence
    Last edited by Laurence Lester; 25 Feb 2025, 18:55.


    • #3
      Anisha:
      1) you are dealing with a T>N panel dataset. Therefore, you are advised to switch from -xtreg- to -xtregar, fe-;
      2) your -regress- code is for a cross-sectional dataset (992 independent observations and a single wave of data);
      3) no simple regression (that is, with one predictor only) can be considered informative, for the very same reason Laurence pointed out (his point 4).
      Kind regards,
      Carlo
      (StataNow 18.5)


      • #4
        Thanks, Laurence and Carlo! I investigated the data more closely and realized that there are multiple observations for a station for any given time period (sub-divided by hour of day), where time is a variable that increases by 1 for every day:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str12 station_name byte station_id int time byte hour float pm25
        "Anand Vihar" 1 1 10 443.67
        "Anand Vihar" 1 1 11  457.5
        "Anand Vihar" 1 1 12 342.83
        "Anand Vihar" 1 1 13 152.83
        "Anand Vihar" 1 1 14  129.5
        "Anand Vihar" 1 1 15 122.83
        "Anand Vihar" 1 1 16 151.67
        "Anand Vihar" 1 1 17    147
        "Anand Vihar" 1 1 18 137.33
        "Anand Vihar" 1 1 19 190.33
        "Anand Vihar" 1 1 20 327.82
        "Anand Vihar" 1 1 21    351
        "Anand Vihar" 1 1 22    324
        "Anand Vihar" 1 1 23    266
        "Anand Vihar" 1 1 24  286.5
        "Anand Vihar" 1 2  1 357.67
        "Anand Vihar" 1 2  2 440.17
        "Anand Vihar" 1 2  3    385
        "Anand Vihar" 1 2  4 378.33
        "Anand Vihar" 1 2  5 368.33
        "Anand Vihar" 1 2  6 369.17
        "Anand Vihar" 1 2  7  418.5
        "Anand Vihar" 1 2  8 437.33
        "Anand Vihar" 1 2  9 433.17
        "Anand Vihar" 1 2 10 364.33
        "Anand Vihar" 1 2 11 281.18
        "Anand Vihar" 1 2 12 179.55
        "Anand Vihar" 1 2 13 158.17
        "Anand Vihar" 1 2 14    132
        "Anand Vihar" 1 2 15 141.67
        "Anand Vihar" 1 2 16 128.33
        "Anand Vihar" 1 2 17 129.83
        "Anand Vihar" 1 2 18  180.5
        "Anand Vihar" 1 2 19  192.5
        "Anand Vihar" 1 2 20 224.83
        "Anand Vihar" 1 2 21 280.33
        "Anand Vihar" 1 2 22 309.83
        "Anand Vihar" 1 2 23 294.33
        "Anand Vihar" 1 2 24  270.5
        "Anand Vihar" 1 3  1 264.83
        "Anand Vihar" 1 3  3 295.09
        "Anand Vihar" 1 3  4 254.91
        "Anand Vihar" 1 3  5 292.27
        "Anand Vihar" 1 3  6 295.27
        "Anand Vihar" 1 3  7 368.18
        "Anand Vihar" 1 3  8 391.18
        "Anand Vihar" 1 3  9 418.36
        "Anand Vihar" 1 3 10 452.73
        "Anand Vihar" 1 3 11 431.64
        "Anand Vihar" 1 3 12  377.4
        "Anand Vihar" 1 3 13 267.25
        "Anand Vihar" 1 3 14 185.09
        "Anand Vihar" 1 3 15  192.6
        "Anand Vihar" 1 3 16  148.2
        "Anand Vihar" 1 3 17    208
        "Anand Vihar" 1 3 18  275.1
        "Anand Vihar" 1 3 19 583.33
        "Anand Vihar" 1 3 20 381.83
        "Anand Vihar" 1 3 21    485
        "Anand Vihar" 1 3 22 582.33
        "Anand Vihar" 1 3 23 600.67
        "Anand Vihar" 1 3 24 640.83
        "Anand Vihar" 1 4  1 686.27
        "Anand Vihar" 1 4  3 548.09
        "Anand Vihar" 1 4  4 469.09
        "Anand Vihar" 1 4  5 460.36
        "Anand Vihar" 1 4  6  504.6
        "Anand Vihar" 1 4  7 591.73
        "Anand Vihar" 1 4  8 542.82
        "Anand Vihar" 1 4  9 561.36
        "Anand Vihar" 1 4 10 590.36
        "Anand Vihar" 1 4 11    473
        "Anand Vihar" 1 4 12  304.2
        "Anand Vihar" 1 4 13  299.5
        "Anand Vihar" 1 4 14 221.33
        "Anand Vihar" 1 4 15 164.55
        "Anand Vihar" 1 4 16 145.83
        "Anand Vihar" 1 4 17    136
        "Anand Vihar" 1 4 18 177.17
        "Anand Vihar" 1 4 19  249.5
        "Anand Vihar" 1 4 20  343.5
        "Anand Vihar" 1 4 21 394.17
        "Anand Vihar" 1 4 22 371.33
        "Anand Vihar" 1 4 23  419.5
        "Anand Vihar" 1 4 24  386.5
        "Anand Vihar" 1 5  1    434
        "Anand Vihar" 1 5  2  414.2
        "Anand Vihar" 1 5  3 341.67
        "Anand Vihar" 1 5  4 380.17
        "Anand Vihar" 1 5  5 405.17
        "Anand Vihar" 1 5  6 370.17
        "Anand Vihar" 1 5  7 212.17
        "Anand Vihar" 1 5  8    187
        "Anand Vihar" 1 5  9 148.17
        "Anand Vihar" 1 5 10    103
        "Anand Vihar" 1 5 11  98.67
        "Anand Vihar" 1 5 12    100
        "Anand Vihar" 1 5 13  86.33
        "Anand Vihar" 1 5 14   64.5
        "Anand Vihar" 1 5 15  50.67
        end
        The policy variable is not misspecified, but observations with policy=1 are not showing up in the dataex output because there are only a few, and these appear way down in the data:

        Code:
         tab policy
        
             policy |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |     17,353       93.44       93.44
                  1 |      1,219        6.56      100.00
        ------------+-----------------------------------
              Total |     18,572      100.00

        The same issue arises with the station id: since the data is sorted by station id, it is hard to capture all stations in the dataex example.

        Code:
         tab station_id station_name
        
                   |                                 station_name
        station_id | Anand V..     Dwarka      IHBAS  Mandir ..  Punjabi..  R K Puram   Shadipur |     Total
        -----------+-----------------------------------------------------------------------------+----------
                 1 |     2,548          0          0          0          0          0          0 |     2,548
                 3 |         0      2,830          0          0          0          0          0 |     2,830
                 7 |         0          0      1,347          0          0          0          0 |     1,347
                 8 |         0          0          0      2,705          0          0          0 |     2,705
                 9 |         0          0          0          0      2,982          0          0 |     2,982
                10 |         0          0          0          0          0      3,041          0 |     3,041
                12 |         0          0          0          0          0          0      3,119 |     3,119
        -----------+-----------------------------------------------------------------------------+----------
             Total |     2,548      2,830      1,347      2,705      2,982      3,041      3,119 |    18,572
        I have three questions:
        1) How can I use dataex better to give a proper overview of my data?
        2) I am thinking that I should average the PM2.5 variable over the hours of the day so that I have only one observation per station and time period. Is this a good idea for regression purposes? If so, what command should I use? (If I use the collapse (mean) command, I lose the rest of my data.)
        3) Also, which regression is suitable in this case, where the panel data is unbalanced and the time variable has observations missing for a few dates?

        Thanks,
        Anisha
        Last edited by anisha arya; 26 Feb 2025, 08:30.
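
On question 1, one possible approach is to hand -dataex- a small random sample drawn within each station and policy value, so that every station and both policy states appear in the excerpt. The sketch below is only an illustration: it runs on a toy dataset generated on the spot (three hypothetical stations, made-up PM2.5 values), and `u` and `pick` are names introduced here, not variables from the original data.

```stata
* Toy stand-in for the real data: 3 hypothetical stations, 20 days, 10 hours each
clear
set seed 2025
set obs 600
generate byte station_id = ceil(_n/200)
bysort station_id: generate int time = ceil(_n/10)
bysort station_id time: generate byte hour = _n
generate float pm25 = 100 + 50*runiform()
generate byte policy = inrange(time, 8, 12)

* Randomly flag up to 3 rows per station-by-policy cell
generate double u = runiform()
bysort station_id policy (u): generate byte pick = _n <= 3

* Post only the flagged rows, in a readable order
sort station_id time hour
dataex station_id time hour pm25 policy if pick
```

With the real data, only the last four commands would be needed. Sampling within cells, rather than shuffling the whole file, guarantees that the rare policy=1 rows are represented.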


        • #5
          Anisha:
          at face value, your example is a time series, not a panel dataset.
          Actually, you have a single -id- repeatedly measured across time.
          It would be better if you could post an excerpt of your dataset with three stations.
          Kind regards,
          Carlo
          (StataNow 18.5)


          • #6
            Hi Carlo, I randomly sorted the data to give you a full overview. Here you go:
            Code:
            clear
            input str12 station_name byte station_id int time byte hour float(pm25 policy)
            "R K Puram"    10 108  8  119.5 0
            "Punjabi Bagh"  9  13  9 393.33 0
            "Shadipur"     12 152 13  92.34 0
            "R K Puram"    10 176 21     68 0
            "Dwarka"        3 167 17  66.52 0
            "Dwarka"        3 108 19  96.62 0
            "Punjabi Bagh"  9 151 17  30.73 0
            "R K Puram"    10  34 13  127.5 0
            "Shadipur"     12  10 15  94.04 0
            "R K Puram"    10 147  1    209 0
            "Shadipur"     12  24 14 104.39 0
            "Punjabi Bagh"  9   9 14 124.27 0
            "Punjabi Bagh"  9  84 22  385.5 0
            "Punjabi Bagh"  9  78  3 133.33 0
            "Dwarka"        3  34  2  90.93 0
            "Anand Vihar"   1 149 18  45.45 0
            "R K Puram"    10  81 15    198 0
            "Shadipur"     12  42 18 114.41 0
            "Shadipur"     12  43 14  55.25 0
            "Mandir Marg"   8  31  3 216.83 0
            "Anand Vihar"   1  53 23 401.17 0
            "Anand Vihar"   1 141 22   69.5 0
            "Punjabi Bagh"  9  31  2    301 0
            "R K Puram"    10   9  1 214.67 0
            "R K Puram"    10  25 24 192.67 0
            "Punjabi Bagh"  9 107 12 141.33 0
            "Mandir Marg"   8 179 24     73 0
            "IHBAS"         7  44  2 112.51 0
            "Punjabi Bagh"  9 174 13  85.17 0
            "Anand Vihar"   1  77  4    161 0
            "Mandir Marg"   8  11 18     74 0
            "Anand Vihar"   1  77 17 141.17 0
            "Shadipur"     12  30 19 286.35 0
            "Shadipur"     12 162 12 104.36 0
            "Dwarka"        3 141  4 103.53 0
            "Mandir Marg"   8 106 21   93.5 0
            "Mandir Marg"   8 105  5 114.18 0
            "Mandir Marg"   8  62  9 229.83 1
            "Anand Vihar"   1  90 15 187.36 0
            "Dwarka"        3  85 20 227.29 0
            "Punjabi Bagh"  9 116  6    339 0
            "Anand Vihar"   1 162 18  64.17 0
            "Dwarka"        3  31 12 215.43 0
            "Anand Vihar"   1 111 21  177.5 0
            "Shadipur"     12  85 24 163.21 0
            "Shadipur"     12 104  3 231.17 0
            "Shadipur"     12 162 10 100.33 0
            "Shadipur"     12 106 10 186.14 0
            "IHBAS"         7  14 24 109.21 0
            "R K Puram"    10  79 24  290.2 0
            "Mandir Marg"   8  95 12 165.17 0
            "Shadipur"     12 105 13 207.08 0
            "Mandir Marg"   8  42 24 229.83 0
            "Anand Vihar"   1 160 17  43.64 0
            "R K Puram"    10  61 20 234.33 0
            "Dwarka"        3  79  1  59.24 0
            "Punjabi Bagh"  9  60 15 109.33 0
            "Mandir Marg"   8 170  3  17.17 0
            "Punjabi Bagh"  9  55 17  56.67 0
            "Mandir Marg"   8  77  4 134.67 0
            "Anand Vihar"   1   2 13 158.17 0
            "Punjabi Bagh"  9  57 22 236.67 0
            "Punjabi Bagh"  9 174 18  105.5 0
            "Dwarka"        3  40  5 111.85 0
            "Anand Vihar"   1   2  6 369.17 0
            "R K Puram"    10 141  8 118.83 0
            "Anand Vihar"   1   1 20 327.82 0
            "R K Puram"    10  56 18   93.5 0
            "IHBAS"         7 171  8 132.25 0
            "R K Puram"    10  85  6    301 0
            "Dwarka"        3   9 23  147.5 0
            "Punjabi Bagh"  9 164 12  23.17 0
            "Punjabi Bagh"  9  81 14 253.67 0
            "R K Puram"    10   9  3 218.17 0
            "Mandir Marg"   8 177 13  64.83 0
            "Dwarka"        3  87  4  218.9 0
            "Dwarka"        3 121  4 161.88 0
            "Shadipur"     12 140 21 132.17 0
            "Shadipur"     12 162 15 153.93 0
            "IHBAS"         7  12 14   94.9 0
            "R K Puram"    10  88  6 281.33 0
            "Shadipur"     12 172 20 147.75 0
            "Dwarka"        3 171 17   84.2 0
            "Mandir Marg"   8 180 21  79.17 0
            "Shadipur"     12  14  3  78.93 0
            "Punjabi Bagh"  9  42 18 211.18 0
            "Dwarka"        3 111 16 127.38 0
            "Punjabi Bagh"  9  83  4 254.17 0
            "Shadipur"     12  22 11 181.39 0
            "R K Puram"    10  39 15  180.5 0
            "Punjabi Bagh"  9 136 14  56.17 0
            "Anand Vihar"   1  28 14    264 0
            "Anand Vihar"   1 166  8  88.27 0
            "Anand Vihar"   1  22 19    276 0
            "R K Puram"    10  96 13    103 0
            "Dwarka"        3  92  1 492.56 0
            "Dwarka"        3  18  4 111.03 0
            "Dwarka"        3 167 19  64.25 0
            "Anand Vihar"   1  58 16 127.67 0
            "Dwarka"        3   3 20 247.66 0
            end
            Hope this is better. Please let me know your thoughts and your answers to my questions. Note, however, that this is only a random excerpt. In the actual dataset each station has multiple observations across days (time) as well as across hours within each day.


            • #7
              Anisha:
              I would try:
              Code:
              sort station_id time
              egen wanted=group( time hour )
              xtset station_id wanted
              xtregar pm25 i.policy i.time, fe
              quietly testparm i.time
              di r(p)
              Kind regards,
              Carlo
              (StataNow 18.5)


              • #8
                Hi Carlo,
                First of all, thank you for your help!

                I want to keep the time variable as "daily", not "hourly". Is there a way to average PM2.5 over hours to get the average for whole days?
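
Mechanically, daily averages can be computed without losing the hourly data by collapsing a copy of the dataset. This is a hedged sketch, assuming Stata 16 or later (for frames): the station 1 rows are copied from post #4 with policy set to 0, as in the first post for those early days; the station 3 rows are made up for illustration; and `daily` is a hypothetical frame name. Whether daily averaging is appropriate for the analysis is a separate question.

```stata
* Rows for station 1 are copied from post #4; rows for station 3 are made up
clear
input byte station_id int time byte hour float pm25 byte policy
1 1 10 443.67 0
1 1 11  457.5 0
1 2  1 357.67 0
1 2  2 440.17 0
3 1  1    100 0
3 1  2    120 0
end

* Work on a copy so the hourly data in the default frame stays untouched
frame copy default daily, replace
frame daily {
    * One row per station and day: mean PM2.5, with policy carried along
    collapse (mean) pm25 (max) policy, by(station_id time)
    xtset station_id time
    list, sepby(station_id)
}
```

In older Stata versions, wrapping the -collapse- in -preserve- and -restore- serves the same purpose. Using `(max) policy` assumes the dummy is constant within a station-day, which holds when it is defined by calendar date.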


                • #9
                  Anisha:
                  your main issue is with -timevar-.
                  Unlike -xtreg-, -xtregar- can't live without -xtset-ting your dataset with it (and you have repeated time values within panel).
                  That said, the only fix is creating a more precise -timevar-.
                  Lastly, a within-panel mean value of -pm2.5- will corrupt the idea of a panel data regression, due to lack of within-panel variation.
                  Kind regards,
                  Carlo
                  (StataNow 18.5)
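
The repeated-time-values problem mentioned above can be checked directly. The sketch below uses four rows copied from post #4 and shows that the daily `time` variable does not identify observations within a station, while a combined day-and-hour index does. `wanted` follows Carlo's code in #7; `t_hour` and `rc_daily` are hypothetical names, and the `t_hour` formula assumes `hour` runs from 1 to 24.

```stata
* Four hourly rows for station 1, copied from post #4
clear
input byte station_id int time byte hour float pm25
1 1 10 443.67
1 1 11  457.5
1 2  1 357.67
1 2  2 440.17
end

* Daily time alone does not identify rows within a station
duplicates report station_id time
capture noisily xtset station_id time
scalar rc_daily = _rc          // nonzero: repeated time values within panel

* Index from #7: one integer per (time, hour) combination
egen long wanted = group(time hour)
isid station_id wanted
xtset station_id wanted

* Alternative (assumes hour runs 1-24): an hourly clock that keeps gaps visible
generate long t_hour = (time - 1)*24 + hour
isid station_id t_hour
```

Note that -egen, group()- assigns consecutive ranks, so missing hours are not reflected in `wanted`; an index like `t_hour` preserves the actual spacing, which may matter for an autoregressive error model.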
