I have high standard error in my DID panel data

Hanni Wirawan

Join Date: May 2019
Posts: 12

17 Aug 2019, 02:15

Hello,

I am doing research using DID with panel data. I want to see the effect of renewable energy on reducing the number of welfare recipients, that is, people who receive social security insurance from the government.
My data are:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long welfare_recipient float(renewable nongrid renewablexnongrid jr_sch mid_sch infra_health industry TVsignal Phonesignal)
 469 0 0 0 0 0 1 1 0 0
 157 0 0 0 1 0 1 0 0 1
 474 0 0 0 1 0 1 0 0 1
 661 0 0 0 1 0 1 1 0 0
 342 0 0 0 1 0 1 0 0 1
 100 0 0 0 1 0 1 0 0 1
 279 0 0 0 0 1 1 1 0 0
 301 0 0 0 1 0 1 0 0 1
 376 0 0 0 1 0 1 0 0 1
 249 0 0 0 0 1 1 1 0 0
 285 0 0 0 0 1 1 1 0 1
 287 0 0 0 0 1 1 0 0 1
 115 0 0 0 0 0 0 0 1 0
 228 0 0 0 0 0 1 1 0 1
 210 0 0 0 0 0 1 0 0 1
 185 0 0 0 1 0 1 1 1 0
 262 0 0 0 1 0 1 1 0 1
 200 0 0 0 1 0 1 0 0 1
 160 0 0 0 1 0 0 0 1 0
 423 0 0 0 1 0 1 1 0 1
 419 0 0 0 1 0 1 1 0 1
 510 0 0 0 1 0 1 1 0 0
 583 0 0 0 1 0 1 0 0 0
 120 0 0 0 1 0 1 0 0 1
 527 0 0 0 1 0 1 1 0 0
 218 0 0 0 1 1 1 0 0 1
 308 0 0 0 1 1 1 0 0 1
 198 0 0 0 0 0 1 1 0 0
 202 0 0 0 0 0 1 0 0 1
 209 0 0 0 0 0 1 0 0 1
 269 0 0 0 1 0 1 1 1 0
1126 0 0 0 1 0 1 1 0 0
1000 0 0 0 1 1 1 1 0 1
   0 0 0 0 1 1 1 0 0 1
1336 0 0 0 1 1 1 0 0 1
 239 0 0 0 1 1 1 0 1 0
 451 0 0 0 1 1 1 0 0 0
 250 0 0 0 1 1 1 1 0 1
   0 0 0 0 1 1 1 0 0 1
1311 0 0 0 1 1 1 0 0 1
 458 0 0 0 1 1 1 0 1 0
1051 0 0 0 1 1 1 1 0 1
 800 0 0 0 1 1 1 1 0 1
   0 0 0 0 1 1 1 0 0 1
1440 0 0 0 1 1 1 0 0 1
 277 0 0 0 1 0 1 0 1 0
 334 0 0 0 1 0 1 1 0 1
  85 0 0 0 1 0 1 1 0 1
   0 0 0 0 1 0 1 0 0 1
 585 0 0 0 1 0 1 0 0 1
 356 0 0 0 1 0 1 1 1 1
   0 0 0 0 1 0 1 1 0 1
 400 0 0 0 1 0 1 1 0 1
 750 0 0 0 1 0 1 1 0 1
 900 0 0 0 1 0 1 0 0 1
  52 0 0 0 1 1 1 0 1 1
   0 0 0 0 1 1 1 1 0 1
1161 0 0 0 0 0 1 1 1 1
  24 0 0 0 1 1 1 0 0 1
2000 0 0 0 1 1 1 1 0 1
1489 0 0 0 0 0 1 1 0 1
   0 0 0 0 0 0 1 1 0 1
 400 0 0 0 0 0 1 1 1 1
 150 0 0 0 1 0 1 0 0 1
1950 0 0 0 1 0 0 1 0 1
 617 0 0 0 1 1 1 0 1 1
   0 0 0 0 1 1 1 1 0 1
2730 0 0 0 1 1 1 1 1 1
   0 0 0 0 1 1 1 1 0 1
3100 0 0 0 1 1 1 1 0 1
1821 0 0 0 1 0 1 0 1 1
   0 0 0 0 1 1 1 1 0 1
 200 0 0 0 1 1 1 1 1 1
   0 0 0 0 1 1 1 0 0 1
1640 0 0 0 1 1 1 1 0 1
 218 0 1 0 1 1 1 0 1 1
   0 0 0 0 1 1 1 1 0 1
1200 0 0 0 1 1 1 1 1 1
  50 0 0 0 1 1 1 0 0 1
2100 0 0 0 1 1 1 1 0 1
 164 0 1 0 1 0 1 1 1 0
1024 0 0 0 1 0 1 1 0 1
 730 0 0 0 1 0 1 1 1 1
 393 0 0 0 1 0 1 0 0 1
 948 0 0 0 1 0 1 1 0 1
 385 0 1 0 1 0 1 1 1 0
 426 0 0 0 1 0 1 0 0 1
 499 0 0 0 1 0 1 0 0 1
 372 0 0 0 1 0 1 1 0 0
 520 0 0 0 1 0 1 0 0 1
 390 0 0 0 1 0 1 0 0 1
 709 0 0 0 1 1 1 0 1 0
1501 0 0 0 1 0 1 1 0 1
 608 0 0 0 1 0 1 1 0 1
 750 0 0 0 1 0 1 0 0 1
 194 0 0 0 1 0 1 0 0 1
 364 0 0 0 1 0 1 1 1 0
 426 0 0 0 1 0 1 1 0 0
   6 0 0 0 1 0 1 1 0 1
   0 0 0 0 1 0 1 0 0 1
end

And yes, all my independent variables are dummies.

My output is:

Code:

. xtset id_desa year
       panel variable:  id_desa (unbalanced)
        time variable:  year, 2006 to 2018, but with gaps
                delta:  1 unit

. xtreg Pen_Bantuan d_year_ebt d_PLN DD sd smp industri sinyalTV sinyalHP infra_kes i.year, fe cluster ( id_desa)

Fixed-effects (within) regression               Number of obs     =     18,062
Group variable: id_desa                         Number of groups  =      5,205

R-sq:                                           Obs per group:
     within  = 0.0428                                         min =          1
     between = 0.0006                                         avg =        3.5
     overall = 0.0102                                         max =          5

                                                F(13,5204)        =      34.85
corr(u_i, Xb)  = -0.0797                        Prob > F          =     0.0000

                            (Std. Err. adjusted for 5,205 clusters in id_desa)
------------------------------------------------------------------------------
             |               Robust
 Pen_Bantuan |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  d_year_ebt |   320.5818   84.82165     3.78   0.000     154.2957    486.8678
       d_PLN |   65.16589   17.55079     3.71   0.000     30.75897    99.57281
          DD |  -346.1853    84.1762    -4.11   0.000     -511.206   -181.1646
          sd |  -24.81595    12.6424    -1.96   0.050    -49.60035   -.0315438
         smp |   63.10132   18.13105     3.48   0.001     27.55684     98.6458
    industri |   25.61956   9.136575     2.80   0.005     7.708033    43.53108
    sinyalTV |  -34.88289   21.91833    -1.59   0.112    -77.85202    8.086237
    sinyalHP |  -18.75071   10.69288    -1.75   0.080    -39.71326     2.21183
   infra_kes |   .9282455   10.58314     0.09   0.930    -19.81915    21.67564
             |
        year |
       2008  |  -36.05412   14.96352    -2.41   0.016     -65.3889   -6.719336
       2011  |  -25.89031   15.40423    -1.68   0.093    -56.08907    4.308454
       2014  |   131.7223    16.8642     7.81   0.000     98.66139    164.7832
       2018  |   164.8634   18.79985     8.77   0.000     128.0078     201.719
             |
       _cons |   301.1723   20.57644    14.64   0.000     260.8339    341.5108
-------------+----------------------------------------------------------------
     sigma_u |   501.7345
     sigma_e |  513.91592
         rho |  .48800801   (fraction of variance due to u_i)
------------------------------------------------------------------------------

where Pen_Bantuan is the number of welfare recipients, d_year_ebt is a dummy for the years when the renewable energy plant is operating, d_PLN is a dummy for non-grid areas, DD is the interaction between renewable and non-grid area, and the other variables are my control variables.

As we can see, I have high standard errors. Is that okay or not?

Then I tried making it a balanced panel, and as a result my standard errors became even higher:

Code:

. xtset id_desa year
       panel variable:  id_desa (strongly balanced)
        time variable:  year, 2006 to 2018, but with gaps
                delta:  1 unit

. xtreg Pen_Bantuan d_year_ebt d_PLN DD sd smp industri sinyalTV sinyalHP infra_kes i.year, fe cluster ( id_desa)

Fixed-effects (within) regression               Number of obs     =      9,260
Group variable: id_desa                         Number of groups  =      1,852

R-sq:                                           Obs per group:
     within  = 0.0418                                         min =          5
     between = 0.0099                                         avg =        5.0
     overall = 0.0241                                         max =          5

                                                F(13,1851)        =      18.00
corr(u_i, Xb)  = 0.0005                         Prob > F          =     0.0000

                            (Std. Err. adjusted for 1,852 clusters in id_desa)
------------------------------------------------------------------------------
             |               Robust
 Pen_Bantuan |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  d_year_ebt |   413.0277   127.3231     3.24   0.001     163.3158    662.7396
       d_PLN |    101.005   25.91229     3.90   0.000     50.18461    151.8254
          DD |   -462.132   127.1197    -3.64   0.000    -711.4451   -212.8189
          sd |  -50.50605   25.00024    -2.02   0.044    -99.53767   -1.474426
         smp |    77.5842   24.66175     3.15   0.002     29.21642     125.952
    industri |   43.06753   13.63826     3.16   0.002     16.31953    69.81552
    sinyalTV |  -14.28977   31.86309    -0.45   0.654    -76.78115     48.2016
    sinyalHP |   -17.9242   16.27599    -1.10   0.271    -49.84542    13.99702
   infra_kes |   23.03077   21.26598     1.08   0.279    -18.67706     64.7386
             |
        year |
       2008  |  -57.11102   19.97784    -2.86   0.004     -96.2925   -17.92955
       2011  |  -24.54144   19.23714    -1.28   0.202    -62.27022    13.18734
       2014  |   125.6676   21.58866     5.82   0.000     83.32692    168.0083
       2018  |    187.306   24.15648     7.75   0.000     139.9292    234.6828
             |
       _cons |   388.3949   36.05687    10.77   0.000     317.6785    459.1114
-------------+----------------------------------------------------------------
     sigma_u |  612.57098
     sigma_e |  604.19824
         rho |   .5068808   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Are my standard errors okay? Or do I have a spurious regression problem in my model?

Many thanks for the help.

Clyde Schechter

Join Date: Apr 2014

Posts: 29964
#2

17 Aug 2019, 03:25

Your example data contains almost none of the variables used in your regression, so it is impossible to troubleshoot this. I will make a few general remarks here about what might be going on, but if you want more concrete advice, you will have to post back with a data example that can actually run the code you are showing.

A high standard error means that the estimate of the coefficient is not very precise. This can be due to a number of reasons. Perhaps the commonest reason is simply a small sample size, but that is not your situation here.

Another possibility is that the outcome variable itself is noisy. That is often the case with variables that are counts of things--as is the case here. The variable in your example data called welfare_recipient matches this description: the mean is close to 540 and the standard deviation is nearly 600--so unless your other predictors have a very large effect, the unexplained variance will be high, and that will in turn lead the standard errors to be high. The within-R² in your output is only about 0.04, so that also seems to fit your situation. So I think this is the most likely explanation.
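One way to check this on the full data is to look at how much of the outcome's variation is within villages rather than between them. This is only a sketch, and it assumes the dataset is in memory with the variable names used in the -xtreg- command above:

```stata
* Assumes the full dataset is in memory, with the names from the -xtreg- command
xtset id_desa year

* Overall mean, standard deviation, and percentiles of the outcome
summarize Pen_Bantuan, detail

* Split the variation into between-village and within-village components;
* the fixed-effects estimator uses only the within part
xtsum Pen_Bantuan
```

A within standard deviation that is large relative to the estimated coefficients would be consistent with the noisy-outcome explanation.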

Another possibility is a high level of inter-correlation among your predictor variables ("multicollinearity"), which can inflate the standard errors. This is a problem for which there is no good solution other than getting better data.
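A rough check for this, sketched under the assumption that the same dataset and variable names are in memory, is to cross-tabulate the two components of the interaction and look at their pairwise correlations:

```stata
* Assumes the same dataset and variable names as in the -xtreg- command

* Cell counts for the two-way design: if few observations have
* d_year_ebt == 1 and d_PLN == 0, DD is nearly identical to d_year_ebt
tabulate d_year_ebt d_PLN

* Pairwise correlations among the three DID terms
correlate d_year_ebt d_PLN DD
```

This is a diagnostic only; it does not by itself say whether the standard errors are acceptable.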

As for whether this is a problem, that's a value judgment that only you can make. The 95% confidence interval on your estimated treatment effect (DD) is from -711.4451 to -212.8189. So, if, from a practical perspective, it matters whether the effect is really -711 or really -213, then you don't have a precise enough estimate--which means you need better data or a better model. But if, from a practical perspective, -711 vs -213 makes no difference, then your estimate is good enough and the high standard errors are not a problem.
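To see where that interval comes from, the bounds can be rebuilt by hand from the coefficient and standard error of DD in the balanced-panel output. The scalar names below are made up for illustration, and the result should agree with the reported interval only up to rounding of the displayed inputs:

```stata
* Coefficient and cluster-robust standard error of DD, copied from the
* balanced-panel output
scalar b_DD  = -462.132
scalar se_DD = 127.1197

* The output reports F(13,1851), i.e. (clusters - 1) = 1,852 - 1 = 1,851
* degrees of freedom for the t distribution
scalar t_crit = invttail(1851, 0.025)

display "lower bound: " b_DD - t_crit*se_DD
display "upper bound: " b_DD + t_crit*se_DD
display "width:       " 2*t_crit*se_DD
```

The width is roughly four standard errors, so the interval can only be narrowed by shrinking the standard error itself.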

Hanni Wirawan

Join Date: May 2019
Posts: 12

17 Aug 2019, 08:14

Thank you very much for the response, Prof. Clyde.

Originally posted by Clyde Schechter

Your example data contains almost none of the variables used in your regression, so it is impossible to troubleshoot this. I will make a few general remarks here about what might be going on, but if you want more concrete advice, you will have to post back with a data example that can actually run the code you are showing.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double id_desa float year long Pen_Bantuan float(d_year_ebt d_PLN DD sd smp industri sinyalTV sinyalHP infra_kes)
1101020001 2006  469 0 0 0 0 0 1 0 0 1
1101020001 2008  157 0 0 0 1 0 0 0 1 1
1101020001 2011  474 0 0 0 1 0 0 0 1 1
1101020002 2006  661 0 0 0 1 0 1 0 0 1
1101020002 2008  342 0 0 0 1 0 0 0 1 1
1101020002 2011  100 0 0 0 1 0 0 0 1 1
1101020003 2006  279 0 0 0 0 1 1 0 0 1
1101020003 2008  301 0 0 0 1 0 0 0 1 1
1101020003 2011  376 0 0 0 1 0 0 0 1 1
1101020004 2006  249 0 0 0 0 1 1 0 0 1
1101020004 2008  285 0 0 0 0 1 1 0 1 1
1101020004 2011  287 0 0 0 0 1 0 0 1 1
1101020005 2006  115 0 0 0 0 0 0 1 0 0
1101020005 2008  228 0 0 0 0 0 1 0 1 1
1101020005 2011  210 0 0 0 0 0 0 0 1 1
1101020006 2006  185 0 0 0 1 0 1 1 0 1
1101020006 2008  262 0 0 0 1 0 1 0 1 1
1101020006 2011  200 0 0 0 1 0 0 0 1 1
1101020007 2006  160 0 0 0 1 0 0 1 0 0
1101020007 2008  423 0 0 0 1 0 1 0 1 1
1101020007 2011  419 0 0 0 1 0 1 0 1 1
1101020008 2006  510 0 0 0 1 0 1 0 0 1
1101020008 2008  583 0 0 0 1 0 0 0 0 1
1101020008 2011  120 0 0 0 1 0 0 0 1 1
1101020009 2006  527 0 0 0 1 0 1 0 0 1
1101020009 2008  218 0 0 0 1 1 0 0 1 1
1101020009 2011  308 0 0 0 1 1 0 0 1 1
1101020010 2006  198 0 0 0 0 0 1 0 0 1
1101020010 2008  202 0 0 0 0 0 0 0 1 1
1101020010 2011  209 0 0 0 0 0 0 0 1 1
1101020022 2006  269 0 0 0 1 0 1 1 0 1
1101020022 2008 1126 0 0 0 1 0 1 0 0 1
1101020022 2011 1000 0 0 0 1 1 1 0 1 1
1101020022 2014    0 0 0 0 1 1 0 0 1 1
1101020022 2018 1336 0 0 0 1 1 0 0 1 1
1101020023 2006  239 0 0 0 1 1 0 1 0 1
1101020023 2008  451 0 0 0 1 1 0 0 0 1
1101020023 2011  250 0 0 0 1 1 1 0 1 1
1101020023 2014    0 0 0 0 1 1 0 0 1 1
1101020023 2018 1311 0 0 0 1 1 0 0 1 1
1101020024 2006  458 0 0 0 1 1 0 1 0 1
1101020024 2008 1051 0 0 0 1 1 1 0 1 1
1101020024 2011  800 0 0 0 1 1 1 0 1 1
1101020024 2014    0 0 0 0 1 1 0 0 1 1
1101020024 2018 1440 0 0 0 1 1 0 0 1 1
1101020025 2006  277 0 0 0 1 0 0 1 0 1
1101020025 2008  334 0 0 0 1 0 1 0 1 1
1101020025 2011   85 0 0 0 1 0 1 0 1 1
1101020025 2014    0 0 0 0 1 0 0 0 1 1
1101020025 2018  585 0 0 0 1 0 0 0 1 1
1101020026 2006  356 0 0 0 1 0 1 1 1 1
1101020026 2008    0 0 0 0 1 0 1 0 1 1
1101020026 2011  400 0 0 0 1 0 1 0 1 1
1101020026 2014  750 0 0 0 1 0 1 0 1 1
1101020026 2018  900 0 0 0 1 0 0 0 1 1
1101020027 2006   52 0 0 0 1 1 0 1 1 1
1101020027 2008    0 0 0 0 1 1 1 0 1 1
1101020027 2011 1161 0 0 0 0 0 1 1 1 1
1101020027 2014   24 0 0 0 1 1 0 0 1 1
1101020027 2018 2000 0 0 0 1 1 1 0 1 1
1101020028 2006 1489 0 0 0 0 0 1 0 1 1
1101020028 2008    0 0 0 0 0 0 1 0 1 1
1101020028 2011  400 0 0 0 0 0 1 1 1 1
1101020028 2014  150 0 0 0 1 0 0 0 1 1
1101020028 2018 1950 0 0 0 1 0 1 0 1 0
1101020029 2006  617 0 0 0 1 1 0 1 1 1
1101020029 2008    0 0 0 0 1 1 1 0 1 1
1101020029 2011 2730 0 0 0 1 1 1 1 1 1
1101020029 2014    0 0 0 0 1 1 1 0 1 1
1101020029 2018 3100 0 0 0 1 1 1 0 1 1
1101020030 2006 1821 0 0 0 1 0 0 1 1 1
1101020030 2008    0 0 0 0 1 1 1 0 1 1
1101020030 2011  200 0 0 0 1 1 1 1 1 1
1101020030 2014    0 0 0 0 1 1 0 0 1 1
1101020030 2018 1640 0 0 0 1 1 1 0 1 1
1101020031 2006  218 0 1 0 1 1 0 1 1 1
1101020031 2008    0 0 0 0 1 1 1 0 1 1
1101020031 2011 1200 0 0 0 1 1 1 1 1 1
1101020031 2014   50 0 0 0 1 1 0 0 1 1
1101020031 2018 2100 0 0 0 1 1 1 0 1 1
1101020032 2006  164 0 1 0 1 0 1 1 0 1
1101020032 2008 1024 0 0 0 1 0 1 0 1 1
1101020032 2011  730 0 0 0 1 0 1 1 1 1
1101020032 2014  393 0 0 0 1 0 0 0 1 1
1101020032 2018  948 0 0 0 1 0 1 0 1 1
1101020033 2006  385 0 1 0 1 0 1 1 0 1
1101020033 2008  426 0 0 0 1 0 0 0 1 1
1101020033 2011  499 0 0 0 1 0 0 0 1 1
1101020034 2006  372 0 0 0 1 0 1 0 0 1
1101020034 2008  520 0 0 0 1 0 0 0 1 1
1101020034 2011  390 0 0 0 1 0 0 0 1 1
1101020035 2006  709 0 0 0 1 1 0 1 0 1
1101020035 2008 1501 0 0 0 1 0 1 0 1 1
1101020035 2011  608 0 0 0 1 0 1 0 1 1
1101020035 2014  750 0 0 0 1 0 0 0 1 1
1101020035 2018  194 0 0 0 1 0 0 0 1 1
1101020036 2006  364 0 0 0 1 0 1 1 0 1
1101020036 2008  426 0 0 0 1 0 1 0 0 1
1101020036 2011    6 0 0 0 1 0 1 0 1 1
1101020036 2014    0 0 0 0 1 0 0 0 1 1
end

Sorry about the previous sample data; here I post the sample data again.

I have another question: did I use fixed effects correctly in this research? I can't do the Hausman test with my data.

And about balanced and unbalanced panel data: is it okay if I use unbalanced panel data with my DID model, or should I make it a balanced panel?

Thank you.

Hanni Wirawan

Join Date: May 2019

Posts: 12
#4

17 Aug 2019, 08:34

Originally posted by Clyde Schechter

As for whether this is a problem, that's a value judgment that only you can make. The 95% confidence interval on your estimated treatment effect (DD) is from -711.4451 to -212.8189. So, if, from a practical perspective, it matters whether the effect is really -711 or really -213, then you don't have a precise enough estimate--which means you need better data or a better model. But if, from a practical perspective, -711 vs -213 makes no difference, then your estimate is good enough and the high standard errors are not a problem.

I think it does matter whether the effect is -711 or -213, so I assume that I need better data or a better model?
Clyde Schechter

Join Date: Apr 2014

Posts: 29964
#5

17 Aug 2019, 09:11

Well, thank you, but this example is also unsuitable. In this example, d_year_ebt is always 0, and, consequently, so is DD. So DD gets omitted when -xtreg, fe- runs. In order to provide a more detailed analysis than I did in #2, you need to post an example in which the -xtreg- command you used produces a usable result.

I have another question: did I use fixed effects correctly in this research? I can't do the Hausman test with my data.

First, I don't understand why you can't do the Hausman test with your data (once you have data in which you can actually run the -xtreg- and not have the main variables omitted because of collinearity). Second, I am not among those who believe that model selection should be based on significance tests of anything (which is what the Hausman test is). In general, for DID models, we are trying to estimate an effect of a treatment within panels (within id_desa), and the fixed effects estimator is the most direct way to do that. If the Hausman test were to say that random effects estimation is OK, all it is telling you is that it thinks the results of the random effects estimator aren't statistically significantly different from those of the fixed effects estimator. The advantage to using random effects in that situation is that the estimation may have a smaller standard error--which I suppose would be of some importance to you, but usually the difference is not enough to matter practically.
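For reference, the usual sequence for the test is sketched below. One guess at what went wrong (the thread does not show the error message) is the cluster() option: -hausman- does not accept estimates fitted with robust or cluster-robust standard errors, so both models are refitted here with the default VCE. The stored-estimate names fe and re are arbitrary.

```stata
* Assumes the data are in memory and already -xtset id_desa year-

* Fixed-effects model with the conventional (non-robust) VCE
xtreg Pen_Bantuan d_year_ebt d_PLN DD sd smp industri sinyalTV sinyalHP infra_kes i.year, fe
estimates store fe

* Random-effects model with the same covariates
xtreg Pen_Bantuan d_year_ebt d_PLN DD sd smp industri sinyalTV sinyalHP infra_kes i.year, re
estimates store re

* Compare the two sets of coefficients; the consistent estimator goes first
hausman fe re
```

Whatever the test reports, the clustered fixed-effects model remains the one to use for the DID estimate itself, for the reasons given above.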

And about balanced and unbalanced panel data: is it okay if I use unbalanced panel data with my DID model, or should I make it a balanced panel?

I see this kind of question here frequently, and I don't understand where it comes from. A long time ago, when these analyses were first becoming available in commercial software, they only worked with balanced data. But those days are really long, long gone. I'm not sure why anybody is teaching this as an issue anymore. There is no problem using unbalanced data here. And, the only way to "balance" the data is either to fabricate data to fill in gaps, or to drop some observations. Fabricating data is obviously a very bad idea. Dropping observations is also a bad idea, if less obviously so, because it is very likely to introduce bias into the analysis. So stop worrying about balanced vs unbalanced: it's not an issue. There are perhaps still a few analyses where unbalanced data is not acceptable: I am quite sure that if you encounter one, Stata will at least warn you that there is a problem when you try to run it. More likely, Stata will refuse to run the analysis if it cannot properly be done that way.
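If you just want to see how unbalanced the panel is, something like the following will show it. This is a minimal sketch that assumes the data are in memory:

```stata
* Assumes the data are in memory
xtset id_desa year

* Tabulates the participation patterns: which survey years each id_desa
* appears in, and how many panels share each pattern
xtdescribe
```

Nothing needs to be changed on the basis of that output; it is only a description of which observations are present.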
Hanni Wirawan

Join Date: May 2019

Posts: 12
#6

17 Aug 2019, 19:52

Originally posted by Clyde Schechter

Well, thank you, but this example is also unsuitable. In this example, d_year_ebt is always 0, and, consequently, so is DD. So DD gets omitted when -xtreg, fe- runs. In order to provide a more detailed analysis than I did in #2, you need to post an example in which the -xtreg- command you used produces a usable result.

Okay, noted. Can you suggest a command that I can run in Stata to produce a usable example data set? I used the dataex command, but it automatically picks the observations at the top of my data. Thanks.
Clyde Schechter

Join Date: Apr 2014

Posts: 29964
#7

18 Aug 2019, 04:09

Try running this after loading your full dataset into memory:

Code:

tempfile copy
save `copy'

keep id_desa d_year_ebt d_PLN
duplicates drop
set seed 1234
sample 15, by(d_year_ebt d_PLN) count
keep id_desa
duplicates drop
merge 1:m id_desa using `copy', assert(match using) keep(match) nogenerate
quietly count
dataex, count(`r(N)')

This will pull a random sample of 15 id_desa's representing each combination of d_year_ebt and d_PLN, and then extract all observations for those id_desa's, and then output them through -dataex-. Then post that. This will assure that the full factorial design of d_year_ebt and d_PLN is represented with at least a few id_desa's. The resulting -dataex- may be long since there could be up to 60 id_desa's sampled, and all observations on those id_desa's are being captured.

Hanni Wirawan

Join Date: May 2019
Posts: 12

18 Aug 2019, 07:59

Originally posted by Clyde Schechter

Try running this after loading your full dataset into memory:

Code:

tempfile copy
save `copy'

keep id_desa d_year_ebt d_PLN
duplicates drop
set seed 1234
sample 15, by(d_year_ebt d_PLN) count
keep id_desa
duplicates drop
merge 1:m id_desa using `copy', assert(match using) keep(match) nogenerate
quietly count
dataex, count(`r(N)')

This will pull a random sample of 15 id_desa's representing each combination of d_year_ebt and d_PLN, and then extract all observations for those id_desa's, and then output them through -dataex-. Then post that. This will assure that the full factorial design of d_year_ebt and d_PLN is represented with at least a few id_desa's. The resulting -dataex- may be long since there could be up to 60 id_desa's sampled, and all observations on those id_desa's are being captured.

Thank you very much for the code; here is the sample data.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double id_desa float year long Pen_Bantuan float(d_year_ebt d_PLN DD sd smp industri sinyalTV sinyalHP infra_kes)
1202020048 2006   32 0 1 0 0 0 0 0 0 1
1202020048 2008  102 0 1 0 0 0 1 0 1 1
1202020048 2011   84 0 1 0 0 0 0 0 1 1
1202020048 2014  259 0 1 0 0 0 1 0 1 1
1202020048 2018  152 0 0 0 0 0 0 0 1 1
1202020052 2006   41 0 1 0 0 0 0 0 0 0
1202020052 2008    0 0 1 0 1 0 1 0 0 1
1202020052 2011    0 0 1 0 1 0 0 0 1 1
1202020052 2014  261 0 0 0 1 0 0 0 1 1
1202020052 2018   40 0 1 0 1 0 0 0 1 0
1203110046 2006    0 0 0 0 0 0 0 1 0 0
1203110046 2008    0 0 0 0 0 0 1 0 0 0
1203110046 2011    0 0 0 0 1 0 1 0 1 1
1203110046 2014  400 0 0 0 1 1 0 0 1 1
1203110046 2018   50 0 0 0 1 1 0 0 1 0
1203160112 2006    8 0 0 0 1 0 1 1 0 0
1203160112 2008    0 0 0 0 1 0 0 0 1 0
1204041003 2006  375 0 0 0 1 0 0 1 0 0
1204041003 2008  440 0 0 0 1 0 0 0 1 1
1204041003 2011  562 0 0 0 1 0 0 0 1 1
1204041003 2014  527 0 0 0 1 0 0 0 1 1
1204041003 2018   30 1 0 0 1 0 0 0 1 1
1206060007 2006    0 0 0 0 1 0 1 1 1 1
1206060007 2008   68 0 0 0 1 0 1 1 1 1
1206060007 2011  234 0 0 0 1 0 0 1 1 1
1206060007 2014  236 0 0 0 1 0 1 0 1 1
1206060007 2018  230 1 0 0 1 0 1 0 1 1
1214022006 2014  187 1 1 1 1 0 0 0 1 1
1214022006 2018    0 1 1 1 1 0 0 0 1 0
1301011002 2008    0 0 1 0 1 0 0 0 0 1
1301011002 2011    0 0 1 0 1 0 0 0 0 1
1301011002 2014    0 0 1 0 1 1 1 0 0 1
1301011002 2018  873 1 1 1 1 1 0 0 0 1
1301021003 2008  902 0 1 0 1 0 0 0 0 1
1301021003 2011  930 0 1 0 1 0 0 0 1 1
1301021003 2014   30 0 1 0 1 0 0 0 0 1
1301021003 2018  520 1 1 1 1 0 0 0 1 1
1301022001 2008 1001 0 1 0 1 0 0 0 0 1
1301022001 2011 1000 0 1 0 1 0 0 0 0 1
1301022001 2014  861 0 1 0 1 0 0 0 0 1
1301022001 2018  680 1 1 1 1 1 0 0 1 1
1406011001 2006  582 0 1 0 1 1 0 0 0 1
1406011001 2008  535 0 1 0 1 1 0 0 0 1
1406011001 2011  362 0 1 0 1 1 0 0 0 1
1406011001 2014  351 0 1 0 1 1 0 0 0 1
1406011001 2018   90 1 1 1 1 1 0 0 0 1
1406011018 2008   56 0 1 0 1 0 0 0 0 0
1406011018 2011  303 0 1 0 1 0 0 0 1 1
1406011018 2014  375 0 1 0 1 0 0 0 0 1
1406011018 2018   25 1 1 1 1 0 0 0 0 1
1410051004 2011  130 0 1 0 1 0 0 1 1 1
1410051004 2014   80 1 1 1 1 1 0 0 1 1
1410051004 2018   13 1 1 1 1 1 1 0 1 1
1503030016 2006  209 0 0 0 1 0 1 1 1 1
1503030016 2008   89 0 0 0 1 0 0 1 1 1
1503030016 2011  120 0 0 0 1 0 1 0 1 1
1503030016 2014  320 0 0 0 1 0 1 0 1 1
1503030016 2018  659 0 0 0 1 0 1 0 1 1
1507032003 2011    0 0 1 0 1 1 1 0 1 1
1507032003 2014  142 0 1 0 1 1 1 0 1 1
1507032003 2018  121 1 0 0 1 1 1 0 1 1
1705044005 2006  286 0 1 0 1 0 1 0 1 1
1705044005 2008 1482 0 1 0 1 0 1 1 1 1
1705044005 2011  800 0 1 0 1 0 0 0 1 1
1705044005 2014  811 0 0 0 1 0 0 0 1 1
1705044005 2018  460 0 0 0 1 0 1 1 0 1
1706050018 2008    0 0 0 0 1 1 1 0 1 1
1706050018 2011  217 0 0 0 1 1 1 0 1 1
1706050018 2014  381 0 0 0 1 1 1 0 1 1
1706050018 2018   47 0 0 0 1 1 1 0 1 1
1803111001 2006  107 0 1 0 1 0 1 1 1 1
1803111001 2008  686 0 0 0 1 1 1 1 0 1
1803111001 2011    0 0 0 0 1 1 1 1 1 1
1803111001 2014  604 0 0 0 1 1 1 1 1 1
1803111001 2018  326 1 0 0 1 1 1 0 1 1
1806033005 2008    0 0 1 0 1 0 1 1 1 1
1806033005 2011  800 0 1 0 1 0 0 1 1 1
1806033005 2014  480 1 1 1 1 0 0 0 1 1
1806033005 2018  417 1 0 0 1 0 0 0 1 1
2104020009 2006  152 0 0 0 1 0 1 0 0 1
3203040001 2006 1070 0 0 0 1 1 1 0 1 1
3203040001 2008 2647 0 0 0 1 1 1 1 1 1
3203040001 2011    0 0 0 0 1 1 0 0 1 1
3203040001 2014 2761 1 0 0 1 1 1 0 1 1
3203040001 2018 2770 1 0 0 1 1 1 0 1 1
3304180007 2006  912 0 0 0 1 1 1 1 1 1
3304180007 2008 2268 0 0 0 1 1 1 1 1 1
3304180007 2011 2231 0 0 0 1 1 1 1 1 1
3304180007 2014 2553 1 0 0 1 1 1 0 1 1
3304180007 2018  388 1 0 0 1 1 1 1 1 1
3502200005 2006 1837 0 0 0 1 1 0 1 1 1
3502200005 2008 1032 0 0 0 1 1 1 1 1 1
3502200005 2011 1215 0 0 0 1 1 1 1 1 1
3502200005 2014  901 0 0 0 1 1 1 1 1 1
3502200005 2018 2505 0 0 0 1 1 1 1 1 1
5201010003 2006 7929 0 1 0 1 0 0 1 1 1
5201010003 2008 6858 0 1 0 1 1 0 1 1 1
5201010003 2011 2841 0 1 0 1 1 1 1 1 1
5201010003 2014 5400 1 0 0 1 1 1 0 1 1
5201010003 2018 2731 1 0 0 1 1 1 1 1 1
5203011006 2011  350 0 0 0 1 1 1 0 1 1
5203011006 2014  300 1 0 0 1 1 1 0 1 1
5203011006 2018 2357 1 0 0 1 1 0 0 1 1
5203040012 2011  768 0 1 0 1 1 1 1 1 1
5203040012 2014 1096 0 0 0 1 1 1 1 1 1
5203040012 2018 1356 0 0 0 1 1 1 1 1 1
5307013003 2008  421 0 1 0 1 0 1 0 0 1
5307013003 2011    0 0 1 0 1 0 0 0 0 1
5307013003 2014  310 0 1 0 1 1 1 0 1 1
5307013003 2018  200 1 1 1 1 1 0 0 1 1
5307014002 2008    0 0 1 0 1 1 0 0 0 1
5307014002 2011    0 0 1 0 1 1 1 0 1 1
5307014002 2014  625 0 0 0 1 1 0 0 0 1
5307014002 2018  750 1 1 1 1 1 1 0 1 1
5311062013 2006  298 0 1 0 1 0 1 0 0 1
5311062013 2008    0 0 1 0 1 1 1 0 1 1
5311062013 2011   30 0 1 0 1 0 1 0 1 1
5311062013 2014  213 0 0 0 1 0 1 0 1 1
5311062013 2018   17 0 0 0 1 0 1 0 1 0
5320030003 2011    0 0 1 0 1 0 1 0 1 1
5320030003 2014  785 0 1 0 1 1 1 0 0 1
5320030003 2018    0 0 1 0 1 1 0 0 0 1
6105120005 2008 1487 0 0 0 1 1 1 0 1 1
6105120005 2011  281 0 0 0 1 1 1 1 1 1
6105120005 2014    0 0 0 0 1 1 1 0 1 1
6105120005 2018  624 1 0 0 1 1 1 0 1 1
6108200005 2006    5 0 0 0 1 0 0 0 0 1
6108200005 2008    0 0 0 0 1 0 0 0 1 1
6108200005 2011  230 0 0 0 1 0 0 0 1 1
6108200005 2014   38 0 0 0 1 1 0 0 1 1
6108200005 2018  145 0 0 0 1 1 1 0 1 1
6207013004 2008    0 0 1 0 1 1 0 0 0 1
6207013004 2011  107 0 1 0 1 1 0 0 1 1
6207013004 2014   25 0 1 0 1 1 0 0 1 1
6207013004 2018  128 1 1 1 1 1 0 0 1 1
6209090012 2006  165 0 1 0 1 1 1 0 0 1
6209090012 2008    0 0 1 0 0 0 1 0 0 1
6209090012 2011    9 0 1 0 0 0 0 0 0 1
6209090012 2014    0 0 1 0 0 0 0 0 0 1
6209090012 2018   15 1 1 1 0 0 0 0 0 1
6301080008 2006  201 0 0 0 1 0 0 1 1 1
6301080008 2008  321 0 0 0 1 0 1 1 1 1
6301080008 2011   34 0 0 0 1 0 1 1 1 1
6301080008 2014  320 0 0 0 1 0 1 0 1 1
6301080008 2018  317 0 0 0 1 0 1 0 1 1
6405041004 2006  422 0 1 0 1 0 0 0 1 1
6405041004 2008   60 0 1 0 1 0 0 0 1 1
6405041004 2011  321 0 1 0 1 0 0 0 1 1
6405041004 2014  312 1 1 1 1 0 0 0 1 1
6405041004 2018  713 1 1 1 1 0 1 0 1 1
6502090016 2014   61 0 0 0 0 0 0 0 1 1
6502090016 2018   56 0 0 0 0 0 0 0 1 1
7206020002 2006    0 0 0 0 1 1 1 0 0 1
7206020002 2008 1358 0 0 0 1 1 1 0 1 1
7206020002 2011 2302 0 0 0 1 1 1 0 1 1
7206020002 2014 1966 1 0 0 1 1 0 0 1 1
7206020002 2018  352 1 0 0 1 1 0 0 1 1
7301030011 2006  240 0 1 0 1 0 1 0 0 1
7301030011 2008  477 0 1 0 1 0 1 0 1 1
7301030011 2011  293 0 1 0 1 0 1 0 1 1
7301030011 2014  187 0 1 0 1 0 0 0 1 1
7301030011 2018 1018 0 0 0 1 0 1 0 1 1
7309010005 2006  153 0 0 0 1 0 1 0 0 1
7309010005 2008    0 0 1 0 1 0 0 0 0 1
7309010005 2011 1017 0 1 0 1 0 0 0 0 1
7309010005 2014  997 0 1 0 1 1 0 0 0 1
7309010005 2018 1150 1 1 1 1 1 1 0 0 1
7309031005 2011  198 0 1 0 1 1 0 1 1 1
7309031005 2014  618 0 1 0 1 1 0 1 1 1
7309031005 2018    0 1 1 1 1 1 0 1 1 1
7315070016 2006  418 0 0 0 1 0 1 1 1 1
7315070016 2008  479 0 0 0 1 0 1 1 1 1
7315070016 2011  291 0 0 0 1 0 1 0 1 1
7315070016 2014 1122 1 0 0 1 0 1 0 1 1
7315070016 2018  332 1 0 0 1 0 1 0 1 1
7325040005 2006  818 0 1 0 1 0 1 0 0 1
7325040005 2008  696 0 1 0 1 0 1 0 1 1
7325040005 2011 1173 0 1 0 1 1 1 0 1 1
7325040005 2014 1855 0 0 0 1 1 1 0 1 1
7325040005 2018 1116 0 0 0 1 1 1 0 1 1
7325070011 2006  115 0 1 0 1 0 0 0 1 1
7406043009 2008    0 0 1 0 1 0 0 0 0 1
7406043009 2011    0 0 1 0 1 0 1 0 1 1
7406043009 2014   83 1 1 1 1 0 1 0 1 1
7406043009 2018  463 1 1 1 1 0 1 0 1 1
7503051004 2008  845 0 1 0 1 1 1 0 1 1
7503051004 2011  920 0 1 0 1 1 0 0 1 1
7503051004 2014  683 1 0 0 1 1 1 0 1 1
7503051004 2018  676 1 0 0 1 1 0 0 1 1
7504032012 2011  154 0 0 0 1 0 0 0 1 1
7504032012 2014  357 0 0 0 0 0 1 0 1 1
7504032012 2018  419 0 0 0 0 0 1 0 1 1
7504040003 2006  780 0 0 0 1 0 0 0 0 1
7504040003 2008    0 0 0 0 1 1 0 0 1 1
7504040003 2011   73 0 0 0 1 1 1 0 1 1
7504040003 2014  448 0 0 0 1 1 1 0 1 1
7504040003 2018  287 0 0 0 1 1 0 0 1 1
7601041005 2008 2560 0 0 0 1 1 0 1 0 1
7601041005 2011  318 0 1 0 1 1 0 0 1 1
7601041005 2014  611 1 1 1 1 1 0 0 1 1
7601041005 2018 1200 1 0 0 1 1 0 0 0 1
7603090007 2008    0 0 1 0 1 0 1 0 0 1
7603090007 2011  150 0 1 0 1 0 0 0 0 1
7603090007 2014  250 0 0 0 1 1 0 0 1 1
7603090007 2018  117 0 0 0 1 1 1 0 0 0
7603090015 2008    0 0 1 0 1 0 0 0 0 1
7603090015 2011  250 0 1 0 1 0 0 0 0 1
7603090015 2014  192 1 1 1 1 0 0 0 0 1
7603090015 2018  178 1 0 0 1 1 0 0 0 1
7604040003 2008    0 0 1 0 1 0 1 0 0 1
7604040003 2011  183 0 1 0 1 0 1 0 0 1
7604040003 2014  238 0 1 0 1 0 1 0 0 1
7604040003 2018  302 0 1 0 1 1 1 0 0 1
7604040007 2008  248 0 1 0 1 1 1 0 0 0
7604040007 2011  154 0 1 0 1 1 1 0 0 1
7604040007 2014  173 0 1 0 1 1 1 0 0 1
7604040007 2018  766 0 1 0 1 1 1 0 0 1
9102021008 2008    0 0 1 0 1 0 0 0 0 0
9102021008 2011    0 0 1 0 1 0 0 0 0 1
9102021008 2014  159 1 0 0 1 0 0 0 0 1
9102021008 2018   40 1 1 1 1 0 0 0 0 0
9110040010 2011   15 0 1 0 1 0 0 0 1 0
9110040010 2014   58 0 1 0 1 0 0 0 0 0
9110040010 2018  205 1 1 1 1 0 0 0 0 0
9403151002 2008    0 0 1 0 1 0 0 0 0 1
9403151002 2011    0 0 1 0 1 0 0 0 0 1
9403151002 2014    0 0 1 0 1 0 0 0 1 1
9403151002 2018  100 1 1 1 1 0 0 0 1 1
9418025007 2008    0 0 1 0 0 0 0 0 0 0
9418025007 2011    0 0 1 0 0 0 0 0 1 0
9418025007 2014    0 0 1 0 0 0 0 0 0 0
9418025007 2018  200 0 1 0 0 0 0 0 1 0
9428032004 2014  200 1 1 1 1 0 0 0 0 1
9428032004 2018   80 1 1 1 1 0 0 0 0 1
9432030033 2014    0 0 1 0 0 0 1 0 0 0
9432030033 2018   78 0 1 0 0 0 0 0 0 0
9432040034 2014    5 0 1 0 0 0 0 0 1 0
9432040034 2018   86 0 1 0 0 0 0 0 1 0
end

Last edited by Hanni Wirawan; 18 Aug 2019, 08:03.

Clyde Schechter

Join Date: Apr 2014
Posts: 29964

18 Aug 2019, 11:37

Great, thanks. So now we have data representing the full factorial design, and enough of it to get a sense of what is going on.

Code:

. xtset id_desa year
       panel variable:  id_desa (unbalanced)
        time variable:  year, 2006 to 2018, but with gaps
                delta:  1 unit

. xtreg Pen_Bantuan d_year_ebt d_PLN DD sd smp industri sinyalTV sinyalHP infra_kes i.year, fe cluster ( id_desa)

Fixed-effects (within) regression               Number of obs     =        238
Group variable: id_desa                         Number of groups  =         60

R-sq:                                           Obs per group:
     within  = 0.0451                                         min =          1
     between = 0.0327                                         avg =        4.0
     overall = 0.0002                                         max =          5

                                                F(13,59)          =       2.09
corr(u_i, Xb)  = -0.1701                        Prob > F          =     0.0279

                               (Std. Err. adjusted for 60 clusters in id_desa)
------------------------------------------------------------------------------
             |               Robust
 Pen_Bantuan |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  d_year_ebt |  -107.0792   273.2612    -0.39   0.697    -653.8737    439.7152
       d_PLN |  -1.049357   179.7881    -0.01   0.995    -360.8046    358.7059
          DD |   72.88529    271.372     0.27   0.789    -470.1289    615.8995
          sd |   72.75433    150.776     0.48   0.631    -228.9478    374.4565
         smp |  -107.8575   197.9669    -0.54   0.588    -503.9883    288.2733
    industri |  -158.2476   191.4262    -0.83   0.412    -541.2906    224.7954
    sinyalTV |   62.81325   230.1561     0.27   0.786    -397.7281    523.3546
    sinyalHP |   52.42588   135.4796     0.39   0.700    -218.6681    323.5199
   infra_kes |   133.3408   86.56163     1.54   0.129    -39.86864    306.5502
             |
        year |
       2008  |   109.8893   148.8871     0.74   0.463    -188.0332    407.8117
       2011  |  -107.5469    255.105    -0.42   0.675    -618.0109     402.917
       2014  |   150.1515   171.0694     0.88   0.384    -192.1575    492.4606
       2018  |   129.6446   157.3657     0.82   0.413    -185.2434    444.5325
             |
       _cons |   376.4368   281.5392     1.34   0.186    -186.9219    939.7955
-------------+----------------------------------------------------------------
     sigma_u |   769.9259
     sigma_e |  559.67823
         rho |  .65427057   (fraction of variance due to u_i)
------------------------------------------------------------------------------

This replicates the finding of high standard errors. Note that every single variable is affected. This alone makes it likely that the problem is simply a noisy outcome accompanied by a model that explains very little. The low values of the various R² statistics (the within R² is only 0.0451) confirm that the model has little explanatory value. The high value of sigma_e (about 560) confirms that the results are noisy. So this is all consistent with that cause of the problem.

The other thing to rule out is that a difficult situation is being exacerbated by multicollinearity. To get a quantitative estimate of that, we can re-run the analysis using -regress- followed by the -estat vif- command:

Code:

. regress Pen_Bantuan d_year_ebt d_PLN DD sd smp industri sinyalTV sinyalHP infra_kes i.year

      Source |       SS           df       MS      Number of obs   =       238
-------------+----------------------------------   F(13, 224)      =      5.18
       Model |  48753174.8        13  3750244.22   Prob > F        =    0.0000
    Residual |   162046422       224  723421.528   R-squared       =    0.2313
-------------+----------------------------------   Adj R-squared   =    0.1867
       Total |   210799597       237  889449.777   Root MSE        =    850.54

------------------------------------------------------------------------------
 Pen_Bantuan |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  d_year_ebt |   770.3769   224.8511     3.43   0.001      327.283    1213.471
       d_PLN |    221.471   144.3544     1.53   0.126    -62.99537    505.9374
          DD |  -893.0114   278.4489    -3.21   0.002    -1441.726   -344.2969
          sd |    72.2963   208.7826     0.35   0.729     -339.133    483.7256
         smp |   388.6104   124.9349     3.11   0.002     142.4123    634.8085
    industri |  -68.83346   120.8586    -0.57   0.570    -306.9987    169.3318
    sinyalTV |   684.2749   160.4062     4.27   0.000     368.1766    1000.373
    sinyalHP |   235.2733   134.7995     1.75   0.082    -30.36415    500.9107
   infra_kes |   230.3979   199.7551     1.15   0.250    -163.2416    624.0374
             |
        year |
       2008  |  -49.86455    210.942    -0.24   0.813    -465.5492    365.8201
       2011  |  -315.9277   216.0218    -1.46   0.145    -741.6227    109.7674
       2014  |  -132.5651   218.4145    -0.61   0.545     -562.975    297.8448
       2018  |  -272.9776   237.6612    -1.15   0.252    -741.3154    195.3602
             |
       _cons |  -155.0511   288.3389    -0.54   0.591    -723.2549    413.1527
------------------------------------------------------------------------------

. estat vif

    Variable |       VIF       1/VIF  
-------------+----------------------
  d_year_ebt |      2.76    0.362285
       d_PLN |      1.70    0.588493
          DD |      2.57    0.389790
          sd |      1.30    0.769050
         smp |      1.25    0.797184
    industri |      1.20    0.832905
    sinyalTV |      1.28    0.783919
    sinyalHP |      1.35    0.740718
   infra_kes |      1.28    0.782823
        year |
       2008  |      2.24    0.445525
       2011  |      2.62    0.381467
       2014  |      2.86    0.349826
       2018  |      3.38    0.295460
-------------+----------------------
    Mean VIF |      1.98

None of these VIFs is remotely near high enough to implicate multicollinearity as a cause of your high standard errors.
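For anyone wondering where these numbers come from: the VIF for a regressor is 1/(1 - R²), where R² comes from regressing that regressor on all of the other regressors, and the 1/VIF column is simply 1 - R². Here is a minimal sketch, assuming it is run immediately after the -regress- above so that e(sample) still marks the 238 observations used there:

```stata
* Auxiliary regression: d_year_ebt on every other right-hand-side variable,
* restricted to the estimation sample of the preceding -regress-
regress d_year_ebt d_PLN DD sd smp industri sinyalTV sinyalHP infra_kes i.year if e(sample)

* VIF = 1/(1 - R-squared) of the auxiliary regression
display "VIF for d_year_ebt = " 1/(1 - e(r2))
```

The displayed value should agree with what -estat vif- reports for d_year_ebt.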

So my original hunch turns out to be correct: you have a noisy outcome variable that is not very well predicted by your model. To get a substantially more precise estimate of the DD effect, you will need better data, a better model, or both.


Hanni Wirawan

Join Date: May 2019

Posts: 12
#10

18 Aug 2019, 19:51

Thank you so much, Prof. Clyde, for the superb advice. It helps me a lot.
Hanni Wirawan

Join Date: May 2019

Posts: 12
#11

19 Aug 2019, 10:16

Originally posted by Clyde Schechter

So my original hunch turns out to be correct: you have a noisy outcome variable that is not very well predicted by your model. To get a substantially more precise estimate of the DD effect, you will need better data, a better model, or both.

Prof. Clyde, can I use propensity score matching with DID on these data?
Clyde Schechter

Join Date: Apr 2014

Posts: 29964
#12

20 Aug 2019, 05:46

Yes, in principle, you can. But I don't recommend it, and I doubt it will be helpful. Here are the reasons I don't recommend it:

1. I think propensity matching in panel data is a dubious practice. Each entity has multiple observations, and the variables are changing over time. Which time do you select to do the matching? Any choice is arbitrary. And is a matching based on one selected time really an adequate match? If you try to match on "all times" then you risk being unable to find any usable matches at all, or, at any rate, too few for a meaningful analysis.

2. The purpose of matching is really to turn a between-subjects analysis into a within-subjects analysis and thereby reduce variance. So that sounds like it's just what you need: reduced variance. Except that the analysis you have is already a within-subjects analysis. It is unlikely that propensity score matching will do a better job at variance reduction. The analysis you have already extracts the between-subjects variance completely into sigma_u. In fact, you can think of this kind of within-subjects analysis as a matched-pairs analysis in which each entity is matched with itself: there is no more effective way to match!

3. If you use propensity matching (or any other kind of matching) here, you turn your two-level data structure into a three-level structure. You now have observations repeated over time nested in id_desa, which, in turn, is nested within matched pairs. To get correct standard errors you must now move from the -xt- commands to the mixed model commands. You would need to use -mixed- for your analysis. Now, from my perspective, that is not a problem. But in some fields, random effects models are viewed with extreme skepticism. So you'll have to see if that's acceptable in your discipline and community. I have seen people try to get around this by sticking with -xtreg-, changing the group variable to the matched pair, and then using a cluster-robust variance estimate clustered on id_desa. I suppose that is better than nothing, but it's not really reflecting the data structure correctly and, personally, I don't put much credence in that approach.
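To make the three-level structure in point 3 concrete, here is a minimal sketch of what the -mixed- command might look like. The variable pair_id is hypothetical: it stands for whatever identifier the matching step would create for each matched pair, and it does not exist in the data posted in this thread. The sketch uses random intercepts only; whether that is an adequate specification would need to be checked.

```stata
* pair_id is a hypothetical matched-pair identifier created by the matching step.
* Levels: observations over time, nested in id_desa, nested in pair_id.
mixed Pen_Bantuan d_year_ebt d_PLN DD sd smp industri sinyalTV sinyalHP infra_kes i.year ///
    || pair_id: || id_desa:
```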

Hanni Wirawan

Join Date: May 2019
Posts: 12

#13

20 Aug 2019, 22:32

Originally posted by Clyde Schechter

Yes, in principle, you can. But I don't recommend it, and I doubt it will be helpful.

OK, thank you for the insight, Prof. Clyde. I will try to learn about -mixed- analysis. But before that, I have already run -xtpoisson- on my data, and the result is:

Code:

.  xtpoisson Pen_Bantuan d_year_ebt d_PLN DD sd smp industri sinyalTV sinyalHP infra_kes i.year, fe
note: 590 groups (590 obs) dropped because of only one obs per group
note: 115 groups (260 obs) dropped because of all zero outcomes

Iteration 0:   log likelihood = -2382642.4  
Iteration 1:   log likelihood = -2221873.8  
Iteration 2:   log likelihood = -2221598.8  
Iteration 3:   log likelihood = -2221598.8  

Conditional fixed-effects Poisson regression    Number of obs     =     17,212
Group variable: id_desa                         Number of groups  =      4,500

                                                Obs per group:
                                                              min =          2
                                                              avg =        3.8
                                                              max =          5

                                                Wald chi2(13)     =  318776.09
Log likelihood  = -2221598.8                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
 Pen_Bantuan |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  d_year_ebt |   .1802498   .0027819    64.79   0.000     .1747973    .1857023
       d_PLN |   .0883472   .0019914    44.36   0.000     .0844441    .0922502
          DD |  -.1314883   .0042152   -31.19   0.000    -.1397499   -.1232266
          sd |  -.0084518    .004514    -1.87   0.061    -.0172991    .0003954
         smp |    .145249   .0017318    83.87   0.000     .1418547    .1486432
    industri |   .0669007   .0011969    55.89   0.000     .0645548    .0692467
    sinyalTV |  -.0118641   .0014632    -8.11   0.000    -.0147319   -.0089963
    sinyalHP |   .0298319    .001676    17.80   0.000      .026547    .0331168
   infra_kes |    .018724   .0032565     5.75   0.000     .0123413    .0251066
             |
        year |
       2008  |  -.1503171   .0015821   -95.01   0.000    -.1534179   -.1472162
       2011  |   -.137418   .0015987   -85.95   0.000    -.1405515   -.1342845
       2014  |   .2641626    .001586   166.56   0.000     .2610541     .267271
       2018  |   .3312602   .0016595   199.62   0.000     .3280077    .3345127
------------------------------------------------------------------------------

The reason I am using -xtpoisson- is that my dependent variable (Pen_Bantuan) is count data: the number of people who receive social security from the government. Is this reason acceptable?

Code:

 tabstat Pen_Bantuan , stat(N mean sd var semean)

    variable |         N      mean        sd  variance  se(mean)
-------------+--------------------------------------------------
 Pen_Bantuan |     18062  388.8556  688.2208  473647.9  5.120883
----------------------------------------------------------------

But I have read that for -poisson- the mean of the dependent variable must equal its variance. Does that mean I can't use -poisson- because of overdispersion, and so I must choose -xtnbreg-? Is this correct?

Last edited by Hanni Wirawan; 20 Aug 2019, 23:02. Reason: confused


Clyde Schechter

Join Date: Apr 2014

Posts: 29964
#14

21 Aug 2019, 06:42

The Poisson model is one of several that are available for count data. It has a number of advantages over others, including a relative robustness to violations of its assumptions.

The crude calculation of variance of the outcome and comparing it to the mean is not a relevant way to test for suitability of the Poisson model. It is the variance of the observed values around the predicted value, at each level of predicted value, that matters. There is no simple way to test for that, especially in panel data. If you do not need predicted values of observations, you need not really worry about the overdispersion issue as the Poisson model produces unbiased estimates in any event. If you do need predicted values of observations, there being no simple statistic that can guide the model selection process here, I would suggest you run both Poisson and negative binomial models and then look at scatterplots of predicted vs observed values of the outcome, and then choose the model that appears to have better fit.
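A rough sketch of that comparison, assuming the data are already -xtset- by id_desa and year. The names yhat_pois and yhat_nb are made up for illustration. Two caveats: after fixed-effects estimation, -predict, nu0- gives predicted counts under the assumption that the fixed effect is zero, so the plots are only a crude guide to fit; and the vce(robust) option is an addition to the command shown in #13, included here as an assumption because the default standard errors rely on the Poisson variance assumption.

```stata
* Conditional fixed-effects Poisson; vce(robust) is an added assumption (see text)
xtpoisson Pen_Bantuan d_year_ebt d_PLN DD sd smp industri sinyalTV sinyalHP infra_kes i.year, ///
    fe vce(robust)
predict double yhat_pois, nu0    // predicted count assuming fixed effect = 0

* Conditional fixed-effects negative binomial with the same covariates
xtnbreg Pen_Bantuan d_year_ebt d_PLN DD sd smp industri sinyalTV sinyalHP infra_kes i.year, fe
predict double yhat_nb, nu0      // predicted count assuming fixed effect = 0

* Observed vs predicted for each model, shown side by side
scatter Pen_Bantuan yhat_pois, name(pois, replace) title("Poisson")
scatter Pen_Bantuan yhat_nb, name(nb, replace) title("Negative binomial")
graph combine pois nb
```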
Hanni Wirawan

Join Date: May 2019

Posts: 12
#15

21 Aug 2019, 21:13

Originally posted by Clyde Schechter

The crude calculation of variance of the outcome and comparing it to the mean is not a relevant way to test for suitability of the Poisson model.
It is the variance of the observed values around the predicted value, at each level of predicted value, that matters. There is no simple way to test for that, especially in panel data.
If you do not need predicted values of observations, you need not really worry about the overdispersion issue as the Poisson model produces unbiased estimates in any event.

Okay, thank you, Prof. Clyde. Is there any literature reference for your statement, so that I can cite it in my research? And does this also apply to the -xtreg- model: if I don't need predicted values from my model, can the standard error problem be ignored? The focus of my research is only to do an impact evaluation of a policy, not to obtain predicted values. Thanks, Prof. Clyde.