Most explanatory variables turn from significant in FE to insignificant in PPML

    Dear Statalist,

    I am a final-year undergraduate student working on a dissertation titled 'The Effect of Epidemic on International Tourism Flows: The Role of Public Healthcare Spending'. The data cover bilateral tourist arrivals from 191 origin countries to 180 destination countries, forming 15,276 country pairs, from 1995 to 2015. The unbalanced panel contains 206,171 observations after excluding missing values. I am interested in the moderating effect of public healthcare spending on the relationship between past epidemic outbreaks and international tourism flows. My dependent variable is lflow, the main explanatory variable is epidemic_d, and the interaction term is epihgdp. The variables are described below:

    Code:
      lflow               log of bilateral tourist arrivals between origin and destination
      flow                bilateral tourist arrivals between origin and destination
      lgdp_o              log of GDP per capita at origin (normalised by 10000)
      lgdp_d              log of GDP per capita at destination (normalised by 10000)
      ldistw              log of distance between origin and destination
      lpop_o              log of population in origin country
      lpop_d              log of population in destination country
      lRP_od              log of relative price between origin and destination country
      epidemic_d          share of population affected by epidemic in destination country
      epidemic_lagged_d   share of population affected by epidemic in destination country (lagged one year)
      healthgdp_d         public healthcare expenditure in destination country (% of GDP)
      healthgdp_lagged_d  public healthcare expenditure in destination country (% of GDP, lagged one year)
      epihgdp             interaction term between epidemic_d and healthgdp_d
      epihgdp_lagged_d    interaction term between epidemic_lagged_d and healthgdp_lagged_d
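Because epihgdp and epihgdp_lagged_d are stored as separate variables, it is worth confirming that they equal the products described above. That rules out one possible source of error. A minimal Stata sketch, assuming this dataset is in memory; the check_* and diff_* variable names are hypothetical and used only for this check:

```stata
* check_* and diff_* are hypothetical names used only for this check
* Rebuild both interaction terms from their components
generate double check_epihgdp     = epidemic_d * healthgdp_d
generate double check_epihgdp_lag = epidemic_lagged_d * healthgdp_lagged_d

* Absolute differences from the stored versions; the maxima should be close to 0
generate double diff_epihgdp     = abs(check_epihgdp - epihgdp)
generate double diff_epihgdp_lag = abs(check_epihgdp_lag - epihgdp_lagged_d)
summarize diff_epihgdp diff_epihgdp_lag
```

If either maximum is far from zero, the stored term was built differently from its description.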

    I use two estimators: fixed effects (FE) and Poisson pseudo-maximum likelihood (PPML).

    For FE, I estimated three specifications. The code is as follows:
    Code:
    eststo: xi: xtreg lflow epidemic_d lgdp_o lgdp_d lpop_o lpop_d lRP_od i.year, fe robust
    eststo: xi: xtreg lflow epidemic_d healthgdp_d lgdp_o lgdp_d lpop_o lpop_d lRP_od i.year, fe robust
    eststo: xi: xtreg lflow epidemic_d healthgdp_d epihgdp lgdp_o lgdp_d lpop_o lpop_d lRP_od i.year, fe robust
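Since Stata 11, factor variables can be used directly, so the `xi:` prefix is not needed. Writing the interaction as `c.epidemic_d##c.healthgdp_d` also lets `margins` take the interaction into account when computing the effect of epidemic_d. With a hand-made epihgdp, `margins` would treat it as an unrelated variable. Below is a sketch of the third FE specification in that form. It assumes pairid and year identify the panel, and the grid of healthgdp_d values, 2(2)10, is an arbitrary illustration. The `///` continuation works only in a do-file.

```stata
* Assumes pairid and year identify the panel, as in the FE output
xtset pairid year

* Third FE specification; ## adds epidemic_d, healthgdp_d and their product.
* vce(cluster pairid) matches what "fe robust" does with xtreg.
eststo: xtreg lflow c.epidemic_d##c.healthgdp_d lgdp_o lgdp_d lpop_o lpop_d lRP_od ///
    i.year, fe vce(cluster pairid)

* Effect of epidemic_d on lflow at selected (illustrative) levels of healthgdp_d
margins, dydx(epidemic_d) at(healthgdp_d=(2(2)10))
```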
    The signs and significance levels in the FE model look reasonable to me. The results for the last specification (with epihgdp) are as follows:
    Code:
    Fixed-effects (within) regression               Number of obs     =    152,289
    Group variable: pairid                          Number of groups  =     13,283
    
    R-squared:                                      Obs per group:
         Within  = 0.2618                                         min =          1
         Between = 0.3826                                         avg =       11.5
         Overall = 0.3539                                         max =         16
    
                                                    F(23,13282)       =     418.61
    corr(u_i, Xb) = 0.3785                          Prob > F          =     0.0000
    
                                (Std. err. adjusted for 13,283 clusters in pairid)
    ------------------------------------------------------------------------------
                 |               Robust
           lflow | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
      epidemic_d |  -20.10204   6.929843    -2.90   0.004    -33.68552   -6.518561
     healthgdp_d |  -.0513586   .0044858   -11.45   0.000    -.0601514   -.0425659
         epihgdp |   3.198859   1.375551     2.33   0.020     .5025833    5.895135
          lgdp_o |   .3809364   .0215977    17.64   0.000     .3386018     .423271
          lgdp_d |   .4053506   .0199294    20.34   0.000     .3662861    .4444152
          lpop_o |   .1450008    .072187     2.01   0.045     .0035041    .2864976
          lpop_d |   .1948498   .0729338     2.67   0.008     .0518891    .3378106
          lRP_od |   .0843827   .0126933     6.65   0.000      .059502    .1092633
     _Iyear_1996 |          0  (omitted)
     _Iyear_1997 |          0  (omitted)
     _Iyear_1998 |          0  (omitted)
     _Iyear_1999 |          0  (omitted)
     _Iyear_2000 |  -.3337469   .0305551   -10.92   0.000    -.3936393   -.2738544
     _Iyear_2001 |  -.3341663   .0294902   -11.33   0.000    -.3919713   -.2763612
     _Iyear_2002 |  -.3272019   .0280088   -11.68   0.000    -.3821031   -.2723007
     _Iyear_2003 |  -.3627477   .0250352   -14.49   0.000    -.4118203   -.3136752
     _Iyear_2004 |  -.3262934   .0222754   -14.65   0.000    -.3699562   -.2826305
     _Iyear_2005 |  -.3073352   .0198753   -15.46   0.000    -.3462935   -.2683768
     _Iyear_2006 |  -.2940875   .0174447   -16.86   0.000    -.3282816   -.2598934
     _Iyear_2007 |  -.3162966   .0154593   -20.46   0.000     -.346599   -.2859942
     _Iyear_2008 |   -.337098   .0135715   -24.84   0.000    -.3637001   -.3104958
     _Iyear_2009 |  -.2683101    .012169   -22.05   0.000     -.292163   -.2444572
     _Iyear_2010 |  -.2578123   .0108087   -23.85   0.000    -.2789988   -.2366258
     _Iyear_2011 |  -.2565084   .0099581   -25.76   0.000    -.2760276   -.2369891
     _Iyear_2012 |   -.189837   .0086714   -21.89   0.000    -.2068342   -.1728397
     _Iyear_2013 |  -.1696128    .007937   -21.37   0.000    -.1851704   -.1540552
     _Iyear_2014 |  -.1530084   .0067724   -22.59   0.000    -.1662833   -.1397336
     _Iyear_2015 |          0  (omitted)
           _cons |  -.1052812   .3171258    -0.33   0.740     -.726893    .5163306
    -------------+----------------------------------------------------------------
         sigma_u |  2.8881156
         sigma_e |   .6123985
             rho |  .95697322   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
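In this specification, the effect of epidemic_d on lflow depends on healthgdp_d:

d(lflow)/d(epidemic_d) = b(epidemic_d) + b(epihgdp) × healthgdp_d

The small Stata sketch below plugs in the point estimates reported above. It runs without any data in memory. The grid of healthgdp_d values is an arbitrary illustration, and no standard errors are computed here.

```stata
* Point estimates copied from the FE output above
scalar b_epi = -20.10204    // coefficient on epidemic_d
scalar b_int =   3.198859   // coefficient on epihgdp

* d(lflow)/d(epidemic_d) = b_epi + b_int*healthgdp_d, at illustrative values
forvalues h = 2(2)10 {
    display "healthgdp_d = `h'   marginal effect = " %9.4f scalar(b_epi) + scalar(b_int)*`h'
}

* Level of healthgdp_d at which the marginal effect changes sign
display "turning point = " %6.3f -scalar(b_epi)/scalar(b_int)
```

With these estimates the sign change falls at roughly 6.28 percent of GDP. If the FE model is still in memory, `lincom epidemic_d + 6*epihgdp` gives the effect at healthgdp_d = 6 with a standard error. `nlcom -_b[epidemic_d]/_b[epihgdp]` gives a delta-method standard error for the turning point.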
    The problem, however, is with the PPML estimation.
    As a robustness check, I re-estimated the three specifications above with PPML, and most of the variables become insignificant. The code is as follows:
    Code:
    eststo: xi: ppmlhdfe flow epidemic_d lgdp_o lgdp_d lpop_o lpop_d lRP_od, a(year pairid) nolog
    eststo: xi: ppmlhdfe flow epidemic_d healthgdp_d lgdp_o lgdp_d lpop_o lpop_d lRP_od, a(year pairid) nolog
    eststo: xi: ppmlhdfe flow epidemic_d healthgdp_d epihgdp lgdp_o lgdp_d lpop_o lpop_d lRP_od, a(year pairid) nolog
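None of the PPML commands uses `i.` terms, so the `xi:` prefix has no effect there. Also, the FE output reports standard errors clustered by pairid, while the PPML output is labelled only "Robust". Clustering PPML by pair as well makes the two sets of standard errors more comparable. A sketch of the third specification along those lines, assuming `ppmlhdfe` is installed as in the commands above:

```stata
* Third PPML specification: interaction via factor variables,
* year and pair fixed effects, standard errors clustered by pair
eststo: ppmlhdfe flow c.epidemic_d##c.healthgdp_d lgdp_o lgdp_d lpop_o lpop_d lRP_od, ///
    absorb(year pairid) vce(cluster pairid) nolog
```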
    The result for the last specification (with epihgdp) is as follows:
    Code:
    HDFE PPML regression                              No. of obs      =    208,926
    Absorbing 2 HDFE groups                           Residual df     =    195,620
                                                      Wald chi2(8)    =     202.95
    Deviance             =  7.77752e+17               Prob > chi2     =     0.0000
    Log pseudolikelihood = -3.88876e+17               Pseudo R2       =     0.9982
    ------------------------------------------------------------------------------
                 |               Robust
            flow | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
      epidemic_d |  -194.7957    659.987    -0.30   0.768    -1488.347    1098.755
     healthgdp_d |  -.3241675   .0399636    -8.11   0.000    -.4024947   -.2458403
         epihgdp |   2.260448    108.162     0.02   0.983    -209.7333    214.2542
          lgdp_o |   .2913518   .1665717     1.75   0.080    -.0351228    .6178263
          lgdp_d |   .0382627   .1125117     0.34   0.734    -.1822561    .2587816
          lpop_o |   5.494733   .7255132     7.57   0.000     4.072754    6.916713
          lpop_d |  -1.594065   .7285146    -2.19   0.029    -3.021927   -.1662026
          lRP_od |   .3781714   .1781541     2.12   0.034     .0289959     .727347
           _cons |   35.84048   3.988092     8.99   0.000     28.02396    43.65699
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
            year |        16           0          16     |
          pairid |     13283           1       13282     |
    -----------------------------------------------------+
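The two outputs are based on different samples. FE uses 152,289 observations (those with non-missing lflow), whereas PPML uses 208,926 observations, including the zeros created by replacing missing flows. Re-estimating PPML on exactly the FE estimation sample helps separate the effect of the estimator from the effect of the added zeros. A sketch, assuming `ppmlhdfe` and estout (which provides `eststo`/`esttab`) are installed; fe_sample and the stored-estimate names fe, ppml_all and ppml_fe are hypothetical:

```stata
* fe_sample, fe, ppml_all and ppml_fe are hypothetical names
* FE on log flows; e(sample) marks the observations actually used
eststo fe: xtreg lflow c.epidemic_d##c.healthgdp_d lgdp_o lgdp_d lpop_o lpop_d lRP_od ///
    i.year, fe vce(cluster pairid)
generate byte fe_sample = e(sample)

* PPML on all observations, including the filled-in zeros
eststo ppml_all: ppmlhdfe flow c.epidemic_d##c.healthgdp_d lgdp_o lgdp_d lpop_o lpop_d lRP_od, ///
    absorb(year pairid) vce(cluster pairid) nolog

* PPML restricted to the FE estimation sample
eststo ppml_fe: ppmlhdfe flow c.epidemic_d##c.healthgdp_d lgdp_o lgdp_d lpop_o lpop_d lRP_od ///
    if fe_sample, absorb(year pairid) vce(cluster pairid) nolog

* Side-by-side table of the main coefficients (year dummies and constant dropped)
esttab fe ppml_all ppml_fe, se star(* 0.10 ** 0.05 *** 0.01) ///
    drop(*.year _cons) mtitles("FE" "PPML all" "PPML FE sample")
```

The regressors in logs have elasticity interpretations in both models, and the epidemic and healthcare terms have semi-elasticity interpretations. The columns can therefore be compared in magnitude as well as in significance.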

    My questions are:
    1. I know that significance levels alone cannot tell whether a regression is 'good' or 'bad'; rather, they convey information to the researcher. In my case, could this be caused by a mistake on my part, or is the regression telling me something? What might be the reason?

    2. I understand that PPML can handle zero observations, and so avoids the sample selection bias that arises when zeros are dropped from a log-linear model. The bilateral tourism data in this study contain a large number of missing values, which I replaced with 0 using the following command:
    Code:
    replace flow=0 if flow==.
    So, does the insignificance in PPML indicate sensitivity to the zero observations?

    3. If yes, what test or check should I run next to confirm this? Could you recommend any articles I could refer to?

    4. If no, what should I do next? I have already checked my data and it appears to be correct.
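Regarding questions 2 and 3: once missing flows are overwritten with 0, genuine zeros can no longer be told apart from filled-in ones. The sketch below is meant to be run in place of the plain `replace` above. It keeps an indicator (flow_imputed is a hypothetical name) and then re-estimates PPML without the filled-in zeros. This is one possible sensitivity check, not a formal test.

```stata
* flow_imputed is a hypothetical indicator; create it BEFORE overwriting flow
generate byte flow_imputed = missing(flow)
replace flow = 0 if flow_imputed

* How many zeros were reported, and how many were filled in?
tabulate flow_imputed if flow == 0

* PPML without the filled-in zeros; large changes relative to the full-sample
* PPML would suggest the results depend on treating missing flows as zero
ppmlhdfe flow c.epidemic_d##c.healthgdp_d lgdp_o lgdp_d lpop_o lpop_d lRP_od ///
    if !flow_imputed, absorb(year pairid) vce(cluster pairid) nolog
```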


    Thank you everyone for your input!

    Best regards,
    Jacyln Hu.
    Last edited by Jacyln Hu; 05 Apr 2022, 23:17.