  • How does PPML exclude zero-valued observations?

    Dear Statalist,

    I am using PPML to build a panel data model with a zero-inflated dependent variable, and I am comparing it to other models. The following table shows the results for one of the regressors.
    Code:
                         OLS        Two-Ways FE   RE          PPML
    ------------------------------------------------------------------
    Accum. Var[t-1]      2.146***   2.257***      2.201***    0.000
                         (0.38)     (0.27)        (0.51)      (0.00)
    ------------------------------------------------------------------
    N                    24778      24776         24778       6890
    ------------------------------------------------------------------
    * p<0.05, ** p<0.01, *** p<0.001
    This is a one-period-lagged accumulated variable, so the chronological ordering matters. We can observe that all models estimate a coefficient of the expected sign, except PPML. Comparing the N row, we can also see that PPML has excluded many observations because of the zero-inflated DV. What I would like to know is: under what conditions does PPML exclude observations? For example, can PPML exclude zeros in the middle of an individual's time series, breaking the chronological meaning of the variable? If so, is there a way in Stata to specify that only entire individuals be excluded when necessary? I think this is the cause of the zero coefficient in the PPML results.
    For information, I am using the
    Code:
    ppmlhdfe
    command.
    Maybe Prof. Joao Santos Silva can give me a hint.

    Thanks in advance.
    Last edited by Bruno Moreno; 06 Nov 2019, 11:13.

  • #2
    Hi Bruno:

    With ppmlhdfe, you can set "sep(none)" as an option if you do not want to exclude perfectly predicted zeroes. However, econometrically speaking, this should not affect the estimation of your RHS parameters either way.

    What I'm curious about is why there would be so many fewer observations using PPML versus a linear model. Typically, when comparing a linear model with a PPML model, the LHS variable of the linear model would be the log of the PPML LHS variable. So one would expect to see more observations included in the PPML model than in the OLS model, not fewer, especially since you mention your dependent variable has many zeroes. I'm also concerned about the functional form of your dynamic model in the PPML case. Is Accum. Var[t-1] logged? Perhaps if you were to include some of the code you used, that might help?
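    To make the dropping behavior concrete, here is a minimal sketch (in Python rather than Stata, with hypothetical names) of one mechanism by which a fixed-effects Poisson estimator discards observations: a group whose outcome is zero in every period has a fixed-effect estimate of minus infinity, so its zeros are perfectly predicted and the whole group is dropped. Note that ppmlhdfe's actual separation detection is more general than this all-zero-group check.

```python
from collections import defaultdict

def drop_separated(panel):
    """Drop observations belonging to groups whose outcome is always zero.

    In a fixed-effects Poisson model, a group (e.g. an individual) whose
    dependent variable is zero in every period is perfectly predicted by
    its own fixed effect, so those observations carry no information about
    the slope parameters and are removed before estimation.

    panel: list of (group_id, y) tuples.
    Returns (kept observations, number dropped).
    """
    totals = defaultdict(int)
    for gid, y in panel:
        totals[gid] += y
    kept = [(gid, y) for gid, y in panel if totals[gid] > 0]
    return kept, len(panel) - len(kept)

# Toy panel: individual "a" is all zeros and is dropped as a whole unit;
# individual "b" keeps its zero because it has positive values elsewhere.
panel = [("a", 0), ("a", 0), ("a", 0), ("b", 0), ("b", 3), ("b", 1)]
kept, n_dropped = drop_separated(panel)
# n_dropped is 3 (all of "a"); the zero in the middle of "b" is retained
```

    Under this mechanism, entire individuals are excluded rather than isolated zeros mid-series, which speaks to the chronological-ordering concern above.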

    Regards,
    Tom

    • #3
      Hi Tom Zylkin,

      Thank you for your answer.

      No, Accum. Var[t-1] is not logged because it has zero values. In fact, this variable is my accumulated dependent variable lagged one period. It is a sort of imitation variable from the technology adoption field: it means that adopters in the present imitate adopters from the past.

      The code that I used was the following. Nonetheless, I cannot show the names of the variables because they are confidential.

      Code:
      * -------------Running models----------------
      * Pooled OLS
      reg ylist xlist acc_varlag1, vce(robust)
      estimates store m1, title(OLS)
      * Two-ways fixed effects
      reghdfe ylist xlist acc_varlag1, absorb(year code) vce(robust)
      estimates store m2, title(Two-Ways FE)
      * Random effects
      xtreg ylist xlist acc_varlag1, re vce(robust)
      estimates store m3, title(RE)
      * PPML
      ppmlhdfe ylist xlist acc_varlag1, absorb(year code) vce(robust)
      estimates store m4, title(PPML)
      You are right: if I set "sep(none)", it keeps all the observations, and it did not affect the parameters:

      Code:
      ppmlhdfe ylist xlist acc_varlag1, absorb(year code) vce(robust) sep(none)
      (dropped 2 singleton observations)
      note: 2 variables omitted because of collinearity: x1 x2
      Iteration 1:   deviance = 7.178e+04                  itol = 1.0e-04  subiters = 5   min(eta) =  -9.81  [p  ]
      Iteration 2:   deviance = 3.772e+04  eps = 9.03e-01  itol = 1.0e-04  subiters = 3   min(eta) = -12.00  [   ]
      Iteration 3:   deviance = 2.683e+04  eps = 4.06e-01  itol = 1.0e-04  subiters = 3   min(eta) = -12.14  [   ]
      Iteration 4:   deviance = 2.329e+04  eps = 1.52e-01  itol = 1.0e-04  subiters = 3   min(eta) = -10.63  [   ]
      Iteration 5:   deviance = 2.227e+04  eps = 4.61e-02  itol = 1.0e-04  subiters = 3   min(eta) = -12.28  [   ]
      Iteration 6:   deviance = 2.198e+04  eps = 1.31e-02  itol = 1.0e-04  subiters = 3   min(eta) = -13.61  [p  ]
      Iteration 7:   deviance = 2.188e+04  eps = 4.32e-03  itol = 1.0e-04  subiters = 2   min(eta) = -14.67  [   ]
      Iteration 8:   deviance = 2.185e+04  eps = 1.58e-03  itol = 1.0e-04  subiters = 2   min(eta) = -15.68  [   ]
      Iteration 9:   deviance = 2.184e+04  eps = 5.81e-04  itol = 1.0e-04  subiters = 2   min(eta) = -16.68  [   ]
      Iteration 10:  deviance = 2.183e+04  eps = 2.14e-04  itol = 1.0e-04  subiters = 2   min(eta) = -17.68  [   ]
      Iteration 11:  deviance = 2.183e+04  eps = 7.86e-05  itol = 1.0e-04  subiters = 3   min(eta) = -18.68  [p  ]
      Iteration 12:  deviance = 2.183e+04  eps = 2.89e-05  itol = 1.0e-06  subiters = 2   min(eta) = -19.68  [   ]
      Iteration 13:  deviance = 2.183e+04  eps = 1.06e-05  itol = 1.0e-06  subiters = 2   min(eta) = -20.68  [   ]
      Iteration 14:  deviance = 2.183e+04  eps = 3.91e-06  itol = 1.0e-06  subiters = 2   min(eta) = -21.68  [   ]
      Iteration 15:  deviance = 2.183e+04  eps = 1.44e-06  itol = 1.0e-06  subiters = 2   min(eta) = -22.68  [   ]
      Iteration 16:  deviance = 2.183e+04  eps = 5.30e-07  itol = 1.0e-06  subiters = 4   min(eta) = -23.68  [p  ]
      Iteration 17:  deviance = 2.183e+04  eps = 1.95e-07  itol = 1.0e-08  subiters = 2   min(eta) = -24.68  [ s ]
      Iteration 18:  deviance = 2.183e+04  eps = 7.17e-08  itol = 1.0e-08  subiters = 2   min(eta) = -25.68  [ s ]
      Iteration 19:  deviance = 2.183e+04  eps = 2.64e-08  itol = 1.0e-08  subiters = 4   min(eta) = -26.68  [ps ]
      Iteration 20:  deviance = 2.183e+04  eps = 9.70e-09  itol = 1.0e-08  subiters = 4   min(eta) = -27.68  [ps ]
      Iteration 21:  deviance = 2.183e+04  eps = 3.57e-09  itol = 1.0e-10  subiters = 5   min(eta) = -28.68  [pso]
      Iteration 22:  deviance = 2.183e+04  eps = 1.31e-09  itol = 1.0e-10  subiters = 5   min(eta) = -29.68  [pso]
      ------------------------------------------------------------------------------------------------------------
      (legend: p: exact partial-out   s: exact solver   o: epsilon below tolerance)
      Converged in 22 iterations and 65 HDFE sub-iterations (tol = 1.0e-08)
      
      HDFE PPML regression                              No. of obs      =     24,776
      Absorbing 2 HDFE groups                           Residual df     =     19,808
                                                        Wald chi2(6)    =      19.34
      Deviance             =  21828.94734               Prob > chi2     =     0.0036
      Log pseudolikelihood = -15610.50232               Pseudo R2       =     0.9311
      --------------------------------------------------------------------------------------
                           |               Robust
                  capacity |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ---------------------+----------------------------------------------------------------
          acc_varlag1 |     .00014   .0004254     0.33   0.742    -.0006937    .0009738
      --------------------------------------------------------------------------------------
      
      Absorbed degrees of freedom:
      -----------------------------------------------------+
       Absorbed FE | Categories  - Redundant  = Num. Coefs |
      -------------+---------------------------------------|
              year |         5           0           5     |
              code |      4958           1        4957     |
      -----------------------------------------------------+
      This is strange, because I would expect a significant positive coefficient, as the linear models estimated. I do not understand why PPML fails to estimate this.
      Last edited by Bruno Moreno; 06 Nov 2019, 11:59.

      • #4
        Hi Bruno,
        Normally one would not use PPML and OLS with the same dependent variable, since PPML assumes the linear index enters the model exponentially whereas OLS assumes it enters linearly. Thus the interpretations of the coefficients in your PPML and OLS regressions are very different, and it is not surprising to me that you are getting very different results.
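        The difference in interpretation can be sketched numerically (the coefficient values below are hypothetical, not taken from the regressions in this thread): under OLS a coefficient shifts the conditional mean additively, while under PPML the same regressor scales the conditional mean multiplicatively through exp(xb), so a numerically small PPML coefficient can still imply a meaningful percentage effect.

```python
import math

def ols_effect(beta, dx):
    # OLS: E[y|x] = a + beta*x, so a change dx in the regressor
    # ADDS beta*dx units to the predicted mean.
    return beta * dx

def ppml_ratio(beta, dx):
    # PPML: E[y|x] = exp(a + beta*x), so the same change dx
    # MULTIPLIES the predicted mean by exp(beta*dx).
    return math.exp(beta * dx)

# Hypothetical values: an OLS coefficient of 2.1 adds 2.1 units per
# unit of x, while a PPML coefficient of 0.05 multiplies the mean by
# exp(0.05), i.e. roughly a 5% increase per unit of x.
add = ols_effect(2.1, 1)      # 2.1
ratio = ppml_ratio(0.05, 1)   # ~1.0513
```

        Because the two coefficients live on different scales (levels versus log of the conditional mean), comparing their magnitudes side by side, as in the table above, is not directly meaningful.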
        Regards,
        Tom
