Hi guys,
I am running a regression with the ppmlhdfe command. My regression command is:
ppmlhdfe p tre_1 tre_2 tre_3 tre_4, noconstant absorb(j#hs_code_6 j#yrm_doc hs_code_6#yrm_doc) sep(none) itol(1e-5) tol(1e-5) cluster(case_id hts_code_8)
where
p is the import price,
tre_1 is a dummy equal to 1 when a duty is imposed on a product for a given country,
tre_2 is a dummy equal to 1 when the duty on a product for a given country is no longer in place,
tre_3 is a dummy equal to 1 for the same product as in tre_1 but for countries other than the one in tre_1,
tre_4 is a dummy equal to 1 for the same product as in tre_2 but for countries other than the one in tre_2.
The fixed effects I absorb are exporting country # 6-digit HS code (j#hs_code_6), exporting country # time (j#yrm_doc), and 6-digit HS code # time (hs_code_6#yrm_doc).
The clusters are the product-country pair variable (case_id) and the 8-digit HS code (hts_code_8). A simplified sketch of how dummies like these could be built is shown below.
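To make the definitions concrete, here is a minimal sketch of how dummies like tre_1 and tre_3 could be generated; duty_on is a hypothetical 0/1 indicator that a duty applies to the country-product pair in that month, and my actual construction (which also covers the removal cases tre_2 and tre_4) is longer:

* duty_on (hypothetical): =1 if a duty applies to this country-product pair (j, hs_code_6) in month yrm_doc
gen tre_1 = duty_on
* =1 if any country faces a duty on the same 6-digit product in the same month
egen any_duty_product = max(duty_on), by(hs_code_6 yrm_doc)
* tre_3: the product is under duty for some other country, but not for this one
gen tre_3 = any_duty_product * (1 - duty_on)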
But when I run the command, I notice several things that are strange to me. Below is the regression output I get:
Iteration 1: deviance = 3.1562e+13 eps = . iters = 14 tol = 1.0e-04 min(eta) = -7.58 PS
Iteration 2: deviance = 1.8249e+13 eps = 7.30e-01 iters = 10 tol = 1.0e-04 min(eta) = -9.54 S
Iteration 3: deviance = 1.5405e+13 eps = 1.85e-01 iters = 8 tol = 1.0e-04 min(eta) = -11.84 S
Iteration 4: deviance = 1.5017e+13 eps = 2.58e-02 iters = 7 tol = 1.0e-04 min(eta) = -14.27 S
Iteration 5: deviance = 1.4985e+13 eps = 2.09e-03 iters = 6 tol = 1.0e-04 min(eta) = -16.36 S
Iteration 6: deviance = 1.4982e+13 eps = 2.14e-04 iters = 5 tol = 1.0e-04 min(eta) = -18.83 S
Iteration 7: deviance = 1.4982e+13 eps = 4.37e-05 iters = 4 tol = 1.0e-04 min(eta) = -21.82 S
Iteration 8: deviance = 1.4981e+13 eps = 1.08e-05 iters = 17 tol = 1.0e-05 min(eta) = -24.81 S
Iteration 9: deviance = 1.4981e+13 eps = 2.88e-06 iters = 32 tol = 1.0e-06 min(eta) = -27.81 S O
----------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out s: exact solver h: step-halving o: epsilon below tolerance)
Converged in 9 iterations and 103 HDFE sub-iterations (tol = 1.0e-05)
Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
HDFE PPML regression No. of obs = 12704328
Absorbing 3 HDFE groups Residual df = 582
Statistics robust to heteroskedasticity Wald chi2(4) = 74.62
Deviance = 1.49813e+13 Prob > chi2 = 0.0000
Log pseudolikelihood = -7.49069e+12 Pseudo R2 = 0.6644
Number of clusters (case_id)= 583
Number of clusters (hs_code_8)= 2,005
(Std. Err. adjusted for 583 clusters in case_id hs_code_8)
------------------------------------------------------------------------------
| Robust
v | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
tre_1 | .5002759 .1419406 3.52 0.000 .2220774 .7784745
tre_2 | .4012424 .1424712 2.82 0.005 .1220039 .6804808
tre_3 | .5654237 .1174388 4.81 0.000 .3352479 .7955995
tre_4 | .4747264 .1384631 3.43 0.001 .2033437 .7461091
_cons | 15.22352 .0520799 292.31 0.000 15.12145 15.3256
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-------------------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
---------------------+---------------------------------------|
j#hs_code_6 | 29361 0 29361 |
j#yrm_doc | 39480 310 39170 |
hs_code_6#yrm_doc | 38994 674 38320 ?|
-------------------------------------------------------------+
? = number of redundant parameters may be higher
The first question is: the min(eta) value turns red after Iteration 3. Is something wrong with my regression command or my data?
The second question is about the absorbed degrees of freedom: why is there a question mark "?" after the third fixed effect, and what does "number of redundant parameters may be higher" mean here?
THEN I tried different clusters. Since reviews of whether to keep imposing the duty on a given country's product take place every few years, reviews for a specific country-product pair might affect each other. So instead I clustered on the review sequence number (no_review=="1" for the 1st review, no_review=="2" for the 2nd review, and so on) together with the exporting country variable (j). The regression became:
ppmlhdfe p tre_1 tre_2 tre_3 tre_4, noconstant absorb(j#hs_code_6 j#yrm_doc hs_code_6#yrm_doc) sep(none) itol(1e-5) tol(1e-5) cluster(j no_review)
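For reference, here is a minimal sketch of how a review-sequence variable like no_review could be constructed; review_event is a hypothetical 0/1 indicator that a review of the country-product pair is initiated in that month, and the actual cleaning steps are longer:

* running count of review events within each country-product pair, ordered by month
bysort case_id (yrm_doc): gen no_review = sum(review_event)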
BUT the regression table presents other issues:
Iteration 1: deviance = 3.1562e+13 eps = . iters = 14 tol = 1.0e-04 min(eta) = -7.58 PS
Iteration 2: deviance = 1.8249e+13 eps = 7.30e-01 iters = 10 tol = 1.0e-04 min(eta) = -9.54 S
Iteration 3: deviance = 1.5405e+13 eps = 1.85e-01 iters = 8 tol = 1.0e-04 min(eta) = -11.84 S
Iteration 4: deviance = 1.5017e+13 eps = 2.58e-02 iters = 7 tol = 1.0e-04 min(eta) = -14.27 S
Iteration 5: deviance = 1.4985e+13 eps = 2.09e-03 iters = 6 tol = 1.0e-04 min(eta) = -16.36 S
Iteration 6: deviance = 1.4982e+13 eps = 2.14e-04 iters = 5 tol = 1.0e-04 min(eta) = -18.83 S
Iteration 7: deviance = 1.4982e+13 eps = 4.37e-05 iters = 4 tol = 1.0e-04 min(eta) = -21.82 S
Iteration 8: deviance = 1.4981e+13 eps = 1.08e-05 iters = 17 tol = 1.0e-05 min(eta) = -24.81 S
Iteration 9: deviance = 1.4981e+13 eps = 2.88e-06 iters = 32 tol = 1.0e-06 min(eta) = -27.81 S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out s: exact solver h: step-halving o: epsilon below tolerance)
Converged in 9 iterations and 103 HDFE sub-iterations (tol = 1.0e-05)
Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
warning: missing F statistic; dropped variables due to collinearity or too few clusters
HDFE PPML regression No. of obs = 12704328
Absorbing 3 HDFE groups Residual df = 3
Statistics robust to heteroskedasticity Wald chi2(4) = .
Deviance = 1.49813e+13 Prob > chi2 = .
Log pseudolikelihood = -7.49069e+12 Pseudo R2 = 0.6644
Number of clusters (j) = 223
Number of clusters (no_review)= 4
(Std. Err. adjusted for 4 clusters in j no_review)
------------------------------------------------------------------------------
| Robust
v | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
tre_1 | .5002759 .0716396 6.98 0.000 .3598648 .640687
tre_2 | .4012424 .1218577 3.29 0.001 .1624057 .640079
tre_3 | .5654237 .0949151 5.96 0.000 .3793936 .7514538
tre_4 | .4747264 .1139863 4.16 0.000 .2513173 .6981354
_cons | 15.22352 .037072 410.65 0.000 15.15087 15.29618
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-------------------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
---------------------+---------------------------------------|
j#hs_code_6 | 29361 29361 0 *|
j#yrm_doc | 39480 39480 0 *|
hs_code_6#yrm_doc | 38994 0 38994 |
-------------------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
The first question is the same as for the previous table: the min(eta) value turns red after Iteration 5. Is something wrong with my regression command or my data? (I attached a screenshot of the red part below, just in case.)
The second question: the Wald chi2(4) and Prob > chi2 values are missing from the table, and I am not sure what is going on with them.
Also, a third question about my fixed effects: I notice that hs_code_6#yrm_doc has no "*" after it, unlike the other two fixed effects. What does the "*" actually mean? Does it indicate that I should not include the hs_code_6#yrm_doc fixed effect in the model?
Thank you so much!