Counterfactual Prediction Problem with PPMLHDFE Command

Etzel Efrasiyab

Join Date: Apr 2022

Posts: 2
#1

15 Apr 2022, 01:22

I'm sorry if this question has already been answered in another topic; I could not find one. I have been stuck on this problem for more than two weeks. I am trying to run an impact assessment with the ppmlhdfe command, using a dummy variable such as covid (0 or 1), and to obtain predicted values. After estimating with ppmlhdfe, however, I cannot obtain counterfactual predictions: after setting covid to 0, I still get the same predictions as in the covid = 1 scenario. Is there a solution to this problem?

Code:

ppmlhdfe trade covid, absorb(export-time import-time) d
predict px
replace covid = 0
predict pcounterfactual

Thanks.

Last edited by Etzel Efrasiyab; 15 Apr 2022, 01:25.
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

15 Apr 2022, 08:11

Please read the FAQ and reformat your question in your next response. For us to help you, we'll need a better description of what the problem is, what you're expecting Stata to give you, what the broader context is, and what Stata is currently giving you. Also, ppmlhdfe is a user-written command, and the FAQ asks you to say so in your question.

Welcome to Statalist, Etzel.
Etzel Efrasiyab

Join Date: Apr 2022

Posts: 2
#3

15 Apr 2022, 09:40

I'm sorry for the previous post. I have a panel dataset of trade flows for 150 countries over 20 years. I am trying to do an impact analysis with the difference-in-differences methodology. With OLS, this is easy:

Code:

regress trade dist dummyvariable
predict px
replace dummyvariable = 0
predict counterfactual

I want the fitted values Yhat = B0 + B1*Dist + B2*dummyvariable with dummyvariable = 1, which represent the treatment scenario. Then, after replacing dummyvariable with 0, I predict Yhat again to get the counterfactual scenario. Unfortunately, I have to use the ppmlhdfe estimation method because of the fixed effects.

I want the predicted values of B0 + B1*Dist + B2*dummyvariable for dummyvariable = 1 and for dummyvariable = 0, but with the d option ppmlhdfe gives me the same predictions in both cases. Is there a solution to this problem?

I'm sorry to bother you again.

Thanks.

Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#4

15 Apr 2022, 10:10

I still need to see your example data using the dataex command, as well as the syntax you used for ppmlhdfe.

Either way, here's some syntax I've borrowed from Bernal 2021

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int year byte month int aces byte(time smokban) float(pop stdpop)
2002  1  728  1 0 364277.4 379875.3
2002  2  659  2 0 364277.4 376495.5
2002  3  791  3 0 364277.4 377040.8
2002  4  734  4 0 364277.4 377116.4
2002  5  757  5 0 364277.4 377383.4
2002  6  726  6 0 364277.4 374113.1
2002  7  760  7 0 364277.4 379513.3
2002  8  740  8 0 364277.4 376295.5
2002  9  720  9 0 364277.4 374653.2
2002 10  814 10 0 364277.4 378485.6
2002 11  795 11 0 364277.4 375955.5
2002 12  858 12 0 364277.4 378349.7
2003  1  887 13 0 363350.8 376762.4
2003  2  766 14 0 363350.8 379032.3
2003  3  851 15 0 363350.8 379360.4
2003  4  769 16 0 363350.8   376162
2003  5  781 17 0 363350.8 377972.4
2003  6  756 18 0 363350.8 381830.7
2003  7  766 19 0 363350.8 379888.6
2003  8  752 20 0 363350.8 380872.2
2003  9  765 21 0 363350.8 380966.9
2003 10  831 22 0 363350.8 381240.4
2003 11  879 23 0 363350.8 382104.9
2003 12  928 24 0 363350.8 381802.7
2004  1  914 25 0 364700.4 381656.3
2004  2  808 26 0 364700.4   383680
2004  3  937 27 0 364700.4 383504.2
2004  4  840 28 0 364700.4 386462.9
2004  5  916 29 0 364700.4 383783.1
2004  6  828 30 0 364700.4 380836.8
2004  7  845 31 0 364700.4   383483
2004  8  818 32 0 364700.4 380906.2
2004  9  860 33 0 364700.4 382926.8
2004 10  839 34 0 364700.4 384052.4
2004 11  887 35 0 364700.4 384449.6
2004 12  886 36 0 364700.4 383428.4
2005  1  831 37 1 364420.8 388153.2
2005  2  796 38 1 364420.8 388373.2
2005  3  833 39 1 364420.8 386470.1
2005  4  820 40 1 364420.8 386033.2
2005  5  877 41 1 364420.8 383686.4
2005  6  758 42 1 364420.8 385509.3
2005  7  767 43 1 364420.8 385901.9
2005  8  738 44 1 364420.8 386516.6
2005  9  781 45 1 364420.8 388436.5
2005 10  843 46 1 364420.8 383255.2
2005 11  850 47 1 364420.8 390148.7
2005 12  908 48 1 364420.8 385874.9
2006  1 1021 49 1 363832.6 391613.6
2006  2  859 50 1 363832.6 391750.4
2006  3  976 51 1 363832.6 394005.6
2006  4  888 52 1 363832.6 391364.9
2006  5  962 53 1 363832.6 391664.6
2006  6  838 54 1 363832.6 389022.3
2006  7  810 55 1 363832.6 391878.5
2006  8  876 56 1 363832.6 388575.3
2006  9  843 57 1 363832.6   392989
2006 10  936 58 1 363832.6 390018.8
2006 11  912 59 1 363832.6 390712.3
end
cls
gen rate = aces/stdpop*10^5


*log transform the standardised population:
gen logstdpop = log(stdpop)


*Poisson with the outcome (aces), intervention (smokban) and time, as well as the population offset
glm aces smokban time, family(poisson) link(log) offset(logstdpop) eform


*We generate predicted values based on the model in order to create a plot of the model:
predict pred, nooffset

*This can then be plotted along with a scatter graph:
gen rate1 = aces/stdpop /*to put rate in same scale as count in model */
twoway (scatter rate1 time) (line pred time, lcolor(red)) , title("Sicily, 2002-2006") ///
ytitle(Std rate x 10000) yscale(range(0 .)) ylabel(#5, labsize(small) angle(horizontal)) ///
xtick(0.5(12)60.5) xlabel(6"2002" 18"2003" 30"2004" 42"2005" 54"2006", noticks labsize(small)) xtitle(year) ///
xline(36.5)


*Generate the counterfactual by removing the effect of the intervention (_b[smokban]) for the post-intervention period
gen pred1 = pred/exp(_b[smokban]) if smokban==1


*Add the counterfactual to the plot
twoway (scatter rate1 time) (line pred time, lcolor(red)) (line pred1 time, lcolor(red) lpattern(dash)), title("Sicily, 2002-2006") ///
ytitle(Std rate x 10000) yscale(range(0 .)) ylabel(#5, labsize(small) angle(horizontal)) ///
xtick(0.5(12)60.5) xlabel(6"2002" 18"2003" 30"2004" 42"2005" 54"2006", noticks labsize(small)) xtitle(year) ///
xline(36.5)

So notice what they've done here. They begin by estimating their model (in this case an interrupted time series model, but that doesn't matter; the same logic applies to pretty much any regression model with a log link, such as Poisson).

They then generate their prediction line (how well the Poisson estimator predicts the observed data), and then, in the all-important step, they compute the counterfactual with

Code:

gen pred1 = pred/exp(_b[smokban]) if smokban==1
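
This works because the model has a log link: the fitted mean is exp(b0 + b1*time + b2*smokban + offset), so dividing the prediction by exp(_b[smokban]) where smokban==1 removes the intervention effect and leaves everything else untouched. Applied to your ppmlhdfe model, a minimal sketch might look like the following. I'm assuming that exp_time and imp_time are your exporter-year and importer-year identifiers (hypothetical names, not from your post) and that predict returns the fitted mean after ppmlhdfe with the d option, so treat it as a starting point rather than a tested solution.

```stata
* Sketch only: exp_time and imp_time are hypothetical names for the
* exporter-year and importer-year group identifiers
ppmlhdfe trade covid, absorb(exp_time imp_time) d

* Fitted values under the observed data (assumed to be the predicted mean)
predict double px

* Counterfactual: remove the multiplicative covid effect where covid == 1,
* and keep the ordinary fitted value everywhere else
gen double pcounterfactual = px
replace pcounterfactual = px/exp(_b[covid]) if covid == 1
```

Before relying on this, check in the coefficient table that covid actually has an estimated coefficient. If covid is collinear with the absorbed fixed effects, its effect cannot be separated from them, and I would expect the two predictions to coincide.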
