  • Counterfactual Prediction Problem with PPMLHDFE Command

    I apologize if this has already been answered elsewhere; I could not find a thread on it. I have been stuck on this problem for more than two weeks. I am trying to do an impact assessment with the ppmlhdfe command, using a dummy variable such as covid (0 or 1), and I want to obtain predicted values. After ppmlhdfe, however, I cannot obtain counterfactual predictions: I only get the original fitted values for the covid = 1 scenario. Is there a solution to this problem?

    Code:
     
    ppmlhdfe trade covid, absorb( export-time import-time) d
    predict px
    replace covid = 0
    predict pcounterfactual

    Thanks.
    Last edited by Etzel Efrasiyab; 15 Apr 2022, 01:25.

  • #2
    Please read the FAQ and reformat your question in your next response. For us to help you, we need a better description of the problem: what you expect Stata to give you, what Stata is currently giving you, and the broader context. Also, ppmlhdfe is a user-written command, and the FAQ asks you to state that in your question.

    Welcome to Statalist, Etzel.



    • #3
      I'm sorry for the previous post. I have a panel dataset of trade flows for 150 countries over 20 years. I am trying to do an impact analysis with a difference-in-differences approach. With OLS, I can do this easily:

      Code:
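      * OLS fit; fitted values with the dummy as observed
      regress trade dist dummyvariable
      predict px

      * Switch the dummy off and predict again to get the counterfactual
      replace dummyvariable = 0
      predict counterfactual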

      I want the fitted values Yhat = B0 + B1*dist + B2*dummyvariable with dummyvariable = 1, which represent the treated scenario, and then, after replacing dummyvariable with 0, the counterfactual Yhat. Unfortunately, I have to use ppmlhdfe because of the fixed effects.

      I want the same two sets of predictions, B0 + B1*dist + B2*dummyvariable for dummyvariable = 1 and for dummyvariable = 0, but after ppmlhdfe with the d option, predict gives me the same values for both. Is there a solution to this problem?

      I'm sorry to bother you again.

      Thanks.





      • #4
        I still need to see your example data using the dataex command, as well as the syntax you used for ppmlhdfe.


        Either way, here's some syntax I've borrowed from Bernal 2021
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input int year byte month int aces byte(time smokban) float(pop stdpop)
        2002  1  728  1 0 364277.4 379875.3
        2002  2  659  2 0 364277.4 376495.5
        2002  3  791  3 0 364277.4 377040.8
        2002  4  734  4 0 364277.4 377116.4
        2002  5  757  5 0 364277.4 377383.4
        2002  6  726  6 0 364277.4 374113.1
        2002  7  760  7 0 364277.4 379513.3
        2002  8  740  8 0 364277.4 376295.5
        2002  9  720  9 0 364277.4 374653.2
        2002 10  814 10 0 364277.4 378485.6
        2002 11  795 11 0 364277.4 375955.5
        2002 12  858 12 0 364277.4 378349.7
        2003  1  887 13 0 363350.8 376762.4
        2003  2  766 14 0 363350.8 379032.3
        2003  3  851 15 0 363350.8 379360.4
        2003  4  769 16 0 363350.8   376162
        2003  5  781 17 0 363350.8 377972.4
        2003  6  756 18 0 363350.8 381830.7
        2003  7  766 19 0 363350.8 379888.6
        2003  8  752 20 0 363350.8 380872.2
        2003  9  765 21 0 363350.8 380966.9
        2003 10  831 22 0 363350.8 381240.4
        2003 11  879 23 0 363350.8 382104.9
        2003 12  928 24 0 363350.8 381802.7
        2004  1  914 25 0 364700.4 381656.3
        2004  2  808 26 0 364700.4   383680
        2004  3  937 27 0 364700.4 383504.2
        2004  4  840 28 0 364700.4 386462.9
        2004  5  916 29 0 364700.4 383783.1
        2004  6  828 30 0 364700.4 380836.8
        2004  7  845 31 0 364700.4   383483
        2004  8  818 32 0 364700.4 380906.2
        2004  9  860 33 0 364700.4 382926.8
        2004 10  839 34 0 364700.4 384052.4
        2004 11  887 35 0 364700.4 384449.6
        2004 12  886 36 0 364700.4 383428.4
        2005  1  831 37 1 364420.8 388153.2
        2005  2  796 38 1 364420.8 388373.2
        2005  3  833 39 1 364420.8 386470.1
        2005  4  820 40 1 364420.8 386033.2
        2005  5  877 41 1 364420.8 383686.4
        2005  6  758 42 1 364420.8 385509.3
        2005  7  767 43 1 364420.8 385901.9
        2005  8  738 44 1 364420.8 386516.6
        2005  9  781 45 1 364420.8 388436.5
        2005 10  843 46 1 364420.8 383255.2
        2005 11  850 47 1 364420.8 390148.7
        2005 12  908 48 1 364420.8 385874.9
        2006  1 1021 49 1 363832.6 391613.6
        2006  2  859 50 1 363832.6 391750.4
        2006  3  976 51 1 363832.6 394005.6
        2006  4  888 52 1 363832.6 391364.9
        2006  5  962 53 1 363832.6 391664.6
        2006  6  838 54 1 363832.6 389022.3
        2006  7  810 55 1 363832.6 391878.5
        2006  8  876 56 1 363832.6 388575.3
        2006  9  843 57 1 363832.6   392989
        2006 10  936 58 1 363832.6 390018.8
        2006 11  912 59 1 363832.6 390712.3
        end
        cls
        gen rate = aces/stdpop*10^5
        
        
        *log transform the standardised population:
        gen logstdpop = log(stdpop)
        
        
        *Poisson with the outcome (aces), intervention (smokban) and time, as well as the population offset
        glm aces smokban time, family(poisson) link(log) offset(logstdpop) eform
        
        
        *We generate predicted values based on the model in order to create a plot of the model:
        predict pred, nooffset
        
        *This can then be plotted along with a scatter graph:
        gen rate1 = aces/stdpop /*to put rate in same scale as count in model */
        twoway (scatter rate1 time) (line pred time, lcolor(red)) , title("Sicily, 2002-2006") ///
        ytitle(Std rate x 10000) yscale(range(0 .)) ylabel(#5, labsize(small) angle(horizontal)) ///
        xtick(0.5(12)60.5) xlabel(6"2002" 18"2003" 30"2004" 42"2005" 54"2006", noticks labsize(small)) xtitle(year) ///
        xline(36.5)
        
        
        *Generate the counterfactual by removing the effect of the intervention (_b[smokban]) for the post-intervention period
        gen pred1 = pred/exp(_b[smokban]) if smokban==1
        
        
        *Add the counterfactual to the plot
        twoway (scatter rate1 time) (line pred time, lcolor(red)) (line pred1 time, lcolor(red) lpattern(dash)), title("Sicily, 2002-2006") ///
        ytitle(Std rate x 10000) yscale(range(0 .)) ylabel(#5, labsize(small) angle(horizontal)) ///
        xtick(0.5(12)60.5) xlabel(6"2002" 18"2003" 30"2004" 42"2005" 54"2006", noticks labsize(small)) xtitle(year) ///
        xline(36.5)
        So notice what they've done here. They begin by estimating their model (in this case an interrupted time series model, but that doesn't matter; the same logic applies to any model with a log link, such as the Poisson model that ppmlhdfe fits).

        They then generate their prediction line (the fitted values from the Poisson model, which show how well it predicts the observed data), and then, in the all-important step, they compute the counterfactual with
        Code:
        gen pred1 = pred/exp(_b[smokban]) if smokban==1
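
        The same idea should carry over to ppmlhdfe, because it also uses a log link: the fitted mean is exp(fixed effects + b*dummy + ...), so dividing by exp(b) where the dummy equals 1 removes only the dummy's contribution and leaves the absorbed fixed effects untouched. Below is a minimal sketch, not a tested solution to the original trade model. It assumes the smoking-ban data above are still in memory and that ppmlhdfe is installed from SSC. Absorbing month and dropping the population offset are illustrative simplifications of mine, and mu_hat and mu_cf are hypothetical variable names.

        ```stata
        * Assumption: smoking-ban data from the example above are in memory,
        * and ppmlhdfe (plus its dependencies reghdfe and ftools) is installed.
        * absorb(month) is an illustrative choice; the offset is omitted for brevity.
        ppmlhdfe aces smokban time, absorb(month) d

        * Fitted mean with the intervention as observed
        * (the predicted mean is the default statistic)
        predict double mu_hat

        * Counterfactual mean: strip out exp(_b[smokban]) where smokban == 1;
        * where smokban == 0 the multiplier is exp(0) = 1, so mu_cf equals mu_hat
        gen double mu_cf = mu_hat * exp(-_b[smokban] * smokban)

        * Quick check: the two series should differ only in the post-intervention period
        summarize mu_hat mu_cf if smokban == 1
        summarize mu_hat mu_cf if smokban == 0
        ```

        One thing to check in the trade application: if the dummy varies only over time, it would be collinear with exporter-time and importer-time fixed effects and dropped from the model, in which case there is no coefficient to remove and both predictions would coincide.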
