  • Accounting for the skewed distribution of the dependent variable in system GMM - prediction

    Hello,

    I am estimating a model using country-level panel data with System GMM. My dataset has 96 countries covering the period 2008-2022 with data available every two years (T=8). The panel is ‘strongly balanced’.

    My dependent variable is adult per capita cigarette consumption (proxied by the total value of legal cigarette retail sales on an annual basis divided by the number of adults aged 15 and older in the population in a given year). All values are positive, there are no zeros, and the distribution of this variable, which I have called pccons, is skewed (graph directly below).




    The convention in my field is to log per capita consumption, and since the coefficients I am interested in are also typically interpreted as elasticities, the relationship between my dependent variable and the variable I am interested in is log-log. I show the logged distribution of my dependent variable (lnpccons) and the logged distribution of the main independent variable I am interested in, lnTTI, below. lnTTI is the log of the total cigarette tax incidence (share of all taxes in the price of a 20-pack of the most sold cigarette brand).

    Code:
     hist lnpccons
    (bin=27, start=3.3852437, width=.18245643)
    [Figure: histogram of lnpccons (lnpccons graph.png)]


    Code:
    hist lnTTI
    [Figure: histogram of lnTTI (lnTTI.png)]



    I am especially interested in getting predicted values from the dynamic model that I fit. However, - predict y_hat, xb - gives me the prediction of log per capita consumption, which doesn’t mean anything to me. Example of my situation below.

    Code:
     xtset id year, delta(2)
    
    Panel variable: id (strongly balanced)
     Time variable: year, 2008 to 2022
             Delta: 2 units
    
    .
    . xi: xtabond2 lnpccons L.lnpccons L.highPOWE lnTTI lnGDPPC unem wap i.year, gmmstyle(L.lnpccons) ivstyle(L.highPOWE lnTTI lnGDPPC unem wap i.year) tw
    > ostep robust small orthogonal
    i.year            _Iyear_2008-2022    (naturally coded; _Iyear_2008 omitted)
    Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
    _Iyear_2010 dropped due to collinearity
    Warning: Two-step estimated covariance matrix of moments is singular.
      Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
      Difference-in-Sargan/Hansen statistics may be negative.
    
    Dynamic panel-data estimation, two-step system GMM
    ------------------------------------------------------------------------------
    Group variable: id                              Number of obs      =       668
    Time variable : year                            Number of groups   =        96
    Number of instruments = 39                      Obs per group: min =         5
    F(12, 95)     = 141108.55                                      avg =      6.96
    Prob > F      =     0.000                                      max =         7
    ------------------------------------------------------------------------------
                 |              Corrected
        lnpccons | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
        lnpccons |
             L1. |   1.030507   .0184528    55.85   0.000     .9938735     1.06714
                 |
        highPOWE |
             L1. |  -.0098811   .0107595    -0.92   0.361    -.0312413    .0114791
                 |
           lnTTI |     -.0694   .0136406    -5.09   0.000      -.09648     -.04232
         lnGDPPC |  -.0024295   .0023829    -1.02   0.311    -.0071601     .002301
            unem |  -.0016306   .0016559    -0.98   0.327     -.004918    .0016568
             wap |  -.0010921   .0013602    -0.80   0.424    -.0037924    .0016082
     _Iyear_2012 |   .0167286   .0183754     0.91   0.365    -.0197512    .0532084
     _Iyear_2014 |    -.02187   .0170215    -1.28   0.202    -.0556619     .011922
     _Iyear_2016 |  -.0006167   .0179358    -0.03   0.973    -.0362238    .0349904
     _Iyear_2018 |  -.0205002   .0163593    -1.25   0.213    -.0529775    .0119772
     _Iyear_2020 |  -.0246162   .0175492    -1.40   0.164    -.0594559    .0102234
     _Iyear_2022 |   .0297032    .021193     1.40   0.164    -.0123701    .0717765
           _cons |   .1223026   .0643969     1.90   0.061    -.0055415    .2501466
    ------------------------------------------------------------------------------
    Instruments for orthogonal deviations equation
      Standard
        FOD.(L.highPOWE lnTTI lnGDPPC unem wap _Iyear_2010 _Iyear_2012 _Iyear_2014
        _Iyear_2016 _Iyear_2018 _Iyear_2020 _Iyear_2022)
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        L(1/7).L.lnpccons
    Instruments for levels equation
      Standard
        L.highPOWE lnTTI lnGDPPC unem wap _Iyear_2010 _Iyear_2012 _Iyear_2014
        _Iyear_2016 _Iyear_2018 _Iyear_2020 _Iyear_2022
        _cons
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        D.L.lnpccons
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z =  -3.68  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =  -0.97  Pr > z =  0.330
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(26)   =  71.61  Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(26)   =  27.74  Prob > chi2 =  0.372
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(20)   =  18.80  Prob > chi2 =  0.535
        Difference (null H = exogenous): chi2(6)    =   8.93  Prob > chi2 =  0.177
      iv(L.highPOWE lnTTI lnGDPPC unem wap _Iyear_2010 _Iyear_2012 _Iyear_2014 _Iyear_2016 _Iyear_2018 _Iyear_2020 _Iyear_2022)
        Hansen test excluding group:     chi2(15)   =  20.67  Prob > chi2 =  0.148
        Difference (null H = exogenous): chi2(11)   =   7.07  Prob > chi2 =  0.794
    
    
    .
    . predict lnpccons_hat, xb
    (100 missing values generated)
    
    .
    . sum lnpccons_hat
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
    lnpccons_hat |        668    6.547359    1.006406   3.251544   8.253994

    I have read in this forum about xtpoisson, fe with vce(robust) being used even with data that are not strictly count data, which is useful because you don’t need to do a log transform to get ‘better’ predictions of the dependent variable in levels. While this may be appropriate for a static model (no lagged dependent variables), I am wondering if there is a way to account for my skewed dependent variable in the System GMM framework so that I can avoid logging the dependent variable?

    Thank you!

    Sam


  • #2
    Check this, but I think e(sigma) is the RMSE.

    predict yfit, xb
    g yfit2 = (exp(yfit)*exp((`e(sigma)'^2)/2))
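A minimal sketch of this retransformation, assuming it is run directly after the xtabond2 command in post #1. Since it is uncertain which stored result holds the RMSE, the sketch first lists the stored results with ereturn list and then estimates the variance from the residuals instead. The names lnhat, res, s2, expres, smear, pccons_hat_norm and pccons_hat_smear are illustrative and do not come from the thread. The residuals are computed as lnpccons minus the linear prediction, so they include any country effect that xb leaves out. Version (b) is Duan's smearing factor, swapped in as an alternative that does not rely on normally distributed errors.

```stata
* Sketch only: run right after the xtabond2 command from post #1.
ereturn list                          // check which stored results are available

predict double lnhat, xb              // linear prediction on the log scale
generate double res = lnpccons - lnhat    // log-scale residuals (missing where lnhat is missing)

* (a) Normal-theory correction: exp(xb) * exp(sigma^2/2)
quietly summarize res
scalar s2 = r(Var)                    // residual variance on the log scale
generate double pccons_hat_norm = exp(lnhat) * exp(s2/2)

* (b) Duan's smearing factor: exp(xb) * mean of exp(residual)
generate double expres = exp(res)
quietly summarize expres
scalar smear = r(mean)
generate double pccons_hat_smear = exp(lnhat) * smear

* Compare both sets of predictions with the observed series in levels
summarize pccons pccons_hat_norm pccons_hat_smear
```

If the two corrected predictions differ noticeably, that suggests the log-scale residuals are far from normal, and the smearing version is probably the safer of the two.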


    • #3
      Thank you for your response, George - I appreciate it!

      Can I please clarify with you whether it is correct to use the same RMSE from the fitted model when trying to obtain a prediction for a counterfactual scenario (using “fake” data)?

      Sam


      • #4
        Hmm. I suspect so. When you exp a prediction from a log DV, you have to adjust for the RMSE. That's all this does. It's just a transformation of the prediction.
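One way this could look for a counterfactual, as a sketch under stated assumptions: the correction factor is estimated once from the observed data and then reused on the modified data. The scenario (tax incidence 10% higher everywhere) and the names lnhat_obs, expres_obs, smear_cf, lnhat_cf and pccons_cf are hypothetical. The factor used here is the mean of the exponentiated residuals rather than exp(RMSE^2/2); either can be carried over in the same way.

```stata
* Sketch only: run right after the xtabond2 command from post #1.

* Step 1: correction factor from the observed data
predict double lnhat_obs, xb
generate double expres_obs = exp(lnpccons - lnhat_obs)
quietly summarize expres_obs
scalar smear_cf = r(mean)

* Step 2: apply the same factor to a hypothetical scenario
preserve
replace lnTTI = lnTTI + ln(1.10)      // hypothetical: tax incidence 10% higher
predict double lnhat_cf, xb           // L.lnpccons is still the observed lag
generate double pccons_cf = exp(lnhat_cf) * smear_cf
summarize pccons_cf
restore                               // original data back; scenario variables dropped
```

Because the lagged dependent variable stays at its observed values, this gives a one-period response only. Tracing the effect over several periods would mean feeding each predicted value back in as the next period's lag.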


        • #5
          Thank you, George!


          • #6
            You ran [lnY = b*X], then predicted the DV and got predictions of lnY, which you didn't want. You wanted predictions of Y. Typically, to reverse the log you'd just do exp(lnY). But since it's a prediction from a regression, you make the adjustment.

            HTML Code:
            https://davegiles.blogspot.com/2014/12/s.html
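The reason for the adjustment can be written out in a few lines. This is a sketch that assumes an additive error u on the log scale with mean zero; the normality assumption in the last line is what produces the exp(sigma^2/2) factor.

```latex
\begin{aligned}
\ln Y &= X\beta + u, \qquad E[u \mid X] = 0 \\
Y &= \exp(X\beta)\,\exp(u) \\
E[Y \mid X] &= \exp(X\beta)\,E[\exp(u) \mid X] \;\ge\; \exp(X\beta) \\
E[Y \mid X] &= \exp\!\left(X\beta + \tfrac{\sigma^{2}}{2}\right)
  \quad \text{if } u \mid X \sim N(0, \sigma^{2})
\end{aligned}
```

The inequality in the third line is Jensen's inequality, so exp(xb) on its own understates the conditional mean of Y. Under normality it estimates the conditional median instead.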


            • #7
              Thanks, George. I was hopeful that there was an alternative, like glm with a log link or xtpoisson as I have seen elsewhere on this forum, that would also work with an IV approach designed for a dynamic model with small T like mine. I'll stick with the adjusted exponential conversion.

              Sam


              • #8
                glm or xtpoisson might work. Hard to say without the data.

                But if you need xtabond2, probably not.
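For reference, a static version of the xtpoisson approach mentioned in post #1 might look like the sketch below. It assumes the unlogged series is stored as pccons and reuses the regressors from the xtabond2 command. It has no lagged dependent variable and no instruments, so it is a comparison point rather than a replacement for the system GMM model.

```stata
* Sketch only: static fixed-effects Poisson on the dependent variable in levels.
* No L.lnpccons and no GMM instruments, so this is not a dynamic model.
xtset id year, delta(2)
xtpoisson pccons lnTTI lnGDPPC unem wap i.year, fe vce(robust)

* With lnTTI in logs and an exponential mean, the lnTTI coefficient
* can still be read as an elasticity of consumption in levels.
```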
                Last edited by George Ford; 12 Jun 2024, 13:47.
