Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • First difference estimator for panel data using the sem command

    Dear Stata forum,

    I would like to estimate a mediation model using the first difference estimator. I know that the user-written xtdpdml command from Williams, Allison and Moral Benito exists but as far as I understand, it is not possible to use this command for mediation analysis. Therefore, I need to specify the first difference estimator using Stata's sem command.

    However, I am not sure how to code the first difference estimator using Stata's sem command. I have not found any examples in the manual, nor on the Statalist forum. In particular, the first difference estimator assumes sequential exogeneity. I tried to correctly specify this assumption based on the code and using the same dataset as explained in Allison, P. D., Williams, R., & Moral-Benito, E. (2017). Maximum likelihood for cross-lagged panel models with fixed effects. Socius, 3, 2378023117710578 in which they code a cross-lagged panel data model (which also assumes sequential exogeneity). However, my first difference estimator does not converge.

    This is my code for the first difference estimator using the online available dataset. I also estimated the model using the reg command for comparison (both estimations should be in the same range of magnitude).

    Code:
    /*==========================================================================================*\
    | Implementation of ML-SEM for the first difference estimator
    | Data described by Cornwell and Rupert (1988) for 595 household heads who reported a non-zero
    | wage in each of 7 years from 1976 to 1982
    | Dependent variable: wks (number of weeks employed in each year)
    | Independent variable: union (dummy equalling 1 if wage set by union contract in each year)
    | Control variable: lwage (log of wage in each year)
    | xtdpdml and sem: method mlmv to equal the outcome of the reg command
    | Stata version 17.0
    \*==========================================================================================*/    
    ************************** estimations using sem *******************************
    ** import the dataset
    use https://www3.nd.edu/~rwilliam/statafiles/wages, clear
    keep wks lwage union id t
    
    ** specify the first differences manually
    foreach x of varlist wks lwage union {
        gen d1`x'=D.`x'      /* x_i,t - x_i,t-1 */
    }
    
    ** reshape the data to wide format
    reshape wide wks d1wks lwage d1lwage union d1union, i(id) j(t)
    
    ** first difference estimation: this model does not converge 
    sem (d1wks1 <- d1lwage1@b2 d1union1@b3 Alpha@1 E1@1) ///
        (d1wks2 <- d1lwage2@b2 d1union2@b3 Alpha@1 E2@1) ///
        (d1wks3 <- d1lwage3@b2 d1union3@b3 Alpha@1 E3@1) ///
        (d1wks4 <- d1lwage4@b2 d1union4@b3 Alpha@1 E4@1) ///
        (d1wks5 <- d1lwage5@b2 d1union5@b3 Alpha@1 E5@1) ///
        (d1wks6 <- d1lwage6@b2 d1union6@b3 Alpha@1 E6@1) ///
        (d1wks7 <- d1lwage7@b2 d1union7@b3 Alpha@1), ///
        var(e.d1wks1@0 e.d1wks2@0 e.d1wks3@0 e.d1wks4@0 e.d1wks5@0 e.d1wks6@0) ///sequential exogeneity assumption: error term may be correlated with future values of time-dependent predictors. In sem, the only solution to do this is to suppress the original error terms by setting their variances equal to zero and introducing new latent error terms E1-E6. There is no need to do that at time 7 because there is no future value of union in the model.
        cov(Alpha*(E1 E2 E3 E4 E5 E6)@0) ///sets the covariance between the unit-specific time-constant error term (Alpha) and the time-varying error term (E*) to zero
        cov(_OEx*(E1 E2 E3 E4 E5 E6)@0) ///sets the covariances between all of the observed exogenous variables (_OEx*) and the new error terms (E*) equal to zero
        cov(E1*(E2 E3 E4 E5 E6)@0) cov(E2*(E3 E4 E5 E6)@0) cov(E3*(E4 E5 E6)@0) cov(E4*(E5 E6)@0) cov(E5*(E6)@0) ///constrains all the new error terms (E*) to be uncorrelated with each other
        cov((d1lwage2 d1union2)*(E1)) cov((d1lwage3 d1union3)*(E1 E2)) cov((d1lwage4 d1union4)*(E1 E2 E3)) cov((d1lwage5 d1union5)*(E1 E2 E3 E4)) cov((d1lwage6 d1union6)*(E1 E2 E3 E4 E5)) cov((d1lwage7 d1union7)*(E1 E2 E3 E4 E6)) ///sequential exogeneity assumption: allow the new error terms to be correlated with future values of the predetermined predictors, union and lwage
        method(mlmv) 
        
    ************** estimations using reg for comparison ************************
    ** import the dataset
    use https://www3.nd.edu/~rwilliam/statafiles/wages, clear
    keep wks lwage union id t
    
    ** specify the data as panel data
    xtset id t
     
    ** first difference estimation 
    gen year1=1 if t==1
    replace year1 = 0 if missing(year1)
    gen year2=1 if t==2
    replace year2 = 0 if missing(year2)
    gen year3=1 if t==3
    replace year3 = 0 if missing(year3)
    gen year4=1 if t==4
    replace year4 = 0 if missing(year4)
    gen year5=1 if t==5
    replace year5 = 0 if missing(year5)
    gen year6=1 if t==6
    replace year6 = 0 if missing(year6)
    gen year7=1 if t==7
    replace year7 = 0 if missing(year7)
    reg D.wks D.lwage D.union D.year2 D.year3 D.year4 D.year5 D.year6 D.year7, nocons
    Does anyone know if the above code of the first difference estimator with Stata's sem command is correct? I have also tried the code with other datasets, which also do not converge, so I think the problem is with an incorrect specification rather than a pure non-convergence problem.

    Any suggestions would be appreciated.

    Kind regards,

    Eva

  • #2
    Code:
    webuse nlswork, clear
    keep if year<73
    xtset idcode year
    xi: reg D.(ln_wage age hours i.year), nocons cluster(idcode)
    xi: sem (D.ln_wage <- D.(age hours i.year)), nocons vce(cluster idcode) nolog
    Res.:

    Code:
    . xi: reg D.(ln_wage age hours i.year), nocons cluster(idcode)
    i.year            _Iyear_68-72        (naturally coded; _Iyear_68 omitted)
    
    Linear regression                               Number of obs     =      4,381
                                                    F(6, 1926)        =      56.09
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.0558
                                                    Root MSE          =     .28823
    
                                 (Std. Err. adjusted for 1,927 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
       D.ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |
             D1. |  -.0240623   .0261473    -0.92   0.358    -.0753423    .0272178
                 |
           hours |
             D1. |  -.0014857   .0009341    -1.59   0.112    -.0033176    .0003462
                 |
       _Iyear_69 |
             D1. |   .1260502   .0265693     4.74   0.000     .0739427    .1781578
                 |
       _Iyear_70 |
             D1. |    .177628   .0529456     3.35   0.001     .0737912    .2814648
                 |
       _Iyear_71 |
             D1. |   .2795753   .0793518     3.52   0.000     .1239508    .4351997
                 |
       _Iyear_72 |
             D1. |   .3566226   .1049793     3.40   0.001     .1507375    .5625076
    ------------------------------------------------------------------------------
    
    . 
    . xi: sem (D.ln_wage <- D.(age hours i.year)), nocons vce(cluster idcode) nolog
    i.year            _Iyear_68-72        (naturally coded; _Iyear_68 omitted)
    (3456 observations with missing values excluded)
    
    Endogenous variables
    
    Observed:  D.ln_wage
    
    Exogenous variables
    
    Observed:  D.age D.hours D._Iyear_69 D._Iyear_70 D._Iyear_71 D._Iyear_72
    
    Structural equation model                       Number of obs     =      4,381
    Estimation method    = ml
    Log pseudolikelihood = -32517.077
    
     ( 1)  [D.ln_wage]_cons = 0
                                   (Std. Err. adjusted for 1,927 clusters in idcode)
    --------------------------------------------------------------------------------
                   |               Robust
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
    Structural     |
      D.ln_wage    |
               age |
               D1. |  -.0240623   .0261324    -0.92   0.357    -.0752809    .0271563
             hours |
               D1. |  -.0014857   .0009335    -1.59   0.112    -.0033154     .000344
         _Iyear_69 |
               D1. |   .1260502   .0265541     4.75   0.000     .0740052    .1780953
         _Iyear_70 |
               D1. |    .177628   .0529154     3.36   0.001     .0739157    .2813403
         _Iyear_71 |
               D1. |   .2795753   .0793065     3.53   0.000     .1241374    .4350131
         _Iyear_72 |
               D1. |   .3566226   .1049194     3.40   0.001     .1509844    .5622607
             _cons |          0  (constrained)
    ---------------+----------------------------------------------------------------
    var(De.ln_wage)|   .0829637   .0045727                      .0744684     .092428
    --------------------------------------------------------------------------------
    
    .
    Last edited by Andrew Musau; 17 Feb 2022, 07:02.

    Comment


    • #3
      Andrew Musau, thanks a lot. This is exactly what I was looking for!

      I have an additional question: Is it true that the fixed effects and random effects estimator can only be coded with Stata's sem command if the data are in wide format and not in long format? In my research I have a very large number of time periods, which makes the code very complex and error prone.

      Code:
      ** import the dataset and declare the data as panel data
      webuse nlswork, clear
      keep if year<73
      xtset idcode year
      
      ** fixed effects estimation using xtreg
      xi: xtreg ln_wage age hours i.year, fe cluster(idcode)
      
      ** random effects estimation using xtreg
      xi: xtreg ln_wage age hours i.year, re cluster(idcode)
      
      ** fixed effects estimation using sem
      keep ln_wage age hours idcode year
      reshape wide ln_wage age hours, i(idcode) j(year)
      xi: sem (ln_wage68 <- age68@a hours68@b Alpha@1) ///
              (ln_wage69 <- age69@a hours69@b Alpha@1) ///
              (ln_wage70 <- age70@a hours70@b Alpha@1) ///
              (ln_wage71 <- age71@a hours71@b Alpha@1) ///
              (ln_wage72 <- age72@a hours72@b Alpha@1), ///
              var((e.ln_wage*)@c) ///constrain the error variances to be the same
              method(mlmv)
              
      ** random effects estimation using sem
      xi: sem (ln_wage68 <- age68@a hours68@b Alpha@1) ///
              (ln_wage69 <- age69@a hours69@b Alpha@1) ///
              (ln_wage70 <- age70@a hours70@b Alpha@1) ///
              (ln_wage71 <- age71@a hours71@b Alpha@1) ///
              (ln_wage72 <- age72@a hours72@b Alpha@1), ///
              var((e.ln_wage*)@c) //////constrain the error variances to be the same
              cov(Alpha*(age* hours*)@0) ///the random effects assumption
              method(mlmv)
      Thank you in advance

      Kind regards,

      Eva

      Comment


      • #4
        You can estimate a linear fixed effects model using indicators for the fixed effects, but this is impractical or inefficient if you have too many indicators. I hardly use sem, but your assertion is consistent with the tutorials that I have seen. On the first statement:


        Code:
        webuse grunfeld, clear
        xtset company year
        xtreg invest mvalue kstock, fe
        xi: sem (invest <- mvalue kstock i.company), nolog
        Res.:

        Code:
        . xtreg invest mvalue kstock, fe
        
        Fixed-effects (within) regression               Number of obs     =        200
        Group variable: company                         Number of groups  =         10
        
        R-sq:                                           Obs per group:
             within  = 0.7668                                         min =         20
             between = 0.8194                                         avg =       20.0
             overall = 0.8060                                         max =         20
        
                                                        F(2,188)          =     309.01
        corr(u_i, Xb)  = -0.1517                        Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
              invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              mvalue |   .1101238   .0118567     9.29   0.000     .0867345    .1335131
              kstock |   .3100653   .0173545    17.87   0.000     .2758308    .3442999
               _cons |  -58.74393   12.45369    -4.72   0.000    -83.31086     -34.177
        -------------+----------------------------------------------------------------
             sigma_u |  85.732501
             sigma_e |  52.767964
                 rho |  .72525012   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(9, 188) = 49.18                     Prob > F = 0.0000
        
        . xi: sem (invest <- mvalue kstock i.company), nolog
        i.company         _Icompany_1-10      (naturally coded; _Icompany_1 omitted)
        
        Endogenous variables
        
        Observed:  invest
        
        Exogenous variables
        
        Observed:  mvalue kstock _Icompany_2 _Icompany_3 _Icompany_4 _Icompany_5 _Icompany_6 _Icompany_7 _Icompany_8
                   _Icompany_9 _Icompany_10
        
        Structural equation model                       Number of obs     =        200
        Estimation method  = ml
        Log likelihood     =  -4129.597
        
        --------------------------------------------------------------------------------
                       |                 OIM
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        ---------------+----------------------------------------------------------------
        Structural     |
          invest       |
                mvalue |   .1101238   .0114955     9.58   0.000      .087593    .1326545
                kstock |   .3100653   .0168258    18.43   0.000     .2770874    .3430433
           _Icompany_2 |   172.2025   30.21196     5.70   0.000     112.9882    231.4169
           _Icompany_3 |  -165.2751   30.80755    -5.36   0.000    -225.6568   -104.8935
           _Icompany_4 |    42.4874    42.5722     1.00   0.318    -40.95259    125.9274
           _Icompany_5 |  -44.32013   48.95406    -0.91   0.365    -140.2683    51.62806
           _Icompany_6 |   47.13539   45.38464     1.04   0.299    -41.81687    136.0877
           _Icompany_7 |   3.743212   49.02452     0.08   0.939    -92.34307     99.8295
           _Icompany_8 |   12.75103    42.7106     0.30   0.765    -70.96021    96.46228
           _Icompany_9 |  -16.92558   46.97718    -0.36   0.719    -108.9992      75.148
          _Icompany_10 |   63.72884   48.79697     1.31   0.192    -31.91146    159.3691
                 _cons |  -70.29669   48.19365    -1.46   0.145    -164.7545    24.16114
        ---------------+----------------------------------------------------------------
          var(e.invest)|   2617.391   261.7391                      2151.535    3184.115
        --------------------------------------------------------------------------------
        LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .
        
        .
        Last edited by Andrew Musau; 18 Feb 2022, 07:03.

        Comment


        • #5
          @Andrew Musau, thank you so much for your insights! I understand it much better now.

          Kind regards,
          Eva

          Comment

          Working...
          X