  • Structural equation model (SEM): degree of freedom and bootstrapping

    Dear Statalist,

    First of all, Happy holidays to the people on this forum!
    I have 2 questions concerning structural equation modelling (SEM) using Stata SE 14 on Mac OS 10.13.

    The first question is basically "how do I calculate the degrees of freedom (DoF) for an SEM model?". I am aware that the DoF are defined as:
    the number of pieces of information (k(k+1)/2, where k is the number of observed variables) minus the number of parameters one wishes to estimate.

    However, this formula does not seem to work for my case, where I aim to run a very simple mediation model to test if loneliness mediates the association between stigma and depression in my sample (n=350) using the command:
    Code:
    sem (lonely -> depress, ) (stigma -> depress, ) (stigma -> lonely, ),  nocapslatent
    There are three variables here (k=3), so the number of pieces of information should be 6 (3(3+1)/2). I am only estimating 5 parameters (the 3 paths shown in the sem code and 2 error variances). This should give a DoF of 1. I am confused as to why the results show 0 degrees of freedom.

    The second question concerns bootstrap failures. Here I used another dataset (n=120, no missing values) to test the same mediation effect described above, but, as can be seen in my code below, a measurement component is included: stigma is now represented by a latent variable with 3 indicators. I have also adjusted for employment and education level.

    Code:
    . sem (latentstigma -> gih_m, ) (latentstigma -> lih_r, ) (latentstigma -> lih_atol, ) (latentstigma loneliness employment education -> depression, ) (latentstigma employment education -> loneliness, ), latent(latentstigma ) nocapslatent vce(bootstrap, reps(10) seed(1234))
    (running sem on estimation sample)
    
    Bootstrap replications (10)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
    .x.x..xx..
    
    Structural equation model                       Number of obs     =        120
    Log likelihood = -1926.6482                     Replications      =          6
    The results showed a high number of bootstrap failures (4 out of 10). The command also took unusually long to run compared with my other, more complex, sem models (approx. 30 minutes for just 10 reps).
    A previous report of bootstrap failures on this forum suggested using the noisily option to diagnose the bootstrap execution. I have done so with the following code:

    Code:
    program bootsem1, rclass
         // fit the mediation model with the latent stigma variable
         sem (latentstigma -> gih_m, ) (latentstigma -> lih_r, ) (latentstigma -> lih_atol, ) (latentstigma loneliness employment education -> depression, ) (latentstigma employment education -> loneliness, ), latent(latentstigma ) nocapslatent
         // compute direct, indirect, and total effects
         estat teffects, compact
         // copy the effect matrices stored by estat teffects
         mat ind = r(indirect)
         mat dir = r(direct)
         mat tot = r(total)
         // return selected elements as scalars for bootstrap to collect
         return scalar ind = ind[1,2]
         return scalar dirih = dir[1,2]
         return scalar dirlonely = dir[1,1]
         return scalar tot = tot[1,2]
    end
    
    set seed 1234
    bootstrap r(ind) r(dirih) r(dirlonely) r(tot), noisily reps(10) : bootsem1
    The output showed that each failed replication ran for more than 15,000 iterations and ended with the error message:
    Code:
    Convergence not achieved
    an error occurred when bootstrap executed bootsem1, posting missing values
    These failed replications are also the main reason the command took so long.
    As mentioned, this dataset has no missing values, so I am not sure how the failures came about and would very much like to know what you think may have gone wrong.
    I hope the above questions have been presented clearly and in the correct format.
    Please kindly let me know if I can provide any additional information.
    Any and all help is very deeply appreciated. Thank you in advance.

    Kai-Yuan
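
    One way to keep non-converging replications from dominating the run time is to cap the number of iterations and treat a non-converged fit as a failed replication. The following is only a sketch under assumptions: the program name bootsem2 and the cap of 200 iterations are arbitrary choices for illustration, and it relies on sem accepting the iterate() maximization option and storing e(converged). As far as I know, the default iteration limit in Stata 14 is 16,000 (see set maxiter), which would be consistent with the 15,000+ iterations reported above.

    Code:
    capture program drop bootsem2
    program bootsem2, rclass
         // same model as bootsem1, but stop after at most 200 iterations (arbitrary cap)
         sem (latentstigma -> gih_m, ) (latentstigma -> lih_r, ) (latentstigma -> lih_atol, ) (latentstigma loneliness employment education -> depression, ) (latentstigma employment education -> loneliness, ), latent(latentstigma ) nocapslatent iterate(200)
         // in case sem returns without an error at the cap, check convergence explicitly
         if e(converged) == 0 {
              error 430
         }
         estat teffects, compact
         mat ind = r(indirect)
         return scalar ind = ind[1,2]
    end

    set seed 1234
    bootstrap r(ind), reps(10) : bootsem2

    This only shortens the failed replications; it does not explain why some resamples fail to converge with n=120.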

  • #2
    Happy holidays and new year to you too. I have not run SEM in Stata yet, but I can address your first question. You have zero degrees of freedom because you are estimating 2 error variances (the mediator and outcome), as well as one unconditional variance (the exogenous predictor). Thus your model is saturated, unless you remove a path or have information to somehow impose a constraint on one or more variances (conditional or unconditional).
    Good luck,
    Brian
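
    The count described above can be written out as a small sketch. The local macro names are illustrative only. The last command assumes the variable names from the first post and uses sem's noxconditional option, which asks Stata to include the mean and variance of the observed exogenous variable among the displayed, estimated parameters.

    Code:
    * pieces of information: variances and covariances of k = 3 observed variables
    local k = 3
    local info = `k'*(`k'+1)/2
    * parameters: 3 paths + 2 error variances + 1 variance of the exogenous variable
    local params = 3 + 2 + 1
    display "degrees of freedom = " `info' - `params'

    * refit so that mean(stigma) and var(stigma) appear in the output table
    sem (lonely -> depress, ) (stigma -> depress, ) (stigma -> lonely, ), nocapslatent noxconditional

    The intercepts do not change the result: the 3 observed means add 3 pieces of information, while the 2 intercepts and the mean of stigma add 3 parameters.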



    • #3
      Hi Brian,

      Thank you very much for your reply. However, I am not sure I fully understood. It would be very kind of you to elaborate.
      Isn't the unconditional variance of the exogenous variable simply the variable's own variance, which is one piece of known information from the data (as far as I understand, the information from the data consists of the variances of the variables and the covariances between them), rather than something the model needs to estimate?
      Furthermore, reading the output from Stata, I have trouble seeing how and where the unconditional variance is being estimated:

      Code:
      Endogenous variables
      
      Observed:  loneliness depression
      
      Exogenous variables
      
      Observed:  stigma
      
      Fitting target model:
      
      Iteration 0:   log likelihood =  -1313.622  
      Iteration 1:   log likelihood =  -1313.622  
      
      Structural equation model                       Number of obs     =        120
      Estimation method  = ml
      Log likelihood     =  -1313.622
      
      ----------------------------------------------------------------------------------
                       |                 OIM
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -----------------+----------------------------------------------------------------
      Structural       |
        lonely     <-  |
                stigma |   .1884954   .0862113     2.19   0.029     .0195244    .3574664
                 _cons |   33.01353   3.380739     9.77   0.000     26.38741    39.63966
        ---------------+----------------------------------------------------------------
        depression <-  |
            lonely     |   .5583737   .0644325     8.67   0.000     .4320883    .6846591
                stigma |   .1156213   .0620501     1.86   0.062    -.0059947    .2372373
                 _cons |  -17.04876   3.196672    -5.33   0.000    -23.31412   -10.78339
      -----------------+----------------------------------------------------------------
          var(e.lonely)|   105.0193   13.55794                      81.54168    135.2568
         var(e.depress)|   52.31918   6.754377                      40.62293    67.38303
      ----------------------------------------------------------------------------------
      LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .
      Again, thank you for your help!

      Kai-Yuan
      Last edited by Kai-Yuan Cheng; 23 Dec 2019, 05:22.



      • #4
        Hello Kai-Yuan,
        Sorry for the delay in replying. Yes, the variance of the exogenous variable is often simply its unconditional variance. But it is still estimated, and it is still an element of the observed covariance matrix that the model reproduces. Let's say your exogenous variable were a treatment/control dummy variable with equal n's per group. In that case, the exogenous variance could be fixed to its known value, thereby saving a df. Hope this helps.
        Brian
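
        A sketch of the constraint described above, under assumptions: treat is a hypothetical 0/1 treatment indicator with equal group sizes (so its variance is 0.5*0.5 = 0.25), and m and y are hypothetical mediator and outcome variables. None of these names come from this thread.

        Code:
        * fix the variance of the exogenous dummy at its known value instead of estimating it
        sem (treat -> m, ) (treat m -> y, ), nocapslatent variance(treat@0.25)

        With one fewer free parameter, the test of model vs. saturated would be expected to have 1 degree of freedom rather than 0.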



        • #5
          In reply to Brian Flaherty (post #4):
          Hi Brian,

          I hope you have been well during this difficult time.
          I am sorry for leaving this post unattended for some time. After a few days of inactivity, I assumed that it would not receive a reply and stopped checking it.
          Thank you very much for helping with my follow-up question; revisiting this topic now, I seem to be better able to understand your explanations!

          All the best,
          Kai-Yuan
