  • "sem" with categorical covariates?

    Hi Statalist,

    I was trying to do a path analysis that 1) controls for some categorical variables (e.g. gender) and 2) outputs standardized coefficients.
    Is there a way to do BOTH 1) and 2)? I used "gsem" for 1) and "sem" for 2). I also checked Stata's sem manual, and it seems the examples with categorical variables are all under "gsem".

    Thanks for any help in advance!

    Yingyi

  • #2
    sem assumes normality. Categorical variables can't be normal. You can usually get standardized coefficients by standardizing your variables before analysis.
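
    A minimal sketch of the "standardize first" idea, using the auto dataset that ships with Stata. The choice of variables and the z_ prefix are assumptions made for illustration only. The continuous variables are z-scored with egen's std() function, and the 0/1 variable foreign is left as it is. The coefficients on z_length and z_weight in the first model should closely match the standardized coefficients reported by the second model. The coefficient on foreign will differ, because in the first model only the outcome is standardized.

    ```stata
    sysuse auto, clear

    * z-score the continuous variables (mean 0, SD 1)
    foreach v of varlist price length weight {
        egen z_`v' = std(`v')
    }

    * model on the pre-standardized variables; foreign stays a 0/1 dummy
    sem (z_price <- z_length z_weight foreign), nolog

    * for comparison: raw variables, with sem's own standardized output
    sem (price <- length weight foreign), standardized nolog
    ```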



    • #3
      Originally posted by Phil Bromiley
      sem assumes normality. Categorical variables can't be normal. You can usually get standardized coefficients by standardizing your variables before analysis.
      Thanks for your reply Phil. I never thought of that approach! I will give it a try.

      Also, I tried another way, i.e. recoding the categorical variables into dummy variables. For example, for gender (1=female, 2=male), I created two new variables, female (0=no, 1=yes) and male (0=no, 1=yes), and then put just female into the model (so male is the reference category). In that case, the sem command ran without error and output standardized results. I guess this is an alternative to standardizing variables before analysis.



      • #4
        Originally posted by Yingyi Lin
        I tried another way, i.e. recoding the categorical variables into dummy variables. For example, for gender (1=female, 2=male), I created two new variables, female (0=no, 1=yes) and male (0=no, 1=yes), and then put just female into the model (so male is the reference category). In that case, the sem command ran without error and output standardized results. I guess this is an alternative to standardizing variables before analysis.

        Yingyi et al. - I wrote the chunk of code below to convince myself that this approach would work. I guess for any categorical variable (nominal or ordinal) with k >= 3 levels, one could recode it into binary indicators and include (k-1) of them in the SEM syntax.

        Code:
        ***********************************************************************
        sysuse auto, clear
        sum
        
        *create an arbitrary 3-level categorical var
        gen color = rep78    
        label var color "Color of vehicle"
            tab color, m
                recode color (1=1) (2=1) (3=2) (4=3) (5=3)
                replace color =1 if color==.
            tab color, m
        
        label define clab  1 "1 red"  2 "2 blue"  3 "3 white"
        label values color clab
        tab color, m
        tab color, m nolab
        
        *create binary dummies for the 3-level categorical variable
        tab color, m
        
            gen red =0 if color~=.
            gen blue =0 if color~=.
            gen white =0 if color~=.
            
            replace   red =1 if color==1
            replace  blue =1 if color==2
            replace white =1 if color==3
            
            tab color red, m
            tab color blue, m
            tab color white, m
            tab1 red blue white, m
        
        *simple regression, where Y=price, X=length
        *additional covariates (controls) incl: weight and color
        reg price length weight i.color, allbase
            *note: t-statistics are reported
        
            *verify the indicator approach produces the same estimates (reg)
            reg price length weight white blue
        
        *using the sem command
        *note: z-statistics are reported
        xi: sem (price <- length weight i.color), nolog 
        
            *verify that indicator approach produces the same estimates (sem)
            sem (price <- length weight blue white), nolog
        
        *using the sem command to verify incorrect specifications
        sem (price <- length weight red blue white)    // I don't expect this to converge bc all 3 indicators are included
            *confirmed: "convergence not achieved"
        
        sem (price <- length weight color), nolog    // this is definitely incorrect 
            *Stata is treating "color" as a continuous var (which is wrong)
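
        Since the original question also asked for standardized coefficients, here is a hedged sketch that combines the (k-1) indicator approach with sem's standardized option. It rebuilds the same arbitrary 3-level color variable from rep78 and treats level 1 (red) as the omitted reference category. The variable names are the illustrative ones used above, not anything required by sem. Note that a standardized coefficient on a 0/1 indicator is harder to interpret than one on a continuous variable.

        ```stata
        sysuse auto, clear

        * rebuild the arbitrary 3-level variable (missing rep78 folded into level 1)
        generate color = rep78
        recode color (1 2 = 1) (3 = 2) (4 5 = 3)
        replace color = 1 if missing(color)

        * k-1 = 2 indicators; red (color == 1) is the reference category
        generate byte blue  = (color == 2)
        generate byte white = (color == 3)

        * same model as above, now reporting standardized coefficients
        sem (price <- length weight blue white), standardized nolog
        ```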



        • #5
          Hey Michael, thanks so much for the follow-up. And YES! I ended up using the same strategy as you did.



          • #6
            Hi Yingyi Lin

            I read this thread between you and Michael Chen with great interest, since I am trying to overcome the same issue myself.

            I am using the sem command to fit my model and have 2-3 categorical demographic variables that I plan to enter into the model as dummy variables. I just wondered whether either of you were able to sign-post me to any additional resources (like Stata documentation or peer-reviewed articles) which endorse the use of dummy variables within the SEM framework? I've not been able to find any specific guidance on this and I'm a little wary about whether or not this is permissible, statistically speaking.

            Many thanks in advance for your steers - I realise this is a long shot since your conversation was 5 years ago now...!

            Thanks,
            Tania



            • #7
              Hi everyone:

              I'm trying to use ordinal variables as independent variables in a regression SEM. I tried to include those variables in the model with the "i." prefix, but in the SEM Builder Stata told me: "On the "Variable" tab, the "Variable" name you entered is invalid".

              Do you know how I can use ordinal variables without recoding them into dummy variables?

              Thank you very much,
              Octavio.



              • #8
                Originally posted by Michael Chen


                Yingyi et al. - I wrote this chunk of code below to show and convince myself that this approach would work -- I guess for any categorical (nominal, ordinal) with >=3 levels, one could recode them into binary indicators and include (k-1) of them into the SEM syntax.

                Especially when the number of groups is higher than 3, the "tabulate" command might help: https://www.stata.com/support/faqs/d...mmy-variables/ For example, this:

                Code:
                gen red =0 if color~=.
                gen blue =0 if color~=.
                gen white =0 if color~=.
                
                replace red =1 if color==1
                replace blue =1 if color==2
                replace white =1 if color==3
                could become:

                Code:
                tabulate color, generate(colour)
                rename colour1 red
                rename colour2 blue
                rename colour3 white



                • #9
                  Originally posted by Tania Clarke
                  I just wondered whether either of you were able to sign-post me to any additional resources (like Stata documentation or peer-reviewed articles) which endorse the use of dummy variables within the SEM framework? I've not been able to find any specific guidance on this and I'm a little wary about whether or not this is permissible, statistically speaking.
                  I think you're right to be doubtful about using categorical predictors with SEM. My understanding (also from experiencing how often models with such covariates fail to converge) is that categorical variables violate the multivariate normality assumption so strongly that they create issues even with large datasets. However, for a more informed opinion than mine, you can read this document:
                  http://www.stata.com/meeting/germany...y19_Langer.pdf
                  Last edited by Federico Tedeschi; 17 Nov 2022, 09:20.
