
  • latent class analysis with ordinal data - does stata only need -ologit- to determine it's ordinal data

    Hello, I've been reading about latent class analysis with ordinal data.

    I've read a lot of resources, but they all seem to deal with either binary categorical data or continuous data.

    The Stata help says that it can work with ordinal data.

    My data are a subset of a patient-reported outcome questionnaire from the post-operative period. I selected 4 questions, so I have 4 variables:
    -varq1-
    -varq2-
    -varq3-
    -varq4-

    For each varq*, 5 points is the best outcome and 0 points is the worst.

    I want to examine the trajectory from the pre-operative state to the post-operative state in terms of 3 to 6 classes, which will hopefully range from early, quick improvers all the way to late, slow improvers.

    I am using this code

    Code:
    // takes into consideration only postoperative questions
    gsem (varq1 varq2 var3 var4 <-ologit), lclass(C 3)
    
    // compare AIC and BIC
    estimates store threeclass
    
    estat lcmean

    Firstly, how can Stata know the data are ordinal from 'ologit' alone?
    Secondly, how can I incorporate the pre-operative questions into the code, given that I have the same variables -varq*- for the pre-operative state: -varqpreop*-?
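
    For the class-enumeration step described above (3 to 6 classes compared on AIC and BIC), a minimal sketch might look like the following. It assumes the four post-operative items are named varq1 to varq4 and that ologit is supplied as an option rather than inside the path. The stored-estimate names class3 to class6 are made up for illustration.

    Code:
    * Sketch only: variable names and stored-estimate names are assumptions
    forvalues k = 3/6 {
        quietly gsem (varq1 varq2 varq3 varq4 <- ), ologit lclass(C `k')
        estimates store class`k'
    }

    * AIC and BIC for the stored models, side by side
    estimates stats class3 class4 class5 class6

    * Inspect one solution, here the three-class model
    estimates restore class3
    estat lcprob    // estimated class proportions
    estat lcmean    // class-specific marginal means for each item

    Models with more classes can be slow or fail to converge, so it is probably worth checking each fit before comparing information criteria.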

  • #2
    I don't think this is the right syntax.

    Code:
    gsem (varq1 varq2 var3 var4 <-ologit), lclass(C 3)
    It looks like you've put ologit where I would expect your latent class to be in the path expression, or you've missed a comma. You are also missing the q in var3 and var4. If you ran that line with the correct variable names, you should get a "variable ologit not found" error. ologit needs to be an option. If this is the equation you want to estimate, then:

    Code:
    gsem (varq1 varq2 varq3 varq4 <- ), ologit lclass(C 3)
    Or alternatively:

    Code:
    gsem (varq1 varq2 varq3 varq4 <- , ologit), lclass(C 3)
    With respect to your first question: In the SEM context, everything the arrow points to in the path diagram is treated as an outcome. The ologit option specifies that the ordered logistic regression should be used to predict each outcome. In this case, your model says that your respondents belong to an unobserved set of latent classes, and your respondents' class determines their response on each of these measured ordinal variables. Since each of these variables is the outcome of an ordered logistic regression, it will be treated as an ordered set of categories in the same way that:

    Code:
    ologit varq1 varq2 varq3 varq4
    Will treat varq1 as ordinal. If that doesn't make sense, then let me ask you: What else do you think Stata would need to know to treat each outcome as ordinal?
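
    One way to see this in practice, as a rough sketch assuming the items are named varq1 to varq4: ologit takes the distinct values of each outcome, in ascending numeric order, as the ordered categories. It can therefore help to tabulate each item first, to confirm that the coding runs from worst to best and to see which categories are actually observed.

    Code:
    * Sketch only: variable names are assumptions
    * Check the observed categories and their ordering for each item
    foreach v of varlist varq1 varq2 varq3 varq4 {
        tabulate `v', missing
    }

    * Fit the model; each item is reported with cutpoints rather than
    * an intercept, which is the ordinal treatment in action
    gsem (varq1 varq2 varq3 varq4 <- ), ologit lclass(C 3)

    An item with K observed categories should show K-1 cutpoints in the output, which is a quick check that nothing is being treated as continuous.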

    Your second question depends entirely on what, exactly, you are modeling. Do you think each preop question is determined by exactly the same set of latent classes? Do you think responses on each preop question are determined by a different, unrelated set of latent classes? Do you think the latent classes for pre and post op are separate, but related? Do you think responses on each preop question partially determine answers to post-op questions alongside the latent classes? Do you think responses on each preop question determine entry into your set of latent classes, and those latent classes determine postop answers? My point is that there isn't one straightforward way to do this.
    Last edited by Daniel Schaefer; 06 Feb 2024, 14:27.



    • #3
      Thank you very much, Daniel, for your really detailed explanation.

      Wow, with respect to question 2, I didn't think one could think about it in so many ways…

      However, I would have thought it worth exploring whether -preop- score levels are related to the post-operative trajectory.

      For example, I would expect someone with a low preop score to be an early and fast improver.

      So I would go with this question you've put forward:

      Do you think responses on each preop question determine entry into your set of latent classes, and those latent classes determine postop answers?

      My question: if I want to explore this, should I include the -preop- variable scores in the model, something like this?

      Code:
      gsem (varq1 varq2 varq3 varq4 PREopq1 PREopq2 PREopq3 PREopq4<- ), ologit lclass(C 3)



      • #4
        What you have at the end of #3 is what I thought of as the first question. In your model at the end of #3, you are saying preop and postop questions are caused by membership in the same set of latent classes. If you think something measured by the preop questions determines a respondent's entry into the latent classes, then the latent classes should be endogenous, lying between the pre and post op questions, with an arrow going from the pre op questions to the latent class, and an arrow from the latent class to the post op questions. So the latent classes would be the outcome of the pre op questions and the predictor of the post op questions. Does that make sense? That is not the model you have in #3. In #3 you have a model where the latent class is a predictor and every response on each variable is an outcome of those latent classes.

        Many different models are possible with SEM. It's not just that "one can think about it in so many ways"; it is that each way of thinking about it corresponds to a different model that you would code differently. What you have written above is certainly one way of modeling your data, but it is far from the only way. You need a specific, theoretically motivated model to test here.



        • #5
          Dear Daniel,

          I'm working on this again.

          With regard to

          'the preop questions determines a respondents entry into the latent classes, then the latent classes should be endogenous, lying between the pre and post op questions, with an arrow going from the pre op questions to the latent class, and an arrow from the latent class to the post op questions. '

          I thought of writing my code in this format:

          The postopvarq* variables are the post-op questions, which are ordinal.
          Equally, the preopvarq* variables are the same questions as the post-operative ones, but asked in the pre-operative period.

          Code:
          gsem (postopvarq1 postopvarq2 postopvarq3 <- , ologit) (C <- preopqvar1 preopqvar2 preopqvar3), lclass(3)
          // In my opinion, this shows that membership of a class depends on the pre-operative question scores

          However, this is different from what you described in the text I quoted in this post (bold text). I wonder whether the code below better reflects what you are referring to in bold:

          Code:
          gsem (postopvarq1 postopvarq2 postopvarq3 <- , ologit), lclass(C 3) ///
          gsem (preopvarq1 preopvarq2 preopvarq3 <- , ologit), lclass(C 3)

          I look forward to your insight, many thanks



          • #6
            Looks like your variable names have changed since #3. I think the first model in #5 better describes the situation highlighted in bold, though I think you might be missing the C in lclass(C 3).

            Code:
            gsem (postopvarq1 postopvarq2 postopvarq3 <- , ologit) (C <- preopqvar1 preopqvar2 preopqvar3), lclass(C 3)
            The syntax is not correct for the second model, but it seems like you might be looking for something like this:

            Code:
            gsem (postopvarq1 postopvarq2 postopvarq3 <- , ologit) ///
                 (preopqvar1 preopqvar2 preopqvar3 <- , ologit), lclass(C 3)
            That should be equivalent to this model:

            Code:
            gsem (postopvarq1 postopvarq2 postopvarq3 preopqvar1 preopqvar2 preopqvar3 <- , ologit), lclass(C 3)
            Which is similar to the model at the end of #3 (with different variables). Notice in the second model your latent class is entirely exogenous, whereas in the first, preop variables predict the latent class.
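
            To look at what the first model implies for individual respondents, a possible follow-up is sketched below. It assumes the variable names used in this post and a three-class solution. The names preop_to_class, cpost, maxpost and modalclass are made up for illustration. Here the preop items enter as continuous predictors of class membership, which is itself an assumption; factor-variable notation such as i.preopqvar1 would treat them as categorical instead.

            Code:
            * Sketch only: preop items predict class membership,
            * classes predict the postop items
            gsem (postopvarq1 postopvarq2 postopvarq3 <- , ologit) ///
                 (C <- preopqvar1 preopqvar2 preopqvar3), lclass(C 3)
            estimates store preop_to_class

            * Posterior probability of each class for every respondent
            predict cpost*, classposteriorpr

            * Assign each respondent to the most likely (modal) class
            egen double maxpost = rowmax(cpost*)
            generate byte modalclass = .
            forvalues k = 1/3 {
                replace modalclass = `k' if cpost`k' == maxpost ///
                    & !missing(maxpost) & missing(modalclass)
            }
            tabulate modalclass

            The coefficients in the C equation are multinomial logit coefficients relative to a base class, so they describe how preop scores shift the odds of belonging to each class rather than the postop answers directly.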



            • #7
              Firstly, thank you.

              To clarify the syntax:

              Code:
              gsem (postopvarq1 postopvarq2 postopvarq3 <- , ologit) (C <- preopqvar1 preopqvar2 preopqvar3), lclass(C 3)
              ‘arrow from the latent class to the post op questions. '

              Just a comment: I know there are rules, but I would have thought that Stata would want -lclass(C 3)- placed adjacent to the arrow (after postopvarq3); instead it's placed last.

              It's really just an observation.



              • #8
                I don't see any syntax errors in #7.

                lclass() appears at the end because it is an option, so it must come after the comma. The option tells Stata that there is a latent set of 3 classes called C. You would have to do the same with continuous latent variables as well. Personally, I don't understand why this isn't valid syntax:

                Code:
                gsem (postopvarq1 postopvarq2 postopvarq3 <- C, ologit) (C <- preopqvar1 preopqvar2 preopqvar3), lclass(C 3)
                That's not to say there isn't a good reason, I just don't happen to know why that isn't valid syntax when this is valid:

                Code:
                gsem (postopvarq1 postopvarq2 postopvarq3 <- C, ologit) (C <- preopqvar1 preopqvar2 preopqvar3, regress), latent(C)

