  • Longitudinal IRT model

    Hi,

    I am trying to estimate a longitudinal IRT model using GSEM in Stata 17. Using the example data from https://www.stata-press.com/data/r17/gsem_cfa, I can estimate a cross-sectional IRT model that accounts for students being nested within schools (example 30g from the SEM manual):
    Code:
    gsem (q1-q8 <- MathAb M1[school]), logit
    Now, let's say the data are longitudinal and that students have repeated measurements of q1-q8 over time:
    Code:
    gen time = school
    drop id
    rename school id
    To model the fact that Math Ability may depend on time, we can write:
    Code:
    gsem (q1-q8 <- MathAb@1 M1[id]@1) (MathAb<-time), logit
    In this model, M1[id] is a random intercept (i.e. a latent variable with mean 0 and variance var(M1[id])) which captures the between-student variability.
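Written out, the gsem specification above corresponds roughly to the equations below. This is a sketch under the usual gsem defaults (normally distributed latent variables, no intercept for the endogenous latent variable). The symbols \alpha_k (item intercepts), \beta (time effect), \zeta_{it} (residual of MathAb), and the indices i (student) and t (occasion) are notation introduced here for illustration, not taken from the original post.

```latex
\operatorname{logit}\Pr(q_{kit} = 1) = \alpha_k + \text{MathAb}_{it} + M1_i,
    \qquad k = 1, \dots, 8
\text{MathAb}_{it} = \beta \, \text{time}_{it} + \zeta_{it},
    \qquad \zeta_{it} \sim N(0, \sigma^2_{\zeta}),
    \qquad M1_i \sim N(0, \sigma^2_{M1})
```

Because both loadings are fixed at 1, every item shares the same discrimination, so this is a Rasch-type model with a student-level random intercept.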
    What I would like to do is add a random slope to this model, to specify that the effect of time on Math Ability may vary across students. I was thinking of doing it this way:
    Code:
    gsem (q1-q8 <- MathAb@1 M1[id]@1) (MathAb<-time M2[id]) , logit
    but this generates the following error message: “MathAb may not be the destination of a path from M1[id]”. I understand that it is not possible to estimate the variances of both random effects without setting additional constraints, but even when I constrain the variance of the random intercept to 1 with the option var(M1[id]@1), I still get the same message.
    Is there any way to add such a random slope?

    Best,

  • #2
    Hi,
    I don't have a solution for your specific GSEM problem, as I am not very familiar with GSEM in Stata. But I have done multiple longitudinal IRT analyses with my package, UIRT, and from your description it looks like it is something that can be handled with it. You would however end up with a set of plausible values (PVs) conditioned with a latent regression according to the model you specify. That would require additional analysis of these PVs.

    Firstly, just a small thing I noticed in your code. With this:
    Code:
    gen time = school
    drop id
    rename school id
    ...you end up with both the time and the id variables being defined as the same thing (the original school variable). Was that really your intention?

    Now to the point. From your description I understand that you have multiple measurements of student ability from linked tests, and that you want to fit a two-level latent IRT model in which the second level accounts for measurement occasions being nested within students. You then want to see the effect of the time variable on ability, and you are interested not only in the fixed effect but also in a possible random slope. This looks like the following latent model:
    \theta_{ij} = \beta_0 + \beta_1 time_{ij} + e_j + \xi_j time_{ij} + e_{ij}

    where i indexes measurement occasions and j indexes students, \beta_0 and \beta_1 are fixed effects, e_j is the random intercept, \xi_j is the random slope, and e_{ij} is the residual random part of student ability.
    I think that for this model to be identified you need more than two measurement occasions (if time takes only two values, e_ij is completely determined by the preceding terms). You did not mention how many measurement occasions you have, but I assume there are more than two.

    To obtain a set of, say, 10 plausible values conditioned on such a latent regression under a 2PL model, you would run this:
    Code:
    ssc install uirt // in case not installed
    local npv = 10 // number of plausible values
    uirt q* ,theta(, pv(`npv') pvreg(time || id: time) )
    
    *Analyzing the coefficients for each pv:
    foreach pv of numlist 1/`npv'{
        mixed pv_`pv' time || id: time
    *    some syntax to grab estimates for averaging here
    }
    You would then have to combine the results according to Rubin's rules to obtain the final estimates and their standard errors. Also, if the standard errors of the random effects are especially important to you, additional steps might be necessary to compute them.
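The placeholder in the loop above could be filled in along the following lines. This is a minimal sketch, not part of UIRT: it pools only the fixed effect of time, assumes that pv_1 to pv_10 were created by the uirt call above, and uses hypothetical scalar names (rr_*) chosen for this illustration. The pooled estimate is the mean of the per-PV estimates, and the total variance is the mean squared standard error plus (1 + 1/M) times the between-PV variance of the estimates.

```stata
* Sketch: pool the fixed effect of time over plausible values (Rubin's rules).
* Assumes pv_1 ... pv_`npv' exist, e.g. from the uirt call above.
* The rr_* scalar names are hypothetical, chosen for this illustration.
local npv = 10
scalar rr_sumq  = 0    // running sum of point estimates
scalar rr_sumq2 = 0    // running sum of squared point estimates
scalar rr_sumu  = 0    // running sum of squared standard errors
forvalues m = 1/`npv' {
    quietly mixed pv_`m' time || id: time
    scalar rr_sumq  = rr_sumq  + _b[time]
    scalar rr_sumq2 = rr_sumq2 + _b[time]^2
    scalar rr_sumu  = rr_sumu  + _se[time]^2
}
scalar rr_qbar = rr_sumq / `npv'                                   // pooled estimate
scalar rr_w    = rr_sumu / `npv'                                   // within-PV variance
scalar rr_b    = (rr_sumq2 - `npv' * rr_qbar^2) / (`npv' - 1)      // between-PV variance
scalar rr_t    = rr_w + (1 + 1/`npv') * rr_b                       // total variance
display "pooled b[time] = " rr_qbar "   pooled SE = " sqrt(rr_t)
```

The same pattern would have to be repeated for every other coefficient of interest, which is why a dedicated pooling tool is usually more convenient.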

    Maybe all this is not what you are looking for, nevertheless, it is an option.

    Good luck!

    • #3
      Thank you very much for your reply. You are right about the portion of code for the longitudinal design; I meant something more like this:
      Code:
      drop id
      rename school id
      set seed 1234
      gen time=runiform(1,10)
      I am definitely going to take a look at UIRT.
      Best,

      • #4
        So, your time is a continuous variable? I was expecting a few levels, like students measured on a couple of occasions...

        Anyhow, if you decide to generate conditioned PVs with UIRT, I recommend that you first start with point estimates of theta, to make sure that the multilevel model you pass to MIXED converges to reasonable output. Generating PVs conditioned on a complicated multilevel model may take a long time if your dataset is large or the multilevel model does not converge well. So something like:
        Code:
        uirt q*, theta(, eap)
        mixed theta time || id: time
        And if MIXED gives reasonable output go for the PVs.
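One way to judge whether the output is reasonable is sketched below. It assumes that the theta variable was created by the uirt call with the eap option shown above; the stored-estimate names ri and rs are hypothetical. It checks the convergence flag that mixed stores in e(converged) and compares the random-slope model with a random-intercept-only model by a likelihood-ratio test.

```stata
* Sketch: check convergence and whether a random slope is supported
* before generating PVs. Assumes theta comes from -uirt q*, theta(, eap)-.
mixed theta time || id:            // random intercept only
estimates store ri
mixed theta time || id: time       // random intercept and random slope
estimates store rs
if e(converged) != 1 {
    display as error "random-slope model did not converge"
}
lrtest ri rs                       // LR test of the added random slope
```

Since the null hypothesis places a variance on the boundary of the parameter space, the reported p-value is conservative and is best read as a rough guide.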
        Let me know if you encounter any problems. I am happy to help.

        • #5
          In reply to Bartosz Kondratek's post #2:
          Point of information: If you go this route, I'd recommend manually mi setting the data and using the multiple imputation framework. You'd declare the plausible values as imputations. I haven't done this myself yet, so I have no specific syntax to offer, but the mi commands have the facility to import data as mi data.
          Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

          When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

          • #6
            Thank you for joining the thread. Regarding:
            "Point of information: If you go this route, I'd recommend manually mi setting the data and using the multiple imputation framework. You'd declare the plausible values as imputations. I haven't done this myself yet, so I have no specific syntax to offer, but the mi commands have the facility to import data as mi data."
            Could you please provide some guidance on how to set up mi to use PVs provided by a user? Let us take a simple example:
            Code:
            webuse masc2, clear
            qui uirt q*, theta(,pv(5) pvreg(female))
            regress pv_1 female
            How can I convince mi to run the regression in the third line of the code above, with the plausible values pv_1-pv_5 treated as multiple imputations of the missing theta variable? I am asking mainly for myself: I never figured it out and used either https://ideas.repec.org/c/boc/bocode/s456951.html or programmed something myself. But if Bastien decides to go this route, it will surely be a useful example for him as well. And I suppose lots of other people have similar datasets in wide format to analyze (PISA, TIMSS, PIRLS, etc.).
            Last edited by Bartosz Kondratek; 04 Apr 2022, 10:27.

            • #7
              In reply to Bartosz Kondratek's post #6:
              Again, I've never done mi import myself. However, building off the example given in the manual, try this:

              Code:
              mi import wide, imputed(pv = pv_1 pv_2 pv_3 pv_4 pv_5)
              mi estimate: regress pv i.female
              Then, for Bastien's information, I haven't explored how to build the syntax for an explanatory IRT model with random effects (I think we all know what you mean, but I think this is a more precise title). I think the correct SEM example to base the syntax off would be example 38.
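For what it is worth, gsem's multilevel notation writes a random slope as an interaction between a continuous covariate and a group-level latent variable (c.var#Latent[group]). An untested sketch of how that might be applied to the model from post #1 follows. It places the random slope M2[id] in the item equations with loadings fixed at 1, mirroring how M1[id] enters there, instead of making it a predictor of MathAb. Whether this is identified and converges on a given dataset is an assumption that has not been checked here.

```stata
* Untested sketch: random slope for time via gsem's c.var#Latent[group] notation.
* M2[id] is the student-level random slope; it enters the item equations with
* loadings fixed at 1, in the same way as the random intercept M1[id].
gsem (q1-q8 <- MathAb@1 M1[id]@1 c.time#M2[id]@1) (MathAb <- time), logit
```

By default gsem also estimates the covariance between M1[id] and M2[id]; if the model struggles to converge, constraining that covariance to zero may be worth trying.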
              Last edited by Weiwen Ng; 04 Apr 2022, 17:08.

              • #8
                In reply to Weiwen Ng's post #7:
                Brilliant. I was stuck with mi because I was trying to figure out how to define the data with mi set, and somehow never noticed mi import wide. The only thing your example is missing is that an empty pv variable must be present in the dataset before mi import wide is called. I will call this unobserved variable theta, and the complete working example is:
                Code:
                webuse masc2, clear
                set seed 31415
                uirt q*, theta(,pv(5) pvreg(female))
                gen theta=.
                mi import wide, imputed(theta = pv_1 pv_2 pv_3 pv_4 pv_5) clear
                mi estimate: regress theta i.female
                The output is exactly what it should be. This is a game changer. Thanks!
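The same recipe should carry over to the longitudinal model discussed earlier in the thread, since mixed is among the estimation commands that mi estimate supports. The following is a sketch only, assuming a long-format dataset with items q*, a time variable and a student identifier id, as in post #2. The local macro pvs is a hypothetical helper that collects the PV variable names.

```stata
* Sketch: the mi recipe above applied to the longitudinal model from post #2.
* Assumes a long dataset with items q*, a time variable and a student id.
local npv = 10
set seed 31415
uirt q*, theta(, pv(`npv') pvreg(time || id: time))
gen theta = .                          // placeholder for the unobserved ability
local pvs
forvalues m = 1/`npv' {
    local pvs `pvs' pv_`m'             // build the list pv_1 ... pv_10
}
mi import wide, imputed(theta = `pvs') clear
mi estimate: mixed theta time || id: time
```

With this approach mi estimate does the pooling, so no manual averaging over the plausible values is needed.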
