Longitudinal IRT model

Bastien Perrot

Join Date: Jun 2019

Posts: 2
#1

29 Mar 2022, 07:03

Hi,

I am trying to estimate a longitudinal IRT model using GSEM in Stata 17. Using the example data from https://www.stata-press.com/data/r17/gsem_cfa, I can estimate a cross-sectional IRT model that accounts for students being nested within schools (example 30g from the SEM manual):

Code:

gsem (q1-q8 <- MathAb M1[school]), logit

Now, let’s say data are longitudinal and that students have repeated measurements of q1-q8 over time:

Code:

gen time = school
drop id
rename school id

To model the fact that Math Ability may depend on time, we can write:

Code:

gsem (q1-q8 <- MathAb@1 M1[id]@1) (MathAb<-time), logit

In this model, M1[id] is a random intercept (i.e. a latent variable with mean 0 and variance var(M1[id])) that captures the between-student variability.
What I would like to do is add a random slope to this model, to specify that the effect of time on Math Ability may vary across students. I was thinking of doing it this way:

Code:

gsem (q1-q8 <- MathAb@1 M1[id]@1) (MathAb<-time M2[id]) , logit

but this generates the following error message: “MathAb may not be the destination of a path from M1[id]”. I understand that it is not possible to estimate the variances of both random effects without setting additional constraints, but even when constraining the variance of the random intercept to 1 with the option var(M1[id]@1), I still get the same message.
Is there any way to add such a random slope?

Best,
Bartosz Kondratek

Join Date: Oct 2016

Posts: 10
#2

02 Apr 2022, 11:43

Hi,
I don't have a solution for your specific GSEM problem, as I am not very familiar with GSEM in Stata. But I have done multiple longitudinal IRT analyses with my package, UIRT, and from your description it looks like it is something that can be handled with it. You would however end up with a set of plausible values (PVs) conditioned with a latent regression according to the model you specify. That would require additional analysis of these PVs.

Firstly, just a small thing I noticed in your code. With this:

Code:

gen time = school
drop id
rename school id

...you end up with both the time and the id variables being defined as the same thing (the original school variable). Was it really your intention?

Now to the point. From your description I understand that you have multiple measurements of student abilities from some linked tests, and that you want to fit a two-level latent IRT model, with the second level accounting for measurements being nested within students. You then want to see the effect of the time variable on ability, and you are interested not only in the fixed effect but also in a possible random slope. This corresponds to the following latent model:

\theta_{ij} = \beta_0 + \beta_1 time_{ij} + e_j + \xi_j time_{ij} + e_{ij}

where \beta_0 and \beta_1 are fixed effects, e_j is the random intercept, \xi_j is the random slope, and e_{ij} is the residual random part of student ability (i indexes measurement occasions and j indexes students).
I think that for this model to be identified you need more than two measurement occasions (if time takes only two values, e_{ij} is completely determined by the previous terms). You did not mention how many measurement occasions you have, but I assume there are more than two.
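
As a rough illustration of the remark about measurement occasions, the second moments implied by the latent model can be written out. This is only a sketch under assumptions not stated above: the random intercept, the random slope and the residual have mean zero, the residual is independent of the other two, \sigma_0^2 and \sigma_1^2 denote the variances of the random intercept and random slope, \sigma_{01} their covariance, and \sigma_e^2 the residual variance.

```latex
\begin{aligned}
\operatorname{Var}(\theta_{ij} \mid time_{ij})
  &= \sigma_0^2 + 2\, time_{ij}\, \sigma_{01} + time_{ij}^2\, \sigma_1^2 + \sigma_e^2 \\
\operatorname{Cov}(\theta_{ij}, \theta_{i'j} \mid time_{ij}, time_{i'j})
  &= \sigma_0^2 + (time_{ij} + time_{i'j})\, \sigma_{01} + time_{ij}\, time_{i'j}\, \sigma_1^2,
  \qquad i \neq i'
\end{aligned}
```

If every student is observed at the same two time points, these expressions give only three distinct second moments (two variances and one covariance) for the four parameters \sigma_0^2, \sigma_{01}, \sigma_1^2 and \sigma_e^2, which is one way to see why more occasions help.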

The code to obtain a set of, say, 10 plausible values conditioned according to such a latent regression under 2PL model would require running this:

Code:

ssc install uirt // in case not installed
local npv = 10 // number of plausible values
uirt q* ,theta(, pv(`npv') pvreg(time || id: time) )
*Analyzing the coefficients for each pv:
foreach pv of numlist 1/`npv' {
    mixed pv_`pv' time || id: time
    * some syntax to grab estimates for averaging here
}

You would then have to combine the results according to Rubin's rules to obtain the final estimates and their standard errors. Also, if the standard errors of the random effects are especially important to you, additional steps might be necessary to compute them.
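
A minimal sketch of what the pooling step could look like for the fixed effect of time is given below. It is an illustration rather than part of UIRT: it assumes that the variables pv_1 to pv_10, id and time are already in memory, and it pools only the coefficient of time (point estimate and standard error), not the random-effects parameters or the degrees of freedom.

```stata
* Sketch: Rubin's rules for the coefficient of time across plausible values.
* Assumes pv_1 ... pv_`npv', id and time already exist in memory.
local npv = 10
tempname qbar ubar bvar tvar
scalar `qbar' = 0    // average point estimate
scalar `ubar' = 0    // average within-PV variance
scalar `bvar' = 0    // between-PV variance
forvalues m = 1/`npv' {
    quietly mixed pv_`m' time || id: time
    local q`m' = _b[time]
    scalar `qbar' = `qbar' + _b[time]/`npv'
    scalar `ubar' = `ubar' + (_se[time]^2)/`npv'
}
forvalues m = 1/`npv' {
    scalar `bvar' = `bvar' + (`q`m'' - `qbar')^2/(`npv' - 1)
}
* total variance = within + (1 + 1/M) * between
scalar `tvar' = `ubar' + (1 + 1/`npv')*`bvar'
display "pooled coefficient of time: " `qbar'
display "pooled standard error:      " sqrt(`tvar')
```

The second loop is needed because the between-PV variance depends on the pooled mean, which is only known after all models have been fitted.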

Maybe all this is not what you are looking for, nevertheless, it is an option.

Good luck!
Bastien Perrot

Join Date: Jun 2019

Posts: 2
#3

04 Apr 2022, 07:01

Thank you very much for your reply. You are right about the portion of code for the longitudinal design as I meant something more like this:

Code:

drop id
rename school id
set seed 1234
gen time=runiform(1,10)

I am definitely going to take a look at UIRT.
Best,
Bartosz Kondratek

Join Date: Oct 2016

Posts: 10
#4

04 Apr 2022, 07:29

So, your time is a continuous variable? I was expecting a few levels, like students measured at a couple of occasions...

Anyhow. If you decide to go for generating conditioned PVs with UIRT, I recommend that you first start with point estimates of theta, to make sure that your multilevel conditioning statement of MIXED converges to a reasonable output. It may take time to generate PVs conditioned with a complicated multilevel model if your dataset is large or the multilevel model does not converge well. So something like:

Code:

uirt q*, theta(, eap)
mixed theta time || id: time

And if MIXED gives reasonable output go for the PVs.
Let me know if you encounter any problems. I am happy to help.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#5

04 Apr 2022, 08:17

Originally posted by Bartosz Kondratek (#2)

Point of information: If you go this route, I'd recommend manually mi setting the data and using the multiple imputation framework. You'd declare the plausible values as imputations. I haven't done this myself yet, so I have no specific syntax to offer, but the mi commands have the facility to import data as mi data.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Bartosz Kondratek

Join Date: Oct 2016

Posts: 10
#6

04 Apr 2022, 09:12

Thank you for joining the thread. Regarding:

Point of information: If you go this route, I'd recommend manually mi setting the data and using the multiple imputation framework. You'd declare the plausible values as imputations. I haven't done this myself yet, so I have no specific syntax to offer, but the mi commands have the facility to import data as mi data.

Could you please provide some guidance on how to set up mi to use PVs provided by a user? Let us take a simple example:

Code:

webuse masc2, clear
qui uirt q*, theta(,pv(5) pvreg(female))
regress pv_1 female

How can I convince mi to run the regression in the third line of the code above, with the plausible values pv_1-pv_5 treated as multiple imputations of the missing theta variable? I am asking mainly for myself: I never figured it out and either used https://ideas.repec.org/c/boc/bocode/s456951.html or programmed something myself. But if Bastien decides to go this route, it will surely be a useful example for him as well. And I suppose lots of other people have similar datasets in wide format to analyze (PISA, TIMSS, PIRLS, etc.).

Last edited by Bartosz Kondratek; 04 Apr 2022, 09:27.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#7

04 Apr 2022, 16:02

Originally posted by Bartosz Kondratek (#6)

Again, I've never done mi import myself. However, building off the example given in the manual, try this:

Code:

mi import wide, imputed(pv = pv_1 pv_2 pv_3 pv_4 pv_5)
mi estimate: regress pv i.female

Then, for Bastien's information, I haven't explored how to build the syntax for an explanatory IRT model with random effects (I think we all know what you mean, but I think this is a more precise title). I think the correct SEM example to base the syntax off would be example 38.

Last edited by Weiwen Ng; 04 Apr 2022, 16:08.

Bartosz Kondratek

Join Date: Oct 2016

Posts: 10
#8

04 Apr 2022, 16:52

Originally posted by Weiwen Ng (#7)

Brilliant. I was stuck with mi because I was trying to figure out how to define it with mi set, and somehow never noticed mi import wide. The only thing missing from your example is that an empty pv variable must be present in the dataset before mi import wide is called. I will call this unobserved variable theta; the complete working example is:

Code:

webuse masc2, clear
set seed 31415
uirt q*, theta(,pv(5) pvreg(female))
gen theta=.
mi import wide, imputed(theta = pv_1 pv_2 pv_3 pv_4 pv_5) clear
mi estimate: regress theta i.female

The output is exactly what it should be. This is a game changer. Thanks!
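
For the longitudinal question that started the thread, the same pattern could presumably be combined with the multilevel conditioning model suggested earlier. The lines below are an untested sketch, not a verified recipe: they assume a long-format dataset with items q1-q8, a student identifier id and a time variable already in memory, and they reuse the uirt call pattern from post #2 with five plausible values.

```stata
* Sketch only: assumes q1-q8, id and time are in memory (long format).
set seed 31415
* generate 5 PVs conditioned on the multilevel latent regression
uirt q*, theta(, pv(5) pvreg(time || id: time))
* empty placeholder for the unobserved ability
gen theta = .
* declare the PVs as imputations of theta
mi import wide, imputed(theta = pv_1 pv_2 pv_3 pv_4 pv_5) clear
* fit the random-slope model on each PV and pool the results
mi estimate: mixed theta time || id: time
```

As noted in post #4, it may be worth checking first that mixed converges to a reasonable output on EAP point estimates before generating the plausible values.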