
  • Item Response Theory - calculating individual theta scores

    I'm looking for some advice on calculating individual theta scores for 900 people who've responded to a healthcare questionnaire, as part of a factor analysis/multidimensional item response theory analysis I'm carrying out. Can anyone help?

    We've run rotated (promax oblique) factor analyses and found 2 factors, each of which loads ~10 questions well.
    I've run the multidimensional IRT for each of the factors and have results on discrimination and item difficulty; I'm now looking to calculate each participant's total theta score for each factor. We have individuals' total numerical scores (for the questions that correspond to each factor), but I'm not sure how to calculate each person's theta score.

    If anyone has any suggestions, they would be really appreciated. Thanks for your thoughts!

  • #2
    What are you using to estimate the IRT model? If it's irt or gsem, you want to predict the latent trait. In irt, you would use
    Code:
    predict theta, latent se(thetaSE)
    If using gsem, you would also use predict:
    Code:
    predict theta*, latent se(thetaSE*)
    The gsem predict code might need to be massaged a bit, but you want the empirical Bayes estimates of the latent trait and ideally, the standard errors as well.
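    For concreteness, here's a minimal sketch of the whole sequence on one of Stata's stock datasets (the charity data used later in this thread; your variable names will differ):

    Code:
    * sketch: fit a graded response model, then predict each person's theta
    webuse charity, clear
    irt grm ta1-ta5
    predict theta, latent se(thetaSE)
    summarize theta thetaSE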



    • #3
      Dear Erik,

      Thanks so much for your consideration.

      I'm using an IRT graded response model; my code is as follows:

      (i) For the first factor identified in the rotated promax oblique factor analysis:

      Code:
      irt grm q2 q3 q6 q7 q8 q10 q11 q12 q14 q17 q18 q19 q20 q21 q23 q24   // these questions load on the first factor
      predict theta_1, latent

      (ii) For the second factor, similarly:

      Code:
      irt grm q15 q16 q28 q29 q30 q31
      predict theta_2, latent



      I seem to be getting some values, but I'm not sure if they're accurate.
      The code looks similar to yours; what does the se(thetaSE) option do?

      Thanks again! William



      • #4
        Dear Erik,

        I've attached a screenshot of my results:

        You can see the theta_1 and theta_2 values in the right-hand columns.

        My concern is that, on the left side, the mean_f1 and mean_f2 columns (the mean calculated scores for the questions loading on each factor) and the total_f1_avg and total_f2_avg columns (the total scores for the questions loading on each factor, based on the mean * no. of questions) don't correspond perfectly to the theta estimates.

        For example: the values for total_f1_avg are identical for row 1 and row 2, but their theta_1 values differ.
        This is worrying for someone new to this style of research! Thanks so much again - Happy Easter



        • #5
          ** Correction: the values for total_f1_avg are identical for row 2 and row 3, but their theta_1 values differ.



          • #6
            Simple averages of the items in the latent trait should be strongly correlated with the latent trait, but because the latent traits are estimated using empirical Bayes prediction, we would not expect two individuals to have the same trait score even if their simple averages are equivalent. The IRT model takes into account the thresholds ("difficulties" in the IRT world) in a way the simple average does not, and an individual's item response pattern relative to the model-estimated thresholds plays a role in their latent trait score. I recommend reading Stata's IRT documentation carefully, as it covers a lot of this.
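            If you want to see this in your own data, a quick check along these lines may help (a sketch; raw_f1 is an illustrative name, and I'm borrowing the factor-1 item list from your post above):

            Code:
            * simple sum score for the factor-1 items
            egen raw_f1 = rowtotal(q2 q3 q6 q7 q8 q10 q11 q12 q14 q17 q18 q19 q20 q21 q23 q24)
            * strongly but not perfectly correlated with the empirical Bayes prediction
            corr raw_f1 theta_1
            * respondents with identical sum scores can still differ on theta_1
            sort raw_f1
            list raw_f1 theta_1 in 1/10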

            Also note that unless you estimate all the constructs/latent traits in the same model using gsem, you are technically not doing multidimensional IRT; instead you are estimating a single, unidimensional latent trait in each IRT model.

            Edit: the se() option gives you the standard errors of an individual's latent trait estimate. These tell you the accuracy of the predicted theta (theta-hat) with respect to the person location parameter, theta. It is useful to know how much uncertainty there is in the theta-hat estimates from your IRT model, and depending on what you do with these theta-hat scores, you might consider using the standard error as a weight in any predictive modeling.
            Last edited by Erik Ruzek; 11 Apr 2020, 13:18.



            • #7
              That makes great sense, thanks very much.
              I suspected that the difference was due to the "real world difficulties" accounted for in IRT but not in the mean value.

              I have correlations between the IRT score and the mean value of 0.71 for factor 1 and 0.63 for factor 2 (which I interpret as strong and moderate-to-strong, respectively).

              I've been reading the Stata PDF on IRT, too - it's been great, but there is a lot in there.

              Thanks so much, again, for clarifying. William



              • #8
                Originally posted by Erik Ruzek View Post
                What are you using to estimate the IRT model? ... you want the empirical Bayes estimates of the latent trait and ideally, the standard errors as well.
                Hi Erik Ruzek,

                I also have a question on how to compute a disability score using the irt pcm command. Specifically, I have 10 questions asking about levels of difficulty (0 = no difficulty to 4 = cannot do at all) in executing several functioning activities.

                The following is the code I use to predict a disability score from those 10 questions:
                Code:
                irt pcm k1-k10
                predict disability, latent se(disability_se)
                So my question is: is the disability variable predicted by the command above the score I want to obtain? Should I interpret it as higher disability score = more severe disability? I also notice that the disability score takes on negative values; how should I interpret those?

                Data
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input float(k1 k2 k3 k4 k5 k6 k7 k8 k9 k10)
                0 0 0 0 0 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                3 3 4 2 4 3 2 2 3 4
                0 0 0 0 0 0 0 0 0 0
                0 0 2 0 2 2 0 0 0 0
                2 3 4 0 1 1 0 0 0 0
                0 2 1 1 0 1 0 0 0 0
                0 0 1 0 2 1 1 0 0 0
                4 2 4 0 2 2 0 3 0 3
                0 0 3 1 2 0 0 0 0 0
                3 3 2 0 3 1 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                3 3 3 2 2 2 3 3 1 2
                1 0 0 0 0 2 0 0 0 0
                2 2 2 0 2 2 0 2 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 2 0 2 2 0 0 0 0
                0 0 0 0 1 1 0 0 0 0
                3 4 3 0 4 2 0 0 2 2
                0 1 2 0 2 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 1 0 0 0 0 0 0 0
                2 3 0 3 3 2 2 0 2 0
                0 0 0 0 0 0 0 0 0 0
                3 3 3 3 3 3 0 2 0 0
                4 3 4 3 4 3 3 2 3 3
                0 0 1 0 2 1 0 0 0 0
                0 4 1 2 0 1 0 0 0 0
                3 0 2 0 2 2 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                2 2 1 0 2 2 0 0 0 0
                0 0 0 0 0 1 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                1 0 2 0 0 2 0 0 0 0
                3 3 3 3 3 3 3 2 3 3
                0 2 0 0 0 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 2 0 0 2 0 0 0 0
                0 4 0 0 2 0 0 0 0 0
                4 3 4 0 1 2 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                2 0 2 0 2 0 0 0 0 0
                0 1 3 0 2 2 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                1 0 0 0 0 0 0 0 0 0
                0 0 2 1 1 1 3 0 3 0
                2 3 2 1 3 2 0 0 0 0
                0 0 2 0 1 2 0 0 0 0
                4 4 4 4 4 4 4 4 4 4
                0 0 2 0 1 1 0 0 0 0
                1 0 1 0 1 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                1 0 0 0 2 0 0 0 0 0
                3 3 3 0 3 3 0 0 0 0
                2 3 2 2 3 2 0 0 0 0
                4 3 3 3 4 4 4 0 0 0
                4 4 4 4 4 4 4 4 4 4
                1 0 3 0 2 2 1 0 1 1
                0 0 1 0 0 0 0 0 0 0
                0 0 0 0 0 1 0 0 0 0
                0 3 3 2 2 2 1 3 2 2
                4 4 4 2 4 3 3 3 3 4
                0 0 4 1 2 1 1 0 0 0
                0 1 2 0 1 2 0 0 0 2
                4 4 4 2 4 3 0 3 4 4
                0 0 0 0 1 1 0 0 0 0
                1 0 0 0 0 1 0 0 0 0
                2 4 4 3 2 2 0 0 0 0
                0 0 2 0 2 3 1 0 0 0
                0 0 0 0 2 1 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                1 2 2 0 2 1 0 1 1 1
                1 0 1 0 2 2 0 1 0 0
                0 0 0 0 1 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                4 0 3 0 0 2 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 2 0 2 1 2 0 0 0
                0 2 2 0 2 1 0 0 0 0
                1 0 0 0 0 2 0 0 0 0
                3 3 3 0 3 2 1 0 4 4
                3 0 0 0 3 0 0 0 2 1
                4 4 4 0 4 2 0 0 0 4
                0 2 4 0 2 3 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                0 0 0 0 0 0 0 0 0 0
                3 2 3 2 2 2 2 0 0 2
                4 4 4 4 4 4 4 4 4 4
                2 4 2 1 2 3 1 3 3 2
                end
                label values k1 k1
                label def k1 0 "No difficulty", modify
                label def k1 1 "Mild", modify
                label def k1 2 "Moderate", modify
                label def k1 3 "Severe", modify
                label def k1 4 "Can not do at all", modify
                label values k2 k2
                label def k2 0 "No difficulty", modify
                label def k2 1 "Mild", modify
                label def k2 2 "Moderate", modify
                label def k2 3 "Severe", modify
                label def k2 4 "Can not do at all", modify
                label values k3 k3
                label def k3 0 "No difficulty", modify
                label def k3 1 "Mild", modify
                label def k3 2 "Moderate", modify
                label def k3 3 "Severe", modify
                label def k3 4 "Can not do at all", modify
                label values k4 k4
                label def k4 0 "No difficulty", modify
                label def k4 1 "Mild", modify
                label def k4 2 "Moderate", modify
                label def k4 3 "Severe", modify
                label def k4 4 "Can not do at all", modify
                label values k5 k5
                label def k5 0 "No difficulty", modify
                label def k5 1 "Mild", modify
                label def k5 2 "Moderate", modify
                label def k5 3 "Severe", modify
                label def k5 4 "Can not do at all", modify
                label values k6 k6
                label def k6 0 "No difficulty", modify
                label def k6 1 "Mild", modify
                label def k6 2 "Moderate", modify
                label def k6 3 "Severe", modify
                label def k6 4 "Can not do at all", modify
                label values k7 k7
                label def k7 0 "No difficulty", modify
                label def k7 1 "Mild", modify
                label def k7 2 "Moderate", modify
                label def k7 3 "Severe", modify
                label def k7 4 "Can not do at all", modify
                label values k8 k8
                label def k8 0 "No difficulty", modify
                label def k8 1 "Mild", modify
                label def k8 2 "Moderate", modify
                label def k8 3 "Severe", modify
                label def k8 4 "Can not do at all", modify
                label values k9 k9
                label def k9 0 "No difficulty", modify
                label def k9 1 "Mild", modify
                label def k9 2 "Moderate", modify
                label def k9 3 "Severe", modify
                label def k9 4 "Can not do at all", modify
                label values k10 k10
                label def k10 0 "No difficulty", modify
                label def k10 1 "Mild", modify
                label def k10 2 "Moderate", modify
                label def k10 3 "Severe", modify
                label def k10 4 "Can not do at all", modify
                Thank you.

                Comment


                • #9
                  Originally posted by Jason Bourne View Post

                  Hi Erik Ruzek,

                  I also have a question on how to compute a disability score using the irt pcm command. ... I also notice that the disability score takes on negative values; how should I interpret those? ...

                  Thank you.
                  It may be worth reading Stata's introduction to item response theory if you haven't done so already. In any case, IRT assumes that the latent trait has a standard normal distribution: the mean is 0 and the standard deviation is 1. So, if someone's disability score is 0, they have an average level of disability. Here, because all the items are scored such that higher responses indicate more disability, a negative disability score means the respondent has less than average disability: -1 means about 1 standard deviation below the mean, so quite a lot less disabled than average; -2 would be much, much less.
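                  Because the latent trait is assumed standard normal, you can also express each person's score as an approximate percentile of the trait distribution; a minimal sketch (disability_pct is an illustrative name):

                  Code:
                  * normal() is the standard normal CDF, so this maps theta to a percentile
                  gen disability_pct = normal(disability)*100
                  label variable disability_pct "Percentile of latent disability"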

                  As a side note, once you get a good understanding of IRT's basics, you might want to Google unipolar item response models. The gist of these models is that while verbal or math ability are probably roughly normally distributed, some traits, like psychopathology or functional limitations, might not be; they are probably unipolar, i.e. supported on 0 to +infinity. This is a new theoretical area for IRT, however, and Stata's irt commands don't estimate these models (nor do any of the R packages I'm currently familiar with). In principle, you could do Bayesian estimation for these models in Stata.



                  • #10
                    Dear Statalist,

                    I have a quick follow-up question to my earlier posts in this thread, regarding how to convert my individual theta scores back to standardised "expected" questionnaire scores, if someone could help.

                    I've run rotated promax oblique factor analysis, identified 2 factors that load ~10 questions well, and used IRT GRM models (following the IRT PDF, pp. 14-18).
                    I've generated individual theta estimates for each respondent using:

                    Code:
                    predict theta, latent se(thetaSE)
                    We've run linear regression models (dependent variable = theta score; independent variables = disease type and other demographic confounders). However, we found that the theta estimate may be harder for readers to interpret intuitively than an estimated change in overall score for that section. Similar to the way TCCs show which cumulative raw score corresponds to which theta level: is there a way to calculate estimated total questionnaire scores for each participant (using the theta estimate), which we could use in our model? Any help would be really great. And happy Easter! William



                    • #11
                      Originally posted by William Mitchell View Post
                      ... is there a way to calculate estimated total questionnaire scores for each participant (using the theta estimate), which we could use in our model? ...
                      You can calculate the expected total scores using the predict post-estimation command. The solution below runs predict for each item. When you do that on the stock data, it supplies 4 variables per item, corresponding to the probability of endorsing each level of the question. Here, each of the questions is coded 0 to 3, so the expected item score is 0 * P(item = 0) + 1 * P(item = 1) + 2 * P(item = 2) + 3 * P(item = 3). If your items are scored 1-4, you'd need to modify the bolded gen line calculating an expected score for each item.

                      Code:
                      webuse charity
                      irt grm ta1-ta5
                      foreach v of varlist ta1-ta5 {
                          // one probability variable per response category: p_`v'_1 ... p_`v'_4
                          predict p_`v'_*, outcome(`v')
                          // expected item score: 0*P(0) + 1*P(1) + 2*P(2) + 3*P(3)
                          gen score_`v' = p_`v'_2 + p_`v'_3*2 + p_`v'_4*3
                          drop p_`v'_*
                      }
                      A better programmer than I could surely come up with a more generalizable solution, probably using the levelsof command.
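                      For what it's worth, here is a sketch of one such generalization (it assumes the response categories are the integer values observed in the data and relies on predict numbering the probability variables from lowest to highest category; expected_total is an illustrative name):

                      Code:
                      webuse charity, clear
                      irt grm ta1-ta5
                      foreach v of varlist ta1-ta5 {
                          predict p_`v'_*, outcome(`v')
                          gen score_`v' = 0
                          levelsof `v', local(cats)          // distinct categories, sorted ascending
                          local j = 1
                          foreach c of local cats {
                              // weight the j-th probability variable by its category value `c'
                              replace score_`v' = score_`v' + `c' * p_`v'_`j'
                              local ++j
                          }
                          drop p_`v'_*
                      }
                      egen expected_total = rowtotal(score_*)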

                      Now, I see you are predicting your IRT scores, then using the predicted scores in regression. When you do this, you ignore the uncertainty in the level of the latent trait - recall that you predicted both a mean and a standard error of each latent trait for each person. Can I perhaps introduce you to the concept of explanatory IRT? Here's an open access article that explains and applies explanatory IRT to some real data. Here is one very dense book chapter by Wilson et al. Here's a link to a (likely gated) article by Briggs.

                      In Stata, you might type something like this:

                      Code:
                      gsem (Disability -> k1-k10, ologit) (Disability <- i.disease_type), variance(Disability@1)
                      That code will inherently give you a GRM model. You can fit the other IRT models for ordered items in gsem, the generalized structural equation modeling command; they require issuing a number of constraints, and unfortunately I'm not familiar with those. Note that if you do the above, the values for the cutpoints will come out looking different from those of the regular irt command. As discussed in the manual, this is because gsem uses a different parameterization than the usual IRT one; you just divide each cutpoint by (-1 * the discrimination parameter) to convert to the usual IRT parameterization.
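                      If you do fit the explanatory model in gsem, you can still recover each person's empirical Bayes score afterwards; a minimal sketch (disability_hat and disability_se are illustrative names):

                      Code:
                      * empirical Bayes predictions of the latent trait from the gsem fit above
                      predict disability_hat, latent se(disability_se)
                      * compare mean predicted disability across disease types
                      tabstat disability_hat, by(disease_type) statistics(mean sd)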



                      • #12
                        Dear Dr Ng,

                        Thanks for your reply, and for outlining the preference not to "double post" - I apologize.

                        I'm a bit confused about how to modify the bolded line in your code for my 5-point Likert-scale responses.

                        Assuming my code ran:

                        Code:
                        irt grm q2 q3 q6 q7 q10 q11 q12 q14 q17 q18 q19 q20 q21 q23 q24
                        predict theta_1, latent
                        foreach v of varlist q2 q3 q6 q7 q10 q11 q12 q14 q17 q18 q19 q20 q21 q23 q24 {
                            predict p_`v'_*, outcome(`v')
                            gen score_`v' = ??
                            drop p_`v'_*
                        }
                        How would I modify the bolded line?

                        Thanks again. William



                        • #13
                          Originally posted by William Mitchell View Post
                          ... I'm a bit confused about how to modify the bolded line in your code for my 5-point Likert-scale responses. ...
                          Actually, I don't yet have my doctorate, although this is in progress!

                          If you extract the code from within the foreach loop and run it on one question, you'll see that it predicts one probability for each response category. I encourage you to give it a shot to see how it works.

                          The foreach syntax is basically a loop: Stata goes through the loop once for each element specified in the header. You could have written out the full list of variables:

                          Code:
                          foreach v of varlist ta1 ta2 ta3 ta4 ta5 {
                          ...
                          }
                          Alternatively, because these form a variable list, you can take advantage of varlist notation and refer to all variables from ta1 to ta5 (as long as they're adjacent in the dataset) with the hyphen syntax, ta1-ta5. Each time through the loop, Stata replaces `v' with the current string or number the loop is referring to: the first time through, `v' is replaced by "ta1"; the second time, by "ta2".
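                          To see the substitution in action, you can run a trivial loop like this (a sketch using the stock charity items):

                          Code:
                          * each pass through the loop, `v' holds the next variable name
                          foreach v of varlist ta1-ta5 {
                              display "`v'"
                          }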

                          In the case of the stock data example I used, the response categories are coded 0, 1, 2, and 3. Hence, the expected score is 0 * P(ta_1 = 0 | theta) + 1 * P(ta_1 = 1 | theta) + 2 * P(ta_1 = 2 | theta) + 3 * P(ta_1 = 3 | theta).

                          Of course, 0 * anything is 0, so that term got omitted. If your response categories are coded 1 through 5, your code will look a bit different, e.g. 1 * P(Q1 = 1) + 2 * P(Q1 = 2) + ... + 5 * P(Q1 = 5).

                          Also, above, I was generating an expected score for each question. In case you haven't caught this, I completely forgot to add that you then want to sum up all of those expected item scores to get an expected total score.
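                          In Stata, that final step might look like this (a sketch; expected_total is an illustrative name, and it assumes the score_* variables created in the loop above):

                          Code:
                          * row-wise sum of the per-item expected scores
                          egen expected_total = rowtotal(score_*)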



                          • #14
                            Dear Mr Ng,

                            I apologize for getting those credentials confused!

                            Thanks so much for your explanation. I have tried experimenting with the charity Stata dataset and seem to get results, but I'm having trouble with my own data.

                            Specifically, with this code:

                            Code:
                            foreach v of varlist q2 q3 q6 q7 q10 q11 q12 q14 q17 q18 q19 q20 q21 q23 q24 {
                                predict p_`v'_*, outcome(`v')
                                gen score_`v' = p_`v'_2 + p_`v'_3*2 + p_`v'_4*3 + p_`v'_5*4 + p_`v'_6*5
                                drop p_`v'_*
                            }
                            (option pr assumed)
                            (option conditional(ebmeans) assumed)
                            (using 7 quadrature points)
                            I get the following error message:

                            Code:
                            p_q2_6 not found
                            r(111);
                            In particular, I don't think I understand the "1 * P(Q1 = 1) + 2 * P(Q1 = 2) + ... + 5 * P(Q1 = 5)" part correctly.

                            I note that in your "charity" example, with options 0-3, you have the equation "0 * P(ta_1 = 0 | theta) + 1 * P(ta_1 = 1 | theta) + 2 * P(ta_1 = 2 | theta) + 3 * P(ta_1 = 3 | theta)": 4 terms in the equation, dropping the first (* 0), giving you this line of code (3 terms, with the last term being p_`v'_(n+1)*n, where n = the highest number on the rating scale):

                            Code:
                            gen score_`v' = p_`v'_2 + p_`v'_3*2 + p_`v'_4*3
                            I tried applying the same logic to mine: because my scale ranges from 1-5, I have the equation "1 * P(Q1 = 1) + 2 * P(Q1 = 2) + ... + 5 * P(Q1 = 5)" (with 5 terms), and my code ends with p_`v'_(n+1)*n (as above).

                            I'm not sure why I'm getting the error message. Perhaps my code should be written like this instead?

                            Code:
                            gen score_`v' = p_`v'_1 + p_`v'_2*2 + p_`v'_3*3 + p_`v'_4*4 + p_`v'_5*5
                            I just don't want to get inaccurate estimates.

                            Any clarification would be greatly appreciated. Thanks again

                            Warm regards

                            William



                            • #15
                              Originally posted by William Mitchell View Post
                              ... I tried applying the same logic to mine: because my scale ranges from 1-5, I have the equation "1 * P(Q1 = 1) + 2 * P(Q1 = 2) + ... + 5 * P(Q1 = 5)" (with 5 terms), and my code ends with p_`v'_(n+1)*n (as above). ...
                              William,

                              We tend to go by first names alone on Statalist.

                              Back to the code. Your code below is correct.

                              I'm not sure why I'm getting the error message. Perhaps my code must be written as this, instead?

                              Code:
                              gen score_`v' = p_`v'_1 + p_`v'_2*2 + p_`v'_3*3 + p_`v'_4*4 + p_`v'_5*5
                              Perhaps my notation was confusing. Let's go back to the coding. Recall that in the stock Stata dataset there are 5 questions, ta1 through ta5, each coded 0 to 3.

                              When you run this code after an IRT model (in gsem or irt grm):

                              Code:
                              predict p_ta1_*, outcome(ta1)
                              You'll end up with 4 variables: p_ta1_1, p_ta1_2, p_ta1_3, and p_ta1_4. Basically, Stata numbers those variables consecutively regardless of the underlying coding, so p_ta1_1 is the probability that ta1 = 0, or P(ta1 = 0). To be more persnickety, the more correct notation is P(ta1_i = 0 | theta_i), to indicate that this is the i-th respondent's probability given their value of theta.

                              Perhaps that caused some confusion.

                              In any case, your response categories are numbered 1 to 5, so the variables generated by predict will still be named p_q2_1, p_q2_2, ..., p_q2_5. However, p_q2_1 is now the probability that Q2 = 1.
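                              Putting it all together for your 1-to-5 items, the loop might read (a sketch; expected_total is an illustrative name):

                              Code:
                              foreach v of varlist q2 q3 q6 q7 q10 q11 q12 q14 q17 q18 q19 q20 q21 q23 q24 {
                                  predict p_`v'_*, outcome(`v')
                                  // categories are coded 1-5, and predict numbers the variables 1-5 to match
                                  gen score_`v' = p_`v'_1 + p_`v'_2*2 + p_`v'_3*3 + p_`v'_4*4 + p_`v'_5*5
                                  drop p_`v'_*
                              }
                              egen expected_total = rowtotal(score_*)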

