  • Benefits of a Rasch Model?

    Hello,
    my prof wants me to do a Rasch analysis for my paper. My aim is to check whether there are factors (like gender, age, etc.) that can explain whether someone is willing to answer political questions (coded 0/1).

    I got some nice data, but I'm not sure about the use of Rasch. I took roughly 30 variables with political questions and recoded them (0 = no answer given, 1 = answer given). I read the Stata FAQ about the Rasch model (http://www.stata.com/support/faqs/statistics/rasch-model/), but I find it rather confusing. In the tutorial, they want to figure out whether some math questions are harder than others. Why do I need a complex model here? I'd just look at the descriptive statistics and check the percentage answered correctly: the lower the rate, the harder the math question. Isn't that kind of obvious? I still don't get the benefits of a Rasch model here. Could someone explain? Btw, I use Stata 12.
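
    Just to show what I mean: the mean of a 0/1 variable in Stata is the share of 1s, so something like this (untested, with placeholder variable names) would already rank the questions:

    Code:
    * the mean of each 0/1 item = proportion who answered it,
    * so the lowest means flag the "hardest" questions
    summarize item1-item30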

    Thank you!

  • #2
    The problem with your simplification is that it doesn't account for the ability of the respondents, so the item statistics end up depending on which persons you happened to sample. For example, if you gave an item like:

    2 + 2 = ?
    a. 0
    b. 2
    c. 2i
    d. 4
    e. 4i

    At a MENSA convention, you wouldn't learn anything about the difficulty of the item because your person sample would all be clustered at the high end of ability (theta). Now give the same item to a room full of young children (e.g., 4-5 years old) and you get a completely different result (e.g., many of them would get it wrong). So, is the item difficult or not? In both situations, looking at the descriptive statistics alone provides a heavily context-dependent view of the item and of the ability of the respondents.

    In IRT models (as well as CTT models, if structured to do so), the difficulty of the items and the ability of the respondents are placed on the same scale. The difficulty parameter then gives you the point on that scale where the probability of a correct response is exactly 0.5 (e.g., if the difficulty parameter is 0.875 and a respondent has theta = 0.875, it is a coin flip whether they answer the question correctly).

    The bigger and more likely reason for your professor's suggestion is the independence between person and item parameters. In other words, if your data satisfy the Rasch model assumptions, the item parameters are independent of the persons responding, and you could sample any other items from the same domain and the people would still have the same value of theta.
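
    To make that 0.5 point concrete, the Rasch model for person i and item j is (in LaTeX notation):

    $\Pr(X_{ij} = 1 \mid \theta_i, \beta_j) = \frac{\exp(\theta_i - \beta_j)}{1 + \exp(\theta_i - \beta_j)}$

    so when $\theta_i = \beta_j$ the exponent is zero and the probability is exactly $\exp(0) / (1 + \exp(0)) = 0.5$.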

    Given that the only item parameter being estimated is difficulty, it isn't that complex a model. While this example isn't completely pertinent to Stata 12, I think it could help illustrate things a bit for you:

    Code:
    . webuse masc.dta, clear
    . qui: egen totalsc = rowtotal(q1-q9)
    . qui: g pctcorrect = 100 * (totalsc / 9)
    . raschjmle q1-q9
     Iteration                 Delta           Log-likelihood
    --------------------------------------------------------------
             1     0.502591842208104       -3402.304331969046
             2     0.142412255554409       -3397.822027114892
             3     0.020979991419945       -3397.719031584525
             4     0.003561687956111       -3397.716620516149
             5     0.000591506681447       -3397.716599152711
    --------------------------------------------------------------
                                                                                                                  
    =================================================================================================        
    Item           Difficulty     Std. Error        WMS       Std. WMS        UMS       Std. UMS             
    -------------------------------------------------------------------------------------------------        
    q1                  -0.40           0.08         0.85        -4.32         0.84        -2.86                  
    q2                   0.11           0.08         1.03         1.04         1.05         1.04                  
    q3                  -1.36           0.10         0.93        -1.39         0.86        -1.39                  
    q4                   0.49           0.08         0.99        -0.25         1.02         0.38                  
    q5                   1.66           0.09         0.93        -1.54         1.02         0.28                  
    q6                   0.82           0.08         0.93        -2.05         0.95        -0.82                  
    q7                   1.37           0.09         1.10         2.42         1.17         1.99                  
    q8                  -1.87           0.11         0.77        -3.81         0.85        -1.14                  
    q9                  -0.81           0.09         1.04         1.04         1.13         1.66                  
    =================================================================================================        
    
    
    
    SCALE QUALITY STATISTICS                          
    ==================================================
    Statistic                  Items     Persons  
    --------------------------------------------------
    Observed Variance         1.3031      1.4411
    Observed Std. Dev.        1.1415      1.2005
    Mean Square Error         0.0080      0.7097
    Root MSE                  0.0894      0.8425
    Adjusted Variance         1.2951      0.7314
    Adjusted Std. Dev.        1.1380      0.8552
    Separation Index         12.7235      1.0151
    Number of Strata         17.2980      1.6868
    Reliability               0.9939      0.5075
    ==================================================
    
                SCORE TABLE                
    ==================================
      Score        Theta      Std. Err
    ----------------------------------
        0.00        -3.94         1.89     
        1.00        -2.55         1.12     
        2.00        -1.59         0.89     
        3.00        -0.89         0.80     
        4.00        -0.28         0.77     
        5.00         0.31         0.76     
        6.00         0.91         0.79     
        7.00         1.59         0.87     
        8.00         2.53         1.11     
        9.00         3.89         1.89     
    ==================================
    
    . li pctcorrect theta in 1/20
    
         +------------------------+
         | pctcorr~t        theta |
         |------------------------|
      1. | 44.444444   -4.0678244 |
      2. | 33.333333   -.28832417 |
      3. | 22.222222   -.93143046 |
      4. | 22.222222   -1.6661277 |
      5. | 33.333333   -1.6661277 |
         |------------------------|
      6. | 22.222222   -.93143046 |
      7. | 44.444444   -1.6661277 |
      8. | 88.888889   -.28832417 |
      9. | 44.444444    2.6333477 |
     10. | 88.888889   -.28832417 |
         |------------------------|
     11. | 22.222222    2.6333477 |
     12. | 33.333333   -1.6661277 |
     13. | 33.333333   -.93143046 |
     14. | 22.222222   -.93143046 |
     15. | 77.777778   -1.6661277 |
         |------------------------|
     16. | 22.222222    1.6665872 |
     17. | 88.888889   -1.6661277 |
     18. | 33.333333    2.6333477 |
     19. | 33.333333   -.93143046 |
     20. | 55.555556   -.93143046 |
         +------------------------+
    In particular, it probably helps to look at the 2nd observation in the last code chunk above. That respondent answered only a third of the questions correctly, so why do they have a higher level of theta than observation 5, who also answered 33% of the items correctly? In this case, the difference is due to the difficulty of the items each subject answered correctly: the second respondent got two of the more difficult questions right, while the fifth respondent got more of the simpler questions right.



    • #3
      Hello and thank you!
      So the main point of this model is that we not only see how difficult an item is, but also learn something about the ability of a person. Not only the number of correct items is used, but also the difficulty of the correct items, to compute a final ability score. Is that a correct summary?
      I did some research for Stata 12 and found the working ado raschtest (http://www.stata-journal.com/sjpdf.h...iclenum=st0119, pp. 12-24). I guess I could use that.

      Now my second question: what is the best way to combine the difficulty and the ability? In the end, it would be nice if I could compare across some variables, let's say gender, age, or education. The raschtest ado has the option "genlt", which "creates a new variable containing, for each individual, the estimated value of the latent trait. The replace option allows replacing an existing variable." Would it then be possible to compare that score across some groups to check for differences? That would be really cool.
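
      I imagine something like the following (an untested sketch; "id", "gender", "education", and the item names are placeholders):

      Code:
      * assumes raschtest is installed: ssc install raschtest
      * genlt() as described in the article should store the latent trait
      raschtest item1-item10, id(id) genlt(theta)
      * compare the estimated latent trait across groups
      ttest theta, by(gender)
      oneway theta education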

      Third: is there a rule of thumb for the distribution of answers given for each item? Most questions I looked at are very lopsided, something like 100 people = 0 and 3,500 people = 1. Would that be OK, or could there be problems because of these very big differences?

      Thank you for the help. I'm sorry for asking stupid questions, but I've never done this before (same for my prof; I guess he also just wants to see whether this Rasch model is a good thing).



      • #4
        Johannes Leimert a naïve question, yes, but everyone has to start somewhere. It does seem counterintuitive at first glance, so I wouldn't think of it as a stupid question.

        The ability score isn't "computed" per se. Instead, it is estimated from the keyed item responses. In the case of a Rasch model, the sum score across items is a sufficient statistic for identification purposes, so Rasch models fitted with joint MLE exploit this by estimating the person parameters (theta) and the difficulty parameters (beta) simultaneously, iterating back and forth between updating the person and item parameter estimates until the convergence criterion is satisfied. That said, you are able to get both person and item parameter estimates from the data. So, only someone with a higher level of ability should be able to answer the more difficult items correctly, and they should also answer easier questions incorrectly less frequently. Subjects with lower levels of theta are more likely to answer difficult questions incorrectly. When the opposite happens (e.g., an easy item is answered incorrectly by subjects with higher levels of theta, or difficult items are answered correctly by subjects with lower levels of theta), it is an indication that the item may have some structural issues leading to reverse functioning.
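
        To put a formula to the iteration log in #2 (my notation; $x_{ij}$ is the 0/1 response of person i to item j), the joint log likelihood being maximized is

        $\ell(\theta, \beta) = \sum_i \sum_j \left[ x_{ij}(\theta_i - \beta_j) - \ln\left(1 + e^{\theta_i - \beta_j}\right) \right]$

        and JMLE alternates between maximizing over the $\theta_i$ with the $\beta_j$ held fixed and vice versa, until the change between iterations (the Delta column in that log) drops below the tolerance.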

        In your particular case, the first step would be determining the item characteristics (e.g., whether there are any items that are reversed, etc.). As long as you don't have any reversed items, you would then likely want to test for Differential Item Functioning (DIF; i.e., given the same level of theta, is the probability of a correct response the same across two groups of respondents?). This starts to get at the purpose you first mentioned. Ideally, if a male and a female both have theta of 0.8, it would make intuitive sense that they both have the same probability of a correct response. When that probability differs significantly, the item stem may contain words or contexts that are more familiar, or provide more information, to one group than the other. If the DIF statistic (e.g., Mantel-Haenszel) is high enough, the item would usually be eliminated before the final scoring (i.e., you would only keep items with good quality characteristics in the final form).
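
        For whenever you get access to a newer Stata: version 14 ships a Mantel-Haenszel DIF command, so the test is roughly a one-liner like the sketch below (not available in Stata 12; the group variable name is a placeholder):

        Code:
        * Stata 14+ only: Mantel-Haenszel DIF test for each item by gender
        difmh item1-item10, group(gender)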

        You wouldn't combine difficulty and ability, since they are measurements of different things. They are simply placed on the same scale, which allows you to determine the level of theta at which the respondent's probability of a correct response is 0.5 or better.

        With regard to your last question, chances are that the items/test will not be informative for anything beyond the lowest levels of theta. If you had ordinal response sets originally, it might be worth looking at GLLAMM (http://www.gllamm.org) and turning to a Partial Credit Model (a Rasch extension for polytomous response items). Assuming the distribution isn't as highly dichotomized in your original response data, this could help a bit, since you could use models that give you parameter estimates for the individual thresholds. In general, if you're able to get your hands on a copy of Stata 14, it would make your life much easier for this type of work; otherwise, you might want to consider looking at jMetrik (www.itemanalysis.com) so you have a bit more flexibility in estimating the item and person parameters, and can then import the results into Stata 12 for further processing, visualization, etc.
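
        For reference, the dichotomous Rasch setup in gllamm looks roughly like the sketch below (untested; all variable names are placeholders, and the PCM version would swap in an ordinal link with threshold parameters):

        Code:
        * one record per person-item pair
        reshape long item, i(id) j(question)
        rename item resp
        tab question, gen(d)   // one dummy per item
        * random-intercept logit estimated by MML; the dummy coefficients
        * are item "easiness" parameters (difficulty = minus the coefficient)
        gllamm resp d1-d10, nocons i(id) link(logit) family(binomial) adapt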



        • #5
          Hello and thank you. I tried to test all of this in Stata now, but there is a strange problem. I used the raschtest ado in Stata 12, but I don't get the same output as the author in his tutorial.
          I used the article here: http://www.stata-journal.com/sjpdf.h...iclenum=st0119
          On page 36 there is the table with the item coefficients. My problem is that I get much less output, even though I enter the same commands as the author. I only get the columns "Items", "Difficulty parameters", and "Std. Err.", but no p-values! Any idea what could be wrong?



          • #6
            Johannes Leimert are you using the same version of the program that the author used? What data and command did you use? Without knowing the details of what you did and what Stata returned, all anyone can do is take shots in the dark as to why you got different results.



            • #7
              Well, it's not really about the results, but about the output in general. I'm just missing a lot of important information, and I can't tell why.
              I installed the raschtest ado in Stata 12 via "ssc install raschtest" (source: http://fmwww.bc.edu/repec/bocode/r, Distribution-Date: 20130601).
              I used the ALLBUS dataset (http://www.gesis.org/allbus/studienprofile/2014/) and recoded about 10 variables to binary ones. Then I used the command:
              raschtest item1-item10, id(V2) autogroup
              where V2 is the identification variable.
              I tried some variations of this, with different or fewer items, without autogroup, and other suggestions based on the examples in the pdf tutorial (http://www.stata-journal.com/sjpdf.h...iclenum=st0119).
              It worked in the sense that I got some output, but if you look at the example result on page 36, it's as if Stata had cut off the columns after "Std. Err.", so I am missing the important p-values. Usually my Stata works fine, e.g., with any logistic regression.

              -----------------------------

              By trial and error I finally found the problem. If I include one particular variable, the tables are gone; if I drop this item... ta-da, the tables appear. I have no idea why. The item is coded just like all the others. Maybe this is a bug? I will try to figure out whether there could be anything wrong.



              • #8
                Did you assert that the variables only contain zeros and ones? It would be infinitely easier for people to help you if your example process were more explicit (e.g., which variables were recoded, what syntax you used to recode them, etc.).
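
                For example (untested; adjust the varlist to your items):

                Code:
                * fails with an error if any item contains anything
                * other than 0, 1, or missing
                foreach v of varlist item1-item10 {
                    assert inlist(`v', 0, 1) | missing(`v')
                }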
