
  • #16
    Originally posted by Erik Ruzek
    Everything proposed in the discussion so far is reasonable under the assumption that the LSAT scores are invariant to method of administration.
    I have suggested how to model both mean differences in the LSAT scores due to method of administration (in this case location: remote vs. in person), and differences in the variance of the LSAT scores (heteroskedasticity) due to method of administration. Is there another variation we should be concerned about?

    Alfonso Sanchez-Penalver



    • #17
      Alfonso Sánchez-Peñalver, the key thing to remember is that those LSAT scores are themselves the product of a statistical model. I don't know for sure, but my guess is that for any true-false and multiple choice items, a 1PL or Rasch model is probably used. Forget for the moment that there are also essay items that probably don't get modeled similarly. In the Rasch model, each item is constrained to have the same loading onto the latent "ability" factor but allowed to have a freely varying intercept (note that -1 * intercept = item "difficulty" if using gsem). A proper assessment of measurement invariance would require that the psychometricians examine whether the difficulty parameters vary by test administration (among other person factors). To the extent that they do vary, that puts in doubt the equivalence of the scores and their meaning for the two groups.
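For reference, the Rasch setup described above can be written out as follows. This is a sketch that assumes the common loading is normalized to 1; the symbols (theta_i for ability, alpha_j for the item intercept, b_j for the item difficulty) are notation introduced here for illustration and do not come from any LSAT documentation.

```latex
% Rasch model for person i and item j (loading fixed at 1, an assumption for this sketch)
\Pr(y_{ij} = 1 \mid \theta_i) = \frac{\exp(\alpha_j + \theta_i)}{1 + \exp(\alpha_j + \theta_i)},
\qquad b_j = -\alpha_j

% Measurement invariance across administration modes
b_j^{(\text{remote})} = b_j^{(\text{in person})} \quad \text{for every item } j
```

If the second line fails for some items, two test takers with the same ability can have different expected scores depending only on where they sat the test.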

      Your model rests on the assumption that the LSAT score itself is invariant to administration type. Put differently, it assumes that the scores from the two groups measure the same latent (ability) construct. The LSAT people have to prove that through invariance (or DIF, differential item functioning) testing.
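To make the invariance concern concrete, here is a small simulation sketch in Python (chosen only for illustration; the thread is about Stata). Every number in it is hypothetical: five items, made-up difficulties, and an assumed shift of 0.8 logits on one item for remote test takers. Both groups are given the same ability distribution by construction, so any gap in scores comes purely from the measurement side. A real DIF analysis would instead condition on ability, for example by matching on total score.

```python
import numpy as np

rng = np.random.default_rng(12345)

def simulate_rasch(theta, difficulty, rng):
    """Draw 0/1 responses under a Rasch model: P(correct) = logistic(theta - b)."""
    logits = theta[:, None] - difficulty[None, :]
    prob = 1.0 / (1.0 + np.exp(-logits))
    return (rng.random(prob.shape) < prob).astype(int)

n = 20000
difficulty = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # hypothetical item difficulties

# Same ability distribution in both groups, so there is no selection here.
theta_inperson = rng.normal(0.0, 1.0, n)
theta_remote = rng.normal(0.0, 1.0, n)

# Assumption for illustration: the third item (index 2) is harder when taken remotely.
difficulty_remote = difficulty.copy()
difficulty_remote[2] += 0.8

y_inperson = simulate_rasch(theta_inperson, difficulty, rng)
y_remote = simulate_rasch(theta_remote, difficulty_remote, rng)

# Per-item gap in proportion correct, and the resulting gap in mean sum scores.
gap = y_inperson.mean(axis=0) - y_remote.mean(axis=0)
flagged = int(np.argmax(np.abs(gap)))
score_gap = y_inperson.sum(axis=1).mean() - y_remote.sum(axis=1).mean()

print("per-item gap:", np.round(gap, 3))
print("item with largest gap:", flagged)
print("gap in mean sum score:", round(score_gap, 3))
```

In this setup a location dummy in a regression of the score would pick up `score_gap` as a "mean difference", even though ability is identical in the two groups.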

      Now, imagine that they did show that the LSAT was invariant to administration type. There could nonetheless be valid mean differences and residual heteroskedasticity due to selection issues, that is, who takes which version of the test. Your model may be able to help with that, because we would then know that the measurement properties of the instrument do not depend on whether a test taker was sitting on their couch or in an LSAT-approved sterile room full of cubicles.



      • #18
        Erik Ruzek makes excellent points. His argument assumes, however, that Jim wants to use the LSAT scores for what they were designed for (some measure of "ability", whatever that may be). As a sociologist, let me put forward an alternative interpretation of the LSAT score: it is a score assigned by a gatekeeper, which co-determines the academic path of a (prospective) student. In that interpretation, the LSAT score does not need to measure anything. In an ideal world it would reliably measure ability, and relying on LSAT scores would make admission decisions more meritocratic, but who says that we are living in an ideal world? The LSAT score has real consequences for (prospective) students, so it deserves to be studied regardless of how good or bad that measurement is.

        We don't need to assume bad intent for the LSAT score to be a bad measurement. As soon as measurements have real consequences for individuals, those individuals will adapt and thus (unintentionally) invalidate those measurements. For example, as soon as one measures the productivity of workers and pays them accordingly, the workers will maximize that measurement of productivity rather than the productivity itself. If you pay police officers by the number of tickets they write, they will find ways to maximize the number of tickets they write, but will that make them better police officers? The same holds for the LSAT: some people have more resources to prepare for the test than others, some people know which administration type is more advantageous and have the possibility to influence that, etc. What do differences in LSAT scores between students mean in that situation? Is it a difference in ability or a difference in resources?

        So how much Jim should care about the measurement characteristics of the LSAT score depends on what Jim wants it to measure: does he want a measure of ability (now he should care), or does he want the score assigned by a gatekeeper (now he shouldn't care). Regardless, Jim should be explicit about what the LSAT score measures in his study.
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------



        • #19
          Originally posted by Erik Ruzek
          A proper assessment of measurement invariance would require that the psychometricians examined whether the difficulty parameters vary by test administration (among other person factors). To the extent that they do vary, that puts in doubt the equivalence of the scores and their meaning for the two groups.
          Simplifying, you mean that they may administer exams with different levels of difficulty to those who take them at home than to those who take them in person? And, thus, there would be confounding bias in the simple use of a binary location variable, because it captures some of the effect of the variation in difficulty of the exams taken? This is very difficult to control for unless you have data on the types of tests that are available and which one is administered to whom. I expand a bit on this.

          I would think that the LSAT administrators have a set of tests prepared and assign one to each taker. Assuming this is the case, let's consider that there is an unknown distribution of difficulty among that set of tests. If the assignment of the tests is completely random, then it should not bias the estimates of the effect of taking the test at home. Inasmuch as a higher share of the easier (or harder) tests is assigned to one of the two groups (home / in person), the estimates would be biased, unless you had data on which test each person actually took. Including a categorical (factor) variable for the test would control for the variation in difficulty between groups. However, I am unaware of whether such a variable exists, or whether it is reasonable to assume that the assignment of tests is purely random.
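The argument in this post can be checked with a small simulation. The sketch below is in Python for illustration only, and every quantity in it is made up: three test forms with hypothetical score shifts, a hypothetical true at-home effect of 1 point, and two assignment schemes (forms assigned at random versus easier forms over-represented at home). It treats difficulty purely as a form-level shift in the score, which is the reading taken in this post.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 50000
form_effect = np.array([0.0, -2.0, -4.0])  # hypothetical score shifts of three test forms
true_remote_effect = 1.0                    # hypothetical effect of testing at home

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def estimate(p_forms_remote, p_forms_inperson):
    """Return the location coefficient without and with test-form dummies."""
    remote = rng.integers(0, 2, n)
    form = np.where(remote == 1,
                    rng.choice(3, n, p=p_forms_remote),
                    rng.choice(3, n, p=p_forms_inperson))
    score = (150 + true_remote_effect * remote
             + form_effect[form] + rng.normal(0, 5, n))
    const = np.ones(n)
    naive = ols(score, np.column_stack([const, remote]))[1]
    # Dummies for forms 1 and 2; form 0 is the reference category.
    dummies = (form[:, None] == np.arange(1, 3)).astype(float)
    adjusted = ols(score, np.column_stack([const, remote, dummies]))[1]
    return naive, adjusted

equal = [1/3, 1/3, 1/3]
naive_random, adj_random = estimate(equal, equal)
naive_skewed, adj_skewed = estimate([0.6, 0.3, 0.1], [0.1, 0.3, 0.6])

print("random assignment:", round(naive_random, 2), round(adj_random, 2))
print("skewed assignment:", round(naive_skewed, 2), round(adj_skewed, 2))
```

Under random assignment both estimates should sit near the assumed effect of 1. Under the skewed assignment the location dummy alone absorbs the difference in average form difficulty, while adding the form dummies should bring the estimate back toward 1.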
          Alfonso Sanchez-Penalver



          • #20
            Maarten Buis, if you care about measurement, you would want to predict the LSAT score itself, and/or be interested in the direct effect on the measurement itself. If you care about the gatekeeping, you would want to predict the probability that it gets you somewhere, and the effects on that probability. Wouldn't you? Otherwise, why bother at all with LSAT scores and how well the data fit them? In either case there is something to be estimated, and the problem of variation across test administration persists. Doesn't it?
            Alfonso Sanchez-Penalver



            • #21
              Originally posted by Alfonso Sánchez-Peñalver
              Simplifying, you mean that they may administer exams with different level of difficulty to those that take them at home than to those that take them in person?
              No, the exact same questions will get you different answers depending on how you ask them. I mainly come across this phenomenon in the case of surveys: asking the same questions in person, over the telephone, or in a web survey will often elicit radically different answers. This is not just selection: numerous experiments have shown that there is a real mode effect. If you want reliable measurements, you need to take that much more seriously than just adding a mode dummy.

              The psychometric literature is a huge world, probably of similar size as the econometric literature. A big chunk of it deals with measurement. Just like econometricians see selection problems everywhere and have elaborate and ingenious ways of dealing with those, so do psychometricians see measurement problems everywhere and have elaborate and ingenious ways of dealing with those.
              Last edited by Maarten Buis; 02 Oct 2024, 13:21.



              • #22
                Originally posted by Alfonso Sánchez-Peñalver
                Maarten Buis if you care about measurement, you would want to predict LSAT score itself, and/or be interested in the direct effect on the measurement itself. If you care about the gatekeeping, you would like to predict the probability that it gets you somewhere, and the effects on that probability.
                Not necessarily. The LSAT score has become something valuable in its own right, independent of what it measures. So you can study who gets more and who gets less of that valuable resource.

                Maarten L. Buis



                • #23
                  Maarten Buis, thanks for clarifying in #21. It's interesting to see how something that must have been devised to give a standard against which to measure individuals' capabilities is perverted when implemented. But yes, I hadn't thought of that. It seems impossible to control for that, though, without knowing which questions were asked and how. So at some point some assumption has to be made.

                  With respect to #22, the fact that the original effort is to determine for which cluster the characteristics predict the LSAT score better indicates that the score itself is somehow of interest. Whether the goal is to predict it directly (or to estimate marginal effects), or to use it indirectly as a predictor of reaching some outcome, we don't know, but that is why I think that Jim cares about the score itself.
                  Alfonso Sanchez-Penalver
