  • Split-Sample IV

    Hi Statalist users

    I’m researching whether school teachers grade more generously depending on their gender. To explore this, I have two predictors of the teacher's gender: one derived from an image recognition algorithm and another from handwriting, using an optical character recognition (OCR) model. Inspired by Angrist and Krueger (1995), I’d like to perform a split-sample instrumental variables (SSIV) analysis.

    Is there a Stata package that facilitates split-sample IV estimation?

    What I've done is:

    Code:
    * Split the sample into two random halves (set a seed so the split is reproducible)
    set seed 12345
    gen double random = runiform()
    sort random
    gen byte half = (_n > _N/2)
    
    * First stage, first half: regress the face-recognition measure on the OCR measure
    reg gender_face gender_ocr other_covs if half == 1
    predict double gender_pred1 if half == 1, xb
    
    * Store the predictions
    gen double gender_pred = .
    replace gender_pred = gender_pred1 if half == 1
    
    * First stage, second half: the same regression on the other half
    reg gender_face gender_ocr other_covs if half == 0
    predict double gender_pred2 if half == 0, xb
    
    * Combine the predictions
    replace gender_pred = gender_pred2 if half == 0
    
    * Second stage
    reg grades gender_pred other_covs

    Does anybody have any insight into this code? I couldn't find material on this in the Stata documentation.

    Thanks a lot!
    Last edited by Jean Jacques; 04 Jun 2024, 10:57.
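For comparison, the split-sample IV of Angrist and Krueger (1995) estimates the first stage in one half and uses those coefficients to predict the regressor in the *other* half, so that first-stage estimation noise is independent of the second-stage error; the code in the question fits and predicts within the same half. The sketch below shows a symmetric (both-directions) cross-fitted variant on simulated data. Everything in it is a hypothetical illustration: the variable `female`, the noise levels, and the single simulated `other_covs` variable standing in for a covariate list are assumptions. The second-stage standard errors reported by `regress` also ignore that `gender_hat` is itself estimated, so they should not be taken at face value (bootstrapping the whole procedure is one option).

```stata
* Hypothetical simulated data: one true gender indicator and two noisy measures
clear
set seed 12345
set obs 2000
gen byte female = runiform() < 0.5
gen double gender_face = female + rnormal(0, 0.5)
gen double gender_ocr  = female + rnormal(0, 0.5)
gen double other_covs  = rnormal()
gen double grades = 1 + 0.3*female + 0.2*other_covs + rnormal()

* Random split into two halves of 1,000 observations each
gen double u = runiform()
sort u
gen byte half = (_n > _N/2)

* Cross-fitting: first stage in one half, fitted values in the OTHER half
gen double gender_hat = .
regress gender_face gender_ocr other_covs if half == 1
predict double xb_tmp if half == 0, xb
replace gender_hat = xb_tmp if half == 0
drop xb_tmp

regress gender_face gender_ocr other_covs if half == 0
predict double xb_tmp if half == 1, xb
replace gender_hat = xb_tmp if half == 1
drop xb_tmp

* Second stage on the pooled out-of-half fitted values
* (reported standard errors ignore that gender_hat is estimated)
regress grades gender_hat other_covs

* Conventional full-sample 2SLS, for comparison
ivregress 2sls grades other_covs (gender_face = gender_ocr)
```

The final `ivregress 2sls` line is there only as a benchmark: with a strong instrument, the cross-fitted and conventional estimates should be of similar size.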

  • #2

    I don't think this is SSIV: you are not actually estimating by instrumental variables. As I understand it, you have two measures of gender obtained by two different methods, which may not agree. Why are you predicting anything in the first stage? gender_face and gender_ocr are already predictions of gender. The predicted values from that regression will be continuous and hard to interpret (something like a probability of being a given gender, but not bounded between 0 and 1). I'd run two regressions, one with each gender variable, and compare the results; they should be close. I'd also check how often the two predictions agree, to see how different they are. It may not be much of an issue. I suppose what you want to do is use one prediction in one half of the sample and the other prediction in the other half, given that both are estimates. That's easy enough, though it's probably something you'd want to repeat many times.
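The checks suggested above (one regression per gender measure, a look at how often the two measures agree, and the "one measure per half, repeated many times" idea) could be sketched as follows. This is a minimal illustration under stated assumptions: the data are simulated, the two measures are treated as 0/1 classifications with hypothetical accuracies of 90% and 85%, `other_covs` is a single placeholder covariate, and the 200 repetitions are an arbitrary choice.

```stata
* Hypothetical simulated data: binary gender predictions with different accuracies
clear
set seed 2024
set obs 2000
gen byte female = runiform() < 0.5
gen byte gender_face = cond(runiform() < 0.90, female, 1 - female)
gen byte gender_ocr  = cond(runiform() < 0.85, female, 1 - female)
gen double other_covs = rnormal()
gen double grades = 1 + 0.3*female + 0.2*other_covs + rnormal()

* One regression per measure, shown side by side
regress grades gender_face other_covs
estimates store face
regress grades gender_ocr other_covs
estimates store ocr
estimates table face ocr, se

* How often do the two measures agree? (cross-tab plus Cohen's kappa)
tabulate gender_face gender_ocr
kap gender_face gender_ocr

* Face measure in one random half, OCR measure in the other,
* repeated over 200 random splits; each coefficient goes into matrix B
matrix B = J(200, 1, .)
forvalues r = 1/200 {
    quietly {
        gen double u = runiform()
        sort u
        gen byte half = (_n > _N/2)
        gen byte gender_mix = cond(half == 1, gender_face, gender_ocr)
        regress grades gender_mix other_covs
        matrix B[`r', 1] = _b[gender_mix]
        drop u half gender_mix
    }
}

* Distribution of the coefficient across splits
svmat double B, names(b_mix)
summarize b_mix1
```

Because each measure misclassifies some teachers, both single-measure coefficients will typically be attenuated relative to the simulated 0.3; the spread of `b_mix1` across splits shows how much the choice of split matters.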

    • #3
      Hi George, I agree that there are other strategies to follow. I was mostly interested in how SSIV works and whether there is a specific Stata command for it. I guess that if we assume the image prediction is close to ground truth, it would be possible to use SSIV, right? Or would that require something stronger, such as visual classification by an RA or a prediction based on the teachers' names? Thanks!


      • #4
        SSIV is an instrumental variables technique designed to control for endogenous regressors. You don't have endogenous regressors. Gender is exogenous. Why do you want to use SSIV?
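Whether a gender measure behaves like an endogenous regressor can also be checked empirically rather than by assumption. A minimal sketch, assuming simulated data with hypothetical noise levels and a single placeholder covariate, runs 2SLS with `ivregress` and then uses `estat firststage` (instrument strength) and `estat endogenous` (Durbin and Wu-Hausman tests of the null that `gender_face` can be treated as exogenous). A rejection would indicate that the face-based measure is correlated with the error term, for example through classification error.

```stata
* Hypothetical simulated data: true gender plus two independently noisy measures
clear
set seed 777
set obs 2000
gen byte female = runiform() < 0.5
gen double gender_face = female + rnormal(0, 0.5)
gen double gender_ocr  = female + rnormal(0, 0.5)
gen double other_covs  = rnormal()
gen double grades = 1 + 0.3*female + 0.2*other_covs + rnormal()

* 2SLS: instrument the face-based measure with the OCR-based measure
ivregress 2sls grades other_covs (gender_face = gender_ocr)

* First-stage strength of the instrument
estat firststage

* Tests of H0: gender_face can be treated as exogenous
estat endogenous
```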

        • #5
          Code:
          search weaktsiv