Split Sample IV

Jean Jacques

Join Date: Sep 2020

Posts: 97
#1

Split Sample IV

04 Jun 2024, 10:55

Hi Stata users,

I’m researching whether school teachers tend to be more generous in their grading based on gender. To explore this, I have two predictors of gender: one derived from an image recognition algorithm and another from calligraphy using an optical character recognition (OCR) model. Inspired by Angrist and Krueger (1995), I’d like to perform a split-sample instrumental variables (IV) analysis.

Is there a Stata package that facilitates split-sample IV estimation?

What I've done is:

Code:

* Split the sample into two halves
gen random = runiform()
sort random
gen half = (_n > _N/2)

* First stage: Predict gender of the teacher using face image recognition in the first half of the sample
reg gender_face gender_ocr other_covs if half == 1
predict gender_pred1 if half == 1

* Store the predictions
gen gender_pred = .
replace gender_pred = gender_pred1 if half == 1

* First stage: Predict gender using the second half of the sample
reg gender_face gender_ocr other_covs if half == 0
predict gender_pred2 if half == 0

* Combine the predictions
replace gender_pred = gender_pred2 if half == 0

* Second stage
reg grades gender_pred other_covs
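For comparison with the code above, which fits and predicts within the same half: in the split-sample design of Angrist and Krueger (1995), the first stage is estimated in one half and its coefficients are used to form fitted values in the other half, where the second stage is then run. The following is only a minimal sketch of that idea. The toy data, the seed, and the names `female`, `u_ss`, `half_ss` and `gender_hat` are assumptions made for illustration and are not part of the original post.

```stata
* Toy data for illustration only (clears whatever is in memory)
clear
set seed 12345
set obs 1000
gen byte female = runiform() < 0.5                          // hypothetical true gender
gen byte gender_face = cond(runiform() < 0.9, female, 1 - female)
gen byte gender_ocr  = cond(runiform() < 0.8, female, 1 - female)
gen double other_covs = rnormal()
gen double grades = 0.5*female + 0.3*other_covs + rnormal()

* Random split into two halves
gen double u_ss = runiform()
sort u_ss
gen byte half_ss = (_n > _N/2)

* First stage estimated on half 1 only
regress gender_face gender_ocr other_covs if half_ss == 1

* Fitted values formed out of sample, in half 0, from the half-1 coefficients
predict double gender_hat if half_ss == 0, xb

* Second stage on half 0 only; these OLS standard errors do not account
* for the estimated first stage, so treat them with caution
regress grades gender_hat other_covs if half_ss == 0
```

The roles of the two halves can be swapped, and the random split can be repeated, to see how sensitive the estimate is to the particular split.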

Does anybody have any insight about this code? I couldn't find material on how to do this in the Stata documentation.

Thanks a lot!

Last edited by Jean Jacques; 04 Jun 2024, 10:57.
Tags: instrumental variables
George Ford

Join Date: Aug 2014

Posts: 3038
#2

04 Jun 2024, 16:49

I don't think this is an SSIV. You are not estimating by instrumental variables. I think you have two measures of gender by two different methods, which may not concur. Why are you predicting anything in the first stage? You have gender_face and gender_ocr. These are already predictions of gender. The prediction from that regression is going to be continuous and hard to interpret (it's like a probability of being a gender, but not bounded between 0 and 1). I'd run two regressions, one with each of the gender variables, and compare the results. They should be close. I'd also check the concurrence of the predictions, to see how different they are. It may not be much of an issue. I suppose what you want to do is use one prediction in one half of the sample and the other prediction in the other half, given that they are estimates. That's easy enough, though probably something you'd want to repeat a bunch of times.
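The two checks suggested here (how often the two gender measures agree, and whether the grade regressions give similar answers) could be sketched roughly as follows. The toy data and the stored-estimate names `face` and `ocr` are assumptions for illustration only; with real data, skip the first block and use the variables already in memory.

```stata
* Toy data for illustration only (clears whatever is in memory)
clear
set seed 2024
set obs 1000
gen byte female = runiform() < 0.5                          // hypothetical true gender
gen byte gender_face = cond(runiform() < 0.9, female, 1 - female)
gen byte gender_ocr  = cond(runiform() < 0.8, female, 1 - female)
gen double other_covs = rnormal()
gen double grades = 0.5*female + 0.3*other_covs + rnormal()

* Concurrence of the two predicted-gender measures
tabulate gender_face gender_ocr, row
kap gender_face gender_ocr          // Cohen's kappa for two raters

* One regression per gender measure, then the estimates side by side
regress grades gender_face other_covs
estimates store face
regress grades gender_ocr other_covs
estimates store ocr
estimates table face ocr, se
```

If the two measures agree on most teachers, the two coefficient columns should be close, which is the comparison described above.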
Jean Jacques

Join Date: Sep 2020

Posts: 97
#3

05 Jun 2024, 03:49

Hi George, I agree that there are other strategies to follow. I was mostly interested in how this works and whether there is a specific command in Stata for it. I guess that, if we assume the image prediction is close to a ground truth, it would be possible to use SSIV, right? Or would that require something stronger, such as visual observation by an RA or a prediction based on the teachers' names? Thanks!
George Ford

Join Date: Aug 2014

Posts: 3038
#4

05 Jun 2024, 09:25

SSIV is an instrumental variables technique designed to control for endogenous regressors. You don't have endogenous regressors. Gender is exogenous. Why do you want to use SSIV?
George Ford

Join Date: Aug 2014

Posts: 3038
#5

05 Jun 2024, 09:29

Code:

search weaktsiv