IV regression, binary independent and dependent variables

Neg Kha

Join Date: Jun 2022
Posts: 68


22 Jan 2024, 03:49

Hi,

I have the following data. I want to use rank (assigned by lottery) as an instrument for one of three housing measures: days (the number of days it takes to get an apartment), sep_acc (binary, =1 if the person had housing in September), or nov_acc (binary, =1 if the person had housing in November).
The outcome variable can be grade (standardized using each course's mean and standard deviation), top5 (binary, =1 if the student's grade is among the top 5 in the class), or pass (binary, =1 if the student passed the course).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 course float(pass passrate grade top5 top50 rank days sep_acc nov_acc apr_acc id)
"EUHR14"  1        1   .8539126 1 1   74  17 1 1 1 23
"EUHR11"  1        1  1.0978876 1 1   74  17 1 1 1 23
"EUHR12"  1        1        .75 1 1   74  17 1 1 1 23
"HTG003F" 1 .9782609          . . . 1247   . 0 0 0 24
"SGR002F" 1 .6470588          . . . 2255   . 0 0 0 25
"SGR007F" 1 .9473684          . . . 2255   . 0 0 0 25
"SGR015F" 1 .9285714          . . . 2255   . 0 0 0 25
"COSM11"  1 .8965517  -.2489627 0 1 1630   . 0 0 0 26
"COSM10"  1 .9310345   1.403547 1 1 1630   . 0 0 0 26
"COSM12"  1 .9615384 -1.0205743 0 0 1630   . 0 0 0 26
"KBTF15"  1 .7666667  .18538883 0 1   12  17 1 1 1 27
"KBTN01"  1   .84375   .4600286 0 1   12  17 1 1 1 27
"KBTF06"  1 .9166667  -.1257388 0 1   12  17 1 1 1 27
"KBKN01"  1 .8048781   -.805823 0 1   12  17 1 1 1 27
"KMBF05"  1  .983871   .9584039 0 1   12  17 1 1 1 27
"KBTF10"  1 .8421053          . . .   12  17 1 1 1 27
"SMMV18"  1 .9263158   .3654807 0 1  181 257 0 0 1 28
"SMMV13"  1     .875   .7030613 0 1  181 257 0 0 1 28
"SMMX17"  1 .9886364   .7887334 0 1  181 257 0 0 1 28
"SMMV11"  1 .9619048  .23933636 0 1  181 257 0 0 1 28
"SMMX21"  1 .9111111   .6967559 0 1  181 257 0 0 1 28
"GNVN05"  0 .8823529          . . .  695   . 0 0 0 29
"IMEN81"  1        1  1.0499588 1 1 2072   . 0 0 0 30
"IMEN69"  1 .9807692 -1.0924842 0 0 2072   . 0 0 0 30
"IMEN60"  1        1   .4674261 1 1 2072   . 0 0 0 30
"IMEN80"  1        1  -.8451539 0 1 2072   . 0 0 0 30
"MIDA24"  1        1  -.9645888 0 0 1194  77 0 1 1 31
"MIDM12"  0     .862 -1.6872075 0 0 1194  77 0 1 1 31
"MIDA13"  1        1   .4549109 1 1 1194  77 0 1 1 31
"MIDM45"  1 .9772727 -1.2870108 0 0 1194  77 0 1 1 31
"MIDA11"  1 .9777778 -1.1127443 0 0 1194  77 0 1 1 31
"IMEN05"  1        1  1.1428761 1 1  295  17 1 1 1 32
"IMEN04"  1        1 -.04430514 0 1  295  17 1 1 1 32
"IMEN06"  1        1  -.8569977 0 0  295  17 1 1 1 32
"IMEN44"  1        1 -.25012192 0 1  295  17 1 1 1 32
end

Would the following lines of code be a reasonable way to analyze this data sample? (passrate is the share of students who passed that course.)

Code:

ivregress 2sls grade (days = rank), first vce(cluster id)
ivregress 2sls grade (nov_acc = rank), first vce(cluster id)

ivregress 2sls pass (days = rank) passrate, first vce(cluster id)
ivregress 2sls pass (nov_acc = rank) passrate, first vce(cluster id)

ivregress 2sls top5 (days = rank), first vce(cluster id)
ivregress 2sls top5 (nov_acc = rank), first vce(cluster id)

Is there a way to make this analysis better and more complete? How should I handle the binary outcomes and the binary regressors? Should I use ivprobit when the outcome is binary?
Also, should I include course fixed effects even though the grades are standardized within each course?
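
For reference, a minimal sketch of how course fixed effects could enter the 2SLS specification. Because course is stored as a string (str10) in the data above, it first needs a numeric version, created here with encode under the hypothetical name course_id. The data are simulated purely so that the block runs on its own; every coefficient and sample size in it is made up and says nothing about the real data.

```stata
* Simulated stand-in for the real data (all numbers are made up)
clear
set seed 2024
set obs 300                                   // 300 hypothetical students
generate id = _n
generate rank = ceil(2500*runiform())         // lottery rank
generate ability = rnormal()                  // unobserved confounder
generate days = 100 + 0.05*rank - 20*ability + 10*rnormal()
expand 4                                      // four course rows per student
generate str10 course = "C" + string(ceil(10*runiform()))
generate grade = -0.001*days + 0.5*ability + rnormal()

* course is a string, so create a numeric version for factor-variable notation
encode course, generate(course_id)            // course_id is a hypothetical name

* 2SLS with course fixed effects, clustering on student
ivregress 2sls grade (days = rank) i.course_id, first vce(cluster id)

* First-stage diagnostics for the instrument
estat firststage
```

In the 35-row excerpt every course code appears exactly once, so a specification like this would only be informative on the full data.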

Another question: how should I interpret the coefficients when the outcome is the standardized grade, so that they become more tangible? For example, if the coefficient on days is -0.0014 in the first regression (p-value = 0.027), how should I interpret it?
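
To make the magnitude concrete: in a linear 2SLS model with the standardized grade as the outcome, the coefficient on days is the change in grade, measured in standard deviations, per additional day of waiting. Scaling it by a number of days gives an effect size that is easier to read. The 30- and 100-day horizons below are arbitrary illustrative choices, not values taken from the data.

```stata
* Coefficient on days quoted above, in standard deviations of grade per day
scalar b_days = -0.0014

* Implied change in standardized grade for longer waits (illustrative horizons)
display "30 extra days:  " %6.3f b_days*30  " SD"
display "100 extra days: " %6.3f b_days*100 " SD"
```

Under these assumptions, 30 extra days corresponds to roughly -0.042 standard deviations and 100 extra days to -0.14 standard deviations.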

Thanks in advance

Last edited by Neg Kha; 22 Jan 2024, 04:01.


Neg Kha

Join Date: Jun 2022

Posts: 68
#2

23 Jan 2024, 01:35

Could someone please help me with this? I would appreciate it a lot.

Neg Kha

Join Date: Jun 2022
Posts: 68

23 Jan 2024, 07:23

Would it be appropriate to use biprobit when both the dependent variable and the regressor are binary (pass as outcome and nov_acc as regressor)?
If so, should I then use the margins command to get the marginal effect?
Also, would it be appropriate to use ivprobit as in the following line:

Code:

ivprobit pass (days = rank), vce(robust) first
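
For reference, a rough sketch of how the two approaches could be set up together with margins. ivprobit is documented for continuous endogenous regressors, so it pairs with days; for a binary endogenous regressor such as nov_acc, a recursive bivariate probit via biprobit is one commonly used alternative. The data are simulated only so that the block runs on its own (all coefficients are made up), and vce(cluster id) replaces vce(robust) on the assumption that the real data contain several course rows per student. Whether margins accepts dydx() for the endogenous dummy after biprobit is an assumption worth verifying in the installed Stata version.

```stata
* Simulated stand-in for the real data (all numbers are made up)
clear
set seed 2024
set obs 3000
generate id = _n
generate rank = ceil(2500*runiform())           // lottery rank
generate u = rnormal()                          // first-stage error
generate v = 0.5*u + sqrt(0.75)*rnormal()       // outcome error, corr(u, v) = 0.5

* Case 1: continuous endogenous regressor (days) with a binary outcome
generate days = 100 + 0.05*rank - 25*u
generate pass = (1.5 - 0.01*days + v > 0)
ivprobit pass (days = rank), first vce(cluster id)
margins, dydx(days) predict(pr)                 // average marginal effect on Pr(pass = 1)

* Case 2: binary endogenous regressor (nov_acc) with a binary outcome
generate nov_acc = (1.25 - 0.001*rank + u > 0)
replace pass = (0.2 + 0.6*nov_acc + v > 0)
biprobit (pass = i.nov_acc) (nov_acc = rank), vce(cluster id)
margins, dydx(nov_acc) predict(pmarg1)          // discrete change in marginal Pr(pass = 1)
```

Entering nov_acc as i.nov_acc makes margins report a discrete 0-to-1 change rather than a derivative, which is usually the quantity of interest for a binary regressor.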


