Hi,
Is there a way to make the analysis below better and more complete? How should I handle the binary outcomes and the binary endogenous variables? Should I use ivprobit when the outcome is binary?
I have the following data. I want to use rank (obtained via lottery) as an instrument for days (the number of days it takes to get an apartment), for sep_acc (binary, =1 if the person had housing in September), or for nov_acc (binary, =1 if the person had housing in November).
The outcome variable can be grade (standardized using each course's mean and standard deviation), top5 (binary, =1 if the student's grade is among the top 5 in the class), or pass (binary, =1 if the student passed that course).
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 course float(pass passrate grade top5 top50 rank days sep_acc nov_acc apr_acc id)
"EUHR14" 1 1 .8539126 1 1 74 17 1 1 1 23
"EUHR11" 1 1 1.0978876 1 1 74 17 1 1 1 23
"EUHR12" 1 1 .75 1 1 74 17 1 1 1 23
"HTG003F" 1 .9782609 . . . 1247 . 0 0 0 24
"SGR002F" 1 .6470588 . . . 2255 . 0 0 0 25
"SGR007F" 1 .9473684 . . . 2255 . 0 0 0 25
"SGR015F" 1 .9285714 . . . 2255 . 0 0 0 25
"COSM11" 1 .8965517 -.2489627 0 1 1630 . 0 0 0 26
"COSM10" 1 .9310345 1.403547 1 1 1630 . 0 0 0 26
"COSM12" 1 .9615384 -1.0205743 0 0 1630 . 0 0 0 26
"KBTF15" 1 .7666667 .18538883 0 1 12 17 1 1 1 27
"KBTN01" 1 .84375 .4600286 0 1 12 17 1 1 1 27
"KBTF06" 1 .9166667 -.1257388 0 1 12 17 1 1 1 27
"KBKN01" 1 .8048781 -.805823 0 1 12 17 1 1 1 27
"KMBF05" 1 .983871 .9584039 0 1 12 17 1 1 1 27
"KBTF10" 1 .8421053 . . . 12 17 1 1 1 27
"SMMV18" 1 .9263158 .3654807 0 1 181 257 0 0 1 28
"SMMV13" 1 .875 .7030613 0 1 181 257 0 0 1 28
"SMMX17" 1 .9886364 .7887334 0 1 181 257 0 0 1 28
"SMMV11" 1 .9619048 .23933636 0 1 181 257 0 0 1 28
"SMMX21" 1 .9111111 .6967559 0 1 181 257 0 0 1 28
"GNVN05" 0 .8823529 . . . 695 . 0 0 0 29
"IMEN81" 1 1 1.0499588 1 1 2072 . 0 0 0 30
"IMEN69" 1 .9807692 -1.0924842 0 0 2072 . 0 0 0 30
"IMEN60" 1 1 .4674261 1 1 2072 . 0 0 0 30
"IMEN80" 1 1 -.8451539 0 1 2072 . 0 0 0 30
"MIDA24" 1 1 -.9645888 0 0 1194 77 0 1 1 31
"MIDM12" 0 .862 -1.6872075 0 0 1194 77 0 1 1 31
"MIDA13" 1 1 .4549109 1 1 1194 77 0 1 1 31
"MIDM45" 1 .9772727 -1.2870108 0 0 1194 77 0 1 1 31
"MIDA11" 1 .9777778 -1.1127443 0 0 1194 77 0 1 1 31
"IMEN05" 1 1 1.1428761 1 1 295 17 1 1 1 32
"IMEN04" 1 1 -.04430514 0 1 295 17 1 1 1 32
"IMEN06" 1 1 -.8569977 0 0 295 17 1 1 1 32
"IMEN44" 1 1 -.25012192 0 1 295 17 1 1 1 32
end
Would the following code be a sound way to analyze this data sample? (passrate is the share of students who passed that course.)
Code:
ivregress 2sls grade (days = rank), first vce(cluster id)
ivregress 2sls grade (nov_acc = rank), first vce(cluster id)
ivregress 2sls pass (days = rank) passrate, first vce(cluster id)
ivregress 2sls pass (nov_acc = rank) passrate, first vce(cluster id)
ivregress 2sls top5 (days = rank), first vce(cluster id)
ivregress 2sls top5 (nov_acc = rank), first vce(cluster id)
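For concreteness, here is a rough sketch of two additions that might complement the commands above: a check of instrument strength, and the ivprobit alternative for a binary outcome. It uses only variables from the data example. As far as I understand, ivprobit assumes a continuous endogenous regressor, so the sketch uses days rather than nov_acc or sep_acc. Treat it as an illustration under those assumptions, not a recommended specification.

```stata
* First-stage strength for the linear IV model (run right after ivregress)
ivregress 2sls grade (days = rank), vce(cluster id)
estat firststage

* Reduced form: effect of the lottery rank itself on the outcome
regress grade rank, vce(cluster id)

* ivprobit alternative for the binary outcome pass (days treated as continuous)
ivprobit pass passrate (days = rank), vce(cluster id)
* Average marginal effect of days on the probability of passing
margins, dydx(days) predict(pr)
```

The 2SLS regressions with pass or top5 as the outcome are linear probability models, so their coefficients on days can be compared with the average marginal effect reported by margins.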
Also, should I include course fixed effects even though the grades are standardized?
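To make the question concrete, a version with course fixed effects could look roughly like the sketch below. The variable course is a string (str10) in the data example, so it has to be encoded first. The name course_id is a hypothetical one introduced here only for illustration.

```stata
* course is stored as a string, so create a numeric version for factor-variable notation
* (course_id is a hypothetical variable name)
encode course, generate(course_id)

* Same 2SLS model as before, with course dummies added as exogenous controls
ivregress 2sls grade i.course_id (days = rank), first vce(cluster id)
```

Because grade is already standardized within course, the course dummies cannot absorb any difference in mean grades across courses in the full sample. They could still matter in the estimation sample if observations with missing days are dropped unevenly across courses.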
Another question: how should I interpret the coefficients when the outcome is the standardized grade, so that the result becomes more tangible? For example, if the coefficient on days is -0.0014 in the first regression (p-value = 0.027), how should I interpret it?
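One way to make the number more tangible is to rescale it to a longer waiting period. With grade measured in course-level standard deviations, a coefficient of -0.0014 per day corresponds to 30 × (-0.0014) = -0.042 standard deviations for 30 extra days of waiting. The 30-day horizon is an arbitrary choice for illustration. A minimal sketch of the rescaling after the first regression:

```stata
ivregress 2sls grade (days = rank), vce(cluster id)

* Effect of 30 additional days of waiting, in standard deviations of grade,
* with a standard error and confidence interval
lincom 30*days

* The same rescaling by hand, using the coefficient quoted above
display -0.0014 * 30
```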
Thanks in advance