  • IV regression, binary independent and dependent variables

    Hi,

    I have the following data. I want to use rank (obtained via lottery) as an instrument for days (the number of days it takes to get an apartment), for sep_acc (binary, =1 if the person had housing in September), or for nov_acc (binary, =1 if the person had housing in November).
    The outcome variable can be grade (standardized using each course's mean and standard deviation), top5 (binary, =1 if the student's grade is among the top 5 in the class), or pass (=1 if the student passed that course).

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str10 course float(pass passrate grade top5 top50 rank days sep_acc nov_acc apr_acc id)
    "EUHR14"  1        1   .8539126 1 1   74  17 1 1 1 23
    "EUHR11"  1        1  1.0978876 1 1   74  17 1 1 1 23
    "EUHR12"  1        1        .75 1 1   74  17 1 1 1 23
    "HTG003F" 1 .9782609          . . . 1247   . 0 0 0 24
    "SGR002F" 1 .6470588          . . . 2255   . 0 0 0 25
    "SGR007F" 1 .9473684          . . . 2255   . 0 0 0 25
    "SGR015F" 1 .9285714          . . . 2255   . 0 0 0 25
    "COSM11"  1 .8965517  -.2489627 0 1 1630   . 0 0 0 26
    "COSM10"  1 .9310345   1.403547 1 1 1630   . 0 0 0 26
    "COSM12"  1 .9615384 -1.0205743 0 0 1630   . 0 0 0 26
    "KBTF15"  1 .7666667  .18538883 0 1   12  17 1 1 1 27
    "KBTN01"  1   .84375   .4600286 0 1   12  17 1 1 1 27
    "KBTF06"  1 .9166667  -.1257388 0 1   12  17 1 1 1 27
    "KBKN01"  1 .8048781   -.805823 0 1   12  17 1 1 1 27
    "KMBF05"  1  .983871   .9584039 0 1   12  17 1 1 1 27
    "KBTF10"  1 .8421053          . . .   12  17 1 1 1 27
    "SMMV18"  1 .9263158   .3654807 0 1  181 257 0 0 1 28
    "SMMV13"  1     .875   .7030613 0 1  181 257 0 0 1 28
    "SMMX17"  1 .9886364   .7887334 0 1  181 257 0 0 1 28
    "SMMV11"  1 .9619048  .23933636 0 1  181 257 0 0 1 28
    "SMMX21"  1 .9111111   .6967559 0 1  181 257 0 0 1 28
    "GNVN05"  0 .8823529          . . .  695   . 0 0 0 29
    "IMEN81"  1        1  1.0499588 1 1 2072   . 0 0 0 30
    "IMEN69"  1 .9807692 -1.0924842 0 0 2072   . 0 0 0 30
    "IMEN60"  1        1   .4674261 1 1 2072   . 0 0 0 30
    "IMEN80"  1        1  -.8451539 0 1 2072   . 0 0 0 30
    "MIDA24"  1        1  -.9645888 0 0 1194  77 0 1 1 31
    "MIDM12"  0     .862 -1.6872075 0 0 1194  77 0 1 1 31
    "MIDA13"  1        1   .4549109 1 1 1194  77 0 1 1 31
    "MIDM45"  1 .9772727 -1.2870108 0 0 1194  77 0 1 1 31
    "MIDA11"  1 .9777778 -1.1127443 0 0 1194  77 0 1 1 31
    "IMEN05"  1        1  1.1428761 1 1  295  17 1 1 1 32
    "IMEN04"  1        1 -.04430514 0 1  295  17 1 1 1 32
    "IMEN06"  1        1  -.8569977 0 0  295  17 1 1 1 32
    "IMEN44"  1        1 -.25012192 0 1  295  17 1 1 1 32
    end

    Would the following lines of code be okay to run a good analysis on this data sample? (passrate is a variable showing the percentage of students who passed that course)

    Code:
    ivregress 2sls grade (days = rank), first vce(cluster id)
    ivregress 2sls grade (nov_acc= rank), first vce(cluster id)
    
    ivregress 2sls pass (days = rank) passrate, first vce(cluster id)
    ivregress 2sls pass (nov_acc= rank) passrate, first vce(cluster id)
    
    ivregress 2sls top5 (days = rank), first vce(cluster id)
    ivregress 2sls top5 (nov_acc= rank), first vce(cluster id)
    Is there a way that I can make this analysis better and more complete? How should I deal with the binary outcome and variables? Should I use ivprobit for when the outcome is binary?
    Also, should I use course fixed effects even though the grades are standardized?
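
    A minimal sketch of one way to extend the linear specifications above, assuming the variable names from the -dataex- listing. course_id is a hypothetical numeric identifier created here because course is a string variable, and nothing below has been run on the full data. With a binary outcome such as pass or top5, ivregress 2sls amounts to a linear probability model, so the coefficient reads as a change in probability.

    ```stata
    * course is str10, so build a numeric id before using factor-variable notation
    * (course_id is a hypothetical name, not in the original data)
    encode course, generate(course_id)

    * 2SLS with course fixed effects; standard errors clustered on student
    ivregress 2sls grade (days = rank) i.course_id, first vce(cluster id)

    * first-stage diagnostics to judge how strong rank is as an instrument
    estat firststage
    ```

    passrate is left out of this sketch because it varies only across courses and would be collinear with the course dummies.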

    Another question: how should I interpret the coefficients when the outcome is the standardized grade, so that the result becomes more tangible? For example, if the coefficient on days is -0.0014 in the first regression (p-value = 0.027), how should I interpret it?
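
    A rough worked reading of that number, assuming the linear 2SLS specification above and taking -0.0014 as given: each extra day of waiting goes with a grade that is 0.0014 course standard deviations lower, so rescaling to a longer wait makes the size easier to judge.

    ```stata
    * coefficient on days quoted above (not re-estimated here)
    scalar b_days = -0.0014

    * implied change in the standardized grade for longer waits
    display "30 extra days:  " b_days*30  " SD"
    display "100 extra days: " b_days*100 " SD"

    * after the ivregress call itself, lincom gives the same rescaling
    * together with a confidence interval:
    * lincom 30*days
    ```

    That works out to about -0.04 SD for 30 extra days and -0.14 SD for 100 extra days.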

    Thanks in advance
    Last edited by Neg Kha; 22 Jan 2024, 05:01.

  • #2
    Could someone help me with this please? I would appreciate it a lot

    • #3
      Would it be appropriate to use biprobit for the case of a binary dependent variable and a binary regressor (pass as the outcome and nov_acc as the regressor)?
      After that, should I use the margins command to get the marginal effect?
      Also, would it be appropriate to use ivprobit in the following line:

      Code:
      ivprobit pass (days=rank) , vce(robust) first
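
      A hedged sketch of the recursive bivariate probit asked about here, assuming pass as the outcome, nov_acc as the binary endogenous regressor, and rank as the excluded instrument. It has not been run on the full data. ivprobit is kept out of the nov_acc case because it is designed for continuous endogenous regressors, which fits days but not a binary regressor.

      ```stata
      * equation 1: outcome; equation 2: binary endogenous regressor,
      * with rank as the excluded instrument
      biprobit (pass = nov_acc passrate) (nov_acc = rank passrate), vce(cluster id)

      * average marginal effect of nov_acc on the marginal probability Pr(pass = 1)
      margins, dydx(nov_acc) predict(pmarg1)
      ```

      Because nov_acc enters as a plain variable in this sketch, margins reports a derivative rather than a 0/1 contrast.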
