  • Regression using panel data variables with missing values

    Hello everyone,

    I am trying to run a regression on panel data to examine the possible effect of the amount of student loans taken out while enrolled in college on the log of hourly wage after college graduation. The student loan variable has a value for each year in which an individual is enrolled (and is zero after college graduation), and the hourly wage has values for the years after college graduation (some individuals who worked while in college also have wage values). My question is: what kind of regression should I run? For example, can I run a pooled OLS (i.e. reg log(wage) studentloan X), or should I run xtreg log(wage) studentloan X? Here, X is a set of control variables that are mostly time-invariant (race, gender, family income), although some are time-varying (age, occupation in a given year). Or should I create a new variable for the total student loan borrowed while in college (by summing the yearly values)?

    I am very sorry if my question is elementary, and thank you in advance for your help and guidance. If I need to post this question in a specific format, please let me know and I will fix it as soon as possible.

    Have a great day!

    David





  • #2
    JungHwan:
    welcome to this forum.
    I would go -xtreg,fe- as my first choice here.
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      Originally posted by Carlo Lazzaro
      JungHwan:
      welcome to this forum.
      I would go -xtreg,fe- as my first choice here.
      Dear Carlo, thank you very much for your advice. Following it, I ran -xtreg logwage studentloan exp exp^2 occupation,fe-, where exp and occupation are experience and occupation codes, both of which vary by year over the panel. I get a significant coefficient on the student loan. I also wanted to test some variables that I think are possible instruments, so I ran -xtivreg, fe- with student loan instrumented by the IV. However, -xtivreg, fe- omitted the exp variable (all other coefficients were reported). I am not sure why -xtivreg, fe- is omitting one of my time-varying variables. Do you have any idea what could cause this omission? Thank you very much for your help!



      • #4
        JungHwan:
        1) your exp exp^2 terms should become:
        Code:
        c.exp##c.exp
        2) it is impossible to say without taking a look at what you typed and what Stata gave you back (as per FAQ).
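        To illustrate point 1), here is a minimal sketch, assuming the data are already -xtset- and using the variable names from #3 (logwage, studentloan, exp, occupation). The factor-variable term c.exp##c.exp enters both exp and its square, and it lets -margins- account for the squared term when computing the marginal effect of experience.

        ```stata
        * Factor-variable notation: expands to exp and c.exp#c.exp (exp squared).
        xtreg logwage studentloan c.exp##c.exp occupation, fe

        * Average marginal effect of experience; the squared term is handled
        * automatically because Stata knows both terms involve exp.
        margins, dydx(exp)
        ```

        With a hand-made squared variable, -margins- would treat the two terms as unrelated regressors, which is the usual reason to prefer the factor-variable form.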
        Kind regards,
        Carlo
        (StataNow 18.5)



        • #5
          Originally posted by Carlo Lazzaro
          JungHwan:
          1) your exp exp^2 terms should become:
          Code:
          c.exp##c.exp
          2) it is impossible to say without taking a look at what you typed and what Stata gave you back (as per FAQ).
          Thank you very much for your comments. Below is what I ran for -xtreg,fe-:
          Code:
          . xtreg lhrwage tot_sloan_ c.exp##c.exp occucode familyincome if graduated_COL==1, fe
          
          Fixed-effects (within) regression               Number of obs     =     16,494
          Group variable: PUBID                           Number of groups  =      1,602
          
          R-sq:                                           Obs per group:
               within  = 0.3848                                         min =          1
               between = 0.1374                                         avg =       10.3
               overall = 0.2978                                         max =         18
          
                                                          F(5,14887)        =    1862.66
          corr(u_i, Xb)  = -0.0144                        Prob > F          =     0.0000
          
          ------------------------------------------------------------------------------
               lhrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
            tot_sloan_ |  -.0000122   1.57e-06    -7.80   0.000    -.0000153   -9.17e-06
                   exp |   .0603472   .0009174    65.78   0.000      .058549    .0621453
                       |
           c.exp#c.exp |   .0009643   .0000209    46.19   0.000     .0009234    .0010052
                       |
              occucode |  -.0000576   1.86e-06   -30.87   0.000    -.0000612   -.0000539
          familyincome |    .000538   .0000519    10.37   0.000     .0004363    .0006396
                 _cons |   2.568975   .0090712   283.20   0.000     2.551195    2.586756
          -------------+----------------------------------------------------------------
               sigma_u |  .33676155
               sigma_e |  .39975283
                   rho |  .41509485   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(1601, 14887) = 5.88                 Prob > F = 0.0000
          and I got the following results for -xtivreg,fe-:

          Code:
          . xtivreg lhrwage (tot_sloan_=independent_at_grad) c.exp##c.exp occucode familyincome if graduated_COL==1, fe
          
          Fixed-effects (within) IV regression            Number of obs     =     16,494
          Group variable: PUBID                           Number of groups  =      1,602
          
          R-sq:                                           Obs per group:
               within  =      .                                         min =          1
               between = 0.0253                                         avg =       10.3
               overall = 0.0302                                         max =         18
          
                                                          Wald chi2(4)      =   54049.52
          corr(u_i, Xb)  = -0.3494                        Prob > chi2       =     0.0000
          
          ------------------------------------------------------------------------------
               lhrwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
            tot_sloan_ |  -.0006782   .0000362   -18.75   0.000    -.0007491   -.0006073
                   exp |          0  (omitted)
                       |
           c.exp#c.exp |   .0001879   .0000661     2.84   0.005     .0000582    .0003175
                       |
              occucode |  -.0000175   7.72e-06    -2.27   0.023    -.0000326   -2.37e-06
          familyincome |  -.0000167   .0001975    -0.08   0.933    -.0004038    .0003704
                 _cons |   3.096906   .0334269    92.65   0.000      3.03139    3.162421
          -------------+----------------------------------------------------------------
               sigma_u |  .70326241
               sigma_e |  1.4460868
                   rho |  .19127099   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F  test that all u_i=0:     F(1601,14888) =     0.45      Prob > F    = 1.0000
          ------------------------------------------------------------------------------
          Instrumented:   tot_sloan_
          Instruments:    exp c.exp#c.exp occucode familyincome independent_at_grad
          ------------------------------------------------------------------------------
          I applied the condition -if graduated_COL==1- to both regressions in order to restrict the sample to college graduates. The variable exp is experience, which varies by year, so I am not sure why it is omitted by -xtivreg,fe-. The instrument that I used, independent_at_grad, is a dummy variable that indicates whether or not an individual is classified as an independent student. Thank you very much for your time and help!



          • #6
            JungHwan:
            it may well be that -exp- is perfectly collinear with -independent_at_grad-.
            Please also note that the F-test that all u_i=0 after -xtivreg- goes against any panel-wise effect.
            Kind regards,
            Carlo
            (StataNow 18.5)



            • #7
              Originally posted by Carlo Lazzaro
              JungHwan:
              it may well be that -exp- is perfectly collinear with -independent_at_grad-.
              Please also note that the F-test that all u_i=0 after -xtivreg- goes against any panel-wise effect.
              Thank you very much for your comment. Could you please let me know whether I am understanding it correctly?
              Does the F statistic of 0.45 mean that there is no endogeneity problem, since we cannot reject that all u_i = 0, where u_i (if I understand correctly) are the unobserved individual error terms? I did not fully understand what 'going against any panel-wise effect' means. Sorry if I have misunderstood the test statistic, and thank you very much for your help.



              • #8
                Jung:
                the null of that F-test is: no panel-wise effect.
                With Prob > F = 1.0000, we cannot reject the null.
                Kind regards,
                Carlo
                (StataNow 18.5)



                • #9
                  Originally posted by Carlo Lazzaro
                  Jung:
                  the null of that F-test is: no panel-wise effect.
                  With Prob > F = 1.0000, we cannot reject the null.
                  Thank you for your comments. In this case, would it be better to try a pooled OLS regression, since there is no panel-wise effect?
                  Would this be okay even when the panel is unbalanced (some individuals have more observations, i.e. years, than others)?

                  Best,
                  Jung Hwan Kim



                  • #10
                    JungHwan:
                    first, I would consider whether a single instrument is actually enough, and whether the instrument is correlated with the endogenous regressor and not weak.
                    Kind regards,
                    Carlo
                    (StataNow 18.5)



                    • #11
                      Originally posted by Carlo Lazzaro
                      JungHwan:
                      first, I would consider whether a single instrument is actually enough, and whether the instrument is correlated with the endogenous regressor and not weak.
                      Thank you very much for your comment. As per your advice, I checked the first stage from -xtivreg-, and for some reason it omits my instrument, as follows:
                      Code:
                      . xtivreg lhrwage (tot_sloan_=independent_at_grad) c.exp##c.exp occucode family
                      > income if graduated_COL==1, first fe
                      
                      First-stage within regression
                      
                      Fixed-effects (within) regression               Number of obs     =     16,494
                      Group variable: PUBID                           Number of groups  =      1,602
                      
                      R-sq:                                           Obs per group:
                           within  = 0.0404                                         min =          1
                           between = 0.0245                                         avg =       10.3
                           overall = 0.0363                                         max =         18
                      
                                                                      F(4,14888)        =     156.90
                      corr(u_i, Xb)  = -0.0209                        Prob > F          =     0.0000
                      
                      ------------------------------------------------------------------------------
                        tot_sloan_ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                               exp |  -90.61418   4.730898   -19.15   0.000    -99.88732   -81.34103
                                   |
                       c.exp#c.exp |  -1.165878   .1085705   -10.74   0.000    -1.378689   -.9530662
                                   |
                          occucode |   .0601859    .009722     6.19   0.000     .0411297    .0792422
                      familyincome |  -.8329089   .2706273    -3.08   0.002    -1.363372    -.302446
                      independen~d |          0  (omitted)
                             _cons |   792.7128   46.90499    16.90   0.000     700.7732    884.6524
                      -------------+----------------------------------------------------------------
                           sigma_u |  961.86053
                           sigma_e |  2086.7604
                               rho |  .17523114   (fraction of variance due to u_i)
                      ------------------------------------------------------------------------------
                      F test that all u_i=0: F(1601, 14888) = 1.86                 Prob > F = 0.0000
                      The student loan variable has a value for each year in which an individual borrowed to pay tuition and is zero otherwise. Hence, it has many zero observations (meaning that the individual did not borrow in that year): 14,052 of the 16,494 observations are zero, as follows.

                      Code:
                      . tab tot_sloan_ if lhrwage !=. & exp!=. & occucode!=. & familyincome!=. & graduated_COL !=.
                      
                       tot_sloan_ |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                                0 |     14,052       85.19       85.19
                         5.620104 |          1        0.01       85.20
                         109.6578 |          1        0.01       85.21
                         150.0282 |          1        0.01       85.21
                         182.3472 |          1        0.01       85.22
                         213.4188 |          1        0.01       85.22
                         219.3156 |          1        0.01       85.23
                            226.8 |          1        0.01       85.24
                          227.934 |          1        0.01       85.24
                         240.0961 |          1        0.01       85.25
                         281.0052 |          1        0.01       85.26
                         300.0564 |          1        0.01       85.26
                            340.2 |          1        0.01       85.27
                         355.3423 |          1        0.01       85.27
                          377.622 |          1        0.01       85.28
                         420.0336 |          1        0.01       85.29
                         438.6312 |          1        0.01       85.29
                      Could it be that the student loan variable has too many zeros while the independent-status variable is either 1 or 0, which would cause multicollinearity? I was thinking of summing the yearly loan amounts and using the total amount of student loans as the key independent variable, but in that case I would not be able to use fixed-effects regression, since the total amount borrowed by each individual does not vary over time. Is there a way to deal with this situation?
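
                      For reference, a minimal sketch of the summing step mentioned above, assuming tot_sloan_ holds the yearly loan amount and PUBID identifies the individual, as in the output above. The name total_sloan is hypothetical and is not a variable in the data set.

                      ```stata
                      * Sum each person's yearly loan amounts into one person-level total.
                      * total_sloan is a hypothetical variable name.
                      bysort PUBID: egen total_sloan = total(tot_sloan_)

                      * The total is constant within person, so -xtsum- should report a
                      * within standard deviation of zero for it.
                      xtset PUBID
                      xtsum total_sloan
                      ```

                      A within standard deviation of zero is the reason -xtreg, fe- cannot estimate a coefficient on such a variable.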

                      Thank you very much for your time and help.

                      Best regards,

                      Jung Hwan Kim



                      • #12
                        Jung Hwan:
                        are you sure that the omitted instrument is not a time-invariant predictor whose coefficient cannot be estimated via -fe- specification?
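                        One way to check this, sketched under the assumption that PUBID is the panel identifier and independent_at_grad the instrument, as in #11: -xtsum- splits the variation of a variable into between and within components, and a within standard deviation of zero means the variable is constant within each individual. The names sd_iv and first_obs are hypothetical helper variables.

                        ```stata
                        * Decompose the variation of the instrument into between and within parts.
                        xtset PUBID
                        xtsum independent_at_grad if graduated_COL==1

                        * Alternative check: count individuals whose instrument takes more
                        * than one value over time (sd_iv and first_obs are hypothetical names).
                        bysort PUBID: egen sd_iv = sd(independent_at_grad)
                        egen first_obs = tag(PUBID)
                        count if first_obs & sd_iv > 0 & !missing(sd_iv)
                        ```

                        A count of zero would mean that no individual's value ever changes, so the within transformation removes the variable entirely.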
                        Kind regards,
                        Carlo
                        (StataNow 18.5)



                        • #13
                          Originally posted by Carlo Lazzaro
                          Jung Hwan:
                          are you sure that the omitted instrument is not a time-invariant predictor whose coefficient cannot be estimated via -fe- specification?
                          Dear Carlo, thank you very much for your comment. My IV is indeed time-invariant, so it now makes sense that its coefficient cannot be estimated under the fixed-effects specification. In this case, would it be okay for me to use random effects (although I ran a Hausman test on my data and got a p-value close to zero, which tells me to use fixed effects)? Or would it be okay for me to use pooled OLS? Thank you very much for your time and help.

                          Best regards,

                          Jung Hwan Kim
                          Last edited by JungHwan Kim; 17 Mar 2022, 09:11.



                          • #14
                            Jung Hwan Kim:
                            what about exploring different instruments while sticking with -fe-?
                            In addition, cluster-robust standard errors seem mandatory here.
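                            A minimal sketch of the second point, assuming PUBID is the panel identifier and reusing the specification from #5: -vce(cluster PUBID)- requests standard errors clustered on the individual in -xtreg,fe-. It only illustrates the option and does not address the choice of instrument.

                            ```stata
                            * Fixed-effects regression with standard errors clustered on the individual.
                            xtset PUBID
                            xtreg lhrwage tot_sloan_ c.exp##c.exp occucode familyincome if graduated_COL==1, fe vce(cluster PUBID)
                            ```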
                            Kind regards,
                            Carlo
                            (StataNow 18.5)
