
  • Applying genqreg and xtqreg in models with a small number of FE

    Dear all, and dear Joao Santos Silva,

    I would like to apply QR to the estimation of the child penalty (following Kleven et al., 2019) using different estimators. However, I am struggling to implement them correctly in Stata.

    The basic event study framework runs something like

    Code:
    reg earn det1-det4 det6-det16 dage* dyear* if female==1, cluster(id)
    where det`x' is the event-time dummy for year x relative to the birth of the first child (here from -5 to +10, with -1 being dropped). dage* contains 31 age dummies and dyear* contains 32 year dummies. id is an individual id.

    A tiny data example here for selected dummies:
    HTML Code:
    input float earn byte(det6 dyear2 dage2)
    22307.373 0 0 1
    17952.541 1 1 0
      699.373 0 0 0
     8221.417 0 0 0
     2551.391 0 0 0
     6386.294 0 0 0
    18150.393 0 0 0
    17410.055 0 0 0
     18640.32 0 0 0
     27018.01 0 0 0
    26442.715 0 0 0
     40133.03 0 0 0
     25216.81 0 0 0
     24679.87 0 0 0
     23941.64 0 0 0

    My sense is that genqreg could be interesting in the sense that it would allow me to control for age and year FE (needed for causal interpretation), while allowing me to obtain unconditional QTE, which seems to be an interesting estimate for me (more so than the conditional QTE conditioning on FE).

    Question 1. Is it correct that I can use genqreg (instead of Powell's qregpd)? For the latter, I am not sure how to distinguish between "control variables" and "treatment vars", which is key.

    I, therefore, run:
    Code:
     genqreg earn det1-det4 det6-det16 if female==1, q(0.5) instruments(det1-det4 det6-det16) proneness(dage* dyear*)
    Question 2. Is this the correct specification to implement the GQR/QRPD estimator?

    It runs really fast, but the SE are just HUGE. Is this simply the true result, or am I missing something? The negative effect on labor market outcomes for women after the birth of their first child is a fairly obvious fact across all data I have ever seen, so this lack of an effect (which I also see at other quantiles using the same command) makes me wonder...


    HTML Code:
    ------------------------------------------------------------------------------
            earn | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            det1 |   4197.432   52082.44     0.08   0.936    -97882.28    106277.1
            det2 |   1824.229   51763.43     0.04   0.972    -99630.23    103278.7
            det3 |   2410.966   42971.69     0.06   0.955    -81811.99    86633.92
            det4 |   2404.271   45530.96     0.05   0.958    -86834.77    91643.31
            det6 |  -12659.37   51777.16    -0.24   0.807    -114140.7       88822
            det7 |  -17802.46   52030.97    -0.34   0.732    -119781.3    84176.36
            det8 |   -18413.7   51681.05    -0.36   0.722    -119706.7     82879.3
            det9 |  -17261.98   51982.03    -0.33   0.740    -119144.9    84620.93
           det10 |  -16602.73   52018.75    -0.32   0.750    -118557.6    85352.15
           det11 |  -14583.08   51869.27    -0.28   0.779      -116245    87078.81
           det12 |   -16469.7   48660.61    -0.34   0.735    -111842.7    78903.34
           det13 |  -14695.77    55579.4    -0.26   0.791    -123629.4    94237.86
           det14 |  -12130.62   52270.55    -0.23   0.816      -114579    90317.77
           det15 |  -12950.29   51973.15    -0.25   0.803    -114815.8     88915.2
           det16 |  -8739.535   52171.63    -0.17   0.867    -110994.1    93514.98
    ------------------------------------------------------------------------------
    In contrast, for the mean effect, as well as when using qreg, I find significant negative effects (which vary by quantile in qreg).

    When running qreg for the median:
    Code:
       qreg earn det1-det4 det6-det16 dage* dyear* if female==1, q(0.5) iter(1500)  // cluster(id)
    HTML Code:
    Median regression                                   Number of obs =     29,459
      Raw sum of deviations 2.52e+08 (about 17166.168)
      Min sum of deviations 2.30e+08                    Pseudo R2     =     0.0886

    ------------------------------------------------------------------------------
            earn | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            det1 |   1884.153   1102.267     1.71   0.087    -276.3386    4044.645
            det2 |    1486.15   1000.894     1.48   0.138    -475.6464    3447.946
            det3 |   1988.806   918.5372     2.17   0.030     188.4322     3789.18
            det4 |   1192.301   857.7053     1.39   0.165    -488.8401    2873.441
            det6 |  -8229.007   830.1508    -9.91   0.000    -9856.139   -6601.874
            det7 |  -13067.11   830.3095   -15.74   0.000    -14694.56   -11439.67
            det8 |  -15141.53   859.8237   -17.61   0.000    -16826.82   -13456.24
            det9 |     -17012   867.6451   -19.61   0.000    -18712.62   -15311.37
           det10 |  -18270.72   901.4004   -20.27   0.000    -20037.51   -16503.94
           det11 |   -19054.4   908.8317   -20.97   0.000    -20835.75   -17273.05
           det12 |     -21161    950.416   -22.26   0.000    -23023.86   -19298.14
           det13 |  -22107.55   956.0565   -23.12   0.000    -23981.47   -20233.64
           det14 |  -21897.76   1010.045   -21.68   0.000    -23877.49   -19918.02
           det15 |  -23079.51   1017.357   -22.69   0.000    -25073.57   -21085.44
           det16 |  -22298.25   1074.599   -20.75   0.000    -24404.51   -20191.98

    If I understand it correctly, qreg will give me the conditional QTE, so this might of course explain the difference. But before trying to understand this difference, I want to make sure I am right in using genqreg even though I have panel data.

    This brings me to xtqreg:

    Say I am also interested in the conditional QTE. I am not sure whether the command is suited to my data structure (with year and age FE, but no individual FE).

    Question 3: Would you agree that using xt commands makes little sense given my model?

    Overall, my results using qreg were pretty nice.

    Question 4: Is qreg generally unsuited for my data? If not, I am still wondering what estimator it actually implements, that is, what the right interpretation is.

    Thank you so much in advance! I feel the discussion on GQR has been very limited (which is also what Machado and Santos Silva note in their paper), hence my direct question here. The underlying estimand seems to be very relevant.

    Katharina

  • #2
    Dear Kathi Kaeppel,

    In my view, Powell's estimators implemented in qregpd and genqreg are extremely elegant and ingenious, but I do not think they estimate the models most practitioners have in mind. So, personally, I would not recommend their use.

    As for the other approaches, your model does not appear to have the traditional fixed effects, so you would not need xtqreg. However, you should probably cluster the standard error by ID, and for that you need qreg2; the results will be as in qreg, but with clustered standard errors.
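
    As a minimal sketch of that suggestion, reusing the variable names from the first post and assuming qreg2 is installed from SSC (it is a community-contributed command, not part of official Stata), the median regression with standard errors clustered by individual could look like this:

    Code:
    * install the community-contributed command once
    ssc install qreg2
    * same specification as the qreg call in #1, with SEs clustered by id
    qreg2 earn det1-det4 det6-det16 dage* dyear* if female==1, q(0.5) cluster(id)
    The point estimates should match those from qreg; only the standard errors change.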

    Best wishes,

    Joao



    • #3
      Dear Joao Santos Silva ,

      Thank you for the prompt response!

      I might be missing something about the interpretation of the GQR estimator. As for qreg2, is it correct that I would estimate the conditional QTE of having children on the conditional earnings distribution (conditional on age and year FE)? This is not so easy to interpret, is it? Someone at the 90th quantile of that conditional distribution might not have high earnings. I thought the GQR addresses this "issue" precisely by allowing me to control for age and year FE but still get the QTE at the 90th quantile of the unconditional earnings distribution.

      Could you explain a bit more what you mean by the fact that the GQR does not estimate the models most practitioners have in mind?

      Additionally, and relatedly, I am not fully sure how qreg(2) deals with the problem of additive FE. Or is that only problematic for individual FE? I should think about that more...

      On the actual implementation: when using qreg2 for lower quantiles, I got the same error as the one described here ("matrix not positive definite") when running
      Code:
      qreg2 earn det1-det4 det6-det16 dage* dyear* if female==1, q(`dec') cluster(id) wlsiter(500)
      For other quantiles, it runs well (and indeed rejects the H_0 of no clustered SE, as expected). I should be able to share the data (or the respective ICPSR link) in case you are still interested in figuring out this problem (and of course, I would be interested, as I don't know how to solve it).

      Thank you again!
      Kathi

      ps: I will be sure to cite the 2016 paper if I succeed in using qreg2.
      Last edited by Kathi Kaeppel; 30 Nov 2024, 05:42.



      • #4
        Dear Kathi Kaeppel,

        You are right in saying that conditional and unconditional quantiles have very different interpretations. As far as I understand, however, estimation of unconditional quantiles relies on very strong assumptions that I am not generally comfortable with. In contrast, estimation of conditional quantiles requires only very mild assumptions. You are also right in saying that someone in the 90th conditional quantile may not have high earnings, but they will have high earnings relative to others with the same characteristics, and I find that interpretation easy and intuitive. Of course, this may not be what you are trying to estimate.

        You asked me to say more about why I think that Powell's estimators are not for the models practitioners have in mind. The main point is that generally practitioners have in mind models in which the fixed effects are part of the model being estimated, which is not what Powell's estimators do. As I said, I think those estimators are very interesting from a theoretical point of view, but I think their practical application is limited. For more on this, please see here.

        On the error message you get with qreg2, I suggest that in the first instance you try to solve the problem by using the silverman option or by changing the default value of the option epsilon. If you cannot solve the problem, please send me by email a dataset where the problem occurs and the respective do-file, so that I can investigate.
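
        A minimal sketch of these two attempts, reusing the qreg2 call from #3. Here q(0.1) merely stands in for one of the failing lower quantiles, and the epsilon() value is purely illustrative (an assumption, not a recommended setting; see help qreg2 for the default):

        Code:
        * attempt 1: add the silverman option
        qreg2 earn det1-det4 det6-det16 dage* dyear* if female==1, q(0.1) cluster(id) silverman
        * attempt 2: change epsilon() from its default (illustrative value only)
        qreg2 earn det1-det4 det6-det16 dage* dyear* if female==1, q(0.1) cluster(id) epsilon(0.0001)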

        Best wishes,

        Joao



        • #5
          Dear Joao Santos Silva ,

          Thank you again. It took me a moment, as I was working on several things at the same time, but I solved the problem with the clustered SE: it occurred in the levels specification for earnings, where earnings were set to 0 for those who do not work. Hence, the error was due to a lack of variation in the outcome variable at low quantiles (for women, 20% of the sample had 0 earnings).
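
          For anyone running into the same error, a quick check along these lines could reveal the issue before estimation. This is only a sketch using the variable names from #1; zero_earn is a hypothetical helper variable introduced for illustration:

          Code:
          * indicator for zero earnings (missing where earn is missing)
          gen byte zero_earn = (earn == 0) if !missing(earn)
          * the mean of the indicator is the share of zeros among women
          summarize zero_earn if female==1
          * low unconditional quantiles; values of 0 here signal the mass point
          centile earn if female==1, centile(10 20 30)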

          Thank you also for clarifying your comment on Powell's estimator. My applied mind might not fully understand it, but that's okay.

          One last question before I give up on my goal of getting useful unconditional treatment effects: do you think the Firpo et al. (2009) estimator corresponds more closely to what you think practitioners have in mind?

          Best wishes,

          Katharina


          __

          Firpo, S., Fortin, N. M., & Lemieux, T. (2009). Unconditional quantile regressions. Econometrica, 77(3), 953-973.



          • #6
            Dear Kathi Kaeppel,

            If your data is like that, you probably should not be using a linear quantile regression.

            Indeed, the method proposed by Firpo et al. (2009) is in line with what practitioners have in mind but, if I recall correctly, it does not work for dummy variables and requires strong assumptions.

            Best wishes,

            Joao
