Dear all, and dear Joao Santos Silva,
I would like to apply quantile regression (QR) to the estimation of the child penalty (following Kleven et al., 2019) using several estimators, but I am struggling to implement them correctly in Stata.
The basic event-study regression looks something like this:
Code:
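reg earn det1-det4 det6-det16 dage* dyear* if female==1, cluster(id)
where det`x' is the event-time dummy for year x relative to the birth of the first child (event times -5 to +10, with -1 omitted as the reference year), dage* contains 31 age dummies, dyear* contains 32 year dummies, and id is an individual identifier.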
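For concreteness, here is a minimal sketch of how dummies like these could be built. It is not necessarily how my data were prepared: evtime, age and year are assumed variable names (event time in years relative to the first birth, age in years, and calendar year).

```stata
* Sketch only: evtime, age and year are assumed (hypothetical) variable names.
* evtime = calendar year minus year of first birth, restricted to -5..+10.
forvalues k = 1/16 {
    * det1 <-> event time -5, det5 <-> -1, det6 <-> 0, det16 <-> +10
    generate byte det`k' = (evtime == `k' - 6) if !missing(evtime)
}

* Age and calendar-year dummies with the dage/dyear prefixes
tabulate age, generate(dage)
tabulate year, generate(dyear)
```

With this numbering, leaving det5 out of the varlist makes event time -1 the reference year.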
A tiny data example here for selected dummies:
HTML Code:
input float earn byte(det6 dyear2 dage2)
22307.373 0 0 1
17952.541 1 1 0
  699.373 0 0 0
 8221.417 0 0 0
 2551.391 0 0 0
 6386.294 0 0 0
18150.393 0 0 0
17410.055 0 0 0
 18640.32 0 0 0
 27018.01 0 0 0
26442.715 0 0 0
 40133.03 0 0 0
 25216.81 0 0 0
 24679.87 0 0 0
 23941.64 0 0 0
end
My sense is that genqreg could be useful because it would let me control for age and year FE (needed for a causal interpretation) while still estimating unconditional QTEs, which seem more relevant to me than conditional QTEs that condition on the FE.
Question 1. Is it correct that I can use genqreg (instead of Powell's qregpd)? For the latter, I am not sure how to distinguish between "control variables" and "treatment variables", which is key.
I therefore run:
Code:
genqreg earn det1-det4 det6-det16 if female==1, q(0.5) instruments(det1-det4 det6-det16) proneness(dage* dyear*)
Question 2. Is this the correct specification to implement the GQR/QRPD estimator?
It runs really fast, but the SEs are HUGE. Is this genuine, or am I missing something? The negative effect on women's labor market outcomes after the birth of their first child shows up in every data set I have ever seen, so this lack of an effect (which I also see at other quantiles using the same command) makes me wonder...
HTML Code:
------------------------------------------------------------------------------
        earn | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        det1 |   4197.432   52082.44     0.08   0.936    -97882.28    106277.1
        det2 |   1824.229   51763.43     0.04   0.972    -99630.23    103278.7
        det3 |   2410.966   42971.69     0.06   0.955    -81811.99    86633.92
        det4 |   2404.271   45530.96     0.05   0.958    -86834.77    91643.31
        det6 |  -12659.37   51777.16    -0.24   0.807    -114140.7       88822
        det7 |  -17802.46   52030.97    -0.34   0.732    -119781.3    84176.36
        det8 |   -18413.7   51681.05    -0.36   0.722    -119706.7     82879.3
        det9 |  -17261.98   51982.03    -0.33   0.740    -119144.9    84620.93
       det10 |  -16602.73   52018.75    -0.32   0.750    -118557.6    85352.15
       det11 |  -14583.08   51869.27    -0.28   0.779      -116245    87078.81
       det12 |   -16469.7   48660.61    -0.34   0.735    -111842.7    78903.34
       det13 |  -14695.77    55579.4    -0.26   0.791    -123629.4    94237.86
       det14 |  -12130.62   52270.55    -0.23   0.816      -114579    90317.77
       det15 |  -12950.29   51973.15    -0.25   0.803    -114815.8     88915.2
       det16 |  -8739.535   52171.63    -0.17   0.867    -110994.1    93514.98
------------------------------------------------------------------------------
By contrast, both the mean regression and qreg give significant negative effects (which vary by quantile in qreg).
When running qreg for the median:
Code:
qreg earn det1-det4 det6-det16 dage* dyear* if female==1, q(0.5) iter(1500) // cluster(id)
HTML Code:
Median regression                                   Number of obs =     29,459
  Raw sum of deviations 2.52e+08 (about 17166.168)
  Min sum of deviations 2.30e+08                    Pseudo R2     =     0.0886
------------------------------------------------------------------------------
        earn | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        det1 |   1884.153   1102.267     1.71   0.087    -276.3386    4044.645
        det2 |    1486.15   1000.894     1.48   0.138    -475.6464    3447.946
        det3 |   1988.806   918.5372     2.17   0.030     188.4322     3789.18
        det4 |   1192.301   857.7053     1.39   0.165    -488.8401    2873.441
        det6 |  -8229.007   830.1508    -9.91   0.000    -9856.139   -6601.874
        det7 |  -13067.11   830.3095   -15.74   0.000    -14694.56   -11439.67
        det8 |  -15141.53   859.8237   -17.61   0.000    -16826.82   -13456.24
        det9 |     -17012   867.6451   -19.61   0.000    -18712.62   -15311.37
       det10 |  -18270.72   901.4004   -20.27   0.000    -20037.51   -16503.94
       det11 |   -19054.4   908.8317   -20.97   0.000    -20835.75   -17273.05
       det12 |     -21161    950.416   -22.26   0.000    -23023.86   -19298.14
       det13 |  -22107.55   956.0565   -23.12   0.000    -23981.47   -20233.64
       det14 |  -21897.76   1010.045   -21.68   0.000    -23877.49   -19918.02
       det15 |  -23079.51   1017.357   -22.69   0.000    -25073.57   -21085.44
       det16 |  -22298.25   1074.599   -20.75   0.000    -24404.51   -20191.98
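Because cluster(id) is commented out in the qreg call (as far as I can tell, official qreg does not accept a cluster() option), these standard errors ignore within-person dependence. One option would be a cluster bootstrap. A minimal sketch, where the number of replications and the seed are arbitrary choices of mine:

```stata
* Sketch: median regression with SEs from a bootstrap that resamples whole persons (id).
* reps() and seed() are arbitrary illustrative values, not tuned choices.
bootstrap, reps(200) seed(12345) cluster(id): ///
    qreg earn det1-det4 det6-det16 dage* dyear* if female==1, quantile(0.5)
```

The point estimates are the same as in plain qreg; only the standard errors change, which makes them more comparable to the clustered OLS results.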
If I understand correctly, qreg gives me the conditional QTE, which might of course explain the difference. But before trying to understand that difference, I want to make sure that using genqreg is appropriate even though I have panel data.
This brings me to xtqreg:
Suppose I am also interested in conditional QTEs. I am not sure whether the command suits my data structure (year and age FE, but no individual FE).
Question 3: Would you agree that using xt commands makes little sense given my model?
Overall, my results using qreg were pretty nice.
Question 4: Is qreg generally unsuited for my data? If not, I am still wondering what estimator it actually implements, and hence what the right interpretation is.
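To make the comparison across quantiles concrete, one could estimate several quantiles jointly with sqreg and test whether an event-time coefficient differs between them. A sketch, with the quantiles, the number of replications and the tested coefficient (det6, i.e. event time 0) picked purely for illustration:

```stata
* Sketch: joint estimation at three quantiles with bootstrapped VCE.
* quantiles() and reps() are illustrative choices.
sqreg earn det1-det4 det6-det16 dage* dyear* if female==1, ///
    quantiles(0.25 0.5 0.75) reps(100)

* Test whether the event-time-0 effect is the same at the 25th and 75th percentiles
test [q25]det6 = [q75]det6
```

As far as I understand, sqreg's bootstrap resamples individual observations rather than whole persons, so this would not account for clustering on id.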
Thank you so much in advance! I feel the discussion on GQR has been very limited (which is also what Machado and Santos Silva note in their paper), hence my direct question here. The underlying estimand seems to be very relevant.
Katharina