Hello! I am interested in measuring changes over time in the returns to work experience, with returns measured as wages. In other words, does an additional year of experience result in a similar wage increase in 2019 as it did in 1980? I have a regression with multiple fixed effects, and my variable of interest increases in a predictable way over time. I'm concerned that the regression I've run does not make sense, or that I'm interpreting the results incorrectly.
To obtain estimates of time-varying returns to work experience, I estimate a wage equation and include the total number of years of full-time work experience (and its square). To allow the estimated returns to experience to vary across years, I interact a worker's full-time job experience with year dummies. The data I'm using is from the Panel Study of Income Dynamics (PSID), a longitudinal panel survey of American families, conducted by the Survey Research Center at the University of Michigan. The observations are yearly.
Formally, the regression specification takes the following form:
[Equation image: AAstatalist_equations.png]
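Because the equation is posted as an image, here is a text version that is consistent with the variable definitions that follow and with the areg call further down. It is a sketch, not a transcription: the coefficient symbols \beta_{1s}, \beta_{2s} and \delta_t are labels chosen here and may differ from those in the image, and Y_s denotes the dummy equal to 1 when t = s.

```latex
\ln w_{it} = \sum_{s} \beta_{1s}\,\bigl(Y_s \times Exp_{it}\bigr)
           + \sum_{s} \beta_{2s}\,\bigl(Y_s \times Exp_{it}^{2}\bigr)
           + \Gamma_{it} + \delta_t + u_{it},
\qquad u_{it} = \mu_i + \varepsilon_{it}
```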
i stands for person and t stands for years, w denotes the real wage rate, Y is a year dummy, and Exp is overall work experience.
I include a set of controls, labelled here as Gamma: a dummy for whether the individual has a bachelor's degree, a dummy for union membership, and industry, occupation, and state fixed effects.
The error term u_{it} has two components: mu_i, which represents the unobserved characteristics of individual i that affect wages in a fixed manner over time, and epsilon_{it}, which represents a normally distributed random error.
In the Stata code for this regression (given further down), n is the worker's unique identifier, ln_w is the log of real wages, experience is the number of years of full-time work experience the individual has, and experience2 is its square. year is the year of the observation: I have 1978-2019 with gaps, because the survey was run yearly until 1997 and every two years after that (1999, 2001, 2003, etc.). union is a dummy for whether the person is in a union, ind_detailed is the industry, occ_det is the occupation, state is the FIPS state code, BA is a dummy that equals 1 if the individual has a bachelor's degree, and weight_ind is the sample weight provided by the PSID.
I use the estimated coefficients on Exp and its square to construct year-specific returns to experience:
[Equation image: AAstatalist_eq2.png]
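In text form, the calculation that the plotting code further down carries out for each year t can be written as follows. The labels \hat\beta_{1t} and \hat\beta_{2t} (the year-t coefficients on experience and its square) and \widehat{R}_t are chosen here for illustration and may not match the symbols in the image.

```latex
\widehat{R}_t(T) = \hat\beta_{1t}\,T + \hat\beta_{2t}\,T^{2},
\qquad\text{so for } T = 5:\quad
\widehat{R}_t(5) = 5\,\hat\beta_{1t} + 25\,\hat\beta_{2t}
```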
This gives the predicted return to T years of general work experience. As a baseline, I choose T = 5.
My interpretation is that the coefficients on experience represent a single estimate of the effects of work experience on wages while accounting for unit-level heterogeneity and time shocks. But I'm concerned about using within-person variation and these year interactions with a variable that I expect to increase each year by either 0 (if the worker didn't work enough hours to qualify for an additional year of full-time work experience) or 1.
Here is a plot of the results:
[Image: AAstatalist_image.png]
I use the following code in Stata to run this regression:
Code:
areg ln_w i.year#c.experience i.year#c.experience2 i.year i.union i.ind_detailed i.state i.BA i.occ_det [pw=weight_ind], absorb(n) r
Here is the code I ran after the regression to generate the plot:
Code:
collapse (median) w_hr_real (mean) age ln_w hours tenure experience unemp [pw=weight_ind] , by(year)
sort year
tsset year
gen T = 5
gen year_exp = .
gen year_exp_lb = .
gen year_exp_ub = .

local startyear 1978
quietly replace year_exp = _b[`startyear'b.year#c.experience]*T + _b[`startyear'b.year#c.experience2]*T^2 if year==`startyear'
quietly replace year_exp_lb = (_b[`startyear'b.year#c.experience] - invttail(e(df_r),0.025)*_se[`startyear'b.year#c.experience])*T + (_b[`startyear'b.year#c.experience2] - invttail(e(df_r),0.025)*_se[`startyear'b.year#c.experience2])*T^2 if year==`startyear'
quietly replace year_exp_ub = (_b[`startyear'b.year#c.experience] + invttail(e(df_r),0.025)*_se[`startyear'b.year#c.experience])*T + (_b[`startyear'b.year#c.experience2] + invttail(e(df_r),0.025)*_se[`startyear'b.year#c.experience2])*T^2 if year==`startyear'
//
local yearlist 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 2019
foreach y of local yearlist {
    quietly replace year_exp = _b[`y'.year#c.experience]*T + _b[`y'.year#c.experience2]*T^2 if year==`y'
    quietly replace year_exp_lb = (_b[`y'.year#c.experience] - invttail(e(df_r),0.025)*_se[`y'.year#c.experience])*T + (_b[`y'.year#c.experience2] - invttail(e(df_r),0.025)*_se[`y'.year#c.experience2])*T^2 if year==`y'
    quietly replace year_exp_ub = (_b[`y'.year#c.experience] + invttail(e(df_r),0.025)*_se[`y'.year#c.experience])*T + (_b[`y'.year#c.experience2] + invttail(e(df_r),0.025)*_se[`y'.year#c.experience2])*T^2 if year==`y'
}

***************************************************************************
************ RETURNS TO EXPERIENCE BY YEAR ************
*Graph Settings
grstyle clear
set scheme s2color
grstyle init
grstyle set plain, box
grstyle color background white
grstyle set color Set1
grstyle yesno draw_major_hgrid yes
grstyle yesno draw_major_ygrid yes
grstyle color major_grid gs8
grstyle linepattern major_grid dot
grstyle set legend 4, box inside
grstyle color ci_area gs12%50
twoway (rarea year_exp_lb year_exp_ub year, color(gs15)) (line year_exp year, color(black)) , name(year_exp) title("Time-Varying Returns to Work Experience") ytitle("Returns: Log Wages") legend(off)
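The bounds in the code above add the lower (or upper) bounds of the two coefficients separately, which ignores the covariance between them. A minimal sketch of one alternative follows. It assumes the areg results are still the active estimates, and it uses 1980 purely as an example year. lincom evaluates the same linear combination and takes its standard error from the full covariance matrix.

```stata
* Sketch only: 5-year return for one example year (1980); T = 5, so T^2 = 25.
* Assumes the areg estimates from the regression are still in memory.
lincom 5*_b[1980.year#c.experience] + 25*_b[1980.year#c.experience2]

* lincom leaves the point estimate and its standard error in r()
display "return = " r(estimate) "   se = " r(se)
```

The same call could be placed inside the foreach loop, with r(estimate) and r(se) stored by year in place of the hand-built bounds.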
Including the worker as a fixed effect means that a separate intercept is estimated for each worker, capturing time-invariant features of the individual (like worker ability). I want to interpret the results as follows: in 1978 and the early 1980s, the average person could expect their log wage to increase by about 0.4 (somewhere between 0.3 and 0.5) with an additional 5 years of experience. This return declines over time, and by 2019 there is no real association between additional years of work experience and wages.
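As a rough guide to magnitudes, a change in log wages is not the same as a percentage change once the numbers get this large. Converting the log-point values quoted above into proportional changes in the wage level gives approximately:

```latex
\begin{aligned}
e^{0.3} - 1 &\approx 0.35 \\
e^{0.4} - 1 &\approx 0.49 \\
e^{0.5} - 1 &\approx 0.65
\end{aligned}
```

So a 0.4 increase in log wages over 5 years corresponds to a wage roughly 49 percent higher.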
Again, I'm concerned about using these year interactions with a variable that I expect to increase each year by either 0 (if the worker didn't work enough hours to qualify for an additional year of full-time work experience) or 1, and about combining that with person and year fixed effects. If I were interacting years with a time-invariant feature of the worker, say race or gender, I wouldn't be able to estimate the effect of that variable in any particular time period, but I could use year interactions like I did here to estimate the differences in the partial effects of these time-constant variables relative to a base period (see Wooldridge, 2010, p. 302). While years of work experience isn't time-constant, the way it changes over time is very nearly time-constant. Frankly, I just feel unsure of my interpretation, and I'm not sure whether I'm estimating what I intended to. I'm hoping I've provided enough information and that someone here has more insight.
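To make this concern concrete, here is a sketch under a deliberately strong assumption that is not taken from the data: every worker gains exactly one year of experience per calendar year, so that Exp_{it} = a_i + t with a_i fixed for each worker. The symbols a_i, \beta_t and \beta_b (b being a base year) are labels for this sketch only, and the quadratic term is ignored for simplicity.

```latex
\begin{aligned}
\beta_t\,Exp_{it} &= \beta_t\,(a_i + t) \\
  &= \underbrace{\beta_b\,a_i}_{\text{absorbed by } \mu_i}
   + \underbrace{\beta_t\,t}_{\text{absorbed by the year effect}}
   + (\beta_t - \beta_b)\,a_i
\end{aligned}
```

Under that assumption, only the differences \beta_t - \beta_b relative to the base year would be identified, which is the same situation as the time-constant case in Wooldridge. Any information about the level of the return would then have to come from workers whose experience does not rise one-for-one with the calendar year.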
Thanks in advance for your help!