Dear All,
I am interested in estimating a linear regression model with first-differenced panel data. My question relates to the interpretation of first-differenced indicator variables. In the made-up example I use here, my hypothesis is that state unemployment rates will be higher when there is a Republican governor than when there is a Democratic governor. I am using both first-difference and fixed-effects estimators, but I am primarily concerned with the first-difference specification for purposes of this post.
Wooldridge (2006), in Introductory Econometrics and on the Stata forum, has made the point that dummy variables are treated like any other variable. What I don't understand is that, in cases where there are more than two periods, a differenced (0/1) indicator variable can take on three possible values (-1, 0, and 1). It seems as though differencing transforms an indicator into a three-category categorical variable.
In the simulated example below (using Stata 14.2), I created the indicator variable repgov where 1 = Republican governor and 0 = Democrat/Independent governor. Differencing this indicator results in three possible scenarios:
-1 = Change from Republican (t-1) to Democrat (t)
0 = No change in party control from t-1 to t (Democrat to Democrat or Republican to Republican)
1 = Change from Democrat (t-1) to Republican (t)
In Example 1 below, I estimate a linear regression model, differencing the predictor repgov, a continuous control variable lnpcinc (LN per-capita income), and a continuous outcome variable unemp (% unemployed). It does not seem right to interpret the coefficient for D.repgov as though it is a continuous variable. In other words, I don't know that I can conclude from the results shown below that Republican governors are associated with a 0.48 percentage point increase in the unemployment rate.
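To make my confusion concrete, here is a sketch of the algebra as I understand it. It assumes a standard unobserved-effects model in levels; the symbols ($a_i$, $\beta$, $\gamma$, $u_{it}$) are my own notation for illustration, and I leave out the constant that the regression output reports.

```latex
\begin{align*}
\text{unemp}_{it}   &= a_i + \beta\,\text{repgov}_{it}   + \gamma\,\text{lnpcinc}_{it}   + u_{it} \\
\text{unemp}_{i,t-1} &= a_i + \beta\,\text{repgov}_{i,t-1} + \gamma\,\text{lnpcinc}_{i,t-1} + u_{i,t-1} \\
\Delta\text{unemp}_{it} &= \beta\,\Delta\text{repgov}_{it} + \gamma\,\Delta\text{lnpcinc}_{it} + \Delta u_{it}
\end{align*}
```

If that levels model is the right one, $\beta$ in the differenced equation is the same parameter as in levels, so $\Delta\text{repgov}_{it} = 1$ shifts the expected change in unemployment by $+\beta$ and $\Delta\text{repgov}_{it} = -1$ shifts it by $-\beta$. What I am unsure of is whether imposing that symmetry is reasonable.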
In an alternative specification (Example 2), I created three indicator variables corresponding to the possible values of D.repgov (-1, 0, 1). I then estimated a separate model using two of these indicators, with no change in party control as the excluded category. I thought the second specification would indicate that, relative to no change in party control, there is a 0.86 percentage point reduction in the unemployment rate when party control shifts from Republican to Democratic.
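As far as I can tell, the two specifications are nested. The sketch below uses my own notation ($\alpha$, $\beta_1$, $\beta_2$, $\gamma$, $e_{it}$ are labels I made up for illustration); the variable names are the ones from my code.

```latex
\begin{align*}
\Delta\text{repgov}_{it} &= \text{DemToRep}_{it} - \text{RepToDem}_{it} \\
\text{Example 2:}\quad
\Delta\text{unemp}_{it} &= \alpha + \beta_1\,\text{RepToDem}_{it} + \beta_2\,\text{DemToRep}_{it}
                           + \gamma\,\Delta\text{lnpcinc}_{it} + e_{it} \\
\text{Example 1:}\quad
\Delta\text{unemp}_{it} &= \alpha + \beta\,\bigl(\text{DemToRep}_{it} - \text{RepToDem}_{it}\bigr)
                           + \gamma\,\Delta\text{lnpcinc}_{it} + e_{it}
\end{align*}
```

Read this way, Example 1 is Example 2 with the restriction $\beta_2 = -\beta_1 = \beta$, that is, $\beta_1 + \beta_2 = 0$. So the two models seem to differ only in whether a switch to a Republican governor is forced to have the equal and opposite effect of a switch away from one.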
I am starting to think that the first difference specification is not appropriate for addressing my hypothesis that Republican governors are associated with a higher unemployment rate than are Democratic governors because even the second specification using the dummy variables is really capturing the change in party control. Any advice on differencing indicators and interpretation of coefficients would be greatly appreciated.
Wooldridge, J. M. (2006). Introductory econometrics: A modern approach. Mason, OH: Thomson/South-Western.
Code:
regress D.(unemp lnpcinc repgov)
Code:
clear all
set obs 1000
set seed 12345

********************************************************************************
* Create simulated panel data set with t=10 and n = 100
********************************************************************************

* generate year
egen year = fill(1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10)

* generate state
bysort year: gen state = _n

* indicate panel data
xtset state year, yearly

* generate indicator for Republican governor (1=Rep and 0=Dem/Ind)
gen repgov = floor((1-0+1)*runiform() + 0)

* generate unemployment rate
gen unemp = rnormal(5.75, 3.6)

* generate LN percapita income
gen lnpcinc = rnormal(10.4, 0.04)

* first difference Republican governor indicator
gen Drepgov = D.repgov

* generate three indicator variables for each of three D.repgov outcomes
gen RepToDem = 1 if Drepgov == -1
replace RepToDem = 0 if Drepgov == 0 | Drepgov == 1

gen noChng = 1 if Drepgov == 0
replace noChng = 0 if Drepgov == -1 | Drepgov == 1

gen DemToRep = 1 if Drepgov == 1
replace DemToRep = 0 if Drepgov == -1 | Drepgov == 0

* Example 0:
* Tab Drepgov to show possible values are -1, 0, and 1
tab Drepgov

* Example 1:
* First Differences Regression with D.repgov
reg D.(unemp repgov lnpcinc)

* Example 2:
* First Differences Regression with indicators for values of D.repgov
reg D.unemp RepToDem DemToRep D.lnpcinc

********************************************************************************
* Example 0:
* Tabulation of Differenced Republican Governor Indicator
********************************************************************************

    Drepgov |      Freq.     Percent        Cum.
------------+-----------------------------------
         -1 |        220       24.44       24.44
          0 |        467       51.89       76.33
          1 |        213       23.67      100.00
------------+-----------------------------------
      Total |        900      100.00

********************************************************************************
* Example 1:
* First Differences Regression with D.repgov
********************************************************************************

      Source |       SS           df       MS      Number of obs   =       900
-------------+----------------------------------   F(2, 897)       =      1.91
       Model |  103.053302         2  51.5266509   Prob > F        =    0.1489
    Residual |   24217.407       897   26.998224   R-squared       =    0.0042
-------------+----------------------------------   Adj R-squared   =    0.0020
       Total |  24320.4603       899  27.0527923   Root MSE        =     5.196

------------------------------------------------------------------------------
     D.unemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      repgov |
         D1. |   .4840804   .2497188     1.94   0.053    -.0060208    .9741815
             |
     lnpcinc |
         D1. |  -.7564374   3.135607    -0.24   0.809    -6.910418    5.397543
             |
       _cons |   .0570528   .1732557     0.33   0.742     -.282981    .3970866
------------------------------------------------------------------------------

********************************************************************************
* Example 2:
* First Differences Regression with indicators for values of D.repgov
********************************************************************************

      Source |       SS           df       MS      Number of obs   =       900
-------------+----------------------------------   F(3, 896)       =      1.66
       Model |  134.715057         3  44.9050191   Prob > F        =    0.1733
    Residual |  24185.7452       896  26.9930192   R-squared       =    0.0055
-------------+----------------------------------   Adj R-squared   =    0.0022
       Total |  24320.4603       899  27.0527923   Root MSE        =    5.1955

------------------------------------------------------------------------------
     D.unemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    RepToDem |  -.8564403   .4249167    -2.02   0.044    -1.690388   -.0224923
    DemToRep |   .1054057   .4296481     0.25   0.806    -.7378282    .9486396
             |
     lnpcinc |
         D1. |  -.8343045   3.136129    -0.27   0.790    -6.989319     5.32071
             |
       _cons |   .2377924   .2405444     0.99   0.323    -.2343037    .7098886
------------------------------------------------------------------------------