Dear all,
I am analyzing the impact of 3rd and 4th division soccer teams (1), their stadiums (2), and their affiliation with 1st and 2nd division teams versus independence (3) on per capita GDP at the county level. My data set (strongly balanced) covers 266 counties from 1995 to 2012, with around 30 independent variables (many of them dummies). I am using a linear reduced-form model:
y_it = β1 X_it + β2 Z_it + ϑ_i + μ_t + ε_it
y_it is the per capita GDP in county i at time t
X_it is a vector of local market variables for county i at time t; β1 is the corresponding vector of parameters to be estimated
Z_it is a vector of third and fourth league team and stadium variables in county i at time t; β2 is the corresponding vector of parameters to be estimated
ϑ_i is a county-specific fixed effect
μ_t is a time-specific fixed effect
ε_it is a random disturbance
Since the data are heteroskedastic and autocorrelated, show contemporaneous correlation, and the model includes a lagged dependent variable, I thought that taking first differences would eliminate the autocorrelation, the explicit fixed effects, and the correlation of the lagged dependent variable with the disturbances. I would then run xtpcse, which, I think, accounts for heteroskedasticity and contemporaneous correlation. As first-differencing (and then simplifying) the model above doesn't change the parameters, I would interpret them just as before first-differencing.
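To make the differencing step concrete, here is a sketch of the algebra for the model as written above, with the lagged dependent variable left out for brevity. Subtracting the equation for period t-1 from the equation for period t gives:

```latex
\begin{aligned}
y_{it} - y_{i,t-1} &= \beta_1 (X_{it} - X_{i,t-1}) + \beta_2 (Z_{it} - Z_{i,t-1})
  + (\vartheta_i - \vartheta_i) + (\mu_t - \mu_{t-1}) + (\varepsilon_{it} - \varepsilon_{i,t-1}) \\
\Delta y_{it} &= \beta_1 \Delta X_{it} + \beta_2 \Delta Z_{it} + \Delta\mu_t + \Delta\varepsilon_{it}
\end{aligned}
```

The county effect ϑ_i cancels and β1 and β2 are unchanged. The time effect does not vanish: it remains as a period-specific term Δμ_t. The disturbance becomes Δε_it, which is the term any claim about autocorrelation in the differenced model would have to be checked against.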
Questions:
(a) Are there any objections to my approach from an econometrics (and/or statistics) point of view?
(b) Can first-differencing be done with binary variables? Intuitively, this isn't as easy as it seems. I did some research but couldn't find an entirely satisfying answer.
(c) What are the Stata commands to get first differences? Everything I found seems to violate the boundaries of each panel; i.e. the last year of county 1 seems to be subtracted from the first year of county 2, and so on.
(d) Concerning the command xtpcse, which of the options correlation(ar1) and correlation(psar1) is suitable for which type of data? The Stata manual wasn't much help to me here.
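For reference on question (c), a minimal sketch of within-panel differencing, assuming the data are declared as a panel with xtset. The variable names (county, year, gdppc, x1, team, stadium) are hypothetical placeholders rather than names from the actual data set, and the xtpcse call only illustrates syntax; it is not a recommendation on specification or on the correlation() option.

```stata
* Hypothetical variable names: county (numeric id), year, gdppc, x1, team, stadium
xtset county year

* D. is the first-difference operator. Once the data are xtset, it differences
* within each county, so the first year of every county is set to missing
* instead of being differenced against the previous county's last year.
generate d_gdppc = D.gdppc

* Differencing a 0/1 dummy yields values in {-1, 0, 1}
generate d_team = D.team
tabulate d_team

* Sanity check: with a balanced panel starting in 1995 and no missing gdppc,
* this count should equal the number of counties
count if missing(d_gdppc) & year == 1995

* Time-series operators can also be used directly in the estimation command
xtpcse D.gdppc D.x1 D.team D.stadium
```

If county is stored as a string, it would first need a numeric identifier, for example via `egen id = group(county)`, before xtset.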
Best regards,
Alex
Note: I am using Stata 12.