Dear Statalists,
My apologies in advance for the long post.
I am undertaking a study on the determinants of interregional migration in the European Union using an unbalanced panel of 129 European regions (in 13 countries) over the period 1998–2013 (1,350 observations in total). Based on the neoclassical theory of migration, which states that differences in expected earnings are the primary driver of labor migration, I am testing the effect of regional wage and unemployment differentials on migration-induced population growth at the regional level.
I distinguish between migration within a country (people moving between regions of the same country) and migration between countries (people moving to or from a region in another country).
My hypothesis is the following: labor market incentives should have a stronger effect on “internal migration”, because it is easier for people to move within their own country to take advantage of higher regional wages or lower unemployment, as they do not face language barriers or similar obstacles.
I compare the effect of labor market incentives on “internal” and “international” migration.
To that end, I run two similar regressions.
The dependent variable is the net migration rate, where the net migration flows are those of people moving within the country (internal migration regression) or to and from other countries (international migration regression).
The two independent variables of interest are the (logarithms of the) wage and unemployment differentials, relative to the appropriate economic area, lagged by one year to limit endogeneity concerns.
For internal migration, differentials are expressed relative to the national average (i.e., the ratio of the regional wage to the national average wage), whereas for international migration they are expressed relative to the European Union average (the ratio of the regional wage to the EU average wage). The logarithm of population density is included as a control variable.
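A minimal Python sketch of how the lagged log differentials could be constructed, for illustration only. The toy wages, the region/country labels, and the helper names `log_differentials` and `lag_one_year` are hypothetical; national and EU averages are computed here as unweighted means of the sample regions, which is an assumption (the study may use official national and EU averages instead).

```python
import math
from collections import defaultdict

# Hypothetical toy panel: (region, country, year) -> regional wage.
# Names and values are illustrative only, not taken from the study's data.
wages = {
    ("R1", "A", 2000): 100.0, ("R2", "A", 2000): 300.0, ("R3", "B", 2000): 200.0,
    ("R1", "A", 2001): 110.0, ("R2", "A", 2001): 290.0, ("R3", "B", 2001): 220.0,
}


def log_differentials(wages):
    """Return ln(wage / national mean) and ln(wage / EU mean) for every observation.

    Assumption: both averages are unweighted means over the regions in the data.
    """
    national = defaultdict(list)
    eu = defaultdict(list)
    for (region, country, year), w in wages.items():
        national[(country, year)].append(w)
        eu[year].append(w)
    nat_mean = {k: sum(v) / len(v) for k, v in national.items()}
    eu_mean = {k: sum(v) / len(v) for k, v in eu.items()}
    ln_nat = {key: math.log(w / nat_mean[(key[1], key[2])]) for key, w in wages.items()}
    ln_eu = {key: math.log(w / eu_mean[key[2]]) for key, w in wages.items()}
    return ln_nat, ln_eu


def lag_one_year(values):
    """Attach to each (region, country, year) the value observed in year - 1.

    Observations with no previous-year value get no lag and drop out,
    which is how each region loses its first year in an unbalanced panel.
    """
    return {
        (r, c, y): values[(r, c, y - 1)]
        for (r, c, y) in values
        if (r, c, y - 1) in values
    }


ln_nat, ln_eu = log_differentials(wages)
lagged_nat = lag_one_year(ln_nat)
print(round(lagged_nat[("R1", "A", 2001)], 4))  # -0.6931, i.e. ln(100 / 200)
```

In Stata itself the lag would simply be the time-series operator `L.` after `xtset`, which likewise returns missing when the previous year is absent.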
I estimated a fixed-effects model with region and time fixed effects, because my covariates are very likely correlated with the region effects: I cannot control for many other regional characteristics. Both the standard Hausman test and its robust version (xtoverid) favor the fixed-effects model.
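With region and year effects in an unbalanced panel, a single round of demeaning does not remove both sets of effects. One standard approach is to alternate the two demeaning steps until nothing changes (the method of alternating projections) and then obtain the slope from the demeaned data (Frisch-Waugh-Lovell). The Python sketch below assumes a single regressor and a hypothetical toy panel built so that the true slope is 2; the function names are illustrative. The user-written `reghdfe` command (SSC) is built on this idea and can absorb more than two sets of fixed effects.

```python
from collections import defaultdict


def demean_by(values, keys):
    """Subtract from each value the mean of its group (one fixed-effect sweep)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, k in zip(values, keys):
        sums[k] += v
        counts[k] += 1
    return [v - sums[k] / counts[k] for v, k in zip(values, keys)]


def two_way_demean(values, regions, years, tol=1e-12, max_iter=10_000):
    """Sweep out region and year effects by alternating projections.

    In an unbalanced panel one pass is not enough, so the region and year
    sweeps are repeated until a full pass changes no value by more than tol.
    """
    current = list(values)
    for _ in range(max_iter):
        updated = demean_by(demean_by(current, regions), years)
        change = max(abs(a - b) for a, b in zip(updated, current))
        current = updated
        if change < tol:
            return current
    raise RuntimeError("alternating projections did not converge")


def within_slope(y, x, regions, years):
    """Two-way FE slope for one regressor: no-intercept OLS on demeaned data."""
    yd = two_way_demean(y, regions, years)
    xd = two_way_demean(x, regions, years)
    return sum(a * b for a, b in zip(xd, yd)) / sum(b * b for b in xd)


# Hypothetical unbalanced panel: R2 is missing the year 2000.
regions = ["R1", "R1", "R1", "R2", "R2", "R3", "R3", "R3"]
years = [1998, 1999, 2000, 1998, 1999, 1998, 1999, 2000]
x = [0.5, 1.0, 3.0, 2.0, 0.0, 1.5, 4.0, 2.5]
region_fx = {"R1": 1.0, "R2": -2.0, "R3": 0.5}
year_fx = {1998: 0.0, 1999: 0.3, 2000: -0.7}
y = [2.0 * xi + region_fx[r] + year_fx[t] for xi, r, t in zip(x, regions, years)]

xd = two_way_demean(x, regions, years)
print(round(within_slope(y, x, regions, years), 6))  # prints 2.0
```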
I also detected heteroskedasticity (xttest3) and autocorrelation (xtserial), and therefore estimated the model with xtreg, fe vce(robust), which for xtreg is equivalent to clustering at the region (panel) level.
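For reference, the statistic behind xtserial (Wooldridge's test, as described by Drukker 2003) first-differences the model, which removes the region effects, and then regresses the differenced residuals on their own lag: if the level errors are serially uncorrelated, the coefficient should be close to -0.5, while with AR(1) errors of coefficient φ it tends to -(1-φ)/2. The Python sketch below skips the first step and works directly on simulated error series (hypothetical data, 129 regions by 16 years to echo the dimensions above); it reports only the slope, not the Wald test xtserial performs.

```python
import random


def serial_corr_slope(errors_by_region):
    """Pooled no-intercept OLS slope of the differenced error on its own lag.

    Assumes each list holds one region's errors in consecutive-year order.
    """
    num = 0.0
    den = 0.0
    for series in errors_by_region.values():
        diffs = [b - a for a, b in zip(series, series[1:])]
        for prev, cur in zip(diffs, diffs[1:]):
            num += prev * cur
            den += prev * prev
    return num / den


def simulate(phi, n_regions=129, n_years=16, seed=12345):
    """Hypothetical AR(1) errors u_t = phi * u_{t-1} + e_t, started at the stationary variance."""
    rng = random.Random(seed)
    panel = {}
    for r in range(n_regions):
        u = rng.gauss(0.0, 1.0 / (1.0 - phi ** 2) ** 0.5)
        series = [u]
        for _ in range(n_years - 1):
            u = phi * u + rng.gauss(0.0, 1.0)
            series.append(u)
        panel[r] = series
    return panel


rho_iid = serial_corr_slope(simulate(0.0))  # should be near -0.5
rho_ar = serial_corr_slope(simulate(0.9))   # should be near -(1 - 0.9) / 2 = -0.05
print(round(rho_iid, 2), round(rho_ar, 2))
```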
However, especially when modeling cross-border net migration, I would like to take into account the economic performance of the country as a whole in addition to regional characteristics. This would mean including country fixed effects, i.e., a three-way fixed-effects model. While this approach seems theoretically sound, implementing it in Stata looks quite difficult. In addition, my dataset has little within variation, so too many fixed effects may absorb most of the variance.
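Because every region belongs to exactly one country, a country dummy is constant within each region, so the region fixed effects already absorb any time-invariant country effect; time-varying national economic performance can only enter through country-level variables that change over time (or through country-by-year effects, which would in turn absorb those variables). The Python check below uses hypothetical observations and a hypothetical national unemployment series to show that demeaning by region turns a country dummy into zeros but leaves a time-varying national variable intact.

```python
from collections import defaultdict


def demean_by(values, keys):
    """Subtract from each value the mean of its group (the within-region transformation)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, k in zip(values, keys):
        sums[k] += v
        counts[k] += 1
    return [v - sums[k] / counts[k] for v, k in zip(values, keys)]


# Hypothetical unbalanced panel of (region, country, year).
obs = [("R1", "A", 1998), ("R1", "A", 1999), ("R2", "A", 1998),
       ("R3", "B", 1998), ("R3", "B", 1999), ("R3", "B", 2000)]
regions = [r for r, _, _ in obs]

# Time-invariant country dummy: constant within each region.
country_a = [1.0 if c == "A" else 0.0 for _, c, _ in obs]

# Hypothetical national unemployment rate, varying by country and year.
national_unemp = {("A", 1998): 8.0, ("A", 1999): 6.0,
                  ("B", 1998): 10.0, ("B", 1999): 10.0, ("B", 2000): 7.0}
national = [national_unemp[(c, y)] for _, c, y in obs]

print(demean_by(country_a, regions))  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(demean_by(national, regions))   # [1.0, -1.0, 0.0, 1.0, 1.0, -2.0]
```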
I then considered clustering at the country level (which also takes care of clustering at the regional level, since regions are nested within countries). This allows the error terms of regions belonging to the same country to be correlated, and also allows the errors of the same region to be correlated over time and heteroskedastic. However, I am concerned that this may not be the most appropriate method, because my number of clusters is small (13 countries) and cluster-robust inference relies on the number of clusters going to infinity. According to Kézdi (Kézdi, Gabor. 2004. “Robust Standard Error Estimation in Fixed-Effects Panel Models.”), clustering with fewer than 50 clusters may be even worse than not clustering at all.
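One common way to obtain more reliable inference with few clusters is the wild cluster bootstrap (Cameron, Gelbach and Miller, 2008), available in Stata through Roodman's user-written `boottest` command. The Python sketch below is a simplified illustration, not that command: it assumes y and x have already been demeaned for the fixed effects, uses a single regressor with no intercept, computes a CR1-type cluster-robust standard error (the small-sample factor shown is one common convention and may differ from Stata's), and derives a restricted wild-bootstrap p-value for H0: slope = 0 with Rademacher weights drawn per cluster. A full implementation would also re-apply the fixed effects in every bootstrap sample; the function names and simulated data are hypothetical.

```python
import math
import random
from collections import defaultdict


def ols_slope(y, x):
    """No-intercept OLS slope; assumes y and x are already demeaned for the fixed effects."""
    return sum(a * b for a, b in zip(x, y)) / sum(b * b for b in x)


def cluster_se(y, x, clusters, k=1):
    """CR1-type cluster-robust standard error of the no-intercept slope.

    The factor G/(G-1) * (N-1)/(N-k) is one common small-sample convention;
    Stata's exact adjustment after xtreg, fe may differ.
    """
    beta = ols_slope(y, x)
    score = defaultdict(float)  # per-cluster sum of x_i * residual_i
    for yi, xi, g in zip(y, x, clusters):
        score[g] += xi * (yi - beta * xi)
    n, n_clusters = len(y), len(score)
    sxx = sum(b * b for b in x)
    meat = sum(s * s for s in score.values())
    factor = n_clusters / (n_clusters - 1) * (n - 1) / (n - k)
    return math.sqrt(factor * meat) / sxx


def wild_cluster_p(y, x, clusters, reps=999, seed=1):
    """Restricted wild cluster bootstrap p-value for H0: slope = 0.

    Under H0 (no intercept, data already demeaned) the restricted residuals
    are y itself; each replication flips the sign of every cluster's
    residuals with probability 1/2 (Rademacher weights).
    """
    t_obs = ols_slope(y, x) / cluster_se(y, x, clusters)
    labels = sorted(set(clusters))
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        w = {g: rng.choice((-1.0, 1.0)) for g in labels}
        y_star = [w[g] * yi for yi, g in zip(y, clusters)]
        t_star = ols_slope(y_star, x) / cluster_se(y_star, x, clusters)
        if abs(t_star) >= abs(t_obs):
            exceed += 1
    return exceed / reps


# Hypothetical data, treated as if already demeaned: 13 "countries" with
# 10 observations each and a common country-level error; true slope is 0.
rng = random.Random(2024)
clusters = [c for c in range(13) for _ in range(10)]
x = [rng.gauss(0.0, 1.0) for _ in clusters]
shock = {c: rng.gauss(0.0, 1.0) for c in range(13)}
y = [shock[c] + rng.gauss(0.0, 1.0) for c in clusters]
print(ols_slope(y, x), cluster_se(y, x, clusters), wild_cluster_p(y, x, clusters))
```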
#0. General question: how can I take into account the fact that my data are nested within countries? Do I have to take it into account in my case, or can I “ignore” this structure (given that my unit of observation is the region)? It is even possible that regions in different countries have more in common, in terms of unemployment experience and regional income, than regions of the same country, which would argue against clustering at the country level.
#1. What criterion should I use to compare the estimates obtained with clustering at the region level versus clustering at the country level? Is clustering at the country level even theoretically and methodologically sound, given the small number of clusters (13)?
#2. Alternatively, should I abandon the fixed-effects model and switch to a mixed (hierarchical) model, which would allow me to include a country-level effect? If so, is there anything I should be careful about?
#3. Also, it seems that most of the variability lies between regions rather than within them: is this evidence that I should perhaps use a between estimator? How can I check this formally, other than with xtsum?
#4. Finally, if I am correct, the robust/cluster option also takes care of potential autocorrelation. While looking into this, I read that one can also use xtregar (a Cochrane-Orcutt-type transformation, I think). When should one prefer xtregar over xtreg, fe vce(robust)? Is it possible to include time fixed effects with xtregar? (I was unable to.)
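A minimal Python sketch of the decomposition xtsum reports (overall, between and within standard deviations), plus the share of the total sum of squares that is within-region, which equals one minus the R-squared from regressing the variable on region dummies and gives a single number to compare across variables. The definitions follow my understanding of xtsum (between = standard deviation of the region means; within = deviations from the region mean plus the grand mean), and the toy values and function names are hypothetical. This describes where the variation lies; it does not by itself test the assumption a between estimator needs, namely that the regressors are uncorrelated with the region effects.

```python
import math
from collections import defaultdict


def sd(values):
    """Sample standard deviation (n - 1 denominator)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def xtsum_like(values, regions):
    """Return (overall SD, between SD, within SD, within share of total sum of squares)."""
    groups = defaultdict(list)
    for v, r in zip(values, regions):
        groups[r].append(v)
    grand = sum(values) / len(values)
    means = {r: sum(g) / len(g) for r, g in groups.items()}
    overall = sd(values)
    between = sd(list(means.values()))  # one mean per region
    within = sd([v - means[r] + grand for v, r in zip(values, regions)])
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_within = sum((v - means[r]) ** 2 for v, r in zip(values, regions))
    return overall, between, within, ss_within / ss_total


# Hypothetical variable observed for two regions over two years.
values = [1.0, 3.0, 4.0, 6.0]
regions = ["R1", "R1", "R2", "R2"]
stats = xtsum_like(values, regions)
print(stats)
```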
Below is a sample of my dataset, as well as the regression outputs clustering by region (vce(robust)) versus by country (vce(cluster country)).
[xtsum netmig_rate_dom1 netmig_rate_int1 lnunempratio lngdpratio lnunempratioE lngdpratioE lnpop_dens ]
[xtreg netmig_rate_dom1 lnunempratio lngdpratio lnpop_dens i.year, fe vce(robust)]
[xtreg netmig_rate_int1 lnunempratioE lngdpratioE lnpop_dens i.year, fe vce(robust)]
[xtreg netmig_rate_dom1 lnunempratio lngdpratio lnpop_dens i.year, fe vce(cluster country)]
[xtreg netmig_rate_int1 lnunempratioE lngdpratioE lnpop_dens i.year, fe vce(cluster country)]
Sorry again for the long post; I hope I expressed myself clearly. Any help would be much appreciated.
Best,
Randa