Hello everyone,
I am currently doing research on the labour market effects of Venezuelan migration in Peru. As a first step, I want to estimate the effect of the recent mass migration on natives' mean wages in the three most populous cities. To do this, I computed the yearly mean wage by city from a large dataset of yearly labour market surveys (repeated cross-sections from 2014 to 2019), together with the yearly migrant share of each city's population. The migrant share starts in 2017, so the treatment variable is 0 before 2017 and increases every year from 2017 onwards, with a different intensity in each city. Before 2017, when the mass migration started, mean wages in these 3 cities followed parallel trends; the cities also share similar cultural and demographic characteristics.
So I tried this code:
didregress (cities_wmean) (legshare_cities, continuous), group(cities) time(year)
cities_wmean: the city's mean wage. The value is the same for every respondent within a city and year, due to previous coding.
legshare_cities: the legal migration share, which I use as a proxy for actual migration. It ranges from 0 to 1 because Stata does not accept a percentage variable; I would like to know whether there is a different way to create a percentage variable.
cities: categorical variable that identifies the city of each survey respondent.
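On the percentage question, here is a minimal sketch of what I have in mind, assuming the only goal is to express the share on a 0 to 100 scale. As far as I can tell the treatment can be any numeric variable, so multiplying by 100 may be all that is needed. The name legshare_pct is hypothetical, not a variable in my dataset.

```stata
* Hypothetical variable legshare_pct: the 0-1 share rescaled to 0-100
generate double legshare_pct = 100 * legshare_cities
label variable legshare_pct "Legal migrant share of city population (%)"
format legshare_pct %5.2f

* Compare the two scales side by side
summarize legshare_cities legshare_pct
```

If this is used as the treatment, the coefficient would be read per percentage point, so it should be 100 times smaller than the coefficient on the 0 to 1 share.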
On the first try I had not set the legal migrant share to 0 for the pre-treatment period, and the p-value indicated a statistically significant treatment coefficient. This was no longer the case once I set the legal migrant share to 0 for the pre-treatment years. The following two outputs show this (first without, then with, the pre-treatment zeros):
didregress (cities_wmean_n) (legshare_cities, continuous), group(cities) time(year) aeq
Difference-in-differences regression                    Number of obs = 61,216
Data type: Repeated cross-sectional
                                   (Std. err. adjusted for 3 clusters in what)
---------------------------------------------------------------------------------
                |               Robust
 cities_wmean_n | Coefficient  std. err.        t    P>|t|   [95% conf. interval]
----------------+----------------------------------------------------------------
ATET            |
legshare_cities |    .5157201   .0411641    12.53   0.006    .3386054    .6928349
----------------+----------------------------------------------------------------
Controls        |
           year |
          2018  |    1.211922   .2154213     5.63   0.030     .285039    2.138805
          2019  |    1.465113   .1342818    10.91   0.008    .8873452    2.042881
                |
          _cons |    115.5126   .0882052  1309.59   0.000    115.1331    115.8921
---------------------------------------------------------------------------------
Note: ATET estimate adjusted for group effects and time effects.
Difference-in-differences regression                   Number of obs = 120,939
Data type: Repeated cross-sectional
                                   (Std. err. adjusted for 3 clusters in what)
---------------------------------------------------------------------------------
                |               Robust
 cities_wmean_n | Coefficient  std. err.        t    P>|t|   [95% conf. interval]
----------------+----------------------------------------------------------------
ATET            |
legshare_cities |    .5748748   .7738795     0.74   0.535    -2.75486    3.904609
----------------+----------------------------------------------------------------
Controls        |
           year |
          2015  |    1.026905   .0330452    31.08   0.001    .8847226    1.169087
          2016  |    2.118325   1.136744     1.86   0.203   -2.772689     7.00934
          2017  |    1.573374   .7771873     2.02   0.180   -1.770592    4.917341
          2018  |    2.749396    1.51991     1.81   0.212   -3.790251    9.289042
          2019  |     2.89084   2.599513     1.11   0.382    -8.29396    14.07564
                |
          _cons |    114.0902   .5663133   201.46   0.000    111.6536    116.5269
---------------------------------------------------------------------------------
Note: ATET estimate adjusted for group effects and time effects.
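The first output uses 61,216 observations and reports year effects only for 2018 and 2019, while the second uses 120,939 observations and reports 2015 to 2019. This suggests, although I have not confirmed it, that in the first run the pre-2017 rows had a missing migrant share and were dropped from estimation. A minimal sketch of the recoding step, assuming the pre-2017 values were stored as missing:

```stata
* Count pre-2017 rows with a missing share (these would be dropped from estimation)
count if missing(legshare_cities) & year < 2017

* Set the treatment intensity to 0 before the migration wave started in 2017
replace legshare_cities = 0 if missing(legshare_cities) & year < 2017

* Check the mean share by city and year after recoding
tabulate cities year, summarize(legshare_cities) means
```

After this step the share should be 0 in every city for 2014 to 2016 and positive from 2017 onwards.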
I would like to know whether there is something wrong with my setup of this difference-in-differences regression, how the treatment coefficient should be interpreted if the setup is correct, and whether you have any other suggestions.
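For reference, my understanding (an assumption on my part, not something I have confirmed) is that the command above fits a two-way fixed effects specification of roughly the following form, where $w_{ct}$ is the mean wage in city $c$ and year $t$, $s_{ct}$ is the legal migrant share (legshare_cities), $\gamma_c$ are city effects and $\lambda_t$ are year effects. The symbols are my own notation.

```latex
w_{ct} = \gamma_c + \lambda_t + \delta \, s_{ct} + \varepsilon_{ct},
\qquad s_{ct} = 0 \ \text{for } t < 2017
```

Under this reading, $\delta$ would be the change in the mean wage associated with the share going from 0 to 1, so a one percentage point increase in the share would correspond to $\delta / 100$.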
Many thanks in advance.