Dear Statalisters,
I am relatively new to Stata, so please bear with me. I am aware that this issue has already been addressed on this forum, but I have not been able to find a solution to my problem.
I am using logit in Stata 15.1 to understand whether migration changes employment outcomes. I am using an unbalanced dataset. For the purpose of this explanation, I will use the most basic specification (i.e. without any socio-economic control variables and without margins, which I use at a later stage).
I am typing:
Code:
logit employed i.migrant##i.migration i.year, cluster(ident)

where:
- employed is a binary variable equal to 1 for years when respondents were economically active, 0 otherwise.
- migrant is the 'treatment': a time-invariant binary variable distinguishing the treatment and control groups, equal to 1 for migrants (those who migrated) and 0 for non-migrants (those who stayed behind).
- migration is 'time' or 'post': a binary variable equal to 1 for years after migration, 0 for years before migration. As such, migration == 0 for both groups in the years before migration, but migration == 1 only for the one group that underwent the treatment, i.e. migrants.
The problem I encounter is as follows: the interaction term is omitted due to collinearity (while both migrant and migration are estimated without problems). More specifically, I obtain the following output:
Code:
logit employed i.migrant##i.l_mig2 i.year, cluster(ident)

note: 1950.year != 0 predicts success perfectly
      1950.year dropped and 1 obs not used

note: 0.migrant#1.l_mig2 identifies no observations in the sample
note: 1.migrant#1.l_mig2 omitted because of collinearity
note: 2009.year omitted because of collinearity
Iteration 0:   log pseudolikelihood = -67798.349
Iteration 1:   log pseudolikelihood = -67286.391
Iteration 2:   log pseudolikelihood = -67285.374
Iteration 3:   log pseudolikelihood = -67285.374

Logistic regression                             Number of obs     =    104,797
                                                Wald chi2(60)     =     224.75
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -67285.374               Pseudo R2         =     0.0076

                                (Std. Err. adjusted for 4,502 clusters in ident)
--------------------------------------------------------------------------------
               |               Robust
      employed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
     1.migrant |   -.259598   .0554714    -4.68   0.000      -.36832    -.150876
      1.l_mig2 |   .4030797   .0643309     6.27   0.000     .2769934     .529166
               |
migrant#l_mig2 |
          0 1  |          0  (empty)
          1 1  |          0  (omitted)
               |
          year |
         1950  |          0  (empty)
         1951  |  -.8324219   1.000956    -0.83   0.406     -2.79426    1.129416
         1952  |  -.9865726   .5585038    -1.77   0.077     -2.08122    .1080748
         1953  |  -.8025823   .3977641    -2.02   0.044    -1.582186   -.0229789

Please note that I triple-checked the data to make sure they are coded correctly. An example of the data for a migrant would be:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 ident double year float(migrant migration)
"B0000001" 1991 1 0
"B0000001" 1992 1 0
"B0000001" 1993 1 0
"B0000001" 1994 1 0
"B0000001" 1995 1 0
"B0000001" 1996 1 0
"B0000001" 1997 1 0
"B0000001" 1998 1 0
"B0000001" 1999 1 0
"B0000001" 2000 1 0
"B0000001" 2001 1 0
"B0000001" 2002 1 0
"B0000001" 2003 1 1
"B0000001" 2004 1 1
"B0000001" 2005 1 1
"B0000001" 2006 1 1
"B0000001" 2007 1 1
"B0000001" 2008 1 1
"B0000001" 2009 1 1
end
format %ty year

A corresponding example for a non-migrant:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 ident double year float(migrant migration)
"C0001002" 1995 0 0
"C0001002" 1996 0 0
"C0001002" 1997 0 0
"C0001002" 1998 0 0
"C0001002" 1999 0 0
"C0001002" 2000 0 0
"C0001002" 2001 0 0
"C0001002" 2002 0 0
"C0001002" 2003 0 0
"C0001002" 2004 0 0
"C0001002" 2005 0 0
"C0001002" 2006 0 0
"C0001002" 2007 0 0
"C0001002" 2008 0 0
"C0001002" 2009 0 0
end
format %ty year

Is the problem driven by the fact that my time/post variable (here: migration) varies only for the treatment group (i.e. migrants)? Or is there any other issue I am not aware of? I will be most grateful for your help.
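To make my suspicion concrete, here is a minimal sketch on a toy subset of the two example respondents (four years each). The variable name inter is made up for this illustration, and I have not run the check on the full data. If migration equals 1 only for migrants, then the hand-built product migrant*migration should be identical to migration itself, which would seem to explain why the interaction is dropped:

```stata
* Toy subset of the two dataex examples (years 2001-2004 only)
clear
input str8 ident int year byte(migrant migration)
"B0000001" 2001 1 0
"B0000001" 2002 1 0
"B0000001" 2003 1 1
"B0000001" 2004 1 1
"C0001002" 2001 0 0
"C0001002" 2002 0 0
"C0001002" 2003 0 0
"C0001002" 2004 0 0
end

* Cross-tabulate treatment by post: the cell migrant==0 & migration==1 is empty
tabulate migrant migration

* Build the interaction by hand (inter is a hypothetical name)
generate byte inter = migrant * migration

* Passes when the interaction column is identical to the post dummy,
* i.e. when the two regressors are perfectly collinear
assert inter == migration
```

On the real data, the same two steps with l_mig2 in place of migration should show whether the empty cell reported in the notes of my output is the whole story.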
Best wishes,
Justyna