Problem: When applying Coarsened Exact Matching (CEM) to yearly panel data (2005-2011), two different treated persons can be matched to two different years of the same control person. I have many control individuals and want to force every treated individual to be matched to a different control individual, so that each control individual is used only once.
Data: I observe every individual over 7 years (500 treated persons, 14,000 control persons, over 100,000 person-year observations), and treatment occurs in different years for different treated individuals.
Aim and approach so far: I created a new dataset in which I kept only the year prior to treatment for each treated person, but the full observation period (7 rows) for each control person. We want each treated person to be matched to a control in the same calendar year, so the full span of control observations should be available for matching.
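To make the construction of this matching dataset concrete, here is a minimal sketch. It assumes that treated is a person-level dummy and that a variable holding the calendar year of treatment exists; the name treat_year is hypothetical and is not in my actual data description above.

```stata
* Sketch only: treat_year is a hypothetical variable with the calendar year
* of treatment (missing for control persons); treated is assumed to be a
* person-level 0/1 dummy.

* Keep all 7 rows of every control person, but only the pre-treatment
* year (treatment year minus 1) of every treated person.
keep if treated == 0 | year == treat_year - 1
```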
With CEM, using year(#0) (alongside the other covariates) and the k2k option, I made sure that treated individuals are matched to control observations with similar pre-treatment characteristics in the same calendar year. But while treated individual A (treatment in 2007, pre-treatment year 2006) can be matched to individual C in 2006, treated individual B (treatment in 2010, pre-treatment year 2009) can also be matched to individual C in 2009. This problem is rather substantial: of the 500 matches made, over 100 involve a control individual who is used more than once (I tagged duplicates to find this out).
I understand that the problem occurs because I match each treated individual against multiple observations per control individual. A straightforward solution would be to match in wide format with the k2k option, but this is not possible, as I want to force a match in the same calendar year and therefore need the full span of control observations. Unlike psmatch2, where you can identify pairs, CEM sorts individuals into strata, so I have trouble seeing how to identify the treated individuals that are matched to the same control individual, and how to circumvent this.
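As a rough sketch of what I mean by identifying the affected cases: since CEM only gives strata and not pairs, the best I can see is to flag every stratum that contains a control person who shows up in more than one matched row. This assumes cem has already been run with k2k (so cem_matched and cem_strata exist) and that ID, treated and year are named as in my data; n_rows, reused_control and stratum_has_reuse are helper names made up for this illustration.

```stata
* Sketch only: run after cem ... , treatment(treated) k2k
keep if cem_matched == 1

* Number of matched rows per person. Treated persons have one row by
* construction, so n_rows > 1 can only happen for control persons.
bysort ID: gen n_rows = _N
gen byte reused_control = (treated == 0 & n_rows > 1)

* Flag every stratum that contains at least one reused control person
bysort cem_strata: egen byte stratum_has_reuse = max(reused_control)

* Show treated and control rows of the affected strata side by side
sort cem_strata treated ID
list cem_strata year ID treated if stratum_has_reuse == 1, sepby(cem_strata)
```

This shows which strata (and hence which treated individuals) are affected, but it does not by itself prevent the reuse.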
Any suggestions on how to prevent these duplicates? Any other comments or suggestions on my approach are very welcome, thanks!
P.S. My eventual aim is to merge the matches back into the original panel dataset (keeping the CEM strata identifier), select only the pre- and post-treatment years for the treated individuals and the same years for the matched controls, and apply a difference-in-differences design.
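A minimal sketch of this merge step, under several assumptions that are not settled above: the file names matched.dta and panel.dta are hypothetical, the duplicate problem is assumed to be solved already (so each ID appears once in the matched sample), and the post-treatment year is assumed to be the matching year plus 2 (matching year = pre-treatment year, treatment one year later).

```stata
* Sketch only: matched.dta and panel.dta are hypothetical file names.
use matched.dta, clear
keep ID year cem_strata treated
rename year match_year

* Stops with an error if a person still appears more than once
isid ID

* Attach the full panel history to every matched person
merge 1:m ID using panel.dta, keep(match) nogenerate

* Keep the pre-treatment year and the assumed post-treatment year
keep if inlist(year, match_year, match_year + 2)
```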
Code:
cem sex(#0) age(18 25 45 45 55 65 100) employed(#0) year(#0), treatment(treated) k2k
keep if cem_matched==1
duplicates tag ID, gen(dup)
tab dup