Dear all,
I am trying to estimate the impact of a policy passed at the regional level using DHS (Demographic and Health Survey) data in a difference-in-differences setting. I have 11 regions, of which 4 were exposed to the reform and 5 were not. I pooled DHS rounds collected in different years. My question is about clustering standard errors at the regional level: since the number of regions (clusters) is small, I believe I have to use a wild bootstrap, but I find it difficult to implement with DHS data.
Could you please check my code below and suggest a better option if you have one?
* BEFORE appending I create a weight variable for each survey round
gen wgt=v005/1000000 //v005 women's individual sample weight
gen weight=(wgt*female_pop)/female_sample // female_pop and female_sample are placeholders for the total female population in that year and the total number of women in the sample
gen survey=1 // identifies the survey wave: 1 for the first wave, 2 for the second, and so on
* Then I appended the different rounds to create a single pooled cross-sectional dataset and svyset it.
egen cluster=group(survey v021) // v021 is the PSU (primary sampling unit); survey identifies the survey year (coded 1, 2, 3 and 4)
egen stratum=group(survey v022) // V022 is sample strata for sampling error
svyset cluster [pw=wgt], strata(stratum) singleunit(centered)
* Run DiD regression
svy: reg y treated_post treated post Xlist // y is a placeholder for the outcome variable. Is this sufficient to correct the standard errors, i.e. equivalent to clustering at the regional level?
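For reference, a wild cluster bootstrap at the regional level might be sketched as follows. This is only a rough sketch under several assumptions: it relies on the user-written boottest package (not part of official Stata), it uses the hypothetical names y for the outcome and region for the region identifier, and it treats weight as a probability weight in a plain regress instead of going through svy, so the DHS strata are not taken into account.

```stata
* Assumption: boottest is user-written; install it once with: ssc install boottest
* Hypothetical names: y (outcome variable), region (region identifier)
* Xlist stands for the covariate list, as in the regression above

* Weighted DiD regression with standard errors clustered by region
regress y treated_post treated post Xlist [pw=weight], vce(cluster region)

* Wild cluster bootstrap test of the DiD coefficient, bootstrapping at the region level
* Webb six-point weights are often suggested when the number of clusters is very small
boottest treated_post, cluster(region) weighttype(webb) reps(9999) seed(12345) nograph
```

The seed and the number of replications are arbitrary choices for reproducibility, not recommendations.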
I am so grateful for any tips.
Best,
T