Clustering SE in a diff-in-diff setting with repeated cross-sectional DHS data when the number of clusters is small

Tariku Getaneh

Join Date: Nov 2021

Posts: 53
#1

24 Mar 2022, 14:34

Dear all,
I am trying to estimate the impact of a policy passed at the regional level using DHS data in a diff-in-diff setting. I have 11 regions, of which 4 were exposed to the reform and 5 were not. I pooled DHS (Demographic and Health Survey) rounds collected in different years. My question is about clustering standard errors at the regional level: since the number of regions (clusters) is small, I have to use the wild bootstrap, but I find it difficult to do with DHS data.
Could you please check my code below and comment if there is a better option?

* BEFORE appending, I create a weight variable for each survey round
gen wgt = v005/1000000 // v005 is the women's individual sample weight
gen weight = (wgt*female_pop)/female_sample // female_pop and female_sample are placeholders for the total female population and the total female sample in that survey year
gen survey = 1 // identifies the first wave; it is 2 for the second wave, and so on

* Then I appended the different rounds to create a single pooled cross-sectional dataset and svyset it.

egen cluster = group(survey v021) // v021 is the PSU (primary sampling unit); survey identifies the survey round (coded 1, 2, 3 and 4)
egen stratum = group(survey v022) // v022 is the sample stratum used for sampling errors
svyset cluster [pw=wgt], strata(stratum) singleunit(centered)

* Run DiD regression
svy: reg outcome treated_post treated post Xlist // outcome is a placeholder for the dependent variable. Is this sufficient to correct the standard errors, equivalent to what clustering at the regional level does?
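
For comparison, a minimal sketch of the wild cluster bootstrap route mentioned above. It assumes the user-written boottest command is installed (ssc install boottest), that region is a region identifier that is consistent across survey rounds, and that outcome and Xlist stand for the dependent variable and the covariate list; all three names are placeholders. It is run after a weighted regress rather than after svy:, so it clusters at the region level but does not use the strata from the svyset design.

```stata
* Sketch only: wild cluster bootstrap with clusters at the region level.
* outcome, region and Xlist are placeholder names, not variables from the DHS files.
regress outcome treated_post treated post Xlist [pw=weight], vce(cluster region)

* Test the DiD coefficient; Webb six-point weights are often suggested
* when the number of clusters is very small.
boottest treated_post, weighttype(webb) reps(9999) seed(12345) nograph
```

The p-value and confidence interval reported by boottest would then take the place of the conventional cluster-robust ones for treated_post.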

I am so grateful for any tips.

Best,
T
Tariku Getaneh

Join Date: Nov 2021

Posts: 53
#2

24 Mar 2022, 19:59

svyset cluster [pw=weight], strata(stratum) singleunit(centered)
