  • Clustering SEs in a diff-in-diff setting with repeated cross-sectional DHS data when the number of clusters is small

    Dear all,
    I am trying to estimate the impact of a policy passed at the regional level using DHS data in a diff-in-diff setting. I have 11 regions, of which 4 were exposed to the reform and 5 were not. I pooled DHS (Demographic and Health Survey) rounds collected in different years. My question is about clustering standard errors at the regional level. Since the number of regions (clusters) is small, I believe I have to use a wild bootstrap, but I find that difficult with DHS data.
    Could you please check my code below and comment if you have a better option?

    * BEFORE appending I create a weight variable for each survey round
    gen wgt=v005/1000000 // v005 is the women's individual sample weight
    gen weight=(wgt*FEMALE_POPULATION)/FEMALE_SAMPLE // placeholders: total female population in that survey year, and total number of women in that survey's sample
    gen survey=1 // identifies the first wave; it is 2 for the second wave, and so on

    * Then I appended different rounds to create a single pooled cross-sectional data and svyset.

    egen cluster=group(survey v021) // v021 is the PSU (primary sampling unit); survey identifies the survey round (coded 1, 2, 3 and 4)
    egen stratum=group(survey v022) // v022 is the sample stratum used for sampling errors
    svyset cluster [pw=wgt], strata(stratum) singleunit(centered)

    * Run DiD regression
    svy: reg outcome treated_post treated post Xlist // outcome is a placeholder for the dependent variable. Is this sufficient to correct the standard errors, i.e. is it equivalent to clustering at the regional level?


    I am so grateful for any tips.

    Best,
    T

  • #2
    svyset cluster [pw=weight], strata(stratum) singleunit(centered)
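
    On the small-number-of-clusters concern raised in the question, one possible route is to drop the svy: prefix, cluster on region directly, and then run a wild cluster bootstrap test on the interaction term. The lines below are only a sketch under several assumptions that are not in the original post: the user-written boottest package is installed (ssc install boottest), the region identifier is called region, and the dependent variable is called outcome. Because this does not go through svy:, the stratification in v022 is not used, so it is not a drop-in replacement for the svyset approach.

    ```stata
    * Sketch only: region and outcome are hypothetical variable names.
    * Sampling weights are kept via [pw=weight]; standard errors are clustered on region.
    regress outcome treated_post treated post Xlist [pw=weight], vce(cluster region)

    * Wild cluster bootstrap test of H0: coefficient on treated_post = 0.
    * Webb six-point weights are a common suggestion when there are very few clusters.
    boottest treated_post, weighttype(webb) reps(9999) seed(12345) nograph
    ```

    By default boottest uses the clustering from the preceding regress command, so no separate cluster() option is given here. With this few regions it may be worth comparing the bootstrap p-value against the conventional clustered one to see how much the inference changes.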

    Originally posted by Tariku Getaneh
