  • Difference in Difference with multiple treatment periods and multiple treatments

    Good day all! This is my first post on Statalist. Please let me know if any additional information would be useful to answer my question or to support your consideration.

    I am working on a causal study (using a difference-in-differences design) on the effects of state nurse practitioner (NP) scope of practice (NP_sop) laws on the supply of nurse practitioners (I am specifically interested in NP supply in rural counties) for my dissertation. I have 3 categories of state SOP: 1 = most authority granted to NPs, 2 = moderate authority, and 3 = least authority. So there are 2 treatments, and category 3 is the untreated group. The treatments occur in multiple states and in different years from 2010-2017. Some states get treatment 2 then treatment 1 over the period of the study.

    I currently have the data set up in long form by state, county, year with a 3-level categorical variable for NP SOP. Based on all I'm seeing for calculating diff-in-diff in Stata, it looks like I may need 2 variables: moderate (where 1 = moderate, 0 = least or no treatment) and most (where 1 = most, 0 = least). However, this seems to leave out the fact that some states change from moderate to most authority during the study period. Thoughts?

    Year is currently 1 variable and character type. Should I instead have a dummy variable for each of 2011-2017, with 2010 as the base year (0 on every dummy)?

    Sample data below.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float NP_rate byte fips_state_code int(fips_county_code year) float(np_sop2 modsop mostsop)
            0 31 115 2015 1 0 0
            0 31 115 2017 3 0 1
            0 31 125 2010 1 0 0
            0 31 125 2012 1 0 0
            0 31 125 2015 1 0 0
    3.6166365 31 133 2012 1 0 0
     3.760812 31 133 2015 1 0 0
     2.320724 54  21 2014 2 1 0
     2.347969 54  21 2015 2 1 0
     2.551237 54  23 2013 2 1 0
     2.566955 54  23 2014 2 1 0
    4.2495327 54  23 2015 2 1 0
     6.939625 41  69 2010 3 0 1
     7.012623 41  69 2011 3 0 1
    14.044944 41  69 2012 3 0 1
    1.9665684 42  23 2010 1 0 0
     3.992016 42  23 2011 1 0 0
    4.0494027 42  23 2012 1 0 0
     6.729475 30  17 2012 3 0 1
     9.384865 30  17 2017 3 0 1
            0 30  19 2010 3 0 1
            0 30  19 2011 3 0 1
            0 30  19 2012 3 0 1
     9.753983 30  23 2012 3 0 1
    15.374478 30  23 2017 3 0 1
            0 30  25 2012 3 0 1
      7.03493 26  47 2010 2 1 0
     7.610814 26  47 2011 2 1 0
     7.595321 26  47 2012 2 1 0
     8.074284 22 107 2012 1 0 0
     7.754343 29 211 2013 1 0 0
     6.615944 31  17 2012 1 0 0
     6.788866 31  17 2015 1 0 0
    3.3890646 31  27 2010 1 0 0
     3.436426 31  27 2011 1 0 0
    3.4301395 31  27 2012 1 0 0
      2.46063 31  29 2012 1 0 0
    3.1505985 20 105 2012 1 0 0
    4.4528556 22  81 2012 1 0 0
            0 48 341 2013 1 0 0
    1.3562387 48 341 2016 1 0 0
            0 48 345 2013 1 0 0
            0 16   7 2015 3 0 1
     3.258542 16  13 2014 3 0 1
    3.2419415 16  13 2015 3 0 1
     4.010159 19   1 2012 3 0 1
    4.1848006 28  65 2013 1 0 0
     2.537642 28  65 2014 1 0 0
    2.1174104 17  67 2012 1 0 0
    1.2547052 19 135 2010 3 0 1
     1.241311 19 135 2011 3 0 1
    1.2402332 19 135 2012 3 0 1
    4.2566776 20  29 2012 1 0 0
     8.639309 13 239 2014 1 0 0
     8.688097 13 239 2015 1 0 0
     8.667389 22 107 2017 1 0 0
     1.738828 48 483 2013 1 0 0
    11.185682 29  35 2014 1 0 0
    11.176752 29  35 2015 1 0 0
     5.243838 29  41 2013 1 0 0
     5.270787 29  41 2015 1 0 0
            0 46 119 2010 1 0 0
            0 46 119 2012 1 0 0
    1.5365704 32  27 2017 3 0 1
     3.250553 55  13 2012 1 0 0
     3.958045 55  13 2015 1 0 0
     3.736223 41   1 2014 3 0 1
    4.3736334 41   1 2015 3 0 1
     8.854781 46  59 2012 1 0 0
    12.206286 46  59 2017 1 0 0
            0 46  61 2012 1 0 0
            0 46  63 2012 1 0 0
    4.1742034 46  67 2012 1 0 0
      2.99931 51   1 2012 1 0 0
     11.13681 47  39 2012 2 1 0
     13.71507 47  39 2014 2 1 0
    13.722127 47  39 2015 2 1 0
     5.016722 47  49 2012 2 1 0
     6.740058 50   5 2013 3 0 1
    11.271714 50   5 2017 3 0 1
    2.0713463 42  83 2010 1 0 0
      2.78248 42  83 2012 1 0 0
      3.06517 42  83 2015 1 0 0
     2.725538 20  19 2010 1 0 0
     5.600672 20  19 2012 1 0 0
    2.2913444 42 105 2010 1 0 0
    1.7189022 42 105 2011 1 0 0
     1.706776 42 105 2012 1 0 0
    2.2686026 42 109 2012 1 0 0
    2.2252991 42 109 2015 1 0 0
     6.507592 13   1 2013 1 0 0
     7.044543 13   1 2015 1 0 0
            0 13   3 2014 1 0 0
    2.3815193 13   3 2015 1 0 0
    1.8811136 16  63 2014 3 0 1
     5.663583 16  63 2015 3 0 1
    2.0114653 45  65 2012 1 0 0
    2.0953379 45  65 2017 1 0 0
      4.05954 55  91 2012 1 0 0
    4.1152263 55  91 2015 1 0 0
    end

    Thanks in advance for your insight and recommendation.

    Tammie

  • #2
    Your indicator variables mostsop and modsop are not needed: Stata can create these on the fly during analysis for you when you use factor variable notation. Read -help fvvarlist- for more information about factor-variable notation. Similarly, there is no need to create indicators for year: factor-variable notation will take care of this for you with essentially no effort.
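
    As a minimal sketch of what factor-variable notation does with a year variable (simulated data; the names y and year_str are made up for illustration, and the string-to-numeric step assumes year really is stored as a string):

    ```stata
    * Minimal sketch on simulated data; y and year_str are hypothetical names.
    clear
    set seed 2010
    set obs 24
    gen str4 year_str = string(2010 + mod(_n - 1, 8))   // year stored as a string, 2010-2017
    gen double y = rnormal()

    * Factor-variable notation needs a numeric variable, so convert the string first.
    destring year_str, generate(year)

    * i.year enters one indicator per year; the lowest value (2010) is the base by default.
    regress y i.year

    * ib2017.year makes 2017 the base year instead.
    regress y ib2017.year

    * fvexpand lists the indicator terms that i.year stands for.
    fvexpand i.year
    display "`r(varlist)'"
    ```

    No indicator variables are created in the dataset; the indicators exist only for the duration of the estimation command.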

    Because your treatment changes in the counties are not synchronized, this is not amenable to classical difference in differences (DID) analysis. It must be analyzed with generalized difference in differences. If you want to learn more about that, I recommend https://www.annualreviews.org/doi/pd...-040617-013507. Your code will look something like this:

    Code:
    drop modsop mostsop
    label define np_sop2   1   "None"  ///
                            2   "Moderate"  ///
                            3   "Most"
    label values np_sop2 np_sop2
    egen long county = group(fips_state_code fips_county_code), label
    xtset county year
    
    xtreg NP_rate i.np_sop2 i.year, fe // NOTE USE OF FACTOR VARIABLE NOTATION

    The coefficients for Moderate and Most under np_sop2 will be the generalized DID estimators of the effects of moderate and most SOP policy on NP_rate, compared to no policy. This code does not run properly in your example data because every county there either always has moderate SOP or never has it. That makes moderate SOP status perfectly predictable from the county fixed effects, so it is dropped from the model. Presumably your full data include some counties that sometimes have moderate SOP and sometimes do not. (If there are no such counties, your data are not suitable for what you are trying to accomplish, as they are uninformative about the effect of moderate SOP.)
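
    A self-contained sketch of the same kind of model on simulated data, in which some states move from least to moderate and then to most authority, may help show the setup. Everything here is hypothetical: the variable names (state, county, sop, y), the adoption years, and the effect sizes. Clustering the standard errors by state is an added assumption, not part of the command above; it is a common choice when the policy is set at the state level.

    ```stata
    * Minimal sketch on simulated data; all names and numbers are hypothetical.
    clear
    set seed 12345
    set obs 40                                   // 40 hypothetical states
    gen int state = _n
    * States 1-10 adopt moderate SOP in 2012, states 11-20 in 2014, the rest never.
    gen int adopt_mod  = cond(state <= 10, 2012, cond(state <= 20, 2014, .))
    * States 1-5 later move on to most SOP in 2015.
    gen int adopt_most = cond(state <= 5, 2015, .)

    expand 5                                     // 5 counties per state
    bysort state: gen int county_in_state = _n
    egen long county = group(state county_in_state)
    gen double u_county = rnormal()              // county-level fixed effect

    expand 8                                     // 8 years per county
    bysort county: gen int year = 2009 + _n      // 2010-2017

    * Three-level treatment; comparisons with missing adoption years are false.
    gen byte sop = 1
    replace sop = 2 if year >= adopt_mod
    replace sop = 3 if year >= adopt_most
    label define sop 1 "Least" 2 "Moderate" 3 "Most"
    label values sop sop

    * Outcome with assumed effects of 0.5 (moderate) and 1.0 (most).
    gen double y = 3 + u_county + 0.3*(year - 2010) ///
        + 0.5*(sop == 2) + 1.0*(sop == 3) + rnormal()

    xtset county year
    xtreg y i.sop i.year, fe vce(cluster state)
    ```

    Because sop changes within county over time in this simulation, neither level is absorbed by the county fixed effects, and states that switch from moderate to most contribute to both coefficients.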


    • #3
      Thanks for the thoughtful response. I made the recommended modifications (using fips instead of county as the variable name). The full dataset has some counties that sometimes have Moderate SOP and sometimes do not. Moderate is not statistically significant; this is consistent with findings of other studies.

      Results here:

      xtreg NP_rate i.np_sop2 i.year, fe

      Fixed-effects (within) regression               Number of obs     =     25,140
      Group variable: fips                            Number of groups  =      3,143

      R-sq:                                           Obs per group:
           within  = 0.4327                                         min =          4
           between = 0.0001                                         avg =        8.0
           overall = 0.0669                                         max =          8

                                                      F(9,21988)        =    1863.13
      corr(u_i, Xb)  = -0.0043                        Prob > F          =     0.0000

      ------------------------------------------------------------------------------
           NP_rate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           np_sop2 |
         Moderate  |   .0942515   .0730595     1.29   0.197    -.0489504    .2374534
             Most  |  -.2207893   .0490175    -4.50   0.000    -.3168671   -.1247115
                   |
              year |
             2011  |   .2869376   .0282127    10.17   0.000     .2316387    .3422364
             2012  |   .5848626   .0282443    20.71   0.000     .5295018    .6402234
             2013  |   .9299124   .0282768    32.89   0.000     .8744878    .9853369
             2014  |   1.311735   .0282884    46.37   0.000     1.256288    1.367182
             2015  |   1.777292   .0285504    62.25   0.000     1.721331    1.833253
             2016  |   2.270512   .0288174    78.79   0.000     2.214027    2.326996
             2017  |   2.811974   .0288174    97.58   0.000      2.75549    2.868458
                   |
             _cons |    2.87286   .0280957   102.25   0.000      2.81779    2.927929
      -------------+----------------------------------------------------------------
           sigma_u |  3.2124574
           sigma_e |   1.118192
               rho |  .89193363   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      F test that all u_i=0: F(3142, 21988) = 65.64                Prob > F = 0.0000

      I'd like to add other covariates instead of assuming fixed effects. For example, Medicaid expansion occurred during the study period and is likely to have an effect on the outcome that should be considered. Seems this may not be allowable with this method; investigating that now.

      Tammie


      • #4
        I'd like to add other covariates instead of assuming fixed effects. For example, Medicaid expansion occurred during the study period and is likely to have an effect on the outcome that should be considered. Seems this may not be allowable with this method; investigating that now.

        It is, most definitely, allowable. And with 3000+ counties and an average of 8 observations per county, you have plenty of room to add lots of covariates. Just add them to the list of right-hand side variables in -xtreg-.

        What might be confusing you is knowing that with fixed effects you cannot estimate the effects of covariates that do not change over time. That is true. But Medicaid expansion would certainly exhibit changes over the time period in your study. And for attributes that don't change over time, bear in mind that even though you cannot estimate their effects in a fixed-effects model, their effects are nevertheless still adjusted for. So if you have a covariate that you want to include so as to eliminate its potential confounding effect on the analysis, but it doesn't vary over time, you have nothing to worry about: the fixed effects themselves accomplish that. There will be no confounding bias (some call this omitted variable bias; same thing) from such variables. In fact, that is one of the best features of fixed-effects analysis. It's only a problem if you also need to estimate its effect because it is a variable of direct interest.
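
        To illustrate the distinction, here is a minimal sketch on simulated data. All names (id, rural, medicaid, y) and numbers are hypothetical: medicaid switches on partway through for some units, while rural never changes within a unit.

        ```stata
        * Minimal sketch on simulated data; all names and numbers are hypothetical.
        clear
        set seed 2468
        set obs 100                                  // 100 hypothetical units
        gen int id = _n
        gen byte rural = runiform() < 0.5            // time-invariant attribute
        gen double u = rnormal() + 2*rural           // unit effect, correlated with rural
        gen int expand_year = cond(runiform() < 0.5, 2014, .)   // some units expand in 2014

        expand 8                                     // 8 years per unit
        bysort id: gen int year = 2009 + _n          // 2010-2017
        gen byte medicaid = year >= expand_year      // 0 throughout if expand_year is missing

        gen double y = 1 + u + 0.8*medicaid + 0.2*(year - 2010) + rnormal()

        xtset id year
        * medicaid varies within unit, so its coefficient is estimated;
        * rural is constant within unit, so Stata omits it as collinear with the fixed effects.
        xtreg y medicaid rural i.year, fe
        ```

        The omission of rural does not bias the medicaid coefficient: whatever rural contributes to y is absorbed by the unit fixed effects.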

        If you do have covariates that are of direct interest in your research question, whose effects you therefore want to estimate, and that do not vary over time, then a fixed-effect estimator will not work for you. Look into the -xthybrid- command, written by Francisco Perales and Reinhard Schunck, available from SSC.


        • #5
          Thanks, Dr. Schechter! I was mistakenly adding the covariates using option COV( ) as in the diff command. I'll post my next steps for others (and questions I'm sure): sensitivity analysis and fixed trend assumption test.


          • #6
            Dear all,

            I am analysing the impacts of dung beetles (treatment) on livestock productivity (outcome) using Difference-in-Differences. I have panel data from 1960 to 1980, and my geographical units are Local Government Areas (LGAs). My sample size is 94 LGAs. I have five treatments (five dung beetle species) with presence/absence and abundance (treatment intensity). However, each species was introduced into the LGAs in a different year, spreading over time. So I have multiple time periods, e.g. in 1974, species 1 was present in an LGA, then in 1978, species 2 arrived in the same LGA. So while some LGAs might have all five species at some point in time, others will only have one species or none.

            The problems/questions are:
            1. How can I combine dummy with treatment intensity (abundance)?
            Currently, my code is very simple:

            xtdidregress (livestock_productivity) (dummy_general), group(lga_id) time(year)

            xtdidregress (livestock_productivity) (abundance, continuous), group(lga_id) time(year)


            2. What is the best way to deal with the parallel trend assumption when there are multiple periods and multiple treatments? Could you please help me with the code?

            Many thanks in advance,

            Here is a sample of my data.

            [Image: sample of data.png]
