  • Testing Parallel Trend Assumption

    I'm working on a difference-in-differences (DID) model, and unfortunately I have very little experience with these models. I've visually inspected the data as a check of the parallel trend assumption, and I believe it holds. However, as this analysis will be peer-reviewed, I'm sure the reviewers will require an empirical test.

    The visual inspection was done by plotting the means and fitted values for the entire time period 2010-2017, before the intervention (<2014), and after the intervention (>=2014).

    I have individual-level data for >30,000 subjects. I'm running unadjusted and adjusted DID models using the "diff" command.

    Code:
    by YEAR_OF_DIAGNOSIS expand, sort: egen pc_uninsured_expand = mean(100*uninsured)
        
    *Fitted
    twoway (lfit pc_uninsured_expand YEAR_OF_DIAGNOSIS if expand==1) ///
        (scatter pc_uninsured_expand YEAR_OF_DIAGNOSIS if expand==1) ///
        (lfit pc_uninsured_expand YEAR_OF_DIAGNOSIS if expand==0) ///
        (scatter pc_uninsured_expand YEAR_OF_DIAGNOSIS if expand==0), ///
        ylabel(0(5)20) ytitle("Uninsured (%)") ///
        legend(label(1 "Expansion - Fitted") label(2 "Expansion") ///
               label(3 "Nonexpansion - Fitted") label(4 "Nonexpansion")) xlabel(#8)

    1. Plot of the data over the entire time period
    [Image: all.jpg]

    2. Plot of data before implementation of "treatment"
    [Image: before.jpg]

    3. Plot of data after implementation of "treatment"

    [Image: after.jpg]

    The difference-in-differences model that I will be running is the following, where the outcome is "uninsured" (1=uninsured, 0=insured), "expand" is my treatment variable (treated=1, not treated=0), and "exp_year" is the period indicator for before/after 2014 (before=0, after=1).

    Code:
    diff uninsured, t(expand) p(exp_year) 
    
    diff uninsured, t(expand) p(exp_year) cov(AGE SEX race_cat hispanic_cat NO_HSD_QUAR_16 MED_INC_QUAR_16)
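    As a rough cross-check on what the unadjusted "diff" call estimates: to the best of my understanding, its default estimator is ordinary least squares on the treatment indicator, the period indicator, and their product, so the DID estimate is the coefficient on the interaction. The sketch below illustrates that regression as a linear probability model. It is only an illustration under stated assumptions: the data are simulated, the probabilities are made up, and only the variable names are taken from this post.

```stata
* Minimal sketch on simulated data; every number below is made up for illustration.
clear
set seed 12345
set obs 5000
generate byte expand   = runiform() < 0.5     // 1 = treated group (assumed 50/50 split)
generate byte exp_year = runiform() < 0.5     // 1 = 2014 or later (assumed 50/50 split)

* Simulated outcome: 15% uninsured at baseline, with an extra 5-point drop
* for the treated group in the post period (the simulated DID effect)
generate byte uninsured = runiform() < ///
    (0.15 - 0.02*expand - 0.03*exp_year - 0.05*expand*exp_year)

* Linear probability model; the interaction coefficient is the DID estimate
regress uninsured i.expand##i.exp_year, vce(robust)
lincom 1.expand#1.exp_year
```

    Covariates such as AGE or SEX would simply be added to the right-hand side of the regress call. Robust standard errors are a choice made here for the binary outcome, not something the original post specifies.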

    [Image: Screen Shot 2020-11-09 at 19.07.15.png]

    [Image: Screen Shot 2020-11-09 at 19.08.07.png]

    I'm looking for a simple way of "proving" that the parallel trend assumption holds.

    Thanks!

  • #2
    Roberto Vidri I would never use the term "prove" in the social sciences, but I think you could provide some evidence that the pre-intervention trends do not differ across groups by pooling the data and fitting a mixed-effects model in which a binary treatment indicator predicts variation in the outcome's pre-intervention slope. If the coefficient on that term is not statistically significant, you have some evidence (though not proof) that the pre-intervention slopes do not differ between groups.

    Code:
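    * the if qualifier belongs in the fixed-effects equation, before ||;
    * as far as I know, vce(robust) is not supported with reml, so the default ML is used
    mixed uninsured expand##i.year if year < 2014 || id: expand, vce(robust)
    My code might be slightly off. It's been a while since I've run one.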
    Last edited by Tom Scott; 09 Nov 2020, 19:08.
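    If each subject appears only once in the data (an assumption; the first post does not say), the same idea can be sketched without random effects: restrict to the pre-intervention years, interact the group indicator with year, and test the interaction. The example below uses simulated data in which both groups share the same trend by construction. Variable names follow the first post; all numbers are made up.

```stata
* Sketch on simulated data; names follow the first post, values are made up.
clear
set seed 2020
set obs 8000
generate int  YEAR_OF_DIAGNOSIS = 2010 + floor(8*runiform())   // years 2010-2017
generate byte expand = runiform() < 0.5                        // 1 = expansion group
* Same downward trend in both groups, so pre-trends are parallel by construction
generate byte uninsured = runiform() < (0.15 - 0.005*(YEAR_OF_DIAGNOSIS - 2010))

* Pre-period only: does the linear year slope differ between groups?
regress uninsured i.expand##c.YEAR_OF_DIAGNOSIS if YEAR_OF_DIAGNOSIS < 2014, vce(robust)
test 1.expand#c.YEAR_OF_DIAGNOSIS

* Less restrictive: year-specific group differences, tested jointly
regress uninsured i.expand##i.YEAR_OF_DIAGNOSIS if YEAR_OF_DIAGNOSIS < 2014, vce(robust)
testparm i.expand#i.YEAR_OF_DIAGNOSIS
```

    A non-significant result here is consistent with parallel pre-trends but does not establish them, especially with few pre-intervention years.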



    • #3
      Tom, thanks for your thoughts!



      • #4
        Dear Tom Scott, can you please tell me why you suggested using "mixed" instead of probit or logit in this case?
        Code:

        mixed uninsured expand##i.year if year < 2014 || id: expand, vce(robust)
        Thank you.



        • #5
          Marry Lee I believe it was because this is a multilevel model, with repeated observations nested within individuals.



          • #6
            Tom Scott thank you for your quick answer. I don't quite understand why it is a multilevel model. Shouldn't a model be multilevel only when there is more than one level of observation (for example individual, school, city)?



            • #7
              Marry Lee a model is multilevel whenever any unit is nested within another. The classic example is students within classrooms within schools. But if you collect the same data on an individual at monthly intervals for 2 years, then you have 24 monthly observations nested within that individual. If you do that for 1,000 individuals across 50 cities, then you have 24,000 (24 x 1,000) observations nested within 1,000 individuals nested within 50 cities. Then you can look at things like how much of the variation in an outcome is within individuals compared to between individuals, or whether the relationship between individual characteristics (e.g., race) and your individual-level outcome (e.g., desire to run for political office) depends on city characteristics (e.g., population size).
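              The nesting described above can be sketched with simulated data: 50 cities, 1,000 individuals, and 24 monthly observations each. Everything here is an assumption made for illustration: individuals are spread evenly (20 per city), the variance components are made up, and the outcome y is hypothetical. The three-level model then splits the outcome's variance into city, individual, and residual parts.

```stata
* Sketch on simulated data; all sizes and variances are made-up assumptions.
* Note: expand below is Stata's duplicate-observations command, unrelated to
* the treatment variable named expand in the first post.
clear
set seed 7
set obs 50
generate city   = _n
generate u_city = rnormal(0, 1)         // city-level random intercept
expand 20                               // 20 individuals per city -> 1,000 individuals
generate id   = _n
generate u_id = rnormal(0, 2)           // individual-level random intercept
expand 24                               // 24 monthly observations per individual
bysort id: generate month = _n
generate y = 10 + u_city + u_id + rnormal(0, 3)
count                                   // 24 x 1,000 = 24,000 observations

* Observations nested within individuals nested within cities
mixed y || city: || id:
estat icc                               // share of variance at each level
```

              The intraclass correlations from estat icc are one way to express how much variation lies between cities and between individuals rather than within individuals over time.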
