  • Making a difference-in-difference graph for common trend assumption

    Hi everyone

    I have conducted a DiD analysis and want to plot trend graphs of the dependent variable for the intervention and control groups. The main point is to visualize the trends in order to assess the common trend assumption (also known as the parallel paths assumption). Briefly:

    The key assumption here is what is known as the “Parallel Paths” assumption, which posits that the average change in the comparison group represents the counterfactual change in the treatment group if there were no treatment
    It is commonly visualized with a graph like this:

    [Image: 7go66 (1).png]

    The dotted line is not necessary; it is included in the picture only to illustrate the trend that the treated units would have followed had they not received treatment.

    However, I am unsure how to produce a correct graph of this kind in Stata. Can anyone give me any leads?

    The dependent variable is the monthly regional suicide rate, and treatment status is defined over aggregated regions: the intervention group consists of four regions and the control group consists of two regions. Here is some information about my data set:

    Code:
    . xtdescribe
    
      region:  1, 2, ..., 6                                      n =          6
     bymonth:  0, 1, ..., 71                                     T =         72
               Delta(bymonth) = 1 unit
               Span(bymonth)  = 72 periods
               (region*bymonth uniquely identifies each observation)
    
    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                            72      72      72        72        72      72      72
    
         Freq.  Percent    Cum. |  Pattern
     ---------------------------+--------------------------------------------------------------------------
            6    100.00  100.00 |  111111111111111111111111111111111111111111111111111111111111111111111111
     ---------------------------+--------------------------------------------------------------------------
            6    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    
    
    . list suiciderate bymonth eventdate region treated_region in 1/10
    
         +----------------------------------------------------------------------+
         | suicid~e   bymonth       eventdate                 region   treate~n |
         |----------------------------------------------------------------------|
      1. | .3552352         0     January2012   North central region    Control |
      2. | .4736469         1    February2012   North central region    Control |
      3. | 2.012999         2       March2012   North central region    Control |
      4. | 2.486646         3       April2012   North central region    Control |
      5. | 1.420941         4         May2012   North central region    Control |
         |----------------------------------------------------------------------|
      6. | 1.302529         5        June2012   North central region    Control |
      7. | 1.657764         6        July2012   North central region    Control |
      8. | 2.012999         7      August2012   North central region    Control |
      9. | 1.420941         8   September2012   North central region    Control |
     10. | .9472938         9     October2012   North central region    Control |
         +----------------------------------------------------------------------+

    I will also cross-reference a related thread: https://www.statalist.org/forums/for...regional-units


  • #2
    So you would first reduce your data set to one observation per month for the control group and one for the intervention group, containing an indicator for which group (say 1 for intervention, 0 for control), the month, and the "average" suicide rate for the group in that month. You might want to make that a weighted average, weighted by population or something like that. Anyway, probably the -collapse- command will enable you to do that. Then you want to -reshape wide suicide_rate, i(month) j(group)- and then -graph twoway line suicide_rate* month, sort-.
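    For reference, a minimal sketch of the steps described above. It runs on a simulated toy panel, not on the poster's data; the variable names follow the listing in #1, and the assignment of regions 2, 4, 5 and 6 to the intervention group is taken from the sample shown in #3. The weighted variant in the comment assumes a hypothetical variable named population.

```stata
* Toy panel, NOT the poster's data: 6 regions x 72 months, simulated rates
clear
set seed 12345
set obs 6
generate region = _n
generate intervention = inlist(region, 2, 4, 5, 6)   // 1 = intervention, 0 = control
expand 72
bysort region: generate bymonth = _n - 1
generate suiciderate = 1 + 0.5*runiform()

* One observation per group per month: region must NOT appear in by().
* Weighted alternative (population is a hypothetical variable):
*   collapse (mean) suiciderate [aweight = population], by(intervention bymonth)
collapse (mean) suiciderate, by(intervention bymonth)

* One column per group: suiciderate0 (control), suiciderate1 (intervention)
reshape wide suiciderate, i(bymonth) j(intervention)

twoway line suiciderate0 suiciderate1 bymonth, sort ///
    legend(order(1 "Control" 2 "Intervention"))     ///
    ytitle("Mean suicide rate") xtitle("Month")
```

    After the -reshape- there is one row per month, so both lines share the same x variable.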


    • #3
      Thanks for your reply. I reduced my data set and tried reshaping to wide by:
      Code:
      collapse (mean) suiciderate, by (intervention region bymonth)
      reshape wide suiciderate, i(bymonth) j(intervention)
      However, it does not seem like the reshape command is correct (and I've tried some other alternatives without getting it right). The error message says:
      Code:
      reshape wide suiciderate, i(bymonth) j(intervention)
      (note: j = 0 1)
      values of variable intervention not unique within bymonth
      Here is a look at my original data:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(suiciderate attemptrate intervention) long region float(bymonth eventdate)
       .3552352  3.552352 0 1 0 624
       .4736469 2.2498226 0 1 1 625
       2.012999 3.1971166 0 1 2 626
       2.486646  3.552352 0 1 3 627
      1.4209406  2.605058 0 1 4 628
       1.302529  3.315528 0 1 5 629
       1.657764 2.2498226 0 1 6 630
       2.012999 3.0787046 0 1 7 631
      1.4209406  4.025998 0 1 8 632
       .9472938 2.7234695 0 1 9 633
      end
      format %tm bymonth
      format %tmMCY eventdate
      label values intervention intervention
      label def intervention 0 "Control", modify
      label values region region
      label def region 1 "North central region", modify
      And a random sample that includes all regions:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(suiciderate attemptrate intervention) long region float(bymonth eventdate)
        1.19644  2.632168 0 1 16 640
       .9571519   2.39288 0 1 21 645
      1.9621284  2.697927 0 1 39 663
       .8805054 2.0125837 0 1 61 685
       1.253316  3.759948 1 2  2 626
       .8517325  2.981064 1 2 53 677
       .5102633  5.485331 0 3 45 669
      1.8519597  3.571637 0 3 66 690
       .8205981  2.051495 1 4  2 626
       .6224772 2.2132525 1 4 29 653
       .7055012  3.033655 1 4 68 692
       .4779543  .4779543 1 5 59 683
      .23527063 2.1174357 1 6 27 651
       .6587578  3.058518 1 6 33 657
       .3300042 3.6300466 1 6 37 661
      end
      format %tm bymonth
      format %tmMCY eventdate
      label values intervention intervention
      label def intervention 0 "Control", modify
      label def intervention 1 "Intervention", modify
      label values region region
      label def region 1 "North central region", modify
      label def region 2 "North east region", modify
      label def region 3 "North west region", modify
      label def region 4 "South central region", modify
      label def region 5 "South east region", modify
      label def region 6 "South west region", modify


      • #4
        Please re-read what I wrote in #2. You need a single observation per month for the control group and for the intervention group. So your -collapse- command must not retain the region variable.
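        A short sketch of why keeping region breaks the -reshape-, again on a simulated toy panel rather than the poster's data. With region in by(), each month still has several rows per group, so (bymonth, intervention) does not identify observations; -duplicates report- and -isid- make that visible.

```stata
* Toy panel, NOT the poster's data
clear
set seed 1
set obs 6
generate region = _n
generate intervention = inlist(region, 2, 4, 5, 6)
expand 72
bysort region: generate bymonth = _n - 1
generate suiciderate = 1 + runiform()

* Keeping region in by() leaves several rows per (bymonth, intervention) pair
preserve
    collapse (mean) suiciderate, by(intervention region bymonth)
    duplicates report bymonth intervention        // shows surplus copies
    capture isid bymonth intervention
    display "isid return code with region kept: " _rc   // nonzero: not unique
restore

* Dropping region gives exactly one row per pair, so -reshape wide- can proceed
collapse (mean) suiciderate, by(intervention bymonth)
isid bymonth intervention
```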


        • #5
          Thank you for clearing this up, and sorry for misreading #2. I retried without region, and the results make sense now, both for the graph by year and for the graph by month ("bymonth"):

          Code:
          . collapse (mean) suiciderate, by (intervention year)
          
          . reshape wide suiciderate, i(year) j(intervention)
          (note: j = 0 1)
          
          Data                               long   ->   wide
          -----------------------------------------------------------------------------
          Number of obs.                       12   ->       6
          Number of variables                   3   ->       3
          j variable (2 values)      intervention   ->   (dropped)
          xij variables:
                                      suiciderate   ->   suiciderate0 suiciderate1
          -----------------------------------------------------------------------------
          
          . graph twoway line suiciderate0 suiciderate1 year, sort


          • #6
            Dear,
            I have a DID data set with a time dummy, a group dummy, and their interaction, yet I don't know how I can command DID graph on stat?


            • #7
              Hi Nathan

              I am not sure I understand you correctly. What do you mean by "command DID graph on stat"?

              Do you want a graph like the one in #1 of the thread or something else? If you have only two groups (treatment and control) and more than two time periods, with the intervention occurring somewhere in between, a starting point would be a trend graph of the outcome by treatment group over time. You can then assess the parallel trends assumption by checking whether the two groups' outcome trends move in parallel before the intervention, and see how they diverge after it.


              • #8
                Dear, I need a graph like the one in #1. I have 2 groups and 2 time periods (dummy).


                • #9
                  OK, I think following the code in #5 should do the trick then. Just replace "suiciderate" with your outcome variable, "intervention" with your treatment variable, and "year" with your time variable.
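                  For the two-group, two-period case, a minimal sketch on simulated data might look like the following. The names y, treat, post and y1_cf are hypothetical, and the dashed counterfactual line is computed under the parallel paths assumption quoted in #1: the treated group's pre-period mean plus the control group's change.

```stata
* Toy 2x2 data, NOT real data; y, treat and post are hypothetical names
clear
set seed 2024
set obs 400
generate treat = mod(_n, 2)          // 1 = treatment group, 0 = control
generate post  = _n > 200            // 1 = after the intervention
generate y = 1 + 0.5*treat + 0.3*post + 0.4*treat*post + rnormal(0, 0.2)

* Four group-by-period means, then one column per group
collapse (mean) y, by(treat post)
reshape wide y, i(post) j(treat)     // y0 = control, y1 = treated
sort post

* Counterfactual for the treated group under parallel paths:
* treated pre-period mean plus the control group's change
generate y1_cf = y1[1] + (y0 - y0[1])

twoway (connected y0 post) (connected y1 post)          ///
       (line y1_cf post, lpattern(dash)),               ///
       xlabel(0 "Before" 1 "After") xtitle("")          ///
       ytitle("Mean outcome")                           ///
       legend(order(1 "Control" 2 "Treated" 3 "Treated, counterfactual"))

* The post-period gap between y1 and y1_cf is the DiD estimate
display "DiD estimate: " y1[2] - y1_cf[2]
```

                  With only two periods the graph cannot say anything about pre-intervention trends; it only illustrates the estimate.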

                  Note that -collapse- reduces your data set to only the variables named in the collapse command, so save your data set before proceeding. Alternatively, I know there is a way to restore the original data set after -collapse- without reloading it, but I don't remember the code right now.
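                  One way to get the original data back without reloading is Stata's -preserve- and -restore- pair. A minimal sketch, using a simulated toy panel with the same layout as in #1 (not the poster's data):

```stata
* Toy panel, NOT the poster's data
clear
set seed 1
set obs 6
generate region = _n
generate intervention = inlist(region, 2, 4, 5, 6)
expand 72
bysort region: generate bymonth = _n - 1
generate suiciderate = 1 + runiform()

preserve                      // snapshot of the full data set
    collapse (mean) suiciderate, by(intervention bymonth)
    reshape wide suiciderate, i(bymonth) j(intervention)
    twoway line suiciderate0 suiciderate1 bymonth, sort
restore                       // back to the original 432 observations

count
```

                  If the preserved data are never restored explicitly, Stata restores them automatically when the do-file or program that issued -preserve- ends.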


                  • #10
                    Dear,
                    Thank you very much, it is very helpful. I have some questions about the difference-in-differences impact evaluation method:
                    1. What other options are there for assessing impact when randomization fails or in natural experiments?
                    2. Is DID descriptive or analytic statistics?
                    3. What are the strengths and weaknesses of DID?
                    4. What are the assumptions of DID other than the parallel trend assumption, and how can we check them?
                    5. What should we do if our data set fails to fulfill the DID assumptions?
                    Thanks in advance!


                    • #11
                      Hi Nathan

                      I would go to the literature to answer these questions. Angrist & Pischke's Mostly Harmless Econometrics and/or Mastering 'Metrics (the latter is to a large extent a lighter version of the former) and the more specific introductory DiD article by Wing et al. (2018) should have the answers to most of your questions.
