
  • Adding across regional units

    Hi

    I have a dataset with suicides for several regions in a country over many years.

    Some regions are intervention regions, while the remaining regions are controls.

    I would like to add up all suicides across the intervention regions, and similarly across the control regions, to get one variable with suicides in intervention regions and one with suicides in control regions. The variables also need to account for time, as I plan to use them to examine the common trend assumption in a DD design. I suspect the construction of these variables is pretty straightforward, but I've not found a solution with either -gen- or -egen-.

    I've tried:

    Code:
    gen suicide_intervention=suicide if region==1+suicide if region==2+suicide if region==3
    But this does not solve the problem.
    Last edited by Tarjei W. Havneraas; 21 Aug 2018, 03:58.

  • #2
    This will fail, because the condition

    Code:
     if region==1+suicide if region==2+suicide if region==3
    is quite illegal: a command like this allows only one if qualifier, and you can't mix expressions and qualifiers like that. This may be closer to what you want:
    Code:
    egen wanted123 = total(cond(inlist(region, 1, 2, 3), suicide, 0)), by(year) 
    See https://www.stata-journal.com/sjpdf....iclenum=dm0055 (especially Sections 9 and 10) and, if not sated, search for mentions of dm0055 in this forum.

    A data example would have made your question clearer.
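To illustrate the suggestion, here is a minimal sketch on a small invented dataset (regions 1, 2 and 3 treated as intervention regions, as in post #1; all counts are hypothetical). The second -egen- call simply negates the condition to get the control total, so both group totals are available in every year.

```stata
* Hypothetical toy data: 6 regions observed over 2 years (counts invented)
clear
input region year suicide
1 2012 10
2 2012 20
3 2012 30
4 2012 40
5 2012 50
6 2012 60
1 2013 11
2 2013 21
3 2013 31
4 2013 41
5 2013 51
6 2013 61
end

* Yearly total over intervention regions (1, 2, 3 as in post #1)
egen suicide_intervention = total(cond(inlist(region, 1, 2, 3), suicide, 0)), by(year)

* Yearly total over the remaining (control) regions
egen suicide_control = total(cond(!inlist(region, 1, 2, 3), suicide, 0)), by(year)

* One row per year is enough to see the totals
list year suicide_intervention suicide_control if region == 1, noobs
```

Each total is repeated on every observation of its year, which is what -egen, by()- does by design.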
    Last edited by Nick Cox; 21 Aug 2018, 04:26.



    • #3
      Thanks for your reply, Nick Cox. The code gave me what I was looking for. However, I now see that the graph I need requires different coding. I want to examine pre- and post-intervention trends in the suicide rate in the intervention regions and the control regions.

      So, I want a graph with suicide rate on the y-axis and time on the x-axis, split into treatment and control regions.

      I obtain a graph for all regions with:

      Code:
      twoway (line suiciderate bymonth), by(region)
      [Attached graph: Graph_suicidebyregion.png]


      However, when I group the regions into intervention and control regions to graph the trends for each group, the graph does not make much sense:

      Code:
      gen treatment_region=.
      replace treatment_region=1 if region==2|region==4|region==5|region==6
      replace treatment_region=0 if region==1|region==3
      
      twoway (line suiciderate bymonth), by(treatment_region)
      [Attached graph: Graph_suicide by treatment status.png]


      Here is some additional information about the data:

      Code:
      . sum suiciderate bymonth region treatment_region
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
       suiciderate |        432    .7699556    .3963652          0   2.486646
           bymonth |        432        35.5     20.8067          0         71
            region |        432         3.5    1.709805          1          6
      treatment_~n |        432    .6666667    .4719511          0          1
      
      . xtdescribe
      
        region:  1, 2, ..., 6                                      n =          6
       bymonth:  0, 1, ..., 71                                     T =         72
                 Delta(bymonth) = 1 unit
                 Span(bymonth)  = 72 periods
                 (region*bymonth uniquely identifies each observation)
      
      Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                              72      72      72        72        72      72      72
      
           Freq.  Percent    Cum. |  Pattern
       ---------------------------+--------------------------------------------------------------------------
              6    100.00  100.00 |  111111111111111111111111111111111111111111111111111111111111111111111111
       ---------------------------+--------------------------------------------------------------------------
              6    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX



      • #4
        I'd rewrite your last block of code

        Code:
        gen treatment_region=.
        replace treatment_region=1 if region==2|region==4|region==5|region==6
        replace treatment_region=0 if region==1|region==3  
        twoway (line suiciderate bymonth), by(treatment_region)
        as

        Code:
        gen treatment_region = inlist(region, 2, 4, 5, 6)
        
        label def treatment_region 1 "2, 4, 5, 6" 0 "1, 3"
        label val treatment_region treatment_region
        
        twoway line suiciderate bymonth, by(treatment_region) sort
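        where the sort option is the one-word answer to your question: without it, line connects the points in the order they appear in the dataset rather than in order of bymonth.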

        That said, I would

        * use a Stata monthly date and plot in those terms

        * look for seasonality as well as trend

        * use real region names unless there is an ethical reason otherwise.
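The first two suggestions might look like this. It is a minimal sketch on invented data shaped like the panel in post #3 (6 regions, bymonth running 0 to 71). The thread does not say which calendar month bymonth 0 is, so January 2012 is an assumption here, and the names mdate, month and year are hypothetical.

```stata
* Hypothetical toy panel shaped like post #3: 6 regions x 72 months
clear
set obs 432
gen region = ceil(_n / 72)
bysort region: gen bymonth = _n - 1
set seed 2018
gen suiciderate = runiform()

* ASSUMPTION: bymonth 0 is January 2012 (the real start month is not stated)
gen mdate = ym(2012, 1) + bymonth
format mdate %tm

* Calendar month (1-12) and year, recovered via the daily date
gen month = month(dofm(mdate))
gen year = year(dofm(mdate))

* Mean rate by calendar month: a first look at seasonality
tabstat suiciderate, by(month) statistics(mean)

* Plot against the monthly date so the axis shows real months
twoway line suiciderate mdate, by(region) sort
```

With %tm formatting the x-axis shows labels such as 2012m1 rather than 0 to 71.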



        • #5
          In fact there is another difficulty. You have aggregated your regions into 2 groups but done nothing to aggregate the response variable. If you want to superimpose the regions in each graph, 2 in one graph and 4 in the other, you need another command. If you want to combine suicide rates somehow, you need to calculate that first. As these are rates, that would need to make use of the underlying populations.
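One way to superimpose the regions within each group is to split the rate into one variable per region with -separate- and plot them together. A minimal sketch on invented data shaped like post #3, using the grouping from post #3 (regions 2, 4, 5 and 6 coded 1, regions 1 and 3 coded 0); the toy values are hypothetical.

```stata
* Hypothetical toy panel shaped like post #3: 6 regions x 72 months
clear
set obs 432
gen region = ceil(_n / 72)
bysort region: gen bymonth = _n - 1
set seed 2018
gen suiciderate = runiform()

* Grouping from post #3: regions 2, 4, 5, 6 coded 1; regions 1, 3 coded 0
gen treatment_region = inlist(region, 2, 4, 5, 6)
label define treatment_region 1 "2, 4, 5, 6" 0 "1, 3"
label values treatment_region treatment_region

* One rate variable per region (suiciderate1 ... suiciderate6),
* each missing outside its own region
separate suiciderate, by(region)

* Superimpose the regions: 2 lines in one panel, 4 in the other
twoway line suiciderate1-suiciderate6 bymonth, by(treatment_region) sort
```

The legend still lists all six variables in each panel, although only the group's own regions contain data there.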



          • #6
            Thank you, this was really helpful and the code solved the problem. I'll make sure to use a Stata monthly date variable, and I anonymized the region names for ethical reasons. In the DiD analyses I adjust for time effects with dummies for month, but I'll look into seasonal variables, as I know seasonal effects may be relevant for suicide rates.

            The other difficulty you point out is very relevant. My response variable combines monthly regional suicide data with yearly regional population data, as yearly was the most detailed level available for population. The suicide rate is calculated as follows (showing 2012 only as an example):
            Code:
            // generate regional population
            
            gen regpop=.
            * 2012
            replace regpop=844511 if region==1 & year==2012
            replace regpop=957460 if region==2 & year==2012
            replace regpop=823469 if region==3 & year==2012
            replace regpop=1462348 if region==4 & year==2012
            replace regpop=1067981 if region==5 & year==2012
            replace regpop=2128783 if region==6 & year==2012
            
            ...
            
            // generate suicide rate per 100 000
            
            // suicide proportion
            gen deadprop=dead/regpop
            
            // suicide rate per 100 000
            gen suiciderate=deadprop*100000
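Typing one replace per region and year gets long quickly. As an alternative (not what the thread itself used), the populations could be kept in a small region-by-year dataset and attached with merge m:1. A minimal sketch: the 2012 populations are the ones given above, while the monthly death counts (dead = 5) and the month variable are invented for illustration.

```stata
* One population row per region and year; further years would be added the same way
clear
input region year regpop
1 2012  844511
2 2012  957460
3 2012  823469
4 2012 1462348
5 2012 1067981
6 2012 2128783
end
tempfile pop
save `pop'

* Hypothetical monthly death counts for 2012 (invented for illustration)
clear
set obs 72
gen region = ceil(_n / 12)
bysort region: gen month = _n
gen year = 2012
gen dead = 5

* Attach each region's yearly population to all of its months
merge m:1 region year using `pop', keep(match master) nogenerate

* Suicide rate per 100 000
gen suiciderate = 100000 * dead / regpop
```

Because the population data are yearly, every month of a region-year receives the same denominator, as in the original approach.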
            I want to combine suicide rates for intervention and control regions, and to do that I guess I can adapt the regional population coding above like this:
            Code:
            gen intervention_pop=.
            * 2012: population summed over regions 1 and 3 (844511 + 823469)
            replace intervention_pop=1667980 if inlist(region, 1, 3) & year==2012
            * 2012: population summed over regions 2, 4, 5 and 6
            replace intervention_pop=5616572 if inlist(region, 2, 4, 5, 6) & year==2012
            One last thing: in the main DiD analyses, the response variable is the suicide rate based on regional monthly data, but the treatment assignment variable is aggregated to intervention and control regions. It seems the aggregation problem you mention would also apply to the DiD analyses, and I would greatly appreciate your input on this.



            • #7
              Use egen to get numerators and denominators as totals over groups, then divide. You don't have to write very much custom code at all.
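A minimal sketch of that advice, on invented data (the counts and populations are hypothetical; only the variable names dead, regpop and bymonth and the grouping from post #3 come from the thread). Summing deaths and populations within group and month before dividing gives each group a population-weighted rate, rather than an unweighted mean of the regional rates.

```stata
* Hypothetical toy data: 6 regions x 3 months
clear
set obs 18
gen region = ceil(_n / 3)
bysort region: gen bymonth = _n - 1
gen dead = region          // invented counts, for illustration only
gen regpop = 1000000       // invented populations, for illustration only
replace regpop = 2000000 if region == 6

* Grouping from post #3: regions 2, 4, 5, 6 coded 1; regions 1, 3 coded 0
gen treatment_region = inlist(region, 2, 4, 5, 6)

* Numerator and denominator as totals over group and month, then divide
egen dead_grp = total(dead), by(treatment_region bymonth)
egen pop_grp = total(regpop), by(treatment_region bymonth)
gen rate_grp = 100000 * dead_grp / pop_grp

* Tag one observation per group and month so each line is drawn once
egen tag = tag(treatment_region bymonth)
twoway line rate_grp bymonth if tag, by(treatment_region) sort
```

Here the control rate is 100000 * (1 + 3) / 2000000 = 0.2 per 100 000 in every month, and the other group's is 100000 * 17 / 5000000 = 0.34.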



              • #8
                Ok, thank you! As for the last question I mentioned: do you think the aggregation problem applies to DiD analyses as well (or to regression in general)?



                • #9
                  Sorry, never done any DiD work. I even have to strain mightily to know what that means.

                  See https://www.statalist.org/forums/for...tline-by-dummy for the Third Law of Statalist.



                  • #10
                    Ok, but thank you for your other input, which has been very helpful. I'll keep the Third Law of Statalist in mind for the future.



                    • #11
                      I'd start a new thread mentioning difference in difference or whatever it is in the title. Cross-refer to this thread (and vice versa) but ask a new question.

