Adding across regional units

Tarjei W. Havneraas

Join Date: Nov 2016

Posts: 136
#1

21 Aug 2018, 03:56

Hi

I have a dataset with suicides for several regions in a country over many years.

Some regions are intervention regions, while the remaining ones are controls.

I would like to add up all suicides across the intervention regions and, similarly, all suicides across the control regions, giving one total for each group. The totals also need to vary over time, as I plan to use them to examine the common trend assumption in a DD design. I suspect the construction of this variable is pretty straightforward, but I've not found a solution with either -gen- or -egen-.

I've tried,

Code:

gen suicide_intervention=suicide if region==1+suicide if region==2+suicide if region==3

But this does not solve the problem.

Last edited by Tarjei W. Havneraas; 21 Aug 2018, 03:58.
Nick Cox

Join Date: Mar 2014

Posts: 35436
#2

21 Aug 2018, 04:13

This will fail because the condition

Code:

if region==1+suicide if region==2+suicide if region==3

is illegal: a command like this can have only one if qualifier, and you can't mix arguments and qualifiers like that. This may be closer to what you want:

Code:

egen wanted123 = total(cond(inlist(region, 1, 2, 3), suicide, 0)), by(year)

See https://www.stata-journal.com/sjpdf....iclenum=dm0055 especially Sections 9 and 10 and if not sated search for mentions of dm0055 in this forum.
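
A minimal sketch of how that egen line behaves, on made-up data (the counts are invented for illustration, and control_total is a hypothetical name not used in this thread). cond() passes suicide through for the listed regions and 0 otherwise, so total() adds up only those regions within each year. Negating inlist() gives the matching total for the remaining regions.

```stata
clear
input region year suicide
1 2012 5
2 2012 3
3 2012 4
4 2012 6
1 2013 2
2 2013 7
3 2013 1
4 2013 8
end

* total over regions 1, 2, 3 within each year
* (the same value is repeated on every row of that year)
egen wanted123 = total(cond(inlist(region, 1, 2, 3), suicide, 0)), by(year)

* hypothetical counterpart: total over all other regions
egen control_total = total(cond(!inlist(region, 1, 2, 3), suicide, 0)), by(year)

list, sepby(year)
```

Both new variables are constant within year, so they can be plotted against year directly.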

A data example would have made your question clearer.

Last edited by Nick Cox; 21 Aug 2018, 04:26.

Tarjei W. Havneraas

Join Date: Nov 2016
Posts: 136

21 Aug 2018, 08:33

Thanks for your reply, Nick Cox. The code gave me what I was looking for. However, I now see that the graph I need requires different coding. I want to examine pre and post intervention trends in suicide rate in the intervention regions and the control regions.

So, I want a graph with suicide rate as y-axis, time as x-axis, divided into treatment and control regions.

I obtain a graph for all regions with:

Code:

twoway (line suiciderate bymonth), by(region)

[Image: Graph_suicidebyregion.png]

However, when I code regions into intervention and control regions to make a new graph to show trends by grouped intervention and control regions, the graph does not make much sense:

Code:

gen treatment_region=.
replace treatment_region=1 if region==2|region==4|region==5|region==6
replace treatment_region=0 if region==1|region==3

twoway (line suiciderate bymonth), by(treatment_region)

[Image: Graph_suicide by treatment status.png]

Here is some additional information about data:

Code:

. sum suiciderate bymonth region treatment_region

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 suiciderate |        432    .7699556    .3963652          0   2.486646
     bymonth |        432        35.5     20.8067          0         71
      region |        432         3.5    1.709805          1          6
treatment_~n |        432    .6666667    .4719511          0          1

. xtdescribe

  region:  1, 2, ..., 6                                      n =          6
 bymonth:  0, 1, ..., 71                                     T =         72
           Delta(bymonth) = 1 unit
           Span(bymonth)  = 72 periods
           (region*bymonth uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                        72      72      72        72        72      72      72

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+--------------------------------------------------------------------------
        6    100.00  100.00 |  111111111111111111111111111111111111111111111111111111111111111111111111
 ---------------------------+--------------------------------------------------------------------------
        6    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX


Nick Cox

Join Date: Mar 2014
Posts: 35436

21 Aug 2018, 08:42

I'd rewrite your last block of code

Code:

gen treatment_region=.
replace treatment_region=1 if region==2|region==4|region==5|region==6
replace treatment_region=0 if region==1|region==3  
twoway (line suiciderate bymonth), by(treatment_region)

as

Code:

* 1 = regions 2, 4, 5, 6 and 0 = regions 1, 3, matching the replace statements above
gen treatment_region = inlist(region, 2, 4, 5, 6)

label def treatment_region 1 "2, 4, 5, 6" 0 "1, 3"
label val treatment_region treatment_region

twoway line suiciderate bymonth, by(treatment_region) sort

where sort is a one-word answer to your question: without it, line connects the points in the order in which they appear in the data.

That said, I would

* use a Stata monthly date and plot in those terms

* look for seasonality as well as trend

* use real region names unless there is an ethical reason otherwise.
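
On the first two suggestions, a sketch of turning the month counter into a Stata monthly date. The start month is not stated in this thread; January 2012 is assumed here purely for illustration, and mdate and moy are hypothetical names.

```stata
clear
set obs 72
gen bymonth = _n - 1            // 0, 1, ..., 71 as in the xtdescribe output

* ASSUMPTION: bymonth == 0 corresponds to January 2012
gen mdate = ym(2012, 1) + bymonth
format mdate %tm

* calendar month (1 to 12), usable for checking seasonality
gen moy = month(dofm(mdate))

list bymonth mdate moy in 1/3
```

Plotting against mdate instead of bymonth then gives an axis labelled in calendar months.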


Nick Cox

Join Date: Mar 2014

Posts: 35436
#5

21 Aug 2018, 10:57

In fact there is another difficulty. You have aggregated your regions to 2 but done nothing to aggregate the response variable. If you want to superimpose the different regions in each graph, 2 in one graph and 4 in the other, you need another command. If you want to combine suicide rates somehow, you need to calculate that first. As these are rates, that would need to make use of the underlying populations.
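
One possible way to superimpose the regions within each group is xtline with its overlay option; the post does not say which command is meant, so this is only a sketch. The rates below are random numbers generated for illustration, and the graph names treated and control are hypothetical.

```stata
clear
set obs 6
gen region = _n
expand 12
bysort region: gen bymonth = _n - 1

set seed 1
gen suiciderate = runiform()    // invented values, illustration only
gen treatment_region = inlist(region, 2, 4, 5, 6)

xtset region bymonth

* one line per region, all drawn in the same plot region
xtline suiciderate if treatment_region == 1, overlay name(treated, replace)
xtline suiciderate if treatment_region == 0, overlay name(control, replace)

* put the two graphs side by side on a shared y scale
graph combine treated control, ycommon
```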
Tarjei W. Havneraas

Join Date: Nov 2016

Posts: 136
#6

21 Aug 2018, 15:10

Thank you, this was really helpful and the code solved the problem. I'll make sure to use a Stata month variable and I anonymized region names for ethics. In the DiD-analyses I adjust for time effects with a dummy for month, but I'll look into seasonal variables as I know seasonal effects may be relevant for suicide rates.

The other difficulty you point out is very relevant. My response variable uses monthly regional suicide data and yearly regional population data, as yearly was the most detailed level available for population. The suicide rate is calculated the following way (with 2012 just as an example):

Code:

// generate regional population
gen regpop=.
* 2012
replace regpop=844511 if region==1 & year==2012
replace regpop=957460 if region==2 & year==2012
replace regpop=823469 if region==3 & year==2012
replace regpop=1462348 if region==4 & year==2012
replace regpop=1067981 if region==5 & year==2012
replace regpop=2128783 if region==6 & year==2012
...

// generate suicide rate per 100 000
// suicide proportion
gen deadprop=dead/regpop
// suicide rate per 100 000
gen suiciderate=deadprop*100000

I want to combine suicide rates for intervention and control regions and to do that I guess I can adjust the above mentioned coding for regional population like this:

Code:

gen intervention_pop=.
* 2012
* regions 1, 3: 844511 + 823469
replace intervention_pop=1667980 if inlist(region, 1, 3) & year==2012
* regions 2, 4, 5, 6: 957460 + 1462348 + 1067981 + 2128783
replace intervention_pop=5616572 if inlist(region, 2, 4, 5, 6) & year==2012

One last thing: In the main DiD-analyses, the response variable is the suicide rate that is based on regional monthly data, but the treatment assignment variable is aggregated to intervention and control regions. It seems like the aggregation problem you mention will also apply to the DiD-analyses, and I would highly appreciate your input on this?
Nick Cox

Join Date: Mar 2014

Posts: 35436
#7

21 Aug 2018, 15:41

Use egen to get numerators and denominators as totals over groups, then divide. You don't have to write very much custom code at all.
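
A minimal sketch of that approach for a single month. The populations are the 2012 figures quoted in #6; the death counts are invented for illustration, and dead_grp, pop_grp and rate_grp are hypothetical names.

```stata
clear
input region bymonth dead regpop
1 0  6  844511
2 0  8  957460
3 0  5  823469
4 0 11 1462348
5 0  9 1067981
6 0 17 2128783
end

gen treatment_region = inlist(region, 2, 4, 5, 6)

* numerator and denominator as totals over group and month
egen dead_grp = total(dead),   by(treatment_region bymonth)
egen pop_grp  = total(regpop), by(treatment_region bymonth)

* pooled rate per 100 000: ratio of totals, not a mean of regional rates
gen rate_grp = 100000 * dead_grp / pop_grp

list region treatment_region dead_grp pop_grp rate_grp
```

Dividing the totals weights each region by its population, which a simple average of the six regional rates would not do.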
Tarjei W. Havneraas

Join Date: Nov 2016

Posts: 136
#8

21 Aug 2018, 15:55

Ok, thank you! As for the last question I mentioned: Do you think the aggregation problem applies to DiD-analyses as well (or regression in general)?
Nick Cox

Join Date: Mar 2014

Posts: 35436
#9

22 Aug 2018, 00:57

Sorry, never done any DiD work. I even have to strain mightily to know what that means.

See https://www.statalist.org/forums/for...tline-by-dummy for the Third Law of Statalist.
Tarjei W. Havneraas

Join Date: Nov 2016

Posts: 136
#10

22 Aug 2018, 03:21

Ok, but thank you for your other inputs which have been very helpful. I'll keep the Third Law of Statalist in mind for the future.
Nick Cox

Join Date: Mar 2014

Posts: 35436
#11

22 Aug 2018, 03:28

I'd start a new thread mentioning difference in difference or whatever it is in the title. Cross-refer to this thread (and vice versa) but ask a new question.