Counting occurances of dummy's in specific sub-groups to create count variable for a different group - Panel Data

Mike McRae

Join Date: Nov 2020
Posts: 5

Counting occurances of dummy's in specific sub-groups to create count variable for a different group - Panel Data

02 Jun 2021, 09:48

Stata 14.0 SE on Mac - Data contais 90,000 obervations

Hi. I am using panel data on US counties (and States) and Medical Marijuana Laws. I am trying to create a count variable which increases by 1 in one state, when a dummy variable is activated in any of the bordering states. This would mean that the count variable would increase by 1 any time a neighbouring state dummy is activated, and decreases by one any time the neighbour dummy is deactivated.

Below, s_id is the stae indicator - with multiple counties per state (although the example below only shows one county per state); fips is the unique county code; c_id is the county within state code, MML is an indicator = 1 if state i has introduced a medical marijuana law in year t.

The Neighbour_MML variable is what I am trying to create. If we assume that s_id 1 is neighboured by s_id's 3 and 4; s_id 2 has no nieghbours; s_id 3 neighbours s_id 4; s_id 4 neighbour s_id's 3 and 5; and s_id 5 only neighbour s_id 4:

Dependent	Neighbour
1	3
1	4
3	1
4	1
4	5
5	4

With this neighbour pairing combination the Neighbour_MML count variable would look as it does below:

year	fips	c_id	s_id	MML	Neighbour_MML
1990	1001	1	1	0	0
1991	1001	1	1	0	0
1992	1001	1	1	1	1
1993	1001	1	1	1	2
1994	1001	1	1	1	2
1990	2001	1	2	0	0
1991	2001	1	2	0	0
1992	2001	1	2	0	0
1993	2001	1	2	1	0
1994	2001	1	2	1	0
1990	3001	1	3	0	0
1991	3001	1	3	0	0
1992	3001	1	3	0	1
1993	3001	1	3	1	1
1994	3001	1	3	1	1
1990	4001	1	4	0	0
1991	4001	1	4	0	0
1992	4001	1	4	1	0
1993	4001	1	4	1	1
1994	4001	1	4	1	2
1990	5001	1	5	0	0
1991	5001	1	5	0	0
1992	5001	1	5	0	1
1993	5001	1	5	0	1
1994	5001	1	5	1	1

I have all the state pairing but need some assistance in understaing how I can generate the Neighbour_MML count variable for all states based on all pairings?

Any help greatly appreciated,
Thanks,
Mike

Last edited by Mike McRae; 02 Jun 2021, 09:54.

Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10195

02 Jun 2021, 14:41

Your description and table do not match. Here is one way:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(dependent neighbour)
1 3
1 4
3 4
4 3
4 5
5 4
end
rename neighbour s_id
tempfile neighbors
save `neighbors'
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(year fips) byte(c_id s_id mml neighbour_mml)
1990 1001 1 1 0 0
1991 1001 1 1 0 0
1992 1001 1 1 1 1
1993 1001 1 1 1 2
1994 1001 1 1 1 2
1990 2001 1 2 0 0
1991 2001 1 2 0 0
1992 2001 1 2 0 0
1993 2001 1 2 1 0
1994 2001 1 2 1 0
1990 3001 1 3 0 0
1991 3001 1 3 0 0
1992 3001 1 3 0 1
1993 3001 1 3 1 1
1994 3001 1 3 1 1
1990 4001 1 4 0 0
1991 4001 1 4 0 0
1992 4001 1 4 1 0
1993 4001 1 4 1 1
1994 4001 1 4 1 2
1990 5001 1 5 0 0
1991 5001 1 5 0 0
1992 5001 1 5 0 1
1993 5001 1 5 0 1
1994 5001 1 5 1 1
end
preserve
keep year s_id mml
joinby s_id using `neighbors'
collapse (sum) mml, by(year dependent)
rename (mml dependent) (wanted s_id)
save `neighbors', replace
restore
merge 1:1 year s_id using `neighbors', nogen
replace wanted=0 if missing(wanted)
assert wanted== neighbour_mml

Res.:

Code:

. sort s_id year

. l, sepby(s_id)

     +-----------------------------------------------------+
     | year   fips   c_id   s_id   mml   neighb~l   wanted |
     |-----------------------------------------------------|
  1. | 1990   1001      1      1     0          0        0 |
  2. | 1991   1001      1      1     0          0        0 |
  3. | 1992   1001      1      1     1          1        1 |
  4. | 1993   1001      1      1     1          2        2 |
  5. | 1994   1001      1      1     1          2        2 |
     |-----------------------------------------------------|
  6. | 1990   2001      1      2     0          0        0 |
  7. | 1991   2001      1      2     0          0        0 |
  8. | 1992   2001      1      2     0          0        0 |
  9. | 1993   2001      1      2     1          0        0 |
 10. | 1994   2001      1      2     1          0        0 |
     |-----------------------------------------------------|
 11. | 1990   3001      1      3     0          0        0 |
 12. | 1991   3001      1      3     0          0        0 |
 13. | 1992   3001      1      3     0          1        1 |
 14. | 1993   3001      1      3     1          1        1 |
 15. | 1994   3001      1      3     1          1        1 |
     |-----------------------------------------------------|
 16. | 1990   4001      1      4     0          0        0 |
 17. | 1991   4001      1      4     0          0        0 |
 18. | 1992   4001      1      4     1          0        0 |
 19. | 1993   4001      1      4     1          1        1 |
 20. | 1994   4001      1      4     1          2        2 |
     |-----------------------------------------------------|
 21. | 1990   5001      1      5     0          0        0 |
 22. | 1991   5001      1      5     0          0        0 |
 23. | 1992   5001      1      5     0          1        1 |
 24. | 1993   5001      1      5     0          1        1 |
 25. | 1994   5001      1      5     1          1        1 |
     +-----------------------------------------------------+

.

Comment

Mike McRae

Join Date: Nov 2020

Posts: 5
#3

03 Jun 2021, 04:38

Thanks for pointing out the error in the description.

This worked perfectly, although I should mention that since there were multiple counties per s_id, I collapsed mml by year and s_id first, and then merged many to one once the new variable was created.

code:
clear
input byte(dependent neighbour)
1 3
1 4
3 4
4 3
4 5
5 4
end
rename neighbour s_id
tempfile neighbors
save `neighbors'
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(year fips) byte(c_id s_id mml)
1990 1001 1 1 0
1991 1001 1 1 0
1992 1001 1 1 1
1993 1001 1 1 1
1994 1001 1 1 1
1990 1002 1 1 0
1991 1002 1 1 0
1992 1002 1 1 1
1993 1002 1 1 1
1994 1002 1 1 1
1990 2001 1 2 0
1991 2001 1 2 0
1992 2001 1 2 0
1993 2001 1 2 1
1994 2001 1 2 1
1990 2001 1 2 0
1991 2001 1 2 0
1992 2001 1 2 0
1993 2001 1 2 1
1994 2001 1 2 1
1990 3001 1 3 0
1991 3001 1 3 0
1992 3001 1 3 0
1993 3001 1 3 1
1994 3001 1 3 1
1990 4001 1 4 0
1991 4001 1 4 0
1992 4001 1 4 1
1993 4001 1 4 1
1994 4001 1 4 1
1990 5001 1 5 0
1991 5001 1 5 0
1992 5001 1 5 0
1993 5001 1 5 0
1994 5001 1 5 1
end
preserve
collapse mml, by(year s_id)
joinby s_id using `neighbors'
collapse (sum) mml, by(year dependent)
rename (mml dependent) (wanted s_id)
save `neighbors', replace
restore
merge m:1 year s_id using `neighbors', nogen
replace wanted=0 if missing(wanted)

Just a quick question, what does saving a file using the `filename' code do differently do saving the file to a standard filepath, i.e. save neighbours.dta?
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10195
#4

03 Jun 2021, 06:18

Just a quick question, what does saving a file using the `filename' code do differently do saving the file to a standard filepath, i.e. save neighbours.dta?

It creates a temporary file holding the dataset, which disappears once you end your Stata session. As these are intermediate datasets that you use in the process of creating your final dataset, there is no point of storing them in your computer. If you do a bit of programming, you will realize that you have a lot of unnecessary files littering your computer if you do not get into the habit of using temporary files. Better still, frames (Stata 16+) to a large extent replace temporary files, so you may need to familiarize yourself with them too. For more, see

Code:

help tempfile help frames
Comment

Announcement

Counting occurances of dummy's in specific sub-groups to create count variable for a different group - Panel Data

Comment

Comment

Comment