Two questions about handling Data

Rafael Acevedo

Join Date: Apr 2019
Posts: 17

Two questions about handling Data

19 Jun 2024, 07:45

Hello Statalist,

I need some help, it is probably something simple and basic but I cannot see a way to do it quickly without devoting a lot of time to manually picking and doing this...

1) I'm working with a panel dataset that looks similar to the following example:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int year byte quarter str2 state str1 county float inf
1991 1 "AL" "A"  .3
1991 1 "AL" "B"  .3
1991 1 "AK" "A" .17
1991 1 "AK" "B" .17
1991 2 "AL" "A"  .8
1991 2 "AL" "B"  .8
1991 2 "AK" "A"  .2
1991 2 "AK" "B"  .2
end

I need to generate a quarterly subsample by year and state. As you can see all counties have the same value of variable "inf" for the same year and quarter, then I need to keep only 1 observation by year, quarter, and state. My final subsample would look like the following:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int year byte quarter str2 state float inf
1991 1 "AL"  .3
1991 1 "AK" .17
1991 2 "AL"  .8
1991 2 "AK"  .2
end

2) I have an annual panel dataset per state (let's call it Dataset "Velocity"):

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int year str2 state float vel
1991 "AL"   1
1991 "AK" 1.2
1991 "NE" 1.8
1992 "AL" 1.4
1992 "AK" 1.1
1992 "NE" 1.5
end

My working dataset is quarterly and by state and county (like the first example in question (1) of this post). I need to create the variable "vel" in my working dataset. The value of "vel" in the working dataset would be the same for all counties and quarters as the "vel" value in the dataset "Velocity" for that respective State and year. The additional problem here is that there are some counties, in this example county "C" (State XI), that the value of "vel" should be the average of the value of "vel" of some states in that respective year, for that reason, in this example, the value of "vel" 1991 (all quarters) of State "XI" is the average of the value of "vel" in 1991 for states "NE" and "AK". There are more of this case, for example, it could be a state "XA" as an average of states "AL and NE", etc... the final dataset that I need would look like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int year byte quarter str2 state str1 county byte othervariables float vel
1991 1 "AL" "A" .   1
1991 1 "AL" "B" .   1
1991 1 "AK" "A" . 1.2
1991 1 "AK" "B" . 1.2
1991 1 "NE" "A" . 1.8
1991 1 "NE" "B" . 1.8
1991 1 "XI" "C" . 1.3
1991 2 "AL" "A" .   1
1991 2 "AL" "B" .   1
1991 2 "AK" "A" . 1.2
1991 2 "AK" "B" . 1.2
1991 2 "NE" "A" . 1.8
1991 2 "NE" "B" . 1.8
1991 2 "XI" "C" . 1.3
1991 3 "AL" "A" .   1
1991 3 "AL" "B" .   1
1991 3 "AK" "A" . 1.2
1991 3 "AK" "B" . 1.2
1991 3 "NE" "A" . 1.8
1991 3 "NE" "B" . 1.8
1991 3 "XI" "C" . 1.3
1991 4 "AL" "A" .   1
1991 4 "AL" "B" .   1
1991 4 "AK" "A" . 1.2
1991 4 "AK" "B" . 1.2
1991 4 "NE" "A" . 1.8
1991 4 "NE" "B" . 1.8
1991 4 "XI" "C" . 1.3
1992 1 "AL" "A" . 1.4
1992 1 "AL" "B" . 1.4
1992 1 "AK" "A" . 1.1
1992 1 "AK" "B" . 1.1
1992 1 "NE" "A" . 1.5
1992 1 "NE" "B" . 1.5
1992 1 "XI" "C" . 1.3
1992 2 "AL" "A" . 1.4
1992 2 "AL" "B" . 1.4
1992 2 "AK" "A" . 1.1
1992 2 "AK" "B" . 1.1
1992 2 "NE" "A" . 1.5
1992 2 "NE" "B" . 1.5
1992 2 "XI" "C" . 1.3
1992 3 "AL" "A" . 1.4
1992 3 "AL" "B" . 1.4
1992 3 "AK" "A" . 1.1
1992 3 "AK" "B" . 1.1
1992 3 "NE" "A" . 1.5
1992 3 "NE" "B" . 1.5
1992 3 "XI" "C" . 1.3
1992 4 "AL" "A" . 1.4
1992 4 "AL" "B" . 1.4
1992 4 "AK" "A" . 1.1
1992 4 "AK" "B" . 1.1
1992 4 "NE" "A" . 1.5
1992 4 "NE" "B" . 1.5
1992 4 "XI" "C" . 1.3
end

Thank you very much in advance for your help.

Tags: None

Mike Lacy

Join Date: Apr 2014

Posts: 2416
#2

19 Jun 2024, 08:39

My rule of thumb is that when trying to do something seemingly simple and ordinary but for which no solution comes to mind, look at -help egen- to see if something is already available. In this case, egen's -tag- function offers one solution for your question 1):

Code:

egen just1 = tag(year quarter state) keep if just1

I'd note that this is not a "sample" of observations, which generally connotes a random selection.

As for your your question 2), I have to say I don't understand it. You say "...the value of "vel" should be the average of the value of "vel" of some states in that respective year ... ." What is the rule by which "some" states are the ones to choose for a particular county? Further, what is the rule by which some counties are selected to have this kind of special assignment of an average value? Finally, in your last sentence, you switch to talking just about states in a way that seems unconnected to counties. At least for me, you'd need to clarify these things to make it possible to give an answer.
1 like
Comment

Rafael Acevedo

Join Date: Apr 2019
Posts: 17

19 Jun 2024, 09:33

Mike Lacy thanks for your help.

1) I ran your code and it helps me. I only needed 1 observation per year (all quarters of the same year have the same value). But using your code and after that keeping any of the quarters gives me what I need.
2) Probably, trying to shorten a long history made it confusing. I'm not working with "counties" but MSAs; some MSAs are related to more than 1 State, that's the rule. In my example, I probably messed up putting the same letters to counties from different states, in my example county "A" of State "AL" is different than county "A" from state "AK" and go on. Let me try to correct here the error, (there is no a MSA sharing NE and AK, but we can think that MSA "C" is Omaha-Council Bluffs that shares NE and IA -that I also have it in my "Velocity" dataset-, I just wanted to keep the question as short as possible):

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int year byte quarter str2 state str1 msa float vel
1991 1 "AL" "A"   1
1991 1 "AL" "B"   1
1991 1 "AK" "D" 1.2
1991 1 "AK" "E" 1.2
1991 1 "NE" "F" 1.8
1991 1 "NE" "G" 1.8
1991 1 "XI" "C" 1.3
1991 2 "AL" "A"   1
1991 2 "AL" "B"   1
1991 2 "AK" "D" 1.2
1991 2 "AK" "E" 1.2
1991 2 "NE" "F" 1.8
1991 2 "NE" "G" 1.8
1991 2 "XI" "C" 1.3
1991 3 "AL" "A"   1
1991 3 "AL" "B"   1
1991 3 "AK" "D" 1.2
1991 3 "AK" "E" 1.2
1991 3 "NE" "F" 1.8
1991 3 "NE" "G" 1.8
1991 3 "XI" "C" 1.3
1991 4 "AL" "A"   1
1991 4 "AL" "B"   1
1991 4 "AK" "D" 1.2
1991 4 "AK" "E" 1.2
1991 4 "NE" "F" 1.8
1991 4 "NE" "G" 1.8
1991 4 "XI" "C" 1.3
1992 1 "AL" "A" 1.4
1992 1 "AL" "B" 1.4
1992 1 "AK" "D" 1.1
1992 1 "AK" "E" 1.1
1992 1 "NE" "F" 1.5
1992 1 "NE" "G" 1.5
1992 1 "XI" "C" 1.3
1992 2 "AL" "A" 1.4
1992 2 "AL" "B" 1.4
1992 2 "AK" "D" 1.1
1992 2 "AK" "E" 1.1
1992 2 "NE" "F" 1.5
1992 2 "NE" "G" 1.5
1992 2 "XI" "C" 1.3
1992 3 "AL" "A" 1.4
1992 3 "AL" "B" 1.4
1992 3 "AK" "D" 1.1
1992 3 "AK" "E" 1.1
1992 3 "NE" "F" 1.5
1992 3 "NE" "G" 1.5
1992 3 "XI" "C" 1.3
1992 4 "AL" "A" 1.4
1992 4 "AL" "B" 1.4
1992 4 "AK" "D" 1.1
1992 4 "AK" "E" 1.1
1992 4 "NE" "F" 1.5
1992 4 "NE" "G" 1.5
1992 4 "XI" "C" 1.3
end

Thanks.-

Last edited by Rafael Acevedo; 19 Jun 2024, 09:35.

Comment

George Ford

Join Date: Aug 2014

Posts: 3152
#4

19 Jun 2024, 11:30

why not just collapse the data?
Comment

Announcement

Two questions about handling Data

Comment

Comment

Comment