Generating a new row for specific year

Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#16

23 Feb 2024, 18:13

but it does not transfer the "city" and "state" information to the newly created entry for 2004

Yes, it does. I think what you are noticing relates to what I said about sticking with the transfer of information from the previous year. If you run the code in #13 with your original example data, you will see that year 2004 does acquire the city and state information in id 1004. It does not do that for id 1005 because in that id, 2004 is the first year and there is no previous year from which to transfer the information.
Stephen Ch

Join Date: Apr 2022

Posts: 67
#17

23 Feb 2024, 22:14

Hi Clyde,

This makes sense.

How would I fill in the information from the subsequent year instead of the preceding year?

Thanks!


Clyde Schechter

Join Date: Apr 2014
Posts: 29796

#18

23 Feb 2024, 22:38

It's just one additional line of code near the end:

Code:

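* Flag ids that already have a 2004 observation
by id (year), sort: egen has_2004 = max(year == 2004)
* Flag ids with at least one observation in 2000-2010
by id (year), sort: egen has_in_range_years = max(inrange(year, 2000, 2010))
* Mark the first observation of each id that needs a 2004 row for duplication
by id (year): gen expander = cond(has_in_range_years & !has_2004 & _n == 1, 2, 1)
expand expander
* The duplicated pair sorts last within id; turn one copy into the 2004 row
by id (expander year), sort: replace year = 2004 if _n == _N ///
    & has_in_range_years & !has_2004

* Verify that id and year uniquely identify observations, and sort on them
isid id year, sort
ds id year has_2004, not
foreach v of varlist `r(varlist)' {
    * Fill from the preceding year; if the value is then missing, fill from the following year
    by id (year): replace `v' = `v'[_n-1] if year == 2004 & !has_2004
    by id (year): replace `v' = `v'[_n+1] if year == 2004 & !has_2004 & missing(`v')
}
drop expander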


Stephen Ch

Join Date: Apr 2022
Posts: 67

#19

23 Feb 2024, 23:34

thanks! Will try soon!


Stephen Ch

Join Date: Apr 2022
Posts: 67

#20

24 Feb 2024, 14:35

Thanks, Clyde.

I am trying to understand your code rather than just running it blindly, so pardon my questions.

In your code:

Code:

by leaseid (fyear): gen expander = cond(has_in_range_fyears & !has_2003 & _n == 1, 2, 1)
expand expander

1) What does the last part of the condition, -_n == 1, 2, 1-, mean, and what does it do?
2) What does -expand expander- do?

Thanks.

Last edited by Stephen Ch; 24 Feb 2024, 14:39.


Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#21

24 Feb 2024, 15:43

Code:

by leaseid (fyear): gen expander = cond(has_in_range_fyears & !has_2003 & _n == 1, 2, 1)

This command creates a new variable, expander. To do that, it examines each observation. In a leaseid group of observations that includes data for years within the 2000-2010 range but does not have an observation for year 2003 (did you mistype that? The original code says 2004), expander is set to 2 in the first observation of that group (that's what the _n == 1 does). In all other observations of that leaseid group, it is set to 1. It is also set to 1 in all observations of a leaseid group that either has no observations between 2000 and 2010 or already has a year 2003 (2004?) observation.

If you are wondering why I singled out the first observation in the group for special treatment, the answer is that I wasn't really interested in the first. I just needed to somehow identify one observation in each of those groups that I would duplicate to serve as the template for the new 2004 observation. Choosing the first is the easiest way to do that. I could have chosen the last--that's equally easy. I could have chosen one at random, but that's more complicated and less efficient.

Code:

expand expander

The -expand- command creates extra copies of (some) observations. For each observation, the number of copies depends on the value of the variable expander. (The variable can be called anything; it doesn't have to be named expander.) In observations where expander == 1, no additional copies are made. If expander == N, where N > 1, then after -expand- is done, there will be a total of N copies of that observation in the data set. So, in our situation, in each group of observations for a leaseid, if it has some years between 2000 and 2010 and doesn't already have a 2003 (2004?) observation, the preceding command has already set expander to 2 in one observation (the first) of the group. That one will be duplicated. Everything else is left alone. So now we have a new observation for each of the groups that needs a new 2003 (2004?) observation. The rest of the code just changes the year in that extra observation to 2004 and then copies the other variables from the chronologically preceding observation, or from the following observation if there is no observation before 2003 (2004?).
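To see these two commands in action, here is a minimal sketch on made-up data. The ids, years, and city values are hypothetical and are not from the original example; only the variable names follow the code in #18. It goes from 5 observations to 6, because only the first observation of id 1 gets expander == 2.

```stata
* Toy data (hypothetical): id 1 lacks 2004, id 2 has it, id 3 is out of range
clear
input int id int year str10 city
1 2002 "Boston"
1 2005 "Boston"
2 2004 "Austin"
2 2006 "Austin"
3 1995 "Denver"
end

by id (year), sort: egen has_2004 = max(year == 2004)
by id (year), sort: egen has_in_range_years = max(inrange(year, 2000, 2010))
by id (year): gen expander = cond(has_in_range_years & !has_2004 & _n == 1, 2, 1)

* expander is 2 only in the 2002 observation of id 1
list id year city expander, sepby(id)

* Duplicate that one observation; all others are left alone
expand expander
list id year city expander, sepby(id)
```

Comparing the two listings shows that -expand- adds its copy at the end of the data set, which is why the original code re-sorts before changing the year to 2004.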

Stephen Ch

Join Date: Apr 2022
Posts: 67

#22

25 Feb 2024, 08:28

Hi Clyde,

Your earlier code included this line, which fills the new row with information from the preceding observation:

Code:

by id (year): replace `v' = `v'[_n-1] if year == 2004 & !has_2004

Is this still necessary (i.e., can I get rid of it) now that I use this line of code to fill from the subsequent observation?

Code:

by id (year): replace `v' = `v'[_n+1] if year == 2004 & !has_2004 & missing(`v')

Thanks.


Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#23

25 Feb 2024, 11:39

Well, I think you need both commands. If you eliminate the first and use only the second, you will find that if the new 2004 observation turns out to be the final observation for that particular leaseid, then the information in those variables will not get filled in. So unless you are absolutely certain that there will always be an observation later than year 2004, you need both commands.
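If the goal is to prefer the following year and use the preceding year only as a fallback, one possible variant is to swap the order of the two replace commands. This is a sketch under that assumption, not the code from #18, and the toy data (ids, years, and city values) are hypothetical. In id 1 the new 2004 observation is the last one, so the fill from the following year leaves it empty and the fallback takes over.

```stata
* Toy data (hypothetical): in id 1 the new 2004 row will be last, in id 2 first
clear
input int id int year str10 city
1 2001 "Boston"
1 2002 "Salem"
2 2006 "Austin"
2 2007 "Dallas"
end

by id (year), sort: egen has_2004 = max(year == 2004)
by id (year), sort: egen has_in_range_years = max(inrange(year, 2000, 2010))
by id (year): gen expander = cond(has_in_range_years & !has_2004 & _n == 1, 2, 1)
expand expander
by id (expander year), sort: replace year = 2004 if _n == _N ///
    & has_in_range_years & !has_2004
isid id year, sort

* Assumed variant: fill from the following year first ...
by id (year): replace city = city[_n+1] if year == 2004 & !has_2004
* ... and fall back to the preceding year where that left a missing value
by id (year): replace city = city[_n-1] if year == 2004 & !has_2004 & missing(city)
drop expander

list id year city, sepby(id)
```

Here the 2004 row of id 2 takes "Austin" from 2006, while the 2004 row of id 1 has no later year and ends up with "Salem" from 2002. Dropping the fallback line would leave city empty in that row.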
Stephen Ch

Join Date: Apr 2022

Posts: 67
#24

25 Feb 2024, 11:50

thanks, this makes sense!

Ken Chui

Join Date: Aug 2014

Posts: 1054
#25

25 Feb 2024, 21:41

Deleted this: after posting, I found that it had already been resolved on page 2.
Stephen Ch

Join Date: Apr 2022

Posts: 67
#26

26 Feb 2024, 06:57

thanks, Ken!

