  • #16
    but it does not transfer the "city" and "state" information to the newly created entry for 2004
    Yes, it does. I think what you are noticing relates to what I said about sticking with the transfer of information from the previous year. If you run the code in #13 with your original example data, you will see that year 2004 does acquire the city and state information in id 1004. It does not do that for id 1005 because in that id, 2004 is the first year and there is no previous year from which to transfer the information.



    • #17
      Hi Clyde,

      this makes sense.

      How would I change this so that it fills in the information from the subsequent year rather than from the preceding year?

      thanks!



      • #18
        It's just one additional line of code near the end:
        Code:
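        * Flag ids that already have a 2004 row, and ids with any year in 2000-2010
        by id (year), sort: egen has_2004 = max(year == 2004)
        by id (year), sort: egen has_in_range_years = max(inrange(year, 2000, 2010))
        * Mark the first row of each id that needs a 2004 row, then duplicate it
        by id (year): gen expander = cond(has_in_range_years & !has_2004 & _n == 1, 2, 1)
        expand expander
        * Turn one of the two copies into the 2004 row
        by id (expander year), sort: replace year = 2004 if _n == _N ///
            & has_in_range_years & !has_2004
        
        isid id year, sort
        ds id year has_2004, not
        * Fill from the preceding year; fall back to the following year if still missing
        foreach v of varlist `r(varlist)' {
            by id (year): replace `v' = `v'[_n-1] if year == 2004 & !has_2004
            by id (year): replace `v' = `v'[_n+1] if year == 2004 & !has_2004 & missing(`v')
        }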
        drop expander
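
        To watch the whole sequence work, here is a minimal sketch on made-up data. The ids, cities, and states below are hypothetical, not the original example data. Id 1004 has a year before 2004, so its new row can be filled from 2003; id 1005 starts in 2005, so its new row can only be filled from the following year.

        ```stata
        clear
        * Hypothetical toy data: neither id has a 2004 row
        input int id int year str10 city str2 state
        1004 2003 "Austin" "TX"
        1004 2005 "Austin" "TX"
        1005 2005 "Boston" "MA"
        1005 2006 "Boston" "MA"
        end

        by id (year), sort: egen has_2004 = max(year == 2004)
        by id (year), sort: egen has_in_range_years = max(inrange(year, 2000, 2010))
        by id (year): gen expander = cond(has_in_range_years & !has_2004 & _n == 1, 2, 1)
        expand expander
        by id (expander year), sort: replace year = 2004 if _n == _N ///
            & has_in_range_years & !has_2004

        isid id year, sort
        ds id year has_2004, not
        foreach v of varlist `r(varlist)' {
            * Preceding year first (missing when 2004 is the first row of the id)
            by id (year): replace `v' = `v'[_n-1] if year == 2004 & !has_2004
            * Following year only where the first step left a missing value
            by id (year): replace `v' = `v'[_n+1] if year == 2004 & !has_2004 & missing(`v')
        }
        drop expander

        list id year city state, sepby(id)
        ```

        After this runs, each id should have three rows, and the 2004 row should carry the same city and state as its neighbors.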



        • #19
          thanks! Will try soon!




          • #20
            Thanks, Clyde.

            I am trying to understand your code rather than just blindly executing it, so please pardon my questions.

            In your code:

            Code:
            by leaseid (fyear): gen expander = cond(has_in_range_fyears &!has_2003 & _n == 1, 2, 1)
            expand expander
            Two questions:
            1) What does the last part of the condition, -_n == 1, 2, 1-, mean, and what does it do?
            2) What does -expand expander- do?

            thanks.

            Last edited by Stephen Ch; 24 Feb 2024, 14:39.



            • #21
              Code:
              by leaseid (fyear): gen expander = cond(has_in_range_fyears &!has_2003 & _n == 1, 2, 1)
              This command creates a new variable, expander. To do that it examines each observation. In a leaseid group of observations that includes data for years within the 2000-2010 range but does not have an observation for year 2003 (did you mistype that? The original code says 2004), expander will be set to 2 in the first observation of that group (that's what the _n == 1 does). In all other observations of that leaseid group, it is set to 1. It is also set to 1 in all observations of a leaseid group that either has no observations between 2000 and 2010 or already has a year 2003 (2004?) observation.

              If you are wondering why I singled out the first observation in the group for special treatment, the answer is that I wasn't really interested in the first. I just needed to somehow identify one observation in each of those groups that I would duplicate to serve as the template for the new 2004 observation. Choosing the first is the easiest way to do that. I could have chosen the last--that's equally easy. I could have chosen one at random, but that's more complicated and less efficient.
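
              A small sketch of that flagging step may help. The data are hypothetical; the variable names and the 2003 target year follow the code as typed in #20. Lease 1 is in range and lacks 2003, lease 2 already has 2003, and lease 3 has no years in 2000-2010.

              ```stata
              clear
              * Hypothetical toy data
              input int leaseid int fyear
              1 2001
              1 2002
              2 2003
              2 2005
              3 1995
              3 1996
              end

              by leaseid (fyear), sort: egen has_2003 = max(fyear == 2003)
              by leaseid (fyear), sort: egen has_in_range_fyears = max(inrange(fyear, 2000, 2010))
              * cond(test, a, b) returns a when test is true and b otherwise;
              * _n == 1 is true only in the first observation of each by-group
              by leaseid (fyear): gen expander = cond(has_in_range_fyears & !has_2003 & _n == 1, 2, 1)

              list, sepby(leaseid)
              ```

              Only the 2001 observation of lease 1 should end up with expander == 2; every other observation gets 1.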

              Code:
              expand expander
              The -expand- command creates extra copies of (some) observations. For each observation, the number of copies depends on the value of the variable expander. (It can be called anything; it doesn't have to be named expander.) In observations where expander == 1, no additional copies are made. If expander == N, where N > 1, then after -expand- is done, there will be a total of N copies of that observation in the data set. So, in our situation, in each group of observations for a leaseid, if it has some years between 2000 and 2010 and doesn't already have a 2003 (2004?) observation, the preceding command has already set expander to 2 in one observation (the first) of the group. That one will be duplicated. Everything else is left alone. So now we have a new observation for each of the groups that needs a new 2003 (2004?) observation. The rest of the code just changes the year in that extra observation to 2004 and then copies the other variables from the chronologically preceding observation, or from the following observation if there is no observation before 2003 (2004?).
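
              As a minimal sketch of -expand- on its own, assume a hypothetical data set in which expander has already been set by hand:

              ```stata
              clear
              * Hypothetical toy data: only the first row asks for a second copy
              input int leaseid int fyear byte expander
              1 2001 2
              1 2002 1
              2 2003 1
              end

              * Rows with expander == 1 are left alone; the row with expander == 2 is duplicated
              expand expander
              sort leaseid fyear
              list, sepby(leaseid)
              ```

              The data set should grow from three observations to four, with two identical rows for lease 1 in 2001.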



              • #22
                Hi Clyde,

                Your earlier line of code filled the new row with information from the preceding observation:

                Code:
                by id (year): replace `v' = `v'[_n-1] if year == 2004 & !has_2004
                Is this still necessary (i.e., can I get rid of it) now that I use this line of code to fill from the subsequent observation?

                Code:
                 by id (year): replace `v' = `v'[_n+1] if year == 2004 & !has_2004 & missing(`v')
                Thanks.



                • #23
                  Well, I think you need both commands. If you eliminate the first and use only the second, you will find that if the new 2004 observation turns out to be the final observation for that particular leaseid, then the information in those variables will not get filled in. So unless you are absolutely certain that there will always be an observation later than year 2004, you need both commands.
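
                  A minimal sketch of the subscripting behavior behind this, under two loudly stated assumptions: the data are hypothetical, and the 2004 row is entered here with an empty city so that the two fill lines can be compared in isolation. (In the full code the new row starts out as a copy of the id's first observation, so this only illustrates what [_n+1] returns at the end of a group.)

                  ```stata
                  clear
                  * Hypothetical toy data: 2004 is the last year for this id and starts out empty
                  input int id int year str10 city
                  1 2002 "Austin"
                  1 2003 "Austin"
                  1 2004 ""
                  end

                  * Forward fill only: [_n+1] points past the end of the by-group, so it is missing
                  gen city_fwd_only = city
                  by id (year), sort: replace city_fwd_only = city_fwd_only[_n+1] if year == 2004 & missing(city_fwd_only)

                  * Both commands: the preceding year supplies the value when there is no later year
                  gen city_both = city
                  by id (year): replace city_both = city_both[_n-1] if year == 2004
                  by id (year): replace city_both = city_both[_n+1] if year == 2004 & missing(city_both)

                  list, sepby(id)
                  ```

                  In the 2004 row, city_fwd_only should still be empty while city_both should hold "Austin".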



                  • #24
                    thanks, this makes sense!



                    • #25
                      Deleted this: after posting, I found that it had already been resolved on page 2.



                      • #26
                        thanks, Ken!
