  • Creating lags with several obs. belonging to the same year

    Hey,

    I have a data set structured as follows

    Name Year Variable Variable_lagged
    Name1 2000 0 .
    Name1 2000 0 .
    Name1 2000 0 .
    Name1 2001 1 0
    Name1 2002 0 1
    Name1 2002 0 1
    Name1 2003 0 0
    Name2 1999 1 .
    Name2 2000 0 1

    The "Variable_lagged" column is what I want to achieve, but I have not managed it yet.
    The number of times a given Name-Year-Variable observation appears can be anything.
    I thought of combining [_n-1] with if but did not find a solution. I also thought of collapsing the data, but this causes problems with other variables.

    Thank you for your help in advance!
    Julian

  • #2
    What you are trying to do makes no sense, at least not as explained so far. You have three different observations with Name == Name1 and Year == 2000. As it happens, Variable has the same value in all of them; but in your larger data set you may not be so fortunate. If there were different values of Variable, there is no way to know which of those values should be considered "the lag" for 2001.

    So you should clarify what you are doing. Does Variable always take on the same value when there are multiple observations for the same Name and Year? If not, then what do you want to do to resolve the conflicts?


    • #3
      It is ensured that Variable always takes the same value when there are multiple observations for a given year. Only the number of observations "per name per year" varies.


      • #4
        I just found a solution.

        I generate a helper file with

        keep name year variable
        collapse variable, by(name year)
        by name: gen variable_lag = variable[_n-1]

        and then merge it m:1 with the "old" file.
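
        A minimal sketch of that helper-file workflow in one pass, assuming the lowercase names name, year and variable used above and that variable is constant within each name-year pair. The temporary file name lags is made up for illustration.

        ```stata
        * Sketch only: name, year, variable as above; "lags" is a hypothetical tempfile name
        preserve
        keep name year variable
        collapse (mean) variable, by(name year)    // one row per name-year pair
        by name (year), sort: gen variable_lag = variable[_n-1]
        tempfile lags
        save `lags'
        restore
        * attach the lag to every observation of the original data
        merge m:1 name year using `lags', keepusing(variable_lag) nogenerate
        ```

        Note that variable[_n-1] picks up the previous year present in the data for that name, which is the preceding calendar year only if there are no gaps.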

        Anyway, is there a smart way to do this within the original data set?
        Last edited by Julian Nuessle; 25 Jul 2017, 08:54.


        • #5
          OK. In that case, it is a good idea to first verify that the assumption actually holds: data sets often contain nasty surprises around issues like this!

          Code:
          // VERIFY ASSUMPTION
          by Name Year (Variable), sort: assert Variable[1] == Variable[_N]
          
          //  NOW GENERATE THE "LAG"
          by Name (Year): gen Lagged_Variable = Variable[_n-1] if Year[_n-1] == Year - 1
          Added: Crossed with #4. The above works only with the original data. Also, I believe the solution in #4 does something different: you will not get missing values for the lag when the year repeats; you will get repetitions of the non-missing value.
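
          If the lag should also appear on every repeated Name-Year observation, as in the Variable_lagged column of #1, one possible sketch is to copy the first value of each Name-Year group to the rest of the group. This is a variation on the code as posted, not part of it, and it assumes the assertion above has passed and that the variables are named Name, Year and Variable.

          ```stata
          * Sketch only: assumes Variable is constant within each Name-Year pair
          sort Name Year
          * only the first observation of each year sees the previous year in [_n-1]
          by Name (Year): gen Lagged_Variable = Variable[_n-1] if Year[_n-1] == Year - 1
          * spread that value to the other observations of the same Name-Year group
          by Name Year: replace Lagged_Variable = Lagged_Variable[1]
          ```

          Years with no observation for the preceding calendar year keep a missing lag, as in the original condition Year[_n-1] == Year - 1.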
