  • Generate lagged values for a pseudo panel

    Hi all,

    I have data that look like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1 id byte(time iv)
    "a" 1 10
    "a" 2 20
    "a" 3 30
    "a" 4 40
    "a" 5 50
    "b" 1 12
    "b" 2 13
    "b" 3 14
    "b" 4 15
    "b" 5 16
    end

    I want to generate lagged values of iv for each row, as shown below. The time variable here is not really a date, and the dataset is not a real panel. Does anyone know how to do this?


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1 id byte(time iv ivt_1 ivt_2)
    "a" 1 10  .  .
    "a" 2 20 10  .
    "a" 3 30 20 10
    "a" 4 40 30 20
    "a" 5 50 40 30
    "b" 1 12  .  .
    "b" 2 13 12  .
    "b" 3 14 13 12
    "b" 4 15 14 13
    "b" 5 16 15 14
    end
    Thanks

  • #2
    Code:
    encode id, gen(n_id)
    xtset n_id time
    gen ivt_1 = L1.iv
    gen ivt_2 = L2.iv
    It does not matter that the time variable is just an ordering sequence and not a real time. The only thing that matters is that it is an integer value, and that id and time uniquely identify observations in the data.
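
    One way to check that condition before running -xtset-, sketched with the variable names from the example data (-isid- and -duplicates report- are standard Stata commands; the -capture noisily- prefix is only there so that a do-file keeps running if the check fails):

    Code:
    * isid exits with an error if id and time do not jointly identify observations
    capture noisily isid id time
    * tabulate how many id-time combinations occur once, twice, and so on
    duplicates report id time
    If -isid- reports no error and -duplicates report- shows no surplus observations, -xtset n_id time- should accept the data.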

    Added: I forgot to mention that probably you shouldn't do this anyway. There is seldom any need to create variables containing lags: if your plan is to build a regression model that includes lagged values of iv, you can just insert them directly into the regression command without cluttering up the data set with new variables that contain no new information. Stata will create them "on the fly" for you:

    Code:
    regression_command dependent_variable iv L1.iv L2.iv perhaps_other_variables
    Last edited by Clyde Schechter; 02 May 2022, 18:48.



    • #3
      Hi Clyde,

      Thank you so much for your reply. There's one issue with my dataset: sometimes id and time don't uniquely identify observations, which is why I can't declare it as panel data. In the current sample I think they do, but when I tried the second command I got the error r(111), "time variable not set". Do you have any idea how to fix this error? And do I have to drop some observations so that id and time uniquely identify observations?

      Thanks and look forward to your reply.

      Best wishes
      Meng



      • #4
        Sorry, I misunderstood what you meant. This code works for the sample dataset:

        Code:
        encode id, gen(n_id)
        xtset n_id time
        reg dv iv L1.iv
        But for the large dataset, it gives the error "repeated time values within panel".



        • #5
          In that case, what you are asking to do is ill-posed. Here's why. Suppose we have two observations that are both id = "a" and time = 1, like this:
          Code:
              id   time   iv  
               a      1   10  
               a      1   20  
               a      2   30  
               a      3   40
          What would the value of ivt_1 in the third observation be? Should it be 10, from the first observation, or 20 from the second one? There is no way to resolve this. There simply is no such thing as the lagged value of a variable when there is not a unique observation that immediately precedes it in time. The concept makes no sense in that context. You either need new data, or a new plan.
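
          A minimal sketch of how the offending observations could be located, using the four-row example above. The variable name dup is hypothetical. The -collapse- step at the end is only one possible "new plan", and it assumes that averaging iv within each id-time pair is substantively meaningful for the analysis, which may well not be true:

          Code:
          clear
          input str1 id byte(time iv)
          "a" 1 10
          "a" 1 20
          "a" 2 30
          "a" 3 40
          end
          * count and flag id-time combinations that occur more than once
          duplicates report id time
          duplicates tag id time, generate(dup)
          list if dup > 0
          * ASSUMPTION: replacing duplicates by their mean is acceptable here
          collapse (mean) iv, by(id time)
          encode id, gen(n_id)
          xtset n_id time
          gen ivt_1 = L1.iv
          Under that assumption the lag at time 2 is the mean of the two time-1 values, 15. Whether that number answers the research question is a substantive decision, not a coding one.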



          • #6
            Hi Clyde,

            Thank you so much for your patient explanation. The issue is very clear to me now. I'll think about it and see if there's any better data that I can use for the analysis.
