
  • Creating Lagged variables with repeated values

    Dear all,

    I would like help creating lagged variables when the time variable has repeated values in the dataset.

    In particular, I want to construct monthly lags of the dependent variable StateWins for the following regression:

    reg StateWins ramadan_month lagged_one_month_StateWins lagged_two_month_StateWins

    Here is a snapshot of my dataset:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(StateWins ramadan_month) int yeardecision str9 monthdecided str21 islamicdate
    1 0 1953 "July" "Shawwal, 1372"
    1 0 1953 "July" "Shawwal, 1372"
    1 0 1953 "July" "Shawwal, 1372"
    0 1 1953 "June" "Ramadan, 1372"
    1 1 1953 "June" "Ramadan, 1372"
    1 1 1953 "June" "Ramadan, 1372"
    1 0 1953 "March" "Djumadal-Akhira, 1372"
    0 0 1954 "March" "Djumadal-Akhira, 1372"
    end

    Both tsset month and tsset yeardecision give me an error saying there are repeated time values in the sample, so I cannot use L.StateWins:
    Code:
    . tsset yeardecision
    repeated time values in sample
    r(451);
    
    . tsset month
    repeated time values in sample
    r(451);
    How may I construct a lagged variable at monthly level for my dependent variable StateWins?

    Thank you very much. Your help would be greatly appreciated.

    Kind Regards,
    Roger

  • #2
    You can't. These data cannot support lagged variables; in this setting a lag doesn't even make sense.

    Think about it. Pick any observation in your data set: there are multiple candidates for the "lagged" observation, because your time values occur repeatedly. How would Stata, or you, or anyone, decide which of these is the appropriate observation to call the "lag"? It simply can't be done, unless it is based on other variables in the data that you have not mentioned.

    Even if you combine yeardecision and monthdecided into a single monthly time variable (which I think you would have to do as a minimum if the data were suitable) you still have duplicate values. This is simply not time-series data: either the data are corrupt and you need to fix them so they are time series data, or you need to revise your understanding of the data and plan an analysis that does not require lags.
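    A minimal Stata sketch of that check, assuming the -dataex- example from #1 is in memory and that monthdecided holds English month names such as "July". The variable name mdate is hypothetical. It combines yeardecision and monthdecided into one %tm monthly date with monthly(), then shows that the combined variable still repeats, so tsset still refuses it.

```stata
* Sketch: combine year and month name into one Stata monthly date (%tm).
* Assumes the -dataex- data from #1 are in memory; mdate is a hypothetical name.
generate mdate = monthly(monthdecided + " " + string(yeardecision), "MY")
format mdate %tm

* Every observation should now have a valid monthly date
assert !missing(mdate)

* Report how many observations share each month
duplicates report mdate

* tsset still fails (r(451)) because some months occur more than once
capture noisily tsset mdate
display "tsset return code: " _rc
```

    With the example data, 1953m6 and 1953m7 each occur three times, so tsset mdate stops with the same "repeated time values in sample" error as before.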
