
  • How to generate variable from future value of existing variable

    Greetings:

    Question: I want to generate a variable in my dataset that holds a future value of an existing variable.

    I'm unsure how to write the syntax for something like "gen var = electoraltier if yearofparliament is plus 4". (I'm trying to create a prospective electoral tier, i.e. how they ran in the next election, for each elected official. Obviously the new variable will have missing values at the end, but that's the next step to worry about.)

    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str28 mspname float(yearofparliament electoraltier)
    "Canavan, Dennis" 1999 4
    "Goldie, Annabel" 1999 3
    "Scanlon, Mary" 1999 3
    "Fergusson, Alex" 1999 3
    "Johnstone, Alex" 1999 3
    "Wallace, Ben" 1999 3
    "Monteith, Mr Brian" 1999 3
    "Davidson, Mr David" 1999 3
    "McLetchie, David" 1999 3
    end

    My dataset goes to 2021, so just a short example above, of course.

    Appreciate any suggestions.

  • #2
    I am guessing that MSP stands for Member of Scottish Parliament. I don't have the specifics related to the election cycle, but if it occurs strictly every 4 years, you need to specify that the years are in units of 4. After encoding the MSP string variable and declaring your dataset to be a panel, you can then use time-series operators. See

    Code:
    help encode
    help xtset
    help tsvarlist
    Here is a modified version of your data example, incorporating some assumptions about the election cycle. Special circumstances, such as a by-election, will not fit into this structure.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str28 mspname float(yearofparliament electoraltier)
    "Canavan, Dennis" 1999 4
    "Canavan, Dennis" 2003 5
    "Canavan, Dennis" 2007 6
    "Goldie, Annabel" 1999 3
    "Goldie, Annabel" 2003 2
    "Goldie, Annabel" 2007 1
    "Scanlon, Mary" 1999 3
    "Scanlon, Mary" 2003 4
    "Scanlon, Mary" 2007 3
    end
    
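    * encode the MSP name as a numeric panel identifier
    encode mspname, gen(id)
    order id yearofparliament
    * declare the panel; delta(4) makes one time step equal to 4 years
    xtset id yearofparliament, delta(4)
    * F. is the lead operator: the value one time step (4 years) ahead
    gen wanted = F.electoraltier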
    Result:

    Code:
    . l, sepby(id)
    
         +------------------------------------------------------------------+
         |              id   yearof~t           mspname   electo~r   wanted |
         |------------------------------------------------------------------|
      1. | Canavan, Dennis       1999   Canavan, Dennis          4        5 |
      2. | Canavan, Dennis       2003   Canavan, Dennis          5        6 |
      3. | Canavan, Dennis       2007   Canavan, Dennis          6        . |
         |------------------------------------------------------------------|
      4. | Goldie, Annabel       1999   Goldie, Annabel          3        2 |
      5. | Goldie, Annabel       2003   Goldie, Annabel          2        1 |
      6. | Goldie, Annabel       2007   Goldie, Annabel          1        . |
         |------------------------------------------------------------------|
      7. |   Scanlon, Mary       1999     Scanlon, Mary          3        4 |
      8. |   Scanlon, Mary       2003     Scanlon, Mary          4        3 |
      9. |   Scanlon, Mary       2007     Scanlon, Mary          3        . |
         +------------------------------------------------------------------+
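Before relying on delta(4), it may be worth checking whether the years really are spaced 4 apart within each MSP. The following is a minimal sketch under assumptions: the toy data and the variable name gap are invented for illustration, with MSP "B" given a 3-year gap such as a by-election might produce.

```stata
* Hypothetical toy data: MSP "B" has a 3-year gap between observations
clear
input str28 mspname float(yearofparliament electoraltier)
"A" 1999 4
"A" 2003 5
"A" 2007 6
"B" 2003 3
"B" 2006 2
end

* years elapsed since the same MSP's previous observation
* (missing for each MSP's first observation)
bysort mspname (yearofparliament): gen gap = yearofparliament - yearofparliament[_n-1]

* count and list observations that break the strict 4-year spacing
count if !missing(gap) & gap != 4
list mspname yearofparliament gap if !missing(gap) & gap != 4
```

Any observation listed here does not fit the strict 4-year structure, so F.electoraltier under delta(4) would look for a year that is not in the data and return missing.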



    • #3
      Thanks!

      I ran into another issue, specific to my data, that I'll have to figure out. For various reasons (by-elections, party switches, election years), I have some duplicate observations within a year, so when I attempt to declare the data as a panel I get an error:

      xtset MSP yearofparliament, delta (4)
      repeated time values within panel
      Using the Stata FAQ "How do I deal with a report of repeated time values within panel?", I see it's because I have data that can look like this:

      . duplicates list MSP yearofparliament

      Duplicates in terms of MSP yearofparliament

      +------------------------------------------------+
      | Group   Obs                     MSP   yearof~t |
      |------------------------------------------------|
      |     1     5             Adam, Brian       2003 |
      |     1     6             Adam, Brian       2003 |
      |     2    33          Adamson, Clare       2016 |
      |     2    34          Adamson, Clare       2016 |
      |     3   167    Ballantyne, Michelle       2020 |
      |------------------------------------------------|
      |     3   168    Ballantyne, Michelle       2020 |
      |     4   245           Boyack, Sarah       2011 |
      |     4   246           Boyack, Sarah       2011 |
      |     5   366      Byrne, Ms Rosemary       2006 |
      |     5   367      Byrne, Ms Rosemary       2006 |
      I realise Stata sees this as an "error" though it isn't one (each entry has different values on other variables). So I'll have to figure it out. But thank you for your advice; it gives me a good jumping-off point.



      • #4
        xtset with a panel identifier and a time variable requires that each (identifier, time) pair occurs at most once, and it is an error to specify both if the data do not satisfy that requirement. The underlying question is what you want to do that requires such a declaration beforehand. Typically the real problem arises when some model for panel data needs that kind of data.

        I can't see that delta(4) is fully compatible with dates like 2003 2006 2011 2016 2020 and indeed irregularities such as you describe are inevitable with your kind of data.

        It could well be that analysis requires you to fall back on specifying a panel identifier alone and/or to specify dates more precisely for the same year.

        Code:
        SJ-23-1 st0709  . Tip 150: When is it appropriate to xtset with panelvar only?
                . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C. Lazzaro
                Q1/23   SJ 23(1):281--292                               (no commands)
                discusses error r(451) in a panel dataset context and
                when using xtset with only a panelvar is appropriate
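As a minimal sketch of the fallback described above, the following declares only the panel identifier. The toy data are invented for illustration and contain one repeated (MSP, year) pair.

```stata
* Hypothetical toy data with a repeated (MSP, year) pair for MSP "A"
clear
input str28 mspname float(yearofparliament electoraltier)
"A" 2003 1
"A" 2003 1
"A" 2007 2
"B" 2011 3
end

encode mspname, gen(id)

* with a time variable, xtset refuses because of the repeated pair;
* capture keeps the do-file running and _rc holds the return code
capture noisily xtset id yearofparliament
display "xtset return code: " _rc

* with the panel identifier alone, the declaration goes through
xtset id
```

The trade-off is that time-series operators such as F. are not available without a time variable, so a lead would have to be built some other way.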



        • #5
          To add to Nick's excellent advice in #4, the 4-year restriction in #1 is not feasible in the presence of by-elections and election cycles that do not strictly occur every 4 years. Secondly, is the variable "electoraltier" constant within an MSP and parliamentary year? That is, if the candidate switches parties in a given electoral year, are they still in the same electoral tier? If so, you can create a lead electoral tier variable that does not necessarily follow the 4-year rule. It could simply be the next observed electoral tier, where the duration may vary depending on the MSP's observed electoral tiers.
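One way the "next observed electoral tier" idea could be sketched, without xtset, is shown below. It assumes that electoraltier is constant within each (MSP, year), which is exactly the open question above; the toy data and the names first and next_tier are invented for illustration.

```stata
* Hypothetical toy data: duplicates within (MSP, year) share the same tier
clear
input str28 mspname float(yearofparliament electoraltier)
"A" 1999 3
"A" 2003 1
"A" 2003 1
"A" 2007 2
"B" 2007 1
"B" 2011 3
"B" 2011 3
end

* stop here if the tier varies within an (MSP, year); the sketch assumes it does not
bysort mspname yearofparliament (electoraltier): assert electoraltier[1] == electoraltier[_N]

* tag exactly one observation per (MSP, year)
egen byte first = tag(mspname yearofparliament)

* among tagged observations, take the tier from the MSP's next observed year
bysort first mspname (yearofparliament): gen next_tier = electoraltier[_n+1] if first

* copy the result to the untagged duplicates (missing values sort last)
bysort mspname yearofparliament (next_tier): replace next_tier = next_tier[1]

list mspname yearofparliament electoraltier next_tier, sepby(mspname)
```

The lead here is to the next year in which the MSP appears, whatever the gap, so it tolerates irregular election cycles. If the tier can differ within a year, a rule for choosing among the duplicates would be needed first.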
