
  • How to create a variable with consecutive numbers that follows a certain rule for when to start counting consecutively?

    Dear experts,

    I am working with a panel dataset (from 1946 to 2015) that contains among other variables a dummy equal to 1 for each year a country is at war.
    I am trying to create a variable that records peace. In essence, the peace variable is supposed to start with 1 when a war ends (thus the war dummy would be 0), and continue to have consecutive values (2, 3, 4 etc.) until a new war starts (war = 1), and then peace gets values of 0 until that new war ends, and so on. Thus, it measures the years of peace from the time a war stops, and up until a new war starts.

    I have created the peace variable itself, but I have some issues with it. Below is the code I used to create the peace variable.

    Code:
    //Generating Peace Variable
    gen peace = 1
    replace peace = 0 if owar==1
    by country, sort: replace peace = peace + peace[_n-1] if lowar==0
    replace peace = 0 if peace!=0 & owar==1
    where:
    peace = the peace variable
    owar = the war dummy
    lowar = the lag of the war dummy

    The problem with this peace variable is that if a country, say Afghanistan, begins in the dataset (at 1946) without a war, I want it to have values of 0 for the peace variable until the first war of that country starts, and then when the war ends I want the peace variable to continue normally. However the variable I created counts the first years of any country (1946 and up) as "post-war" peace years and assigns consecutive values starting from 1 until a war starts.
    An example to make it a bit clearer:
    country      year  owar  my "peace" variable  new "peace" variable I want
    Afghanistan  1946  0     1                    0
    Afghanistan  1947  0     2                    0
    Afghanistan  1948  1     0                    0
    Afghanistan  1949  0     1                    1
    Angola       1946  0     1                    0
    Angola       1947  1     0                    0
    Angola       1948  0     1                    1
    Angola       1949  0     2                    2
    Unfortunately the first wars of all the countries start at different times, so I can't specify a certain time that would encompass all first wars of each country.
    Essentially, I want the countries and years before the first wars start to have 0 values for peace, and then once the first wars start the peace variable continues normally (consecutively).
    However, I can't figure out how to make that distinction in Stata.
    In the worst case scenario, I would have to fix them manually.
    I hope I've made myself clear enough, and if not, I apologize.

    Thank you in advance!

  • #2
    Your example data doesn't include any examples where the country does start out at war in the data set, so this code has not been tested for that case, but I believe it will still work correctly. It creates, just from country, year, and owar, the variable peace_var, which is what you are asking for.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str11 country int year byte(owar mypeacevariable newpeacevariableiwant)
    "Afghanistan" 1946 0 1 0
    "Afghanistan" 1947 0 2 0
    "Afghanistan" 1948 1 0 0
    "Afghanistan" 1949 0 1 1
    "Angola"      1946 0 1 0
    "Angola"      1947 1 0 0
    "Angola"      1948 0 1 1
    "Angola"      1949 0 2 2
    end
    
    by country (year), sort: gen int war_count = sum(owar == 1  & owar[_n-1] != 1)
    by country (year), sort: gen spell = sum(owar != owar[_n-1])
    by country spell (year), sort: gen peace_var = ///
        cond(owar == 1 | war_count == 0, 0, _n)
    assert peace_var == newpeacevariableiwant
    In the future, when showing data examples, please use the -dataex- command to do so, as I have done in this response. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
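
    The case of a country that is already at war in its first year is not covered by the example data above. A minimal sketch of a check for that case follows. The country names, years, and the `expected` column are made up for illustration and are not from the original data set; only the three generating lines are taken from the code in this post.

    ```stata
    clear
    * Hypothetical test data: "Atlantis" starts out at war, "Utopia" never has a war
    input str11 country int year byte(owar expected)
    "Atlantis" 1946 1 0
    "Atlantis" 1947 1 0
    "Atlantis" 1948 0 1
    "Atlantis" 1949 0 2
    "Atlantis" 1950 1 0
    "Atlantis" 1951 0 1
    "Utopia"   1946 0 0
    "Utopia"   1947 0 0
    end

    * Running count of war onsets within each country (0 until the first war)
    by country (year), sort: gen int war_count = sum(owar == 1 & owar[_n-1] != 1)
    * A new spell starts every time owar changes value
    by country (year), sort: gen spell = sum(owar != owar[_n-1])
    * Within a peace spell that follows a war, _n counts the years of peace
    by country spell (year), sort: gen peace_var = ///
        cond(owar == 1 | war_count == 0, 0, _n)

    assert peace_var == expected
    ```

    In the first observation of each country owar[_n-1] is missing, so the comparison owar[_n-1] != 1 is true and a war that is already under way in the first year is counted as the first onset.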

    "In the worst case scenario, I would have to fix them manually."
    Manually changing the data should essentially never be done. In the very rare case where there is no alternative, make sure you have a log running to document the changes you make so that you have a complete audit trail of everything you do. Frankly, in a data set where each observation is uniquely identified by country and year, I cannot think of any circumstance where it would be warranted to manually change the data. You can always do the changes with a series of -replace this = that if country == country_needing_change & year == year_needing_change- commands.
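
    As a rough sketch of what such scripted corrections with an audit trail could look like: the log file name, the log name `corrections`, and the specific observations and values being changed are placeholders chosen for illustration, not actual fixes needed in this data set.

    ```stata
    clear
    * Small stand-in data set so the sketch runs on its own
    input str11 country int year byte(owar peace)
    "Afghanistan" 1946 0 1
    "Afghanistan" 1947 0 2
    "Afghanistan" 1948 1 0
    "Afghanistan" 1949 0 1
    end

    * Open a named text log so every correction is recorded
    log using peace_corrections, text replace name(corrections)

    * Hypothetical corrections, each targeted by country and year
    replace peace = 0 if country == "Afghanistan" & year == 1946
    replace peace = 0 if country == "Afghanistan" & year == 1947

    log close corrections

    assert peace == 0 if year < 1949
    ```

    Keeping the corrections in a do-file means they can be rerun on a fresh copy of the raw data, and the log shows how many observations each -replace- actually changed.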

    Added: Are you sure you want it to be 0 in the years before the first war begins? It sounds to me like it might be more appropriate to have it be missing value in those years. It depends on how you plan to analyze it and interpret the analysis, but if your analysis and interpretation treat this as truly representing 0 years of peace, I think your results are likely to be distorted. If you want it to be missing value, the code would be
    Code:
    by country (year), sort: gen int war_count = sum(owar == 1  & owar[_n-1] != 1)
    by country (year), sort: gen spell = sum(owar != owar[_n-1])
    by country spell (year), sort: gen peace_var = ///
        cond(owar == 1, 0, _n) if war_count > 0
    Last edited by Clyde Schechter; 14 Jul 2018, 18:47.



    • #3
      See also https://www.stata-journal.com/articl...article=dm0029 and tsspell (SSC). Your desire to treat the first spell of peace differently would imply an extra ad hoc fix.

      With xtset data, code (not tested) might be

      Code:
      tsspell, cond(owar==0) 
      
      bysort country (year) : replace _seq = 0 if _spell == 1 & owar[1] == 0 
      by country (year) : replace _end = 1 if _spell == 1 & owar[1] == 0 
      by country (year) : replace _spell = _spell - 1 if owar[1] == 0 & _spell > 0 // _spell > 0 leaves war years (where _spell is 0) unchanged



      • #4
        The code below, which might be a little more direct, also does what Gentian wants as explained in #1. However, I do support Clyde’s suggestion that the value of peace before the first war should be missing instead of 0. That choice would keep the logic consistent for further analysis.

        Code:
        gen peace=.
        bys country (year): replace peace = cond(owar==1,0,peace[_n-1]+1)
        replace peace=0 if peace==. // disable this line to keep peace as missing before first war
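
        A short sketch of why the running -replace- above leaves the years before the first war missing. The five observations are made-up example data, and the final -replace- is left out on purpose so the missing values stay visible.

        ```stata
        clear
        * Hypothetical example: two peace years, one war year, two peace years
        input str11 country int year byte owar
        "Afghanistan" 1946 0
        "Afghanistan" 1947 0
        "Afghanistan" 1948 1
        "Afghanistan" 1949 0
        "Afghanistan" 1950 0
        end

        gen peace = .
        * In the first observation of each country peace[_n-1] is missing, and
        * missing + 1 is still missing, so the counter cannot start before a war.
        * A war year resets the counter to 0; later peace years build on that 0,
        * because -replace- works through the observations in order.
        bysort country (year): replace peace = cond(owar == 1, 0, peace[_n-1] + 1)

        assert missing(peace) if year < 1948
        assert peace == 0 if year == 1948
        assert peace == year - 1948 if year >= 1948
        ```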



        • #5
          Thank you both for your help. The code worked exactly as I wanted it to. I tried the code from Mr. Schechter first and it worked even for the countries that started out in the dataset with an ongoing war.
          The alternative spell fix by Mr. Cox also worked. Even though I am working with spell data it seems, I was blind to the concept and its conceptualization in Stata until now. I will make sure to read up on spells, and especially the paper you provided Mr. Cox.
          Also, I will make sure to provide code via dataex next time.


          Added: Are you sure you want it to be 0 in the years before the first war begins? It sounds to me like it might be more appropriate to have it be missing value in those years. It depends on how you plan to analyze it and interpret the analysis, but if your analysis and interpretation treat this as truly representing 0 years of peace, I think your results are likely to be distorted. If you want it to be missing value, the code would be
          Code:
          by country (year), sort: gen int war_count = sum(owar == 1  & owar[_n-1] != 1)
          by country (year), sort: gen spell = sum(owar != owar[_n-1])
          by country spell (year), sort: gen peace_var = ///
              cond(owar == 1, 0, _n) if war_count > 0
          If the first war isn't from 1946 and onward, I plan to simply replace the first 3, or 6 values of peace with missing data, provided a war does not start during those first 3 or 6 years (according to what some of the papers I have read have suggested). However, I did not mention this since I am able to carry out this modification. Nonetheless, thank you for the additional code. I will use it if need be.

          Sincerely,

          Gentian Gashi



          • #6
            Originally posted by Romalpa Akzo
            The code below, which might be a little more direct, also does what Gentian wants as explained in #1. However, I do support Clyde’s suggestion that the value of peace before the first war should be missing instead of 0. That choice would keep the logic consistent for further analysis.

            Code:
            gen peace=.
            bys country (year): replace peace = cond(owar==1,0,peace[_n-1]+1)
            replace peace=0 if peace==. // disable this line to keep peace as missing before first war
            Dear Ms. Akzo,

            Thank you for your input. Your code serves my purpose as well, and is even shorter.
            I will make sure to understand what exactly is being done with this kind of code so that I can use it in the future for other cases too.

            Best,

            Gentian Gashi

