  • Issues with troubleshooting empty values

    Hi all,

    I am having trouble with the following problem. For each legislator, I need to compute the distance (in number of years) between any given year and the next election year (flagged by the dummy "election_year").

    The data are a panel containing legislators' identifiers (bioguide), the year, and the dummy flagging election years. They are sorted alphabetically by bioguide and, within each legislator, by year in descending order. The data look like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str7 bioguide float(year election_year)
    "a000001" 1951 0
    "a000002" 1972 1
    "a000002" 1971 0
    "a000002" 1970 1
    "a000002" 1969 0
    "a000002" 1968 1
    "a000002" 1967 0
    "a000002" 1966 1
    "a000002" 1965 0
    "a000002" 1964 1
    "a000002" 1963 0
    "a000002" 1962 1
    "a000002" 1961 0
    "a000002" 1960 1
    "a000002" 1959 0
    "a000002" 1958 1
    "a000002" 1957 0
    "a000002" 1956 1
    "a000002" 1955 0
    "a000002" 1954 1
    "a000002" 1953 0
    "a000002" 1952 1
    "a000002" 1951 0
    "a000002" 1950 1
    "a000002" 1949 0
    "a000009" 1986 1
    "a000009" 1985 0
    "a000009" 1984 0
    "a000009" 1983 0
    "a000009" 1982 0
    "a000009" 1981 0
    "a000009" 1980 1
    "a000009" 1979 0
    "a000009" 1978 1
    "a000009" 1977 0
    "a000009" 1976 1
    "a000009" 1975 0
    "a000009" 1974 1
    "a000009" 1973 0
    "a000011" 1964 1
    "a000011" 1963 0
    "a000014" 2009 1
    "a000014" 2008 1
    "a000014" 2007 0
    "a000014" 2006 1
    "a000014" 2005 0
    "a000014" 2004 1
    "a000014" 2003 0
    "a000014" 2002 1
    "a000014" 2001 0
    "a000014" 2000 1
    "a000014" 1999 0
    "a000014" 1998 1
    "a000014" 1997 0
    "a000014" 1996 1
    "a000014" 1995 0
    "a000014" 1994 1
    "a000014" 1993 0
    "a000014" 1992 1
    "a000014" 1991 0
    "a000016" 1972 1
    "a000016" 1971 0
    "a000016" 1970 1
    "a000016" 1969 0
    "a000016" 1968 1
    "a000016" 1967 0
    "a000016" 1966 1
    "a000016" 1965 0
    "a000016" 1964 1
    "a000016" 1963 0
    "a000016" 1962 1
    "a000016" 1961 0
    "a000016" 1960 1
    "a000016" 1959 0
    "a000016" 1958 1
    "a000016" 1957 0
    "a000016" 1956 1
    "a000016" 1955 0
    "a000016" 1954 1
    "a000016" 1953 0
    "a000016" 1951 0
    "a000016" 1950 1
    "a000016" 1949 0
    "a000016" 1948 1
    "a000016" 1947 0
    "a000017" 1978 1
    "a000017" 1977 0
    "a000017" 1976 0
    "a000017" 1975 0
    "a000017" 1974 0
    "a000017" 1973 0
    "a000017" 1972 1
    "a000017" 1971 0
    "a000018" 1976 1
    "a000018" 1975 0
    "a000018" 1974 1
    "a000018" 1973 0
    "a000018" 1972 1
    "a000018" 1971 0
    "a000022" 2012 1
    end
    Yet, when I type the following commands to get the desired outcome:
    Code:
    gen years_to_election=0
    by bioguide: replace years_to_election = years_to_election[_n-1] + 1 if election_year!=1
    I get some missing values in the new "years_to_election" variable. They are a tiny fraction, but I still don't see where they come from.
    Many thanks in advance to all for the precious support!

  • #2
    I did this, which may help.

    Code:
    gsort bioguide -year 
    by bioguide : gen next = year if election_year 
    by bioguide : replace next = next[_n-1] if missing(next)
    gen years_to_election = year - next 
    list , sepby(bioguide next)
    One problem with your code is that years_to_election[_n-1] refers to years_to_election[0] when _n - 1 == 0, that is, when _n == 1, and that is evaluated as missing.
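
    To see this concretely, here is a minimal sketch, assuming the example data from #1 are in memory. The variable names prev_year and starts_missing are made up for illustration. Because replace works through the observations in order, a missing value in the first observation of a legislator carries down until the next row with election_year == 1.

    ```stata
    gsort bioguide -year

    * Under by:, _n restarts at 1 within each legislator, so [_n-1] points at
    * observation 0, which does not exist and is returned as missing.
    by bioguide: gen prev_year = year[_n-1]
    list bioguide year prev_year if missing(prev_year)

    * Legislators whose most recent observed year is not an election year
    * are where the missing values in #1 start.
    by bioguide: gen byte starts_missing = (_n == 1 & election_year != 1)
    count if starts_missing
    ```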



    • #3
      That should be next - year, not year - next, in #2.
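
      For convenience, the code from #2 with that correction applied would read as follows. This is a sketch that assumes the example data from #1 are in memory; the condition is written as election_year == 1 so that a missing flag, if there were any, would not count as true.

      ```stata
      gsort bioguide -year

      * Year of the election for rows that are election years; missing otherwise.
      by bioguide: gen next = year if election_year == 1

      * Carry the election year down to the earlier years that lead up to it.
      by bioguide: replace next = next[_n-1] if missing(next)

      * 0 in an election year, 1 in the year before it, and so on.
      gen years_to_election = next - year
      list, sepby(bioguide next)
      ```

      Years with no later election year in the data for that legislator still end up with missing next and missing years_to_election.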



      • #4
        Many thanks, Nick Cox, that is certainly an improvement on my original code!

        However, after posting here on the forum, I realized that the problem lies in the underlying structure of the data. Missing values in fact appear in two cases: 1) when a legislator ends her tenure before the standard duration of the term (because of resignation, a switch to another government position, and so forth); 2) when I don't have the occurrence of an election year in the data, which results in all the previous years ending up as missing values as well (e.g. since my data end in 2020, all the legislators facing elections in 2022 will end up with missing values in all the years leading up to 2022).

        So in the end it is more a structural problem with the data than a purely code-related issue. Thanks anyway for your helpful support!
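
        A rough way to tell the two cases apart might look like the sketch below. It assumes the full data end in 2020, as described above, and that a legislator whose last observed year is earlier than 2020 left before the next election; that second assumption is only a proxy and would also pick up any other reason the panel stops early. All new variable names are illustrative.

        ```stata
        gsort bioguide -year

        * Running count of election years from the most recent year downwards:
        * while it is still 0, no election year has been observed at or after this year.
        by bioguide: gen byte no_later_election = (sum(election_year == 1) == 0)

        * Most recent year observed for each legislator (first row within the group).
        by bioguide: gen last_obs_year = year[1]

        * Case 2: the panel stops at the end of the sample, before the next election.
        gen byte right_censored = no_later_election & (last_obs_year == 2020)

        * Case 1 (proxy): the legislator drops out before the end of the sample.
        gen byte early_exit = no_later_election & (last_obs_year < 2020)

        tabulate right_censored early_exit
        ```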
