  • Counting how many values belong to a predefined list

    Hi there,

    I have a dataset that looks like this:
    Current Year   Birth Year   Person
    2000           1980         John
    2001           1980         John
    2005           1963         Emily
    2006           1963         Emily
    I would like to create a new variable that tells me how many of the years from Birth Year through Current Year (inclusive) appear in a predefined list, let's say (1974, 1977, 1995, 1996, 1999, 2006). The new variable in this sample data should show the following values:
    Current Year   Birth Year   Person   New Variable
    2000           1980         John     3
    2001           1980         John     3
    2005           1963         Emily    5
    2006           1963         Emily    6
    Can you please help me write code that accomplishes this?

  • #2
    This is unclear. Where did the 3 come from for John 1?

    • #3
      If your list of years to match is fairly short and your data set is not very large, you can do it quite simply like this:
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int(currentyear birthyear) str5 person
      2000 1980 "John"
      2001 1980 "John"
      2005 1963 "Emily"
      2006 1963 "Emily"
      end
      
      local target_years 1974 1977 1995 1996 1999 2006
      
      gen wanted = 0
      foreach t of local target_years {
          replace wanted = wanted + 1 if inrange(`t', birthyear, currentyear)
      }
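      As a quick check on the logic, the counts can be compared with the values requested in #1. For John in 2000, the target years falling in 1980 through 2000 are 1995, 1996, and 1999, which gives 3; for Emily in 2006, all target years except none fall in 1963 through 2006, which gives 6. The sketch below assumes the loop above has just been run; the variable name expected is introduced here only for illustration.

      ```stata
      * Run after the loop above. Expected counts are copied from post #1.
      gen expected = .
      replace expected = 3 if person == "John"
      replace expected = 5 if person == "Emily" & currentyear == 2005
      replace expected = 6 if person == "Emily" & currentyear == 2006

      * -assert- stops with an error if any observation disagrees
      assert wanted == expected
      list currentyear birthyear person wanted, noobs sepby(person)
      drop expected
      ```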
      But if you have a very long list of years to match, or if your data set is very large, it would be more efficient to make a data set out of the list of years and do it like this:
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float target
      1974
      1977
      1995
      1996
      1999
      2006
      end
      tempfile targets
      save `targets'
      
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int(currentyear birthyear) str5 person
      2000 1980 "John"
      2001 1980 "John"
      2005 1963 "Emily"
      2006 1963 "Emily"
      end
      tempfile original
      save `original'
      
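      * Confirm that these three variables uniquely identify observations
      isid person currentyear birthyear
      
      * Pair each observation with the target years in [birthyear, currentyear]
      rangejoin target birthyear currentyear using `targets'
      * Count the non-missing target years paired with each observation
      collapse (count) wanted = target, by(currentyear birthyear person)
      * Merge the counts back onto the original observations
      merge 1:1 currentyear birthyear person using `original', ///
          assert(using match) nogenerate
      * An observation with no count has no target year in its range
      replace wanted = 0 if missing(wanted)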
      In this method, it is assumed, and verified, that the combination of person, birthyear, and currentyear uniquely identifies observations in the data. If that is not true, the code stops with an error message at the -isid- command and produces no results.
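
      If -isid- does stop with an error in your real data, something like the following may help locate the offending observations. This is only a sketch using the official -duplicates- command; the variable name dup is made up for the example.

      ```stata
      * Summarize how many key combinations occur more than once
      duplicates report person currentyear birthyear

      * Flag the surplus copies and look at them (dup is a hypothetical name)
      duplicates tag person currentyear birthyear, generate(dup)
      list person currentyear birthyear if dup > 0
      drop dup
      ```

      Whether such duplicates are data errors or legitimate repeated records is something only you can judge from your data.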

      -rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
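
      For reference, installing both from SSC would look like this, assuming your Stata has internet access:

      ```stata
      * -rangejoin- depends on -rangestat-, so install both
      ssc install rangestat
      ssc install rangejoin

      * Read the documentation before use
      help rangejoin
      ```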

      Above, I placed both the original data and the list of target matching years into tempfiles. I do that so I don't clutter up my hard drive with lots of little example files like these. But you can just keep them in regular permanent files and refer to them accordingly.
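
      With permanent files, the same steps might look like the sketch below. The file names people.dta and target_years.dta are hypothetical; substitute your own file names and paths.

      ```stata
      * people.dta and target_years.dta are hypothetical file names,
      * each saved earlier with -save-, e.g.  save target_years, replace
      use people, clear
      isid person currentyear birthyear

      rangejoin target birthyear currentyear using target_years
      collapse (count) wanted = target, by(currentyear birthyear person)
      merge 1:1 currentyear birthyear person using people, ///
          assert(using match) nogenerate
      replace wanted = 0 if missing(wanted)
      ```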

      In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
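
      As an illustration only, for a data set like the one in #1 the command might be run as follows; the variable names are the ones I assumed for the example above, so use your own.

      ```stata
      * Show the first four observations of the relevant variables,
      * then copy the output into your post
      dataex currentyear birthyear person in 1/4
      ```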
      Last edited by Clyde Schechter; 27 Feb 2024, 18:25.
