  • Loop with 3 criteria with variables changing year and month

    Hi all

    I have a complicated problem (in my view at least) and would very much appreciate any help to make my coding less time consuming.

    For each month in the years 2014 to 2021 I need to make a variable counting the number of individuals who 1) are living in a specific geographic area, 2) are living without income in the given month and the two months preceding it, and 3) are aged 15 to 29 on every day of the given month.

    This is done using 3 variables:
    income'yyyy'_'mm': showing the income status for the individual in year 'yyyy' and month 'mm', where '.' means no income.
    area'yyyymm': showing the geographic area for the individual in year 'yyyy' and month 'mm'. The relevant area codes are listed below and do not change throughout the period.
    bday: showing the birthday of the individual.

    For October 2021 I can do this writing:

    Code:
    gen cntinc_2021_10=0
    replace cntinc_2021_10 =1 if ///
    income2021_08-income2021_10==. ///
    & inlist(area202110, 615, 657, 661, 665, 671, 706, 707, 710, 727, 730, 740, 741, 746, 751, 756, 760, 766, 779, 791) ///
    & inrange(bday, td(1/11/1992), td(1/11/2006))
    However, I need to do this for every month in all the years, which is very time consuming. I have the feeling there should be some loop solving my problem (using the command). I have read all the guides and articles about the foreach / forvalues commands without coming up with a solution of my own. Maybe there is no solution that takes all three criteria into account at once, but solving just one or two of them would save a lot of code, as I suppose I could add the third criterion manually later. Also, if you have suggestions for a more informative title for my problem, I would happily edit the title of the post.
    Last edited by Emil Alnor; 23 Feb 2022, 06:35.

  • #2
    As a detail, note that


    Code:
     
     income2021_08-income2021_10==.
    means that the difference between income2021_08 and income2021_10 is missing. It does not mean that the variables income2021_08 through income2021_10 are all missing. In an expression, - means subtraction, and the wildcard interpretation doesn't apply here.

    More broadly, I note that this looks difficult because it is. This problem and many others would be easier after reshape long.
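
    A minimal sketch of how the reshape long route might look, offered as an illustration rather than a tested solution. It assumes a unique person identifier, here called id (hypothetical; none is named in this thread), numeric income variables, and that the monthly variables exist for every month of 2014 to 2021. The birthday cutoffs are one reading of "aged 15 to 29 on every day of the month" and differ from the dates used in #1, so they should be checked against the intended definition. January and February 2014 are left out because the two preceding months are assumed not to be in the data.

    Code:
    * Make the income suffixes match the area suffixes (yyyymm)
    forvalues y = 2014/2021 {
        forvalues m = 1/12 {
            local mm : display %02.0f `m'
            rename income`y'_`mm' income`y'`mm'
        }
    }

    * One observation per person and month
    reshape long income area, i(id) j(yyyymm)
    gen mdate = ym(floor(yyyymm/100), mod(yyyymm, 100))
    format mdate %tm
    xtset id mdate

    * Birthday cutoffs: already 15 on the first day, still 29 on the last day
    gen latest_bday   = mdy(month(dofm(mdate)), 1, year(dofm(mdate)) - 15)
    gen earliest_bday = mdy(month(dofm(mdate + 1)), 1, year(dofm(mdate + 1)) - 30)

    * Flag person-months that meet all three criteria
    local areas 615, 657, 661, 665, 671, 706, 707, 710, 727, 730, 740, 741, 746, 751, 756, 760, 766, 779, 791
    gen byte flag = missing(income) & missing(L1.income) & missing(L2.income) ///
        & inlist(area, `areas') ///
        & inrange(bday, earliest_bday, latest_bday) ///
        & mdate >= ym(2014, 3)

    * Number of flagged individuals in each month
    egen cntinc = total(flag), by(mdate)

    Note that missing() is applied to each month separately and combined with &, because missing(a, b, c) with several arguments is true as soon as any one of them is missing.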



    • #3
      Originally posted by Nick Cox
      As a detail, note that


      Code:
      income2021_08-income2021_10==.
      means that the difference between income2021_08 and income2021_10 is missing. It does not mean that the variables income2021_08 through income2021_10 are all missing. In an expression, - means subtraction, and the wildcard interpretation doesn't apply here.
      Thanks for the note, Nick. This is indeed not what I am trying to do. I was convinced you could use - to refer to a range of variables. At least I am fairly sure this works for other commands (but please correct me if I am wrong). Would there be any other convenient way of referring to the variable list 'income2021_08 income2021_09 income2021_10'?

      Ok, I will try to approach the problem after reshaping.



      • #4
        Code:
        min(income2021_08, income2021_09, income2021_10)
        will be missing whenever all those variables are missing. For handling more variables, it might be easier to use one of the row*() functions provided under egen.
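
        As a hedged illustration of the egen route in the original wide layout: rowmiss() counts the missing values across a variable list, so a count of 3 means no income in all three months. The sketch below builds the three-month window inside a loop over monthly dates. The names nmiss and noinc_yyyy_mm are made up for illustration, and the loop starts in March 2014 on the assumption that there are no income variables for 2013.

        Code:
        * Loop over monthly dates from March 2014 to December 2021
        forvalues t = `=ym(2014, 3)'/`=ym(2021, 12)' {
            * Collect the income variables for this month and the two before it
            local vars
            forvalues k = 0/2 {
                local d = dofm(`t' - `k')
                local y = year(`d')
                local mm : display %02.0f month(`d')
                local vars `vars' income`y'_`mm'
            }
            * Year and month of the current date, for the new variable name
            local y = year(dofm(`t'))
            local mm : display %02.0f month(dofm(`t'))
            * Indicator: 1 if income is missing in all three months
            egen nmiss = rowmiss(`vars')
            gen byte noinc_`y'_`mm' = nmiss == 3
            drop nmiss
        }

        Because the row functions take a variable list rather than an expression, a range such as income2021_08-income2021_10 is read there as a list of variables, provided those variables are stored next to each other in the dataset. The area and birthday conditions could then be added to the gen line in the same way as in #1.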
