
  • Categorizing ages from a varlist

    Hello!

    Each row of my dataset represents a family, and each family member's age is stored in the variables age1-age14. I'd like to summarize how many people in the whole dataset are 5 or younger, how many are 6-10, 11-15, and so on, and compile this in one new variable called agegrp. From past posts I worked out the following command, which generates a variable counting the family members aged 5 or younger, but generating a new variable for each age group seems an inefficient way to go about this.

    egen agegrp1 = anycount(age*), values(0/5)

    I feel like this is likely a job for a loop of some sort but I can't seem to figure it out. Any guidance would be greatly appreciated!
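A minimal sketch of the loop idea, assuming integer ages in the wide variables age1-age5 (the five-column layout shown later in this thread) and bands 0-5, 6-10, ..., 26 and over. The count-variable names (n0_5 and so on) and the ceiling of 200 for the top band are illustrative choices, not anything from the original post. Each pass runs the same egen anycount() call with a range numlist such as 0/5, which counts, per family, the age variables holding any integer from 0 to 5, and then totals that count over families.

```stata
* Example data in the wide layout: one row per family
clear
input _id age1 age2 age3 age4 age5
1 50 23 10 5 2
2 34 100 22 18 15
3 67 18 59 21 11
4 45 24 23 14 16
5 78 57 28 24 3
end

* Upper limits of the bands; 200 is an arbitrary ceiling for the top band
local HI 5 10 15 20 25 200
foreach lo in 0 6 11 16 21 26 {
    * take the next upper limit off the front of local HI
    gettoken hi HI : HI
    * per family: number of age variables holding an integer in this band
    egen n`lo'_`hi' = anycount(age1-age5), values(`lo'/`hi')
    * total over all families
    summarize n`lo'_`hi', meanonly
    display "ages `lo'-`hi': " r(sum)
}
```

This still creates one variable per band, but only one line of code does it; if only the totals are needed, the per-family variables can be dropped afterwards.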

  • #2
    Perhaps someone else will be able to address your question. I'm not quite able to figure out what you want the variable agegrp to be. It seems to me you are thinking of your data as though it were a spreadsheet, and off to the right you want to add a column with, say, the first 20 rows holding counts by age group. This is not in general a productive way to use data in Stata. It's hard to figure out what to do in this case, because you don't tell us how you intend to make use of these counts by age group.

    But if you get no advice, please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question. It would be particularly helpful to post a small sample of your data, including the family ID and a few of the age variables (maybe just age1-age5). In particular, please read FAQ #12 and use dataex when posting sample data to Statalist.



    • #3
      Hi William, thank you for pointing out how my post could be improved. Hopefully the following clarifies what I'm trying to do.

      In discussing the demographics of my sample, I would just like to present a table demonstrating how many children I have under 5, how many are between the ages of 6 - 10, 11 - 15, 16 - 20 etc.

      Right now, my data looks something like this, where each _id represents a family with multiple family members of varying ages:

      _id age1 age2 age3 age4 age5
      1 50 23 10 5 2
      2 34 100 22 18 15
      3 67 18 59 21 11
      4 45 24 23 14 16
      5 78 57 28 24 3

      Ideally, I'd like one variable called agegrp, where agegrp=1 counts the values of age1-age5 that are less than or equal to 5, agegrp=2 counts those between 6 and 10, and so on, so that I could tab agegrp and get a count of how many individuals in my sample are children 5 or younger, children 6 to 10 years old, ..., up through adults older than 70.

      Using:

      egen agegrp1 = anycount(age*), values(0/5)

      This successfully counted all children aged 0 to 5, but following this approach I'd need a new variable for each age group of interest, which seems inefficient. Furthermore, my attempts to use the count command in a loop don't work, because count doesn't appear to accept a variable list, only individual variables.
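The count command counts observations rather than taking a list of variables, but it can still be used in a loop over the variables. A minimal sketch, assuming the wide layout shown in this post (age1-age5), integer ages, and bands 0-5, 6-10, ..., 26 and over with 200 as an arbitrary ceiling; the scalar names tot0, tot6, ... are illustrative. For each band, the inner loop runs count on each age variable in turn and adds up r(N), so no new variables are created.

```stata
* Example data in the wide layout: one row per family
clear
input _id age1 age2 age3 age4 age5
1 50 23 10 5 2
2 34 100 22 18 15
3 67 18 59 21 11
4 45 24 23 14 16
5 78 57 28 24 3
end

local HI 5 10 15 20 25 200
foreach lo in 0 6 11 16 21 26 {
    gettoken hi HI : HI
    local total = 0
    * count works on observations, so count each age variable in turn
    foreach v of varlist age1-age5 {
        quietly count if inrange(`v', `lo', `hi')
        local total = `total' + r(N)
    }
    * keep the band total as a scalar, e.g. tot0 for ages 0-5
    scalar tot`lo' = `total'
    display "ages `lo'-`hi': `total'"
}
```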

      Any thoughts or perhaps something I need to clarify further?

      Thank you!



      • #4
        First, it would be better to present data between code delimiters, so I did that for you.

        Second, I gather you need to reshape to long format first. Please see below:

        Code:
        . input _id age1 age2 age3 age4 age5
        
                   _id       age1       age2       age3       age4       age5
          1. 1 50 23 10 5 2
          2. 2 34 100 22 18 15
          3. 3 67 18 59 21 11
          4. 4 45 24 23 14 16
          5. 5 78 57 28 24 3
          6. end
        
        . reshape long age, i(_id) j(subjects)
        (note: j = 1 2 3 4 5)
        
        Data                               wide   ->   long
        -----------------------------------------------------------------------------
        Number of obs.                        5   ->      25
        Number of variables                   6   ->       3
        j variable (5 values)                     ->   subjects
        xij variables:
                             age1 age2 ... age5   ->   age
        -----------------------------------------------------------------------------
        
        . list
        
             +----------------------+
             | _id   subjects   age |
             |----------------------|
          1. |   1          1    50 |
          2. |   1          2    23 |
          3. |   1          3    10 |
          4. |   1          4     5 |
          5. |   1          5     2 |
             |----------------------|
          6. |   2          1    34 |
          7. |   2          2   100 |
          8. |   2          3    22 |
          9. |   2          4    18 |
         10. |   2          5    15 |
             |----------------------|
         11. |   3          1    67 |
         12. |   3          2    18 |
         13. |   3          3    59 |
         14. |   3          4    21 |
         15. |   3          5    11 |
             |----------------------|
         16. |   4          1    45 |
         17. |   4          2    24 |
         18. |   4          3    23 |
         19. |   4          4    14 |
         20. |   4          5    16 |
             |----------------------|
         21. |   5          1    78 |
         22. |   5          2    57 |
         23. |   5          3    28 |
         24. |   5          4    24 |
         25. |   5          5     3 |
             +----------------------+
        
        . recode age (0/5 = 0) (6/10 = 1) (11/15 = 2) (16/20 = 3) (21/25 = 4) (26/max = 5), gen(agegrp)
        . tab agegrp
        
          RECODE of |
                age |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |          3       12.00       12.00
                  1 |          1        4.00       16.00
                  2 |          3       12.00       28.00
                  3 |          3       12.00       40.00
                  4 |          6       24.00       64.00
                  5 |          9       36.00      100.00
        ------------+-----------------------------------
              Total |         25      100.00
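For comparison, a minimal sketch of the same banding with egen's cut() function instead of recode. The at() option lists the lower limit of each band plus an upper ceiling (200 here, an arbitrary choice); each age lands in the interval [lower limit, next lower limit), so a non-integer age such as 5.5 would still fall in the first band, and icodes numbers the bands 0, 1, 2, ... to match the codes used with recode above.

```stata
clear
input _id age1 age2 age3 age4 age5
1 50 23 10 5 2
2 34 100 22 18 15
3 67 18 59 21 11
4 45 24 23 14 16
5 78 57 28 24 3
end
reshape long age, i(_id) j(subjects)

* Bands [0,6), [6,11), [11,16), [16,21), [21,26), [26,200), coded 0-5
egen agegrp = cut(age), at(0,6,11,16,21,26,200) icodes
tab agegrp
```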

        Hopefully that helps.
        Last edited by Marcos Almeida; 06 Nov 2018, 08:39.
        Best regards,

        Marcos



        • #5
          You can also attach value labels in the same step:

          Code:
          drop agegrp
          recode age (0/5 = 0 up_to_five_years) (6/10 = 1 from_6_to_10) (11/15=2 from_11_to_15) (16/20=3 from_16_to_20) (21/25=4 from_21_to_25) (26/max=5 over_26_years) , gen(agegrp)
          tab agegrp
          
             RECODE of age |      Freq.     Percent        Cum.
          -----------------+-----------------------------------
          up_to_five_years |          3       12.00       12.00
              from_6_to_10 |          1        4.00       16.00
             from_11_to_15 |          3       12.00       28.00
             from_16_to_20 |          3       12.00       40.00
             from_21_to_25 |          6       24.00       64.00
             over_26_years |          9       36.00      100.00
          -----------------+-----------------------------------
                     Total |         25      100.00
          Best regards,

          Marcos



          • #6
            I would just add that a variable of the form

            Code:
            gen age5 = 5 * ceil(age/5)
            returns 0, 5, 10, etc., the upper limit of each five-year band (ages 1-5 map to 5, 6-10 to 10, and age 0 to 0), without the need for multiple operations, value labels, or whatever.
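A minimal sketch of what that line does on a few made-up ages, assuming data in long form (one person per observation, as after the reshape in #4, where the wide variable age5 no longer exists, so the new name is free). Ages 3 and 5 both give 5, ages 6 and 10 both give 10, and age 0 stays 0, so a newborn forms its own group.

```stata
clear
input age
0
3
5
6
10
11
100
end

* each age is mapped to the upper limit of its five-year band
gen age5 = 5 * ceil(age/5)
list, noobs
```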



            • #7
              Amazing, this was exactly what I was looking for! Sorry for the extra trouble you had to go through to help me. Many thanks, Marcos!



              • #8
                Hi. I was wondering if there is a more efficient way to code age into groups than what I have done below. To avoid the dummy trap, I need to make one group a reference group (probably the first group). Also, the last group is for 65 and above. I'm not sure whether there is a convention for the number of years per group, so I used 10.

                gen age1524 = 0
                replace age1524 = 1 if age>14 & age<25
                gen age2534 = 0
                replace age2534 = 1 if age>24 & age<35
                gen age3544 = 0
                replace age3544 = 1 if age>34 & age<45
                gen age4554 = 0
                replace age4554 = 1 if age>44 & age<55
                gen age5564 = 0
                replace age5564 = 1 if age>54 & age<65
                gen age65plus = 0
                replace age65plus = 1 if age>64 & !missing(age)   // Stata treats missing as greater than any number

                Thank you in advance.



                • #9
                  Making your own 0/1 variables is worthwhile when you are learning what dummy (indicator) variables are and how they work, but once you're past that, the best practice is to let Stata do that for you through the mechanism of so-called factor variables, notated as "i.MyXVariable". See -help fvvarlist-. By default, Stata will omit the lowest-numbered category of a categorical variable referenced with the i.MyXVariable notation, but you can choose another base category, e.g. ib3.MyXVariable to use category 3.

                  And, as a side note, there isn't any "trap" regarding indicator variables. Any modern statistical software (i.e., probably anything post-1980) will drop out one of your indicator variables if you include a redundant list of them while using your do-it-yourself approach to them. It might not drop out the one you want, but it's not a trap.

                  A Stata-ish way to do what you want is:
                  Code:
                  // See -help recode-
                  recode age (15/24= 1 "15-24") (25/34=2 "25-34") ...[you fill in the rest]... (65/max = .. "65 +" ), generate(agecat)
                  regress y i.agecat
                  An example:
                  Code:
                  sysuse auto
                  regress price i.rep78
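A fuller sketch of the same idea, with every group from #8 written out in one recode (the final code, 6, is an assumption that simply continues the numbering). The data are simulated and the outcome y is hypothetical, purely so the commands run. The second regression uses ib6. to make the oldest group the base instead of the default lowest category.

```stata
* Simulated data, for illustration only: integer ages 15-84 and an outcome y
clear
set obs 200
set seed 12345
gen age = 15 + floor(70 * runiform())
gen y = 10 + 0.1 * age + rnormal()

* One categorical variable holding the age groups from #8, with value labels
recode age (15/24 = 1 "15-24") (25/34 = 2 "25-34") (35/44 = 3 "35-44") (45/54 = 4 "45-54") (55/64 = 5 "55-64") (65/max = 6 "65+"), generate(agecat)
tab agecat

* i.agecat: Stata builds the indicators itself and omits category 1 as the base
regress y i.agecat

* ib6.agecat: the 65+ group becomes the base instead
regress y ib6.agecat
```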



                  • #10
                    Mike Lacy gives excellent advice. Here's some more. Purely as a matter of technique, the code in #8 could be rewritten as, say,

                    Code:
                    local HI 24 34 44 54 64 200 
                    
                    forval lo = 15(10)65 { 
                        gettoken hi HI : HI 
                        gen age`lo'`hi' = inrange(age, `lo', `hi') 
                    }
                    
                    rename age65200 age65plus
                    At the same time, ages split by arbitrary breaks are just that. Perhaps there is a parameterisation using a polynomial or spline that matches the underlying process.
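To see what each pass of the loop in #10 does, here is a sketch on a few made-up ages. The local lo holds the lower limit of a band and hi its upper limit: gettoken moves the first word of the local HI into hi and leaves the rest in HI, so the upper limits are used up in step with forvalues. One side effect worth noting: inrange() returns 0 when age is missing, whereas age>64 in #8 is true for a missing age, because Stata treats missing as larger than any number.

```stata
clear
input age
15
24
25
64
65
90
.
end

local HI 24 34 44 54 64 200
forval lo = 15(10)65 {
    gettoken hi HI : HI
    * show the limits used on this pass and what is still left in HI
    display "lo = `lo', hi = `hi', remaining: `HI'"
    gen age`lo'`hi' = inrange(age, `lo', `hi')
}
rename age65200 age65plus
list, noobs
```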



                    • #11
                      Thank you, Mike Lacy, that was very helpful. I appreciate your time.



                      • #12
                        Thank you very much, Nick Cox. I'm just learning how to do loops... Do "lo" and "hi" stand for low age and high age? If I wanted to add labels to each age category, should I do that as a separate step afterwards (e.g. below), or is there a way to incorporate it in the loop code you provided?

                        label define agelab 1 "[1] 1-24" 2 "[2] 25-39" 3 "[3] 40-54" 4 "[4] 55-69" 5 "[5] 70-plus", modify
                        label values age agelab



                        • #13
                          Defining and attaching the labels through separate commands is fine.
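If you would rather build the labels inside the loop, one possibility is sketched below, assuming you want a single categorical variable rather than separate indicators; the names agecat and agelab, and the ceiling of 200, are illustrative. label define with the add option creates the label set on the first pass and appends to it afterwards; the open-ended top label is then changed with modify.

```stata
clear
input age
15
30
47
64
65
88
end

gen agecat = .
local HI 24 34 44 54 64 200
local i = 0
forval lo = 15(10)65 {
    gettoken hi HI : HI
    local ++i
    replace agecat = `i' if inrange(age, `lo', `hi')
    * label for this category, e.g. 1 "15-24"
    label define agelab `i' "`lo'-`hi'", add
}
* relabel the open-ended top group
label define agelab 6 "65+", modify
label values agecat agelab
tab agecat
```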



                          • #14
                            Thank you, Nick Cox.



                            • #15
                              Dear friends,

                              I want to create a group variable according to the range age falls in. My code is below:
                              Code:
                              // 200 is an arbitrary ceiling so the last group covers ages 80+
                              local HI 9 19 29 39 49 59 69 79 200
                              gen gr = .
                              local i = 0
                              forval lo = 0(10)80 {
                                  gettoken hi HI : HI
                                  local ++i
                                  replace gr = `i' if inrange(age, `lo', `hi')
                              }
                              It should do the same as:

                              gen gr= 0

                              replace gr= 1 if age>=0 & age<10

                              replace gr = 2 if age>=10 & age<20
                              ......


                              Thank you!

                              Last edited by Bright Tree; 15 Apr 2020, 19:55.
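Following the idea in #6, the same 10-year groups (1 = 0-9, 2 = 10-19, ...) can also come from a single line with no loop. A sketch on made-up ages: floor(age/10) + 1 also handles non-integer ages such as 45.5, has no upper limit, and leaves missing ages missing.

```stata
clear
input age
0
9
10
19
45.5
80
.
end

* 1 for ages 0-9, 2 for 10-19, ..., 9 for 80-89
gen gr = floor(age/10) + 1
list, noobs
```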

