  • How to create bins from age variable?

    Hello everyone,

    I want to create bins from an age variable (which contains values from 15 up to 64). Any idea how I can do that correctly? I need to do a binscatter plot by age levels afterwards.

    My idea was to do the following:

    Code:
    egen age_bins = cut(ca_age), group(8)
    but when I summarize by the 8 groups, I don't obtain the result that I would like. Here is a dataex sample:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int ca_age float age_bins
    37 2
    40 3
    30 1
    33 1
    34 2
    54 5
    37 2
    62 7
    56 6
    39 2
    50 4
    47 4
    49 4
    45 3
    58 6
    60 7
    52 5
    17 0
    46 4
    47 4
    64 7
    34 2
    53 5
    50 4
    52 5
    53 5
    53 5
    29 1
    59 6
    31 1
    39 2
    34 2
    40 3
    48 4
    55 6
    29 1
    24 0
    35 2
    59 6
    20 0
    40 3
    52 5
    44 3
    58 6
    61 7
    32 1
    31 1
    52 5
    51 5
    35 2
    end
    label values ca_age ca_age
    Thanks a lot in advance for the help.
    Best,

    Michael

  • #2
    So, what is the result you would rather obtain or what would be "correct"? I can't see anything on that in the question beyond a complaint that egen, cut() doesn't do what you want, which I can believe.

    I never use egen, cut() because

    1. I can't remember what its rules are at bin limits -- or beyond them.

    2. More crucially, no-one reading any code that mentions it will find it self-evident what it does just from being mentioned.

    But https://www.stata-journal.com/articl...article=dm0095 is a review of binning rules that are more transparent, and some others.

    Backing up, I suggest that lpoly and npregress are usually better methods that don't require degrading your data by binning them.

    For age, that could matter for ranges such as 16 to 25 or 60 and up, which might contain crucial thresholds such as leaving school or retiring from paid employment. binscatter (from SSC, as you're asked to explain) is sometimes, but not always, competitive with such methods that don't bin first.
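
    A minimal sketch of the no-binning route, for illustration only. The thread does not name an outcome variable, so y here is hypothetical and the data are simulated; only ca_age and its 15 to 64 range come from the question.

    Code:
    * simulated data: NOT from the thread, only to make the sketch runnable
    clear
    set seed 12345
    set obs 500
    gen int ca_age = runiformint(15, 64)
    * y is a hypothetical outcome variable
    gen y = 0.05*ca_age + rnormal()
    * local linear smooth of y on age, with a confidence band
    lpoly y ca_age, degree(1) ci
    * kernel regression of y on age (needs Stata 15 or later)
    npregress kernel y ca_age
    Both commands use age as recorded, so nothing is lost at thresholds that a ten-year bin would straddle.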

    • #3
      Thank you for the suggestions.
      I will have a look at the URL that you sent me.

      Basically, what I wanted was:

      Code:
      gen age_bins = 1 if (inrange(ca_age,15,24))
      replace age_bins = 2 if (inrange(ca_age,25,34))
      replace age_bins = 3 if (inrange(ca_age,35,44))
      replace age_bins = 4 if (inrange(ca_age,45,54))
      replace age_bins = 5 if (inrange(ca_age,55,64))
      The code above seems to work. Thank you.
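
      For comparison, a sketch of how egen, cut() could give the same fixed ten-year bins. As far as the egen documentation goes, group(8) asks for 8 groups of roughly equal frequency, whereas at() takes explicit breakpoints, each interval includes its left limit, and icodes numbers the intervals from 0. The variable name age_cut and the generated ages are illustrative, not from the thread.

      Code:
      * one observation per integer age 15..64, for illustration only
      clear
      set obs 50
      gen int ca_age = 14 + _n
      * breakpoints 15 25 35 45 55 65; intervals are [15,25), [25,35), ...
      egen age_cut = cut(ca_age), at(15(10)65) icodes
      * icodes starts at 0, so shift to 1..5
      replace age_cut = age_cut + 1
      tabulate age_cut
      Ages outside the range covered by at() would be set to missing, which is worth checking on the real data.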

      Michael

      • #4
        That should indeed work.

        Code:
        gen wanted = ceil((ca_age-14)/10)
        is more concise, and also more cryptic.
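
        A quick way to confirm that the two versions agree, sketched on one observation per integer age from 15 to 64 (generated here for illustration, not taken from the dataex sample):

        Code:
        clear
        set obs 50
        gen int ca_age = 14 + _n
        * explicit version from #3
        gen age_bins = 1 if inrange(ca_age,15,24)
        replace age_bins = 2 if inrange(ca_age,25,34)
        replace age_bins = 3 if inrange(ca_age,35,44)
        replace age_bins = 4 if inrange(ca_age,45,54)
        replace age_bins = 5 if inrange(ca_age,55,64)
        * concise version
        gen wanted = ceil((ca_age-14)/10)
        * assert is silent when the condition holds for every observation
        assert wanted == age_bins
        tabulate age_bins wanted
        The agreement holds for integer ages. A fractional age such as 24.5 would be missing under the inrange() version but would fall in bin 2 under ceil().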
