  • Generate newvar

    I want to create a new variable "dependent" that counts, within each household (hhid), the members aged 15 or under or 65 or over, based on the variable "age". I ran the following command, but it doesn't work:
    egen dependent = count(age <= 15 & age>=65), by(hhid)

    Please help me to solve this problem. My data is like this:
    hhid age sex
    1201 65 1
    1201 22 2
    1201 1 1
    1201 11 2
    1202 5 1
    1202 45 1
    1202 40 2
    1203 36 2
    1203 14 2
    1203 34 1

    And my results should be like this:
    hhid dependent (no)
    1201 3
    1202 1
    1203 1
    Last edited by shakir malik; 27 Nov 2024, 11:18.

  • #2
    Code:
    egen dependent = total(age <= 15 & age>=65), by(hhid)
    Watch out for missing values, which count as more than 65.

    Your code "works"; it just doesn't do what you want. count() counts non-missing values. The expression

    Code:
    age <= 15 & age>=65
    is evaluated as 1 if true and 0 if false. Either way, it's not missing, so you end up just counting observations.
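
    To illustrate the difference between count() and total(), here is a minimal sketch on a toy dataset. The variable names is_child, n_nonmissing and n_true are made up for the illustration, and a single condition is used to keep it simple.

    ```stata
    * Toy data: one household, four members (values taken from the question)
    clear
    input float(hhid age)
    1201 65
    1201 22
    1201  1
    1201 11
    end

    * A logical expression evaluates to 1 (true) or 0 (false)
    gen byte is_child = age <= 15

    * count() counts non-missing values of the expression, so every row is counted
    egen n_nonmissing = count(age <= 15), by(hhid)

    * total() sums the 0/1 values, so only rows where the condition holds add 1
    egen n_true = total(age <= 15), by(hhid)

    list hhid age is_child n_nonmissing n_true, sepby(hhid)
    ```

    On this data, n_nonmissing should simply equal the household size, while n_true counts only the members aged 15 or under.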


    • #3
      Thanks Nick.

      I tried

      egen dependent = total(age <= 15 & age>=65), by(hhid)

      but it generates '0' under variable "dependent" for all observations.

      I also tried:

      bys hhid : egen dependent=count( hhid ) if age<=15

      The result gives the count of those who are aged 15 or under. But I need the count of those who are aged 15 or under or 65 or over, for each hhid.

      So, I ran the following command:

      bys hhid : egen dependent=count( hhid ) if age<=15 & age>=65

      This time again, all the observations are filled with '0'.
      Last edited by shakir malik; 27 Nov 2024, 11:54.


      • #4
        Sorry; I should have spotted that problem too. You want

        Code:
        age <= 15 | age>=65
        It's impossible to be both under 16 AND over 64. You want the observations that have under 16 and the observations that have over 64, but that is the union of two subsets, not their intersection.

        Code:
        !inrange(age, 16, 64)
        would be another way to do it.
        Last edited by Nick Cox; 27 Nov 2024, 11:56.
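
        As a quick check of the point above, the following sketch builds a hypothetical dataset of whole-number ages from 0 to 99 and uses assert to confirm that the & version is never true and that the | version agrees with !inrange(age, 16, 64). It assumes ages are integers; with fractional ages such as 15.5 the two expressions would differ.

        ```stata
        * Hypothetical data: one observation per integer age from 0 to 99
        clear
        set obs 100
        gen age = _n - 1

        * No one can be both 15 or under AND 65 or over
        assert !(age <= 15 & age >= 65)

        * For integer ages, the OR condition matches "not between 16 and 64"
        assert (age <= 15 | age >= 65) == !inrange(age, 16, 64)

        * How many ages qualify as dependent under the OR condition
        count if age <= 15 | age >= 65
        ```

        If either assert fails, Stata stops with an error, so running the block to the end is the confirmation.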


        • #5
          Here is my suggestion:

          Code:
          clear
          input float(hhid age sex)
          1201 65 1
          1201 22 2
          1201  1 1
          1201 11 2
          1202  5 1
          1202 45 1
          1202 40 2
          1203 36 2
          1203 14 2
          1203 34 1
          end
          
          gen age_flag = age <= 15 | age >= 65 if !missing(age)
          
          egen dependent = sum(age_flag), by(hhid)
          
          list hhid dependent
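
          If the goal is one line per household, as in the desired results in #1, one possible way is to tag a single observation per hhid and list only those. This is only a sketch: the variable name first is made up, and the data are repeated so that the block runs on its own.

          ```stata
          clear
          input float(hhid age sex)
          1201 65 1
          1201 22 2
          1201  1 1
          1201 11 2
          1202  5 1
          1202 45 1
          1202 40 2
          1203 36 2
          1203 14 2
          1203 34 1
          end

          * Count members aged 15 or under or 65 or over within each household
          egen dependent = total(age <= 15 | age >= 65), by(hhid)

          * tag() marks exactly one observation per household with 1
          egen first = tag(hhid)

          * Show one row per household
          list hhid dependent if first, noobs
          ```

          The example data have no missing ages, so no guard against missing values is included here.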


          • #6
            Let's discuss the possibility of missing values in age more systematically.

            If you want to ignore missing values, here are two more ways to do that.


            Code:
            egen wanted = total(cond(age < ., age <= 15 | age >= 65, .)), by(hhid)
            
            egen wanted2 = total(age <= 15 | inrange(age, 65, .)), by(hhid)
            Note that Ali Gokhan Yucel used egen, sum(). That will work, but sum() has been an undocumented synonym of egen, total() since Stata 9, some 19 years.
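
            To see what a missing age does in practice, here is a small sketch with one missing value added to made-up data. The variable names naive and guarded are only for illustration.

            ```stata
            * Toy data with one missing age in household 1201
            clear
            input float(hhid age)
            1201 65
            1201 22
            1201  .
            1202  5
            1202 45
            end

            * Missing is treated as larger than any number, so age >= 65 is true when age is missing
            egen naive = total(age <= 15 | age >= 65), by(hhid)

            * inrange(age, 65, .) returns 0 when age is missing, so that row is not counted
            egen guarded = total(age <= 15 | inrange(age, 65, .)), by(hhid)

            list, sepby(hhid)
            ```

            On this data, naive should be 2 for household 1201 because the missing age is counted, while guarded should be 1. Both should be 1 for household 1202.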


            • #7
              Nick Cox and Ali Gokhan Yucel, thank you very much. Both commands work.

              Thanks again!!!
