  • Get rangestat(sum) to incorporate missings

    Using example data below, the rangestat(sum) code I have will produce a quantity that sums how many siblings a child had that attended a program in a given year. So for example, in 2002, child_id 301 had both of their siblings attend, so they get a value of 2.

    What I can't figure out is that, even when a child has all their siblings missing on binary_attend_SELF in a given year (like 1994 in this example data), the rangestat(sum) produces a 0 when I want it to produce a missing. How can I get it to do this?

    Code:
    clear
    input child_id mother_id year binary_attend_SELF
    301 3 1994 .
    301 3 1996 0
    301 3 1998 1
    301 3 2000 0
    301 3 2002 0
    301 3 2004 .
    301 3 2006 .
    301 3 2008 .
    301 3 2010 .
    302 3 1994 .
    302 3 1996 .
    302 3 1998 0
    302 3 2000 1
    302 3 2002 1
    302 3 2004 0
    302 3 2006 .
    302 3 2008 .
    302 3 2010 .
    303 3 1994 .
    303 3 1996 .
    303 3 1998 .
    303 3 2000 .
    303 3 2002 1
    303 3 2004 0
    303 3 2006 0
    303 3 2008 0
    303 3 2010 .
    end
    
    rangestat (sum) binary_attend_SIBSUM = binary_attend_SELF, excludeself interval(year 0 0) by(mother_id)
    list child_id year binary_attend_SIBSUM

  • #2
    This is standard Stata procedure. If the sum of missing and 42 is 42 on the grounds that missing should be ignored, then the sum of missing is equivalent to zero.

    All you need to do is replace 0 with missing if you are confident that that is what 0 means.
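
    The rule described above can be seen outside rangestat as well. The following is only a minimal sketch on made-up toy data (the variables fam and x are hypothetical, not from the question): egen's total() function ignores missing values, so an all-missing group totals 0 unless the missing option is specified.

    ```stata
    clear
    input fam x
    1 .
    1 .
    2 0
    2 .
    end

    * Default: missing values are ignored, so the all-missing family gets 0
    bysort fam: egen tot_default = total(x)

    * With the missing option, a family whose values are all missing gets missing
    bysort fam: egen tot_missing = total(x), missing

    list, sepby(fam)
    ```

    Family 1 has only missing values, so tot_default is 0 and tot_missing is missing; family 2 has one real 0, so both totals are 0.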



    • #3
      I might just be overlooking something obvious here, but what would the statement be to replace 0 with missing? I only want the new variable's value to be missing if it's missing for all of the siblings (in a given year), not just in any individual year.

      In other words, I'm okay with a 0 and a missing summing to 0, but don't want a missing and missing to sum to 0. I tried using a few tricks with bysort: egen but couldn't figure it out.
      Last edited by Garrett Baker; 27 Oct 2024, 13:14.



      • #4
        Your example data does not contain a case that meets your requirement for replacing zero with missing, as there is only one mother in the example and there are non-missing values of binary_attend_SIBSUM for all of her children. So the following code is untested, but I believe it does what you want:
        Code:
        by mother_id (child_id), sort: egen nm_obs = count(binary_attend_SELF)
        by mother_id child_id: egen SELF_nm_obs = count(binary_attend_SELF)
        by mother_id child_id: gen sib_nm_obs = nm_obs - SELF_nm_obs
        replace binary_attend_SIBSUM = . if sib_nm_obs == 0
        If this is not correct, when posting back, please give a new data example that includes some situations where you want zero replaced with missing. (Or if the example in #1 actually does have such a situation, please point it out to me explicitly and explain how, as I must have misunderstood your intentions.)



        • #5
          Thanks for the example Clyde. Sorry if my original description wasn't clear. What I want is to count the number of siblings who attended within year.

          So in my example data, if you imagine 301 as the focal child: in 1994 and in 1996, both of their siblings (302 and 303) have a missing value for binary_attend_SELF. When I use the rangestat(sum) code, it combines those two missing values in a way that treats them as 0, so the resulting value equals 0 for binary_attend_SIBSUM. But I would really want the resulting value there to be missing, not 0.

          Does that make more sense? I don't think it would matter if I added another mother to the example, but I can if it would be necessary in a way that I'm not thinking of now.



          • #6
            OK. The reason I misunderstood you is that in #3 you said "I only want the new variable's value to be missing if it's missing for all of the siblings (in a given year), not just in any individual year." [emphasis added] Now you say that you do want to do this by year. So, if I understand correctly now, that's a very minor modification of #4:
            Code:
            by mother_id year (child_id), sort: egen nm_obs = count(binary_attend_SELF)
            by mother_id year child_id: egen SELF_nm_obs = count(binary_attend_SELF)
            by mother_id year child_id: gen sib_nm_obs = nm_obs - SELF_nm_obs
            replace binary_attend_SIBSUM = . if sib_nm_obs == 0
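
            A possible alternative, offered only as a sketch and not as part of the code above: rangestat can also report the number of non-missing sibling values through its (count) statistic, so the sum and the count come from the same call. The data here are a subset of the example in #1 (years 1994, 1996, 1998 and 2002). The name sib_count is hypothetical, and the replace condition deliberately covers both a count of 0 and a missing count, since which of the two rangestat returns for an all-missing interval is an assumption I have not verified. The /// line continuations require running this from a do-file.

            ```stata
            * rangestat is community-contributed; install once with: ssc install rangestat
            clear
            input child_id mother_id year binary_attend_SELF
            301 3 1994 .
            301 3 1996 0
            301 3 1998 1
            301 3 2002 0
            302 3 1994 .
            302 3 1996 .
            302 3 1998 0
            302 3 2002 1
            303 3 1994 .
            303 3 1996 .
            303 3 1998 .
            303 3 2002 1
            end

            * Sum and non-missing count over siblings (same mother, same year, excluding self)
            rangestat (sum) binary_attend_SIBSUM = binary_attend_SELF ///
                (count) sib_count = binary_attend_SELF, ///
                excludeself interval(year 0 0) by(mother_id)

            * No sibling has a non-missing value in that year: the sum should be missing
            replace binary_attend_SIBSUM = . if missing(sib_count) | sib_count == 0

            * Spot checks taken from the cases described in the thread
            assert missing(binary_attend_SIBSUM) if year == 1994
            assert missing(binary_attend_SIBSUM) if child_id == 301 & year == 1996
            assert binary_attend_SIBSUM == 2 if child_id == 301 & year == 2002

            list child_id year binary_attend_SELF sib_count binary_attend_SIBSUM, sepby(child_id)
            ```

            Child 302 in 1996 still gets 0 rather than missing, because sibling 301 has an observed 0 in that year; only child-years with no observed sibling value are set to missing.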



            • #7
              Ah sorry about that, I guess what I should've said is "any individual child-year". This is exactly what I was looking for, thanks!
