
  • scale composite accounting for missing data

    Hi all, I searched on this topic, didn't find anything, but apologies if I used the wrong terms. I'm computing many scale composites. Some of them have missing data. I'm using code like this:


    gen status = (status_15_r+status_16+status_17+status_18)/4

    OR

    egen status = rowtotal(status_15_r status_16 status_17 status_18)
    replace status = status/4



    1) is there a cleaner way to do the above?

    2) how best can I account for missing data? I'd like to compute this scale if 3 of the 4 variables are present, and otherwise generate a missing value for the composite. Also, at present, the resulting value isn't divided by the right number of items if anything is missing.

    Thank you!

  • #2
    2) how best can I account for missing data? I'd like to compute this scale if 3 of the 4 variables are present, and otherwise generate a missing value for the composite.
    What is "best" really depends on what you are planning to do with these composite scores. I know it is quite common practice to have rules like the one you suggest, i.e. computing a score if n out of N items are non-missing, where N is given by design and n is arbitrarily chosen. The problem with such an approach is that it does not reflect the inherent uncertainty. In my opinion it is cleaner to compute the composite score only for those respondents who have no missing values, then multiply impute the values for the rest of the sample. But, as I said, whether this is "best" depends on what you are doing with these scores.

    Best
    Daniel
    Last edited by daniel klein; 12 Mar 2015, 03:12.
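
    A minimal sketch of the complete-case-then-impute idea described above, using the status items from the first post. The predictors age and female, the outcome y, the choice of mi impute regress, and the number of imputations are all assumptions made for illustration; the right imputation model depends on the data and the analysis.

    ```stata
    * Composite for complete cases only: + returns missing if any item is missing
    generate status = (status_15_r + status_16 + status_17 + status_18) / 4

    * Declare the data as mi data and mark the composite as a variable to impute
    mi set wide
    mi register imputed status

    * age and female are hypothetical, fully observed predictors
    mi impute regress status age female, add(20) rseed(12345)

    * Later analyses go through mi estimate; y is a hypothetical outcome
    mi estimate: regress y status age female
    ```

    The predictors in the imputation model need to be non-missing for the observations being imputed, so in practice they may need imputing as well.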



    • #3
      If you read about the other functions documented in help egen whose names start with row then you will see the tools you need. You can always calculate a row mean from whatever is non-missing but choose to ignore it if the number of values used is too small for your purposes. Three steps are needed but all are easy.
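
      Applied to the status items from the first post, the three steps might look like the sketch below. The variable name status_n is made up for illustration, and the threshold of 3 is the rule stated in the question.

      ```stata
      * Step 1: mean of whatever items are non-missing in each row
      egen status = rowmean(status_15_r status_16 status_17 status_18)

      * Step 2: count how many items contributed to that mean
      egen status_n = rownonmiss(status_15_r status_16 status_17 status_18)

      * Step 3: discard the mean when fewer than 3 of the 4 items are present
      replace status = . if status_n < 3
      ```

      Unlike rowtotal() followed by division by 4, rowmean() divides by the number of non-missing items, which addresses the divisor problem raised in the question.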



      • #4
        Thank you Daniel and Nick. I appreciate you mentioning multiple imputation as a better approach. To clarify, with the original (flawed) approach, assuming an arbitrary rule of no more than 25% missing data, is there a simpler way to do this?

        egen miss = rowmiss(cc_id_*)
        egen nonmiss = rownonmiss(cc_id_*)
        egen composite = rowmean(cc_id_*) if miss <= (.25*nonmiss)



        • #5
          No more than 25% of the data being missing can be stated in terms of either #missing/#variables or #non-missing/#variables. I wouldn't define it in terms of #missing/#non-missing.

          In your specific case, you want 3 or 4 out of 4 to be defined so

          Code:
           
          egen mean = rowmean(cc_id*)  
          egen rownonmiss = rownonmiss(cc_id*) 
          replace mean = . if rownonmiss >= 3
          It could be done in one line in terms of a condition like

          Code:
          (!missing(a) + !missing(b) + !missing(c) + !missing(d)) >= 3
          but that's too messy to generalise to more variables.
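
          For more variables, the #missing/#variables formulation mentioned at the top of this post can be sketched as follows. The names items, nitems, nmiss and composite are illustrative, and the 0.25 cutoff is the rule from #4.

          ```stata
          * Expand the wildcard and count how many items it matches
          unab items : cc_id*
          local nitems : word count `items'

          * Number of missing items and mean of the non-missing items, per row
          egen nmiss = rowmiss(`items')
          egen composite = rowmean(`items')

          * Keep the mean only when at most 25% of the items are missing
          replace composite = . if nmiss / `nitems' > 0.25
          ```

          With 4 items this keeps rows with 0 or 1 missing values and sets the composite to missing otherwise, and the same lines work unchanged for a longer item list.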



          • #6
            Thank you Nick, very helpful. I've used a lot of your posts over the years, by the way. I think you meant to reverse the sign in line 3 of that first box.

            Code:
            egen envid1 = rowmean(cc_id*)  
            egen rownonmiss = rownonmiss(cc_id*)
            replace envid1 = . if rownonmiss <= 2



            • #7
              Good catch. Too late to edit. Thanks.



              • #8
                Not too late to edit for me ...

                I like to work with rowmiss()


                Best
                Daniel
                Last edited by daniel klein; 12 Mar 2015, 14:19.



                • #9
                  Daniel: Cameron is right. If there are 3 or 4 non-missing, the result is acceptable; otherwise he wants to overwrite with missing.
