
  • How best to deal with data of lab. test results?

    For example, hba1c test results are numeric. However, once a value is detected below the normal range (4.4-6.4), it is reported simply as <4.3. Because of this, the data are stored as string instead of numeric.

    Any advice on how best to deal with such data?

  • #2
    This is very unclear to me. A listing of some data, code and output could help. How is hba1c coded when it is not below the normal range? What do you want to do with the variable -- is it an independent variable or a dependent variable?
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Just noticed this is a duplicate post. Richard Williams' response is pretty similar to my response to the other post. If the original poster responds, it would be best to respond to only one of these threads and close out the other.



      • #4
        Originally posted by Mytok
        . . . data is stored as string instead of numeric.

        Any advice on how best to deal with such data?
        Apparently, your major concern is to populate a numeric variable with those string values that represent numeric values. For that, you can try something like the code below.
        Code:
        version 13.1
        
        clear *
        set more off
        
        input str5 hba1c
        "7.6"
        "6.4"
        "5.1"
        "4.4"
        "<4.3"
        end
        
        *
        * Begin here
        *
        quietly generate double hba1c_n = real(hba1c)
        format hba1c_n %3.1f
        quietly replace hba1c_n = .l if trim(hba1c) == "<4.3"
        label define HbA1c .l "<4.3"
        label values hba1c_n HbA1c
        // Based upon the purpose of the clinical laboratory test, I assume
        // that the following two lines are unnecessary.
        quietly replace hba1c_n = .u if trim(hba1c) == ">6.5"
        label define HbA1c .u ">6.5", add
        
        list, noobs
        
        exit
        As to how best to deal with the situation, as both Richard and Clyde have mentioned, it depends upon the purpose of your activity. You might not need to do much more than what's in the do-file above, or you might take advantage of one or more of Stata's estimation commands for this kind of data. Or you might end up needing to go back to the source and retrieve the actual measured values for those below-normal-range results that weren't reported.



        • #5
          Mytok (please, as per the FAQ, re-register with your full name, including your surname. Just click on the Contact us button and follow the instructions):
          - if hba1c is your "censored from below" continuous dependent variable to be regressed on a set of predictors, you may want to take a look at - help tobit - and the related entry in the Stata 13.1 .pdf manual (especially Example 1).

          Kind regards,
          Carlo
          (StataNow 18.5)
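
          A minimal sketch of the tobit suggestion in #5, assuming hba1c is the dependent variable. The predictor age and all data values are made up for illustration and are not the original poster's data. The censored results are set to the reporting limit 4.3, and ll(4.3) asks tobit to treat observations at or below that limit as left-censored.

          ```stata
          clear
          * Hypothetical data: age is an illustrative predictor only
          input str5 hba1c age
          "7.6"  61
          "6.4"  55
          "5.1"  43
          "4.4"  38
          "<4.3" 29
          "<4.3" 33
          "5.8"  50
          "6.9"  58
          end

          * Numeric copy; "<4.3" becomes missing here and is then set to the limit
          generate double hba1c_n = real(hba1c)
          replace hba1c_n = 4.3 if trim(hba1c) == "<4.3"

          * Left-censored regression with lower limit 4.3
          tobit hba1c_n age, ll(4.3)
          ```

          Whether a tobit model is appropriate depends on the distributional assumptions being reasonable for the data at hand.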



          • #6
            I agree that the optimal solution depends on the purpose. Here I suggest a solution where the low hba1c measurements get a not too unrealistic value (kind of simple imputation), which may be better than giving them a missing value:

            Code:
             clear
            input str5 hba1c
            "7.6"
            "6.4"
            "5.1"
            "4.4"
            "<4.3"
            end
            
            destring hba1c , generate(hba1c_n) ignore("<")
            recode hba1c_n (4.3=4)
            label define hba1c_n 4 "<4.3"
            label values hba1c_n hba1c_n
            
            . list, nolabel
                 +-----------------+
                 | hba1c   hba1c_n |
                 |-----------------|
              1. |   7.6       7.6 |
              2. |   6.4       6.4 |
              3. |   5.1       5.1 |
              4. |   4.4       4.4 |
              5. |  <4.3         4 |
                 +-----------------+
            
            . list
                 +-----------------+
                 | hba1c   hba1c_n |
                 |-----------------|
              1. |   7.6       7.6 |
              2. |   6.4       6.4 |
              3. |   5.1       5.1 |
              4. |   4.4       4.4 |
              5. |  <4.3      <4.3 |
                 +-----------------+



            • #7
              Yes, replacing with values below the limit of detection is in general what you want. However, I strongly recommend that you do a sensitivity analysis by repeating the analysis using different replacement values. You could even treat this as a "missing data" situation by using MI (with a model that restricts the imputed values to be no higher than your limit).
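
              The sensitivity analysis described in #7 can be sketched as a loop over candidate replacement values. The values 3.5, 4 and 4.3 are arbitrary choices for illustration, and the mean stands in for whatever analysis is actually of interest.

              ```stata
              clear
              input str5 hba1c
              "7.6"
              "6.4"
              "5.1"
              "4.4"
              "<4.3"
              end

              * Numeric copy; "<4.3" is missing at this point
              generate double hba1c_n = real(hba1c)

              * Repeat the analysis under several (arbitrary) replacement values
              foreach v of numlist 3.5 4 4.3 {
                  quietly generate double hba1c_s = hba1c_n
                  quietly replace hba1c_s = `v' if trim(hba1c) == "<4.3"
                  quietly summarize hba1c_s
                  display "replacement = `v'   mean = " %5.3f r(mean)
                  drop hba1c_s
              }
              ```

              If the conclusions change materially across replacement values, that is a sign the simple substitution is not adequate.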



              • #8
                A better procedure for handling observations below the limit of detection is to treat them as left-censored observations and analyze them with survival data programs. There are two approaches in Stata. The first is to use the ordinary Kaplan-Meier estimate on a reversed time scale. The second is to use Patrick Royston's stpm module, downloadable from SSC; it models left-censored data as a special case of interval censoring. See also Gillespie et al. (2010). I illustrate both below.

                Now a personal note: Long-time Statalist etiquette, discussed in the FAQ, has been to register with full real names. This practice has promoted professionalism and friendship on the list, and you can see that it is followed by every responder to your question so far. I urge you to re-register with your real name to enjoy the full benefits of being on Statalist. Just use the Contact Us button on the bottom right of the page.


                Code:
                /*  If below =1, the real observation fell below the limit of detection
                indicated by the corresponding value of  x */
                clear
                input x below
                0.5   1
                1     0
                1     0
                2.3   1
                3     0
                4     0
                5     0
                5.4   1
                6     0
                7     0
                11     0
                12    0
                end
                
                label var below " Below LOD"
                sum x
                local xmax = r(max)
                /* Reverse Values: make them positive starting at 1 */
                gen rx = -x + `xmax' + 1
                stset rx, fail(below==0) /* Note */
                
                stsum  /* Get quantiles for original  x */
                di "v50 = "-r(p50) + `xmax' +1
                di "v25 = "-r(p75) + `xmax' +1
                di "v75 = "-r(p25) + `xmax' +1
                
                /* Generate Cumulative distribution function for x
                This is found from the survival curve for rx */
                
                sts gen cif1 = s
                label var cif1 "Cumulative Distribution"
                
                
                /* Compare to stpm */
                stset x
                gen leftv = _t
                replace leftv =0 if below
                replace _d=0 if below
                
                stpm , left(leftv) scale(hazard) df(3)
                predict cif2, failure
                
                scatter cif1 cif2 x, sort(x) c(l l)
                Reference: Gillespie, Brenda W, Qixuan Chen, Heidi Reichert, Alfred Franzblau, Elizabeth Hedgeman, James Lepkowski, Peter Adriaens, Avery Demond, William Luksemburg, and David H Garabrant. 2010. Estimating population distributions when some data are below a limit of detection by using a reverse Kaplan-Meier estimator. Epidemiology 21, no. 4 (Supplement): S64-S70.
                Last edited by Steve Samuels; 17 Sep 2014, 20:26.
                Steve Samuels
                Statistical Consulting
                [email protected]

                Stata 14.2



                • #9
                  Steve's suggestion is probably optimal for describing a distribution when information about some of the observations is imperfect (however, predict does not allow the failure option). But I don't see how to use it for analyzing hba1c as an outcome or as a predictor of some outcome.

                  For hba1c (a measure of long-term regulation of blood glucose in diabetes) the low values are considered "normal", and hba1c as a predictor could be categorized with these values as the reference category. For hba1c as an outcome the solution is less obvious, I think. I would tend to use the method I suggested in post #6.

                  Svend
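
                  The categorization idea in #9 might look like the sketch below. The single cut point at 6.4 (the upper end of the normal range quoted in the first post) and the variable name hba1c_cat are assumptions made for illustration; a real analysis may need more categories or clinically chosen cut points.

                  ```stata
                  clear
                  input str5 hba1c
                  "7.6"
                  "6.4"
                  "5.1"
                  "4.4"
                  "<4.3"
                  end

                  generate double hba1c_n = real(hba1c)

                  * 0 = at or below 6.4 (reference category), 1 = above 6.4
                  generate byte hba1c_cat = (hba1c_n > 6.4) if !missing(hba1c_n)

                  * Results reported as "<4.3" belong to the reference category
                  replace hba1c_cat = 0 if trim(hba1c) == "<4.3"

                  label define hba1c_cat 0 "<=6.4 (incl. <4.3)" 1 ">6.4"
                  label values hba1c_cat hba1c_cat

                  tabulate hba1c_cat
                  ```

                  With factor-variable notation, i.hba1c_cat in an estimation command would then use category 0 as the base level.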



                  • #10
                    Dear Steve,
                    I found your post above very useful.
                    I have a few questions, though:
                    1. Do you replace the censored observations from the variable cif1?
                    2. I can’t run the command “predict cif2, failure” after stpm; it doesn’t allow the failure option. When I run it without the failure option, it generates a flat variable with value -2.777787 for each observation. Am I doing something wrong?
                    3. Do you have any update on this topic?
                    Bayzid



                    • #11
                      I'm sorry to say that I have nothing more to add to this topic. I don't know why the predict statement isn't working for you and Svend. The code above worked fine for me just now in Stata 14.2, and "failure" is listed as a possible statistic for predict in the help for stpm.

                      When stpm predicts for the values <1, it is extrapolating from the area of known data. Extrapolation like this is always dangerous, so I agree with the suggestions of trying different methods.
                      Steve Samuels
                      Statistical Consulting
                      [email protected]

                      Stata 14.2
