  • For each string value loop command

    Hello all,

    I have a large database, and I'm trying to apply a function based on the value of a string variable (patient_id):
    patient_id  encounter_id  candidiasis  event_date2  dx_date2   days  cand2
    EAPH        HQMLW         1            13-Jan-23    24-Aug-21  -507  1
    EAPH        HwULW         1            13-Jan-23    13-Sep-23   243  1
    EAPH        HQOLW         1            13-Jan-23    27-Jan-22  -351  1

    I want a command that replaces cand2 with 0 for every observation of a given patient_id if days is negative (or below a certain threshold) for that patient.

    Thank you so much. I'm really struggling and have not come even close.

    Warmly,

    Andres

  • #2
    I guess this means

    Code:
    bysort patient_id (days) : replace cand2 = 0 if days[1] < 0
    which overwrites cand2 with 0 if any value of days is negative for a given patient.

    However, that code would be dangerous if it's not what you want, so be cautious and consider whether

    Code:
    bysort patient_id (days) : gen wanted = cond(days[1] < 0, 0, cand2)
    produces what you want.
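To try the safer version on the three rows posted in #1, here is a minimal reproducible sketch. It enters only the variables the code uses (patient_id, days, cand2) with input, leaving out the dates and encounter_id as an assumption that they are not needed here. Because the smallest value of days for EAPH is -507, every EAPH row should get wanted = 0.

```stata
clear
input str4 patient_id int days byte cand2
"EAPH" -507 1
"EAPH"  243 1
"EAPH" -351 1
end

* Within each patient, sorting by days puts the smallest value in days[1],
* so days[1] < 0 is true exactly when at least one value of days is negative
bysort patient_id (days) : gen wanted = cond(days[1] < 0, 0, cand2)

list patient_id days cand2 wanted, sepby(patient_id)
```

Keeping the result in a new variable (wanted) leaves cand2 untouched, so the two can be compared before anything is overwritten.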

    • #3
      Amazing, Nick. Thank you so much. What if I want all the cand2 values to be 1 if there is at least one days>0 AND the rest of the days<-1825?

      • #4
        Code:
        assert !missing(days)
        bysort patient_id (days) : replace cand2 = 1 if inrange(days[_N], 1, 1824)
        Notes:
        1. This code assumes and verifies that days never has a missing value. The code will break at the assert statement if this is not the case. A different approach is required when days can be missing. Post back if that applies to you.
        2. I don't know what days <- 1825 is supposed to mean. Did you mean < or did you mean <=? Or, less likely, something else? The code above interprets it as <. If you intended <=, change 1824 to 1825 in the code.

        • #5
          I meant a negative value (-1825), that is, 5 years in days. In other words, for each patient I want cand2 to be 1 if any value of days is positive (> 0) AND all the other values of days are less than -1825 or missing. Otherwise, it should be 0. Thank you so much. I feel I'm almost there.

          • #6
            Code:
            gen byte missing_days = missing(days)
            
            by missing_days patient_id (days), sort: gen cand2 = days[_N] > 0 ///
                & days[_N-1] < -1825 if !missing(days)
            by patient_id (cand2), sort: replace cand2 = cand2[1]
            Note: There is an edge case your problem description does not cover. What should happen if, for some patient_id(s), there is only one non-missing value of days and it is, in fact, positive? Then the "any value is positive" criterion is met. But the "rest of the days" are no days at all, so it is unclear whether the condition that the "rest of the days" are < -1825 or missing should be treated as vacuously true. Or, for your purposes, you might need cand2 = 0 when there are no other days but the one.
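The edge case in the note can be seen with a small made-up dataset; the patient IDs P1 and P2 and their days values are hypothetical. P1 has one positive value and one value below -1825, so it meets both conditions. P2 has a single positive value. In a one-observation group, days[_N-1] is days[0], which Stata evaluates as missing, and missing is never less than -1825, so the code above gives P2 a 0. The data are entered without an existing cand2, since gen would otherwise stop with "variable cand2 already defined".

```stata
clear
input str2 patient_id int days
"P1" -3000
"P1"    50
"P2"   128
end

gen byte missing_days = missing(days)
* days[_N] is the largest value in the group and days[_N-1] the next largest
by missing_days patient_id (days), sort: gen cand2 = days[_N] > 0 & days[_N-1] < -1825 if !missing(days)
* Spread the non-missing result to every observation of the patient
by patient_id (cand2), sort: replace cand2 = cand2[1]

list patient_id days cand2, sepby(patient_id)
```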

            • #7
              Thank you so much, Clyde. Fortunately, I don't have missing values for days. The command works well but still misses two cases: (1) when a patient_id has a positive (> 0) value and its other value(s) are less than -1825; and (2) when a patient_id has only one value of days and it is positive (> 0):
              patient_id  days   cand2  desire
              EANT        -2712  0      1
              EANT           66  0      1
              EQEH          128  0      1
              I really appreciate your input.

              • #8
                Code:
                assert !missing(days)
                by patient_id (days), sort: gen cand2 = days[_N] > 0 ///
                    & (days[_N-1] < -1825 | _N == 1)
                will do it.
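As a quick check, here is a sketch that enters the three rows posted in #7 with input. It leaves out the old cand2 column, on the assumption that the original variable should be dropped or renamed first, because gen cannot create a variable that already exists. The new cand2 should match the desire column.

```stata
clear
input str4 patient_id int days byte desire
"EANT" -2712 1
"EANT"    66 1
"EQEH"   128 1
end

assert !missing(days)
* days[_N] is the largest value; days[_N-1] is the next largest (missing when _N == 1),
* so the _N == 1 term lets a single positive value qualify on its own
by patient_id (days), sort: gen cand2 = days[_N] > 0 & (days[_N-1] < -1825 | _N == 1)

list patient_id days cand2 desire, sepby(patient_id)
```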
