  • Query about egen function

    Hello,

    I am re-running a do-file from my colleagues and trying to understand some of the code they have written. However, I am struggling to understand one line that involves egen and the max()/min() functions.

    For context: I have data on patients attending clinics for diagnosis of a disease. One patient can have multiple attendances at the clinic, and one attendance can have multiple rows depending on the diagnoses recorded.
    I need to create a new date variable holding the first attendance date on which the patient was diagnosed with disease X (coded as X), populated for all of that patient's attendances, so that it flags that the patient has been diagnosed with the disease.

    The code I have been given is:
    egen date_new = min(attendance_date/(disease_code == X)), by(clinic_code patient_id)

    where attendance_date is the date of the attendance; disease_code is the code assigned at diagnosis; clinic_code identifies the clinic the patient attended; and patient_id is the unique patient ID.
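A minimal sketch of what this line produces, on made-up data. The values are hypothetical, disease X is assumed to be stored as the string code "X" in disease_code, and visit is an illustrative helper variable used only to type in the dates.

```stata
* Hypothetical toy data: one clinic, two patients; disease X assumed coded "X"
clear
input int clinic_code int patient_id str1 disease_code str9 visit
1 101 "B" "15jan2020"
1 101 "A" "05feb2020"
1 101 "X" "05feb2020"
1 101 "X" "10mar2020"
1 102 "A" "02feb2020"
1 102 "B" "20feb2020"
end

* Convert the string dates to Stata daily dates
generate attendance_date = date(visit, "DMY")
format attendance_date %td

* The line under discussion, with X replaced by the string "X"
egen date_new = min(attendance_date/(disease_code == "X")), by(clinic_code patient_id)
format date_new %td
list clinic_code patient_id disease_code attendance_date date_new, sepby(patient_id)
```

Patient 101 gets the earliest disease-X date (05feb2020) on every row, even though the first attendance was earlier; patient 102, never coded X, gets missing throughout.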

    In the above code, I do not understand the use of the backward slash (/) with the min() function.

    I would really appreciate any help with this please.

    Thank you
    Kritika

  • #2
    What you call a backward slash is division. You are dividing by an expression, (disease_code == X), that evaluates to either 1 (true) or 0 (false). If the divisor is 0, the result is missing, and min() ignores missing values.
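A small sketch of the mechanics, assuming a hypothetical 0/1 indicator is_x in place of (disease_code == X), with arbitrary numbers standing in for Stata daily dates:

```stata
* Hypothetical toy data: is_x is 1 on disease-X rows, 0 otherwise
clear
input attendance_date is_x
21950 0
21960 1
21980 1
21940 0
end

* Dividing by 0 yields missing (.), so only disease-X rows keep their date
generate ratio = attendance_date/is_x
list

* egen's min() skips missing values, so this is the earliest disease-X date
egen first_x = min(ratio)
display first_x
```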

    See Section 10 of https://journals.sagepub.com/doi/pdf...867X1101100210

    Some friends regard this use as "too clever by half", and indeed, because it sometimes causes puzzlement, I too would now tend to write

    Code:
    min(cond(disease_code == X, attendance_date, .))
    not

    Code:
    min(attendance_date/(disease_code == X))
    I would say forward slash here.
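A sketch checking on toy data that the two forms agree. Disease X is again assumed to be stored as the string "X" (a hypothetical choice), and clinic_code is left out of by() only to keep the example short.

```stata
* Hypothetical toy data; disease X assumed coded as the string "X"
clear
input int patient_id str1 disease_code float attendance_date
101 "A" 21950
101 "X" 21960
101 "X" 21980
102 "B" 21940
end

* Division form: missing wherever the condition is false
egen first_div = min(attendance_date/(disease_code == "X")), by(patient_id)

* cond() form: explicitly missing wherever the condition is false
egen first_cond = min(cond(disease_code == "X", attendance_date, .)), by(patient_id)

* The two agree, including missing for patient 102 (. == . is true in Stata)
assert first_div == first_cond
list, sepby(patient_id)
```

The cond() version states the intent directly: keep the date if the condition holds, else missing.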
    Last edited by Nick Cox; 02 Oct 2024, 11:14.



    • #3
      Thank you for the reply, Nick. That was helpful!

      Regards
      Kritika



      • #4
        It is an accident of notation that

        Code:
        attendance_date/(disease_code == X)
        is close to
        Code:
        attendance_date | (disease_code == X)
        where | indicates "given that" or "conditional on", as seen in elementary probability, say pr(A | B). Even odder: the syntax above is legal in Stata, where | means logical "or", but it is not at all what you want! That is why the cond() function was introduced. Trivium: this | notation for conditioning was introduced by Sir Harold Jeffreys about 90 years ago.
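A sketch of the difference on two made-up rows (disease X assumed stored as the string "X"): the | version returns 1 on both rows, because | is logical "or" and any nonzero date counts as true, while cond() returns the date only on the disease-X row.

```stata
* Hypothetical toy data; disease X assumed coded as the string "X"
clear
input str1 disease_code float attendance_date
"A" 21950
"X" 21960
end

* | is logical "or": a nonzero date alone makes the result 1
generate wrong = attendance_date | (disease_code == "X")

* The intended "date, given disease X" value, missing otherwise
generate right = cond(disease_code == "X", attendance_date, .)
list
```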
        Last edited by Nick Cox; 03 Oct 2024, 06:13.
