  • counting

    Hello,
    I have a dataset in Stata and I need to count the number of nodules for each patient.
    The first column gives the patient ID and the second column gives the nodule. Is there a command I can use to count all the nodules for one patient ID? For example, multiple rows in my data have patient ID 2, each with a value in the nodule column. Can I add up all the nodules in the second column for patient ID 2, and so on for the other patients?

    Code:
    clear
    input float(patient_id nodule)
    1 1
    2 1
    2 3
    2 4
    3 1
    4 4
    4 5
    5 2
    5 1
    5 3
    5 4
    5 5
    6 2
    6 3
    7 1
    8 2
    9 5
    9 3
    10 1
    10 6
    end

  • #2
    Yes. There are a few ways to do it. If you want to add it as a new variable in the existing data set:

    Code:
    by patient_id, sort: egen total_nodules = total(nodule)
    If you want a different data set with just one observation per patient_id, showing the total number of nodules:
    Code:
    collapse (sum) nodule, by(patient_id)
    If you just want a listing of patient_id's and total number of nodules, without modifying the data set in memory:
    Code:
    levelsof patient_id, local(patients)
    foreach p of local patients {
        summ nodule if patient_id == `p', meanonly
        display "Patient_id `p' has `r(sum)' nodules"
    }
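
    As a quick check of the first approach, here is a minimal sketch using only the first three patients from post #1. The helper variable first is introduced purely for illustration and is not part of the code above. With these rows, patient 2 should show a total of 8 (1 + 3 + 4).

    ```stata
    clear
    input float(patient_id nodule)
    1 1
    2 1
    2 3
    2 4
    3 1
    end

    * sum of nodule within each patient, stored on every row of that patient
    by patient_id, sort: egen total_nodules = total(nodule)

    * tag one row per patient (hypothetical helper) so each total is listed once
    egen byte first = tag(patient_id)
    list patient_id total_nodules if first, noobs
    ```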

    • #3
      Thank you so much. That was very fast.

      • #4
        Your description is not clear, or at least, I see it in a different way than Clyde does.

        Clyde thinks that your nodule variable is a quantity of nodules, and so you want the total of the individual counts.

        I think that your nodule variable is an identifier of an individual affected nodule, and what you want to do is count the number of affected nodules in each patient.

        Post #1 doesn't tell us what answer you expect for patient 2.

        I am inclined to trust Clyde's interpretation, because he is an epidemiologist and I'm something else (not sure what).

        So if the answer you want for patient 2 is 8, then Clyde's code examples are what you want.

        But if the answer you want for patient 2 is 3, then
        Code:
        by patient_id, sort: generate total_nodules = _N
        or
        Code:
        collapse (count) nodule, by(patient_id)
        should give you that answer, at least for your example data.
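
        To see the two interpretations side by side, a minimal sketch on the first three patients from post #1 may help. The variable names sum_nodules, n_nodules, tagged and n_distinct are made up for illustration. The last two steps assume you might have the same nodule identifier repeated within a patient, which does not happen in your example data. Note also that _N counts every row, while count() skips rows where nodule is missing.

        ```stata
        clear
        input float(patient_id nodule)
        1 1
        2 1
        2 3
        2 4
        3 1
        end

        * interpretation in #2: nodule is a quantity, so add it up
        by patient_id, sort: egen sum_nodules = total(nodule)

        * interpretation in #4: nodule is an identifier, so count nonmissing rows
        by patient_id: egen n_nodules = count(nodule)

        * assumption: identifiers may repeat, so count distinct values per patient
        egen byte tagged = tag(patient_id nodule)
        by patient_id: egen n_distinct = total(tagged)

        list, sepby(patient_id) noobs
        ```

        For patient 2 this gives sum_nodules = 8 and n_nodules = n_distinct = 3.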
