  • Replace specific values in all rows corresponding to each ID

    Hi there - I have a data set with multiple rows per ID and a variable (var1) that varies from row to row within an ID. If at least one row for an ID has var1 equal to PTSD Present, I would like every row of var1 for that ID to be replaced with PTSD Present; otherwise, var1 should stay as is. Below is an example of the data. For IDs 1, 2, and 4, all rows should end up as PTSD Present because each of those IDs has at least one PTSD Present row. For ID 3, both rows are already the same, so they should stay as is. I am using Stata/SE 17.0.

    eoc_id is the ID variable
    icd_ptsd_present is a float variable (0=PTSD Absent; 1=PTSD Present)
    eoc_id icd_ptsd_present
    1 PTSD Absent
    1 PTSD Present
    1 PTSD Present
    1 PTSD Absent
    2 PTSD Absent
    2 PTSD Absent
    2 PTSD Present
    3 PTSD Present
    3 PTSD Present
    4 PTSD Absent
    4 PTSD Present
    I have tried using the Stata bysort and replace commands indicated below; however, upon manual data inspection, this did not work appropriately within the IDs. For example, within some IDs, all rows for this var were replaced with 'PTSD Absent.'
    code: bysort eoc_id (icd_ptsd_present): replace icd_ptsd_present = icd_ptsd_present[1]

    Thank you for your help!
    Last edited by whitney wortham; 02 Jul 2024, 15:03. Reason: Edited to include Stata version

  • #2
    Try
    Code:
    bysort eoc_id: egen icd_ptsd_present_aux = max(cond(icd_ptsd_present=="PTSD Present",1,0))
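    The comparison with "PTSD Present" above only works if icd_ptsd_present is stored as a string; on a numeric variable Stata would stop with a type mismatch error. If the variable is numeric with value labels, as #1 describes it (0=PTSD Absent; 1=PTSD Present), a minimal sketch of the same idea would be the following. It assumes the 0/1 coding from #1 and uses the same new variable name; use one version or the other, not both.

    ```stata
    * Sketch only: assumes icd_ptsd_present is numeric, coded 0/1 as stated in #1.
    * egen's max() ignores missing values, so an ID gets 1 if any of its rows is 1.
    bysort eoc_id: egen icd_ptsd_present_aux = max(icd_ptsd_present)
    ```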



    • #3
      For example, within some IDs, all rows for this var were replaced with 'PTSD Absent.'
      code: bysort eoc_id (icd_ptsd_present): replace icd_ptsd_present = icd_ptsd_present[1]
      Yes, that is what would happen. If icd_ptsd_present is a string variable, sorting puts it into alphabetical order (strictly, ASCII order, but that gives the same result with this data), so "PTSD Absent" precedes "PTSD Present". If it is instead a numeric 0/1 variable with value labels, as described in #1, 0 sorts before 1, with the same effect. Either way, if an eoc_id has any observations where icd_ptsd_present is PTSD Absent, those sort to the top and, in particular, populate the first observation for that eoc_id, which is the one your command copies to every row. What you need instead is:
      Code:
      bysort eoc_id (icd_ptsd_present): replace icd_ptsd_present = icd_ptsd_present[_N]
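      To check this on a small scale, here is a sketch that types in the example data from #1 and applies the command above. It assumes the variable is stored as a string (str12); the describe line is there so you can confirm what the storage type is in your own data.

      ```stata
      * Toy data typed in from the example in #1 (string version of the variable)
      clear
      input byte eoc_id str12 icd_ptsd_present
      1 "PTSD Absent"
      1 "PTSD Present"
      1 "PTSD Present"
      1 "PTSD Absent"
      2 "PTSD Absent"
      2 "PTSD Absent"
      2 "PTSD Present"
      3 "PTSD Present"
      3 "PTSD Present"
      4 "PTSD Absent"
      4 "PTSD Present"
      end

      * Storage type: str# means string; byte/float plus a value label means labeled numeric
      describe icd_ptsd_present

      * Within each ID, sort on the variable and copy the last (highest-sorting) value to every row
      bysort eoc_id (icd_ptsd_present): replace icd_ptsd_present = icd_ptsd_present[_N]

      list, sepby(eoc_id)
      ```

      One caveat if your variable turns out to be numeric: numeric missing values sort after all nonmissing values, so for an ID with a missing value, [_N] would pick up the missing value rather than 1.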
      That said, you should really do what alejoforero recommends in #2. That will give you a numeric 0/1 variable encoding the presence of PTSD. That will, in most situations, be much more useful in Stata than the string variable. If you want to be reminded of what that numeric variable encodes, you can do:
      Code:
      label define ptsd 0 "PTSD Absent" 1 "PTSD Present"
      label values icd_ptsd_present_aux ptsd
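      If you want to confirm the result, a short check along these lines may help. It assumes icd_ptsd_present_aux was created as in #2; one_per_id is just a hypothetical helper name.

      ```stata
      * Every eoc_id should now carry a single value of the new variable
      bysort eoc_id (icd_ptsd_present_aux): assert icd_ptsd_present_aux[1] == icd_ptsd_present_aux[_N]

      * Count IDs (not rows) by PTSD status, using one tagged observation per eoc_id
      egen byte one_per_id = tag(eoc_id)
      tabulate icd_ptsd_present_aux if one_per_id
      ```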
