
  • replacing value based on group identifier

    Dear all,

    I consider myself a complete beginner in Stata and on Statalist, so I may have missed a previous post where this question was already answered. If so, please let me know!

    To my problem:
    My data set has two identifiers: one unique to each participant, let's call it "S", and one shared by all members of a family, let's call it "F".
    In each family "F", exactly one member "S" has a value in the variable "X", while all other members have a missing value.
    Now I would like to replace all the missing values in X with that family member's value.

    My approach so far:

    Code:
    replace X = value1 if F =="F1"
    replace X = value2 if F =="F2"
    This is not only time-consuming but also increases the risk of human error, as it has to be repeated for each family.
    I am sure there is a more elegant way to do this, and I would like to know how a more experienced Stata user would tackle it.

    Thanks for any input!

    I am using Stata 16.
    Last edited by Christoff Galvao; 25 Feb 2022, 03:55.
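A minimal sketch of a loop-free alternative, assuming the generic names S, F and X from the question and a few made-up values. The `max()` function of `egen` ignores missing values, so when computed by family it returns the one non-missing X (or missing if no member answered), and that value can then be copied to every member. The helper variable `X_family` is a hypothetical name.

```stata
* Toy data using the question's generic names S, F, X (values are made up)
clear
input str3(S F) byte X
"s1" "F1" 5
"s2" "F1" .
"s3" "F2" .
"s4" "F2" 7
"s5" "F2" .
end

* egen max() ignores missing values, so within each family it returns
* the single non-missing X (or missing if nobody in the family answered)
bysort F: egen X_family = max(X)

* copy the family value into the members whose X is missing
replace X = X_family if missing(X)
drop X_family
list, sepby(F)
```

After this runs, every member of F1 has X = 5 and every member of F2 has X = 7. Note that `max()` silently picks the largest value if a family has more than one distinct answer.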

  • #2
    Data example, please: https://www.statalist.org/forums/help#stata



    • #3
      This is what my data look like:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      *dataex SUJETO FAMILIA cs_2_2, count(30)
      clear
      input str7(SUJETO FAMILIA) byte cs_2_2
      "S00004" "F0002" .
      "F0006B" "F0006" .
      "F0006A" "F0006" .
      "F0008"  "F0008" .
      "S00043" "F0008" 2
      "S00045" "F0009" .
      "S00073" "F0015" .
      "S00211" "F0046" 4
      "S00213" "F0046" .
      "F0083C" "F0083" .
      "S00365" "F0083" 3
      "F0083"  "F0083" .
      "F0083B" "F0083" .
      "F0083D" "F0083" .
      "S00465" "F0103" .
      "S00528" "F0115" .
      "S00535" "F0118" .
      "F118A"  "F0118" .
      "S00536" "F0118" .
      "F0138C" "F0138" .
      "S0138A" "F0138" .
      "F0138D" "F0138" .
      "F0138B" "F0138" .
      "S00617" "F0138" 2
      "S00668" "F0149" .
      "S00677" "F0152" .
      "S00678" "F0152" .
      "F0155A" "F0155" .
      "S00687" "F0155" 4
      "S03697" "F0171" 3
      end



      • #4
        Perhaps you want something like

        Code:
        bysort FAMILIA (cs_2_2) : replace cs_2_2 = cs_2_2[_n-1] if missing(cs_2_2)
        See https://www.stata.com/support/faqs/d...issing-values/
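Why this works: Stata sorts numeric missing values after all non-missing values, so `bysort FAMILIA (cs_2_2)` places each family's answer in the family's first row. `replace` then works through the observations in order, so each newly filled value becomes the source for the next row. Under `by`, `cs_2_2[_n-1]` in a family's first row is missing, so families in which nobody answered stay missing. A sketch using a few rows copied from the posted data (variable `cs_2_2`), with an `assert` added as a check:

```stata
* A few rows copied from the -dataex- example in #3
clear
input str7(SUJETO FAMILIA) byte cs_2_2
"S00004" "F0002" .
"F0083C" "F0083" .
"S00365" "F0083" 3
"F0083"  "F0083" .
"S00211" "F0046" 4
"S00213" "F0046" .
end

* missing values sort last, so each family's answer comes first and is
* carried down row by row into the missing values below it
bysort FAMILIA (cs_2_2) : replace cs_2_2 = cs_2_2[_n-1] if missing(cs_2_2)

* check: every row now equals its family's first row
* (missing == missing is true, so unanswered families pass too)
bysort FAMILIA: assert cs_2_2 == cs_2_2[1]
list, sepby(FAMILIA) noobs
```

Every member of F0083 should then show 3 and every member of F0046 should show 4, while F0002 stays missing.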

        If any family has two or more distinct answers, this is unlikely to be the best solution.
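One way to test for that situation before filling, sketched on made-up data in which family F2 has two different answers. The `min()` and `max()` functions of `egen` both ignore missing values, so they differ only when a family has at least two distinct non-missing answers. `X_min`, `X_max` and `conflict` are hypothetical helper names.

```stata
* Made-up data: family F2 has two different non-missing answers
clear
input str3(S F) byte X
"s1" "F1" 5
"s2" "F1" .
"s3" "F2" 1
"s4" "F2" 4
"s5" "F2" .
end

* smallest and largest non-missing answer within each family
bysort F: egen X_min = min(X)
bysort F: egen X_max = max(X)

* flag families whose non-missing answers disagree
* (all-missing families give missing == missing, so they are not flagged)
generate byte conflict = X_min != X_max
list S F X if conflict, sepby(F)
```

If no observations are flagged, each family has at most one distinct answer and the fill above is safe.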



        • #5
          This is exactly what I was looking for.
          Thanks, Nick Cox, for the input and also for the link!

