Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create Additional Observations in a Loop and Replacing Values

    Hello,

    I am in the process of making documents for data reconciliation. All the data should have been double entered, but I have about 30 ids that were only entered once.

    This is going to sound like a weird request, but it’s part of the data cleaning process. How can I create additional observations for the data that was only entered once? I have a variable called dup that counts the number of times the id variable was entered into the dataset.
    Example Data in Current Form
    obs id dup gender
    1 12 1 2
    2 13 1 2
    3 14 2 1
    4 15 2 1
    5 16 2 3
    6 17 2 2
    7 18 1 1
    I would like to create an additional observation for each id that was entered once. In the new observation row, I would like only the id variable to have a value (the rest can be missing)

    What I want the data to look like
    obs id dup gender
    1 12 1 2
    2 13 1 2
    3 14 2 1
    4 15 2 1
    5 16 2 3
    6 17 2 2
    7 18 1 1
    8 12 . .
    9 13 . .
    10 18 . .
    I was working on quite a few loops but got pretty stuck. I settled on the one below but I am not quite sure how to replace in the next subsequent observation row. Any advice would be greatly appreciated.

    foreach x in 12 13 18 {
    set obs `=_N+1'
    replace id = `x' in 8
    }
    Last edited by Madison Avila; 24 Mar 2025, 15:26.

  • #2
    No loop needed.

    Code:
    expand 2 if dup == 1

    Comment


    • #3
      I agree that -expand 2- is the core of the solution here. But O.P. also wants all of the variables other than id to be set to missing. And the command in #2 won't do that part. A small expansion of the code does it:
      Code:
      expand 2 if dup == 1, gen(added)
      ds id added, not
      foreach v of varlist `r(varlist)' {
          replace `v' = . if added
      }
      drop added

      Comment


      • #4
        Wow, thank you Nick and Clyde for your responses. The solutions are so elegantly simple! And they are exactly what I needed.

        Clyde, I modified your code slightly because I had some string variables in my actual dataset too! I posted the final product below (in case it’s ever helpful to someone else).

        Thanks again!

        **list all string variables, and then list all numeric variables
        ds, has(type string) varwidth(22)
        ds, has(type numeric) varwidth(22)

        **Create globals for string variables, and numeric variables
        global strings put variable names here but make sure your id variable isn’t listed
        global numeric put variable names here but make sure your id variable isn’t listed

        **Create duplicates for all single entries
        expand 2 if dup == 1, gen(added)

        **make all other variables (apart from your id variable) missing
        foreach var of varlist $numeric {
        replace `var' = . if added
        }

        foreach var of varlist $strings {
        replace `var' = "" if added
        }
        **drop added
        drop added

        Comment


        • #5
          You could simplify your code something like this:

          Code:
          expand 2 if dup == 1, gen(added)
          
          ds id dup added, not
          
          local varlist `r(varlist)'
          
          ds `varlist', has(type string)
          
          foreach v in `r(varlist)'  {
               replace `v' = "" if added
          }
          
          ds `varlist', has(type numeric)
          
          foreach v in `r(varlist)'  {
               replace `v' = . if added
          }
          
          drop added
          Last edited by Nick Cox; 24 Mar 2025, 17:42.

          Comment


          • #6
            Wow, that's even better! Thank you, Nick.

            Comment


            • #7
              Or

              Code:
              expand 2 if dup == 1, gen(added)
              ds id dup added, not 
              
              foreach v in `r(varlist)' { 
                    capture replace `v' = "" if added 
                    if _rc replace `v' = . if added 
              } 
              
              drop added

              Comment


              • #8
                Another great option! Thanks so much.

                Comment

                Working...
                X