  • Keeping only variables in a list

    I have a question that feels painfully straightforward, but is giving me some trouble.

    I have a list of variable names (e.g., v1, v2, v3, v4...) and I want to loop over a bunch of datasets and drop from each dataset any variables that are not in the list.

    The simple idea is that I create a local list of these variables, and then essentially say:
    Code:
    loc varlist v1 v2 v3 v4...
    
    foreach dataset in ds1 ds2 ds3 {
    use `dataset', clear
    
    "keep if var is contained in varlist"
    }
    That last command in quotes is my problem (hence the fact that it isn't an actual command). I'm only familiar with using the if qualifier with the keep and drop commands when I'm keeping or dropping observations, but here I need to apply it to variables.



  • #2
    Code:
    help drop



    • #3
      Yeah, I read the help file, and the only examples where you drop or keep with a condition are dropping and keeping observations. I'm sure there's a straightforward answer, but it's not in the help file.



      • #4
        How about something simpler:
        Code:
        local varlist v1 v2 v3 v4 //etc.
        
        foreach dataset in ds1 ds2 ds3 {
            use `varlist' using `dataset', clear
            // DO SOMETHING WITH THE DATA NOW
        }
        This will work provided each of the data sets actually contains all of the variables mentioned in varlist. If some of those variables, however, don't exist in each data set, then the code above will fail. In that situation, it's a bit more complicated:

        Code:
        local varlist v1 v2 v3 v4 //etc.
        
        foreach dataset in ds1 ds2 ds3 {
            use `dataset', clear
            quietly ds _all
            local vbles `r(varlist)'
            local keepers: list varlist & vbles
            keep `keepers'
            // DO SOMETHING WITH THE DATA NOW
        }
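
        One edge case in the second block: if a data set contains none of the listed variables, keepers is empty and keep will stop with an error. A minimal sketch of a guard, assuming you would rather skip such a data set than halt the loop (the names v1-v4 and ds1-ds3 are the placeholders from above, and the messages are just illustrative):

        ```stata
        local varlist v1 v2 v3 v4 // placeholder names, as above

        foreach dataset in ds1 ds2 ds3 {
            use `dataset', clear
            quietly ds
            local vbles `r(varlist)'

            // variables that are both requested and present
            local keepers : list varlist & vbles
            // variables that were requested but are absent
            local missing : list varlist - vbles

            if "`missing'" != "" {
                display as text "`dataset': not found: `missing'"
            }
            if "`keepers'" == "" {
                // nothing to keep; move on to the next data set
                display as error "`dataset': none of the listed variables exist, skipping"
                continue
            }

            keep `keepers'
            // DO SOMETHING WITH THE DATA NOW
        }
        ```

        The list difference (varlist - vbles) is only there for reporting; drop that part if you don't care which variables were absent.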



        • #5
          That second block is exactly the solution I needed. I forgot to mention that some of the variables don't exist in each dataset, but you're one step ahead of me. Appreciate it!
