  • Compare variable lists between two datasets

    Hello,

    I'm using Stata 15.1 and want to compare the variable lists of two datasets. It seems that cfvars by Kit Baum did this in Stata 9, but the package is no longer available from SSC. The only suggestions I've found are to use cf, which compares variable values rather than just the variable lists. Any suggestions are appreciated.

  • #2
    Here's a "no frills" solution:

    Code:
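    // Read each file's variable names without loading the data
    des using file1, varlist
    local file1_vars `r(varlist)'
    des using file2, varlist
    local file2_vars `r(varlist)'
    
    // Intersection and the two set differences
    local common: list file1_vars & file2_vars
    local file1_only: list file1_vars - file2_vars
    local file2_only: list file2_vars - file1_vars
    
    // Variables in both files, then those only in file1, then those only in file2
    display `"`common'"'
    display `"`file1_only'"'
    display `"`file2_only'"'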
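
    For anyone who wants to try the approach before pointing it at real files, here is a minimal self-contained sketch. The two temporary files built from the shipped auto data and the labels in the display lines are illustrative assumptions, not part of the solution above.

    ```stata
    * Build two small example files from the auto data (illustrative only)
    sysuse auto, clear
    tempfile file1 file2
    preserve
    keep make price mpg
    save `file1'
    restore
    keep make mpg weight
    save `file2'

    * Same steps as above, applied to the temporary files
    describe using `file1', varlist
    local file1_vars `r(varlist)'
    describe using `file2', varlist
    local file2_vars `r(varlist)'

    local common: list file1_vars & file2_vars
    local file1_only: list file1_vars - file2_vars
    local file2_only: list file2_vars - file1_vars

    * Labelled output so the three lists are easy to tell apart
    display as text "In both:       " as result "`common'"
    display as text "Only in file1: " as result "`file1_only'"
    display as text "Only in file2: " as result "`file2_only'"
    ```

    With these example files the three lines should show make mpg, price, and weight respectively.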



    • #3
      In reply to Clyde Schechter (#2):
      This worked perfectly. Thank you.



      • #4
        Hi! I have a question related to this issue. How would this work if I wanted to compare variable lists across three different datasets?



        • #5
          Bear in mind that with 3 data sets you now face 6 pairwise comparisons: 1 but not 2, 2 but not 1, 1 but not 3, 3 but not 1, 2 but not 3, and 3 but not 2. There are also 3 "all but one" comparisons (variables in two of the files but not the third). So this gets a little complicated, and I think I would take a different approach:
          Code:
          des using file1, varlist
          local file1_vars `r(varlist)'
          des using file2, varlist
          local file2_vars `r(varlist)'
          des using file3, varlist
          local file3_vars `r(varlist)'
          
          //    CREATE A LIST OF ALL THE VARIABLES IN ANY OF THE 3 DATA SETS
          local all_vars: list file1_vars | file2_vars
          local all_vars: list all_vars | file3_vars
          
          local n_vars: word count `all_vars'
          clear
          set obs `n_vars'
          gen varname = ""
          gen byte in_1 = 0
          gen byte in_2 = 0
          gen byte in_3 = 0
          //    FILL ONE OBSERVATION PER VARIABLE NAME
          local cursor 1
          
          foreach a of local all_vars {
              replace varname = "`a'" in `cursor'
              forvalues i = 1/3 {
                  //    posof IS 0 (FALSE) WHEN `a' IS NOT IN THE LIST
                  if `:list posof "`a'" in file`i'_vars' {
                      replace in_`i' = 1 in `cursor'
                  }
              }
              local ++cursor
          }
          This gives you your results as a data set containing the name of each variable that appears in any of the data sets, plus three dichotomous indicators for appearance in each of the three data sets. I think that's an easier way to organize the output here. If you then want to know, for example, which variables appear in both file1 and file3 (regardless of whether they appear in file2), you can run -list varname if in_1 & in_3-. If you want those in file1 and file3 that are specifically not in file2, -list varname if in_1 & in_3 & !in_2- does it. And so on.
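
          As a rough sketch of how such queries might look in practice, the block below types in a tiny stand-in for the results data set (the three rows are made-up values for illustration) and then summarizes it. The names n_files and pattern are hypothetical additions, not part of the code above.

          ```stata
          * Stand-in for the results data set; values are illustrative only
          clear
          input str10 varname byte in_1 byte in_2 byte in_3
          "make"   1 1 1
          "price"  1 0 1
          "weight" 0 1 0
          end

          * Number of files each variable appears in
          gen byte n_files = in_1 + in_2 + in_3

          * Variables missing from at least one file
          list varname in_1 in_2 in_3 if n_files < 3, noobs

          * In file1 and file3 but specifically not in file2
          list varname if in_1 & in_3 & !in_2, noobs

          * One string per membership pattern, e.g. "101" = in files 1 and 3 only
          egen pattern = concat(in_1 in_2 in_3)
          tabulate pattern
          ```

          The pattern variable is one way to see every combination at once instead of writing a separate -list- for each comparison.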


