Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • combine rows (observations) and columns (variables)

    I have two datasets and want to keep only some variables (columns) in one dataset that are found as rows in another dataset. For example,

    Code:
    clear
    input a_a b_b c_c d_d e_e f_f g_g
    1 2 3 4 5 6 7
    end
    save temp1, replace
    
    clear
    input str16 variable_name age
    "a_a" 22
    "c_c" 26
    "g_g" 29
    end
    save temp2, replace

    The dataset temp1 currently has 7 variables (columns), but I only want to keep the 3 variables listed in variable_name of the temp2 dataset. My actual problem is much larger, need to keep/drop 1000s of variables, else I could just do it by hand. (You can disregard the age variable in the temp2 dataset as well as the variable values in temp1 dataset.)

    The resulting dataset should look like this:

    Code:
    clear
    input a_a c_c g_g
    1 3 7
    end
    save temp3, replace
    Thanks in advance.
    Last edited by Kyle Smith; 08 Mar 2022, 17:27.

  • #2
    Code:
    clear
    input str16 variable_name age
    "a_a" 22
    "c_c" 26
    "g_g" 29
    end
    save temp2, replace
    
    use temp2, clear
    levelsof variable_name, local(vbles) clean
    
    use temp1, clear
    ds `vbles', not
    drop `r(varlist)'
    des
    Note: This code assumes that everything that appeares in variable_name in file temp2 is actually a variable in temp1. If the values of variable_name in temp2 contain among them things that are not names of variables in temp1, a slightly different approach is needed:

    Code:
    clear
    input a_a b_b c_c d_d e_e f_f g_g
    1 2 3 4 5 6 7
    end
    save temp1, replace
    
    clear
    input str16 variable_name age
    "a_a" 22
    "c_c" 26
    "g_g" 29
    "h_h" 31
    end
    save temp2, replace
    
    use temp2, clear
    levelsof variable_name, local(vbles) clean
    
    use temp1, clear
    quietly ds
    local temp1_vars `r(varlist)'
    local to_drop: list temp1_vars - vbles
    drop `to_drop'
    des

    Comment


    • #3
      @Clyde: Thank you for your quick reply.

      I get one of two errors (the exact error maybe depends on if I run it all at once or bit by bit):

      first:

      . ds `vbles', not
      variable a_a not found
      r(111);

      .
      . drop `r(varlist)'
      varlist, if exp, or in range required
      r(100);


      second:

      '`' cannot be read as a number


      when I run your exact code on my machine (my example data, not actual data). Not sure what is going on?

      Comment


      • #4
        Because these codes use local macros, they cannot be run in pieces and must be run all at once.

        But doing that, I cannot reproduce your error messages. Both approaches run correctly on my setup:

        Code:
        . clear
        
        . input a_a b_b c_c d_d e_e f_f g_g
        
                   a_a        b_b        c_c        d_d        e_e        f_f        g_g
          1. 1 2 3 4 5 6 7
          2. end
        
        . save temp1, replace
        file temp1.dta saved
        
        .
        . clear
        
        . input str16 variable_name age
        
                variable_name        age
          1. "a_a" 22
          2. "c_c" 26
          3. "g_g" 29
          4. end
        
        . save temp2, replace
        file temp2.dta saved
        
        .
        . use temp2, clear
        
        . levelsof variable_name, local(vbles) clean
        a_a c_c g_g
        
        .
        . use temp1, clear
        
        . ds `vbles', not
        b_b  d_d  e_e  f_f
        
        . drop `r(varlist)'
        
        . des
        
        Contains data from temp1.dta
         Observations:             1                  
            Variables:             3                  8 Mar 2022 15:21
        ------------------------------------------------------------------------------------------------------------------------------------------
        Variable      Storage   Display    Value
            name         type    format    label      Variable label
        ------------------------------------------------------------------------------------------------------------------------------------------
        a_a             float   %9.0g                 
        c_c             float   %9.0g                 
        g_g             float   %9.0g                 
        ------------------------------------------------------------------------------------------------------------------------------------------
        Sorted by:
             Note: Dataset has changed since last saved.
        
        .
        end of do-file
        and

        Code:
        . clear
        
        . input a_a b_b c_c d_d e_e f_f g_g
        
                   a_a        b_b        c_c        d_d        e_e        f_f        g_g
          1. 1 2 3 4 5 6 7
          2. end
        
        . save temp1, replace
        file temp1.dta saved
        
        .
        . clear
        
        . input str16 variable_name age
        
                variable_name        age
          1. "a_a" 22
          2. "c_c" 26
          3. "g_g" 29
          4. "h_h" 31
          5. end
        
        . save temp2, replace
        file temp2.dta saved
        
        .
        . use temp2, clear
        
        . levelsof variable_name, local(vbles) clean
        a_a c_c g_g h_h
        
        .
        . use temp1, clear
        
        . quietly ds
        
        . local temp1_vars `r(varlist)'
        
        . local to_drop: list temp1_vars - vbles
        
        . drop `to_drop'
        
        . des
        
        Contains data from temp1.dta
         Observations:             1                  
            Variables:             3                  8 Mar 2022 15:23
        ------------------------------------------------------------------------------------------------------------------------------------------
        Variable      Storage   Display    Value
            name         type    format    label      Variable label
        ------------------------------------------------------------------------------------------------------------------------------------------
        a_a             float   %9.0g                 
        c_c             float   %9.0g                 
        g_g             float   %9.0g                 
        ------------------------------------------------------------------------------------------------------------------------------------------
        Sorted by:
             Note: Dataset has changed since last saved.
        
        .
        end of do-file
        Scrutinize your code to see what you are doing differently. The '`' cannot be read as a number message is something that is often encountered when reading in data from a -dataex- output that is missing the -end- statement, which is the case for your first -dataex- in #1.

        Comment


        • #5
          Clyde Schechter : My apologies. It worked when I ran it all at the same time. You're the man. Thanks for the very speedy and effective help.

          Comment

          Working...
          X