
  • Selecting variables on basis of digits in variable name

    Dear users,

    I was wondering about the following. I have a large dataset with many variables (about 14,000). I need to keep about 150 of them and can drop the rest, so I thought about using the keep command.
    One group of variables is s1age, s2age, ..., s14age, and I need to keep only s5age, ..., s14age. I could type them out individually, but I need to do something similar for many variable groups, so that would be very inconvenient. Selecting them as a consecutive range will not work either, because there are other variables in between. Does anyone know a fast way to do this?

    Hope someone knows a solution to my problem.

    Thanks in advance.

  • #2
    Code:
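    * build the list s5age s6age ... s14age
    local vars
    forvalues i = 5/14 {
        local vars "`vars' s`i'age"
    }
    * keep drops every variable not listed, so add any others you need
    keep `vars'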
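Since the question mentions many such variable groups, a variant that selects on the digits in the variable name itself may save typing. This is a minimal sketch under stated assumptions: the toy variables (s1age to s14age, s1emp to s14emp) and the id variable are hypothetical, and it assumes every wave-specific name has the shape s<digits><letters>, as in s5age. It uses Stata's regexm() and regexs() functions to extract the wave number and keeps waves 5 and above for every stem at once.

```stata
* Hypothetical toy data: an id plus s1age-s14age and s1emp-s14emp
clear
set obs 1
generate id = 1
forvalues i = 1/14 {
    generate s`i'age = 20 + `i'
    generate s`i'emp = 0
}

* Loop over every variable starting with s; for names shaped like
* s<digits><letters>, extract the digits and collect waves 5 and above
local keepvars
foreach v of varlist s* {
    if regexm("`v'", "^s([0-9]+)[a-z]+") {
        * regexs(0) is the whole match: require it to be the full name
        if regexs(0) == "`v'" & real(regexs(1)) >= 5 {
            local keepvars `keepvars' `v'
        }
    }
}

* keep drops everything not listed, so id (and any others) must be added
keep id `keepvars'
describe, short
```

Names that do not follow the assumed pattern (for example sex, or anything with trailing digits) are simply skipped, so the other variables you need still have to be listed explicitly in the keep command.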



    • #3
      You really do not want 14000 variables, or even 10000 variables after getting rid of the 1's 2's 3's and 4's.

      For purposes of this discussion I am going to assume each observation of your data represents data from one individual over 14 waves of a survey, but the details are not important.

      Your data is organized in what is called a "wide layout", with one observation for each individual. A "long layout" consists of 14 observations for each individual, one for each of the 14 waves.

      The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. The sort of problems you will encounter trying to use your wide data will almost certainly be solved by reshaping the data to a long layout. It is much easier, for example, to keep all the observations from waves 5 through 14 in the long layout than it is to keep all the variables from waves 5 through 14 in the wide layout.

      Here is the technique, demonstrated with just two individuals and 5 waves, with the objective of keeping waves 3 through 5. There are three basic variables (age, emp, and junk), of which only two are desired in the final results.
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int(id s1age s1emp s1junk s2age s2emp s2junk s3age s3emp s3junk s4age s4emp s4junk s5age s5emp s5junk)
      1001 24 0 . 25 0 . 26 1 . 27 1 . 28 1 .
      1002 33 1 . 34 1 . 35 1 . 36 1 . 37 0 .
      end
      keep id s*age s*emp
      reshape long s@age s@emp, i(id) j(wave)
      sort id wave
      keep if wave>2
      list, sepby(id)
      Code:
      . keep id s*age s*emp
      
      . reshape long s@age s@emp, i(id) j(wave)
      (j = 1 2 3 4 5)
      
      Data                               Wide   ->   Long
      -----------------------------------------------------------------------------
      Number of observations                2   ->   10          
      Number of variables                  11   ->   4           
      j variable (5 values)                     ->   wave
      xij variables:
                        s1age s2age ... s5age   ->   sage
                        s1emp s2emp ... s5emp   ->   semp
      -----------------------------------------------------------------------------
      
      . sort id wave
      
      . keep if wave>2
      (4 observations deleted)
      
      . list, sepby(id)
      
           +---------------------------+
           |   id   wave   sage   semp |
           |---------------------------|
        1. | 1001      3     26      1 |
        2. | 1001      4     27      1 |
        3. | 1001      5     28      1 |
           |---------------------------|
        4. | 1002      3     35      1 |
        5. | 1002      4     36      1 |
        6. | 1002      5     37      0 |
           +---------------------------+
      Now you should read the output of help reshape to understand what I have demonstrated. In doing so, you will see that it would be simple to use reshape wide to turn this back into a wide layout. Again, that almost certainly would be a mistake, so I do not demonstrate it.
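Building on the demonstration above: when there are many variable groups, the stub list passed to reshape can be assembled in a loop instead of typed out. This is a minimal sketch; the toy data, the 14 waves, and the third stem inc are hypothetical, chosen only to mirror the question.

```stata
* Hypothetical toy data in the wide layout: 2 individuals, 14 waves,
* and three stems (age, emp, inc)
clear
input int id
1001
1002
end
forvalues i = 1/14 {
    generate s`i'age = 20 + `i'
    generate s`i'emp = mod(`i', 2)
    generate s`i'inc = 100 * `i'
}

* Build the stub list "s@age s@emp s@inc" from the stems
local stems age emp inc
local stubs
foreach stem of local stems {
    local stubs `stubs' s@`stem'
}

* Reshape to the long layout, then keep waves 5 through 14
reshape long `stubs', i(id) j(wave)
keep if inrange(wave, 5, 14)
sort id wave
```

With real data only the stems local needs to grow; because the wave restriction is applied to observations after reshaping, no variable names have to be spelled out wave by wave.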
      Last edited by William Lisowski; 26 Feb 2022, 08:33.
