  • How to keep variables based on their position in the dataset

    I know how to specify keeping observations based on their ordered row position in a dataset, e.g.,
    Code:
    keep if _n==1
    But can one do the same for columns, i.e., keep columns based on their position in the dataset? Something like keep if column==1.
    I don't know how Stata refers to columns in this context and haven't been able to find a post or documentation on this.

  • #2
    There are a couple of indirect ways to approach this. Here is an example:
    Code:
    sysuse auto,clear
    qui unab allvar:_all // or use qui ds, which stores all variable names in r(varlist)
    loc myvar `: word 5 of `allvar''
    keep `myvar'
    This thread over at Stack Overflow discusses this in more detail.



    • #3
      By rows and columns in a dataset, I assume you mean observations and variables. Stata doesn't use the terminology of rows and columns except with regard to matrices (or vectors). You can get what you ask for. Here is one way:

      Code:
      . sysuse auto
      (1978 Automobile Data)
      
      . ds
      make          mpg           headroom      weight        turn          gear_ratio
      price         rep78         trunk         length        displacement  foreign
      
      . unab varlist : _all
      
      . tokenize "`varlist'"
      
      . mac li
      <stuff>
      _12:            foreign
      _11:            gear_ratio
      _10:            displacement
      _9:             turn
      _8:             length
      _7:             weight
      _6:             trunk
      _5:             headroom
      _4:             rep78
      _3:             mpg
      _2:             price
      _1:             make
      _varlist:       make price mpg rep78 headroom trunk weight length turn displacement
                      gear_ratio foreign
      
      . drop `7'
      
      . ds
      make          mpg           headroom      length        displacement  foreign
      price         rep78         trunk         turn          gear_ratio
      We put the complete variable list into a local macro, split it into tokens which were assigned to macros with names 1 up and then arbitrarily dropped the 7th variable, which was weight. keep could be used in the same way.
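
      As a minimal sketch of the keep variant mentioned above (not part of the original log; the positions 1, 3 and 7 are arbitrary choices for illustration), the same numbered macros can be passed to keep:

      ```stata
      * Sketch: keep variables by position using the macros set by tokenize
      sysuse auto, clear
      unab varlist : _all
      tokenize "`varlist'"
      * positions 1, 3 and 7 are make, mpg and weight in the auto data
      keep `1' `3' `7'
      ds
      ```

      The numbered macros still reflect the variable order before keep was run, so unab and tokenize would need to be repeated before referring to positions in the reduced dataset.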

      Note that you would now need to renumber to get "column" numbers without a break in the sequence of positive integers. That it's awkward to work with an unattractive scheme seems no vice to me.

      I've got to say that I am puzzled at the thought that anyone might want this. It seems an enormous step backwards not to work with evocative names. 30 years ago I worked with statistical software in which columns of data were indeed numbered. That was already a backward step: programming languages from the 1950s allowed meaningful names, within modest length limits.
      Last edited by Nick Cox; 16 Oct 2014, 17:23.



      • #4
        Another approach would be renaming variables with a prefix. Then you can drop the prefix when you're finished:

        Code:
        rename * c(##)_=, renumber
        
        keep c01 c02 c03
        
        rename c(##)_* .*
        Be sure to give all variable names the same number of digits, or Stata will not be able to determine if c1 is an abbreviation for c1_varname or c10_varname.



        • #5
          I saw this code and was wondering if there is a way to drop all variables whose position is higher than a given value.
          I am working with a dataset imported from Excel with a weird structure. There are 3 matrices located somewhat haphazardly on the sheet. The first matrix is MxM. While importing, I got the first row to be variable names. But there are sums, bits of information, etc. spread over the sheet. So I thought I could clean it up by counting how many observations there are in the first row (=M), putting that count in a local (I figured out that code), and then telling Stata to keep the first M variables. The number M and the total number of variables vary for each file. Is there a way?
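
          One possible approach, sketched along the lines of the earlier replies: put the variable names in a local macro, pick out the first and the Mth name, and keep the range between them. This is only a sketch. The auto data stand in for the imported Excel sheet, and M is set by hand to a hypothetical value; in practice it would come from your own count.

          ```stata
          * Sketch only: auto data and M = 5 are stand-ins for illustration
          sysuse auto, clear
          local M 5
          * all variable names, in dataset order
          unab allvars : _all
          * guard against M exceeding the number of variables
          local k : word count `allvars'
          assert `M' <= `k'
          * first and Mth variable names
          local first : word 1 of `allvars'
          local last : word `M' of `allvars'
          * a varlist range keeps everything from the first through the Mth variable
          keep `first'-`last'
          describe, short
          ```

          Using a varlist range avoids building the list of names in a loop, and it works the same way whatever the total number of variables in each file.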

