How to keep variables based on their position in the dataset

Samir Kelada

Join Date: Jun 2014

Posts: 11
#1

16 Oct 2014, 16:50

I know how to specify keeping observations based on their ordered row position in a dataset, e.g.,

Code:

keep if _n==1

But can one do the same for columns, i.e., keep columns based on their position in the dataset? So something like keep if column==1.
I don't know how Stata refers to columns in this context and haven't been able to find a post or documentation on this.
Aspen Chen

Join Date: Apr 2014

Posts: 114
#2

16 Oct 2014, 17:16

There are a couple of indirect ways to approach this. Here is an example:

Code:

sysuse auto, clear
qui unab allvar : _all   // or use qui ds, which stores all variable names in r(varlist)
loc myvar `: word 5 of `allvar''
keep `myvar'

This thread over at Stack Overflow discusses this in more detail.
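The comment in the code mentions -ds- as an alternative to -unab-. A minimal sketch of that variant, assuming the auto data as in the example: -ds- leaves the variable names in r(varlist), and the extended macro function -word # of- picks out the one at a given position.

```stata
* Sketch: pick a variable by position using the r(varlist) left by -ds-
sysuse auto, clear
quietly ds                               // stores all variable names in r(varlist)
local myvar : word 5 of `r(varlist)'     // 5th variable in dataset order
display "`myvar'"                        // headroom in the auto data
keep `myvar'
```

The result is the same as with -unab-; which one you use is mostly a matter of taste.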
Nick Cox

Join Date: Mar 2014

Posts: 35454
#3

16 Oct 2014, 17:18

By rows and columns in a dataset, I assume you mean observations and variables. Stata doesn't use the terminology of rows and columns except with regard to matrices (or vectors). You can get what you ask for. Here is one way:

Code:

. sysuse auto
(1978 Automobile Data)
. ds
make          mpg           headroom      weight        turn          gear_ratio
price         rep78         trunk         length        displacement  foreign
. unab varlist : _all
. tokenize "`varlist'"
. mac li
<stuff>
_12:            foreign
_11:            gear_ratio
_10:            displacement
_9:             turn
_8:             length
_7:             weight
_6:             trunk
_5:             headroom
_4:             rep78
_3:             mpg
_2:             price
_1:             make
_varlist:       make price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign
. drop `7'
. ds
make          mpg           headroom      length        displacement  foreign
price         rep78         trunk         turn          gear_ratio

We put the complete variable list into a local macro and split it into tokens, which were assigned to local macros named 1, 2, and so on. We then arbitrarily dropped the 7th variable, which was weight. keep could be used in the same way.
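To illustrate the remark that keep works the same way, here is a minimal sketch on the auto data; the positions 1, 3 and 12 are chosen arbitrarily for the example.

```stata
* Sketch: keep (rather than drop) variables by position via -tokenize-
sysuse auto, clear
unab varlist : _all          // full variable list in dataset order
tokenize "`varlist'"         // locals 1, 2, ... hold the 1st, 2nd, ... names
keep `1' `3' `12'            // make, mpg and foreign in the auto data
```

keep retains the variables in their existing dataset order, not in the order you list them.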

Note that you would now need to renumber to get "column" numbers without a break in the sequence of positive integers. That it is awkward to work with an unattractive scheme seems no vice to me.
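One way to do that renumbering, sketched under the assumption that you simply rebuild the list from the current dataset: run -unab- and -tokenize- again after the drop, so that the positional macros match the variables that remain.

```stata
* Sketch: after dropping, re-tokenize so the numbers are contiguous again
sysuse auto, clear
unab varlist : _all
tokenize "`varlist'"
drop `7'                     // drops weight
unab varlist : _all          // rebuild the list from the current dataset
tokenize "`varlist'"         // re-assign locals 1, 2, ... without a gap
display "`7'"                // now length, formerly the 8th variable
```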

I've got to say that I am puzzled at the thought that anyone might want this. It seems an enormous step backwards not to work with evocative names. 30 years ago I worked with statistical software in which columns of data were indeed numbered. That was already a backward step: programming languages from the 1950s allowed meaningful names, within modest length limits.

Last edited by Nick Cox; 16 Oct 2014, 17:23.
Mike Barker

Join Date: Apr 2014

Posts: 37
#4

17 Oct 2014, 07:13

Another approach is to rename the variables with a numeric prefix, which you can remove when you're finished:

Code:

rename * c(##)_=, renumber
keep c01 c02 c03
rename c(##)_* .*

Be sure to give all the numeric prefixes the same number of digits (the (##) pads them to two), or Stata will not be able to determine whether c1 is an abbreviation for c1_varname or c10_varname.
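The same prefix idea can also be written as an explicit loop, which some may find easier to read than the -rename group- syntax. This is a sketch on the auto data, not Mike's code: it zero-pads the position with the display format %02.0f, and it keeps with wildcards such as c01_* so it does not depend on variable-name abbreviation being switched on.

```stata
* Sketch: the prefix idea with a loop instead of -rename group- options
sysuse auto, clear
local i = 0
foreach v of varlist _all {
    local ++i
    local num : display %02.0f `i'     // zero-padded: 01, 02, ..., 12
    rename `v' c`num'_`v'              // e.g. make -> c01_make
}
keep c01_* c02_* c03_*                  // wildcards avoid relying on abbreviation
foreach v of varlist _all {
    rename `v' `=substr("`v'", 5, .)'  // strip the 4-character "cNN_" prefix
}
```

The substr() call assumes two-digit prefixes, i.e. fewer than 100 variables; with more, you would pad to three digits and start at position 6.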
Elisabeth Revdahl

Join Date: Aug 2019

Posts: 2
#5

08 Aug 2019, 00:07

I saw this code and was wondering whether there is a way to drop all variables whose position is higher than a given value.
I am working with a dataset imported from Excel that has an awkward structure. There are three matrices placed somewhat arbitrarily on the sheet; the first is M×M. When importing, I got the first row to be used as variable names, but there are sums, bits of information, etc. spread over the sheet. So I thought I could clean it up by counting how many values there are in the first row (=M), putting that in a local (I have figured out that code), and then telling Stata to keep the first M variables. Both M and the total number of variables vary from file to file. Is there a way?
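One possible approach, sketched under the assumption that M is already stored in a local macro (it is hard-coded as 5 here, on the auto data, purely for illustration; the poster's own code for computing M is not shown): look up the 1st and Mth variable names and keep the varlist range between them, since a range such as make-headroom follows dataset order.

```stata
* Sketch: keep the first M variables, with M held in a local macro
sysuse auto, clear
local M 5                              // hypothetical; in practice computed per file
unab allvars : _all
local first : word 1 of `allvars'
local last  : word `M' of `allvars'
keep `first'-`last'                    // a varlist range follows dataset order
```

If M could exceed the number of variables, `last' would be empty and keep would fail, so a check such as assert `M' <= c(k) beforehand may be worthwhile.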