Giving a label to a large list of variables based on another dataset

Mike Major

Join Date: Jun 2019
Posts: 10

21 Nov 2019, 15:53

I have 250 variables that I need to label, e.g. subj101, subj102, subj103, subj104.

I have a simple spreadsheet of the variable names and the labels I would like:

varname	desired_label
subj101	Any foreign language
subj102	Any foreign language Extension unit
subj103	Aboriginal Studies
subj104	Agriculture
subj105	Ancient History
subj106	Biology
subj107	Business Studies
subj108	Ceramics
subj109	Chemistry
subj110	Community and Family Studies
subj111	Computing Applications
subj112	Dance
subj113	Design and Technology
subj114	Drama
subj115	Earth and Environmental Science
subj116	Economics

How can I easily bulk relabel all of these?

Cheers,
Mike


Mike Lacy

Join Date: Apr 2014
Posts: 2416

21 Nov 2019, 16:55

While there are probably more elegant ways to do this, one easy solution is to append the unlabeled file to the labeled file. The order matters: the file with the labels must be resident in memory, and the file without labels is appended as the "using" file, because append keeps the variable labels of the dataset in memory. One then keeps only the observations that came from the unlabeled file.

Code:

// Make example of original file to be labeled.
sysuse auto, clear
keep weight turn price // just a subset
// strip out labels to model user's situation
label var weight ""
label var turn ""
label var price ""
tempfile orig
save `orig'
desc weight turn price // look ma no labels
// Make example file that has labels for illustration.  It has variables outside the subset
// of interest but this will not be a problem.
sysuse auto, clear
//
// Actual solution starts here.  Note that labeled file is resident.
keep in 1  // Only 1 observation needed to carry the labels
append using `orig', gen(source)
keep if source == 1
keep weight turn price // Just the variables of interest
desc


Mike Major

Join Date: Jun 2019

Posts: 10
#3

21 Nov 2019, 22:27

Thanks very much!
daniel klein

Join Date: Mar 2014

Posts: 3850
#4

22 Nov 2019, 03:37

Mike Lacy outlined a nice solution. Note that you would not even have to keep a single observation in the labeled dataset. Thus,

Code:

// Actual solution starts here.  Note that labeled file is resident.
keep in 1  // Only 1 observation needed to carry the labels
append using `orig', gen(source)
keep if source == 1
keep weight turn price // Just the variables of interest

could be reduced to

Code:

// Actual solution starts here.  Note that labeled file is resident.
drop in 1/l // <- new; dropping all observations still keeps the variables and labels
append using `orig' // <- the labeled file contributes no observations, so no need to track the source
keep weight turn price // Just the variables of interest

Roger Newson uses this trick in his vallabsave package (SSC) for value labels.

From the initial query, I got the impression that there was no labeled dataset but a spreadsheet that maps variable names to variable labels. Obviously, the outlined approach will not work in this situation.
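For the spreadsheet case, one possible approach is to read the name/label pairs into local macros and then loop over them in the data to be labeled. The following is only a minimal sketch, not code posted or tested in this thread: the two datasets are built inline as stand-ins, and in practice the mapping would presumably come from something like import excel or import delimited, with file names and column layout that are assumptions here.

```stata
// Stand-in for the spreadsheet of names and labels (hypothetical layout:
// one string column varname, one string column desired_label).
clear
input str7 varname str40 desired_label
"subj101" "Any foreign language"
"subj102" "Any foreign language Extension unit"
"subj103" "Aboriginal Studies"
"subj104" "Agriculture"
end

// Copy each name/label pair into numbered local macros so they
// survive loading the other dataset.
local n = _N
forvalues i = 1/`n' {
    local name`i'  = varname[`i']
    local label`i' = desired_label[`i']
}

// Stand-in for the real data: four unlabeled variables.
clear
set obs 5
forvalues v = 101/104 {
    generate byte subj`v' = 0
}

// Apply the labels, skipping any name that is not in the data.
forvalues i = 1/`n' {
    capture confirm variable `name`i'', exact
    if _rc == 0 {
        label variable `name`i'' `"`label`i''"'
    }
}
describe subj*
```

The capture confirm variable check means a name in the spreadsheet with no matching variable is skipped silently rather than stopping the loop; drop that check if a mismatch should raise an error instead.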

Best
Daniel