Renaming duplicate observations

Amie Osborn

Join Date: Sep 2015

Posts: 27
#1

17 Sep 2015, 13:32

Hello Statalist users,

I have found plenty of information on the duplicates command; however, most of it concerns dropping duplicate observations or renaming variables. Instead, I want to rename my duplicate observations. The data I received has categories (e.g. crops, livestock) with multiple subcategories. Each category and subcategory is imported as an observation, and I will eventually be transposing the data. However, some subcategories carry the same label under different categories. I need to relabel these subcategory observations so that each one is unique before I transpose.

Below is a simplified version of my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str23(statename B)
"Crop"        "1"
"consumption" "2"
"adjustment"  "3"
"Animal"      "4"
"consumption" "5"
"adjustment"  "6"
end

I have found a quick fix; however, this is not dynamic.

Code:

sort statename
quietly by statename: gen dup=cond(_N==1,0,_n)
tabulate dup
replace statename = "consumption_crop" if dup>1
sort statename
quietly by statename: replace dup=cond(_N==1,0,_n)
tabulate dup
replace statename = "adjustment_crop" if dup>1
sort B

Again, this works for this situation, but I want the code to be dynamic and handle whatever names happen to be duplicated, not just the current ones.

I tried to adapt the code from the discussion about renaming variables, as it seemed relevant. However, I could not work out how to edit the code Daniel Klein posted.

Code:

// Create data set
clear
input str23 A str23 B // note str23
"This is my desired name" "This is my desired name"
"9098" "8676878"
end

// rename
foreach var of var A-B {
    loc original_text : di `var'[1]
    loc newname = strtoname(`"`original_text'"')
    loc newname : permname `newname'
    ren `var' `newname'
    char `newname'[original_text] `"`original_text'"'
}

d ,f
l
char l

I recognize that permname would help in this process, but I am not sure how to use it to rename a duplicated observation instead of a variable.

I appreciate any and all information regarding this topic.

Regards,
Amie

Renaming variables using observations and handling duplicates - Statalist

http://www.statalist.org


Robert Picard

Join Date: Mar 2014
Posts: 1536

17 Sep 2015, 19:11

I think the following will create your unique variable names:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str23(statename B)
"Crop"        "1"
"consumption" "2"
"adjustment"  "3"
"Animal"      "4"
"consumption" "5"
"adjustment"  "6"
end

* rename the variable, these are not state names
rename statename category

* you need an observation identifier
gen long obs = _n

* tag observations that are unique, we assume there are main categories
bysort category: gen tag = _N == 1

* restore the sort order
sort obs

* use a running sum to create a main category identifier
gen mainid = sum(tag) 

* create a unique name
bysort mainid (obs): gen newname = category[1]
by mainid: replace newname = newname + " " +  category if _n > 1

* make the name a valid Stata name that can be used as a variable name
gen goodname = strtoname(newname, 20)

list, sepby(mainid)
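
The names produced above can be checked for uniqueness before transposing. The following is a minimal sketch, not part of the original answer: it rebuilds the example data, repeats the steps above, and then uses isid to confirm that goodname identifies each observation. The numbering fallback at the end is a hypothetical safeguard for the case where the same subcategory appears twice within one main category; in this example it changes nothing.

```stata
* Rebuild the example data and the names from the code above
clear
input str23(category B)
"Crop"        "1"
"consumption" "2"
"adjustment"  "3"
"Animal"      "4"
"consumption" "5"
"adjustment"  "6"
end
gen long obs = _n
bysort category: gen tag = _N == 1
sort obs
gen mainid = sum(tag)
bysort mainid (obs): gen newname = category[1]
by mainid: replace newname = newname + " " + category if _n > 1
gen goodname = strtoname(newname)

* Hypothetical fallback (an assumption, not from the thread):
* append a counter to any names that still collide
bysort goodname (obs): replace goodname = goodname + "_" + string(_n) if _N > 1

* isid exits with an error if goodname does not uniquely identify the observations
isid goodname

* restore the original order
sort obs
```

If isid stops with an error, the remaining duplicates can be inspected with duplicates list goodname.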

Note that if this is a continuation of yesterday's thread, I suspect you would be better off making a master list of all possible variable labels across all your files and deciding manually on an appropriate, unique name for each. That is, create a dataset that contains the unique labels and the variable name you want to use for each, and then use merge to add the new variable names to your original data.
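
A rough sketch of that master-list approach, under stated assumptions: the key variable newname holds the combined label built above, and the variable varname and every value in it (crop_cons, anim_adj, and so on) are made up for illustration. The tempfile stands in for a master list that would normally be saved as a permanent dataset.

```stata
* Hypothetical master list: one row per unique label and the name chosen for it
clear
input str40 newname str32 varname
"Crop"               "crop"
"Crop consumption"   "crop_cons"
"Crop adjustment"    "crop_adj"
"Animal"             "animal"
"Animal consumption" "anim_cons"
"Animal adjustment"  "anim_adj"
end
tempfile master
save `master'

* Data to be renamed; newname is the combined label from the code above
clear
input str40 newname str5 B
"Crop"               "1"
"Crop consumption"   "2"
"Crop adjustment"    "3"
"Animal"             "4"
"Animal consumption" "5"
"Animal adjustment"  "6"
end
gen long obs = _n

* 1:1 fails if a label is duplicated on either side, which is the desired check
merge 1:1 newname using `master', keep(master match)

* labels that are not yet in the master list would show up here
list newname if _merge == 1

* restore the original order
sort obs
```

Any label listed with _merge == 1 is new to the master list and needs a name assigned by hand, which is where this approach stops being fully automatic.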


Amie Osborn

Join Date: Sep 2015

Posts: 27
#3

18 Sep 2015, 11:49

Hi Robert,

Thank you for suggesting a master list of all variable labels. I originally tried this; however, I am worried that future versions of the dataset will add new names or change the current category labels. Therefore, I think your code, combined with the code you provided in yesterday's thread, will give me a dynamic solution that accounts for any duplicate names that may appear in the future.

I did not think about replacing based on a main identifier! Thank you for the suggestion; I was able to modify it to my data.

Best,
Amie