Changing a Large List of Variable Names

Andrew Bernal

Join Date: Feb 2022

Posts: 31
#1

04 Apr 2022, 13:49

Hello! I am working with country-panel data, and wanted to know if I can replace the variable names (country names) in my original data set with something more uniform (i.e., the ISO3 country-code format). My data is formatted as below:

Code:

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(day argentina saudiarabia unitedkingdom austria france)
    19512 29.860113      . 1.8448875      .  29.91344
    19513 30.020096  1.644  1.848199 6.3942  30.02937
    19514 30.179895 1.6503 1.8629556 6.4046  29.98533
    19515  30.23022      .  1.875309 6.4124  29.99614
    19516  30.35522 1.6421  1.880363      . 29.983854
    19517         .      . 1.8915282      . 30.059195
    19518         .      . 1.8915282      . 30.059195
    19519 30.459717 1.6385 1.8915282      . 30.059195
    19520 30.410173   1.64 1.8727467 6.4417  29.91063
    19521  30.51963 1.6341 1.8819928 6.4096 29.801025
    19522  30.65406 1.6361  1.888474 6.4244 29.826956
    19523 30.589745 1.6274 1.8838612 6.4232 29.728506
    end
    format %tdNN/DD/CCYY day

I have daily data stretching from roughly 1990 to 2021. I also have about 60 countries. I am aware of kountry, but this only works if the countries are observations, and not variables. With daily data, the data set is naturally long, and for this reason, reshape and xpose don't seem to be able to help solve my problem.

So far, I have created a separate .dta which just contains the countries as observations. Using kountry there gives me the desired country codes (ARG for Argentina, FRA for France, etc). But now, how can I import those country codes as my new variable names? Or am I going about this all wrong? Thanks in advance!

Last edited by Andrew Bernal; 04 Apr 2022, 13:53.
Clyde Schechter

Join Date: Apr 2014

Posts: 29809
#2

04 Apr 2022, 14:23

In the code below, the first -dataex- example, which creates a tempfile `name_crosswalk', is my equivalent of your separate .dta that contains the countries as observations along with their country codes. You should replace the line -use `name_crosswalk'- with a command that uses the file you already have. Similarly, you should replace the references oldname[1] and newname[1] (but not the corresponding local macros) with whatever variable names you gave to those data elements in your file. The following code works with your example data:

Code:

    * Example generated by -dataex-. For more info, type help dataex
    clear*
    input str13 oldname str3 newname
    "argentina"     "ARG"
    "saudiarabia"   "SAU"
    "unitedkingdom" "GBR"
    "austria"       "AUT"
    "france"        "FRA"
    end
    tempfile name_crosswalk
    save `name_crosswalk'

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(day argentina saudiarabia unitedkingdom austria france)
    19512 29.860113      . 1.8448875      .  29.91344
    19513 30.020096  1.644  1.848199 6.3942  30.02937
    19514 30.179895 1.6503 1.8629556 6.4046  29.98533
    19515  30.23022      .  1.875309 6.4124  29.99614
    19516  30.35522 1.6421  1.880363      . 29.983854
    19517         .      . 1.8915282      . 30.059195
    19518         .      . 1.8915282      . 30.059195
    19519 30.459717 1.6385 1.8915282      . 30.059195
    19520 30.410173   1.64 1.8727467 6.4417  29.91063
    19521  30.51963 1.6341 1.8819928 6.4096 29.801025
    19522  30.65406 1.6361  1.888474 6.4244 29.826956
    19523 30.589745 1.6274 1.8838612 6.4232 29.728506
    end
    format %tdNN/DD/CCYY day

    * Called by -runby- once per crosswalk row: rename one variable in the default frame
    capture program drop rename_one
    program define rename_one
        local oldname = oldname[1]
        local newname = newname[1]
        frame default: rename `oldname' `newname'
        exit
    end

    * Load the crosswalk into its own frame and apply the renames
    frame create crosswalk
    frame crosswalk {
        use `name_crosswalk'
        runby rename_one, by(oldname)
    }

-runby- is by Robert Picard and me and is available from SSC. If there are any variables that cannot be renamed using your own crosswalk file, either because the oldname values don't match correctly or because the country code is not a legal Stata variable name, that particular -rename- will be skipped. -runby- will tell you that these are "by-groups with errors" in its summary table, but it will not halt execution, nor will it tell you where to find them in the resulting data: you'll have to inspect the results yourself.
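One way to do that inspection is sketched below. It assumes it is run in the same do-file, right after the code above, so that the tempfile `name_crosswalk' and the frame crosswalk still exist; it reloads the crosswalk and lists every pair whose new name is not a variable in the default frame. The message text is purely illustrative.

```stata
* Sketch only: run in the same do-file, after the renaming code above.
frame crosswalk {
    * -runby- has replaced the data in this frame, so reload the crosswalk
    use `name_crosswalk', clear
    forvalues i = 1/`=_N' {
        local old = oldname[`i']
        local new = newname[`i']
        * Nonzero _rc: the new name is missing, blank, or not a legal name
        capture frame default: confirm variable `new', exact
        if _rc {
            display as error "not renamed: `old' -> `new'"
        }
    }
}
```

Any pair that is listed points to either a mismatched oldname or an unusable newname in the crosswalk.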

That said, your data is currently in a wide layout. Yes, you have a lot of dates, which makes it "long" in the colloquial sense. But the layout, in the Stata sense, remains wide because the countries are all separate variables. Except for a handful of things, you will find that this layout makes it difficult or impossible to accomplish data management and analysis in Stata. So rather than taking this approach, I would convert this data set to a long layout. Because you already have almost 12,000 observations and about 60 countries, the resulting fully long layout data set will have about 720,000 observations. Using -reshape- will probably take a long time. You might want to use the user-written command -tolong- or -greshape- instead, as these are much faster on large data sets. Both are available from SSC. (-greshape- comes as part of the -gtools- package.) Then you can just use -kountry- directly. Working with the fully long layout will make your life a lot more pleasant.
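A minimal sketch of the long-layout route with official -reshape-, using the example data from #1, might look like the following. The stub rate_, the final variable name rate, and the file name name_crosswalk.dta are assumptions made for illustration. Here the ISO3 codes are attached by merging a saved crosswalk (with variables oldname and newname) rather than by calling -kountry-.

```stata
* Sketch only: assumes the wide example data from #1 are in memory.
* Give every country variable a common stub so -reshape- can find them.
ds day, not
foreach v of varlist `r(varlist)' {
    rename `v' rate_`v'
}

* One observation per day-country pair; the country name lands in oldname
reshape long rate_, i(day) j(oldname) string
rename rate_ rate

* Attach ISO3 codes from a saved crosswalk (hypothetical file name)
merge m:1 oldname using "name_crosswalk.dta", keep(master match)
```

After the merge, tabulating _merge shows which country names found no match in the crosswalk. On the full data set, -greshape- or -tolong- could stand in for the -reshape- step, as suggested above.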

Last edited by Clyde Schechter; 04 Apr 2022, 14:27.
Andrew Bernal

Join Date: Feb 2022

Posts: 31
#3

04 Apr 2022, 15:02

Thank you, Clyde, for your in-depth explanation. Huge thanks for not only answering the concern I directly had, but also offering an explanation for re-shaping the data. Once a couple of operations are done to the data, I do intend to reshape it, as this will be necessary for the estimation I will run.