  • Changing a Large List of Variable Names

    Hello! I am working with country-panel data and wanted to know whether I can replace the variable names (country names) in my original data set with something more uniform (i.e., the ISO3 country-code format). My data is formatted as below:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(day argentina saudiarabia unitedkingdom austria france)
    19512 29.860113      . 1.8448875      .  29.91344
    19513 30.020096  1.644  1.848199 6.3942  30.02937
    19514 30.179895 1.6503 1.8629556 6.4046  29.98533
    19515  30.23022      .  1.875309 6.4124  29.99614
    19516  30.35522 1.6421  1.880363      . 29.983854
    19517         .      . 1.8915282      . 30.059195
    19518         .      . 1.8915282      . 30.059195
    19519 30.459717 1.6385 1.8915282      . 30.059195
    19520 30.410173   1.64 1.8727467 6.4417  29.91063
    19521  30.51963 1.6341 1.8819928 6.4096 29.801025
    19522  30.65406 1.6361  1.888474 6.4244 29.826956
    19523 30.589745 1.6274 1.8838612 6.4232 29.728506
    end
    format %tdNN/DD/CCYY day
    I have daily data stretching from roughly 1990 to 2021. I also have about 60 countries. I am aware of kountry, but this only works if the countries are observations, and not variables. With daily data, the data set is naturally long, and for this reason, reshape and xpose don't seem to be able to help solve my problem.

    So far, I have created a separate .dta which just contains the countries as observations. Using kountry there gives me the desired country codes (ARG for Argentina, FRA for France, etc). But now, how can I import those country codes as my new variable names? Or am I going about this all wrong? Thanks in advance!
    Last edited by Andrew Bernal; 04 Apr 2022, 13:53.

  • #2
    In the code below, the first -dataex-, which creates a tempfile `name_crosswalk', is my equivalent of your separate .dta that contains the countries as observations along with their country codes. You should replace the line -use `name_crosswalk'- with a command to use the file you already have. Similarly, you should replace the references oldname[1] and newname[1] (but not the corresponding local macros) with whatever variable names you gave to those data elements in your file. The following code works with your example data:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear*
    input str13 oldname str3 newname
    "argentina"     "ARG"
    "saudiarabia"  "SAU"
    "unitedkingdom" "GBR"
    "austria"       "AUT"
    "france"        "FRA"
    end
    tempfile name_crosswalk
    save `name_crosswalk'
    
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(day argentina saudiarabia unitedkingdom austria france)
    19512 29.860113      . 1.8448875      .  29.91344
    19513 30.020096  1.644  1.848199 6.3942  30.02937
    19514 30.179895 1.6503 1.8629556 6.4046  29.98533
    19515  30.23022      .  1.875309 6.4124  29.99614
    19516  30.35522 1.6421  1.880363      . 29.983854
    19517         .      . 1.8915282      . 30.059195
    19518         .      . 1.8915282      . 30.059195
    19519 30.459717 1.6385 1.8915282      . 30.059195
    19520 30.410173   1.64 1.8727467 6.4417  29.91063
    19521  30.51963 1.6341 1.8819928 6.4096 29.801025
    19522  30.65406 1.6361  1.888474 6.4244 29.826956
    19523 30.589745 1.6274 1.8838612 6.4232 29.728506
    end
    format %tdNN/DD/CCYY day
    
    capture program drop rename_one
    program define rename_one
        local oldname =oldname[1]
        local newname =newname[1]
        frame default: rename `oldname' `newname'
        exit
    end
    
    frame create crosswalk
    frame crosswalk {
        use `name_crosswalk'
        runby rename_one, by(oldname)
    }
    -runby- is by Robert Picard and me and is available from SSC. If any variable cannot be renamed using your own crosswalk file, either because the oldname values don't match correctly or because the country code is not a legal Stata variable name, that particular -rename- will be skipped. -runby- will tell you that these are "by-groups with errors" in its summary table, but it will not halt execution, nor will it tell you where to find them in the resulting data: you'll have to inspect the results yourself.
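    If you need to see exactly which renames were skipped, one option is a plain loop over the crosswalk using only built-in commands. This is a minimal sketch of an alternative to -runby-, not the approach used above. It assumes Stata 16 or later (for frames); the frame name crosswalk_check, the trimmed-down data, and the deliberately unmatched "austria" row are made up for illustration.

```stata
* Sketch only: a small subset of the example data in the default frame
clear
input float(day argentina saudiarabia)
19512 29.860113     .
19513 30.020096 1.644
end

* Hypothetical crosswalk in its own frame; "austria" has no matching variable here
capture frame drop crosswalk_check
frame create crosswalk_check
frame change crosswalk_check
input str13 oldname str3 newname
"argentina"   "ARG"
"saudiarabia" "SAU"
"austria"     "AUT"
end

* Try each rename in the default frame and report the ones that fail
local n = _N
forvalues i = 1/`n' {
    local old = oldname[`i']
    local new = newname[`i']
    capture frame default: rename `old' `new'
    if _rc != 0 {
        display as text "not renamed: `old' -> `new'"
    }
}
frame change default
describe, simple
```

    Because each -rename- is wrapped in -capture-, a bad row does not stop the loop; it just produces a "not renamed" line that names the offending pair.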

    That said, your data is currently in a wide layout. Yes, you have a lot of dates, which makes it "long" in the colloquial sense. But the layout, in the Stata sense, remains wide because the countries are all separate variables. Except for a handful of things, you will find that this layout makes it difficult or impossible to accomplish data management and analysis in Stata. So rather than taking this approach, I would convert this data set to a long layout. Because you already have almost 12,000 observations and about 60 countries, the resulting fully long data set will have about 720,000 observations. Using -reshape- will probably take a long time. You might want to use the user-written command -tolong- or -greshape- instead, as these are much faster on large data sets. Both are available from SSC. (-greshape- comes as part of the -gtools- package.) Then you can just use -kountry- directly. Working with the fully long layout will make your life a lot more pleasant.
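    To make the long layout concrete, here is a minimal sketch using the built-in -reshape- on a few rows of the example data. It assumes a crosswalk like the one above; the stub name value and the variable name iso3 are made up for illustration. On the full data set, -tolong- or -greshape- may be preferable for speed, and their syntax is not shown here.

```stata
* Sketch only: crosswalk from old variable names to ISO3 codes
clear
input str13 oldname str3 iso3
"argentina"     "ARG"
"saudiarabia"   "SAU"
"unitedkingdom" "GBR"
"austria"       "AUT"
"france"        "FRA"
end
tempfile name_crosswalk
save `name_crosswalk'

* First three days of the example data, still in wide layout
clear
input float(day argentina saudiarabia unitedkingdom austria france)
19512 29.860113      . 1.8448875      .  29.91344
19513 30.020096  1.644  1.848199 6.3942  30.02937
19514 30.179895 1.6503 1.8629556 6.4046  29.98533
end
format %tdNN/DD/CCYY day

* Give every country variable a common stub so -reshape- can find them
foreach v of varlist argentina-france {
    rename `v' value`v'
}

* One observation per day-country pair; the old variable name becomes a string
reshape long value, i(day) j(oldname) string

* Attach the ISO3 code as an ordinary variable rather than a variable name
merge m:1 oldname using `name_crosswalk', keep(master match) nogenerate
sort iso3 day
list in 1/5, sepby(iso3)
```

    In this layout the country code is data, so it can be checked, merged on, or passed to -kountry- without touching any variable names.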
    Last edited by Clyde Schechter; 04 Apr 2022, 14:27.

    • #3
      Thank you, Clyde, for your in-depth explanation. Huge thanks for not only answering the concern I directly had, but also offering an explanation for re-shaping the data. Once a couple of operations are done to the data, I do intend to reshape it, as this will be necessary for the estimation I will run.
