Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping multiple variables to long format

    Dear all,


    I have different variables in rows for different countries over many years in wide form. A very short part of my data is as below:
    Code:
    clear
    input str48 CountryName str3 CountryCode str67 SeriesName double(YR2019 YR2020 YR2021 YR2022)
    "Belgium"        "BEL" "Life expectancy at birth, total (years)"                              81.99512195121952 80.69512195121952 81.79024390243904  81.69756097560978
    "Belgium"        "BEL" "GDP per capita (current US$)"                                         46641.72140170868 45587.96830831899 51863.83115650539 49942.090642488656
    "Belgium"        "BEL" "Unemployment, total (% of total labor force) (modeled ILO estimate)"               5.36             5.555             6.264              5.558
    "Austria"        "AUT" "Life expectancy at birth, total (years)"                              81.89512195121951 81.19268292682928 81.19024390243904  81.09268292682928
    "Austria"        "AUT" "GDP per capita (current US$)"                                         50067.58572658923 48789.49784988721 53517.89045096115  52084.68119533723
    "Austria"        "AUT" "Unemployment, total (% of total labor force) (modeled ILO estimate)"               4.56             5.201             6.459              4.992
    "United Kingdom" "GBR" "Life expectancy at birth, total (years)"                              81.40487804878049 80.35121951219514              80.7  82.05853658536587
    "United Kingdom" "GBR" "GDP per capita (current US$)"                                        42662.535374031075 40217.00901169857 46869.75905841103  45564.15694480128
    "United Kingdom" "GBR" "Unemployment, total (% of total labor force) (modeled ILO estimate)"              3.737             4.472             4.826               3.73
    end
    I want to convert the data into long form. If there was only one variable, the below code would have get the job done:
    Code:
    reshape long YR, i(CountryName CountryCode) j(date)
    However, in the above dataset I have 3 different variables: life expectancy, GDP per capita and unemployment rate. Here is my question: how can I convert this dataset into long format that variables are listed in columns?
    Many thanks in advance.
    Last edited by yilmaz kemal; 06 Nov 2024, 02:36.

  • #2
    See Stata Tip 45: Getting those Data into Shape where this case is discussed.

    Code:
    clear
    input str48 CountryName str3 CountryCode str67 SeriesName double(YR2019 YR2020 YR2021 YR2022)
    "Belgium"        "BEL" "Life expectancy at birth, total (years)"                              81.99512195121952 80.69512195121952 81.79024390243904  81.69756097560978
    "Belgium"        "BEL" "GDP per capita (current US$)"                                         46641.72140170868 45587.96830831899 51863.83115650539 49942.090642488656
    "Belgium"        "BEL" "Unemployment, total (% of total labor force) (modeled ILO estimate)"               5.36             5.555             6.264              5.558
    "Austria"        "AUT" "Life expectancy at birth, total (years)"                              81.89512195121951 81.19268292682928 81.19024390243904  81.09268292682928
    "Austria"        "AUT" "GDP per capita (current US$)"                                         50067.58572658923 48789.49784988721 53517.89045096115  52084.68119533723
    "Austria"        "AUT" "Unemployment, total (% of total labor force) (modeled ILO estimate)"               4.56             5.201             6.459              4.992
    "United Kingdom" "GBR" "Life expectancy at birth, total (years)"                              81.40487804878049 80.35121951219514              80.7  82.05853658536587
    "United Kingdom" "GBR" "GDP per capita (current US$)"                                        42662.535374031075 40217.00901169857 46869.75905841103  45564.15694480128
    "United Kingdom" "GBR" "Unemployment, total (% of total labor force) (modeled ILO estimate)"              3.737             4.472             4.826               3.73
    end
    
    egen which = group(SeriesName), label 
    
    reshape long YR, i(CountryName which) j(year)
    
    drop SeriesName
    
    forval j = 1/3 {
        local varlabel`j' : label (which) `j'
    }
    
    rename YR Series
    
    reshape wide Series , i(CountryName year) j(which)
    
    forval j = 1/3 {
     label var Series`j' "`varlabel`j''"
    }
    
    d

    Comment


    • #3
      Waw. The exact same problem was already explained 17 years ago. Completely missed that tip in Stata Journal. Thank you so much for the code as well as the reference, Nick Cox.

      Comment


      • #4
        it is easy to miss papers, let alone posts. This data layout seems eccentric but it is quite often used for many-country economic time series,

        Comment


        • #5
          Originally posted by Nick Cox View Post
          it is easy to miss papers, let alone posts. This data layout seems eccentric but it is quite often used for many-country economic time series,
          This layout is the default format of the World Bank's World Development Indicators (WDI). When downloading multiple variables for various countries across several years, this is the layout provided. Stata tip 45 by Baum and Cox offers an excellent guidance on how to handle this format.

          A minor contribution is that WDI already includes an option that allows users to customize the arrangement of rows and columns for variables, years, and countries. By selecting years and countries for rows and variables for columns, no data conversion is needed. That said, addressing issues through coding is often preferable, as it helps build problem-solving skills that are beneficial for future challenges.

          Nick Cox , I have a follow-up question. Each country has both a name and a code as identifiers. Your code below uses only CountryName as i, yet it works without any error message. How does Stata recognize that "BEL" corresponds to Belgium? I expected an error, but none appears. Does Stata automatically match other identifiers that change only based on the one used in the reshape command?


          Code:
          reshape long YR, i(CountryName which) j(year)

          Comment


          • #6
            As there is a one-to-one correspondence between name and code, the code gets carried along automagically in the reshape — that’s the way I understand it.

            Comment


            • #7
              I have tried to use the loop, however, I get the error r(9); and do not know what is going wrong. I have the following data structure for the years 1989 up to 2023 for every country and every HS product code. How can I reshape it from wide to long?

              DataType Country HTSNumber QuantityDescription Year1989 Year1990 Year1991 Year1992 Year1993 up to Year2023
              Customs Value China 0101.11.00 number 0 0 0 0 0 etc.
              etc....
              Customs Value Taiwan 0101.11.00 number 0 0 0 0 0 etc.
              etc....
              First Unit of Quantity China 0101.19.00 number 10000 0 10000 0 0 etc.
              etc....
              First Unit of Quantity Taiwan 0101.19.00 number 10000 0 10000 0 0 etc.
              etc....
              Second Unit of Quantity China 0101.20.20 kilograms 0 0 0 0 0 etc.
              etc....
              Second Unit of Quantity Taiwan 0101.20.20 number 0 0 0 0 0 etc.
              etc....
              CIF Import Value China 0101.21.00 dozens 0 0 0 0 0 etc.
              etc....
              CIF Import Value Taiwan 0101.21.00 liters 0 0 0 0 0 etc.
              etc....

              Comment

              Working...
              X