Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with Merging Yearly Data Sets with Different Formats in Stata

    Hello Stata Community,

    I am currently working with two datasets that I need to merge, but I'm encountering some challenges due to their differing formats.

    The first dataset contains annual data for various countries with several variables including a unique country code, year, economy, adult population, and two other variables (Var1 and Var2). Here's how the data is structured:

    Code:
    Country code | Year | Economy | Adult population | Var1 | Var2 AFG | 2009 | Afghanistan | 14,039,528 | 3,653 | 0.26 AFG | 2010 | Afghanistan | 14,444,996 | 4,759 | 0.33 ...

    The second dataset is in a matrix format where each row represents a country and each column represents a year, with the cell values corresponding to a specific variable. For example:

    Code:
    Country | 2008 | 2009 | 2010 | ... | 2015 Afghanistan | 3.153 | 3.246 | 3.087 | ... | 3.108 ...

    I would like to merge the second dataset into the first one by year and country, effectively transforming the matrix into a long format and appending it to the corresponding country and year in the first dataset. Additionally, I need to create a unique country code for each country to facilitate the merge.

    Could you please advise on the most efficient way to achieve this in Stata? Specifically, I am looking for guidance on:
    1. Transforming the second dataset from a wide to a long format to match the first dataset's structure.
    2. Generating a unique country code for each country
    3. Merging the two datasets based on the country code and year.
    Any suggestions on the appropriate Stata commands or procedures to use would be greatly appreciated.

    Thank you for your assistance!

  • #2
    The examples you show are not possible Stata data sets, because what presumably would be variable names include things that cannot be legal variable names in Stata. It would have been far more helpful had you followed the advice in the Forum FAQ, which all members are asked to read before posting, and used the -dataex- command to show brief examples from the two data sets. Here I will try to give you some useful guidance on how to proceed, but because the real Stata data sets may differ from what I'm assuming they are, this code may not actually work. You may need to adapt it. If so, and if you need assistance with that, do use -dataex- to show examples from both data sets when posting back, and whenever you seek help with code in the future. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    I'm going to assume that the actual names of the variables in data set 2 are country, v2008, v2009, v2010, ..., v2015. Then
    Code:
    use data_set_2, clear
    reshape long v, i(country) j(year)
    You will probably want to then -rename- v to something more descriptive of what it represents in the real world.

    As for generating a new variable containing a country code, I recommend you get the -kountry- package, by Rafal Raciborski, available from SSC, which does this job nicely. Do this in both data sets. Use the same variable name for the newly created variable in both sets.

    In data set 1, rename the year variable, if necessary, to match that of data set 2. With all of that done, you should be able to run -merge 1:1 country_code year- to combine the two data sets.


    Comment

    Working...
    X