Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Method for Merging Datasets with Differing Structures of Time, Country, and Indicator Variables



    I am in the process of merging two datasets. In the first dataset, countries are listed in the rows, time periods are in the columns, and indicators are presented as separate columns.

    Example generated by -dataex-. For more info, type help dataex
    clear
    input str32 country double(year project historical) str160 histname

    "Mexico" 1985 0 1 "United Mexican States"
    "Mexico" 1986 0 1 "United Mexican States"
    "Mexico" 1987 0 1 "United Mexican States"
    "Mexico" 1988 0 1 "United Mexican States"
    "Mexico" 1989 0 1 "United Mexican States"
    "Mexico" 1990 0 1 "United Mexican States"
    "Mexico" 1991 0 1 "United Mexican States"
    "Suriname" 1960 0 0 "Suriname [Dutch colony with self-rule]"
    "Suriname" 1961 0 0 "Suriname [Dutch colony with self-rule]"
    "Suriname" 1962 0 0 "Suriname [Dutch colony with self-rule]"
    "Suriname" 1963 0 0 "Suriname [Dutch colony with self-rule]"
    "Suriname" 1964 0 0 "Suriname [Dutch colony with self-rule]"
    "Suriname" 1965 0 0 "Suriname [Dutch colony with self-rule]"
    "Suriname" 1966 0 0 "Suriname [Dutch colony with self-rule]"
    "Suriname" 1967 0 0 "Suriname [Dutch colony with self-rule]"
    "Suriname" 1968 0 0 "Suriname [Dutch colony with self-rule]"
    "Suriname" 1969 0 0 "Suriname [Dutch colony with self-rule]"
    "Suriname" 1970 0 0 "Suriname [Dutch colony with self-rule]"
    "Suriname" 1971 0 0 "Suriname [Dutch colony with self-rule]"
    "Suriname" 1972 0 0 "Suriname [Dutch colony with self-rule]"
    "Suriname" 1973 0 0 "Suriname [Dutch colony with self-rule]"
    "Suriname" 1974 0 0 "Suriname [Dutch colony with self-rule]"
    "Suriname" 1975 0 0 "Republic of Suriname [independent state]"



    In the second dataset, both countries and indicators are listed in the rows, while the time periods are in the columns.

    Example generated by -dataex-. For more info, type help dataex
    clear
    input str44 country strL indicator str51 industry float(v13 v14 v15 v16 v17 v18)
    "Advanced Economies" "Annual Net Emissions/Removals" "1. Energy" 9105.274 8979.709 9386.947 9893.827 9625.024 9257.178
    "Advanced Economies" "Annual Net Emissions/Removals" "1. Energy" 9916.035 9763.397 10172.023 10685.204 10369.862 10015.04
    "Advanced Economies" "Annual Net Emissions/Removals" "1. Energy" 762.8778 735.9459 735.0223 738.6352 692.7999 706.2634
    "Advanced Economies" "Annual Net Emissions/Removals" "1. Energy" 47.88281 47.74208 50.05363 52.7423 52.03802 51.5982
    "Advanced Economies" "Annual Net Emissions/Removals" "1.A. Fuel Combustion Activities" 8925.621 8828.691 9230.592 9726.309 9477.962 9133.35
    "Advanced Economies" "Annual Net Emissions/Removals" "1.A. Fuel Combustion Activities" 9050.106 8947.372 9349.474 9847.461 9597.735 9250.203
    "Advanced Economies" "Annual Net Emissions/Removals" "1.A. Fuel Combustion Activities" 76.69251 71.03089 68.92302 68.50697 67.822235 65.33432
    "Advanced Economies" "Annual Net Emissions/Removals" "1.A. Fuel Combustion Activities" 47.79301 47.65005 49.95861 52.6452 51.95139 51.51926
    "Advanced Economies" "Annual Net Emissions/Removals" "1.A.1. Energy Industries" 2870.688 2946.689 3143.719 3408.1006 3377.95 3264.486
    "Advanced Economies" "Annual Net Emissions/Removals" "1.A.1. Energy Industries" 2879.577 2955.719 3153.2786 3418.302 3388.0764 3274.641
    "Advanced Economies" "Annual Net Emissions/Removals" "1.A.1. Energy Industries" 1.2052807 1.240148 1.3665546 1.4806463 1.5166067 1.465962
    "Advanced Economies" "Annual Net Emissions/Removals" "1.A.1. Energy Industries" 7.683632 7.789904 8.192912 8.720531 8.60963 8.688967
    "Advanced Economies" "Annual Net Emissions/Removals" "1.A.2. Manufacturing Industries and Construction" 2471.111 2208.369 2241.1594 2355.3918 2280.1516 2031.683


    here v13 to upwared years as 1970 to up

    Could you please advise on the best method to merge these two datasets while ensuring consistency in the structure of the merged data?"
    Last edited by Shahla Akram; 25 Sep 2024, 12:41.

  • #2
    Well, before you deal with the divergent structures of the two data sets, there is a bigger problem you have to solve. Although the second data set has a variable whose name is country, it clearly does not actually contain the names of countries. So it is not going to pair up with the country variable in the first data set, no matter how you structure things.

    From the example data, it appears that the variable named country in the second data set contains names of groups of countries. Before you can combine these data sets, you need to create a third data set which is a crosswalk between the groups of countries that appear in the country variable of the second data set and the actual countries in those groups. In creating that third data set, be sure to use country names that are exact typographical matches to the values of the country variable in the first data set and, of course, also preserve the exact typography of the groups of countries from the variable named country in the second data set.

    Once you have that crosswalk created, post back with example data from it, and then we can tackle how to combine everything.

    Comment


    • #3
      I understand now that dataset has a group of countries (or multiple regions/country groupings) along with individual countries,

      clear
      input str44 country strL indicator float(year1970 year1971 year1973)
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" 1.5348403 1.543143 1.5190207
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" 2.581285 2.591579 2.625535
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" 1.0386106 1.0405015 1.0983613
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .007834311 .007934365 .008152826
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" 1.1970067 1.1970067 1.2646064
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" 1.2300406 1.230784 1.2999568
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .027024414 .027699016 .02873681
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .006009511 .006078296 .006613667
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .05719284 .05719284 .17570156
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .05733995 .05734001 .1761165
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .0000458 .0000459 .000124228
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .000101272 .000101306 .000290724
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .10166592 .10166592 .0962198
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .10216317 .10216317 .09665548
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .000120769 .000120769 .0000856
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .000376481 .000376481 .000350098
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .2428442 .2428442 .25243843
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .24658427 .24675468 .25678575
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .001777283 .001940697 .002186885
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .001962795 .001969801 .002160436
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .042318 .042318 .04495672
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .04264381 .04264381 .04530201
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" 8.38e-06 8.38e-06 8.89e-06
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .000317425 .000317425 .000336409
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .19358265 .19358265 .1997972
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .19692695 .19709736 .203722
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .001750007 .00191342 .002157147
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .001594286 .001601292 .001767643
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .006943528 .006943528 .007684528
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .007013504 .007013504 .007761764
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .0000189 .0000189 .0000209
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .0000511 .0000511 .0000564
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" . . .
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" . . .
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" . . .
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" . . .
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .7953037 .7953037 .7402465
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .8239532 .8245261 .770399
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .025080524 .025591686 .02634012
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .003568963 .003630707 .003812409
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .3378337 .3461365 .25441444
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" 1.3512447 1.360795 1.325578
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" 1.0115861 1.0128024 1.0696244
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .0018248 .001856069 .001539159
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .16998406 .16147064 .18563357
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" . . .
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .1708725 .1623824 .18659404
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .000888462 .000911755 .000960466
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .04699206 .03663264 .067291126
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .04699206 .03663264 .067291126
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" . . .
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" . . .
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .09293332 .09425876 .0857248
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .09293332 .09425876 .0857248
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .03005868 .03057925 .032617643
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .03094714 .031491008 .03357811
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .000888462 .000911755 .000960466
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" . . .
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" . . .
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .029228566 .029228566 .029228566
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" 12.998352 12.900493 11.42146
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" 10.531366 10.470376 9.336032
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" 2.437757 2.400889 2.0562
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" . . .
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" . . .
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" . . .
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" . . .
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" 1.5779673 1.6306072 1.7240137
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" 1.520051 1.5696425 1.6569526
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .05791641 .06096468 .06706113
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .007715499 .007731861 .008457641
      "Afghanistan, Islamic Rep. of" "Annual Net Emissions/Removals" .007715499 .007731861 .008457641
      "Afghanistan, Islamic Rep. of" "Baseline Energy CO2 emissions" . . .
      "Afghanistan, Islamic Rep. of" "Baseline GHG emissions excluding land-use, land-use change and forestry" . . .
      "Afghanistan, Islamic Rep. of" "Baseline GHG emissions including land-use, land-use change and forestry" . . .
      "Afghanistan, Islamic Rep. of" "IMF estimated CO2 emissions under a Business as Usual assumption" . . .
      "Afghanistan, Islamic Rep. of" "IMF estimated GHG emissions excluding land-use, land-use change and forestry under a Business as Usual assumption" . . .
      "Afghanistan, Islamic Rep. of" "IMF estimated GHG emissions including land-use, land-use change and forestry under a Business as Usual assumption" . . .
      "Afghanistan, Islamic Rep. of" "Implied average (of conditional & unconditional) economy-wide Nationally Determined Contribution 2030 target level of Energy CO2 emissions" . . .
      "Afghanistan, Islamic Rep. of" "Implied average (of conditional & unconditional) economy-wide Nationally Determined Contribution 2030 target level of GHG emissions excluding land-use, land-use change and forestry" . . .
      "Afghanistan, Islamic Rep. of" "Implied average (of conditional & unconditional) economy-wide Nationally Determined Contribution 2030 target level of GHG emissions including land-use, land-use change and forestry" . . .
      "Afghanistan, Islamic Rep. of" "Implied conditional economy-wide Nationally Determined Contribution 2030 target level of Energy CO2 emissions" . . .
      "Afghanistan, Islamic Rep. of" "Implied conditional economy-wide Nationally Determined Contribution 2030 target level of GHG emissions excluding land-use, land-use change and forestry" . . .
      "Afghanistan, Islamic Rep. of" "Implied conditional economy-wide Nationally Determined Contribution 2030 target level of GHG emissions including land-use, land-use change and forestry" . . .
      "Afghanistan, Islamic Rep. of" "Implied unconditional economy-wide Nationally Determined Contribution 2030 target level of Energy CO2 emissions" . . .
      "Afghanistan, Islamic Rep. of" "Implied unconditional economy-wide Nationally Determined Contribution 2030 target level of GHG emissions excluding land-use, land-use change and forestry" . . .
      "Afghanistan, Islamic Rep. of" "Implied unconditional economy-wide Nationally Determined Contribution 2030 target level of GHG emissions including land-use, land-use change and forestry" . . .
      "Afghanistan, Islamic Rep. of" "Total GHG emissions excluding land-use, land-use change and forestry" 1.734053 1.7338424 1.733883
      "Afghanistan, Islamic Rep. of" "Total GHG emissions excluding land-use, land-use change and forestry" . . .
      "Afghanistan, Islamic Rep. of" "Total GHG emissions excluding land-use, land-use change and forestry" 17.336191 17.292793 15.96606
      "Afghanistan, Islamic Rep. of" "Total GHG emissions excluding land-use, land-use change and forestry" 13.090028 13.08052 12.091345
      "Afghanistan, Islamic Rep. of" "Total GHG emissions excluding land-use, land-use change and forestry" 2.5121114 2.478431 2.1408322
      "Afghanistan, Islamic Rep. of" "Total GHG emissions including land-use, land-use change and forestry" 1.734053 1.7338424 1.733883
      "Afghanistan, Islamic Rep. of" "Total GHG emissions including land-use, land-use change and forestry" . . .
      "Afghanistan, Islamic Rep. of" "Total GHG emissions including land-use, land-use change and forestry" 17.336191 17.292793 15.96606
      "Afghanistan, Islamic Rep. of" "Total GHG emissions including land-use, land-use change and forestry" 13.090028 13.08052 12.091345
      "Afghanistan, Islamic Rep. of" "Total GHG emissions including land-use, land-use change and forestry" 2.5121114 2.478431 2.1408322
      "Albania" "Annual Net Emissions/Removals" 4.2379475 4.24625 4.230293
      "Albania" "Annual Net Emissions/Removals" 5.446569 5.455218 5.71591
      "Albania" "Annual Net Emissions/Removals" 1.1789168 1.1792321 1.4558942
      end
      [/CODE]
      ------------------ copy up to and including the previous line ------------------






      and the problem is that when I import the dataset into Stata, year variables (like 2001, 2002) are automatically renamed to v13, v14, etc. I need to reshape the data from wide to long format so that you can perform panel data analysis, but the data does not change as expected.



      forvalues i = 13/73 {
      local year = `i' + 1957 // Assuming v13 corresponds to the year 1970
      rename v`i' y`year'

      }




      reshape long y1970 y1971 y1972 y1973 y1974 y1975 y1976 y1977 y1978 y1979 y1980 y1981 y1982 y1983 y1984 y1985 y1986 y1987 y1988 y1989 y1990 y1991 y1992 y1993 y1994 y1995 y1996 y1997 y1998 y1999 y2000 y2001 y2002 y2003 y2004 y2005 y2006 y2007 y2008 y2009 y2010 y2011 y2012 y2013 y2014 y2015 y2016 y2017 y2018 y2019 y2020 y2021 y2022 y2023 y2024 y2025 y2026 y2027 y2028 y2029 y2030 , i( country indicator ) j(year). But it show error

      also write command reshape long y*, i( country indicator ) j(year)
      Attached Files
      Last edited by Shahla Akram; 25 Sep 2024, 15:45.

      Comment


      • #4
        The renaming of the v* variables in the second data set to get a series that begins with 1970 can be done with the following command (no loops needed):
        Code:
        rename v# value#, renumber(1970)
        But you will not be able to -reshape- this data sensibly. You have no uniquely identifying variables in this data set. Notice, for example, that the first four observations in that data set all refer to "country" Advanced Economies, indicator "Annual Net Emissions/Removals" and industry "1.Energy". But these four observations provide completely contradict each other on what the values in each year are. What is it in 1970 (v13)? Is it 9105.274, or 9916.035, or 762.8778, or 47.88281? Which of these observations has the correct values? Or perhaps each is correct as applied to some subsector of the Energy sector. Or perhaps to quarter of the year? Fine, but since you don't have a subsector or quarter variable, you just can't use this data in any sensible way. And the same is true of the next four oobservations: they also contradict each other. And the next four after that. I suspect the entire data set is like that. So you have to figure out what is going on: either you have to remove a lot of the data that is incorrect, or you need to find the additional variable(s) that distinguish the observations in these conflicting groups of observations, and add them to the data.

        Only after you do that can you -reshape- this data. To do that, you will need to get the syntax of the -reshape- command right. The command will look like this:
        Code:
        reshape long value, i(?????) j(year) // NOTE NO * IN THIS SYNTAX, NOR ANY LIST OF THE value VARIABLES
        The -?????- will have to be replaced with the variables that uniquely identify observations. If you fix your data by eliminating all but one of the observations in each conflicting group, then -?????- will be replaced by -country indicator industry-. If you fix your data by adding one or more variables that specify more detail about the applicability of the data, then the replacement for -?????- will need to include those variables as well.

        At that point, to make further progress, you will have to confront the problem I raised in #2: the country variables in the two data sets are not compatible with each other and you need a crosswalk dataset so you can harmonize them. Once that is done, bear in mind that these data sets will require the use of -joinby-, not -merge-, to correctly combine them, because the only variable they will have in common is country, and country does not uniquely identify observations in either data set. So -merge- is out of the question--don't even waste any time thinking about using it. If you are not familiar with -joinby-, do read -help joinby- to understand its syntax and how it works.
        Last edited by Clyde Schechter; 25 Sep 2024, 15:53.

        Comment


        • #5
          Since there are four identifiers available, I have only used two. Thank you very much, Clyde, I have completed the task.

          Comment

          Working...
          X