Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • putting together two stata files with the same variables

    Dear Community,

    I got stuck with my stata work. I downloaded data from world bank in a stata format. Since I don't have a premium account I faced the limit of 100 000 rows per query. I had to run few queries. Now I want to put my data together. The variable names are the same. I tried the command:
    Code:
     append using part 2
    However something bizzare happened to my data. I should have 200 000 rows since each file has 100 000 rows. Now I have 300 000. I have data on bilateral trade disaggregated at the sectoral level. I saw I don't have sectors from the second file.

    Here is the sample of my data.

    part1:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long(ReporterISO3 PartnerISO3 Year TradeFlowName) double TradeValuein1000USD long ProductCode
    1  1 2004 2          0 1
    1  4 1990 2    108.335 1
    1  4 1991 1      4.311 1
    1  4 1991 2    269.155 1
    1  4 1992 2    196.942 1
    1  4 1994 2     37.628 1
    1  4 2001 2      9.091 1
    1  6 1992 1      2.934 1
    1  6 1993 1    157.479 1
    1  6 1996 1   1699.839 1
    1  6 1997 1    421.818 1
    1  6 1998 1    550.912 1
    1  6 2000 1    875.758 1
    1  6 2001 1   2425.481 1
    1  6 2002 1    8856.15 1
    1  6 2003 1  48804.716 1
    1  6 2004 1 100278.489 1
    1  6 2005 1   43344.21 1
    1  6 2006 1  11241.089 1
    1  6 2007 1  12753.576 1
    1  6 2008 1   21328.78 1
    1  6 2009 1  55453.883 1
    1  6 2010 1 105251.078 1
    1  6 2011 1 122027.211 1
    1  6 2012 1 127850.221 1
    1  6 2013 1 134083.634 1
    1  6 2014 1 221128.477 1
    1  6 2015 1 136909.302 1
    1  6 2016 1 121948.976 1
    1  6 2017 1 108338.913 1
    1  6 2018 1 189768.543 1
    1  6 2019 1 233712.047 1
    1  6 2020 1  220276.65 1
    1  6 2021 1 221025.737 1
    1  9 1991 2    175.965 1
    1  9 1995 2    155.665 1
    1 12 1990 2    311.575 1
    1 12 1991 2    195.714 1
    1 15 1996 1    642.202 1
    1 15 1998 1   3582.161 1
    1 15 1999 1   2387.899 1
    1 15 2000 1   5183.388 1
    1 15 2001 1   8715.962 1
    1 15 2002 1   12984.31 1
    1 15 2003 1  14827.414 1
    1 15 2004 1    8744.01 1
    1 15 2005 1  15719.219 1
    1 15 2006 1  43873.718 1
    1 15 2007 1  19505.286 1
    1 15 2008 1  25399.049 1
    1 15 2009 1  16234.385 1
    1 15 2010 1  26054.696 1
    1 15 2011 1  41450.256 1
    1 15 2012 1  39827.533 1
    1 15 2013 1  70122.522 1
    1 15 2014 1  54217.375 1
    1 15 2015 1  65022.081 1
    1 15 2016 1  63298.564 1
    1 15 2017 1  32811.559 1
    1 15 2018 1  55200.422 1
    1 15 2019 1  53931.855 1
    1 15 2020 1  36783.712 1
    1 15 2021 1  26576.476 1
    1 17 1990 1  17376.426 1
    1 17 1991 1  18625.824 1
    1 17 1992 1  10922.764 1
    1 17 1993 1   7803.298 1
    1 17 1994 1   5507.758 1
    1 17 1995 1   5068.351 1
    1 17 1996 1    8133.01 1
    1 17 1997 1  11665.709 1
    1 17 1998 1   7469.783 1
    1 17 1999 1   5017.518 1
    1 17 2000 1   5688.788 1
    1 17 2001 1   7742.825 1
    1 17 2002 1   6110.914 1
    1 17 2003 1  12150.883 1
    1 17 2004 1  11952.083 1
    1 17 2005 1  17292.559 1
    1 17 2006 1  14911.658 1
    1 17 2007 1  15406.586 1
    1 17 2008 1  14665.026 1
    1 17 2009 1   10922.82 1
    1 17 2010 1   16465.32 1
    1 17 2011 1  20145.976 1
    1 17 2012 1   16522.59 1
    1 17 2013 1  19184.654 1
    1 17 2014 1  12469.467 1
    1 17 2015 1   7899.563 1
    1 17 2016 1  14007.982 1
    1 17 2017 1  29287.195 1
    1 17 2018 1  25674.329 1
    1 17 2019 1  28284.491 1
    1 17 2020 1  23302.405 1
    1 17 2021 1  21617.871 1
    1 18 2001 1    976.877 1
    1 18 2002 1    555.283 1
    1 18 2003 1   1991.759 1
    1 18 2004 1    791.957 1
    1 18 2005 1   1462.872 1
    end
    label values ReporterISO3 ReporterISO3
    label def ReporterISO3 1 "AUS", modify
    label values PartnerISO3 PartnerISO3
    label def PartnerISO3 1 "AUS", modify
    label def PartnerISO3 4 "CAN", modify
    label def PartnerISO3 6 "CHN", modify
    label def PartnerISO3 9 "DNK", modify
    label def PartnerISO3 12 "FRA", modify
    label def PartnerISO3 15 "ISR", modify
    label def PartnerISO3 17 "JPN", modify
    label def PartnerISO3 18 "KOR", modify
    label values TradeFlowName TradeFlowName
    label def TradeFlowName 1 "Export", modify
    label def TradeFlowName 2 "Import", modify
    label values ProductCode ProductCode
    label def ProductCode 1 "0011", modify
    part2:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long(ReporterISO3 PartnerISO3 Year TradeFlowName) double TradeValuein1000USD long ProductCode
    1  1 2005 2         0 1
    1  1 2006 2         0 1
    1  1 2007 2         0 1
    1  1 2008 2         0 1
    1  1 2009 2         0 1
    1  1 2010 2         0 1
    1  1 2011 2         0 1
    1  1 2012 2         0 1
    1  1 2014 2         0 1
    1  1 2017 2         0 1
    1  1 2019 2         0 1
    1  3 1999 1  2817.592 1
    1  4 1995 2      .539 1
    1  4 1997 2      2.51 1
    1  4 2008 2       .92 1
    1  4 2013 1    10.952 1
    1  4 2014 1    55.902 1
    1  5 2011 2   178.768 1
    1  6 2001 2    10.344 1
    1  6 2002 1  1720.889 1
    1  6 2002 2    25.491 1
    1  6 2003 2   344.118 1
    1  6 2004 2    468.66 1
    1  6 2005 2   766.071 1
    1  6 2006 2   455.664 1
    1  6 2007 2   443.602 1
    1  6 2008 2   915.295 1
    1  6 2009 1 27827.017 1
    1  6 2009 2   785.758 1
    1  6 2010 1 79782.544 1
    1  6 2010 2  1659.316 1
    1  6 2011 1 65345.852 1
    1  6 2011 2  1795.163 1
    1  6 2012 1 41655.541 1
    1  6 2012 2  1932.152 1
    1  6 2013 1 19289.655 1
    1  6 2013 2  2188.272 1
    1  6 2014 2  2579.145 1
    1  6 2015 2  2900.595 1
    1  6 2016 2  2226.117 1
    1  6 2017 2  3056.925 1
    1  6 2018 2  2968.075 1
    1  6 2019 2  1091.788 1
    1  6 2020 2  1308.864 1
    1  6 2021 2  2083.728 1
    1  8 1990 2     2.983 1
    1  8 1991 2     4.424 1
    1  8 1992 2      .988 1
    1  8 1994 2      3.78 1
    1  8 1995 2    12.016 1
    1  8 1997 2     4.394 1
    1  8 2000 2     2.996 1
    1  8 2012 2    50.332 1
    1  8 2014 2    17.912 1
    1  8 2018 2    35.404 1
    1  8 2021 2     1.078 1
    1 10 1998 1  1404.303 1
    1 10 1999 1   8020.24 1
    1 10 2000 1  6740.957 1
    1 10 2001 1  8239.368 1
    1 10 2002 1  7912.797 1
    1 10 2003 1 11160.311 1
    1 10 2004 1 12530.759 1
    1 10 2005 1 17654.176 1
    1 10 2006 1 19484.108 1
    1 10 2007 1 16500.667 1
    1 10 2020 2  3919.537 1
    1 12 1991 1   5476.53 1
    1 12 1992 1  7492.556 1
    1 12 1993 1  4595.625 1
    1 12 1995 1   530.849 1
    1 12 1995 2       .72 1
    1 12 2000 1   996.083 1
    1 12 2001 1  3704.596 1
    1 12 2002 1  1206.225 1
    1 12 2003 1  3055.375 1
    1 13 1990 2    303.05 1
    1 13 1991 2   228.725 1
    1 13 1992 2   400.466 1
    1 13 1993 2   353.085 1
    1 13 1994 2    130.42 1
    1 13 1996 2       .85 1
    1 13 2001 1  1968.841 1
    1 13 2001 2     8.356 1
    1 13 2003 2     23.84 1
    1 13 2004 2    28.293 1
    1 13 2005 2    18.046 1
    1 13 2006 2    22.922 1
    1 13 2007 2    24.919 1
    1 13 2008 2    14.833 1
    1 13 2009 2    25.279 1
    1 13 2010 2    15.744 1
    1 13 2011 2    21.255 1
    1 13 2012 2    14.008 1
    1 13 2013 2    11.789 1
    1 13 2014 2    22.041 1
    1 13 2015 2    22.898 1
    1 13 2016 2    60.994 1
    1 13 2017 2      30.5 1
    1 13 2018 2      33.6 1
    end
    label values ReporterISO3 ReporterISO3
    label def ReporterISO3 1 "AUS", modify
    label values PartnerISO3 PartnerISO3
    label def PartnerISO3 1 "AUS", modify
    label def PartnerISO3 3 "BEL", modify
    label def PartnerISO3 4 "CAN", modify
    label def PartnerISO3 5 "CHE", modify
    label def PartnerISO3 6 "CHN", modify
    label def PartnerISO3 8 "DEU", modify
    label def PartnerISO3 10 "ESP", modify
    label def PartnerISO3 12 "FRA", modify
    label def PartnerISO3 13 "GBR", modify
    label values TradeFlowName TradeFlowName
    label def TradeFlowName 1 "Export", modify
    label def TradeFlowName 2 "Import", modify
    label values ProductCode ProductCode
    label def ProductCode 1 "3221", modify



  • #2
    Perhaps you appended the same data twice? Check for duplicates.

    Code:
    duplicates drop *, force
    If not, you have to inspect the combined dataset.

    Comment


    • #3
      Thank you for your answer. I tried it one more time. This time I didn't have any duplicates. and obtained 200 000 rows. Now I see why stata didn't combine it like I wished. The problem is that I have long variables. E.g. I have sector 0011 as 1 and label is "0011". In the part 1 I have the data till sector 2929 which is according to stata number 208 with label "2929". In part 2 I have data from sector 3221 which stata reads as 1 and labels "3221" till sector 6129 which is for stata a number 141. When I combine part1 with part2 stata takes sectors 3221-6129 and gives them labels "0011" for the first etc.

      I tried to convert long into string by typing
      Code:
      tostring ReporterISO3, gen(reporter)
      But I got numbers from 1 till 208 for part 1 instead of 0011-2929.

      Comment


      • #4
        Apparently the variable was converted from a string variable to a numeric variable using encode separately in each of the two datasets, so in each case the encoding started at 1.

        If you did the encode, you should put off doing so until after the datasets are appended. If the World Bank did the encode in each dataset then shame on them.

        In any event, you can reverse the effects of encode with decode. The following command should be run separately in each dataset before appending them
        Code:
        decode ReporterISO3, generate(reporter)
        and this should do what you need, see the output of
        Code:
        help decode
        for details.

        Or, if it was you who did the encode, I'd recommend that you append the downloaded datasets as they were provided, then do the encode and other data preparation tasks on the appended data.
        Last edited by William Lisowski; 18 Mar 2022, 10:04.

        Comment


        • #5
          Thank you William. It worked! No it wasn't me who did encode

          Comment

          Working...
          X