Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Loops for Copying Variable names?

    Good Evening Statalist,

    I have this dataset below (small snippet of a larger set).

    Code:
    input str13 A str17 B str13 C str17(D E F)
    "United States" "United States"     "United States" "United States"     "United States"     "United States"    
    "Latvia"        "Lebanon"           "Lesotho"       "Liberia"           "Libya"             "Lithuania"        
    "306930.679"    "79745.083"         "146463.621"    "47212.263"         ""                  "148638.653"       
    "160166.035"    "92366.66099999999" "224044.424"    "44717.257"         ""                  "181852.287"       
    "209911.079"    "65999.47"          "342916.727"    "47565.578"         ""                  "319909.315"       
    "402127.868"    "96023.288"         "419667.164"    "62020.299"         ""                  "377984.417"       
    "393820.304"    "78064.427"         "492916.794"    "89327.53200000001" "345890.234"        "526050.688"       
    "386179.552"    "91382.83"          "421162.558"    "97128.84"          "1657014.694"       "684444.485"       
    "316645.799"    "92733.285"         "428555.478"    "144436.479"        "2629364.881"       "611151.251"       
    "354748.64"     "110106.382"        "461830.937"    "118431.626"        "3540204.282"       "495967.455"       
    "242113.62"     "102171.373"        "393199.228"    "147594.515"        "4349963.119"       "795659.618"       
    "151863.228"    "79335.322"         "315889.223"    "84722.024"         "1981281.659"       "619556.909"       
    "205856.497"    "87770.742"         "311795.845"    "184387.544"        "2186739.543"       "665087.0699999999"
    "374815.397"    "82917.895"         "394700.104"    "161518.682"        "666411.2120000001" "1093273.328"      
    "246245.171"    "81186.78999999999" "310571.285"    "143977.124"        "2493042.856"       "1176197.244"      
    "272668.88"     "91677.058"         "351402.897"    "96490.694"         "2558215.497"       "1554352.566"      
    "286506.172"    "75856.026"         "372885.118"    "87367.33500000001" "231534.712"        "1137632.301"      
    "317724.024"    "96389.17200000001" "344158.896"    "47175.347"         "167148.641"        "1114446.726"      
    "350260.698"    "110837.231"        "319856.592"    "67780.60400000001" "245277.491"        "1241041.518"      
    end
    The numerical values in the columns are values for, say, US-Latvia 2000-2016. I have to put together a dataset, which is rather easy to do. But these values come from a large dataset. I would like to put the US in one column, and the other countries in another column respectively. I also would like another column with the years 2000-2016. For example, it should look more or less like this
    Code:
    input str4 A str13 B str11 C double D
    "2000" "United States" "Afghanistan"   901.169
    "2001" "United States" "Afghanistan"   791.872
    "2002" "United States" "Afghanistan"  4746.212
    "2003" "United States" "Afghanistan" 62735.495
    "2004" "United States" "Afghanistan" 25091.199
    "2005" "United States" "Afghanistan" 67649.574
    "2006" "United States" "Afghanistan"  45827.15
    "2007" "United States" "Afghanistan" 76810.325
    "2008" "United States" "Afghanistan" 86906.964
    "2009" "United States" "Afghanistan"  124107.6
    "2010" "United States" "Afghanistan" 87474.254
    "2011" "United States" "Afghanistan" 26636.658
    "2012" "United States" "Afghanistan" 36759.941
    "2013" "United States" "Afghanistan" 45514.058
    "2014" "United States" "Afghanistan" 72809.599
    "2015" "United States" "Afghanistan" 24493.917
    "2016" "United States" "Afghanistan" 34458.766
    end
    How would I begin to go about this process? I couldn't find too much about copying string variables. I recently started reading about loops. Would this be a case for "foreach?"

  • #2
    Here I make heavy use of local macros, so you need to run the entire example in one go.

    Code:
    clear
    input str13 A str17 B str13 C str17(D E F)
    "United States" "United States"     "United States" "United States"     "United States"     "United States"    
    "Latvia"        "Lebanon"           "Lesotho"       "Liberia"           "Libya"             "Lithuania"        
    "306930.679"    "79745.083"         "146463.621"    "47212.263"         ""                  "148638.653"       
    "160166.035"    "92366.66099999999" "224044.424"    "44717.257"         ""                  "181852.287"       
    "209911.079"    "65999.47"          "342916.727"    "47565.578"         ""                  "319909.315"       
    "402127.868"    "96023.288"         "419667.164"    "62020.299"         ""                  "377984.417"       
    "393820.304"    "78064.427"         "492916.794"    "89327.53200000001" "345890.234"        "526050.688"       
    "386179.552"    "91382.83"          "421162.558"    "97128.84"          "1657014.694"       "684444.485"       
    "316645.799"    "92733.285"         "428555.478"    "144436.479"        "2629364.881"       "611151.251"       
    "354748.64"     "110106.382"        "461830.937"    "118431.626"        "3540204.282"       "495967.455"       
    "242113.62"     "102171.373"        "393199.228"    "147594.515"        "4349963.119"       "795659.618"       
    "151863.228"    "79335.322"         "315889.223"    "84722.024"         "1981281.659"       "619556.909"       
    "205856.497"    "87770.742"         "311795.845"    "184387.544"        "2186739.543"       "665087.0699999999"
    "374815.397"    "82917.895"         "394700.104"    "161518.682"        "666411.2120000001" "1093273.328"      
    "246245.171"    "81186.78999999999" "310571.285"    "143977.124"        "2493042.856"       "1176197.244"      
    "272668.88"     "91677.058"         "351402.897"    "96490.694"         "2558215.497"       "1554352.566"      
    "286506.172"    "75856.026"         "372885.118"    "87367.33500000001" "231534.712"        "1137632.301"      
    "317724.024"    "96389.17200000001" "344158.896"    "47175.347"         "167148.641"        "1114446.726"      
    "350260.698"    "110837.231"        "319856.592"    "67780.60400000001" "245277.491"        "1241041.518"      
    end
    
    // store number of variables (in this case country pairs) in the data
    local k = c(k)
    
    // store the names of the two countries in local macros
    // and rename the variables for reshape
    local i = 1
    foreach var of varlist * {
        local ego`i'   = `var'[1]
        local alter`i' = `var'[2]
        rename `var' trade`i++'
    }
    
    // we stored the variable names, so we don't need the first two observations
    drop in 1/2
    
    // create year
    gen year = 1999 + _n
    
    // put the values underneath one another
    reshape long trade, i(year) j(pair)
    
    //create two variables for the country pairs
    gen str40 ego = ""
    gen str40 alter = ""
    forvalues i = 1/`k' {
        replace ego = "`ego`i''" if pair == `i'
        replace alter = "`alter`i''" if pair == `i'
    }
    
    //clean up
    compress
    order year pair ego alter trade
    sort ego alter year
    
    // turn trade from string (text) to numeric (numbers)
    destring trade, replace
    
    // admire the result
    list , sepby(alter)
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    Comment


    • #3
      Originally posted by Maarten Buis View Post
      Here I make heavy use of local macros, so you need to run the entire example in one go.

      Code:
      clear
      input str13 A str17 B str13 C str17(D E F)
      "United States" "United States" "United States" "United States" "United States" "United States"
      "Latvia" "Lebanon" "Lesotho" "Liberia" "Libya" "Lithuania"
      "306930.679" "79745.083" "146463.621" "47212.263" "" "148638.653"
      "160166.035" "92366.66099999999" "224044.424" "44717.257" "" "181852.287"
      "209911.079" "65999.47" "342916.727" "47565.578" "" "319909.315"
      "402127.868" "96023.288" "419667.164" "62020.299" "" "377984.417"
      "393820.304" "78064.427" "492916.794" "89327.53200000001" "345890.234" "526050.688"
      "386179.552" "91382.83" "421162.558" "97128.84" "1657014.694" "684444.485"
      "316645.799" "92733.285" "428555.478" "144436.479" "2629364.881" "611151.251"
      "354748.64" "110106.382" "461830.937" "118431.626" "3540204.282" "495967.455"
      "242113.62" "102171.373" "393199.228" "147594.515" "4349963.119" "795659.618"
      "151863.228" "79335.322" "315889.223" "84722.024" "1981281.659" "619556.909"
      "205856.497" "87770.742" "311795.845" "184387.544" "2186739.543" "665087.0699999999"
      "374815.397" "82917.895" "394700.104" "161518.682" "666411.2120000001" "1093273.328"
      "246245.171" "81186.78999999999" "310571.285" "143977.124" "2493042.856" "1176197.244"
      "272668.88" "91677.058" "351402.897" "96490.694" "2558215.497" "1554352.566"
      "286506.172" "75856.026" "372885.118" "87367.33500000001" "231534.712" "1137632.301"
      "317724.024" "96389.17200000001" "344158.896" "47175.347" "167148.641" "1114446.726"
      "350260.698" "110837.231" "319856.592" "67780.60400000001" "245277.491" "1241041.518"
      end
      
      // store number of variables (in this case country pairs) in the data
      local k = c(k)
      
      // store the names of the two countries in local macros
      // and rename the variables for reshape
      local i = 1
      foreach var of varlist * {
      local ego`i' = `var'[1]
      local alter`i' = `var'[2]
      rename `var' trade`i++'
      }
      
      // we stored the variable names, so we don't need the first two observations
      drop in 1/2
      
      // create year
      gen year = 1999 + _n
      
      // put the values underneath one another
      reshape long trade, i(year) j(pair)
      
      //create two variables for the country pairs
      gen str40 ego = ""
      gen str40 alter = ""
      forvalues i = 1/`k' {
      replace ego = "`ego`i''" if pair == `i'
      replace alter = "`alter`i''" if pair == `i'
      }
      
      //clean up
      compress
      order year pair ego alter trade
      sort ego alter year
      
      // turn trade from string (text) to numeric (numbers)
      destring trade, replace
      
      // admire the result
      list , sepby(alter)
      Hello Maarten,

      Thank you for your help! Because I am new to Stata programming, I've been trying to go line-by-line in order to understand the code. My first question revolves around
      Code:
      local i = 1
      .

      Why is the local here (i) being stored as 1?

      Comment


      • #4
        While Maarten's solution is a reasonable one using loop, stack provides a concise, direct (...and might-be easier) solution.

        Code:
        stack *, into(trade) clear
        bys _stack: gen year =1997+_n
        by _stack: gen ego =trade[1]
        by _stack: gen alter =trade[2]
        by _stack: drop if _n<3

        Comment


        • #5
          Originally posted by Romalpa Akzo View Post
          While Maarten's solution is a reasonable one using loop, stack provides a concise, direct (...and might-be easier) solution.

          Code:
          stack *, into(trade) clear
          bys _stack: gen year =1997+_n
          by _stack: gen ego =trade[1]
          by _stack: gen alter =trade[2]
          by _stack: drop if _n<3
          Wow. Thanks! Is there a reason why the year begins at 1997? Just curious.

          Comment


          • #6
            Since the starting year (2000) is corresponding with the "row" 3 (_n=3), after names of two relevant countries.

            Comment


            • #7
              Originally posted by Romalpa Akzo View Post
              Since the starting year (2000) is corresponding with the "row" 3 (_n=3), after names of two relevant countries.
              Thank you, Romapla. I found another dataset that is similar, but organized differently.
              Code:
              Example generated by -dataex-. To install: ssc install dataex
              clear
              input str11 A str20 B double(C D E F G H I J K F)
              "Afghanistan" "Australia"                10.692      8.462    648.457    111.502          .          .          .    1296.26    455.832    111.502
              "Afghanistan" "Austria"                       .          .     13.975          .          .          .          .     51.916          .          .
              "Afghanistan" "Azerbaijan"                    .          .          .          .          .          .          .       3.41          .          .
              "Afghanistan" "Bangladesh"                    .          .          .          .          .          .          .      618.7    177.188          .
              "Afghanistan" "Belgium"                2765.806   1365.903    839.983    923.526          .          .          .    354.154   1231.243    923.526
              "Afghanistan" "Brazil"                 1207.107    305.216          .        494          .          .          .      107.5          .        494
              "Afghanistan" "Bulgaria"                      .          .          .          .          .          .          .      7.906          .          .
              "Afghanistan" "Canada"                        .    315.234    478.863    882.438          .          .          .    771.752    609.003    882.438
              "Afghanistan" "China"                  1942.528   3590.181  11691.878   5872.095   4796.438  20277.023  15452.716  10150.949   4754.161   5872.095
              "Afghanistan" "Czech Republic"                .          .          .          .          .          .          .      9.182          .
              I'm trying to get a similar organization again. In other words to look like what I need throughout this thread (i.e. second box in first post). Would stack help here too, or is the loop better? I think what keeps throwing me off (for some reason) is when I try to make connections from https://www.stata.com/manuals13/dstack.pdf is that I need to copy "Afghanistan" multiple times and "Australia" (and every country) 17 times vertically each time.

              Comment


              • #8
                For this dataset, reshape is the appropriate tool.
                Code:
                ren (C-L) (trade#), addnumber
                reshape long trade, i(A B) j(year)
                replace year = year + 1999

                Comment


                • #9
                  Originally posted by Romalpa Akzo View Post
                  For this dataset, reshape is the appropriate tool.
                  Code:
                  ren (C-L) (trade#), addnumber
                  reshape long trade, i(A B) j(year)
                  replace year = year + 1999
                  Hello Romalpa

                  Thank you, but unfortunately, it doesn't give the output that I need. This is what is looks like when I run the code.
                  Code:
                  input str7 A str30 B int year double(trade M N O P Q R S)
                  "Denmark" "Afghanistan"         2000    722.256  14255.143  22323.391  19154.118  20752.544  20199.518  12421.702   7934.244
                  "Denmark" "Afghanistan"         2001    812.268  14255.143  22323.391  19154.118  20752.544  20199.518  12421.702   7934.244
                  "Denmark" "Afghanistan"         2002   9439.604  14255.143  22323.391  19154.118  20752.544  20199.518  12421.702   7934.244
                  "Denmark" "Afghanistan"         2003   7412.093  14255.143  22323.391  19154.118  20752.544  20199.518  12421.702   7934.244
                  "Denmark" "Afghanistan"         2004  17684.347  14255.143  22323.391  19154.118  20752.544  20199.518  12421.702   7934.244
                  "Denmark" "Afghanistan"         2005   8416.686  14255.143  22323.391  19154.118  20752.544  20199.518  12421.702   7934.244
                  "Denmark" "Afghanistan"         2006   9288.185  14255.143  22323.391  19154.118  20752.544  20199.518  12421.702   7934.244
                  "Denmark" "Afghanistan"         2007  11312.212  14255.143  22323.391  19154.118  20752.544  20199.518  12421.702   7934.244
                  "Denmark" "Afghanistan"         2008  10741.655  14255.143  22323.391  19154.118  20752.544  20199.518  12421.702   7934.244
                  "Denmark" "Afghanistan"         2009  17623.357  14255.143  22323.391  19154.118  20752.544  20199.518  12421.702   7934.244
                  "Denmark" "Albania"             2000    3933.69   5503.063   3413.328   3314.605   3006.301   4705.196   4874.977   6632.136
                  "Denmark" "Albania"             2001   2909.744   5503.063   3413.328   3314.605   3006.301   4705.196   4874.977   6632.136
                  "Denmark" "Albania"             2002   2002.912   5503.063   3413.328   3314.605   3006.301   4705.196   4874.977   6632.136
                  "Denmark" "Albania"             2003   5037.053   5503.063   3413.328   3314.605   3006.301   4705.196   4874.977   6632.136
                  "Denmark" "Albania"             2004   3385.042   5503.063   3413.328   3314.605   3006.301   4705.196   4874.977   6632.136
                  "Denmark" "Albania"             2005   4758.789   5503.063   3413.328   3314.605   3006.301   4705.196   4874.977   6632.136
                  "Denmark" "Albania"             2006    2143.04   5503.063   3413.328   3314.605   3006.301   4705.196   4874.977   6632.136
                  "Denmark" "Albania"             2007   4086.104   5503.063   3413.328   3314.605   3006.301   4705.196   4874.977   6632.136
                  "Denmark" "Albania"             2008   5184.356   5503.063   3413.328   3314.605   3006.301   4705.196   4874.977   6632.136
                  "Denmark" "Albania"             2009   3696.328   5503.063   3413.328   3314.605   3006.301   4705.196   4874.977   6632.136
                  "Denmark" "Algeria"             2000  20936.033  75171.552  97218.348  86001.219  99211.435  102271.77  81500.511   79502.
                  It has to look similar to
                  Code:
                  input int(year pair) str14 host str30 sending double export
                  2000 1 "Czech Republic" "Afghanistan"       680.626
                  2001 1 "Czech Republic" "Afghanistan"       156.715
                  2002 1 "Czech Republic" "Afghanistan"        832.13
                  2003 1 "Czech Republic" "Afghanistan"      1449.339
                  2004 1 "Czech Republic" "Afghanistan"      5226.131
                  2005 1 "Czech Republic" "Afghanistan"      4741.751
                  2006 1 "Czech Republic" "Afghanistan"          4452
                  2007 1 "Czech Republic" "Afghanistan"      7562.583
                  2008 1 "Czech Republic" "Afghanistan"     47223.861
                  2009 1 "Czech Republic" "Afghanistan"     30416.753
                  2010 1 "Czech Republic" "Afghanistan"     20352.205
                  2011 1 "Czech Republic" "Afghanistan"     13554.582
                  2012 1 "Czech Republic" "Afghanistan"     11019.709
                  2013 1 "Czech Republic" "Afghanistan"      8221.139
                  2014 1 "Czech Republic" "Afghanistan"      8659.594
                  2015 1 "Czech Republic" "Afghanistan"      6300.698
                  2016 1 "Czech Republic" "Afghanistan"      2573.769
                  I'm wondering if there should be an extra line of code missing for the reshape example that you gave

                  Comment


                  • #10
                    My code in #8 provide the exact output for your example in #7, which has not included all the "columns" of your original data (only A to L, not including M-S). The below code should help you out.
                    Code:
                    ds A B, not
                    ren(`r(varlist)') (trade#), addnumber(2000)
                    reshape long trade, i(A B) j(year)

                    Comment


                    • #11
                      Originally posted by Romalpa Akzo View Post
                      My code in #8 provide the exact output for your example in #7, which has not included all the "columns" of your original data (only A to L, not including M-S). The below code should help you out.
                      Code:
                      ds A B, not
                      ren(`r(varlist)') (trade#), addnumber(2000)
                      reshape long trade, i(A B) j(year)
                      Thank you, Romalpa! This is very helpful

                      Comment

                      Working...
                      X