Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Converting Wide to Long

    I have downloaded Primary Energy Consumption data from Estat.
    The first row bears dates. I want the date to appear long and have PEC as variable name as column header.
    I have checked other responses in the forum and tried to manipulate v2 etc. as d2005, d2006 etc. and continue like that dropping first row and changing strings to numeric etc but still I could not manage.
    Dataex command output is like below. Thanks for helping.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str25 CountryName float(v2 v3) str7(v4 v5) float v6
    ""   2005  2006 "2007 "  "2008 "   2009
    "AT"  100  99.7 "98.4 "  "99.2 "   93.7
    "BE"  100  99.7 "97.6 "  "99.2 "     97
    "BG"  100 103.3 "101.6 " "99.0 "     88
    "CY"  100   104 "109.2 " "115.2 " 111.9
    "CZ"  100 102.3 "102.7 " "100.0 "  94.5
    "DE"  100 103.5 "98.2 "  "99.7 "   93.3
    "DK"  100 107.2 "104.7 " "102.1 "    98
    "EE"  100  98.9 "116.5 " "101.8 "  81.9
    "GR"  100  99.9 "100.1 " "100.6 "  97.2
    "ES"  100 100.4 "102.1 " "98.4 "   90.4
    "FI"  100 109.3 "107.4 " "102.9 "  96.4
    "FR"  100  98.2 "96.9 "  "97.9 "   94.4
    "HR"  100  99.6 "103.2 " "100.6 "  97.9
    "HU"  100  98.7 "96.3 "  "95.5 "   90.9
    "IE"  100 101.2 "106.9 " "104.7 "  99.7
    "IT"  100    99 "98.8 "  "97.4 "   90.7
    "LT"  100  98.1 "100.6 " "102.7 "  98.3
    "LU"  100  98.3 "96.6 "  "96.6 "   90.9
    "LV"  100 103.7 "106.2 " "101.9 "  98.7
    "MT"  100   101 "103.6 " "104.6 "  96.5
    "NL"  100  99.2 "99.0 "  "99.7 "   96.5
    "PL"  100   105 "104.5 " "105.8 " 101.8
    "PT"  100  96.8 "96.0 "  "95.0 "   95.1
    "RO"  100 104.1 "103.8 " "103.5 "  90.5
    "SE"  100  97.4 "96.8 "  "96.4 "   87.9
    "SI"  100  99.2 "100.5 " "106.6 "  94.1
    "SK"  100    99 "94.3 "  "97.5 "   89.1
    end

  • #2
    Hi Ferit
    You were close. A couple of things:
    1. The variable names go into your 'input' line (you had put them in the first line of your data);
    2. It looks like only CountryName should be string as everything else is numeric.
    Here is reshape code.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str2 CountryName float(PEC2005 PEC2006 PEC2007 PEC2008 PEC2000)
    "AT"  100  99.7  98.4  99.2  93.7
    "BE"  100  99.7  97.6  99.2    97
    "BG"  100 103.3 101.6 99.0     88
    "CY"  100   104 109.2 115.2 111.9
    "CZ"  100 102.3 102.7 100.0  94.5
    "DE"  100 103.5  98.2  99.7  93.3
    "DK"  100 107.2 104.7 102.1    98
    "EE"  100  98.9 116.5 101.8  81.9
    "GR"  100  99.9 100.1 100.6  97.2
    "ES"  100 100.4 102.1  98.4  90.4
    "FI"  100 109.3 107.4 102.9  96.4
    "FR"  100  98.2  96.9  97.9  94.4
    "HR"  100  99.6 103.2 100.6  97.9
    "HU"  100  98.7  96.3  95.5  90.9
    "IE"  100 101.2 106.9 104.7  99.7
    "IT"  100    99  98.8  97.4  90.7
    "LT"  100  98.1 100.6 102.7  98.3
    "LU"  100  98.3  96.6  96.6  90.9
    "LV"  100 103.7 106.2 101.9  98.7
    "MT"  100   101 103.6 104.6  96.5
    "NL"  100  99.2  99.0  99.7  96.5
    "PL"  100   105 104.5 105.8 101.8
    "PT"  100  96.8  96.0  95.0  95.1
    "RO"  100 104.1 103.8 103.5  90.5
    "SE"  100  97.4  96.8  96.4  87.9
    "SI"  100  99.2 100.5 106.6  94.1
    "SK"  100    99  94.3  97.5  89.1
    end
    
    reshape long PEC, i(CountryName) j(year)

    Comment


    • #3
      The fact that v2 and v3 are strings suggests to me that there are some missing values that the dataprovider used symbols for that Stata did not recognize as missing values (common ones are ".." or "NA" or "N/A"). First I would check if that is really what is happening. If I am right, then you want to use destring with the force option. This will make a value numeric if it can (e.g. "10" becomes 10), and missing if the string cannot be turned into a number (e.g. "NA" becomes .)

      Alternatively, this sometimes happens when the dataprovider uses a spreadsheet and adds information (copyright, disclaimers, etc.) at the bottom of the table. If you import the data in Stata, then this "extra information" tends to end up in "extra observations" in one or more variables, forcing those variables to be strings. In those cases you delete those extra "observations" and than use destring.

      The next steps are in the code. Let me know if they need further clarification.

      Code:
      clear
      input str25 CountryName float(v2 v3) str7(v4 v5) float v6
      ""   2005  2006 "2007 "  "2008 "   2009
      "AT"  100  99.7 "98.4 "  "99.2 "   93.7
      "BE"  100  99.7 "97.6 "  "99.2 "     97
      "BG"  100 103.3 "101.6 " "99.0 "     88
      "CY"  100   104 "109.2 " "115.2 " 111.9
      "CZ"  100 102.3 "102.7 " "100.0 "  94.5
      "DE"  100 103.5 "98.2 "  "99.7 "   93.3
      "DK"  100 107.2 "104.7 " "102.1 "    98
      "EE"  100  98.9 "116.5 " "101.8 "  81.9
      "GR"  100  99.9 "100.1 " "100.6 "  97.2
      "ES"  100 100.4 "102.1 " "98.4 "   90.4
      "FI"  100 109.3 "107.4 " "102.9 "  96.4
      "FR"  100  98.2 "96.9 "  "97.9 "   94.4
      "HR"  100  99.6 "103.2 " "100.6 "  97.9
      "HU"  100  98.7 "96.3 "  "95.5 "   90.9
      "IE"  100 101.2 "106.9 " "104.7 "  99.7
      "IT"  100    99 "98.8 "  "97.4 "   90.7
      "LT"  100  98.1 "100.6 " "102.7 "  98.3
      "LU"  100  98.3 "96.6 "  "96.6 "   90.9
      "LV"  100 103.7 "106.2 " "101.9 "  98.7
      "MT"  100   101 "103.6 " "104.6 "  96.5
      "NL"  100  99.2 "99.0 "  "99.7 "   96.5
      "PL"  100   105 "104.5 " "105.8 " 101.8
      "PT"  100  96.8 "96.0 "  "95.0 "   95.1
      "RO"  100 104.1 "103.8 " "103.5 "  90.5
      "SE"  100  97.4 "96.8 "  "96.4 "   87.9
      "SI"  100  99.2 "100.5 " "106.6 "  94.1
      "SK"  100    99 "94.3 "  "97.5 "   89.1
      end
      
      // make v4 and v5 numeric
      destring v4 v5, force replace
      
      // make the names correspond to year
      rename v2 PEC2005
      rename v3 PEC2006
      rename v4 PEC2007
      rename v5 PEC2008
      rename v6 PEC2009
      
      // now we no longer need the first observation (years)
      drop in 1
      
      // now we can do the reshape
      reshape long PEC , i(CountryName) j(year)
      Last edited by Maarten Buis; 20 Jan 2025, 05:44.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------

      Comment


      • #4
        Thank you for your prompt , to the point helpful responses.

        Comment


        • #5
          As a small tweak to earlier advice, I recommend looking at whatever is inhibiting reading in variables as numeric.


          Code:
          clear
          input str25 CountryName float(v2 v3) str7(v4 v5) float v6
          ""   2005  2006 "2007 "  "2008 "   2009
          "AT"  100  99.7 "98.4 "  "99.2 "   93.7
          "BE"  100  99.7 "97.6 "  "99.2 "     97
          "BG"  100 103.3 "101.6 " "99.0 "     88
          "CY"  100   104 "109.2 " "115.2 " 111.9
          "CZ"  100 102.3 "102.7 " "100.0 "  94.5
          "DE"  100 103.5 "98.2 "  "99.7 "   93.3
          "DK"  100 107.2 "104.7 " "102.1 "    98
          "EE"  100  98.9 "116.5 " "101.8 "  81.9
          "GR"  100  99.9 "100.1 " "100.6 "  97.2
          "ES"  100 100.4 "102.1 " "98.4 "   90.4
          "FI"  100 109.3 "107.4 " "102.9 "  96.4
          "FR"  100  98.2 "96.9 "  "97.9 "   94.4
          "HR"  100  99.6 "103.2 " "100.6 "  97.9
          "HU"  100  98.7 "96.3 "  "95.5 "   90.9
          "IE"  100 101.2 "106.9 " "104.7 "  99.7
          "IT"  100    99 "98.8 "  "97.4 "   90.7
          "LT"  100  98.1 "100.6 " "102.7 "  98.3
          "LU"  100  98.3 "96.6 "  "96.6 "   90.9
          "LV"  100 103.7 "106.2 " "101.9 "  98.7
          "MT"  100   101 "103.6 " "104.6 "  96.5
          "NL"  100  99.2 "99.0 "  "99.7 "   96.5
          "PL"  100   105 "104.5 " "105.8 " 101.8
          "PT"  100  96.8 "96.0 "  "95.0 "   95.1
          "RO"  100 104.1 "103.8 " "103.5 "  90.5
          "SE"  100  97.4 "96.8 "  "96.4 "   87.9
          "SI"  100  99.2 "100.5 " "106.6 "  94.1
          "SK"  100    99 "94.3 "  "97.5 "   89.1
          end
          
          ds v*, has(type string)
          
          foreach v in `r(varlist)' { 
              di "`v'"
              list `v' if missing(real(`v'))
              di _n 
          }
          The logic is that real() applied to a string variable will yield missing if the string value doesn't have a numeric equivalent.

          See e.g. https://www.statalist.org/forums/for...iable-problems if you want to read more on this approach.

          I am optimistic that Maarten Buis is right, so that whatever the force option would force to numeric missing is really missing in some sense, but exceptions are what you should want to know about.

          Comment


          • #6
            Dear Maarten, is there an easier way to rename variables if the years are so long to do it one by one.

            // make the names correspond to year rename v2 PEC2005 rename v3 PEC2006 rename v4 PEC2007 rename v5 PEC2008 rename v6 PEC2009

            Comment


            • #7
              Code:
              help rename groups
              Code:
              . clear 
              
              . set obs 1 
              Number of observations (_N) was 0, now 1.
              
              . forval j = 2/21 {
                2.         gen v`j' = 42 
                3. }
              
              . 
              . ds 
              v2   v3   v4   v5   v6   v7   v8   v9   v10  v11  v12  v13  v14  v15  v16  v17  v18  v19  v20  v21
              
              . 
              . rename (v*) (PEC#), renumber(2005)
              
              . 
              . ds 
              PEC2005  PEC2007  PEC2009  PEC2011  PEC2013  PEC2015  PEC2017  PEC2019  PEC2021  PEC2023
              PEC2006  PEC2008  PEC2010  PEC2012  PEC2014  PEC2016  PEC2018  PEC2020  PEC2022  PEC2024

              Comment


              • #8
                Nick, I'm embarrassed to admit I only just learned there is a 'help rename group'. The renumber() option is cool!

                Comment

                Working...
                X