Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • reshaping dataset

    Hello StataList experts,
    I'm trying to re-shape the dataset in Sheet2 (attached), which is basically the energy mix in the import bill, as I'm trying to estimate the oil price elasticity using rolling window regression.
    First, stata couldn't read the date variable, so it gave him an letter.
    then, Stata complains that "variable key contains missing values
    r(498);"

    the code:////////
    reshape wide E F G H I J K L M N O P Q R S T U V W X Y Z AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU AV AW AX AY AZ BA BB BC BD BE BF BG BH BI BJ BK BL BM BM BN BO BP BQ BR BS BT BU BV BW BX BY BZ CA CB CC CD CE CF CG CH CI CJ CK CL CM CN CO CP CQ CR CS CT CU CV CW CX CY CZ DA DB DC DD DE DF DG DH DI DJ DK DL DM DN DO DP, i(key item variable) j(key)
    ///////////

    Any idea how to do it?
    Attached Files
    Last edited by sarah altalafha; 10 Feb 2024, 11:53.

  • #2
    You can't reshape a spreadsheet. You have to have a Stata data set for that. So, even if I were to download your Excel file (which I won't, for security reasons), I still wouldn't be able to help you because I would not know how it looked once it was imported to Stata. So, to get help with your problem, you need to post example data from your Stata import of this spreadsheet. Please use the -dataex- command to do that. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Note: You have too many variables for -dataex- to deal with, but I'm sure they aren't all necessary to figure out the problem. When you use -dataex-, specify that you want to show the variables named key, item, variable, and a few others, such as E through H. That should give us a good enough sample to work with and help you.

    I will give you one specific piece of advice about your -reshape- code: the same variable cannot appear in both the -i()- and -j()- options. So if key really is the -j()- variable, you have to remove it from -i()-. If it really belongs in -i()- then you have to have some other variable for -j()-. But let's see the data example and we can probably figure out what variables go where.

    Finally, before you post back, I strongly urge you to read the Forum FAQ. It has excellent advice on ways to maximize the benefit you get from your Statalist interactions. Please pay specific attention to the parts that describe the best ways to share example data and code.

    Comment


    • #3
      Sorry, I know that I have to post the data using dataex, but I didn't know that I can specify the variables.

      ----------------------- copy starting from the next line -----------------------
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int key str18 item str8 variable str7 unit double(E F G H I J)
        1 "crude_oil"          "quantity" "toe"      550.7274352795955  375.0898057540679  375.5555322115111  540.2786098289057 378.83281226821657  372.5758877546652
        1 "crude_oil"          "price "   "usd/ton"  .6466636291576937  .6742155943813479  .7053021194367592  .6839703608278963  .6210151899989299  .5402104648396386
        2 "petroleum_products" "quantity" "toe"      321533.0917068233  382479.9388703484  501435.9487273465   382619.410919943   334996.168716302  276691.7414267565
        2 "petroleum_products" "price "   "usd/ton"  733.4396112416946  706.9150315495351  768.4559787313432  844.1550996139392  720.3837940181993  649.4219989422651
       21 "deisel"             "quantity" "toe"     178071.78869878696 202388.22222262446 249784.14641399772  319588.0010249162 175493.79491172012 104236.71421483332
       21 "deisel"             "price "   "usd/ton"  805.0000005474001     805.0000005474     805.0000005474  805.0000005474001     805.0000005474     805.0000005474
       22 "fuel_oil"           "quantity" "toe"      96119.19745016689  178512.8532947603 178584.17130713418 15798.136635219998 106031.29682230865 119295.48563316478
       22 "fuel_oil"           "price "   "usd/ton"     805.0000005474  805.0000005474001  805.0000005474001  805.0000005474001  805.0000005474001  805.0000005474001
       23 "octan"              "quantity" "toe"      45866.76145949383   162.481987467152  71404.08939243901  45221.90804378327  45539.10059015143 45388.418602676255
       23 "octan"              "price "   "usd/ton"     805.0000005474     805.0000005474     805.0000005474     805.0000005474     805.0000005474     805.0000005474
       24 "other"              "quantity" "toe"      1475.344098375648  1416.381365496483 1663.5416137756342  2011.365216023541  7931.976392121803  7771.122976082109
       24 "other"              "price "   "usd/ton" 1625.2095379396592 1692.8656072671786 1526.1337636245157 1823.2078307796187 1102.4630467952627  1179.731945980824
        3 "gas"                "quantity" "toe"          68238828.5859      48955189.7799  51554556.35639999      27911573.4843       13661556.291       9107115.8211
        3 "gas"                "price "   "usd/toe"  .9073748175849035  .9824476209417969  1.036873847920222  .8713494221686452  1.641537140238395  1.688104792935639
       31 "natural_gas"        "quantity" "toe"             34936.3719 16758.171899999998          8524.6344           263.9763          51261.993 39548.303100000005
       31 "natural_gas"        "price "   "usd/toe" 234.15528062172666 235.65961031348843 231.63599013234227 231.88815065052054 233.87144745066107 235.38015603465726
       32 "liquified_gas"      "quantity" "toe"      68216.85336976161  48947.73164373418 51555.827294134586 27916.613645612975 13612.880736084857  9069.290677062696
       32 "liquified_gas"      "price "   "usd/ton"  787.7475436005558  901.9148383385344  998.5478228021194  868.9993979678922  766.7176217412558  668.7270624343065
      321 "LPG"                "quantity" "toe"      68216.85336976161  48947.73164373418 51555.827294134586 27916.613645612975 13612.880736084857  9069.290677062696
      321 "LPG"                "price "   "usd/ton"  787.7475436005558  901.9148383385344  998.5478228021194  868.9993979678922  766.7176217412558  668.7270624343065
      322 "LNG"                "quantity" "toe"                      0                  0                  0                  0                  0                  0
      322 "LNG"                "price "   "usd/ton"                  0                  0                  0                  0                  0                  0
        4 "electricity"        "quantity" "toe"     11541.874462596796 8186.7583834910165  9988.478073946744           791.4215 465.70009000000005  984.2352700000001
        4 "electricity"        "price "   "usd/ton" 1979.6683190276271  2084.621256746199 2118.0963082182243  15682.98014746701 13931.737233501251 12180.742620730969
        5 "lubricants"         "quantity" "toe"      59703.57759915312  28878.80619154254  32752.97761747953  37008.82606179052 27087.193770400587  34713.55277142584
        5 "lubricants"         "price "   "usd/ton" 167.73039028899797 166.05556923092863 172.25148223351755 144.82117012074076  135.3826772716152  138.1445060499145
        . ""                   ""         ""                         .                  .                  .                  .                  .                  .
        . ""                   ""         ""                         .                  .                  .                  .                  .                  .
        . ""                   ""         ""                         .                  .                  .                  .                  .                  .
        . ""                   ""         ""                         .                  .                  .                  .                  .                  .
      end
      ------------------ copy up to and including the previous line ------------------

      So E is suppose to be 01jan2012
      F is 01feb2012
      G is 01mar2012...and so on. the data goes till 01aug2023, but I couldn't manage to figure out how to declare my date variable

      I have two main variables; quantity and price.!

      The idea is that I need to reshape the panel using 'item" (in the energy mix), (but stata complains that i has missing value, which it doesn't).
      so I created a key for that, and it didn't work as well.

      and then I need to run the following regression :
      bys item: asreg dlog(quantity) dlog(price), wind(month 60) se fit

      //ssc install lgraph


      generate upper = _b_dlog(price) + 1.96*_se_dlog(price)
      generate lower =_b_dlog(price) - 1.96*_se_dlog(price)
      lgraph _b_dlog(price) year, also(upper lower) addplot(rarea lower upper year, color(gs10%50))

      //////////////////////thanks for your patience and help
      Last edited by sarah altalafha; 10 Feb 2024, 13:00.

      Comment


      • #4
        Well, in fact you do have observations with missing values of key. In fact, these observations, at least in your example data, have missing values for every variable, so you don't notice them visually. But they are in the data set and Stata can see them. But since they contain no information, the obvious solution is to just get rid of them. The next hurdle to overcome is to get the monthly dates into the variable names. Then it's two -reshape-s, one long and the other wide. To accomplish the wide -reshape- the variable named variable has to be cleaned up, because there are trailing blanks in some of them, and blanks are not admissible in variable names. And then a little optional clean-up at the end.
        Code:
        //    STEP 0: GET RID OF EMPTY OBSERVATIONS
        foreach v of varlist _all {
            assert missing(`v') if missing(key)
        }
        drop if missing(key)
        
        //    STEP 1: INTRODUCE MONTHS INTO THE VARIABLE NAMES FROM E ONWARD
        local month = tm(2012m1)
        ds key-unit, not
        local values `r(varlist)'
        foreach v of varlist `values' {
            rename `v' value`month'
            local ++ month
        }
        
        //    STEP 2: FIRST RESHAPE TO GET MONTH LONG
        reshape long value, i(item variable) j(month)
        format month %tm
        
        //    STEP 3: TAKE QUANTITY AND PRICE WIDE
        replace variable = trim(variable)
        reshape wide value unit, i(key month) j(variable) string
        rename unit* *_unit
        rename value* *
        
        // STEP 4 (OPTIONAL): REORDER THE VARIABLES
        order price, before(price_unit)
        order quantity, before(quantity_unit)
        order key item month, first
        Note: In Step 0, if Stata halts with an error message, it means that somewhere in your data there is at least one observation where key is missing value but the other variables are not all missing. If that happens, you need to find the offending observations to see what is going wrong: there will be some kind of erroneous data that needs to be fixed there. You can find those as follows:

        Code:
        egen nmcount = rownonmiss(_all)
        browse if missing(key) & nmcount > 0
        If Stata halts with errors in step 2 or 3, post back with example data that illustrates the problem. The example data you show is, except for the problems I have mentioned, quite well behaved, if a bit complicated. But the full data set may contain additional problems that need to be fixed before the -reshape-s can go through. If that happens, I need to see example data that illustrates those problems.

        Added: I'm puzzled by the unit variable in your data. There is the quantity unit toe, which I have never heard of. What is it? Next, this same quantity variable toe apparently applies to every item in the example data, and I cannot think of any unit of quantity that is applicable to both electricity and materials such as crude oil, fuel oil, lubricants, and also to the non-specific category other. Also, the price unit for electricity is USD/ton, which puzzles me because I cannot figure how one can buy electricity by the ton.
        Last edited by Clyde Schechter; 10 Feb 2024, 15:00.

        Comment


        • #5
          Thank you much for your quick fix. I have tried the code and it worked smoothly, however, when i cleaned the sheet from empty cells , and ignore the the first few lines of code, it runs with errors.
          In term of this exercise, well, the original dataset has every component in the energy mix in different units. so the idea was to calculate the sensitivity of the energy import bill to changes in energy prices,
          therefore, I converted these units to Tons of Oil Equivalent (TOE) and calculate the price per TOE accordingly, which allows me to analyze the elasticity over time. that's why I need the rolling window regressions to do the job.
          elasticity = %change in Quantity/ %change in Price

          to calculate for 1% increase in energy price, there will be "elasticity" % change in energy import bill

          if you have any suggestions for the main question of interest, please tell me.

          Comment


          • #6
            however, when i cleaned the sheet from empty cells , and ignore the the first few lines of code, it runs with errors.
            My only suggestion is, don't do that. Leave the spreadsheet alone, and use those first few lines of code to remove the empty observations.

            I am not familiar with the inner workings of how Stata imports data from Excel. (And you haven't even disclosed how you got the data from Excel to Stata.) But I do know that sometimes the process results in dragging in extra empty observations at the end of the Stata data set. My experience in this situation has been like yours: trying to edit the spreadsheet does not solve the problem. Whatever it is that Stata "sees" that leads it to create those rows is, as best I can tell, not something that you can edit in Excel. So the solution, when this happens, is to remove the empty observations within the Stata data set. The code I have shown does that safely: it checks that all the observations it deletes truly contain no data whatsoever.

            Thank you for the explanation about tons of oil equivalent. I have never heard of that before. But it's an interesting solution to what must be a common and important difficulty working with energy economics data.

            Comment

            Working...
            X