Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Analysis of stock market data - Excel or Stata?

    Hi,

    I have daily stock market data of 728 firms from 2000 to 2015 (about 4200 prices per stock). I have the raw data in excel and I am asking myself if I shall first calculate returns and variances in Excel as I do not know how to do that in Stata.

    I know how to calculate a return from two variables in Stata but I have not found a solution to deal with such a large matrix.

    Do you have suggestion?

    Michael

  • #2
    Michael:
    as usual, posting an excerpt/example of your data via -dataex- would increase your chances of getting helpful replies.
    As far as the support for your statistical analysis is concerned, Stata outperforms spreadsheet in many ways.
    Kind regards,
    Carlo
    (StataNow 18.5)

    Comment


    • #3
      I do not even know how physically you can do this in Excel. How are you going to drag the formula over 4200 cells?

      Stata (and any other statistical program/language really) are good exactly for such situations where you have to operate on large datasets and have a programmed solution instead of a sequence of clicks on different buttons.

      You will have to learn Stata (or any other statistical program/language) if you are to work with such data.

      Comment


      • #4
        It basically looks like this

        1/6/2000 2135.114 101.15 107.08 .....
        1/7/2000 2135.114 101.15 107.08
        1/10/2000 2135.114 102.78 105.89
        1/11/2000 2135.114 101.81 104.7
        1/12/2000 2135.114 103.11 103.51
        1/13/2000 2135.114 110.94 102.32
        1/14/2000 2135.114 110.94 103.03

        ....


        12/31/2015

        You have the dates and then you have 728 stocks in the columns with the respective prices per date on a daily basis (some are blank as not all stocks were listed on all dates).

        Comment


        • #5
          @Kolev

          You simply copy and paste the formula but it will result in a very large Excel file.

          I want to learn how to do it in Stata so if you have a solution it would be great!

          Comment


          • #6
            Micheal, the copy and paste is simply when your data table is more or less fitting on the computer screen. If your table is huge and you cannot see it, it is copying and pasting, and dragging, are not simple anymore... Trust me, I also open excel files from time to time .

            I can show you how to calculate returns in plenty of ways, even for your specific data, but you need to read the help of -dataex- and post some more useful subsample of your data.

            Here is one way to calculate returns on a dataset you can access from your Stata:

            Code:
            . webuse grunfeld, clear
            
            . xtset company year
                   panel variable:  company (strongly balanced)
                    time variable:  year, 1935 to 1954
                            delta:  1 year
            
            . gen return = (mvalue - l.mvalue)/l.mvalue
            (10 missing values generated)

            Comment


            • #7
              Hi Joro,
              thanks for the help. I am actually asking myself how to optimally import the data in the first place. It looks like this (second column with the index has changed). First row is the stock ID (after "Date").

              Does that work or do I need the dataex?

              It is probably not good to arrange it this way and use a variable for each firm?


              Date 1 135490 142439 142586 142789 143170 143189 143207 143635
              DAX30 IFA HOTEL & TOURISTIK - TOT RETURN IND MAX AUTOMATION - TOT RETURN IND SCHALTBAU HOLDING - TOT RETURN IND BERENTZEN-GRUPPE PREF. DEAD - 30/09/15 - TOT RETURN IND FIELMANN - TOT RETURN IND WASGAU PDK.& HANDELS - TOT RETURN IND ALNO - TOT RETURN IND HUCKE DEAD - 05/12/12 - TOT RETURN IND
              31.12.99 6958,14 101,15 98,75 15,74 36,17 130,58 78,43 28,39 39,16
              03.01.00 6750,76 101,15 104,7 15,74 34,97 131,9 78,43 28,75 40,14
              04.01.00 6586,95 101,15 103,51 15,74 35,87 134,53 74,51 26,96 43,57
              05.01.00 6502,07 98,54 104,7 15,74 34,85 133,66 77,45 28,39 46,02
              06.01.00 6474,92 101,15 107,08 16,25 34,13 134,1 78,43 28,03 46,02
              07.01.00 6780,96 101,15 107,08 16,2 34,31 138,49 78,43 27,85 47
              10.01.00 6925,52 102,78 105,89 16,2 34,38 146,85 78,33 27,85 45,04
              11.01.00 6891,25 101,81 104,7 14,82 34,38 146,41 78,33 28,03 45,53
              12.01.00 6912,81 103,11 103,51 14,82 34,69 147,72 78,33 27,89 41,61
              13.01.00 6955,98 110,94 102,32 14,82 34,27 147,28 73,53 26,96 47,49
              14.01.00 7173,22 110,94 103,03 15,74 34,51 149,04 73,04 26,96 47,78
              17.01.00 7258,9 101,15 101,13 16,3 35,9 149,48 72,55 27,14 47,49
              18.01.00 7072,12 101,15 99,94 15,74 36,15 146,85 73,53 26,96 46,02
              19.01.00 7091,04 101,22 98,75 16,02 48,28 147,94 73,53 28,03 46,02
              20.01.00 7112,66 101,22 99,7 15,74 42,55 148,16 73,53 27,67 44,06
              21.01.00 6992,75 101,22 102,32 16,02 43,58 149,48 73,04 26,6 44,06
              24.01.00 6931,99 108,98 99,94 15,93 43,76 147,28 74,41 30,19 43,57
              25.01.00 6809,64 104,74 98,75 15,93 43,16 147,28 74,41 29,65 43,57
              26.01.00 6969,37 104,42 99,94 15,93 39,66 147,28 73,53 29,65 43,57
              27.01.00 7126,13 104,42 98,99 16,39 40,45 147,72 73,53 26,96 42,1
              28.01.00 7066,6 108,33 97,8 16,3 40,15 147,28 72,55 27,32 43,08
              31.01.00 6835,6 104,42 97,56 16,02 41,06 147,28 72,55 27,32 42,1
              01.02.00 7050,46 104,42 94,23 15,74 40,76 148,16 72,55 25,95 41,71
              02.02.00 7171,95 104,42 92,33 15,84 39,98 147,28 72,55 26,24 41,71
              03.02.00 7354,26 104,74 95,18 16,49 38,05 145,75 70,59 27,85 41,91
              04.02.00 7444,61 105,07 94,94 16,21 36,84 145,97 73,53 28,75 41,61
              07.02.00 7296,32 104,42 93,04 16,49 36,24 144,21 73,53 26,96 42,1
              08.02.00 7549,88 105,07 92,8 16,67 35,64 146,19 72,55 26,96 42,59
              09.02.00 7629,11 105,07 92,8 16,67 35,65 148,38 72,55 28,68 41,61
              10.02.00 7709,27 104,74 91,61 16,39 36,25 147,72 72,55 26,78 37,21
              11.02.00 7611,55 104,48 91,14 16,39 35,65 144,65 72,55 26,24 35,25
              14.02.00 7644,8 104,42 90,42 15,93 35,65 141,13 72,55 26,42 37,21

              Comment


              • #8
                Maybe it is better to have the stocks as one variable with something like an identifier or so? That may then be used to calculate something like

                generate ret = log (Stock n)/log (stock n-1)

                or similar?

                But then the data needs to be sorted in some way?

                Comment


                • #9
                  Michael, this arrangement that you show is known in Stata parlance as wide and what (I think) you have on your mind is called long.

                  The beauty of Stata is that you can move from one to another at will and at your own convenience (-reshape-).

                  For the time being your data is arranged perfectly fine.

                  Your problem now is how to import in Stata the information, for example all the headers of the columns are not valid Stata names.

                  Comment


                  • #10
                    I found this:

                    https://researchfinancial.wordpress....om-datastream/

                    Seems to be helpful (first picture is what I have above and last picture is exactly what I need to make the calculations I guess).

                    Guess I need 1 variable (company or CUSIP or ID or so).

                    My thinking problem is probably to consider Stata as some form of spreadsheet program which is not right.

                    Comment


                    • #11
                      Hi Michael,

                      1) You will definitely want to share data using Stata's dataex command. If you need help with it, I created a video tutorial on YouTube here (I would watch at 2x)

                      2) Before you import the data from Excel into Stata, you will want to add the letter "s" (or other letter of your choice ) in front of the stock_id in the first row. Stata doesn't allow variables to begin with a number, plus it makes reshaping the data from wide (how it is now) to long a lot easier. (This is the answer to the "does the data needs to be sorted in some way?" in post #8.

                      3) To see an example of someone calculating various stock returns, see here

                      I took a stab at what reshaping your data would look like. I used the stock data from point #3 and then reshaped it to loo more like yours:
                      Code:
                      dataex
                      * NOTE: This is fictional stock price data
                      * Data shared using  -dataex-. To install: ssc install dataex
                      clear
                      input double date float(apple tesla pluralsight dropbox)
                      21458 227.26  310.7  31.1 26.22
                      21459 229.28 301.02 29.21 26.32
                      21460 232.07  294.8 28.99 26.25
                      21461 227.99 281.83 26.81 24.54
                      21462 224.29 261.95 26.48 24.07
                      end
                      format %tdN/D/CY date
                      format apple- dropbox  %10.2fc
                      list, noobs abbrev(14)
                      
                        +------------------------------------------------------+
                        |       date    apple    tesla   pluralsight   dropbox |
                        |------------------------------------------------------|
                        | 10/01/2018   227.26   310.70         31.10     26.22 |
                        | 10/02/2018   229.28   301.02         29.21     26.32 |
                        | 10/03/2018   232.07   294.80         28.99     26.25 |
                        | 10/04/2018   227.99   281.83         26.81     24.54 |
                        | 10/05/2018   224.29   261.95         26.48     24.07 |
                        +------------------------------------------------------+
                      Code:
                      * Reshaping the data 
                      rename (apple-dropbox) s_=   // adding prefix s_ to the stock names so can reshape
                      reshape long s_, i(date) j(stock) string  // actually reshaping the data
                      sort stock date
                      rename s_ close_price
                      
                      gen month1 = month(date)  // Number 1-12
                      gen month2 = mofd(date)  // Number of months since Jan1960
                      gen year = year(date)
                      format month2 %tm
                      
                      . list, noobs abbrev(14) sepby(stock)
                      
                        +------------------------------------------------------------------+
                        |       date         stock   close_price   month1    month2   year |
                        |------------------------------------------------------------------|
                        | 10/01/2018         apple        227.26       10   2018m10   2018 |
                        | 10/02/2018         apple        229.28       10   2018m10   2018 |
                        | 10/03/2018         apple        232.07       10   2018m10   2018 |
                        | 10/04/2018         apple        227.99       10   2018m10   2018 |
                        | 10/05/2018         apple        224.29       10   2018m10   2018 |
                        |------------------------------------------------------------------|
                        | 10/01/2018       dropbox         26.22       10   2018m10   2018 |
                        | 10/02/2018       dropbox         26.32       10   2018m10   2018 |
                        | 10/03/2018       dropbox         26.25       10   2018m10   2018 |
                        | 10/04/2018       dropbox         24.54       10   2018m10   2018 |
                        | 10/05/2018       dropbox         24.07       10   2018m10   2018 |
                        |------------------------------------------------------------------|
                        | 10/01/2018   pluralsight         31.10       10   2018m10   2018 |
                        | 10/02/2018   pluralsight         29.21       10   2018m10   2018 |
                        | 10/03/2018   pluralsight         28.99       10   2018m10   2018 |
                        | 10/04/2018   pluralsight         26.81       10   2018m10   2018 |
                        | 10/05/2018   pluralsight         26.48       10   2018m10   2018 |
                        |------------------------------------------------------------------|
                        | 10/01/2018         tesla        310.70       10   2018m10   2018 |
                        | 10/02/2018         tesla        301.02       10   2018m10   2018 |
                        | 10/03/2018         tesla        294.80       10   2018m10   2018 |
                        | 10/04/2018         tesla        281.83       10   2018m10   2018 |
                        | 10/05/2018         tesla        261.95       10   2018m10   2018 |
                        +------------------------------------------------------------------+
                      
                      * The if year==2018 and month1==10 won't matter here, but wanted to show that you could do it
                      tabstat close_price if year==2018 & month1==10 , by(stock) stats(n mean median min max sd var)
                      
                      Summary for variables: close_price
                           by categories of: stock 
                      
                            stock |         N      mean       p50       min       max        sd  variance
                      ------------+----------------------------------------------------------------------
                            apple |         5   228.178    227.99    224.29    232.07  2.844152  8.089199
                          dropbox |         5     25.48     26.22     24.07     26.32  1.086025  1.179449
                      pluralsight |         5    28.518     28.99     26.48      31.1   1.89994  3.609771
                            tesla |         5    290.06     294.8    261.95     310.7  18.88454  356.6259
                      ------------+----------------------------------------------------------------------
                            Total |        20   143.059   127.695     24.07     310.7  121.4979  14761.74
                      -----------------------------------------------------------------------------------
                      Finally, once you have your data organized like that, there are a bunch of user-written commands for use with daily stock data. For example, asreg (SSC install asreg) let's you calculate cross-sectional returns using Fama & MacBeth (1973) two-step model, asrol calculates stock returns over a rolling window, astile will divide stocks into size-based portfolio's (i.e. the smallest 10% of firms in portfolio 1, etc), and ascol converts daily stock returns or stock prices data to weekly, monthly, quarterly, or yearly returns.


                      Comment


                      • #12
                        Sorry for butting in on what is, in some sense, a digression.

                        That may then be used to calculate something like

                        generate ret = log (Stock n)/log (stock n-1)
                        generate ret = log(Stock)/log(Stock[_n-1])
                        (The italicized line is the formula written with correct Stata syntax, and is not in the original.)

                        This formula (never mind the syntax issues: we all know what is intended here) keeps popping up here on the list. Where on earth is this coming from? Who teaches this formula? It is
                        1. An approximation.
                        2. Not a very good approximation unless the return is small.
                        3. Enormously computationally inefficient when carried out by computer.

                        I mean, I can see using an approximate formula, if the approximation is reasonably good, and if it save a lot of computational effort. But the exact formula:
                        Code:
                        generate ret = (Stock/Stock[_n-1]) - 1
                        is exactly correct, simpler to code, and executes in about 1% of the time. Why on earth is anybody still using/teaching that other formula?

                        Comment


                        • #13
                          Thanks guys,

                          definitely help me in understanding and I am gonna check it out.

                          Special thanks to David: The other commands are also interesting for me. Guess the 10% dividing helps to build momentum portfolios (for example), which is my next project.

                          There is a lot of trial and error for me when using Stata as I am absolutely new to the program. There is so much information that I am currently suffering from something like "analysis paralysis" or so.

                          If you have a good learning source focussing on analyzing financial data like stock returns or so, feel free to mention it (book, video etc.).

                          Comment


                          • #14
                            I tried to reshape the data but Stata does make more than 700 variables so the reshape command does not work. Do I need to define the variables first (date, stock, price)?

                            . dataex A B C D E

                            ----------------------- copy starting from the next line -----------------------
                            [CODE]
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input str10 A str17(B C D E)
                            "Date" "Company1" "Company2" "Company3" "Company4"
                            "12/31/1999" "6958.14" "101.15" "98.75" "15.74"
                            " 1/3/2000" "6750.76" "101.15" "104.7" "15.74"
                            " 1/4/2000" "6586.95" "101.15" "103.51" "15.74"
                            " 1/5/2000" "6502.07" "98.54000000000001" "104.7" "15.74"
                            " 1/6/2000" "6474.92" "101.15" "107.08" "16.25"
                            " 1/7/2000" "6780.96" "101.15" "107.08" "16.2"
                            " 1/10/2000" "6925.52" "102.78" "105.89" "16.2"
                            " 1/11/2000" "6891.25" "101.81" "104.7" "14.82"
                            " 1/12/2000" "6912.81" "103.11" "103.51" "14.82"
                            " 1/13/2000" "6955.98" "110.94" "102.32" "14.82"
                            " 1/14/2000" "7173.22" "110.94" "103.03" "15.74"
                            " 1/17/2000" "7258.9" "101.15" "101.13" "16.3"
                            " 1/18/2000" "7072.12" "101.15" "99.94" "15.74"
                            " 1/19/2000" "7091.04" "101.22" "98.75" "16.02"
                            " 1/20/2000" "7112.66" "101.22" "99.7" "15.74"
                            " 1/21/2000" "6992.75" "101.22" "102.32" "16.02"
                            " 1/24/2000" "6931.99" "108.98" "99.94" "15.93"
                            " 1/25/2000" "6809.64" "104.74" "98.75" "15.93"
                            " 1/26/2000" "6969.37" "104.42" "99.94" "15.93"
                            " 1/27/2000" "7126.13" "104.42" "98.98999999999999" "16.39"
                            " 1/28/2000" "7066.6" "108.33" "97.8" "16.3"
                            " 1/31/2000" "6835.6" "104.42" "97.56" "16.02"
                            " 2/1/2000" "7050.46" "104.42" "94.23" "15.74"
                            " 2/2/2000" "7171.95" "104.42" "92.33" "15.84"
                            " 2/3/2000" "7354.26" "104.74" "95.18000000000001" "16.49"
                            " 2/4/2000" "7444.61" "105.07" "94.94" "16.21"
                            " 2/7/2000" "7296.32" "104.42" "93.04000000000001" "16.49"
                            " 2/8/2000" "7549.88" "105.07" "92.8" "16.67"
                            " 2/9/2000" "7629.11" "105.07" "92.8" "16.67"
                            " 2/10/2000" "7709.27" "104.74" "91.61" "16.39"
                            " 2/11/2000" "7611.55" "104.48" "91.14" "16.39"
                            " 2/14/2000" "7644.8" "104.42" "90.42" "15.93"
                            " 2/15/2000" "7396.13" "104.42" "88.76000000000001" "15.93"
                            " 2/16/2000" "7490.32" "107.03" "88.28" "16.21"
                            " 2/17/2000" "7580.53" "104.74" "91.61" "15.93"
                            " 2/18/2000" "7573.78" "104.74" "92.33" "15.84"
                            " 2/21/2000" "7590.53" "104.42" "90.90000000000001" "16.12"
                            " 2/22/2000" "7607.94" "104.48" "89.70999999999999" "16.21"
                            " 2/23/2000" "7698.97" "106.37" "89.47" "16.3"
                            " 2/24/2000" "7640.53" "107.68" "85.66" "16.3"
                            " 2/25/2000" "7738.68" "105.07" "88.76000000000001" "16.21"
                            " 2/28/2000" "7587.13" "104.74" "88.04000000000001" "15.84"
                            " 2/29/2000" "7644.55" "104.74" "86.38" "15.84"
                            " 3/1/2000" "7727.93" "104.74" "85.66" "15.74"
                            " 3/2/2000" "7945.77" "105.07" "86.14" "15.74"
                            " 3/3/2000" "7960.03" "105.07" "86.84999999999999" "15.74"
                            " 3/6/2000" "7975.78" "105.07" "86.84999999999999" "15.74"

                            Comment


                            • #15

                              1) If you imported it from Excel, specify firstrow and Stata will treat first row of Excel data as variable names. (This is true even if you specify cellrange(A4), it will treat row 4 as the "first row" and treat its values as variable names.
                              Code:
                              import excel using my_excel_file.xlsx, sheet("Data") firstrow  cellrange(A4)
                              2) Please remember to include the "end" and "[/CODE]" when you paste in from dataex. (Where it says, "copy up to and including the previous line")

                              3) Since you've already imported it into Stata, you could use a loop to rename all variables to the value in the first row (rather than change them all by hand) using:
                              Code:
                              * Loop to rename all variables to their value in the first row (1st obs)
                              * NOTE: the ` is on the key to the left of the 1 on your keyboard. It usually shares with ~
                              foreach var of varlist * {
                                  rename `var' `=`var'[1]'
                              }
                              
                              * Your date variable came in as a string, this creates a Stata date variable with it
                              gen double date2 = daily(Date, "MDY", 2020)
                              format %tdN/D/CY date2
                              drop in 1  // dropping that 1st obs (that had the variable names)
                              drop Date  // String date not needed
                              rename date2 date
                              order date, first  // placing date as 1st variable
                              
                              * Converting variable Company from string to numeric
                              destring Company*, replace
                              format Company* %10.2fc  // Just formatting to comma format, showing 2 decimal places
                              
                              * Reshaping to long
                              reshape long Company, i(date) j(firm_id)
                              sort firm_id date
                              bysort firm_id (date): gen n = _n  // Creating a within firm counter variable
                              order n, before(Company)
                              list if firm_id<=4 & n<=5, sepby(firm_id) noobs
                              
                              * Renaming Company to stock_price and calculating daily return
                              rename Company stock_price
                              bysort firm_id (date): gen ret = (stock_price / stock_price[_n-1]) - 1  // this is from Clyde Schechter code above
                              list if firm_id<=4 & n<=5, sepby(firm_id) noobs abbrev(14)
                              
                                +----------------------------------------------------+
                                |       date   firm_id   n   stock_price         ret |
                                |----------------------------------------------------|
                                | 12/31/1999         1   1      6,958.14           . |
                                | 01/03/2000         1   2      6,750.76   -.0298039 |
                                | 01/04/2000         1   3      6,586.95   -.0242654 |
                                | 01/05/2000         1   4      6,502.07   -.0128861 |
                                | 01/06/2000         1   5      6,474.92   -.0041756 |
                                |----------------------------------------------------|
                                | 12/31/1999         2   1        101.15           . |
                                | 01/03/2000         2   2        101.15           0 |
                                | 01/04/2000         2   3        101.15           0 |
                                | 01/05/2000         2   4         98.54   -.0258033 |
                                | 01/06/2000         2   5        101.15    .0264867 |
                                |----------------------------------------------------|
                                | 12/31/1999         3   1         98.75           . |
                                | 01/03/2000         3   2        104.70    .0602532 |
                                | 01/04/2000         3   3        103.51   -.0113658 |
                                | 01/05/2000         3   4        104.70    .0114965 |
                                | 01/06/2000         3   5        107.08    .0227316 |
                                |----------------------------------------------------|
                                | 12/31/1999         4   1         15.74           . |
                                | 01/03/2000         4   2         15.74           0 |
                                | 01/04/2000         4   3         15.74           0 |
                                | 01/05/2000         4   4         15.74           0 |
                                | 01/06/2000         4   5         16.25    .0324015 |
                                +----------------------------------------------------+

                              4) Since you have so many stocks, read this post on Stack Overflow Stata: using reshape command to convert large number of variables to long format. Essentially you're going to split the file into two, reshape each sub-file, and then append them afterwards. They give you the code on how to do that.
                              Last edited by David Benson; 06 Jan 2019, 20:41.

                              Comment

                              Working...
                              X