  • Storing Intercepts of Rolling Window Regressions

    Hi everyone,

    I have some questions about the rolling command. First, let me explain what I want to get from the rolling window regressions. I have 9,630 columns (each one representing a dependent variable) plus 4 columns (each representing an independent variable). Each row represents one month, from January 2000 to December 2013.

    I want to estimate the intercepts of rolling window regressions with a window of three years (36 months), regressing each dependent variable on the four independent variables mentioned.

    Finally, I want to store all intercepts in a file, where each column holds the intercepts for one of the 9,630 dependent variables, plus a column indicating the end date of each estimation window. Thank you for the help.

  • #2
    Any idea? Thank you in advance.


    • #3
      I think you can get what you want by modifying the following code to suit your data:

      Code:
      clear*
      tempfile building
      gen depvar = ""
      save `building', emptyok
      
      webuse grunfeld, clear
      keep if company == 1
      tsset year
      
      local depvars invest mvalue
      local indvars kstock time
      
      tempfile rolling_results
      
      foreach d of local depvars {
          rolling _b[_cons], window(5) saving(`rolling_results', replace): regress `d' `indvars'
          preserve
          use `building', clear
          append using `rolling_results'
          replace depvar = "`d'" if missing(depvar)
          save `"`building'"', replace
          restore
      }
      
      use `building', clear
      rename _stat_1 intercept
      The idea is just to loop over the dependent variables, running the rolling regressions for each, and appending the results to a file as we go along. At the end of this code, the data set in memory will have what you asked for. You can then save it, or do whatever you need with it.


      • #4
        Thank you very much for your help. I would like to ask another question. Could you explain the first ten lines of code to me, please? I am not sure whether I need to write them for my data. Thank you in advance.


        • #5
          The first four lines of code generate an empty output file to save the results in. You will need to do that.

          The next three lines of code are simply building a sample dataset for Clyde to demonstrate his technique on.

          Strive to understand how to apply the technique that follows to your data. I expect that would consist of changing the lists of dependent and independent variables, and replacing the three lines of sample dataset construction with a use command for your data.
          Last edited by William Lisowski; 16 May 2016, 12:28.
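As a minimal sketch of the accumulation pattern described in #5, the following toy example shows how the empty "building" file grows by one block of results per loop iteration. It uses the auto dataset shipped with Stata, and a collapse to a single mean stands in for the rolling results; the names piece and value are illustrative assumptions, not part of the original code.

```stata
* Sketch: how the empty "building" file accumulates results (toy data)
clear
tempfile building
gen str32 depvar = ""
save `building', emptyok          // zero observations, one string variable

sysuse auto, clear
foreach d in price mpg {
    preserve
    * stand-in for the -rolling- output: one row holding the mean of `d'
    collapse (mean) value = `d'
    tempfile piece                // hypothetical name for this iteration's results
    save `piece'
    use `building', clear
    append using `piece'
    * newly appended rows have an empty depvar; label them with the current variable
    replace depvar = "`d'" if missing(depvar)
    save `building', replace
    restore
}

use `building', clear
list
```

With real data, the sysuse line would be replaced by a use command for your own dataset, and the collapse by the rolling command from #3.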


          • #6
            Is it really necessary to write code that lists the dependent and independent variables? If so, what code should I write, considering that I have 9,630 variables? Thank you again.


            • #7
              Well, anything you do with 9,630 variables is going to be cumbersome unless there is some pattern in their names that you can exploit. If they are named, for example v1 through v9630 and they appear consecutively in your data set, then

              Code:
              unab indvars: v1-v9630
              will get you the desired local macro. If they are not consecutive in your data set, you might precede that with -order v1-v9630, first- in order to make them consecutive and then apply the above.

              If the names are not that simple, but say they all have some common part, as, for example if they were variables like varA, varB, ..., varZ, varAA,... then you could use
              Code:
              unab indvars: var*
              Perhaps there are two different such series of names etc. You can exploit wildcards to get the list of names from the -unab- command.

              It really all depends on how the variables were named. If you have 9,630 variables whose names have no common features, then you are stuck with just listing them.

              But remember that even just to do the regressions, you would face this same problem.

              As for the dependent variables, putting them in a macro is optional. You could dispense with that and just list them in the appropriate place in the -regress- part of the -rolling:...- command.


              • #8
                Expanding slightly on Clyde's answer here.

                While Clyde wrote indvars for the macro he thought would contain your 9,630 variables, he really meant to write depvars - we don't often see 9,630 dependent variables here on Statalist.

                If the names of your 9,630 dependent variables have no common features, but the variables occur in your dataset as 9,630 consecutive variables (no independent variable or identifier or other variable stuck into the middle of the list) then if the first dependent variable is foo and the 9,630th is bar, you can use
                Code:
                unab depvars: foo-bar
                And if there are a few unwanted variables stuck into the middle of the list, say gnxl and xkcd, you can use
                Code:
                order gnxl xkcd, last
                to relocate them out of the middle of the list, after which the unab command will do what you need.

                More details on these commands can be found in the output of help unab and help order.
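To make the two commands above concrete, here is a small sketch using the auto dataset that ships with Stata rather than the poster's data. The variables price through weight stand in for the consecutive block of dependent variables, and rep78 and headroom play the role of unwanted variables stuck in the middle of that block.

```stata
sysuse auto, clear

* expand a consecutive range of variables into a local macro
unab depvars: price-weight
display "`depvars'"
local n_before : word count `depvars'

* move two unwanted variables out of the middle of the range, then expand again
order rep78 headroom, last
unab depvars: price-weight
display "`depvars'"
local n_after : word count `depvars'

display "variables in range before: `n_before', after: `n_after'"
```

The second call to unab picks up only the variables that still lie between price and weight in the dataset order.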


                • #9
                  Yes, apologies. I misremembered what was written in the original post and thought there were 9,630 independent variables and 4 dependent, when, in fact, the reverse is true. It is only the dependent variables that are being looped over. William Lisowski's comments in #8 are good ways to deal with the challenge of expressing those 9,630 dependent variables in a macro so they can be compactly looped over with the -rolling...regress...- command.
                  Last edited by Clyde Schechter; 16 May 2016, 14:23. Reason: Correct error.


                  • #10
                    Hi again,

                    I would like to confirm if the code I am about to enter is correct. Here we go:
                    Code:
                      set maxvar 10000
                      clear*
                      tempfile building
                      gen depvar = ""
                      save `building', emptyok
                    I then copy and paste my sample database from Excel to include all the variables.

                    Code:
                      generate date2 = monthly(date, "M20Y")
                      format %tmMonth_CCYY date2
                      tsset date2
                      
                      unab depvars: fund1-fund9630
                      local indvars market small high momentum
                      
                      tempfile rolling_results
                      
                      foreach d of local depvars {
                          rolling _b[_cons], window(36) saving(`rolling_results', replace): regress `d' `indvars'
                          preserve
                          use `building', clear
                          append using `rolling_results'
                          replace depvar = "`d'" if missing(depvar)
                          save `"`building'"', replace
                          restore
                      }
                      
                      use `building', clear
                      rename _stat_1 intercept
                    Thank you in advance.


                    • #11
                      In Stata, most analyses are better performed using data in long form. Here's a simulated dataset that mimics your data setup in wide form, followed by code to convert it to long form. With data this size, reshape is very slow, so it's better to code the reshape to long manually. The following runs in under 2 minutes on my computer:

                      Code:
                      * set up fake data with 9630 fund variables
                      clear all
                      set seed 32154231
                      set maxvar 10000
                      set obs 168
                      gen ym = ym(1999,12) + _n
                      format %tm ym
                      
                      gen market = 100 + runiform() * _n
                      gen small = runiform()
                      gen high = runiform() + small
                      gen momentum = runiform()
                      
                      forvalues i = 1/9630 {
                          local base = 100 * runiform()
                          gen fund`i' = `base' + runiform() * _n
                      }
                      
                      save "data_wide.dta", replace
                      
                      * -reshape- is too slow for this large dataset, do it manually
                      use "data_wide.dta", clear
                      forvalues i = 1/9630 {
                          use ym market small high momentum fund`i' using "data_wide.dta"
                          rename fund`i' fund
                          tempfile f`i'
                          qui save "`f`i''"
                      }
                      clear
                      gen fundid = .
                      forvalues i = 1/9630 {
                          append using "`f`i''"
                          qui replace fundid = `i' if mi(fundid)
                      }
                      save "data_long.dta", replace
                      Now with data in long form, you can perform the rolling regressions all at once using rangestat (from SSC). To install rangestat, type in Stata's command window:
                      Code:
                      ssc install rangestat
                      With rangestat, you can create a custom Mata function to calculate any statistic you want, and rangestat will use it to calculate results for each observation based only on data within the specified interval. In the following example, I define a Mata function, myreg, to perform a linear regression. Then it's just a matter of calling rangestat with the desired variables. I added code afterwards to spot check results for 3 observations. Note that there are 1,617,840 observations in the data, which means that rangestat will calculate 1,617,840 regressions! On my computer, this takes a little over 30 seconds!

                      Code:
                      clear all
                      * define a linear regression using quadcross() - help mata cross(), example 2
                      mata:
                      mata set matastrict on
                      
                      real rowvector myreg(real matrix Xall)
                      {
                          real colvector y, b, Xy
                          real matrix X, XX
                      
                          y    = Xall[.,1]
                          X     = Xall[.,2::cols(Xall)]
                          
                          XX = quadcross(X, X)
                          Xy = quadcross(X, y)
                          b  = invsym(XX) * Xy
                      
                           return(rows(X), b')
                      
                      }
                      
                      end
                      
                      use "data_long.dta"
                      
                      * add a constant
                      gen double one = 1
                      
                      rangestat (myreg) fund market small high momentum one, by(fundid) interval(ym -35 0) casewise
                      
                      * spot check a few cases, use obs 50, 500, 5000
                      regress fund market small high momentum if fundid == fundid[50] & inrange(ym,ym[50]-35,ym[50])
                      list myreg* in 50
                      
                      regress fund market small high momentum if fundid == fundid[500] & inrange(ym,ym[500]-35,ym[500])
                      list myreg* in 500
                      
                      regress fund market small high momentum if fundid == fundid[5000] & inrange(ym,ym[5000]-35,ym[5000])
                      list myreg* in 5000


                      • #12
                        What you have looks like what was suggested. But the way to know for sure is to test it and review the results.

                        Let me advise that before testing the code, you get your data into Stata and save it as a Stata dataset, and then use that dataset as your second step, so all your commands can appear in a single do-file. To get the data into Stata, you can possibly copy and paste from Excel into Stata's data editor, but you would be better advised to use Stata's File > Import menu, or the import excel command. (If you master the import excel command, you can put it into your do-file as the second step, instead of the use command I suggested. The point is to have a command read your dataset into Stata for the program to use.)

                        Let me also advise that you first test by replacing
                        Code:
                        unab depvars: fund1-fund9630
                        with
                        Code:
                        unab depvars: fund1-fund3
                        Just process three of your dependent variables and examine the results carefully. There is no point in waiting for 9,630 x 133 36-month rolling regressions (168 months give 133 windows of 36) to complete, only to find out you've made a mistake.
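The import excel route suggested in #12 might look like the sketch below. It first writes a tiny toy workbook so the example is self-contained; the file names funds_demo.xlsx and funds_demo.dta are hypothetical, and the sketch assumes the date column is stored as text such as Jan-00 (a true Excel date cell would import differently and would need other handling).

```stata
* Toy workbook standing in for the real Excel file (file names are hypothetical)
clear
input str6 date market fund1
"Jan-00" 0.5 1.2
"Feb-00" 0.7 1.1
"Mar-00" 0.2 1.4
end
export excel using "funds_demo.xlsx", firstrow(variables) replace

* The step that would go in the do-file: read the sheet, using row 1 as variable names
import excel using "funds_demo.xlsx", firstrow clear

* build a Stata monthly date from the text column, then declare the time variable
generate date2 = monthly(date, "M20Y")
format %tm date2
tsset date2

* save as a Stata dataset so later runs can start from a plain -use-
save "funds_demo.dta", replace
```

For the full dataset, set maxvar 10000 would need to come before the import, as in #10.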


                        • #13
                          Hi again,

                          I have run the regressions with a subsample (120 dependent variables). The data make sense. Afterwards I wrote the following code to rearrange the data as I want (I dropped the start variable created by the rolling window regressions):

                          Code:
                          reshape wide intercept, i(depvar) j(end)
                          The issue is that after running the above code, the dependent variables are not properly ordered. For instance, I get data in the following form:

                          Fund1
                          Fund10
                          Fund11
                          .......
                          Fund2

                          I would like to have the data sorted as follows:

                          Fund1
                          Fund2
                          Fund3
                          .....
                          ....
                          Fund9630

                          Thank you in advance.


                          • #14
                            You don't say which code you tried but I'm guessing it's not my example in #11.

                            If you insist on results in wide format, here's a complete example that does this using rangestat. The first part creates fake data for 20 funds. The second part defines a Mata function to perform a regression. Then a loop is used to perform the rolling regressions by fund. A couple of spot checks are at the end to show that the results are correct.

                            Code:
                            * set up fake data with 20 fund variables
                            clear all
                            set seed 32154231
                            set maxvar 10000
                            set obs 168
                            gen ym = ym(1999,12) + _n
                            format %tm ym
                            
                            gen market = 100 + runiform() * _n
                            gen small = runiform()
                            gen high = runiform() + small
                            gen momentum = runiform()
                            
                            forvalues i = 1/20 {
                                local base = 100 * runiform()
                                gen fund`i' = `base' + runiform() * _n
                            }
                            
                            * define a linear regression using quadcross() - help mata cross(), example 2
                            mata:
                            mata set matastrict on
                            
                            real rowvector myreg(real matrix Xall)
                            {
                                real colvector y, b, Xy
                                real matrix X, XX
                            
                                y    = Xall[.,1]
                                X     = Xall[.,2::cols(Xall)]
                                
                                XX = quadcross(X, X)
                                Xy = quadcross(X, y)
                                b  = invsym(XX) * Xy
                            
                                 return(rows(X), b')
                            
                            }
                            
                            end
                            
                            * add a constant
                            gen double one = 1
                            
                            forvalues i = 1/20 {
                                rangestat (myreg) fund`i' market small high momentum one, interval(ym -35 0) casewise
                                rename myreg6 alpha`i'
                                replace alpha`i' = . if myreg1 < 36
                                drop myreg*
                            }
                            
                            * spot check a few cases
                            regress fund1 market small high momentum if inrange(ym,ym[50]-35,ym[50])
                            list alpha1 in 50
                            
                            regress fund2 market small high momentum if inrange(ym,ym[55]-35,ym[55])
                            list alpha2 in 55


                            • #15
                              Thank you, but I have already used the code in #10, which I find more intuitive. Can anyone explain to me an easier way to sort the funds (depvar) in ascending order, as I described in #13? Thank you.
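One possible approach to the ordering question, offered as a sketch rather than an answer from the thread: depvar is a string, so it sorts alphabetically (fund1, fund10, fund11, ..., fund2). Extracting the trailing number into a numeric variable and sorting on that gives the numeric order. The toy data and the variable name fundnum are assumptions for illustration.

```stata
* toy results standing in for the file built in #10
clear
input str8 depvar intercept
"fund1"  0.10
"fund10" 0.30
"fund11" 0.20
"fund2"  0.40
end

* pull the trailing digits out of the name and convert them to a number
generate long fundnum = real(regexs(1)) if regexm(depvar, "([0-9]+)$")

* sorting on the numeric variable gives fund1, fund2, fund10, fund11
sort fundnum
list depvar fundnum intercept
```

The same two lines (generate, then sort) could be run after the reshape wide in #13, since depvar is still present as the identifier.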
