Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    I just fixed in #14 the last line of code where I broke my rule not to use an if conditional when using an egen command. The correct way to set the first 32 observations per group is:
    Code:
    by series: replace MSFE = . if _n < 33
    Again, I do not see your data, code, and results but if you run the example in #12, it generates results based on the value of Error^2 starting on 1995q1 (that's 33 weeks from 1987q1) for both series 1 and 2 and the results match the rolling results. The data for series 2 is derived from the data for series 1 but is not the same and the results are different for series 2.

    If you exclude the mata function definition, my rangestat solution in #12 spans 6 lines of code, 5 of which are basic Stata commands. You have the tools in hand to get there. I did my part, now it's time for you to figure things out.

    Comment


    • #17
      Thanks Robert.

      I attach my data and I explain my concerns with the rangestat part of the code so you may be able to give a quick solution.

      I am trying to understand how rangestat works. I guess the rangestat part of the code does not imply a rule that the first window is the first 33 quarters. I understand that it implies that the regression expands by one quarter each time but I need 33 quarters initially to estimate the first regression (and then it will expands to 34, 35, 35 and so on).

      That is the part I mean:
      Code:
      gen qdate1 = quarter_date[1]
      rangestat (myreg) rgdp_first grs, interval(quarter_date qdate1 quarter_date) by(series) casewise  
      rename myreg1 obs
      rename myreg2 b_grs
      rename myreg3 b_cons
      Note: I write the rename command consistent with my Stata11

      The solution that I set the first 32 observations to zero is not sufficient in this case as the code will have different initial samples and hence the results are not correct.

      can I add any condition to rangestat to impose a first window of 33 observations or ( a start from 1987q1 to 1994q4) ?
      Attached Files

      Comment


      • #18
        The data you posted show all series starting in 1987q1 and ending in 2015q4 (and the data is fully rectangularized, i.e. N == 166):

        Code:
        . use "forstatalist.dta", clear
        
        . by series: gen N = _N
        
        . isid series quarter_date, sort
        (data already sorted by series quarter_date)
        
        . by series: gen first = _n == 1
        
        . list year-N if first, noobs sep(0)
        
          +----------------------------------------------------------------------------------------+
          | year   quarter   month   week        grs   quarte~e   series   DRGDP2   ngdp_f~t     N |
          |----------------------------------------------------------------------------------------|
          | 1987         1       1      1   -1.74006     1987q1        1        3      7.778   116 |
          | 1987         1       1      2   -2.02978     1987q1        2        3      7.778   116 |
          | 1987         1       1      3   -2.10766     1987q1        3        3      7.778   116 |
          | 1987         1       1      4   -1.99883     1987q1        4        3      7.778   116 |
          | 1987         1       2      1   -2.18365     1987q1        5        3      7.778   116 |
          | 1987         1       2      2   -2.43611     1987q1        6        3      7.778   116 |
          | 1987         1       2      3   -2.44687     1987q1        7        3      7.778   116 |
          | 1987         1       2      4   -2.31547     1987q1        8        3      7.778   116 |
          | 1987         1       3      1   -3.58194     1987q1        9        3      7.778   116 |
          | 1987         1       3      2   -3.63065     1987q1       10        3      7.778   116 |
          | 1987         1       3      3   -3.58067     1987q1       11        3      7.778   116 |
          | 1987         1       3      4   -3.47806     1987q1       12        3      7.778   116 |
          +----------------------------------------------------------------------------------------+
        
        . by series: gen last = _n == _N
        
        . list year-N if last, noobs sep(0)
        
          +----------------------------------------------------------------------------------------+
          | year   quarter   month   week        grs   quarte~e   series   DRGDP2   ngdp_f~t     N |
          |----------------------------------------------------------------------------------------|
          | 2015         4       1      1   -2.56989     2015q4        1     2.71      1.515   116 |
          | 2015         4       1      2   -1.40545     2015q4        2     2.71      1.515   116 |
          | 2015         4       1      3   -1.32242     2015q4        3     2.71      1.515   116 |
          | 2015         4       1      4   -1.80394     2015q4        4     2.71      1.515   116 |
          | 2015         4       2      1   -1.51485     2015q4        5     2.71      1.515   116 |
          | 2015         4       2      2   -.211103     2015q4        6     2.71      1.515   116 |
          | 2015         4       2      3   -.142095     2015q4        7     2.71      1.515   116 |
          | 2015         4       2      4   -.456448     2015q4        8     2.71      1.515   116 |
          | 2015         4       3      1   -.691204     2015q4        9     2.71      1.515   116 |
          | 2015         4       3      2    1.47265     2015q4       10     2.71      1.515   116 |
          | 2015         4       3      3      1.627     2015q4       11     2.71      1.515   116 |
          | 2015         4       3      4      1.278     2015q4       12     2.71      1.515   116 |
          +----------------------------------------------------------------------------------------+
        
        . by series: gen Ncalc = quarter_date[_N] - quarter_date[1] + 1
        
        . assert Ncalc == N
        Why do you claim that the sample size varies by series?

        With respect to how rangestat works, a careful reading of the help file will help a lot. If you have not noticed yet, the obs variable contains the number of observations for the current observation's regression. You can use the value in obs to reject results based on a sample that's too small.

        As I understand the situation at this point, the following should do what you want:

        Code:
        use "forstatalist.dta", clear
        
        * define a linear regression in Mata using quadcross() - help mata cross(), example 2
        mata:
        mata clear
        mata set matastrict on
        real rowvector myreg(real matrix Xall)
        {
            real colvector y, b, Xy
            real matrix X, XX
        
            y = Xall[.,1]                // dependent var is first column of Xall
            X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
            X = X,J(rows(X),1,1)         // add a constant
            
            XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
            Xy = quadcross(X, y)
            b  = invsym(XX) * Xy
            
            return(rows(X), b')
        }
        end
        
        * the low and high bounds of the recursive rolling window, don't calculate
        * results if year >= 2005
        by series: gen low = quarter_date[1]
        by series: gen high = cond(year < 2005, quarter_date, .)
        
        rangestat (myreg) ngdp_first grs, interval(quarter_date low high) by(series) casewise
        rename myreg1 obs
        rename myreg2 b_grs
        rename myreg3 b_cons
        by series (quarter_date): gen Forecast = b_cons + b_grs * F.grs if obs >= 33
        gen Error = f.ngdp_first - Forecast
        bysort series: egen MSFE = mean(Error^2)
        by series: replace MSFE = . if obs < 33 | year >= 2005

        Comment


        • #19
          Thanks a lot Robert. Much appreciated.

          There are some puzzling results that need further investigation between the rangestat and the long code. I reply to post #18 and explain below:

          1- I did not mean that the sample size varied by series. I meant that the results using the previous rangesata code were not consistent among series such that series 1 has results for some quarters that are different from series 2, 3 and the rest. However when using the code in post #18, I no longer observe thiss problem. Results appear more consistent.

          2- Your code in post #18 appears to produce very reasonable results. However, when I tried to execute the process for each series independently to check the results using rangestat, I get relatively different results. I am so puzzled for days now as I do not know which one is the correct one. In other words, are the following code identical? I will write exactly your code and my check for series 1:


          Code:
          *
          use forstatalist.dta, clear
          mata:
          mata clear
          mata set matastrict on
          real rowvector myreg(real matrix Xall)
          {
              real colvector y, b, Xy
              real matrix X, XX
          
              y = Xall[.,1]                // dependent var is first column of Xall
              X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
              X = X,J(rows(X),1,1)         // add a constant
              
              XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
              Xy = quadcross(X, y)
              b  = invsym(XX) * Xy
              
              return(rows(X), b')
          }
          end
          
          * the low and high bounds of the recursive rolling window, don't calculate
          * results if year >= 2005
          by series: gen low = quarter_date[1]
          by series: gen high = cond(year < 2005, quarter_date, .)
          
          rangestat (myreg) ngdp_first grs, interval(quarter_date low high) by(series) casewise
          rename myreg1 obs
          rename myreg2 b_grs
          rename myreg3 b_cons
          by series (quarter_date): gen Forecast = b_cons + b_grs * F.grs if obs >= 33
          gen Error = f.ngdp_first - Forecast
          
          bysort series: egen MSFE = mean(Error^2)
          by series: replace MSFE = . if obs < 33 | year >= 2005
          
          
          
          sum MSFE if series==1
          
              Variable |       Obs        Mean    Std. Dev.       Min        Max
          -------------+--------------------------------------------------------
                  MSFE |        40    3.843928           0   3.843928   3.843928
          
          
          
          
          sum MSFE if series==2
          
              Variable |       Obs        Mean    Std. Dev.       Min        Max
          -------------+--------------------------------------------------------
                  MSFE |        40    3.795713           0   3.795713   3.795713
          Code:
          
          *******
          * CHECK FOR ONLY SERIES 1 TO SEE IF THE RESULTS using rangestat ARE CONSISTENT FOR SERIES 1
          use forstatalist.dta, clear
          keep if series==1
          if series==1 {
          rolling _b, window(33) recursive saving(newdata,replace): reg ngdp_first grs if year<2005
          preserve
          use newdata.dta,clear
          rename end quarter_date
          tsset series quarter_date
          save newdata.dta,replace
          restore
          merge 1:1 series quarter_date using newdata.dta
          keep if _merge==3
          tsset quarter_date
          gen Forecast1=_b_cons + _b_grs* F.grs
          gen Error1=f.ngdp_first-Forecast1
          gen SqError1=(Error1)^2
          egen MSFE1=mean(SqError1)
          
          }
          sum MSFE1
              Variable |       Obs        Mean    Std. Dev.       Min        Max
          -------------+--------------------------------------------------------
                 MSFE1 |        40    3.909392           0   3.909392   3.909392
          * CHECK FOR ONLY SERIES 2 TO SEE IF THE RESULTS using rangestat ARE CONSISTENT FOR SERIES 2

          Code:
          *OR
          use forstatalist.dta, clear
          keep if series==2
          if series==2 {
          rolling _b, window(33) recursive saving(newdata,replace): reg ngdp_first grs if year<2005
          preserve
          use newdata.dta,clear
          rename end quarter_date
          tsset series quarter_date
          save newdata.dta,replace
          restore
          merge 1:1 series quarter_date using newdata.dta
          keep if _merge==3
          tsset quarter_date
          gen Forecast2=_b_cons + _b_grs* F.grs
          gen Error2=f.ngdp_first-Forecast2
          gen SqError2=(Error2)^2
          egen MSFE2=mean(SqError2)
          
          }
          
          sum MSFE2
              Variable |       Obs        Mean    Std. Dev.       Min        Max
          -------------+--------------------------------------------------------
                 MSFE2 |        40    3.879183           0   3.879183   3.879183

          As you see, the results using rangestat are not similar to those using these. I hope I get some advice.

          Thank you so much and I look forward to hearing from you
          Last edited by Mike Kraft; 15 Mar 2017, 13:25.

          Comment


          • #20
            The following duplicates the results from rangestat for series 1 in #18. I'll let you figure out where you made your mistake.
            Code:
            use forstatalist.dta, clear
            keep if series==1
            rolling _b, window(33) recursive saving(newdata,replace): reg ngdp_first grs if year<2005
            preserve
            use newdata.dta,clear
            rename end quarter_date
            tsset series quarter_date
            save newdata.dta,replace
            restore
            merge 1:1 series quarter_date using newdata.dta, keep(master match) nogen
            tsset quarter_date
            gen Forecast1=_b_cons + _b_grs* F.grs
            gen Error1=f.ngdp_first-Forecast1
            gen SqError1=(Error1)^2
            egen MSFE1=mean(SqError1)
            replace MSFE1 = . if _n < 33 | year >= 2005
            sum MSFE1
            and the summarize results:
            Code:
            . sum MSFE1
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
                   MSFE1 |         40    3.843928           0   3.843928   3.843928

            Comment


            • #21
              Dear Robert (and all participants)

              I am glad that you get same results as those using rangestat. This is very good news. Your replication looks very reasonable and consistent with what I needed. I gave you a "Like".

              I can see in #19 that the differences are in :

              1- The loop (while there should not be a loop as only one series is kept each time)
              2- The merge is different but I think it should lead to similar outcome.
              3- replacing MSFE to missing if outside the range but I think it should not make the difference comparing with my code as in my code after keeping _merge3 those observations are no longer there.

              Unfortunately, still can not find mu mistake, although I am sure there is a mistake in my code....Being puzzled for 24 more hours now, I am waiting for more hints!



              One more issue:

              If I want to replicate your #18 code when the regression includes more variables, will I need to make any change to the mata code you posted? or I only add the independent variables to rangestata, add them to the rename code, and the Forecast calculation ?


              An example with additional two independent variables: emp and inf is:


              Code:
              
              use "forstatalist.dta", clear
              
              * define a linear regression in Mata using quadcross() - help mata cross(), example 2
              mata:
              mata clear
              mata set matastrict on
              real rowvector myreg(real matrix Xall)
              {
                  real colvector y, b, Xy
                  real matrix X, XX
              
                  y = Xall[.,1]                // dependent var is first column of Xall
                  X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
                  X = X,J(rows(X),1,1)         // add a constant
                  
                  XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
                  Xy = quadcross(X, y)
                  b  = invsym(XX) * Xy
                  
                  return(rows(X), b')
              }
              end
              
              * the low and high bounds of the recursive rolling window, don't calculate
              * results if year >= 2005
              by series: gen low = quarter_date[1]
              by series: gen high = cond(year < 2005, quarter_date, .)
              
              rangestat (myreg) ngdp_first grs emp inf, interval(quarter_date low high) by(series) casewise
              rename myreg1 obs
              rename myreg2 b_grs
              rename myreg3 b_emp
              rename myreg4 b_inf
              rename myreg5 b_cons
              by series (quarter_date): gen Forecast = b_cons + b_grs * F.grs + b_emp*F.emp + b_inf* F.inf if obs >= 33
              gen Error = f.ngdp_first - Forecast
              bysort series: egen MSFE = mean(Error^2)
              by series: replace MSFE = . if obs < 33 | year >= 2005


              In the meantime, great job Robert, and very efficient rangestat code.

              Look forward to hearing from you (and any participant) so I close this posting.
              Last edited by Mike Kraft; 16 Mar 2017, 08:06.

              Comment


              • #22
                The merge is different and it matters. You are using time-series operators and these will generate missing values if you reduce the data to _merge == 3.

                The Mata function can handle as many variables as you want: it returns first the number of observation (myreg1), as many coefficients as there are independent variables, and then the coefficient of the constant. You just need to rename them and your setup seems correct. Don't take my word for it, spot check the results using the technique I showed earlier in the thread:

                Code:
                local i 50
                reg ngdp_first grs emp inf if quarter_date <= quarter_date[`i'] & series == series[`i']
                list in `i'
                
                local i 150
                reg ngdp_first grs emp inf if quarter_date <= quarter_date[`i'] & series == series[`i']
                list in `i'

                Comment

                Working...
                X