  • Summarizing few loops in ONE

    Hi all,

    I have a data set that is basically a collection of 12 subsets of data. Each subset has data for quarterly Gross Domestic Product (GDP) and aggregates of economic variables that are observed at a certain week of the quarter. A variable called series holds the week number. So when series=1, my quarterly aggregates are those observed at week 1 of the quarter; when series=2, they are those observed at week 2 of the quarter, and so on.

    My aim is to investigate how the relation between GDP and the aggregates evolves across the weeks of the quarter. I give an example of my data at the end of this post.

    I have code to calculate mean squared forecast errors. It has to be run for each "series" (i.e. each subset that corresponds to aggregates observed at a given week of the quarter). The code is:

    Code:
    gen Lrgdp_first=L.rgdp_first
    
    gen MSFEgrs1=.
    if series==1 {
    rolling _b, window(33) recursive saving(newdata,replace): reg rgdp_first grs if series==1 & year<2005
    preserve
    use newdata.dta,clear
    rename end quarter_date
    tsset series quarter_date
    save newdata.dta,replace
    restore
    merge 1:1 series quarter_date using newdata.dta
    keep if _merge==3
    tsset quarter_date
    gen Forecast=_b_cons + _b_grs* F.grs
    gen Error=f.rgdp_first-Forecast
    gen SqError=(Error)^2
    egen MSFE=mean(SqError) if SqError!=.
    replace MSFEgrs1=MSFE
    
    drop  Forecast Error SqError MSFE
    
    gen spferror=rgdp_first-DRGDP2
    gen sq_spferror=(spferror)^2
    egen MSFEspf1=mean(sq_spferror)
    
    gen RWerror=rgdp_first-Lrgdp_first
    gen sqRWerror=(RWerror)^2
    egen MSFErw1=mean(sqRWerror)
    
    gen spf_rw1=MSFEspf1/MSFErw1
    gen grs_rw1=MSFEgrs1/MSFErw1
    
    }
    My questions:
    1- How can I improve my loop so that the same steps are done for each series, while the variables named with the series number pick up the new values corresponding to the new series? I imagine something like MSFEgrs1, MSFEgrs2, MSFEgrs3, ... up to MSFEgrs12, and also grs_rw1, grs_rw2, grs_rw3, ... and so on for all such variables.

    2- Then how can I display a table that has rows spf_rw1, spf_rw2, ... up to spf_rw12 in one column, and grs_rw1, grs_rw2, grs_rw3, ... up to grs_rw12 in the next column?



    An example of my data, created using dataex, is:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(quarter_date series) double rgdp_first float(DRGDP2 grs)
    108 1  4.299     3 -1.7400645
    109 1  2.591  1.78 -2.8215795
    110 1  3.838  2.46 -3.8739974
    111 1  4.151  2.19  -3.122698
    112 1  2.266   .33 -3.5722666
    113 1  3.089  2.14  -1.486287
    114 1  2.237     3   -3.54055
    115 1   1.99  2.46 -1.0885705
    116 1  5.546   3.9  -2.506453
    117 1  1.676  2.04  -2.459989
    118 1  2.501  1.73  1.6824546
    119 1   .501  1.27  2.3742833
    120 1  2.096  1.61   .9221073
    121 1   1.22  2.31 -1.8166744
    122 1  1.793  1.12  1.9894003
    123 1 -2.131 -1.28  4.6542344
    124 1 -2.811 -1.93   7.097381
    125 1   .418  -.08   9.544506
    126 1  2.371  2.56   5.090754
    127 1   .297  1.55  -.3905046
    128 1  1.978   .72  1.9397452
    129 1  1.386  2.99   2.390985
    130 1   2.65   2.2 -1.3339542
    131 1   3.79  1.98   .8567627
    132 1  1.799   3.1 -1.7540462
    133 1  1.577  2.55 -2.0117745
    134 1  2.844  3.06 -1.0230411
    135 1   5.87  3.39  -2.672692
    136 1  2.581  3.27 -4.1622634
    137 1  3.708   3.9  -2.737124
    138 1  3.438  2.52  -6.106498
    139 1  4.532  3.28  -3.720529
    140 1  2.819  3.11 -4.5580325
    141 1   .528  1.77  -2.739282
    142 1  4.205  2.14  2.9432015
    143 1      .  2.45  -.3020817
    144 1  2.809  1.42  -.7189608
    145 1  4.221  2.92  -.3356397
    146 1  2.171  2.59  -4.650067
    147 1  4.717  2.16  -3.525065
    148 1  5.611  2.31  -1.606285
    149 1  2.163  2.38 -2.3640177
    150 1  3.521  2.73   -3.76021
    151 1  4.298  2.81   -3.24092
    152 1  4.242  2.49  -4.067624
    153 1  1.417  2.59  -1.483128
    154 1  3.288  2.21  -1.802051
    155 1  5.585   2.4  .12005798
    156 1  4.492  2.95 -1.8684953
    157 1  2.288  3.19 -2.4996736
    158 1  4.824  3.33 -2.8045006
    159 1  5.798  3.85  -2.264354
    160 1  5.391     3 -4.6063266
    161 1   5.19  4.19 -2.2261415
    162 1  2.745  3.25  -.8391379
    163 1  1.373  3.22  1.8297783
    164 1  1.982    .8   1.295785
    165 1   .735  1.16   6.782963
    166 1  -.355  1.42   5.368276
    167 1   .224 -1.99   6.823607
    168 1  5.836  1.36  10.673666
    169 1  1.059  2.54   2.260923
    170 1  3.137   2.4   .5870026
    171 1   .744  1.53   .9940487
    172 1  1.598   2.1  1.8984247
    173 1   2.37  1.65   1.793958
    174 1  7.155  3.74   3.913879
    175 1  4.024  3.98 -.25143445
    176 1  4.158  4.42  -4.654116
    177 1  3.044  4.42 -3.2208996
    178 1  3.711  3.43 -2.5053735
    179 1  3.147  3.67 -1.2607688
    180 1  3.088  3.76   -.883205
    181 1  3.414  2.98  -2.734065
    182 1  3.805  4.15 -1.9732573
    183 1  1.119   3.2  -2.869404
    184 1  4.818  4.11  -2.463879
    185 1  2.458  3.28  -3.900098
    186 1  1.583  2.81  -.5839613
    187 1  3.473   2.5  1.2229854
    188 1   1.26  2.55   .6914418
    189 1  3.382  2.41   -.772797
    190 1  3.896   2.6  -1.419048
    191 1   .636  1.78  1.9809887
    192 1   .597   .66  2.1079657
    193 1  1.889   .08   3.705761
    194 1  -.252  1.34   7.907846
    195 1 -3.804 -2.63   9.775786
    196 1 -6.144 -4.91   28.71871
    197 1 -1.017 -1.31   33.14579
    198 1  3.534  2.31  16.748924
    199 1  5.731  2.59  2.3695307
    200 1  3.239  2.63   3.368487
    201 1  2.387  3.33 -1.2413363
    202 1   2.01  2.41 -4.7454147
    203 1  3.173  2.23  -.5648026
    204 1  1.748  3.54  -.6299318
    205 1  1.282  3.24 -1.1670973
    206 1  2.464  2.54  -.8973433
    207 1  2.752   2.5  .04872819
    end
    format %tq quarter_date



    Last edited by Mike Kraft; 08 Mar 2017, 13:26.

  • #2
    Just a token contribution. I note here these lines

    Code:
    egen MSFE=mean(SqError) if SqError!=.
    ...
    egen MSFEspf1=mean(sq_spferror)
    ...
    egen MSFErw1=mean(sqRWerror)
    In each case, you are putting a single mean in a new variable, which seems only to be used immediately afterwards. It is simpler just to summarize and then use r(mean) in what follows: e.g.

    Code:
    su SqError, meanonly  
    replace MSFEgrs1 = r(mean)



    • #3
      Thanks Nick.
      I think you meant MSFEspf1 in the second line:
      Code:
      su SqError, meanonly
      replace MSFEspf1 = r(mean)

      I hope I can also get some help with replicating the code for each series.

      I look forward to your contributions.



      • #4
        No, I don't think so, unless your code was itself wrong earlier. This was the original example segment.

        Code:
        egen MSFE=mean(SqError) if SqError!=.
        replace MSFEgrs1=MSFE
        That can be replaced with

        Code:
        su SqError, meanonly  
        replace MSFEgrs1 = r(mean)
        The summarize will ignore the missings anyway.
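
        A quick check of that point, as a minimal sketch on made-up numbers (the three values below are illustrative, not taken from the thread's data): summarize skips the missing value, so r(mean) is the mean of the nonmissing squared errors only.

        ```stata
        * toy data: three squared errors, one of them missing (illustrative values only)
        clear
        input float SqError
        1
        .
        3
        end

        * summarize leaves out missing values automatically;
        * meanonly suppresses the output and the variance calculation
        summarize SqError, meanonly
        display r(N)       // number of nonmissing observations used
        display r(mean)    // mean over the nonmissing values

        * store the mean in every observation, as the egen/replace pair did
        gen MSFEgrs1 = r(mean)
        ```

        Here the mean is taken over the two nonmissing values, so no "if SqError!=." qualifier is needed.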

        As for the rest, it is a choice for me now between catching up on "Game of Thrones" and your problem, and sorry, but GoT gets it.



        • #5
          Sorry Nick, you were right. I was confused between sq and spf. But yes, it is fine.


          I hope that others can help too, as you seem to be busy.
          I hope you can do so when you come back as well!


          I really look forward to all your contributions.



          • #6
            Dear All
            I am bringing this up again as the problem is still not solved!

            I think I need to create a variable with missing values for each quantity I want to calculate and retain, something like:

            Code:
            forval j = 1/12 {
                gen MSFEgrs`j'=.
            }
            then start my loop with

            Code:
            levelsof series, local(levels)
            
            foreach x of local levels {
            And then replace the calculated values inside the loop before dropping them, something like:

            Code:
            forval j = 1/12 {
                gen MSFEgrs`j'=.
            }

            However, I am still not able to construct the code.

            I really hope someone can help.

            Thanks



            • #7
              I'm not sure I understand what is wanted here, but perhaps this will help:

              Code:
              levelsof series, local(levels)
              gen MSFEgrs = .
              foreach x of local levels {
                  regress something something_else etcetera if series == `x'
                  calculate some predictions and errors in new variables
                  replace MSFEgrs = some_expression if series == `x'
                  drop those new variables (but not MSFEgrs!)
              }
              Note: The placeholder text (something something_else etcetera, the two descriptive lines, and some_expression) needs to be replaced with actual code applicable to your problem.
              This general approach will run a series of regressions, each one using only those observations from a single series. It then calculates some predictions and prediction errors after each regression. Finally, those calculations are used to calculate the target variable MSFEgrs, whose value is then updated in only the observations of the current series. At the end of the loop each observation will have MSFEgrs values that correspond to that observation's series.

              This creates only a single MSFEgrs variable, not 12 of them. But that seems more consistent with the data organization and the overall problem.
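
              To make the skeleton concrete, here is a minimal sketch on simulated data. It uses a plain in-sample regress and predict in place of the recursive rolling forecasts from #1, so the numbers it produces are in-sample mean squared errors, not the out-of-sample MSFE the thread is after. The simulated values and the temporary names yhat and sqerr are assumptions for illustration.

              ```stata
              * simulated stand-in for the real data: 3 series, 40 quarters each
              clear
              set seed 12345
              set obs 120
              gen series = ceil(_n/40)
              bysort series: gen quarter_date = 107 + _n
              gen grs = rnormal()
              gen rgdp_first = 2 - 0.3*grs + rnormal()

              levelsof series, local(levels)
              gen MSFEgrs = .
              foreach x of local levels {
                  * one regression per series
                  quietly regress rgdp_first grs if series == `x'
                  * fitted values and squared errors for that series only
                  predict double yhat if series == `x', xb
                  gen double sqerr = (rgdp_first - yhat)^2
                  * mean squared error, written only into the current series
                  summarize sqerr, meanonly
                  replace MSFEgrs = r(mean) if series == `x'
                  * drop the temporaries so the next pass can recreate them
                  drop yhat sqerr
              }

              * one row per series
              tabstat MSFEgrs, by(series) statistics(mean)
              ```

              Because MSFEgrs is constant within each series, tabstat with by(series) gives a one-row-per-series summary without needing 12 separate variables.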



              • #8
                Looking at this again: much of the awkwardness stems from rolling and the way it works.

                Use rangestat (SSC) to do the regressions and then you can do everything in place. You don't need a loop for all the post-regression processing.



                • #9
                  Can you advise please how rangestat would solve this problem?
                  Thanks



                  • #10
                    This is what I have done based on #7, but I got an error message:

                    Code:
                    levelsof series, local(levels)
                    gen MSFEgrs = .
                    foreach x of local levels {
                    rolling _b, window(33) recursive saving(newdata,replace): reg rgdp_first grs if series==`x'& year<2005
                    preserve
                    use newdata.dta,clear
                    rename end quarter_date
                    tsset series quarter_date
                    save newdata.dta,replace
                    restore
                    merge 1:1 series quarter_date using newdata.dta
                    keep if _merge==3
                    tsset quarter_date
                    gen Forecast=_b_cons + _b_grs* F.grs
                    gen Error=f.rgdp_first-Forecast
                    gen SqError=(Error)^2
                    egen MSFE=mean(SqError) if SqError!=.
                        
                    replace MSFEgrs = MSFE if series == `x'
                    drop Forecast Error SqError
                    }
                    The code starts to run, then stops and reports the following error message:


                    Code:
                    (running regress on estimation sample)
                    
                    -> series = 1
                    
                    Rolling replications (40)
                    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
                    ........................................
                    file newdata.dta saved
                    (rolling: regress)
                           panel variable:  series (strongly balanced)
                            time variable:  quarter_date, 1995q1 to 2004q4
                                    delta:  1 quarter
                    file newdata.dta saved
                    
                        Result                           # of obs.
                        -----------------------------------------
                        not matched                         1,352
                            from master                     1,352  (_merge==1)
                            from using                          0  (_merge==2)
                    
                        matched                                40  (_merge==3)
                        -----------------------------------------
                    (1352 observations deleted)
                            time variable:  quarter_date, 1995q1 to 2004q4
                                    delta:  1 quarter
                    (1 missing value generated)
                    (2 missing values generated)
                    (2 missing values generated)
                    (2 missing values generated)
                    (38 real changes made)
                    (running regress on estimation sample)
                    no observations
                    an error occurred when rolling executed regress
                    r(2000);
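
                    The log points to the cause: keep if _merge==3 deleted 1,352 observations during the first pass, which removes every other series from memory, so the second rolling call finds no observations for series 2. One possible way around it, sketched here on simulated data, is to do the destructive steps between preserve and restore and carry the result out in a local macro. The simulated values, the local name msfe, and the temporary file coefs are assumptions for illustration, and the year<2005 restriction is left out to keep the sketch short.

                    ```stata
                    * simulated stand-in: 2 series, 40 quarters each (values are made up)
                    clear
                    set seed 2017
                    set obs 80
                    gen series = ceil(_n/40)
                    bysort series: gen quarter_date = 107 + _n
                    format %tq quarter_date
                    gen grs = rnormal()
                    gen rgdp_first = 2 - 0.3*grs + rnormal()

                    gen MSFEgrs = .
                    levelsof series, local(levels)
                    foreach x of local levels {
                        preserve
                        * work on one series at a time; restore brings the others back
                        keep if series == `x'
                        tsset quarter_date
                        tempfile coefs
                        quietly rolling _b, window(33) recursive saving(`coefs', replace): regress rgdp_first grs
                        * match each window's coefficients to the quarter in which the window ends
                        gen end = quarter_date
                        merge 1:1 end using `coefs', keep(match) nogenerate
                        tsset quarter_date
                        * one-step-ahead forecast and its squared error
                        gen double Forecast = _b_cons + _b_grs * F.grs
                        gen double SqError = (F.rgdp_first - Forecast)^2
                        summarize SqError, meanonly
                        local msfe = r(mean)
                        restore
                        * the full data are back in memory; store the result for this series
                        replace MSFEgrs = `msfe' if series == `x'
                    }
                    ```

                    Everything created inside the preserve/restore pair (the merged coefficients, Forecast, SqError) disappears at restore, so nothing needs to be dropped by hand before the next pass.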



                    • #11
                      Hi again,
                      The only solution I have found so far is to repeat the code 12 times.
                      The problem is that I will need to estimate 4 types of regression, which will lead to 48 repetitions.
                      I am sure there is a better way!

                      Code:
                      ** Week 1
                      use Allfactors_gdp.dta,clear
                      gen Lrgdp_first=L.rgdp_first
                      keep if series==1
                      save series1.dta,replace
                      
                      
                      gen MSFEgrs=.
                      if series==1 {
                      rolling _b, window(33) recursive saving(newdata,replace): reg rgdp_first grs if year<2005
                      preserve
                      use newdata.dta,clear
                      rename end quarter_date
                      tsset series quarter_date
                      save newdata.dta,replace
                      restore
                      merge 1:1 series quarter_date using newdata.dta
                      keep if _merge==3
                      tsset quarter_date
                      gen Forecast=_b_cons + _b_grs* F.grs
                      gen Error=f.rgdp_first-Forecast
                      gen SqError=(Error)^2
                      egen MSFE=mean(SqError) if SqError!=.
                      replace MSFEgrs=MSFE
                      
                      drop  Forecast Error SqError MSFE
                      
                      gen spferror=rgdp_first-DRGDP2
                      gen sq_spferror=(spferror)^2
                      egen MSFEspf=mean(sq_spferror)
                      
                      gen RWerror=rgdp_first-Lrgdp_first
                      gen sqRWerror=(RWerror)^2
                      egen MSFErw=mean(sqRWerror)
                      
                      gen spf_rw=MSFEspf/MSFErw
                      gen grs1_rw=MSFEgrs/MSFErw
                      
                      }
                      
                      sum grs1_rw
                      
                      keep quarter_date series grs1_rw spf_rw
                      save grs1_rw.dta,replace
                      
                      **  Week 2
                      use Allfactors_gdp.dta,clear
                      gen Lrgdp_first=L.rgdp_first
                      keep if series==2
                      save series2.dta,replace
                      
                      
                      gen MSFEgrs=.
                      if series==2 {
                      rolling _b, window(33) recursive saving(newdata,replace): reg rgdp_first grs if year<2005
                      preserve
                      use newdata.dta,clear
                      rename end quarter_date
                      tsset series quarter_date
                      save newdata.dta,replace
                      restore
                      merge 1:1 series quarter_date using newdata.dta
                      keep if _merge==3
                      tsset quarter_date
                      gen Forecast=_b_cons + _b_grs* F.grs
                      gen Error=f.rgdp_first-Forecast
                      gen SqError=(Error)^2
                      egen MSFE=mean(SqError) if SqError!=.
                      replace MSFEgrs=MSFE
                      
                      drop  Forecast Error SqError MSFE
                      
                      gen spferror=rgdp_first-DRGDP2
                      gen sq_spferror=(spferror)^2
                      egen MSFEspf=mean(sq_spferror)
                      
                      gen RWerror=rgdp_first-Lrgdp_first
                      gen sqRWerror=(RWerror)^2
                      egen MSFErw=mean(sqRWerror)
                      
                      gen spf_rw=MSFEspf/MSFErw
                      gen grs2_rw=MSFEgrs/MSFErw
                      
                      }
                      
                      sum grs2_rw
                      
                      keep quarter_date series grs2_rw  spf_rw
                      save grs2_rw.dta, replace
                      
                      
                      **Week 3
                      use Allfactors_gdp.dta,clear
                      gen Lrgdp_first=L.rgdp_first
                      keep if series==3
                      save series3.dta,replace
                      
                      
                      gen MSFEgrs=.
                      if series==3{
                      rolling _b, window(33) recursive saving(newdata,replace): reg rgdp_first grs if year<2005
                      preserve
                      use newdata.dta,clear
                      rename end quarter_date
                      tsset series quarter_date
                      save newdata.dta,replace
                      restore
                      merge 1:1 series quarter_date using newdata.dta
                      keep if _merge==3
                      tsset quarter_date
                      gen Forecast=_b_cons + _b_grs* F.grs
                      gen Error=f.rgdp_first-Forecast
                      gen SqError=(Error)^2
                      egen MSFE=mean(SqError) if SqError!=.
                      replace MSFEgrs=MSFE
                      
                      drop  Forecast Error SqError MSFE
                      
                      gen spferror=rgdp_first-DRGDP2
                      gen sq_spferror=(spferror)^2
                      egen MSFEspf=mean(sq_spferror)
                      
                      gen RWerror=rgdp_first-Lrgdp_first
                      gen sqRWerror=(RWerror)^2
                      egen MSFErw=mean(sqRWerror)
                      
                      gen spf_rw=MSFEspf/MSFErw
                      gen grs3_rw=MSFEgrs/MSFErw
                      
                      }
                      
                      sum grs3_rw
                      
                      keep quarter_date series grs3_rw  spf_rw
                      save grs3_rw.dta, replace
                      The spf_rw variable does not change across series anyway. I will then need to combine all 48 resulting datasets, and of course the chance of errors will increase. I hope there is a better way to condense all that.
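
                      The repeated blocks differ only in the series number, so one possible shortcut is a forvalues loop that collects one row of results per series with postfile, which also gives the one-row-per-series table asked for in #1. The sketch below runs on simulated data and computes only the SPF-to-random-walk ratio spf_rw; the simulated values, the handle name results, and the temporary file table are assumptions for illustration. A grs ratio would be posted as one more column in the same way.

                      ```stata
                      * simulated stand-in: 3 series, 40 quarters each (values are made up)
                      clear
                      set seed 42
                      set obs 120
                      gen series = ceil(_n/40)
                      bysort series: gen quarter_date = 107 + _n
                      gen rgdp_first = 2 + rnormal()
                      gen DRGDP2 = rgdp_first + rnormal(0, 0.5)
                      tsset series quarter_date
                      gen Lrgdp_first = L.rgdp_first

                      * squared errors for the SPF forecast and the random-walk benchmark
                      gen double sq_spferror = (rgdp_first - DRGDP2)^2
                      gen double sqRWerror = (rgdp_first - Lrgdp_first)^2

                      * results file: one row per series
                      tempname results
                      tempfile table
                      postfile `results' series MSFEspf MSFErw spf_rw using `table'

                      forvalues j = 1/3 {
                          summarize sq_spferror if series == `j', meanonly
                          local spf = r(mean)
                          summarize sqRWerror if series == `j', meanonly
                          local rw = r(mean)
                          post `results' (`j') (`spf') (`rw') (`spf'/`rw')
                      }
                      postclose `results'

                      * the collected table replaces the data in memory
                      use `table', clear
                      list, noobs
                      ```

                      With the real data the loop would run over 1/12, and no intermediate files such as series1.dta or grs1_rw.dta have to be saved and combined afterwards.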



                      • #12
                        As far as I can tell, the need to loop here is a red herring as rolling handles panel data. The sample data provided in #1 includes only one series. In the following, I expand it to create a second series. I've modified the rolling code accordingly and I show how to do the same using rangestat. Finally, I show how to spot check the regression results, something that I think is important to do whether you use rolling or rangestat.

                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input float(quarter_date series) double rgdp_first float(DRGDP2 grs)
                        108 1  4.299     3 -1.7400645
                        109 1  2.591  1.78 -2.8215795
                        110 1  3.838  2.46 -3.8739974
                        111 1  4.151  2.19  -3.122698
                        112 1  2.266   .33 -3.5722666
                        113 1  3.089  2.14  -1.486287
                        114 1  2.237     3   -3.54055
                        115 1   1.99  2.46 -1.0885705
                        116 1  5.546   3.9  -2.506453
                        117 1  1.676  2.04  -2.459989
                        118 1  2.501  1.73  1.6824546
                        119 1   .501  1.27  2.3742833
                        120 1  2.096  1.61   .9221073
                        121 1   1.22  2.31 -1.8166744
                        122 1  1.793  1.12  1.9894003
                        123 1 -2.131 -1.28  4.6542344
                        124 1 -2.811 -1.93   7.097381
                        125 1   .418  -.08   9.544506
                        126 1  2.371  2.56   5.090754
                        127 1   .297  1.55  -.3905046
                        128 1  1.978   .72  1.9397452
                        129 1  1.386  2.99   2.390985
                        130 1   2.65   2.2 -1.3339542
                        131 1   3.79  1.98   .8567627
                        132 1  1.799   3.1 -1.7540462
                        133 1  1.577  2.55 -2.0117745
                        134 1  2.844  3.06 -1.0230411
                        135 1   5.87  3.39  -2.672692
                        136 1  2.581  3.27 -4.1622634
                        137 1  3.708   3.9  -2.737124
                        138 1  3.438  2.52  -6.106498
                        139 1  4.532  3.28  -3.720529
                        140 1  2.819  3.11 -4.5580325
                        141 1   .528  1.77  -2.739282
                        142 1  4.205  2.14  2.9432015
                        143 1      .  2.45  -.3020817
                        144 1  2.809  1.42  -.7189608
                        145 1  4.221  2.92  -.3356397
                        146 1  2.171  2.59  -4.650067
                        147 1  4.717  2.16  -3.525065
                        148 1  5.611  2.31  -1.606285
                        149 1  2.163  2.38 -2.3640177
                        150 1  3.521  2.73   -3.76021
                        151 1  4.298  2.81   -3.24092
                        152 1  4.242  2.49  -4.067624
                        153 1  1.417  2.59  -1.483128
                        154 1  3.288  2.21  -1.802051
                        155 1  5.585   2.4  .12005798
                        156 1  4.492  2.95 -1.8684953
                        157 1  2.288  3.19 -2.4996736
                        158 1  4.824  3.33 -2.8045006
                        159 1  5.798  3.85  -2.264354
                        160 1  5.391     3 -4.6063266
                        161 1   5.19  4.19 -2.2261415
                        162 1  2.745  3.25  -.8391379
                        163 1  1.373  3.22  1.8297783
                        164 1  1.982    .8   1.295785
                        165 1   .735  1.16   6.782963
                        166 1  -.355  1.42   5.368276
                        167 1   .224 -1.99   6.823607
                        168 1  5.836  1.36  10.673666
                        169 1  1.059  2.54   2.260923
                        170 1  3.137   2.4   .5870026
                        171 1   .744  1.53   .9940487
                        172 1  1.598   2.1  1.8984247
                        173 1   2.37  1.65   1.793958
                        174 1  7.155  3.74   3.913879
                        175 1  4.024  3.98 -.25143445
                        176 1  4.158  4.42  -4.654116
                        177 1  3.044  4.42 -3.2208996
                        178 1  3.711  3.43 -2.5053735
                        179 1  3.147  3.67 -1.2607688
                        180 1  3.088  3.76   -.883205
                        181 1  3.414  2.98  -2.734065
                        182 1  3.805  4.15 -1.9732573
                        183 1  1.119   3.2  -2.869404
                        184 1  4.818  4.11  -2.463879
                        185 1  2.458  3.28  -3.900098
                        186 1  1.583  2.81  -.5839613
                        187 1  3.473   2.5  1.2229854
                        188 1   1.26  2.55   .6914418
                        189 1  3.382  2.41   -.772797
                        190 1  3.896   2.6  -1.419048
                        191 1   .636  1.78  1.9809887
                        192 1   .597   .66  2.1079657
                        193 1  1.889   .08   3.705761
                        194 1  -.252  1.34   7.907846
                        195 1 -3.804 -2.63   9.775786
                        196 1 -6.144 -4.91   28.71871
                        197 1 -1.017 -1.31   33.14579
                        198 1  3.534  2.31  16.748924
                        199 1  5.731  2.59  2.3695307
                        200 1  3.239  2.63   3.368487
                        201 1  2.387  3.33 -1.2413363
                        202 1   2.01  2.41 -4.7454147
                        203 1  3.173  2.23  -.5648026
                        204 1  1.748  3.54  -.6299318
                        205 1  1.282  3.24 -1.1670973
                        206 1  2.464  2.54  -.8973433
                        207 1  2.752   2.5  .04872819
                        end
                        format %tq quarter_date
                        
                        * create a second week in the series, change the numbers a bit
                        expand 2
                        bysort quarter_date: replace series = 2 if _n == 1
                        replace rgdp_first = rgdp_first + runiform() if series == 2
                        replace DRGDP2 = DRGDP2 + runiform() if series == 2
                        replace grs = grs + runiform() if series == 2
                        sort series quarter_date
                        
                        * -------------- solution using rolling ------------
                        tsset series quarter_date
                        rolling _b, window(33) recursive saving(newdata,replace): reg rgdp_first grs
                        preserve
                        use newdata.dta,clear
                        rename end quarter_date
                        tsset series quarter_date
                        save newdata.dta,replace
                        restore
                        merge 1:1 series quarter_date using newdata.dta, assert(master match) nogen
                        tsset series quarter_date
                        gen Forecast=_b_cons + _b_grs* F.grs
                        gen Error=f.rgdp_first-Forecast
                        gen SqError=(Error)^2
                        bysort series (quarter_date): egen MSFE = mean(SqError)
                        drop start SqError
                        rename (Forecast Error MSFE) (Forecast0 Error0 MSFE0)
                        
                        * ---------------- solution using rangestat ----------
                        * define a linear regression in Mata using quadcross() - help mata cross(), example 2
                        mata:
                        mata clear
                        mata set matastrict on
                        real rowvector myreg(real matrix Xall)
                        {
                            real colvector y, b, Xy
                            real matrix X, XX
                        
                            y = Xall[.,1]                // dependent var is first column of Xall
                            X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
                            X = X,J(rows(X),1,1)         // add a constant
                            
                            XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
                            Xy = quadcross(X, y)
                            b  = invsym(XX) * Xy
                            
                            return(rows(X), b')
                        }
                        end
                        
                        * a recursive rolling window starts at the first observation
                        gen qdate1 = quarter_date[1]
                        rangestat (myreg) rgdp_first grs, interval(quarter_date qdate1 quarter_date) by(series) casewise
                        rename (myreg1 myreg2 myreg3) (obs b_grs b_cons)
                        
                        by series (quarter_date): gen Forecast = b_cons + b_grs * F.grs if _n >= 33
                        gen Error = f.rgdp_first - Forecast
                        bysort series: egen MSFE = mean(Error^2)
                        
                        * ------------------- spot check the coefficients for a few cases --------------
                        
                        local i 50
                        reg rgdp_first grs if quarter_date <= quarter_date[`i'] & series == series[`i']
                        list in `i'
                        
                        local i 150
                        reg rgdp_first grs if quarter_date <= quarter_date[`i'] & series == series[`i']
                        list in `i'



                        • #13
                          Thanks so much, Robert. I tried the code in #12.
                          The code works and produces results, but the results are not accurate in my setting.
                          I noticed that for the first series it calculates the out-of-sample results for an evaluation sample from 1995 to 2004 (I have edited the code to restrict it to a final year of 2004).
                          But for all subsequent series, it produces results from 1987 to 2004. In other words, the evaluation sample is not constant across series.

                          I think a small fix might sort this out. I also think the structure of the data might not have been clear to Robert, which caused the issue. I apologize and clarify below.

                          I have produced a few examples using the nice dataex command by Picard and Cox. Here I show data for series 2, 3 and 4. Keep in mind that I have data for 12 series. Think of it as 12 subsets of data that run from 1987q1 to 2015q4 when series=1, then again from 1987q1 to 2015q4 when series=2, and so on. I give some examples:

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input float quarter_date int year byte(quarter month week) float grs double rgdp_first float(Lrgdp_first series)
                          108 1987 1 1 2 -2.0297842 4.299     . 2
                          109 1987 2 1 2  -2.703133 2.591 4.299 2
                          110 1987 3 1 2 -3.5352764 3.838 2.591 2
                          111 1987 4 1 2  -3.710979 4.151 3.838 2
                          112 1988 1 1 2  -5.708353 2.266 4.151 2
                          end
                          format %tq quarter_date

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input float quarter_date int year byte(quarter month week) float grs double rgdp_first float(Lrgdp_first series)
                          108 1987 1 1 3 -2.1076612 4.299     . 3
                          109 1987 2 1 3   -2.59371 2.591 4.299 3
                          110 1987 3 1 3 -3.5369556 3.838 2.591 3
                          111 1987 4 1 3  -3.756836 4.151 3.838 3
                          112 1988 1 1 3  -5.638969 2.266 4.151 3
                          end
                          format %tq quarter_date
                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input float quarter_date int year byte(quarter month week) float grs double rgdp_first float(Lrgdp_first series)
                          108 1987 1 1 4 -1.9988308 4.299     . 4
                          109 1987 2 1 4  -2.511362 2.591 4.299 4
                          110 1987 3 1 4 -3.6620574 3.838 2.591 4
                          111 1987 4 1 4  -3.832227 4.151 3.838 4
                          112 1988 1 1 4  -5.493874 2.266 4.151 4
                          end
                          format %tq quarter_date

                          The code I have used is:
                          Code:
                          mata:
                          mata clear
                          mata set matastrict on
                          real rowvector myreg(real matrix Xall)
                          {
                              real colvector y, b, Xy
                              real matrix X, XX
                          
                              y = Xall[.,1]                // dependent var is first column of Xall
                              X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
                              X = X,J(rows(X),1,1)         // add a constant
                              
                              XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
                              Xy = quadcross(X, y)
                              b  = invsym(XX) * Xy
                              
                              return(rows(X), b')
                          }
                          end
                          
                          * a recursive rolling window starts at the first observation
                          gen qdate1 = quarter_date[1]
                          rangestat (myreg) rgdp_first grs, interval(quarter_date qdate1 quarter_date) by(series) casewise
                          rename myreg1 obs
                          rename myreg2 b_grs
                          rename myreg3 b_cons
                          
                          by series (quarter_date): gen Forecast = b_cons + b_grs * F.grs if _n >= 33 & year<2005
                          gen Error = f.rgdp_first - Forecast  if _n >= 33  & year<2005
                          bysort series: egen MSFEgrs = mean(Error^2)  if _n >= 33 & year<2005
                          
                          
                          
                          gen spferror=rgdp_first-DRGDP2  if _n >= 33  & year<2005
                          gen sq_spferror=(spferror)^2 if _n >= 33  & year<2005
                          egen MSFEspf=mean(sq_spferror)if _n >= 33  & year<2005
                          
                          gen Lrgdp_first=L.rgdp_first
                          gen RWerror=rgdp_first-Lrgdp_first  if _n >= 33  & year<2005
                          gen sqRWerror=(RWerror)^2 if _n >= 33  & year<2005
                          egen MSFErw=mean(sqRWerror)if _n >= 33  & year<2005
                          
                          gen grs_rw=MSFEgrs/MSFErw
                          gen spf_rw=MSFEspf/MSFErw


                          I hope to hear back from you.



                          • #14
                            I'm sorry but I don't see your data and I do not understand your reply. You say that the first series starts in 1995 and then later say that it starts in 1987.

                            If you want to restrict your computations to data from before 2005 and after 1987, just reduce the data using
                            Code:
                            keep if inrange(year,1987, 2004)
                            and then perform the calculations. You can later merge back the results with the original data if needed.
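                            A minimal sketch of that reduce-then-merge workflow, assuming series and quarter_date together uniquely identify each observation and that the computed variables are named Forecast, Error and MSFE as in #12 (the tempfile name results is only illustrative):

                            ```stata
                            * Sketch only: assumes series + quarter_date uniquely identify observations
                            * and that Forecast, Error, MSFE do not yet exist in the full dataset
                            preserve
                            keep if inrange(year, 1987, 2004)

                            * ... run the rangestat and MSFE calculations on the reduced data here ...

                            * keep only the identifiers and the computed results
                            keep series quarter_date Forecast Error MSFE
                            tempfile results
                            save "`results'"
                            restore

                            * bring the results back; observations outside 1987-2004 get missing values
                            merge 1:1 series quarter_date using "`results'", nogenerate
                            ```

                            Working on a reduced copy keeps the original observations untouched, so anything after 2004 simply ends up with missing results after the merge.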

                            My guess is that you do not understand the code I posted in #12, in particular the following lines:
                            Code:
                            by series (quarter_date): gen Forecast = b_cons + b_grs * F.grs if _n >= 33
                            gen Error = f.rgdp_first - Forecast
                            bysort series: egen MSFE = mean(Error^2)
                            Within each group with the same values for series, the variable Forecast will be missing for the first 32 observations because of the condition imposed. I do not know where the 33 comes from, I just replicated what you were doing with rolling. The same goes for the variable Error; it will have missing values if Forecast is missing. The MSFE variable is the mean of Error^2, ignoring missing values. If you do not like the fact that egen fills all observations within each group with the mean, just replace the first 32 with missing values by adding the same conditional:
                            Code:
                            by series: replace MSFE = . if _n < 33
                            I note that in the new code you posted in #13, you omitted the bysort series: prefix, which means that only the first 32 observations of the first series will be missing. This can't be right, and it highlights the problem of trying to fix your code when the code is the only explanation of what you are trying to do.
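                            To see the difference the prefix makes, here is a small sketch using generate and replace; the variable names obs_in_series and obs_overall are made up for illustration, and MSFE is assumed to exist already from the code above:

                            ```stata
                            * with the by-prefix, _n restarts at 1 within each series
                            bysort series (quarter_date): gen obs_in_series = _n

                            * without a prefix, _n is the observation number in the whole dataset,
                            * so only rows of the first series can ever have _n < 33
                            gen obs_overall = _n

                            * blank out the first 32 quarters of every series, not just the first one
                            by series (quarter_date): replace MSFE = . if _n < 33
                            ```

                            Listing obs_in_series next to obs_overall for the first few rows of series 2 should make it clear which of the two a condition like _n >= 33 is testing.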
                            Last edited by Robert Picard; 10 Mar 2017, 11:01. Reason: fixed the last code sample



                            • #15
                              Thanks Robert so much.

                              Let me clarify.
                              The aim of this exercise is to use data from the beginning of the sample, 1987q1 to 1994q4, to estimate regression coefficients that are used to make out-of-sample forecasts, and then to calculate mean squared forecast errors for the sample that runs from 1995q1 to 2004 (ignore the years after 2004 for now). The estimation window then expands by one quarter and the same routine applies, and so on, in a recursive (expanding-window) fashion. Therefore my first estimation window is 1987q1 to 1994q4, it expands from there, and my evaluation sample starts in 1995 (and I restrict it to the end of 2004).



                              When using your code, the results in series 1 show values for Forecast, Error, and MSFE from 1995 onwards. However, the results in series 2 onwards show values for Forecast, Error, and MSFE from 1987 onwards. In that case, I do not know which first window was used to make the forecasts. Please also note that the data values differ across series, so we cannot use data from the first series to make forecasts in the second. Think of each series as a separate dataset.

                              Moreover, I think that in my code above I did use bysort series: and also _n >= 33. I copy the following from my use of your code in #13:
                              Code:
                               bysort series: egen MSFEgrs = mean(Error^2)  if _n >= 33 & year<2005
                              The ideal situation is that in each series I get Forecast, Error and MSFE for the evaluation sample that starts from 1995.

                              My dataset is structured as follows:

                              Code:
                              quarter_date        series           grs
                              1987q1                1
                              ......                1
                              2015q4                1
                              1987q1                2
                              ......                2
                              2015q4                2
                              1987q1                3
                              ......                3
                              2014q4                3

                              I hope you can find a way to get rangestat to work properly here. Thanks in advance.
                              Last edited by Mike Kraft; 10 Mar 2017, 10:48.
