Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • A loop over multiple events: what went wrong?

    Dear users

    I have multiple events coded as per their distance from an announcement day 0. The events take numbers from -6 to 33, i.e. { -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 }, where -6 represent 6 days before day 0 and number 33 represents 33 days after day 0. The variable that has these values is labeled as "cycle". In my daily time-series dataset, these cycles are adjacent to each other, i.e., when a cycle ends the next cycle starts. However, while a typical cycles runs from -6 to 33, other cycles can be shorter and run from say -2 to 27. The cycle can never take a value before -6 or a value after 33.

    I want to estimate a regression for a window of 5 days around each day in the cycle. Therefore, there should be 40 regressions to represent all days of the cycle.

    Here is my data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(date retr infl cycle)
    12421   -.4379999879747576                    0  9
    12422    .2519999947398821                    0 10
    12423   .22199999354779454   -.7529186420793931 11
    12424  -.05800000019371421   -.6476922856575159 12
    12425   .49199998937548006   -.5675136933403084 13
    12428    .8919999953359348   -.1829062060379713 14
    12429  -.14799999631941452    1.132541406485246 15
    12430  .051999999210239345    .6876117110290538 16
    12431  -.15800000168383344   2.1938483273635483 17
    12432    .4819999989122081   1.0254890384207072 18
    12435  -.23799999989568832   -.8807656951864905 19
    12436   .11200000159441448   1.5604825117120975 20
    12437  -.06799999810754853    1.547373653807777 21
    12438   .22199999354779454    .6136717629895714 22
    12439  -.03800000064074771    1.072029207366402 23
    12442   -.4579999987035954   -.7425293417037483 24
    12443  -.24799999035894915   1.5244337943724877 25
    12444     .421999996528033   .10042454005582642 26
    12445     .791999971494084    .8037517518763057 -6
    12446    .4319999869912827    4.385885001254089 -5
    12449    .5819999929517428   .10237807752121178 -4
    12450  -.26900000125169576  -.48581482595927256 -3
    12451   .44100000709295095   1.2778222919669644 -2
    12452   -.1989999935030906    3.610154774440529 -1
    12453  -2.3089999333024025    .4480062347284372  0
    12456    .3410000130534163 -.029894337799963092  1
    12457   .06100000068545164                    0  2
    12458    .4509999975562007                    0  3
    12459   -.6590000167489007                    0  4
    12460   .04099999926985287   -.7859015611495589  5
    12463   .12099999934434003   2.3521912358714934  6
    12464   .48099999874831045    4.178839732149401  7
    12465   .07099999859929707   1.1339482016251412  8
    12466  -.44900000840425447                    0  9
    12467  -.46899998933076503  -.40466707984601585 10
    12470                    0                    0 11
    12471    .6409999951720202    .8514896750455218 12
    12472  -.12900000065565465   .43541366890007605 13
    12473   -1.239000000059609   .47331693320577684 14
    12474    .3710000142455039   1.5243646588527011 15
    12477   .39099999517202555                    0 16
    12478   -.5079999808222047  -.10542929997720135 17
    12479  -.11799999512731585  -.22343245179344326 18
    12480   -.2279999945312694  -.45511082398417746 19
    12481    .4720000084489584   -2.940778018779746 20
    12484     .572000002488493    .4683234004475248 21
    12485  -.21800000406801967    .8334439141415245 22
    12486   .17199999652803832   -2.098348541818269 23
    12487   -.6080000046640666    .6744274318240081 24
    12488   .45199999772012056    .1415167869680784 25
    12491    .3019999917596605   .33616394297385477 -6
    12492  .041999999433750546    .9915544650372341 -5
    12493     .572000002488493   -3.283250073830859 -4
    12494    .3120000120252264   -.9491470253930308 -3
    12495   .06200000084937152   .33768976426772973 -2
    12498   -.4879999998956941   -2.596585492063645 -1
    12499  .051999999210239345   .19178917268155118  0
    12500  .041999999433750546   .34001006035488796  1
    12501   -.9680000189691729    .7771827766196536  2
    12502   -.6080000046640666  -1.0974243742098777  3
    12505   -.4579999987035954   .04743454219793581  4
    12506  -1.7879999522119716   1.3928657270070266  5
    12507  -1.5179999712854664    .5086320610214753  6
    12508  -.11799999512731585   -2.034626822761019  7
    12509                    0                    0  8
    12512  -1.5660000424832066   .24298939528729838  9
    12513    2.383999885991206   -.7365092835970641 10
    12514    .0439999997615903                    0 11
    12515    .6140000242739863   .11380786290790468 12
    12516   -.7960000019520574    .2787635195888093 13
    12519   .34400001354515375    .0804331917080485 14
    12520   -.5260000210255411    .1555537050230116 15
    12521   -.5960000138729771  -.10403268395209728 16
    12522    .0539999995380569    .3851601714904467 17
    12523    .0439999997615903  .046418782065053406 18
    12526   -.8059999924153072   .33985984629417676 19
    12527   -.2459999900311205    .5846533238807186 20
    12528  -.40599998645484137    1.027287391254951 21
    12529    1.484000029042365    .5146464678134081 22
    12530 -.036000000312919056    .2506092878137687 23
    12533   1.0140000004321248  -.31719122367815267 24
    12534   .06400000117718907 -.006361355893142129 25
    12535                    0                    0 26
    12536   -.5360000114887908   1.2731798279660753 27
    12537    .4839999992400479   -.5364178929443594 28
    12540    .5149999996647336   1.3085677687013781 29
    12541 .0049999998882332974  .010685685744765713 30
    12542  -.13500000629573794   .25370652051436704 31
    12543  -.08500000182539713   .18124375121317704 32
    12544   -.8950000265613212   -3.065225152930967 33
    12547  -1.2250000098720237    .2665850052673965 -6
    12548    .6049999734386802   -2.531663560488393 -5
    12549  -1.0449999431148216    .5168722915188722 -4
    12550    .3949999948963612   .29155003275982916 -3
    12551   .05499999877065509  -.26820357894854213 -2
    12554  -.14499999675899877   -.6522250615984035 -1
    12555    .8250000020489034    .4132320059066271  0
    12556   1.1449999948963452  -1.2453641998690859  1
    12557    .6250000139698342    .5933305965824383  2
    12558  -.22499999497085366                    0  3
    end
    format %td date

    I did the following code:

    Code:
    gen time = _n
    tsset time
    
    * Identify unique event days
    levelsof cycle, local(events)
    
    foreach time in `events' {
    reg retr infl if cycle >= `time' - 2 & cycle <= `time' + 2, r
    }
    I do not know what went wrong with the loop?!! What makes me think it is wrong is that it seems to be picking up an incorrect number of observations per regression.

    For example, if you try something like:

    Code:
    sum infl if cycle=30
    And multiply the number of observations by 5 (as there should be a window of 5 days around each cycle day), you will see that the number of obs is different from the number of observations in the regression.
    There are no missing values for the dependent or independent variables. Also, the time series dataset is daily. It has no weekends and it has a few missing days. However, I think this should not be a problem as I use the variable "time".


    I would really appreciate any assistance with this matter

  • #2
    Are you opposed to this?

    Code:
    * ssc install asreg
    capture drop _*
    asreg retr infl , window(time -2 3) robust
    I'm a little confused. By time will you get 5 regressions (except at the edges). By cycle observations will get you various observations depending on how many times the cycle number appears (0 being the most I suppose).
    Last edited by George Ford; 13 May 2024, 06:55.

    Comment


    • #3
      This may be what your after (regression by cycle). You'll get the same coefficient for each cycle, but it will appear for all observations (repeated). I like this because it stores the results for you automatically.

      Code:
      tab cycle  // to see what you have
      capture drop _*
      asreg retr infl , window(cycle -2 3) robust
      tabstat _b_infl if !mi(_b_infl), by(cycle) stat(mean min max)
      tabstat _Nobs if !mi(_b_infl), by(cycle) stat(mean min max)
      This fixes your loop. You didn't need to use time but cycle. I've added some info to help you see what's going on.

      Code:
      levelsof cycle, local(events)
      foreach c in `events' {
          di "FOR CYCLE `c' THE RANGE IS: " _col(40) "Lower = " `c' - 2 _col(60) "Upper = " `c' + 2
          reg retr infl if cycle & cycle >= `c' - 2 & cycle <= `c' + 2, r
          di `"{hline 78}"'
      }

      Comment


      • #4
        This fixes a problem. You can compare what you get. Results are essentially identical (look at b_ratio).

        Code:
        g b_infl_loop = .
        g Nobs = .
        levelsof cycle, local(events)
        foreach c in `events' {
            di "FOR CYCLE `c' THE RANGE IS: " _col(40) "Lower = " `c' - 2 _col(60) "Upper = " `c' + 2
            reg retr infl if cycle==`c' | ( cycle >= `c' - 2 & cycle <= `c' + 2), r
            replace b_infl_loop = _b[infl] if cycle==`c'
            replace Nobs = e(N) if cycle==`c'
            di `"{hline 78}"'
        }
        
        capture drop _*
        asreg retr infl , window(cycle -3 2) robust
        tabstat _b_infl b_infl_loop if !mi(_b_infl), by(cycle) stat(mean min max)
        correl _b_infl b_infl_loop if !mi(_b_infl)
        assert _b_infl == b_infl_loop
        g b_ratio = _b_infl/b_infl_loop
        summ b_ratio  //trivial differences
        tabstat _Nobs Nobs if !mi(_b_infl), by(cycle) stat(mean min max)
        correl _Nobs Nobs
        assert _Nobs == Nobs

        Comment


        • #5
          Dear George

          Thank you very much for your help with that which I very much appreciate.

          I ran the loop in #4 but it gave me an error;
          FOR CYCLE -6 THE RANGE IS: Lower = -8 Upper = -4
          variables out of order
          r(111);

          The problem is that the cycle variable has values from -6 to 33 and given that not all cycles are the same length, they can start at -4 or -3 or other values (but never before -6) and ends for example at 20 23 or other values (but never after 33). The cycles are adjacent to each other so when one cycle ends the other starts and so on.

          So yes I want to estimate the regressions for each cycle number and for 5 days around it. For example, if the cycle is 1, I want to have two days before and after that day. The problem seems to happen with cycle values at the edges, for example, cycle value -6, as my code will take not find any value for -7 or -8 but then will only include the days after -6. The same problem happens for those at the other edge. To make it simple, I just want the 5 days around each cycle day regardless of the cycle number, so always 5 obs window. And yes, the regression will be for all cycle days with the same number; so one regression across the sample for cycle day 3, or 4, or 5, and so on, and hence one beta and one confidence interval. At the end I will collapse the data and create a new dataset for each cycle day along with its beta and confidence intervals.

          I have also created a code below that I hope is doing what I wanted. It seems to run smoothly and produces reasonable results. However, it is still having an issue with those cycle days at the edges as it does not pick up the 5 days observations around the day...The problem seems to come from the way the window is specified as the code does not find cycles -7 or -8 or 34 and 35..And also when the cycle ends earlier it will have a similar issue.

          Code:
          gen tt = _n
          tsset tt
           
          * Identify unique event days
          levelsof cycle, local(event_days)
           
          * Initialize lists to store coefficients, t-statistics, and confidence intervals
          local coefficients ""
          local t_statistics ""
          local ci_lower ""
          local ci_upper ""
           
          * Create variables to store coefficients, t-statistics, and confidence intervals
          gen coef = .
          gen t_stat = .
          gen ci_lower = .
          gen ci_upper = .
           
          * Loop over each event day
          foreach tt in `event_days' {
              * Run the regression
          reg retr infl if cycle >= `tt' - 2 & cycle <= `tt' + 2, r
           
              * Store coefficient for the current event day
              replace coef = _b[infl] if cycle == `tt'
           
              * Store t-statistic for the current event day
              replace t_stat = _b[infl] / _se[infl] if cycle == `tt'
           
              * Calculate confidence intervals (90%)
              *scalar z = invnormal(1 - (0.1 / 2))
                           
                            * r calculate Critical value for 95% confidence level
                            scalar z = invnormal(1 - (0.05 / 2))
                           
              scalar ci_lower_val = _b[infl] - z * _se[infl]
              scalar ci_upper_val = _b[infl] + z * _se[infl]
           
              * Store confidence intervals
              replace ci_lower = ci_lower_val if cycle == `tt'
              replace ci_upper = ci_upper_val if cycle == `tt'
           
              * Append coefficient, t-statistic, and confidence intervals to the lists
              local coef_value = _b[infl]
              local t_stat_value = `coef_value' / _se[infl]
              local coefficients "`coefficients' `coef_value'"
              local t_statistics "`t_stat_value'"
              local ci_lower_value = ci_lower_val
              local ci_upper_value = ci_upper_val
              local ci_lower "`ci_lower_value'"
              local ci_upper "`ci_upper_value'"
          }
           
          * Display coefficients, t-statistics, and confidence intervals
          di "Coefficients: `coefficients'"
          di "T-Statistics: `t_statistics'"
          di "CI Lower: `ci_lower'"
          di "CI Upper: `ci_upper'"

          I am happy to go with any code that does what I aim.

          I look forward to hearing from you

          Comment


          • #6
            I see the problem. You need to lag/forward in "tt" based on "cycle".

            Try this. It creates a dummy around the cycle value in time. Recode traps it around the cycle date so it doesn't explode. You can then reg y x if keep

            Code:
            g tt = _n
            tsset tt
            
            forv k = -6/33 {
                capture drop keep
                g keep = 0
                replace keep = 1 if cycle==`k'
                recode keep (0 = 1) if f2.keep==1 | f1.keep==1 | l1.keep==1 | l2.keep==1
                ** do stuff if keep
            }

            Comment


            • #7
              Thank you so much. Unfortunately, there are a few issues here:

              1) I am not sure which part of my code I should integrate with #6. Can you please integrate this with my code? I tried but it did not produce results and so I am assuming I am integrating it incorrectly.

              2) I am a bit confused about "keep" as it would seem to have values for -6, -5 and 31,32,33 of the cycle. How does that help here?

              3) please note that some cycles will end early say at 27 or 28 and some cycles will start at say -5, or -2; but cycles can never go beyond -6 or 33. In any case, I am only interested in the 5-(calendar)day window around any day of the cycle.

              Comment


              • #8
                Perhabs for each cycle, you can create a min and max cycle number to know when it starts and ends (and the day before the minimum; also the day before the max) and use that?! There is a variable for cycle_ID.

                But the issue remains on how I can loop over this variable as well as the cycle?!

                Comment


                • #9
                  Code:
                  g tt = _n
                  tsset tt
                  
                  g b_infl_loop = .
                  
                  forv k = -6/33 {
                      capture drop keep
                      g keep = 0
                      replace keep = 1 if cycle==`k'
                      recode keep (0 = 1) if f2.keep==1 | f1.keep==1 | l1.keep==1 | l2.keep==1
                      qui reg retr infl if keep, r
                      qui replace b_infl_loop = _b[infl] if cycle==`k'
                  }

                  Comment


                  • #10
                    Dear George

                    That seems to be working. I added the condition in the regression "if cycle<=33 & keep" which seems to work reasonably well.
                    Also, I assume that if I want to run the regression only for the cycle day, I will just simply delete the line that has "recorded".

                    I hope these are fine.

                    Thanks

                    Comment


                    • #11
                      I want to run the regression as ivreg2 in the loop and extract the confidence intervals,

                      I created these outside the loop:

                      Code:
                      gen ci_lower = .
                      gen ci_upper = .
                      And I added these lines inside the loop
                      Code:
                          xi: ivreg2 retr infl i.day*i.month if  cycle<=33 & keep, robust bw(auto) small
                      Also

                      Code:
                      * Calculate confidence intervals (95%)
                          *scalar z = invnormal(1 - (0.05 / 2)) 
                         
                          scalar ci_lower_val = _b[infl] - z * _se[infl]
                          scalar ci_upper_val = _b[infl] + z * _se[infl]
                      
                          replace ci_lower = ci_lower_val if cycle == `k'
                          replace ci_upper = ci_upper_val if cycle == `k'
                      Or alternatively instead of the last part:

                      Code:
                      * Calculate confidence intervals (95%)
                         
                          scalar ci_lower_val = _b[infl] - 1.96 * _se[infl]
                          scalar ci_upper_val = _b[infl] + 1.96 * _se[infl]
                      
                          replace ci_lower = ci_lower_val if cycle == `k'
                          replace ci_upper = ci_upper_val if cycle == `k'
                      The problem is that the confidence intervals are not the same as the ones in the output of the table.

                      Is there any way to just extract them from the table to be consistent?

                      Comment


                      • #12
                        ivreg2 is from SSC (FAQ Advice #12). Estimate the model with ivreghdfe from https://github.com/sergiocorreia/ivreghdfe and extract these from r(table). Also, use factor variable notation instead of the xi: prefix.

                        Code:
                        ivreghdfe ...
                        mat list r(table)

                        Comment


                        • #13
                          The model is estimated in other parts of the project with ivreg2 for newey and west with automatic bandwidth. Therefore, I need to stick to that.

                          Is there any way to get the confidence intervals from the ivreg2 in the loop?

                          Comment


                          • #14
                            There most likely is, but I won't be looking into that. For what it's worth, the documentation of ivreghdfe states

                            ivreghdfe is essentially ivreg2 with an additional absorb() option from reghdfe.
                            Essentially, whatever you can do with ivreg2, you can also do with ivreghdfe. Take it or leave it.

                            Comment


                            • #15
                              Perhabs it will be easier if someone can adjust the code in #9 to be able to have a variable for the lower confidence interval and upper confidence interval, as I am a bit confused. If ivreg2 and ivreghdfe are the same, what matters to me is that I have the same fixed effects and the same conditions on the bandwidth
                              Code:
                               xi: ivreg2 retr infl i.day*i.month if cycle<=33 & keep, robust bw(auto) small
                              when extracting the lower and upper bounds.

                              I look forward to further assistance.

                              Comment

                              Working...
                              X