A loop over multiple events: what went wrong?

Mike Kraft

Join Date: Dec 2014
Posts: 328

A loop over multiple events: what went wrong?

13 May 2024, 02:04

Dear users

I have multiple events coded as per their distance from an announcement day 0. The events take numbers from -6 to 33, i.e. { -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 }, where -6 represent 6 days before day 0 and number 33 represents 33 days after day 0. The variable that has these values is labeled as "cycle". In my daily time-series dataset, these cycles are adjacent to each other, i.e., when a cycle ends the next cycle starts. However, while a typical cycles runs from -6 to 33, other cycles can be shorter and run from say -2 to 27. The cycle can never take a value before -6 or a value after 33.

I want to estimate a regression for a window of 5 days around each day in the cycle. Therefore, there should be 40 regressions to represent all days of the cycle.

Here is my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(date retr infl cycle)
12421   -.4379999879747576                    0  9
12422    .2519999947398821                    0 10
12423   .22199999354779454   -.7529186420793931 11
12424  -.05800000019371421   -.6476922856575159 12
12425   .49199998937548006   -.5675136933403084 13
12428    .8919999953359348   -.1829062060379713 14
12429  -.14799999631941452    1.132541406485246 15
12430  .051999999210239345    .6876117110290538 16
12431  -.15800000168383344   2.1938483273635483 17
12432    .4819999989122081   1.0254890384207072 18
12435  -.23799999989568832   -.8807656951864905 19
12436   .11200000159441448   1.5604825117120975 20
12437  -.06799999810754853    1.547373653807777 21
12438   .22199999354779454    .6136717629895714 22
12439  -.03800000064074771    1.072029207366402 23
12442   -.4579999987035954   -.7425293417037483 24
12443  -.24799999035894915   1.5244337943724877 25
12444     .421999996528033   .10042454005582642 26
12445     .791999971494084    .8037517518763057 -6
12446    .4319999869912827    4.385885001254089 -5
12449    .5819999929517428   .10237807752121178 -4
12450  -.26900000125169576  -.48581482595927256 -3
12451   .44100000709295095   1.2778222919669644 -2
12452   -.1989999935030906    3.610154774440529 -1
12453  -2.3089999333024025    .4480062347284372  0
12456    .3410000130534163 -.029894337799963092  1
12457   .06100000068545164                    0  2
12458    .4509999975562007                    0  3
12459   -.6590000167489007                    0  4
12460   .04099999926985287   -.7859015611495589  5
12463   .12099999934434003   2.3521912358714934  6
12464   .48099999874831045    4.178839732149401  7
12465   .07099999859929707   1.1339482016251412  8
12466  -.44900000840425447                    0  9
12467  -.46899998933076503  -.40466707984601585 10
12470                    0                    0 11
12471    .6409999951720202    .8514896750455218 12
12472  -.12900000065565465   .43541366890007605 13
12473   -1.239000000059609   .47331693320577684 14
12474    .3710000142455039   1.5243646588527011 15
12477   .39099999517202555                    0 16
12478   -.5079999808222047  -.10542929997720135 17
12479  -.11799999512731585  -.22343245179344326 18
12480   -.2279999945312694  -.45511082398417746 19
12481    .4720000084489584   -2.940778018779746 20
12484     .572000002488493    .4683234004475248 21
12485  -.21800000406801967    .8334439141415245 22
12486   .17199999652803832   -2.098348541818269 23
12487   -.6080000046640666    .6744274318240081 24
12488   .45199999772012056    .1415167869680784 25
12491    .3019999917596605   .33616394297385477 -6
12492  .041999999433750546    .9915544650372341 -5
12493     .572000002488493   -3.283250073830859 -4
12494    .3120000120252264   -.9491470253930308 -3
12495   .06200000084937152   .33768976426772973 -2
12498   -.4879999998956941   -2.596585492063645 -1
12499  .051999999210239345   .19178917268155118  0
12500  .041999999433750546   .34001006035488796  1
12501   -.9680000189691729    .7771827766196536  2
12502   -.6080000046640666  -1.0974243742098777  3
12505   -.4579999987035954   .04743454219793581  4
12506  -1.7879999522119716   1.3928657270070266  5
12507  -1.5179999712854664    .5086320610214753  6
12508  -.11799999512731585   -2.034626822761019  7
12509                    0                    0  8
12512  -1.5660000424832066   .24298939528729838  9
12513    2.383999885991206   -.7365092835970641 10
12514    .0439999997615903                    0 11
12515    .6140000242739863   .11380786290790468 12
12516   -.7960000019520574    .2787635195888093 13
12519   .34400001354515375    .0804331917080485 14
12520   -.5260000210255411    .1555537050230116 15
12521   -.5960000138729771  -.10403268395209728 16
12522    .0539999995380569    .3851601714904467 17
12523    .0439999997615903  .046418782065053406 18
12526   -.8059999924153072   .33985984629417676 19
12527   -.2459999900311205    .5846533238807186 20
12528  -.40599998645484137    1.027287391254951 21
12529    1.484000029042365    .5146464678134081 22
12530 -.036000000312919056    .2506092878137687 23
12533   1.0140000004321248  -.31719122367815267 24
12534   .06400000117718907 -.006361355893142129 25
12535                    0                    0 26
12536   -.5360000114887908   1.2731798279660753 27
12537    .4839999992400479   -.5364178929443594 28
12540    .5149999996647336   1.3085677687013781 29
12541 .0049999998882332974  .010685685744765713 30
12542  -.13500000629573794   .25370652051436704 31
12543  -.08500000182539713   .18124375121317704 32
12544   -.8950000265613212   -3.065225152930967 33
12547  -1.2250000098720237    .2665850052673965 -6
12548    .6049999734386802   -2.531663560488393 -5
12549  -1.0449999431148216    .5168722915188722 -4
12550    .3949999948963612   .29155003275982916 -3
12551   .05499999877065509  -.26820357894854213 -2
12554  -.14499999675899877   -.6522250615984035 -1
12555    .8250000020489034    .4132320059066271  0
12556   1.1449999948963452  -1.2453641998690859  1
12557    .6250000139698342    .5933305965824383  2
12558  -.22499999497085366                    0  3
end
format %td date

I did the following code:

Code:

gen time = _n
tsset time

* Identify unique event days
levelsof cycle, local(events)

foreach time in `events' {
reg retr infl if cycle >= `time' - 2 & cycle <= `time' + 2, r
}

I do not know what went wrong with the loop?!! What makes me think it is wrong is that it seems to be picking up an incorrect number of observations per regression.

For example, if you try something like:

Code:

sum infl if cycle=30

And multiply the number of observations by 5 (as there should be a window of 5 days around each cycle day), you will see that the number of obs is different from the number of observations in the regression.
There are no missing values for the dependent or independent variables. Also, the time series dataset is daily. It has no weekends and it has a few missing days. However, I think this should not be a problem as I use the variable "time".

I would really appreciate any assistance with this matter

Tags: None

George Ford

Join Date: Aug 2014

Posts: 3152
#2

13 May 2024, 06:46

Are you opposed to this?

Code:

* ssc install asreg capture drop _* asreg retr infl , window(time -2 3) robust

I'm a little confused. By time will you get 5 regressions (except at the edges). By cycle observations will get you various observations depending on how many times the cycle number appears (0 being the most I suppose).

Last edited by George Ford; 13 May 2024, 06:55.
Comment

George Ford

Join Date: Aug 2014
Posts: 3152

13 May 2024, 07:17

This may be what your after (regression by cycle). You'll get the same coefficient for each cycle, but it will appear for all observations (repeated). I like this because it stores the results for you automatically.

Code:

tab cycle  // to see what you have
capture drop _*
asreg retr infl , window(cycle -2 3) robust
tabstat _b_infl if !mi(_b_infl), by(cycle) stat(mean min max)
tabstat _Nobs if !mi(_b_infl), by(cycle) stat(mean min max)

This fixes your loop. You didn't need to use time but cycle. I've added some info to help you see what's going on.

Code:

levelsof cycle, local(events)
foreach c in `events' {
    di "FOR CYCLE `c' THE RANGE IS: " _col(40) "Lower = " `c' - 2 _col(60) "Upper = " `c' + 2
    reg retr infl if cycle & cycle >= `c' - 2 & cycle <= `c' + 2, r
    di `"{hline 78}"'
}

Comment

George Ford

Join Date: Aug 2014
Posts: 3152

13 May 2024, 07:34

This fixes a problem. You can compare what you get. Results are essentially identical (look at b_ratio).

Code:

g b_infl_loop = .
g Nobs = .
levelsof cycle, local(events)
foreach c in `events' {
    di "FOR CYCLE `c' THE RANGE IS: " _col(40) "Lower = " `c' - 2 _col(60) "Upper = " `c' + 2
    reg retr infl if cycle==`c' | ( cycle >= `c' - 2 & cycle <= `c' + 2), r
    replace b_infl_loop = _b[infl] if cycle==`c'
    replace Nobs = e(N) if cycle==`c'
    di `"{hline 78}"'
}

capture drop _*
asreg retr infl , window(cycle -3 2) robust
tabstat _b_infl b_infl_loop if !mi(_b_infl), by(cycle) stat(mean min max)
correl _b_infl b_infl_loop if !mi(_b_infl)
assert _b_infl == b_infl_loop
g b_ratio = _b_infl/b_infl_loop
summ b_ratio  //trivial differences
tabstat _Nobs Nobs if !mi(_b_infl), by(cycle) stat(mean min max)
correl _Nobs Nobs
assert _Nobs == Nobs

Comment

Mike Kraft

Join Date: Dec 2014

Posts: 328
#5

13 May 2024, 21:56

Dear George

Thank you very much for your help with that which I very much appreciate.

I ran the loop in #4 but it gave me an error;
FOR CYCLE -6 THE RANGE IS: Lower = -8 Upper = -4
variables out of order
r(111);

The problem is that the cycle variable has values from -6 to 33 and given that not all cycles are the same length, they can start at -4 or -3 or other values (but never before -6) and ends for example at 20 23 or other values (but never after 33). The cycles are adjacent to each other so when one cycle ends the other starts and so on.

So yes I want to estimate the regressions for each cycle number and for 5 days around it. For example, if the cycle is 1, I want to have two days before and after that day. The problem seems to happen with cycle values at the edges, for example, cycle value -6, as my code will take not find any value for -7 or -8 but then will only include the days after -6. The same problem happens for those at the other edge. To make it simple, I just want the 5 days around each cycle day regardless of the cycle number, so always 5 obs window. And yes, the regression will be for all cycle days with the same number; so one regression across the sample for cycle day 3, or 4, or 5, and so on, and hence one beta and one confidence interval. At the end I will collapse the data and create a new dataset for each cycle day along with its beta and confidence intervals.

I have also created a code below that I hope is doing what I wanted. It seems to run smoothly and produces reasonable results. However, it is still having an issue with those cycle days at the edges as it does not pick up the 5 days observations around the day...The problem seems to come from the way the window is specified as the code does not find cycles -7 or -8 or 34 and 35..And also when the cycle ends earlier it will have a similar issue.

Code:

gen tt = _n tsset tt * Identify unique event days levelsof cycle, local(event_days) * Initialize lists to store coefficients, t-statistics, and confidence intervals local coefficients "" local t_statistics "" local ci_lower "" local ci_upper "" * Create variables to store coefficients, t-statistics, and confidence intervals gen coef = . gen t_stat = . gen ci_lower = . gen ci_upper = . * Loop over each event day foreach tt in `event_days' { * Run the regression reg retr infl if cycle >= `tt' - 2 & cycle <= `tt' + 2, r * Store coefficient for the current event day replace coef = _b[infl] if cycle == `tt' * Store t-statistic for the current event day replace t_stat = _b[infl] / _se[infl] if cycle == `tt' * Calculate confidence intervals (90%) *scalar z = invnormal(1 - (0.1 / 2)) * r calculate Critical value for 95% confidence level scalar z = invnormal(1 - (0.05 / 2)) scalar ci_lower_val = _b[infl] - z * _se[infl] scalar ci_upper_val = _b[infl] + z * _se[infl] * Store confidence intervals replace ci_lower = ci_lower_val if cycle == `tt' replace ci_upper = ci_upper_val if cycle == `tt' * Append coefficient, t-statistic, and confidence intervals to the lists local coef_value = _b[infl] local t_stat_value = `coef_value' / _se[infl] local coefficients "`coefficients' `coef_value'" local t_statistics "`t_stat_value'" local ci_lower_value = ci_lower_val local ci_upper_value = ci_upper_val local ci_lower "`ci_lower_value'" local ci_upper "`ci_upper_value'" } * Display coefficients, t-statistics, and confidence intervals di "Coefficients: `coefficients'" di "T-Statistics: `t_statistics'" di "CI Lower: `ci_lower'" di "CI Upper: `ci_upper'"

I am happy to go with any code that does what I aim.

I look forward to hearing from you
Comment
George Ford

Join Date: Aug 2014

Posts: 3152
#6

14 May 2024, 08:11

I see the problem. You need to lag/forward in "tt" based on "cycle".

Try this. It creates a dummy around the cycle value in time. Recode traps it around the cycle date so it doesn't explode. You can then reg y x if keep

Code:

g tt = _n tsset tt forv k = -6/33 { capture drop keep g keep = 0 replace keep = 1 if cycle==`k' recode keep (0 = 1) if f2.keep==1 | f1.keep==1 | l1.keep==1 | l2.keep==1 ** do stuff if keep }
Comment
Mike Kraft

Join Date: Dec 2014

Posts: 328
#7

14 May 2024, 21:42

Thank you so much. Unfortunately, there are a few issues here:

1) I am not sure which part of my code I should integrate with #6. Can you please integrate this with my code? I tried but it did not produce results and so I am assuming I am integrating it incorrectly.

2) I am a bit confused about "keep" as it would seem to have values for -6, -5 and 31,32,33 of the cycle. How does that help here?

3) please note that some cycles will end early say at 27 or 28 and some cycles will start at say -5, or -2; but cycles can never go beyond -6 or 33. In any case, I am only interested in the 5-(calendar)day window around any day of the cycle.
Comment
Mike Kraft

Join Date: Dec 2014

Posts: 328
#8

14 May 2024, 21:49

Perhabs for each cycle, you can create a min and max cycle number to know when it starts and ends (and the day before the minimum; also the day before the max) and use that?! There is a variable for cycle_ID.

But the issue remains on how I can loop over this variable as well as the cycle?!
Comment

George Ford

Join Date: Aug 2014
Posts: 3152

15 May 2024, 12:57

Code:

g tt = _n
tsset tt

g b_infl_loop = .

forv k = -6/33 {
    capture drop keep
    g keep = 0
    replace keep = 1 if cycle==`k'
    recode keep (0 = 1) if f2.keep==1 | f1.keep==1 | l1.keep==1 | l2.keep==1
    qui reg retr infl if keep, r
    qui replace b_infl_loop = _b[infl] if cycle==`k'
}

Comment

Mike Kraft

Join Date: Dec 2014

Posts: 328
#10

17 May 2024, 03:47

Dear George

That seems to be working. I added the condition in the regression "if cycle<=33 & keep" which seems to work reasonably well.
Also, I assume that if I want to run the regression only for the cycle day, I will just simply delete the line that has "recorded".

I hope these are fine.

Thanks
Comment

Mike Kraft

Join Date: Dec 2014
Posts: 328

#11

17 May 2024, 04:38

I want to run the regression as ivreg2 in the loop and extract the confidence intervals,

I created these outside the loop:

Code:

gen ci_lower = .
gen ci_upper = .

And I added these lines inside the loop

Code:

    xi: ivreg2 retr infl i.day*i.month if  cycle<=33 & keep, robust bw(auto) small

Also

Code:

* Calculate confidence intervals (95%)
    *scalar z = invnormal(1 - (0.05 / 2)) 
   
    scalar ci_lower_val = _b[infl] - z * _se[infl]
    scalar ci_upper_val = _b[infl] + z * _se[infl]

    replace ci_lower = ci_lower_val if cycle == `k'
    replace ci_upper = ci_upper_val if cycle == `k'

Or alternatively instead of the last part:

Code:

* Calculate confidence intervals (95%)
   
    scalar ci_lower_val = _b[infl] - 1.96 * _se[infl]
    scalar ci_upper_val = _b[infl] + 1.96 * _se[infl]

    replace ci_lower = ci_lower_val if cycle == `k'
    replace ci_upper = ci_upper_val if cycle == `k'

The problem is that the confidence intervals are not the same as the ones in the output of the table.

Is there any way to just extract them from the table to be consistent?

Comment

Andrew Musau

Join Date: Oct 2014

Posts: 10194
#12

17 May 2024, 06:40

ivreg2 is from SSC (FAQ Advice #12). Estimate the model with ivreghdfe from https://github.com/sergiocorreia/ivreghdfe and extract these from r(table). Also, use factor variable notation instead of the xi: prefix.

Code:

ivreghdfe ... mat list r(table)
Comment
Mike Kraft

Join Date: Dec 2014

Posts: 328
#13

17 May 2024, 07:26

The model is estimated in other parts of the project with ivreg2 for newey and west with automatic bandwidth. Therefore, I need to stick to that.

Is there any way to get the confidence intervals from the ivreg2 in the loop?
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10194
#14

17 May 2024, 07:53

There most likely is, but I won't be looking into that. For what it's worth, the documentation of ivreghdfe states

ivreghdfe is essentially ivreg2 with an additional absorb() option from reghdfe.

Essentially, whatever you can do with ivreg2, you can also do with ivreghdfe. Take it or leave it.
Comment
Mike Kraft

Join Date: Dec 2014

Posts: 328
#15

17 May 2024, 22:40

Perhabs it will be easier if someone can adjust the code in #9 to be able to have a variable for the lower confidence interval and upper confidence interval, as I am a bit confused. If ivreg2 and ivreghdfe are the same, what matters to me is that I have the same fixed effects and the same conditions on the bandwidth

Code:

xi: ivreg2 retr infl i.day*i.month if cycle<=33 & keep, robust bw(auto) small

when extracting the lower and upper bounds.

I look forward to further assistance.
Comment

Announcement

A loop over multiple events: what went wrong?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment