  • Daily Percent Change with Inconsistent Gaps in Data

    Hello again. I'd like to get the day-to-day percent change for each observation in my panel. However, my data does not "skip over" days; instead it holds empty values for weekends, holidays, etc. I played around with Stata's business calendar features, but I don't believe they can help with my data. I'd like to know whether it is possible to make Stata simply use the last recorded value as the "old" value in the percent change formula, i.e., (new value - old value) / old value, all multiplied by 100. Below is a sample of the data:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(day abc xro rkt mgs tty)
    19509 29.860113      . 1.8448875      .  29.91344
    19510 30.020096  1.644  1.848199 6.3942  30.02937
    19511 30.179895 1.6503 1.8629556 6.4046  29.98533
    19512  30.23022      .  1.875309 6.4124  29.99614
    19513  30.35522 1.6421  1.880363      . 29.983854
    19514         .      .         .      .         .
    19515         .      .         .      .         .
    19516 30.459717 1.6385 1.8915282      . 30.059195
    19517 30.410173   1.64 1.8727467 6.4417  29.91063
    19518  30.51963 1.6341 1.8819928 6.4096 29.801025
    19519  30.65406 1.6361  1.888474 6.4244 29.826956
    19520 30.589745 1.6274 1.8838612 6.4232 29.728506
    end
    format %tdNN/DD/CCYY day
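    To make the intended calculation concrete using the sample above: the first non-missing abc value after the two-day gap (30.459717) would be compared against the last recorded value before the gap (30.35522), so the result I am after would be along these lines:

    Code:
    * sketch of the intended "last recorded value" percent change for abc across the gap
    display 100*(30.459717 - 30.35522)/30.35522    // roughly .34 percent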
    I am also unsure whether to use xtset or tsset for my data. So far, I have tried tsset, and used the following code:
    Code:
    gen abc_pc = 100*D.abc/L.abc
    This of course works, but it will not compute the percent change across a gap (for instance, from the last recorded value before a weekend to the value on the following Monday). Example:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(day abc xro rkt mgs tty abc_pc xro_pc)
    19509 29.860113      . 1.8448875      .  29.91344          .         .
    19510 30.020096  1.644  1.848199 6.3942  30.02937   .5357739         .
    19511 30.179895 1.6503 1.8629556 6.4046  29.98533   .5323087    .38321
    19512  30.23022      .  1.875309 6.4124  29.99614   .1667577         .
    19513  30.35522 1.6421  1.880363      . 29.983854   .4134872         .
    19514         .      .         .      .         .          .         .
    19515         .      .         .      .         .          .         .
    19516 30.459717 1.6385 1.8915282      . 30.059195          .         .
    19517 30.410173   1.64 1.8727467 6.4417  29.91063 -.16265213 .09154779
    19518  30.51963 1.6341 1.8819928 6.4096 29.801025   .3599355 -.3597577
    19519  30.65406 1.6361  1.888474 6.4244 29.826956   .4404766 .12239725
    19520 30.589745 1.6274 1.8838612 6.4232 29.728506 -.20981787 -.5317531
    end
    format %tdNN/DD/CCYY day
    I'm wondering whether it is possible to write something that does not depend on lags, but instead just grabs the last available value and uses it in the value-to-value percent change calculation. I would also like to create an entirely new table where the values are just the daily percent changes (a rough sketch of what I mean is below).
    I have thought about removing all missing values, but I am afraid this would cost me some actual data, since the gaps are not consistent across variables. I could also drop the variables in which these inconsistencies occur, but I am not sure that is prudent. Thanks again!
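    For that separate table, a rough sketch of what I mean, assuming the percent-change variables all share a _pc suffix (the filename is just a placeholder):

    Code:
    * sketch: keep only the date and the percent-change variables in their own dataset
    preserve
    keep day *_pc
    save daily_pct_changes, replace    // placeholder filename
    restore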

  • #2
    Here's how to do it for variable abc. I don't think tampering with the way lags work, or resorting to a business calendar will work. The trick is to just carry forward the last available value wherever there is a missing.

    Code:
    tsset day
    gen prior_abc = L1.abc
    // -replace- works through the observations in order, so runs of consecutive
    // missings all pick up the last recorded value carried forward from above
    replace prior_abc = prior_abc[_n-1] if missing(prior_abc)
    gen abc_pc = 100*(abc - prior_abc)/prior_abc
    You can replicate this for the other variables, or you can write a -foreach- loop iterating over them if there are enough of them to warrant the effort.

    That is how you can do it. I will leave to you the substantive question of whether it is appropriate to calculate a "day-to-day" percent change in this manner when the time intervals are, in some cases, not actually single days.



    • #3
      Sorry for the late response. Thanks a lot for this! The best code always ends up being simpler than I expect it to be. I won't need the "prior" variables. Is the code below the most efficient way to drop these temporary variables, or is there a better alternative?

      Code:
      ds abc-tty
      local countries `r(varlist)'

      foreach c of local countries {
          gen `c'_prior = L1.`c'
          replace `c'_prior = `c'_prior[_n-1] if missing(`c'_prior)
          gen `c'_pc_1 = 100*(`c' - `c'_prior)/`c'_prior
          drop `c'_prior
      }



      • #4
        I think that is fine. The other approach would be to remove the -drop `c'_prior- command and then, after the loop, run -drop *_prior-. Theoretically that would be more compute-time efficient, but even in a dataset with a few thousand variables, I doubt the difference would be noticeable. And in a dataset with so many variables that the difference would be noticeable, that gain would be traded off against carrying an intermediate dataset with a huge number of extra variables, which might bring inefficiencies of a similar or possibly larger scale. Plus, since `c'_prior is a temporary variable for use only inside the loop, it makes sense to -drop- it there, precisely when it has outlived its usefulness. In the end, I think what you have is fine, and it is how I would do it.
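        For concreteness, here is a sketch of that alternative, reusing the loop from #3 with the -drop- moved outside the loop (same variable list and naming assumed):

        Code:
        ds abc-tty
        local countries `r(varlist)'

        foreach c of local countries {
            gen `c'_prior = L1.`c'
            replace `c'_prior = `c'_prior[_n-1] if missing(`c'_prior)
            gen `c'_pc_1 = 100*(`c' - `c'_prior)/`c'_prior
        }
        drop *_prior    // remove all the temporary carry-forward variables at once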
