  • Daily Percent Change with Inconsistent Gaps in Data

    Hello again. I'd like to get the day-to-day percent change for each observation in my panel. However, my data does not "skip over" days; instead it holds empty values for weekends, holidays, etc. I played around with Stata's business calendar features, but I don't believe they can help with my data. I'd like to know whether it is possible to make Stata simply use the last recorded value as the "old" value in the percent change formula, i.e., (new value - old value) / old value, all multiplied by 100. Below is a sample of the data:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(day abc xro rkt mgs tty)
    19509 29.860113      . 1.8448875      .  29.91344
    19510 30.020096  1.644  1.848199 6.3942  30.02937
    19511 30.179895 1.6503 1.8629556 6.4046  29.98533
    19512  30.23022      .  1.875309 6.4124  29.99614
    19513  30.35522 1.6421  1.880363      . 29.983854
    19514         .      .         .      .         .
    19515         .      .         .      .         .
    19516 30.459717 1.6385 1.8915282      . 30.059195
    19517 30.410173   1.64 1.8727467 6.4417  29.91063
    19518  30.51963 1.6341 1.8819928 6.4096 29.801025
    19519  30.65406 1.6361  1.888474 6.4244 29.826956
    19520 30.589745 1.6274 1.8838612 6.4232 29.728506
    end
    format %tdNN/DD/CCYY day
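    To make the intended calculation concrete using the sample above: the first non-missing abc value after the two-day gap (30.459717) would be compared against the last recorded value before the gap (30.35522), so the result I am after would be along these lines:

    Code:
    * sketch of the intended "last recorded value" percent change for abc across the gap
    display 100*(30.459717 - 30.35522)/30.35522    // roughly .34 percent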
    I am also unsure whether to use xtset or tsset for my data. So far, I have tried tsset, and used the following code:
    Code:
    gen abc_pc = 100*D.abc/L.abc
    This of course works, but it will not compute the percent change across a gap (for instance, from the last recorded value before a weekend to the value on the following Monday). Example:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(day abc xro rkt mgs tty abc_pc xro_pc)
    19509 29.860113      . 1.8448875      .  29.91344          .         .
    19510 30.020096  1.644  1.848199 6.3942  30.02937   .5357739         .
    19511 30.179895 1.6503 1.8629556 6.4046  29.98533   .5323087    .38321
    19512  30.23022      .  1.875309 6.4124  29.99614   .1667577         .
    19513  30.35522 1.6421  1.880363      . 29.983854   .4134872         .
    19514         .      .         .      .         .          .         .
    19515         .      .         .      .         .          .         .
    19516 30.459717 1.6385 1.8915282      . 30.059195          .         .
    19517 30.410173   1.64 1.8727467 6.4417  29.91063 -.16265213 .09154779
    19518  30.51963 1.6341 1.8819928 6.4096 29.801025   .3599355 -.3597577
    19519  30.65406 1.6361  1.888474 6.4244 29.826956   .4404766 .12239725
    19520 30.589745 1.6274 1.8838612 6.4232 29.728506 -.20981787 -.5317531
    end
    format %tdNN/DD/CCYY day
    I'm wondering whether it is possible to write something that does not depend on lags, but instead just grabs the last available value and uses it in the value-to-value percent change calculation. I would also like to create an entirely new table where the values are just the daily percent changes (a rough sketch of what I mean is below).
    I have thought about removing all missing values, but I am afraid this would cost me some actual data, since the gaps are not consistent across variables. I could also drop the variables in which these inconsistencies occur, but I am not sure that is prudent. Thanks again!
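    For that separate table, a rough sketch of what I mean, assuming the percent-change variables all share a _pc suffix (the filename is just a placeholder):

    Code:
    * sketch: keep only the date and the percent-change variables in their own dataset
    preserve
    keep day *_pc
    save daily_pct_changes, replace    // placeholder filename
    restore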

  • #2
    Here's how to do it for variable abc. I don't think tampering with the way lags work, or resorting to a business calendar will work. The trick is to just carry forward the last available value wherever there is a missing.

    Code:
    tsset day
    gen prior_abc = L1.abc
    // -replace- works through the observations in order, so runs of consecutive
    // missings all pick up the last recorded value carried forward from above
    replace prior_abc = prior_abc[_n-1] if missing(prior_abc)
    gen abc_pc = 100*(abc - prior_abc)/prior_abc
    You can replicate this for the other variables, or you can write a -foreach- loop iterating over them if there are enough of them to warrant the effort.

    That is how you can do it. I will leave to you the substantive question of whether it is appropriate to calculate a "day-to-day" percent change in this manner when the time intervals are, in some cases, not actually single days.



    • #3
      Sorry for the late response. Thanks a lot for this! The best code always ends up being simpler than I expect it to be. I won't need the "prior" variables. Is the code below the most efficient way to drop these temporary variables, or is there a better alternative?

      Code:
      ds abc-tty
      local countries `r(varlist)'

      foreach c of local countries {
          gen `c'_prior = L1.`c'
          replace `c'_prior = `c'_prior[_n-1] if missing(`c'_prior)
          gen `c'_pc_1 = 100*(`c' - `c'_prior)/`c'_prior
          drop `c'_prior
      }



      • #4
        I think that is fine. The other approach would be to remove the -drop `c'_prior- command and then, after the loop, run -drop *_prior-. Theoretically that would be more compute-time efficient, but even in a dataset with a few thousand variables, I doubt the difference would be noticeable. And in a dataset with so many variables that the difference would be noticeable, that gain would be traded off against carrying an intermediate dataset with a huge number of extra variables, which might bring inefficiencies of a similar or possibly larger scale. Plus, since `c'_prior is a temporary variable for use only inside the loop, it makes sense to -drop- it there, precisely when it has outlived its usefulness. In the end, I think what you have is fine, and it is how I would do it.
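        For concreteness, here is a sketch of that alternative, reusing the loop from #3 with the -drop- moved outside the loop (same variable list and naming assumed):

        Code:
        ds abc-tty
        local countries `r(varlist)'

        foreach c of local countries {
            gen `c'_prior = L1.`c'
            replace `c'_prior = `c'_prior[_n-1] if missing(`c'_prior)
            gen `c'_pc_1 = 100*(`c' - `c'_prior)/`c'_prior
        }
        drop *_prior    // remove all the temporary carry-forward variables at once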
