Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Clearing up False Weekend Observations from Certain Variables

    Hey all. Thanks for your help so far. I've noticed something in my data that will result in misleading results for certain variables. Basically, for most variables (you can think of them as stock tickers), there are missing observations on weekends and holidays, which works perfectly for me. For a couple, however, those weekend/holiday slots are filled in with the value of the next observation (i.e. Saturday and Sunday's values are equivalent to Monday's value). I'll provide a small example below.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(day abc xro rkt mgs tty)
    19509 29.860113      . 1.8448875      .  29.91344
    19510 30.020096  1.644  1.848199 6.3942  30.02937
    19511 30.179895 1.6503 1.8629556 6.4046  29.98533
    19512  30.23022      .  1.875309 6.4124  29.99614
    19513  30.35522 1.6421  1.880363      . 29.983854
    19514         .      . 1.8915282      . 30.059195
    19515         .      . 1.8915282      . 30.059195
    19516 30.459717 1.6385 1.8915282      . 30.059195
    19517 30.410173   1.64 1.8727467 6.4417  29.91063
    19518  30.51963 1.6341 1.8819928 6.4096 29.801025
    19519  30.65406 1.6361  1.888474 6.4244 29.826956
    19520 30.589745 1.6274 1.8838612 6.4232 29.728506
    end
    format %tdNN/DD/CCYY day
    I have 80+ variables with daily data ranging over a quite a few years. Would business calendars help me here? I'm not sure how to proceed with this clean-up. Thanks!

    EDIT: Adding some clarification. RKT and TTY are examples of false weekend observations. ABC is ideal in terms of missing values over the weekend. If there's some way to have a "model" variable such as ABC and apply it to other variables, that could fix the issue.
    Last edited by Andrew Bernal; 16 Mar 2022, 08:58.

  • #2
    I'm very confused. Your example data does not at all behave in the way you describe. You have weekend dates in observations 2 and 3, as well as observations 9 and 10. While the variables RKT and TTY do show non-missing values in those observations, those values are not the values of the following Monday, nor are they those of the preceding Friday. In that regard, I don't see anything that distinguishes them from the "ideal" variable ABC.

    There are instances where the values for two consecutive dates are equal to the value in the subsequent date for RKT and TTY. These occur in observations 6 and 7, which replicate observation 8. But these are not weekend dates. They occur on a Wednesday and Thursday, copying the subsequent Friday.

    Do you want to simply replace with missing values any pair of consecutive observations that agree with each other and also with the day immediately following, disregarding day of the week?

    Comment


    • #3
      I am terribly sorry! I should've double-checked my dates. The below example should fit in to what I said in #1.


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(day abc xro rkt mgs tty)
      19512 29.860113      . 1.8448875      .  29.91344
      19513 30.020096  1.644  1.848199 6.3942  30.02937
      19514 30.179895 1.6503 1.8629556 6.4046  29.98533
      19515  30.23022      .  1.875309 6.4124  29.99614
      19516  30.35522 1.6421  1.880363      . 29.983854
      19517         .      . 1.8915282      . 30.059195
      19518         .      . 1.8915282      . 30.059195
      19519 30.459717 1.6385 1.8915282      . 30.059195
      19520 30.410173   1.64 1.8727467 6.4417  29.91063
      19521  30.51963 1.6341 1.8819928 6.4096 29.801025
      19522  30.65406 1.6361  1.888474 6.4244 29.826956
      19523 30.589745 1.6274 1.8838612 6.4232 29.728506
      end
      format %tdNN/DD/CCYY day
      ----------
      Do you want to simply replace with missing values any pair of consecutive observations that agree with each other and also with the day immediately following, disregarding day of the week?
      This is something I thought of, but something like that could theoretically delete "true" observations if the price stayed static for 3 days.

      Comment


      • #4
        Code:
        tsset day
        gen day_of_week = dow(day)
        gen byte suspicious = 0
        foreach v of varlist abc-tty {
            replace suspicious = (day_of_week == 6 & `v' == F1.`v' & `v' == F2.`v') ///
                | (day_of_week == 0 & `v' == L1.`v' & `v' == F1.`v')
            replace `v' = . if suspicious
        }
        You can -drop- the variables day_of_week and suspicious once this is done.

        Comment


        • #5
          Very nice! I believe this will drop weekends, but how would you recommend I do this for holiday observations? For example, if TTY has a value on New Year's Day or Christmas Day, but it is simply a copy of the following value?

          Comment


          • #6
            This expands the code to cover Christmas and New Year's Day:
            Code:
            tsset day
            gen day_of_week = dow(day)
            gen byte suspicious = 0
            foreach v of varlist abc-tty {
                replace suspicious = (day_of_week == 6 & `v' == F1.`v' & `v' == F2.`v') ///
                    | (day_of_week == 0 & `v' == L1.`v' & `v' == F1.`v') ///
                    | (month(day) == 12 & day(day) == 25 & `v' == F1.`v') ///
                    | (month(day) == 1 & day(day) == 1 & `v' == F1.`v')
                replace `v' = . if suspicious
            }
            Holidays that do not occur on the same date each year would require a different, and much more complicated approach.

            Comment


            • #7
              This works perfectly. The only gaps are on these holidays, so luckily I won't need the complicated approach. Thanks again!

              Comment

              Working...
              X