Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calendar date problem

    Hi,

    I have panel data of stocks that do not necessarily trade every day. When the stock does not trade, the is no record for that day (as opposed to missing data). I would like to identify and remove all stocks that do no trade on a regular basis. I have gotten a official trading calendar time series data.

    I would like to use the official trading calendar time series data to create missing information in my stock data, for each stock.

    Here is a sample of the stock data:
    Code:
    trading_days
    01/02/2016
    01/03/2016
    01/04/2014
    01/04/2015
    01/04/2016
    01/05/2014
    01/05/2015
    01/06/2015
    01/06/2016
    01/07/2014
    01/07/2015
    01/07/2016
    01/08/2014
    01/08/2016
    01/09/2014
    01/09/2015
    01/09/2016
    01/10/2014
    01/10/2015
    01/11/2016
    01/12/2014
    01/12/2015
    01/12/2016
    02/01/2014
    02/01/2015
    02/02/2015
    02/02/2016
    02/03/2015
    02/03/2016
    this my stock data:
    Code:
    stockidentifier    date    date_ymd    hightradefortheday    lowtradefortheday    lasttradefortheday
    4DS.ASX    24/02/2016    20160224    .034    .03    .03
    4DS.ASX    25/02/2016    20160225    .03    .029    .029
    4DS.ASX    26/02/2016    20160226    .03    .029    .029
    4DS.ASX    29/02/2016    20160229    .029    .029    .029
    4DS.ASX    01/03/2016    20160301    .03    .029    .03
    4DS.ASX    02/03/2016    20160302    .033    .032    .032
    88EO.ASX    02/07/2015    20150702    .005    .004    .005
    88EO.ASX    03/07/2015    20150703    .005    .005    .005
    88EO.ASX    06/07/2015    20150706    .006    .005    .006
    88EO.ASX    07/07/2015    20150707    .005    .005    .005
    88EO.ASX    08/07/2015    20150708    .004    .004    .004
    88EO.ASX    13/07/2015    20150713    .005    .004    .005
    88EO.ASX    15/07/2015    20150715    .004    .004    .004
    88EO.ASX    17/07/2015    20150717    .005    .004    .005
    88EO.ASX    22/07/2015    20150722    .004    .004    .004

    Can someone help me with that? Thanks!

  • #2
    Dear Francois,

    is this the same question as this one?

    If yes: Double or cross posting will not increase Statalisters' capability nor desire to answer your question.

    Please read the FAQ (also linked at the top of each page here) on how to post data examples (using -dataex-). I assume you did not receive an answer (yet) because most Statalist users do not fully understand your problem. At least this is the case for me: It remains unclear what the target data format you like to achieve looks like.

    Can you provide an example on how the destination data should look like? I assume the command you're looking for is -joinby-, but without understanding what you want to achieve it is hard to tell.

    Regards
    Bela

    Comment


    • #3
      Hi Bela,

      Thanks a lot for your response, I truly appreciate it. I'm aware of dataex but I do not see how I can reproduce my setup using it in that particular case. I'm happy to provide more details, just let me know what you do not understand.
      Here is an example of what I'd like to do. If you look at the the following observations, there is gap between 08/07/2015 and 13/07/2015. The stock did not trade on the financial markets on Thursday 09/07/2015 and Friday 10/07/2015 (probably an illiquid stock) and of course saturday and Sunday but that is normal because financial markets are closed.
      Code:
       88EO.ASX    07/07/2015    20150707    .005    .005    .005 88EO.ASX    08/07/2015    20150708    .004    .004    .004 88EO.ASX    13/07/2015    20150713    .005    .004    .005
      I would like to add Thursday 09/07/2015 and Friday 10/07/2015 in the dataset for each stocks as missing data:
      Code:
       88EO.ASX    07/07/2015    20150707    .005    .005    .005 88EO.ASX    08/07/2015    20150708    .004    .004    .004 88EO.ASX 09/07/2015 20150709 . . . 88EO.ASX 10/07/2015 20150710 . . . 88EO.ASX    13/07/2015    20150713    .005    .004    .005
      I have the time series dataset that indicates days where financial markets are open (i.e. hursday 09/07/2015 and Friday 10/07/2015 would be included in this case). How do I program that in stata?

      Does it make more sense now?

      Comment


      • #4
        Originally posted by Francois Durant View Post
        Hi Bela,

        Thanks a lot for your response, I truly appreciate it. I'm aware of dataex but I do not see how I can reproduce my setup using it in that particular case. I'm happy to provide more details, just let me know what you do not understand.
        So, why not use -dataex- even upon explicit request? It is really cumbersome to manually create a dataset from the lists of data you provided. Eventually, I did so anyways, just to write some code to illustrate my guess on what you want to achieve. I want to re-emphasize that not providing an easy-to-use data example is the reason for receiving few (or none) answers to your question.

        Originally posted by Francois Durant View Post
        Does it make more sense now?
        No, it doesn't. First, some line breaks would have been helpful; second, what you write does not match the example data you provided: Your trading date example data does not include 09/07/2015 nor 10/07/2015.

        I wrote down some code that may or may not get you started, depending on my guess on what you try to achieve is correct or not:
        Code:
        // create easy to reproduce example data: trading dates
        clear
        input str10(trading_days)
        "01/02/2016"
        "01/03/2016"
        "01/04/2014"
        "01/04/2015"
        "01/04/2016"
        "01/05/2014"
        "01/05/2015"
        "01/06/2015"
        "01/06/2016"
        "01/07/2014"
        "01/07/2015"
        "01/07/2016"
        "01/08/2014"
        "01/08/2016"
        "01/09/2014"
        "01/09/2015"
        "01/09/2016"
        "01/10/2014"
        "01/10/2015"
        "01/11/2016"
        "01/12/2014"
        "01/12/2015"
        "01/12/2016"
        "02/01/2014"
        "02/01/2015"
        "02/02/2015"
        "02/02/2016"
        "02/03/2015"
        "02/03/2016"
        end
        
        // create a true date variable
        generate truedate=date(trading_days,"DMY")
        format truedate %td
        sort truedate
        // temporarily save
        tempfile dates
        save `"`dates'"'
        
        // create easy to reproduce example data: stock names and values
        clear
        input str25(stockidentifier) str10(date) date_ymd hightradefortheday lowtradefortheday lasttradefortheday
        "4DS.ASX" "24/02/2016" 20160224 .034 .03 .03
        "4DS.ASX" "25/02/2016" 20160225 .03 .029 .029
        "4DS.ASX" "26/02/2016" 20160226 .03 .029 .029
        "4DS.ASX" "29/02/2016" 20160229 .029 .029 .029
        "4DS.ASX" "01/03/2016" 20160301 .03 .029 .03
        "4DS.ASX" "02/03/2016" 20160302 .033 .032 .032
        "88EO.ASX" "02/07/2015" 20150702 .005 .004 .005
        "88EO.ASX" "03/07/2015" 20150703 .005 .005 .005
        "88EO.ASX" "06/07/2015" 20150706 .006 .005 .006
        "88EO.ASX" "07/07/2015" 20150707 .005 .005 .005
        "88EO.ASX" "08/07/2015" 20150708 .004 .004 .004
        "88EO.ASX" "13/07/2015" 20150713 .005 .004 .005
        "88EO.ASX" "15/07/2015" 20150715 .004 .004 .004
        "88EO.ASX" "17/07/2015" 20150717 .005 .004 .005
        "88EO.ASX" "22/07/2015" 20150722 .004 .004 .004
        end
        format date_ymd  %10.0g
        // create a true date variable
        generate truedate=date(date,"DMY")
        format truedate %td
        sort truedate
        // temporarily save
        tempfile stocks
        save `"`stocks'"'
        
        // save all names of stocks to local macro
        quietly : levelsof stockidentifier , local(stocknames)
        local stockcount : word count `stocknames'
        
        // duplicate trading dates for each stock
        use `"`dates'"' , clear
        generate stockidentifier=""
        local firstobs=1
        local lastobs=_N
        local origN=_N
        expand `stockcount'
        foreach stockname of local stocknames {
            replace stockidentifier=`"`stockname'"' in `firstobs'/`lastobs'
            local firstobs=`firstobs'+`origN'
            local lastobs=`lastobs'+`origN'
        }
        keep truedate stockidentifier
        save `"`dates'"' , replace
        
        // join trading dates and stocks
        use `"`stocks'"' , clear
        joinby stockidentifier truedate using `"`dates'"' , unmatched(both) _merge(source)
        sort stockidentifier truedate
        
        // list data
        list , sepby(stockidentifier)
        Regards
        Bela

        PS: You should have a look at help datetime. Handling date information as strings is not the way of choice, as it won't let you exploit Stata's massive features for time series analysis and data management.
        Last edited by Daniel Bela; 06 Sep 2017, 03:53. Reason: updated example code to use Stata date variables for sorting and joining data

        Comment


        • #5
          Thank you so much, that's exactly what I needed!

          However, I'm now getting the following problem (too many stocks):
          Code:
          . quietly : levelsof stockidentifier , local(stocknames)
          macro length exceeded
          Any idea how to resolve the problem?

          Comment


          • #6
            Francois, first learn how to use dataex. Then learn about Stata's business calendars (help bcal). This type of daily data is best handled using a business calendar. You can easily create one using your trading days dataset. Next, convert your stock data dates to this business calendar dates. You can then declare your data as panel and use tsfill to get what you wanted.

            Here's a quick example:

            Code:
            * simulate market dates for 2015
            clear
            set obs 365
            gen market_date = mdy(1,1,2015) + _n - 1
            format %tdDay_Mon_dd,_CCYY market_date
            drop if inlist(dow(market_date),0,6)
            list in 1/10
            
            * create a business calendar
            bcal create trade, from(market_date) generate(trade_day) replace
            
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str10(var1 var2) long var3 float(var4 var5 var6)
            "88EO.ASX"   "07/07/2015" 20150707 .005 .005 .005
            "88EO.ASX"   "08/07/2015" 20150708 .004 .004 .004
            "88EO.ASX"   "13/07/2015" 20150713 .005 .004 .005
            "UseDataex!" "02/01/2015" 20150707 .005 .005 .005
            "UseDataex!" "06/01/2015" 20150708 .004 .004 .004
            end
            
            * convert string date to Stata SIF date
            gen ndate = daily(var2,"DMY")
            format %td ndate
            
            * convert to business calendar date
            gen tdate = bofd("trade", ndate)
            format tdate %tbtrade
            
            * declare panel data
            egen long panelvar = group(var1)
            tsset panelvar tdate
            
            * fill in missing trading days
            tsfill
            
            list
            The output from the first list statement that shows trading days (no Sat. or Sun.):
            Code:
            . list in 1/10
            
                 +------------------+
                 |      market_date |
                 |------------------|
              1. |  Thu Jan 1, 2015 |
              2. |  Fri Jan 2, 2015 |
              3. |  Mon Jan 5, 2015 |
              4. |  Tue Jan 6, 2015 |
              5. |  Wed Jan 7, 2015 |
                 |------------------|
              6. |  Thu Jan 8, 2015 |
              7. |  Fri Jan 9, 2015 |
              8. | Mon Jan 12, 2015 |
              9. | Tue Jan 13, 2015 |
             10. | Wed Jan 14, 2015 |
                 +------------------+
            and the final list that shows the filled in observations:
            Code:
            . list
            
                 +--------------------------------------------------------------------------------------------+
                 |       var1         var2       var3   var4   var5   var6       ndate       tdate   panelvar |
                 |--------------------------------------------------------------------------------------------|
              1. |   88EO.ASX   07/07/2015   20150707   .005   .005   .005   07jul2015   07jul2015          1 |
              2. |   88EO.ASX   08/07/2015   20150708   .004   .004   .004   08jul2015   08jul2015          1 |
              3. |                                  .      .      .      .           .   09jul2015          1 |
              4. |                                  .      .      .      .           .   10jul2015          1 |
              5. |   88EO.ASX   13/07/2015   20150713   .005   .004   .005   13jul2015   13jul2015          1 |
                 |--------------------------------------------------------------------------------------------|
              6. | UseDataex!   02/01/2015   20150707   .005   .005   .005   02jan2015   02jan2015          2 |
              7. |                                  .      .      .      .           .   05jan2015          2 |
              8. | UseDataex!   06/01/2015   20150708   .004   .004   .004   06jan2015   06jan2015          2 |
                 +--------------------------------------------------------------------------------------------+
            As you can see, the "UseDataex!" stock is missing on Jan. 5 but non-trading days are not included (Jan. 3 and 4).

            Comment


            • #7
              Upon a second read-through, I totally agree with Robert Picard's advice.

              It is much more straightforward to use your dates as a business calendar and use Stata's features for time series data to do the job.

              Comment


              • #8
                @Robert Picard Hi Robert, can I ask you a question about this calendar date issue? When I try to create a business calendar from a normal stock date variable, it shows an error as such:

                Code:
                . bcal create trade, from(date) generate(trade_date) replace
                gap in business calendar too large; maxgap() currently set to 10
                r(198);
                Here are my first 100 observations:

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input double permno long date double ret
                10000 9502                    .
                10000 9503                    .
                10000 9504 -.024390242993831635
                10000 9505                    0
                10000 9506                    0
                10000 9509   .05000000074505806
                10000 9510    .0476190485060215
                10000 9511   .04545454680919647
                10000 9512  .043478261679410934
                10000 9513                    0
                10000 9516                    0
                10000 9517                    0
                10000 9518                    0
                10000 9519                  .25
                10000 9520   .11666666716337204
                10000 9523   .05970149114727974
                10000 9524                    0
                10000 9525 -.028169013559818268
                10000 9526  .028985507786273956
                10000 9527 -.014084506779909134
                10000 9530                    0
                10000 9531                    0
                10000 9532                    0
                10000 9533  -.04285714402794838
                10000 9534   .04477611929178238
                10000 9537 -.014285714365541935
                10000 9538                    0
                10000 9539 -.021739130839705467
                10000 9540                    0
                10000 9541  .014814814552664757
                10000 9545   -.0656934306025505
                10000 9546             -.015625
                10000 9547   -.0634920671582222
                10000 9548                    0
                10000 9551 -.016949152573943138
                10000 9552 -.017241379246115685
                10000 9553  -.08771929889917374
                10000 9554                    0
                10000 9555                    0
                10000 9558                    0
                10000 9559  .057692307978868484
                10000 9560   .00909090880304575
                10000 9561   .11711711436510086
                10000 9562   .14516128599643707
                10000 9565 -.056338027119636536
                10000 9566   .02985074557363987
                10000 9567                    0
                10000 9568  .028985507786273956
                10000 9569                    0
                10000 9572                    0
                10000 9573                    0
                10000 9574                    0
                10000 9575                    0
                10000 9576                    0
                10000 9579                    0
                10000 9580                    0
                10000 9581                    0
                10000 9582 -.007042253389954567
                10000 9586  .007092198356986046
                10000 9587 -.028169013559818268
                10000 9588                    0
                10000 9589                    0
                10000 9590                    0
                10000 9593 -.014492753893136978
                10000 9594 -.029411764815449715
                10000 9595  .022727273404598236
                10000 9596 -.014814814552664757
                10000 9597 -.007518797181546688
                10000 9600 -.007575757801532745
                10000 9601                    0
                10000 9602                    0
                10000 9603 -.007633587811142206
                10000 9604                    0
                10000 9607                    0
                10000 9608                    0
                10000 9609                    0
                10000 9610                    0
                10000 9611                    0
                10000 9614  .007692307699471712
                10000 9615 -.007633587811142206
                10000 9616 -.015384615398943424
                10000 9617                    0
                10000 9618            -.0078125
                10000 9621 -.019685039296746254
                10000 9622   -.0200803205370903
                10000 9623 -.016393441706895828
                10000 9624                    0
                10000 9625  .008333333767950535
                10000 9628   .11570248007774353
                10000 9629 -.029629629105329514
                10000 9630  .015267175622284412
                10000 9631 -.030075188726186752
                10000 9632  -.01550387591123581
                10000 9635 -.023622047156095505
                10000 9636                    0
                10000 9637  -.05645161122083664
                10000 9638  -.05982905998826027
                10000 9639   .00909090880304575
                10000 9643                    0
                10000 9644 -.036036036908626556
                end
                format %d date

                Do you possibly know how to fix this gap issue and make it work? Many thanks for your help in advance! Hope to hear from you!
                Last edited by Jae Li; 20 Dec 2017, 14:29.

                Comment


                • #9
                  there is a maxgap(#) option to the bcal command; see
                  Code:
                  help bcal
                  and look for the maxgap(#) option

                  added: note that I have not really paid much attention to the rest of this thread and this might not be the major issue

                  Comment


                  • #10
                    Thanks for the data example but bcal command works with the data example you posted in #8.

                    I don't personally use business calendar but as I understand, they are used to bridge over business holidays. When you build a business calendar form a dataset, it will look at all the days with data and consecrate dates with no data as business holidays. This allows you to do better than the neanderthal approach (ignore all gaps in the data) and have missing data for a company on days where other companies have data.

                    I'm am not aware of any instance where there has been a real life business holiday that spanned more than 10 days so this means that you are feeding bcal with data that have a real gap in it. You can choose to ignore the issue by using a sufficiently large argument for maxgap() but it would be a logical error. Say you have daily data but you are missing data for all of 2011. You would then be considering that, for the purpose of your analysis, 1/1/2012 occurred the next business day after 12/31/2010.

                    Here's code that will let you observe the gaps in the data, first within companies and then overall:

                    Code:
                    * calculate gaps within companies
                    xtset permno date
                    by permno: gen gap = date - date[_n-1] - 1
                    tab gap
                    
                    * overall date gaps
                    keep date
                    bysort date: keep if _n == 1
                    gen gap = date - date[_n-1] - 1
                    tab gap

                    Comment

                    Working...
                    X