  • Merging with dates, ignoring missing values.

    Hello,

    I am trying to merge two datasets, one with stock prices, so per company one observation per business day. The other dataset has the same number of rows, but per company only a couple of observations per year (the date is a dividend announcement date).
    When merging, the dates should align while sticking to the correct company, but I cannot seem to figure this out. I want to do an event study on these announcements, and for that to work the dates need to be on the same row.
    Is there anything I might be missing? I do not believe this should be so difficult.

    Kind regards,

    Victor

  • #2
    Welcome to Statalist.

    Your question really isn't clear without more detail, or at a minimum it is too difficult to guess at a good answer from what you have shared. Please help us help you. Show example data from both datasets. Show the code you have tried. Show us what Stata told you. Tell us what precisely is wrong. The Statalist FAQ provides advice on effectively posing your questions, posting data, and sharing Stata output.

    In particular I cannot understand how the two datasets could have "the same number of rows" when the primary dataset with stock prices has one observation per company per day, while the secondary dataset with dividend announcements has only a couple observations per company per year.

    But in any event, to give advice tailored to your actual data it will be necessary to see samples of your datasets. Even the best descriptions of data are no substitute for an actual example of the data. There are many ways your data might be organized that are consistent with your description, and each would require a somewhat different approach. Be sure to use the dataex command to do this. Run help dataex and read the simple instructions for using it. dataex will make it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
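
    For example, something along the following lines, run separately in each dataset, would produce a usable extract. The variable names here are only placeholders for whatever your datasets actually contain.
    Code:
    * share the first 20 observations of the relevant variables from the dataset in memory
    dataex CompanyName date PriceCloseDaily in 1/20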

    • #3
      Thanks for your reply,

      Originally it was one dataset; that is why the number of rows is equal. When the dates there did not line up, I thought it would be easier to merge using dates, but that has proven to be quite a hassle.
      This is the part with the continuous stock data:
      Code:
      clear
      input str6 GlobalCompanyKeyDividends str3 IssueIDDividends int date str28 CompanyName long TradingVolumeDaily double PriceCloseDaily
      "001234" "01" 18266 "ATRION CORP"  1189 155.85
      "001234" "01" 18267 "ATRION CORP"  2014 155.72
      "001234" "01" 18268 "ATRION CORP"  5480 158.99
      "001234" "01" 18269 "ATRION CORP" 10420  164.5
      "001234" "01" 18270 "ATRION CORP"  2808 163.54
      "001234" "01" 18273 "ATRION CORP"  8054    160
      "001234" "01" 18274 "ATRION CORP" 10914  157.5
      "001234" "01" 18275 "ATRION CORP"  9911 162.21
      "001234" "01" 18276 "ATRION CORP" 12299  158.5
      "001234" "01" 18277 "ATRION CORP"  4261 157.93
      "001234" "01" 18281 "ATRION CORP"  3105 159.87
      "001234" "01" 18282 "ATRION CORP"  6590 155.63
      "001234" "01" 18283 "ATRION CORP"  4499 152.29
      "001234" "01" 18284 "ATRION CORP"  3949 150.95
      "001234" "01" 18287 "ATRION CORP"  6775 146.71
      "001234" "01" 18288 "ATRION CORP" 10055 146.56
      "001234" "01" 18289 "ATRION CORP"  5085 147.64
      "001234" "01" 18290 "ATRION CORP"  8200 142.43
      Here the top date (18266) is 4 January 2010 (I formatted it with %td).
      The second part of my dataset is as follows:

      Code:
      clear
      input str28 CompanyName double CashDividendsDaily int date
      "ATRION CORP"   .     .
      "ATRION CORP"   .     .
      "ATRION CORP"   .     .
      "ATRION CORP"   .     .
      "ATRION CORP"   .     .
      "ATRION CORP"   .     .
      "ATRION CORP"   .     .
      "ATRION CORP"   .     .
      "ATRION CORP"   . 18266
      "ATRION CORP"   .     .
      "ATRION CORP"   .     .
      "ATRION CORP"   .     .
      "ATRION CORP"   .     .
      "ATRION CORP"   .     .
      "ATRION CORP"   .     .
      "ATRION CORP"   .     .
      Here 18266 corresponds to 4 January 2010 as well, but it is on another row, as is the case for every observation with a non-missing date in the second dataset.
      In order to merge, I named both the date variable in the first dataset and the announcement date variable in the second dataset "date".
      I tried merging like this:
      Code:
      merge m:1 date using "C:\path\file.dta"
      However, I get the error message
      Code:
      variable date does not uniquely identify observations in the using data
      It also messes up my data completely, and I do not understand why.

      Hope this clarifies my question.

      Kind regards,

      Victor Schaper

      • #4
        Let me start with the following observation. You do not explain, for your original single dataset, what the announcement date variable means. I would have expected to see a value of CashDividendsDaily on the same observation as the non-missing announcement date, suggesting that on that date something relevant to dividends happened that had been announced on 4 January 2010.

        When we look into the dividend records for Atrion, we see that on 1/4/2010 they announced a dividend of $6.00 with a record date of 1/19/2010 and a payment date of 1/29/2010.

        So I am very confused about what you are attempting to accomplish, and the data from which you started. It seems there is a strong possibility that you have taken what would have been a simple problem to solve and complicated it by splitting your data.

        I am going to assume that what you are trying to do is add a variable to your primary dataset that indicates the announcement of a dividend on the given date. Possibly you are trying to see the effect that announcement has on stock prices. I would think you would also want to know the amount of the announced dividend, but this doesn't seem possible from what you have shown us of your data.

        If you were to return to your original dataset and present the entire output of
        Code:
        dataex GlobalCompanyKeyDividends IssueIDDividends date CompanyName TradingVolumeDaily PriceCloseDaily CashDividendsDaily if CompanyName=="ATRION CORP" & inrange(date,td(1jan2010),td(31mar2010))
        replacing date in the above with the name of the date variable in the original dataset, it might be possible from that example to find a straightforward way to solve your problem without starting by splitting the dataset.

        For now, I take the hand you've dealt me and try to explain how to get to where you say you want to go from the point you have reached on the path you started down. That is, we will add an indicator of a dividend announcement to your primary dataset on the day of the announcement.

        There are at least two problems in what you show us.

        1) Your merge should be
        Code:
        merge m:1 CompanyName date ...
        because otherwise the announcement date you show will match not only Atrion but every company in your dataset. That's an example of "date" not uniquely identifying observations in your using dataset.

        2) Even if you make that change the merge will again fail, this time telling you
        Code:
        variables CompanyName date do not uniquely identify observations in the using data
        because you have repeated observations for Atrion with the same date, namely the missing value. These observations should be dropped from your secondary dataset before merging.
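
        Before running the merge, a quick way to confirm whether a candidate key uniquely identifies the observations in the using dataset is a sketch along these lines.
        Code:
        * with the secondary (using) dataset in memory
        duplicates report CompanyName date   // how many observations share each key value
        isid CompanyName date                // exits with an error if the key is not unique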

        In the code below I have initially read your primary and secondary datasets into Stata temporary datasets.
        Code:
        use `secondary', clear
        drop if missing(date)
        save `secondary', replace
        
        use `primary', clear
        merge m:1 CompanyName date using `secondary'
        generate DivDeclare = _merge==3 // matched observations
        drop _merge
        list, clean noobs
        Code:
        . use `primary', clear
        
        . merge m:1 CompanyName date using `secondary'
        
            Result                      Number of obs
            -----------------------------------------
            Not matched                            17
                from master                        17  (_merge==1)
                from using                          0  (_merge==2)
        
            Matched                                 1  (_merge==3)
            -----------------------------------------
        
        . generate DivDeclare = _merge==3 // matched observations
        
        . drop _merge
        
        .
        . list, clean noobs
        
            Global~s   IssueI~s        date   CompanyName   Tradin~y   PriceC~y   CashDi~y   DivDec~e  
              001234         01   04jan2010   ATRION CORP       1189     155.85          .          1  
              001234         01   05jan2010   ATRION CORP       2014     155.72          .          0  
              001234         01   06jan2010   ATRION CORP       5480     158.99          .          0  
              001234         01   07jan2010   ATRION CORP      10420      164.5          .          0  
              001234         01   08jan2010   ATRION CORP       2808     163.54          .          0  
              001234         01   11jan2010   ATRION CORP       8054        160          .          0  
              001234         01   12jan2010   ATRION CORP      10914      157.5          .          0  
              001234         01   13jan2010   ATRION CORP       9911     162.21          .          0  
              001234         01   14jan2010   ATRION CORP      12299      158.5          .          0  
              001234         01   15jan2010   ATRION CORP       4261     157.93          .          0  
              001234         01   19jan2010   ATRION CORP       3105     159.87          .          0  
              001234         01   20jan2010   ATRION CORP       6590     155.63          .          0  
              001234         01   21jan2010   ATRION CORP       4499     152.29          .          0  
              001234         01   22jan2010   ATRION CORP       3949     150.95          .          0  
              001234         01   25jan2010   ATRION CORP       6775     146.71          .          0  
              001234         01   26jan2010   ATRION CORP      10055     146.56          .          0  
              001234         01   27jan2010   ATRION CORP       5085     147.64          .          0  
              001234         01   28jan2010   ATRION CORP       8200     142.43          .          0  
        
        .
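
        In case the setup is unclear, here is a minimal sketch of how such temporary datasets could be created from the dataex example in post #3; the tempfile names are arbitrary.
        Code:
        * create temporary filenames for the two datasets
        tempfile primary secondary
        
        * read in the secondary (dividend announcement) example and save it
        clear
        input str28 CompanyName double CashDividendsDaily int date
        "ATRION CORP" .     .
        "ATRION CORP" . 18266
        "ATRION CORP" .     .
        end
        format %td date
        save `secondary'
        
        * read in the primary (stock price) example the same way and save it
        * as `primary' before running the merge shown above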
        Last edited by William Lisowski; 27 Jun 2021, 15:12.

        • #5
          Dear William,

          Thank you for your help; this has already been very helpful.
          I have one last question. I entered your code and it did put the dividend announcement date next to the normal date, so that worked. The only thing is, there are now many cases in which the date has been doubled or tripled, for example:
          Code:
          GlobalCompanyKey DividendsIssueID Dividendsdate CompanyName TradingVolume DailyPriceClose DailyCashDividends DailyDivDeclare
          015833 01W 26aug2011 COATS GROUP PLC . 33 .0115 1
          015833 90W 26aug2011 COATS GROUP PLC 128362 .53 .0115 1
          015833 04W 26aug2011 COATS GROUP PLC 1832349 .66 .0115 1
          015833 04W 29aug2011 COATS GROUP PLC 220775 .675 . 0
          015833 90W 29aug2011 COATS GROUP PLC 120328 .535 . 0
          015833 01W 29aug2011 COATS GROUP PLC . .33 . 0
          015833 90W 30aug2011 COATS GROUP PLC 858754 .55 . 0
          015833 04W 30aug2011 COATS GROUP PLC 1089910 .685 . 0
          The odd thing about this is that the merged file still has approximately as many rows as the un-merged file. However, the original dataset does not have this issue.
          Moreover, the original dataset (which is in Excel) has different values for the trading volume and the DailyPriceClose.
          [Attached screenshot: exceldata.PNG]

          Is there a way to solve this?

          Many thanks again,

          Victor Schaper

          • #6
            In your example of repeated observations in the merged Stata dataset, with a quick sort we see
            Code:
              +-------------------------------------------------------------------+
              |           015833                01W       26aug2011         COATS |
              |           015833                04W       26aug2011         COATS |
              |           015833                90W       26aug2011         COATS |
              |-------------------------------------------------------------------|
              |           015833                01W       29aug2011         COATS |
              |           015833                04W       29aug2011         COATS |
              |           015833                90W       29aug2011         COATS |
              |-------------------------------------------------------------------|
              |           015833                04W       30aug2011         COATS |
              |           015833                90W       30aug2011         COATS |
              +-------------------------------------------------------------------+
            which shows that the observations differ by the value of DividendsIssueID. So nothing has been tripled or doubled - you will find that these observations also exist in your primary dataset and in the Excel worksheet from which this data was imported. I am concerned that you apparently don't fully understand the data you are using.

            Your screenshot of your Excel dataset is unhelpful because it shows only the two observations in the Stata dataset with DividendsIssueID "01". There the 29aug2011 value of DailyPriceClose agrees exactly, and the 29aug2011 value is the same but with the decimal point missing, which I suspect is a copying error on your part, since you did not use dataex to present this example data.

            • #7
              Correction to #6:

              Your screenshot of your Excel dataset shows only the two observations in the Stata dataset with DividendsIssueID "01". There the 29aug2011 value of DailyPriceClose agrees exactly, and the 26aug2011 value is the same but with the decimal point missing.

              • #8
                Dear William,

                Thanks again for your patience. My point is that in the original dataset there is only one DividendsIssueID per company, so it is odd that more turn up when merging. (A quick way to check this is sketched after the data excerpt below.)
                Here is a part of the original dataset, without splitting it up; maybe this can help.
                Code:
                input str6 GlobalCompanyKeyDividends str3 IssueIDDividends int date str28 CompanyName long TradingVolumeDaily double(PriceCloseDaily CashDividendsDaily) int DividendDeclarationDate
                "001234" "01" 18266 "ATRION CORP"  1189 155.85   .     .
                "001234" "01" 18267 "ATRION CORP"  2014 155.72   .     .
                "001234" "01" 18268 "ATRION CORP"  5480 158.99   .     .
                "001234" "01" 18269 "ATRION CORP" 10420  164.5   .     .
                "001234" "01" 18270 "ATRION CORP"  2808 163.54   .     .
                "001234" "01" 18273 "ATRION CORP"  8054    160   .     .
                "001234" "01" 18274 "ATRION CORP" 10914  157.5   .     .
                "001234" "01" 18275 "ATRION CORP"  9911 162.21   .     .
                "001234" "01" 18276 "ATRION CORP" 12299  158.5   . 18266
                "001234" "01" 18277 "ATRION CORP"  4261 157.93   .     .
                "001234" "01" 18281 "ATRION CORP"  3105 159.87   .     .
                "001234" "01" 18282 "ATRION CORP"  6590 155.63   .     .
                "001234" "01" 18283 "ATRION CORP"  4499 152.29   .     .
                "001234" "01" 18284 "ATRION CORP"  3949 150.95   .     .
                "001234" "01" 18287 "ATRION CORP"  6775 146.71   .     .
                "001234" "01" 18288 "ATRION CORP" 10055 146.56   .     .
                "001234" "01" 18289 "ATRION CORP"  5085 147.64   .     .
                "001234" "01" 18290 "ATRION CORP"  8200 142.43   .     .
                "001234" "01" 18291 "ATRION CORP"  5806 141.17   .     .
                "001234" "01" 18294 "ATRION CORP"  1402 141.21   .     .
                "001234" "01" 18295 "ATRION CORP" 15356 140.23   .     .
                "001234" "01" 18296 "ATRION CORP"   858  137.9   .     .
                "001234" "01" 18297 "ATRION CORP"  3526 131.36   .     .
                "001234" "01" 18298 "ATRION CORP"  1335 133.36   .     .
                "001234" "01" 18301 "ATRION CORP"  3351 130.43   .     .
                "001234" "01" 18302 "ATRION CORP" 11452 133.84   .     .
                "001234" "01" 18303 "ATRION CORP"  5945 130.64   .     .
                "001234" "01" 18304 "ATRION CORP"  1419 131.98   .     .
                "001234" "01" 18305 "ATRION CORP"  4522  137.5   .     .
                "001234" "01" 18309 "ATRION CORP"  5536 139.99   .     .
                "001234" "01" 18310 "ATRION CORP"  1383 144.97   .     .
                "001234" "01" 18311 "ATRION CORP"  3862 146.34   .     .
                "001234" "01" 18312 "ATRION CORP"  2603 152.74   .     .
                "001234" "01" 18315 "ATRION CORP"  4988 155.69   .     .
                "001234" "01" 18316 "ATRION CORP"  7008 159.95   .     .
                "001234" "01" 18317 "ATRION CORP"  3787 157.98   .     .
                "001234" "01" 18318 "ATRION CORP"  1232 157.49   .     .
                "001234" "01" 18319 "ATRION CORP"  1114 155.62   .     .
                "001234" "01" 18322 "ATRION CORP"  3439  150.4   .     .
                "001234" "01" 18323 "ATRION CORP"  3297    150   .     .
                "001234" "01" 18324 "ATRION CORP"  4747  155.9   .     .
                "001234" "01" 18325 "ATRION CORP"   400 153.75   .     .
                "001234" "01" 18326 "ATRION CORP"  1534 158.98   .     .
                "001234" "01" 18329 "ATRION CORP"  3147 158.87   .     .
                "001234" "01" 18330 "ATRION CORP"  2504    159   .     .
                "001234" "01" 18331 "ATRION CORP"  1100  161.7   .     .
                "001234" "01" 18332 "ATRION CORP"  2046 162.85 .36 18311
                "001234" "01" 18333 "ATRION CORP"   766 161.85   .     .
                "001234" "01" 18336 "ATRION CORP"  1262 159.28   .     .
                "001234" "01" 18337 "ATRION CORP"  2912 155.69   .     .
                "001234" "01" 18338 "ATRION CORP"  2145 152.15   .     .
                "001234" "01" 18339 "ATRION CORP"  7033 145.71   .     .
                "001234" "01" 18340 "ATRION CORP"  5072 149.49   .     .
                "001234" "01" 18343 "ATRION CORP"  4007 147.07   .     .
                "001234" "01" 18344 "ATRION CORP"  3170 148.55   .     .
                "001234" "01" 18345 "ATRION CORP"  1410 149.11   .     .
                "001234" "01" 18346 "ATRION CORP"   700 146.05   .     .
                "001234" "01" 18347 "ATRION CORP"  1041 146.19   .     .
                "001234" "01" 18350 "ATRION CORP"  1908 144.71   .     .
                "001234" "01" 18351 "ATRION CORP"  1699 143.98   .     .
                "001234" "01" 18352 "ATRION CORP"  2789 143.04   .     .
                end
                format %td date
                format %td DividendDeclarationDate
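
                As mentioned above, here is a quick way to check the claim that there is only one IssueID per company. This is only a sketch, with variable names taken from the data excerpt above.
                Code:
                * tag one observation per (company, issue) pair and one per company,
                * then count the distinct issue IDs within each company
                egen byte tag_pair = tag(GlobalCompanyKeyDividends IssueIDDividends)
                bysort GlobalCompanyKeyDividends: egen n_issueids = total(tag_pair)
                egen byte tag_company = tag(GlobalCompanyKeyDividends)
                tabulate n_issueids if tag_company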
                I hope this illustrates my problem better.

                Kind regards,

                Victor

                • #9
                  Please ignore this specific message
                  Last edited by Victor Schaper; 28 Jun 2021, 03:20. Reason: Sorry I made a mistake here

                  • #10
                    Dear William,

                    Thanks for your help; my problem has been solved. It lay in duplicates within the original dataset.

                    I solved it by creating a tag counting how many duplicates there were for each observation and then dropping those duplicates.
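
                    For reference, the code was along these lines; this is only a sketch, and the exact key variables are an assumption on my part.
                    Code:
                    * count, for each observation, how many other observations share the same key,
                    * inspect the counts, then drop the surplus copies
                    duplicates tag GlobalCompanyKeyDividends IssueIDDividends date, generate(dupcount)
                    tabulate dupcount
                    duplicates drop GlobalCompanyKeyDividends IssueIDDividends date, force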

                    Best,

                    Victor
