difficult to merge cross section_ with panel data set

Jessie Lu

Join Date: Jan 2021
Posts: 7

difficult to merge cross section_ with panel data set

07 Jan 2022, 05:33

Dear Statalist,

Is any way that I could merge cross-section data into panel data set based on their stock code? Here is my sample of data:

Cross-section data: file1

StockCode

Stock Short Name

Industry Code C

000001

PAYH

J66

000002

WKA

K70

Panel data set: file 2

StockCode	Enddate	DirectorNumber
000001	2014-12-31	15
000001	2015-12-31	14
000001	2016-12-31	11
000001	2017-12-31	14
000001	2018-12-31	14
000001	2019-12-31	13
000001	2020-12-31	15
000002	2014-12-31	11
000002	2015-12-31	11
000002	2016-12-31	11
000002	2017-12-31	11
000002	2018-12-31	11
000002	2019-12-31	10
000002	2020-12-31	11

open the file 1 and
Code: merge m:m StockCode using "C:\Users\Jessie\Desktop\Data\file 2.dta". I have tried the merge function using this code which doesn't allow the data to automatically fill up the blank sheet. Appreciate it if anyone could help. And I expect the result below.

turns into:

StockCode	Enddate	DirectorNumber	Stock Short name	Industry Code
1	2014/12/31	15	PAYH	J66
1	2015/12/31	14	PAYH	J66
1	2016/12/31	11	PAYH	J66
1	2017/12/31	14	PAYH	J66
1	2018/12/31	14	PAYH	J66
1	2019/12/31	13	PAYH	J66
1	2020/12/31	15	PAYH	J66
2	2014/12/31	11	WKA	K70
2	2015/12/31	11	WKA	K70
2	2016/12/31	11	WKA	K70
2	2017/12/31	11	WKA	K70
2	2018/12/31	11	WKA	K70
2	2019/12/31	10	WKA	K70
2	2020/12/31	11	WKA	K70

Last edited by Jessie Lu; 07 Jan 2022, 06:13.

Tags: group, merge, panel data

William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

07 Jan 2022, 08:36

First things first. It seems unlikely that you have carefully read the documentation for the merge command. The following is copied word-for-word from the documentation of the merge command in the Stata Data Management Reference Manual PDF included in the Stata installation and accessible from Stata's Help menu, and linked to from the output of help merge.

m:m merges

m:m specifies a many-to-many merge and is a bad idea. In an m:m merge, observations are matched within equal values of the key variable(s), with the first observation being matched to the first; the second, to the second; and so on. If the master and using have an unequal number of observations within the group, then the last observation of the shorter group is used repeatedly to match with subsequent observations of the longer group. Thus m:m merges are dependent on the current sort order—something which should never happen.

Because m:m merges are such a bad idea, we are not going to show you an example. If you think that you need an m:m merge, then you probably need to work with your data so that you can use a 1:m or m:1 merge. Tips for this are given in Troubleshooting m:m merges below.

If you are thinking about using merge m:m, it is a near certainty that at least one of the following is true:

1. Your merge key actually does uniquely identify the observations in one of your data sets, so you can use merge 1:m or merge m:1 or maybe even merge 1:1.

2. You are failing to take account of one or more additional variables in your data set that, combined with the variables you are trying to use as the merge key, uniquely identify the observations in one or both of your data sets, so you can use merge 1:m or merge m:1 or merge 1:1 with the expanded merge key.

3. You are really trying to accomplish what joinby, a different command, does: create in the output dataset every possible combination of an observation from the first dataset and an observation from the second dataset, both having the same key. (SQL users take note! I fell for this myself: to Stata m-to-m does not mean m-by-m.)

4. You actually need to append your datasets rather than merge them.

5. The data sets you are trying to merge are incorrectly configured or contain data errors that need to be fixed.

In your case, it seems to me that merge 1:m is what you need, because each StockCode appears exactly one time in your file1 dataset.
Comment
George Ford

Join Date: Aug 2014

Posts: 3046
#3

07 Jan 2022, 08:41

use joinby.

Code:

joinby StockCode using filename

you can add ", unmatched(master)" and such if you can't to keep the unmatched. Joinby will assign the cross section data to all units of StockCode.
Comment

Announcement

difficult to merge cross section_ with panel data set

Comment

Comment