Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • difficult to merge cross section_ with panel data set

    Dear Statalist,

    Is any way that I could merge cross-section data into panel data set based on their stock code? Here is my sample of data:

    Cross-section data: file1
    StockCode Stock Short Name Industry Code C
    000001 PAYH
    J66
    000002 WKA K70
    Panel data set: file 2
    StockCode Enddate DirectorNumber
    000001 2014-12-31 15
    000001 2015-12-31 14
    000001 2016-12-31 11
    000001 2017-12-31 14
    000001 2018-12-31 14
    000001 2019-12-31 13
    000001 2020-12-31 15
    000002 2014-12-31 11
    000002 2015-12-31 11
    000002 2016-12-31 11
    000002 2017-12-31 11
    000002 2018-12-31 11
    000002 2019-12-31 10
    000002 2020-12-31 11
    open the file 1 and
    Code: merge m:m StockCode using "C:\Users\Jessie\Desktop\Data\file 2.dta". I have tried the merge function using this code which doesn't allow the data to automatically fill up the blank sheet. Appreciate it if anyone could help. And I expect the result below.

    turns into:
    StockCode Enddate DirectorNumber Stock Short name Industry Code
    1 2014/12/31 15 PAYH J66
    1 2015/12/31 14 PAYH J66
    1 2016/12/31 11 PAYH J66
    1 2017/12/31 14 PAYH J66
    1 2018/12/31 14 PAYH J66
    1 2019/12/31 13 PAYH J66
    1 2020/12/31 15 PAYH J66
    2 2014/12/31 11 WKA K70
    2 2015/12/31 11 WKA K70
    2 2016/12/31 11 WKA K70
    2 2017/12/31 11 WKA K70
    2 2018/12/31 11 WKA K70
    2 2019/12/31 10 WKA K70
    2 2020/12/31 11 WKA K70
    Last edited by Jessie Lu; 07 Jan 2022, 06:13.

  • #2
    First things first. It seems unlikely that you have carefully read the documentation for the merge command. The following is copied word-for-word from the documentation of the merge command in the Stata Data Management Reference Manual PDF included in the Stata installation and accessible from Stata's Help menu, and linked to from the output of help merge.

    m:m merges

    m:m specifies a many-to-many merge and is a bad idea. In an m:m merge, observations are matched within equal values of the key variable(s), with the first observation being matched to the first; the second, to the second; and so on. If the master and using have an unequal number of observations within the group, then the last observation of the shorter group is used repeatedly to match with subsequent observations of the longer group. Thus m:m merges are dependent on the current sort order—something which should never happen.

    Because m:m merges are such a bad idea, we are not going to show you an example. If you think that you need an m:m merge, then you probably need to work with your data so that you can use a 1:m or m:1 merge. Tips for this are given in Troubleshooting m:m merges below.
    If you are thinking about using merge m:m, it is a near certainty that at least one of the following is true:

    1. Your merge key actually does uniquely identify the observations in one of your data sets, so you can use merge 1:m or merge m:1 or maybe even merge 1:1.

    2. You are failing to take account of one or more additional variables in your data set that, combined with the variables you are trying to use as the merge key, uniquely identify the observations in one or both of your data sets, so you can use merge 1:m or merge m:1 or merge 1:1 with the expanded merge key.

    3. You are really trying to accomplish what joinby, a different command, does: create in the output dataset every possible combination of an observation from the first dataset and an observation from the second dataset, both having the same key. (SQL users take note! I fell for this myself: to Stata m-to-m does not mean m-by-m.)

    4. You actually need to append your datasets rather than merge them.

    5. The data sets you are trying to merge are incorrectly configured or contain data errors that need to be fixed.

    In your case, it seems to me that merge 1:m is what you need, because each StockCode appears exactly one time in your file1 dataset.

    Comment


    • #3
      use joinby.

      Code:
      joinby StockCode using filename
      you can add ", unmatched(master)" and such if you can't to keep the unmatched. Joinby will assign the cross section data to all units of StockCode.

      Comment

      Working...
      X