Hello Everybody,
This is my first time posting, so I hope I'm following the correct protocol. I'm merging two Excel documents, and there are 5,597 families that have an identification number in each data set. The first data set has general program information with demographics, and the second data set has outcomes. I have successfully merged the data sets by their ID, but I am finding a curious error when I double-check the data. The outcomes data that are merged on are sometimes incorrect, i.e., the occurrence date of an outcome is sometimes wrong. I pasted an example below, and the erroneous dates are in bold blue font. I am using Stata 15.1.
+--------------------------------------------------------------------------------------------------+
|   famid    txstart  term_date   termdate  timeintx  term_reason2  rerefer1  after_f..  after_f~2 |
|--------------------------------------------------------------------------------------------------|
| 1095732  11 Feb 14  10 Apr 14  10 Apr 14        58      Arrested       Yes  26 Feb 14  26 Feb 14 |
| 1090180  04 Apr 13  23 Jul 13  23 Jul 13       110      Arrested       Yes  29 Jun 13  29 Jun 13 |
+--------------------------------------------------------------------------------------------------+
I sorted the data before doing a 1:1 merge, and it seemed successful because I received the following output:
. tab _merge_linkagesfinal
           _merge_final |      Freq.     Percent        Cum.
------------------------+-----------------------------------
         using only (2) |          1        0.02        0.02
            matched (3) |      5,596       99.98      100.00
------------------------+-----------------------------------
                  Total |      5,597      100.00
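In case it helps to see the steps spelled out, here is a minimal sketch of a sort followed by a 1:1 merge on the ID. It assumes both Excel files have already been imported and saved as Stata data sets; the names demographics and outcomes are hypothetical placeholders, and famid is the ID variable from the listing. The exact commands used may have differed.

```stata
* Sketch only: demographics.dta and outcomes.dta are hypothetical file names.
use demographics, clear

* Stop with an error if famid does not uniquely identify observations.
isid famid

* Sort on the ID, then merge one-to-one with the outcomes data.
sort famid
merge 1:1 famid using outcomes

* Tabulate the match results stored in _merge.
tab _merge
```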
I would appreciate any help in figuring out why these dates occasionally do not match the original Excel data sets. Thanks!