I’m trying to understand this bizarre behavior I’m encountering with merge. Unfortunately, the files are too large to attach here, and an excerpt won’t really help because when I restrict the data to subsets (explained below), everything works as intended.
Here’s an example, when I contract the identifier and merge on the test_2 dataset, all but one area_fips merge.
However, without restricting the dataset I get the following result, with virtually nothing merging in.
Notice above how area_fips == "01001" shows no matches. However, restricting the dataset beforehand, everything works as intended
Is this a bug? If not, can someone explain this behavior?
Here’s an example, when I contract the identifier and merge on the test_2 dataset, all but one area_fips merge.
Code:
use "test_1.dta", clear
. contract area_fips
. merge 1:1 area_fips using "test_2.dta"
Result # of obs.
-----------------------------------------
not matched 3,277
from master 3,276 (_merge==1)
from using 1 (_merge==2)
matched 1,449 (_merge==3)
-----------------------------------------
Code:
. use "test_2.dta", replace
. describe
Contains data from test_2.dta
obs: 1,450
vars: 2 2 Mar 2022 08:23
--------------------------------------------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
--------------------------------------------------------------------------------------------------------------------------
area_fips str5 %9s County FIPS code
cbsa long %10.0g
--------------------------------------------------------------------------------------------------------------------------
Sorted by:
. isid area_fips
.
. merge 1:m area_fips using "test_1.dta"
Result # of obs.
-----------------------------------------
not matched 107,624,840
from master 1,448 (_merge==1)
from using 107,623,392 (_merge==2)
matched 2 (_merge==3)
-----------------------------------------
.
. tab _merge if area_fips == "01001"
_merge | Freq. Percent Cum.
------------------------+-----------------------------------
master only (1) | 1 0.00 0.00
using only (2) | 26,816 100.00 100.00
------------------------+-----------------------------------
Total | 26,817 100.00
Code:
use "test_1.dta", clear
. describe
Contains data from test_1.dta
obs: 107,623,394
vars: 2 2 Mar 2022 08:27
--------------------------------------------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
--------------------------------------------------------------------------------------------------------------------------
area_fips str5 %9s
industry_code int %10.0g
--------------------------------------------------------------------------------------------------------------------------
Sorted by: area_fips
. keep if area_fips == "01001"
(107,596,578 observations deleted)
.
. merge m:1 area_fips using "test_2.dta"
Result # of obs.
-----------------------------------------
not matched 1,449
from master 0 (_merge==1)
from using 1,449 (_merge==2)
matched 26,816 (_merge==3)
-----------------------------------------
Code:
. about Stata/SE 16.1 for Mac (Intel 64-bit) Revision 07 Dec 2021 Copyright 1985-2019 StataCorp LLC Total physical memory: 64.00 GB
Comment