I’m working with a dataset that contains information about businesses:
Some convenience stores located at gas stations are listed more than once: once as a gas station and once as a convenience store, likely under different names. The two records would have the same address but different NAICScode values (445120 for convenience stores and 447190 for gas stations). I created the duplicateaddress variable with the following code:
duplicates tag address if NAICScode==445120 | NAICScode==447190, gen(duplicateaddress)
I would like a way to identify companies that have an address duplicate (duplicateaddress>0) but different NAICS codes. Any suggestions?
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str30 company str47 address long NAICScode byte duplicateaddress
"CIRCLE K" "4474 RANDOLPH RD CHARLOTTE" 445120 0
"CIRCLE K" "2326 OWEN DR FAYETTEVILLE" 445120 0
"CIRCLE K" "1627 E MAIN ST LINCOLNTON" 445120 0
"CIRCLE K" "1001 SE CARY PKWY CARY" 445120 0
"CIRCLE K" "379 SHAWNEEHAW AVE BANNER ELK" 445120 0
"CIRCLE K" "4100 WESTERN BLVD RALEIGH" 445120 1
"CIRCLE K" "8191 CLIFFDALE RD FAYETTEVILLE" 445120 0
"CIRCLE K" "2820 LILLINGTON HWY SPRING LAKE" 445120 0
"CIRCLE K" "919 DURHAM RD WAKE FOREST" 445120 0
"CIRCLE K" "7905 SOUTH BLVD CHARLOTTE" 445120 0
"CIRCLE K" "8500 HARPS MILL RD RALEIGH" 445120 0
"CIRCLE K" "8925 PINEVILLE MATTHEWS RD CHARLOTTE" 445120 1
"CIRCLE K" "6400 BURLINGTON RD WHITSETT" 445120 0
"CIRCLE K" "1003-1007 SPRING LN SANFORD" 445120 0
"CIRCLE K" "1612 CONOVER BLVD E CONOVER" 445120 0
"CIRCLE K" "1803 SAFRIET RD STATESVILLE" 445120 0
"CIRCLE K" "3505 KILDAIRE FARM RD CARY" 445120 0
"CIRCLE K" "546 ASHDALE CT CONCORD" 445120 0
"CIRCLE K" "3101 YANCEYVILLE ST GREENSBORO" 445120 0
"CIRCLE K" "903 N NC 16 HWY CONOVER" 445120 0
"CIRCLE K" "8100 POPLAR TENT RD CONCORD" 445120 0
"CIRCLE K" "1764 IRELAND DR FAYETTEVILLE" 445120 0
"CIRCLE K" "3458 N MAIN ST HOPE MILLS" 445120 0
"CIRCLE K" "5981 UNIVERSITY PKWY WINSTON SALEM" 445120 0
"CIRCLE K" "3308 APEX HWY # 55 DURHAM" 445120 0
"CIRCLE K" "501 W 3RD ST PEMBROKE" 445120 0
"CIRCLE K" "5018 SUNSET RD CHARLOTTE" 445120 0
"CIRCLE K" "220 CHARLOTTETOWNE AVE CHARLOTTE" 445120 0
"CIRCLE K" "3721 TRYON RD RALEIGH" 445120 0
"CIRCLE K" "1800 N CROATAN HWY KILL DEVIL HILLS" 445120 0
"CIRCLE K" "3424 MATTHEWS MINT HILL RD MATTHEWS" 445120 0
"CIRCLE K" "1830 N WESLEYAN BLVD ROCKY MOUNT" 445120 1
"CIRCLE K" "217 SALISBURY ST ROCKWELL" 445120 0
"CIRCLE K" "2195 EVANS ST GREENVILLE" 445120 0
"CIRCLE K" "1145 COPPERFIELD BLVD NE CONCORD" 445120 0
"CIRCLE K" "3289 AVENT FERRY RD RALEIGH" 445120 1
"CIRCLE K" "1032 N HARRISON AVE CARY" 445120 1
"CIRCLE K" "1115 RANDOLPH ST THOMASVILLE" 445120 0
"CIRCLE K" "1630 SUNSET AVE ROCKY MOUNT" 445120 0
"CIRCLE K" "2606 CAROLINA COMMERCE DR GOLDSBORO" 445120 0
"CIRCLE K" "700 JONESTOWN RD WINSTON SALEM" 445120 0
"CIRCLE K" "225 CLEVELAND AVE KINGS MOUNTAIN" 445120 0
"CIRCLE K" "2200 S TRYON ST CHARLOTTE" 445120 0
"CIRCLE K" "7910 RAEFORD RD FAYETTEVILLE" 445120 0
"CIRCLE K" "2494 HOPE MILLS RD HOPE MILLS" 445120 0
"CIRCLE K" "2372 ZEPHYR RD DOBSON" 445120 0
"CIRCLE K" "429 E WEATHERSPOON ST SANFORD" 445120 0
"CIRCLE K" "3602 REHOBETH CHURCH RD GREENSBORO" 445120 0
"CIRCLE K" "1818 N BERKELEY BLVD GOLDSBORO" 445120 0
"CIRCLE K" "2200 FLEMING RD GREENSBORO" 445120 0
"CIRCLE K" "765 VALLEY RD MOCKSVILLE" 445120 0
"CIRCLE K" "2877 WARD BLVD WILSON" 445120 0
"CIRCLE K" "3739 CAROLINA BEACH RD WILMINGTON" 445120 0
"CIRCLE K" "421 TYVOLA RD CHARLOTTE" 445120 0
"CIRCLE K" "100 W WOODCROFT PKWY DURHAM" 445120 0
"CIRCLE K" "15620 DON LOCHMAN LN CHARLOTTE" 445120 0
"CIRCLE K" "2500 NEW BERN HWY JACKSONVILLE" 445120 0
"CIRCLE K" "10000 N TRYON ST CHARLOTTE" 445120 0
"CIRCLE K" "621 GREEN VALLEY RD GREENSBORO" 445120 0
"CIRCLE K" "6519 BROOKSHIRE BLVD CHARLOTTE" 445120 0
"CIRCLE K" "6648 GORDON RD WILMINGTON" 445120 0
"CIRCLE K" "199 PINE VALLEY RD JACKSONVILLE" 445120 0
"CIRCLE K" "1300 W SUGAR CREEK RD CHARLOTTE" 445120 0
"CIRCLE K" "4050 RIVER POINTE PL HIGH POINT" 445120 0
"CIRCLE K" "3503 WEDDINGTON RD MONROE" 445120 0
"CIRCLE K" "110 W HAGGARD AVE ELON" 445120 0
"CIRCLE K" "3001 PLEASANT GARDEN RD GREENSBORO" 445120 0
"CIRCLE K" "2434 S NEW HOPE RD GASTONIA" 445120 0
"CIRCLE K" "1910 N MAIN ST MT AIRY" 445120 0
"CIRCLE K" "1218 STATE FARM RD BOONE" 445120 0
"CIRCLE K" "2350 US HIGHWAY 70 SE HICKORY" 445120 0
"CIRCLE K" "112 ISLAND FORD RD MAIDEN" 445120 0
"CIRCLE K" "1301 ROBESON ST FAYETTEVILLE" 445120 0
"CIRCLE K" "4000 MEMORIAL DR WINTERVILLE" 445120 0
"CIRCLE K" "1101 WALNUT ST CARY" 445120 0
"CIRCLE K" "2227 ROCKFORD ST MT AIRY" 445120 0
"CIRCLE K" "4400 LOUISBURG RD RALEIGH" 445120 0
"CIRCLE K" "4330 LOUISBURG RD RALEIGH" 445120 0
"CIRCLE K" "3053 CASTLE HAYNE RD CASTLE HAYNE" 445120 0
"CIRCLE K" "6759 CAROLINA BEACH RD WILMINGTON" 445120 0
"CIRCLE K" "812 S HORNER BLVD SANFORD" 445120 0
"CIRCLE K" "101 E KING ST KING" 445120 0
"CIRCLE K" "4525 MAIN ST SHALLOTTE" 445120 0
"CIRCLE K" "807 CONOVER BLVD W CONOVER" 445120 0
"CIRCLE K" "11399 US 15 501 N CHAPEL HILL" 445120 0
"CIRCLE K" "9501 UNIVERSITY CITY BLVD CHARLOTTE" 445120 0
"CIRCLE K" "106 HICKORY TREE RD WINSTON SALEM" 445120 0
"CIRCLE K" "2105 TEN TEN RD APEX" 445120 0
"CIRCLE K" "144 CEDAR CREEK RD FAYETTEVILLE" 445120 0
"CIRCLE K" "873 LONG BRANCH RD DUNN" 445120 1
"CIRCLE K" "1908 US HIGHWAY 117 S GOLDSBORO" 445120 0
"CIRCLE K" "1550 W GATE CITY BLVD GREENSBORO" 445120 0
"CIRCLE K" "2531 NORTH CAROLINA HIGHWAY 87 CAMERON" 445120 0
"CIRCLE K" "1135 PAMALEE DR FAYETTEVILLE" 445120 0
"CIRCLE K" "701 N GRAHAM ST CHARLOTTE" 445120 0
"CIRCLE K" "104 CHERAW RD HAMLET" 445120 0
"CIRCLE K" "327 CHICKEN FOOT RD HOPE MILLS" 445120 0
"CIRCLE K" "3122 FORT BRAGG RD FAYETTEVILLE" 445120 0
"CIRCLE K" "3102 BRAGG BLVD FAYETTEVILLE" 445120 0
"CIRCLE K" "601 E SOUTH MAIN ST WAXHAW" 445120 0
end
label values duplicateaddress duplicates
label def duplicates 0 "No", modify
label def duplicates 1 "Yes, 2 occurrences total", modify
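One possible direction, offered only as a rough sketch and not tested on the full dataset: within each address, sort the records by NAICScode and compare the first and last code. If they differ, the address appears under both codes. The variable names storeorgas and mixedNAICS are hypothetical and not part of the data above, and the sketch assumes that address strings match exactly across the paired records.

```stata
* Sketch only: storeorgas and mixedNAICS are hypothetical variable names.

* Mark the records that carry one of the two NAICS codes of interest
gen byte storeorgas = inlist(NAICScode, 445120, 447190)

* Within each address (among marked records only), sort by NAICScode.
* If the first and last codes differ, both codes occur at that address.
bysort storeorgas address (NAICScode): ///
    gen byte mixedNAICS = storeorgas & (NAICScode[1] != NAICScode[_N])

* Inspect the flagged records, grouped by address
list company address NAICScode if mixedNAICS, sepby(address)
```

Putting storeorgas in the by-group keeps records with other NAICS codes from affecting the first/last comparison. Addresses that differ only in spacing or abbreviations would not be matched by this approach.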