Hello Everybody,
This is my second time posting, so I’m going to try to add additional information to see if it helps. I have an existing dataset of 5,597 families that participated in a treatment program, and I noticed that some of their termination dates were missing. Thus, I received an updated dataset in Excel to fill in the missing termination dates, which I converted to a Stata dataset using Stattransfer version 14.
The first time I tried to merge on the additional dataset, it did not recognize the identifier although it was a numeric variable. Thus I used the following code, which helped me match the identifiers:
tostring fpid, replace format (%07.0f)
encode fpid, g (fpid2)
rename fpid fpid_original
rename fpid2 fpid
sort fpid
save
Below is the dataex from this new dataset
I successfully merged this dataset by the fpid to the new one, and the merged dataset is below. However, I am finding a curious error when I double-check the data. Some of the merged dates are incorrect i.e. the start dates do not match and should match every time. The two variables that should match are fpstart & FP_REFERRAL_START. I pasted the dataex below and the erroneous variables are in bold font; I am using Stata 15.1.
Below is the dataex from this merged dataset
I would appreciate any help in figuring out why these start dates do not match. To see the error, you will have to compare FP_REFERRAL_START in the old dataset and the merged one. I’m wondering if it had something to do with the initial code I used to convert the fpid from string back to numeric. Thanks!
This is my second time posting, so I’m going to try to add additional information to see if it helps. I have an existing dataset of 5,597 families that participated in a treatment program, and I noticed that some of their termination dates were missing. Thus, I received an updated dataset in Excel to fill in the missing termination dates, which I converted to a Stata dataset using Stattransfer version 14.
The first time I tried to merge on the additional dataset, it did not recognize the identifier although it was a numeric variable. Thus I used the following code, which helped me match the identifiers:
tostring fpid, replace format (%07.0f)
encode fpid, g (fpid2)
rename fpid fpid_original
rename fpid2 fpid
sort fpid
save
Below is the dataex from this new dataset
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str7 fpid_original long(FP_REFERRAL_START FP_REFERRAL_TERM FP_REFERRAL_END fpid) "1087671" 19361 19509 19541 1 "1087672" 19361 19395 19541 2 "1087674" 19361 19416 19541 3 "1087698" 19360 19524 19539 4 "1087703" 19360 19477 19540 5 "1087704" 19366 19473 19534 6 "1087722" 19365 19438 19545 7 "1087756" 19361 19446 19541 8 "1087759" 19361 19386 19541 9 "1087760" 19361 19375 19541 10 "1087762" 19361 19477 19541 11 "1087765" 19366 19414 19540 12 "1087799" 19360 19526 19540 13 "1087813" 19366 19730 19730 14 "1087814" 19366 19502 19546 15 "1087818" 19365 19754 19788 16 "1087822" 19365 19563 19607 17 "1087823" 19365 19458 19545 18 "1087837" 19361 19607 19633 19 "1087839" 19372 19446 19542 20 end format %tdD_m_Y FP_REFERRAL_START format %tdD_m_Y FP_REFERRAL_TERM format %tdD_m_Y FP_REFERRAL_END label values fpid fpid2 label def fpid2 1 "1087671", modify label def fpid2 2 "1087672", modify label def fpid2 3 "1087674", modify label def fpid2 4 "1087698", modify label def fpid2 5 "1087703", modify label def fpid2 6 "1087704", modify label def fpid2 7 "1087722", modify label def fpid2 8 "1087756", modify label def fpid2 9 "1087759", modify label def fpid2 10 "1087760", modify label def fpid2 11 "1087762", modify label def fpid2 12 "1087765", modify label def fpid2 13 "1087799", modify label def fpid2 14 "1087813", modify label def fpid2 15 "1087814", modify label def fpid2 16 "1087818", modify label def fpid2 17 "1087822", modify label def fpid2 18 "1087823", modify label def fpid2 19 "1087837", modify label def fpid2 20 "1087839", modify
I successfully merged this dataset by the fpid to the new one, and the merged dataset is below. However, I am finding a curious error when I double-check the data. Some of the merged dates are incorrect i.e. the start dates do not match and should match every time. The two variables that should match are fpstart & FP_REFERRAL_START. I pasted the dataex below and the erroneous variables are in bold font; I am using Stata 15.1.
Below is the dataex from this merged dataset
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long(fpstart term_date fpid FP_REFERRAL_START FP_REFERRAL_TERM) 19361 19509 1 19361 19509 19361 19395 2 19361 19395 19361 19416 3 19361 19416 19360 19524 4 19360 19524 19360 19477 5 19360 19477 19365 19438 6 19366 19473 19361 19477 7 19365 19438 19360 19526 8 19361 19446 19366 19730 9 19361 19386 19366 19502 10 19361 19375 19365 19754 11 19361 19477 19365 19563 12 19366 19414 19365 19458 13 19360 19526 19361 19607 14 19366 19730 19372 19446 15 19366 19502 19360 19477 16 19365 19754 19372 19558 17 19365 19563 19372 19537 18 19365 19458 19362 19705 19 19361 19607 19372 19409 20 19372 19446 end format %tdD_m_Y fpstart format %tdD_m_Y term_date format %tdD_m_Y FP_REFERRAL_START format %tdD_m_Y FP_REFERRAL_TERM label values fpid fpid label def fpid 1 "1087671", modify label def fpid 2 "1087672", modify label def fpid 3 "1087674", modify label def fpid 4 "1087698", modify label def fpid 5 "1087703", modify label def fpid 6 "1087722", modify label def fpid 7 "1087762", modify label def fpid 8 "1087799", modify label def fpid 9 "1087813", modify label def fpid 10 "1087814", modify label def fpid 11 "1087818", modify label def fpid 12 "1087822", modify label def fpid 13 "1087823", modify label def fpid 14 "1087837", modify label def fpid 15 "1087839", modify label def fpid 16 "1087843", modify label def fpid 17 "1087844", modify label def fpid 18 "1087846", modify label def fpid 19 "1087847", modify label def fpid 20 "1087848", modify
I would appreciate any help in figuring out why these start dates do not match. To see the error, you will have to compare FP_REFERRAL_START in the old dataset and the merged one. I’m wondering if it had something to do with the initial code I used to convert the fpid from string back to numeric. Thanks!
Comment