Hi Statalist,
I want to merge two datasets that I call parent data and child data in this post. Parent data has 208 observations and child data has 2623 observations.
Both datasets have a variable named "npf" containing French names and a variable named "id" that I created and that is my key variable for merging.
The _merge variable displays the following result:
. tab _merge

                       _merge |      Freq.     Percent        Cum.
------------------------------+-----------------------------------
           only in using data |         14        0.53        0.53
both in master and using data |      2,609       99.47      100.00
------------------------------+-----------------------------------
                        Total |      2,623      100.00
Inspecting the child data (my using data) with "tab npf, nolabel" shows that the values run from 1 to 209 (instead of 208). In particular, for the npf "cesar", say, I have two values with their respective frequencies:
181 césar 21
182 cesar 1
I tried to remove the value label "182", but this changes the number of observations in the child data. I also copied and pasted the original data into a new Excel file, but the problem remains.
What I want to get is the name written without any accent.
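To make the goal concrete, here is a minimal sketch of the result I am after. It assumes that npf is a numeric variable with value labels (as the "tab npf, nolabel" output suggests) and that Stata 14 or later is available for the Unicode string functions. The variable names npf_str and npf_clean are hypothetical, and this is only an illustration, not necessarily the right approach.

```stata
* Sketch only: assumes npf is numeric with value labels and Stata 14+.
* npf_str and npf_clean are hypothetical names used for illustration.

* Turn the labelled numeric variable back into plain text
decode npf, generate(npf_str)

* Decompose accented letters (NFD), then drop everything that is not ASCII,
* so that "césar" becomes "cesar"
generate npf_clean = ustrto(ustrnormalize(npf_str, "nfd"), "ascii", 2)

* Remove leading/trailing blanks and harmonize case
replace npf_clean = lower(strtrim(npf_clean))

* Check that "césar" and "cesar" now fall into one category
tabulate npf_clean
```

If this were applied in the same way to both the parent and the child data before merging, the two spellings should no longer be treated as different names.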
How can I fix this problem?
I also inspected the parent data with "tab npf, nol", which gives 224 value labels although the number of observations remains 208. As I understand it, these extra value labels arose from changes I made to the names in the variable "npf" in the original Excel file of the parent data in order to remove accented characters. It seems that Stata keeps a memory of all the changes I made.
Any suggestions for good practice to avoid this kind of problem in the future?
Thank you!
Chwen Chwen