Hello,
I'm trying to use the infix command to import a fixed-width text file that contains Japanese.
The original dataset was in SHIFT_JIS encoding (which is "ibm-943_P15A-2003" based on the page from "help encodings"), so I converted the file to UTF-8 using unicode translate:
unicode encoding set ibm-943_P15A-2003
unicode translate "h3jcho19.dat"
I then confirmed with my text editor (VS Code) that the Japanese characters appear correctly whenever I view the file with a UTF-8 encoding.
Then I used the infix command to import the file. However, many of the imported characters are incorrect and appear as squares with question marks (�). The infix help file says "If string data are encoded as ASCII or UTF-8, they will be imported correctly." So why am I unable to get the correct Japanese characters?
Any suggestions?
Thanks in advance for the help. As an FYI, I have a Mac and I am using Stata 18.
(As an aside, I also tried converting the original file to UTF-8 using Terminal's iconv command. Once again, I confirmed that my text editor can read the new file, but Stata could not read this file correctly either.)