Unicode decode error while using Data.getAsDict in sfi.Data

Francisco Leal Augusto

Join Date: Jul 2023

Posts: 7
#1

27 Jul 2023, 04:28

Dear Stata Forum,

While trying to load data from a Stata dataset into a Python dictionary with the Data.getAsDict function from the sfi module, I got the following error:

Code:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 1: invalid continuation byte

The code I am currently using is:

Code:

python
from sfi import Data
import numpy as np
import pandas as pd
dataraw = Data.getAsDict(None, valuelabel=False, missingval=np.nan)
end

I am not very familiar with this topic, but I suspect the error is related to the characters in a string variable in the Stata dataset. Running this code with the auto dataset works fine, but it raises the error with my dataset. A sample of the dataset is shown below:

Variable 1 [long]   Variable 2 [double]   Variable 3 [str12]
20211231            45411111              NIF / NIPC
20211231            45411112              NIF / NIPC
20211231            45411113              NIF / NIPC

Please note that if I limit the dataset to Variable 1 and Variable 2, no error is triggered.

I am currently using Stata 17 on Windows.

How may I solve this issue?

Please let me know if there is any other information I can provide that may help.

Thanks in advance,
Francisco