  • Getting column names when using Python in Stata

    I am learning how to use the -sfi.Data- class when using Python in Stata. When I import the dataset from Stata into Python, the columns get a numeric index instead of keeping their variable names. I guess this is because -sfi.Data.get()- returns a plain list, which carries no variable names. I propose one solution below, but does anyone know of an easier/built-in method?

    Code:
    version 16.0
    python:
    stata: sysuse auto, clear
    
    from sfi import Data
    import pandas as pd
    
    # Load dataset from Stata to Python
    df_data = pd.DataFrame(Data.get())
    
    # Read column names from Stata and assign to dataframe columns
    df_data.columns = [
        Data.getVarName(column_index) for column_index in range(len(df_data.columns))
    ]
    
    print(df_data.columns)
    end

  • #2
    Hi Karl,

    Try -getAsDict()- instead. See below for an example.

    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . python
    ----------------------------------------------- python (type end to exit) --------------------------------------------------
    >>> import sfi
    >>> import pandas
    >>> pandas.DataFrame(sfi.Data.get()).head()
                  0     1   2              3    4   ...   7   8    9     10  11
    0    AMC Concord  4099  22   3.000000e+00  2.5  ...  186  40  121  3.58   0
    1      AMC Pacer  4749  17   3.000000e+00  3.0  ...  173  40  258  2.53   0
    2     AMC Spirit  3799  22  8.988466e+307  3.0  ...  168  35  121  3.08   0
    3  Buick Century  4816  20   3.000000e+00  4.5  ...  196  40  196  2.93   0
    4  Buick Electra  7827  15   4.000000e+00  4.0  ...  222  43  350  2.41   0
    
    [5 rows x 12 columns]
    >>> pandas.DataFrame(sfi.Data.getAsDict()).head()
                make  price  mpg  ...  displacement  gear_ratio  foreign
    0    AMC Concord   4099   22  ...           121        3.58        0
    1      AMC Pacer   4749   17  ...           258        2.53        0
    2     AMC Spirit   3799   22  ...           121        3.08        0
    3  Buick Century   4816   20  ...           196        2.93        0
    4  Buick Electra   7827   15  ...           350        2.41        0
    
    [5 rows x 12 columns]
    >>> end
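
The difference between the two calls can be sketched in plain Python, outside Stata. This is only an illustration under stated assumptions: `rows` and `names` are hand-typed stand-ins for what -Data.get()- and -Data.getVarName()- return (values copied from the listing above), and `rows_to_named_columns` is a hypothetical helper, not part of -sfi-. It shows the name-keyed shape that the -getAsDict()- output above suggests, which is why pandas can label the columns.

```python
# Stand-in for what Data.get() hands back here: one inner list per
# observation, with no variable names attached.
rows = [
    ["AMC Concord", 4099, 22],
    ["AMC Pacer", 4749, 17],
    ["AMC Spirit", 3799, 22],
]

# Stand-in for the names Data.getVarName(i) would return for i = 0, 1, 2.
names = ["make", "price", "mpg"]


def rows_to_named_columns(rows, names):
    """Hypothetical helper: turn row-wise data into a dict of name -> column."""
    columns = zip(*rows)  # transpose: one tuple per variable
    return {name: list(column) for name, column in zip(names, columns)}


named = rows_to_named_columns(rows, names)
print(named["price"])  # values for the variable called "price"
```

Inside Stata, -getAsDict()- makes a helper like this unnecessary; the sketch is only meant to show why a name-keyed structure keeps the variable names while a bare list of rows cannot.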
    ------
    Joseph Nicholas Luchman, Ph.D., PStat® (American Statistical Association)
    ----
    Research Fellow
    Fors Marsh

    ----
    Version 18.0 MP


    • #3
      Excellent, Joseph.

      I overlooked that method when reading the documentation, but it looks like what I need. I haven't tested it yet, but thank you for your reply!

      //KW
