Reading JSON files into Stata

Gordon Hughes

Join Date: Mar 2023

Posts: 28
#1

Reading JSON files into Stata

20 Apr 2024, 08:26

This is partly a request to Stata and partly a request for guidance. There are an increasing number of databases that fill data requests by exporting JSON files. Unfortunately, JSON is a long way from being a standardised format, though there are modules in Python and other languages that identify and handle various JSON formats. However, it would be really nice if Stata could write the equivalent of import delimited to handle different JSON formats.

For now, there is the user written procedure insheetjson.ado but I am unable to find a way of getting it to import one kind of JSON file format that I encounter frequently. To explain, the conventional format for JSON files is a set of nested tables or n-tuples as follows

Code:

{<headers> , "tablename":[{"vara":x1, "varb":y1, "varc":z1, ...}{"vara":x2,"varb":y2,"varc":z2, ...}{ ... } ]

In effect, each row in the table is presented as an n-tuple and the input process is to identify the table and then read each row as a separate observation in the Stata datasheet. So far so good.

However, there is another variant of JSON files in which tables are presented one column at time as in:

Code:

<headers>, "tablename":[{"vara":[x1,x2,x3, ...],"varb":[y1,y2,y3,...],"varc:[z1,z2,z3, ...], ...}]

which is much messier to handle. Think of the standard variant as row-wise format and the alternative as column-wise format. I haven't seen this terminology used but it seems to be the logical way of thinking about different JSON presentations of tabular data. Generating column-wise formatted data seems a bit strange to me but there may be reasons why database designers have chosen to organise their data in this way.

I am able to use insheetjson.ado to read row-wise JSON files but it gets column-wise formats completely wrong. Maybe I don't understand the options fully, but there is nothing that seems to allow reading column-wise formatted data.

Any suggestions?
Tags: None

Bjarte Aagnes

Join Date: Apr 2014
Posts: 783

21 Apr 2024, 04:59

While waiting, fwiw:

Code:

python:

import pandas as pd
import json

with open('data.json', 'r', encoding='utf-8') as file:
    json_data = json.load(file)
    
df = pd.DataFrame(json_data["tablename"][0])
    
df.to_stata('data_from_json.dta', version=118)

end

Code:

use data_from_json.dta

. list 

     +----------------------------+
     | index   vara   varb   varc |
     |----------------------------|
  1. |     0     x1     y1     z1 |
  2. |     1     x2     y2     z2 |
  3. |     2     x3     y3     z3 |
     +----------------------------+

. type data.json
{
    "headers": ["vara", "varb", "varc"],
    "tablename": [
        {"vara": ["x1", "x2", "x3"], "varb": ["y1", "y2", "y3"], "varc": ["z1", "z2", "z3"]}
    ]
}

Announcement

Reading JSON files into Stata

Comment