JSON File into Stata

Paul Olz

Join Date: Nov 2022
Posts: 27

JSON File into Stata

14 Nov 2023, 05:03

Dear all,

I face a problem using a .json file (https://www.sec.gov/files/company_tickers.json) in Stata. Using the insheetjson command gets me the following problem with this code.

Any help would be appriciated!

Code:

Code:

gen str10 cik_str = ""
gen str10 ticker = ""
gen str50 title = ""

insheetjson cik_str ticker title using "cik_ticker_company.json", columns("cik_str" "ticker" "title") tableselector("*") flatten replace

Response:

Code:

libjson::parse: unexpected character '9' at position 68609 forced error #-22
VERSAL DISPLAY CORP \\PA\\"},"968":{"cik_str":6201,"ticker":
                              ^  parse error here!
  IONAL INC /DE/"},"967":{"cik_str":1005284,"ticker":"OLED","title":"UNIVERSAL DISPLAY CORP \\PA\\"},"968":{"cik_
> str":6201,"ticker":
 Warning: No response from source?!?

Tags: insheetjson, JSON

Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#2

14 Nov 2023, 22:44

If you can't get that to work for you, then you can try a Python approach if you have it set up. For example, the following code works. (Do-file and log file attached if you want to explore it further.)

Code:

version 18.0 clear * copy "https://www.sec.gov/files/company_tickers.json" "company_tickers.json" python: import json import pandas as pd from sfi import Data f = open("company_tickers.json", "r") data_frame = pd.DataFrame(json.load(f)).transpose() f.close() Data.setObsTotal(len(data_frame)) Data.addVarLong("cik_str") Data.addVarStr("ticker", 10) Data.addVarStr("title", 100) Data.store("cik_str", None, data_frame["cik_str"], None) Data.store("ticker", None, data_frame["ticker"], None) Data.store("title", None, data_frame["title"], None) end compress list in 1/5, noobs count exit

Unfortunately, for the moment in Stata proper you can't use context manager or anything in Python that requires indentation (which is just about everything), and so it's a little clunky if you want to run it from Stata's Python environment as above. But you can put the Python code (with indentation) in a script file and call it with python script.
Attached Files

company_tickers.do (720 Bytes, 1 view)

company_tickers.smcl (1.6 KB, 1 view)
1 like
Comment

Luis Pecht

Join Date: May 2017
Posts: 151

15 Nov 2023, 03:27

In the meanwhile, an alternative, inspired in

https://www.stata.com/meeting/chicag..._matsuoka.pptx

would be:

Code:

clear all
tempfile f1 f2
copy "company_tickers.json" `f1'
hexdump `f1', analyze
filefilter `f1' `f2', from ("\r\n") to ("")
filefilter `f2' `f1', replace from ("{") to ("\r\n{")
import delimited `f1', delimit("@@@", asstring) clear
split v1  , p(`",""') gen (x)
list x* in 1/3
replace x1 = subinstr( x1 , `"{"cik_str":"',  "", .)
replace x2 = subinstr( x2 , `"ticker":"',  "", .)
replace x3 = subinstr( x3 , `"title":"',  "", .)
replace x3 = subinstr( x3 , `"}"',  "", .)
replace x4 = subinstr( x4 , `"":"',  "", .)
list x* in 1/10

     +----------------------------------------------------------------------+
     |      x1        x2                                            x3   x4 |
     |----------------------------------------------------------------------|
  1. |   {"0":                                                              |
  2. |  320193    "AAPL"                                  "Apple Inc."    1 |
  3. |  789019    "MSFT"                              "MICROSOFT CORP"    2 |
  4. | 1067983   "BRK-B"                      "BERKSHIRE HATHAWAY INC"    3 |
  5. |  731766     "UNH"                      "UNITEDHEALTH GROUP INC"    4 |
     |----------------------------------------------------------------------|
  6. |  200406     "JNJ"                           "JOHNSON & JOHNSON"    5 |
  7. |   34088     "XOM"                            "EXXON MOBIL CORP"    6 |
  8. | 1045810    "NVDA"                                 "NVIDIA CORP"    7 |
  9. | 1046179     "TSM"   "TAIWAN SEMICONDUCTOR MANUFACTURING CO LTD"    8 |
 10. |  104169     "WMT"                                "Walmart Inc."    9 |
     +----------------------------------------------------------------------+

Last edited by Luis Pecht; 15 Nov 2023, 03:58. Reason: remembered,also, using jsonio, somewhere

Comment

Luis Pecht

Join Date: May 2017
Posts: 151

15 Nov 2023, 03:57

or, using jsonio (from SSC)

Code:

jsonio kv, file("company_tickers.json") nourl elem("[0-9]+")
split key  , p("/") gen (x)
drop x1 key
reshape wide value,i(x2) j(x3) string
. list in -10/-1

       +---------------------------------------------------------------------+
       |   x2   valuec~r   valuet~r                               valuetitle |
       |---------------------------------------------------------------------|
11637. | 9990    1856948      CHEAW                  Chenghe Acquisition Co. |
11638. | 9991    1878074      BWAQU              Blue World Acquisition Corp |
11639. | 9992    1878074      BWAQR              Blue World Acquisition Corp |
11640. | 9993    1871983      ANGHW                              Anghami Inc |
11641. | 9994    1870404      PBAXW        PHOENIX BIOTECH ACQUISITION CORP. |
       |---------------------------------------------------------------------|
11642. | 9995    1870404      PBAXU        PHOENIX BIOTECH ACQUISITION CORP. |
11643. | 9996    1842937      HCVIW     Hennessy Capital Investment Corp. VI |
11644. | 9997    1842937      HCVIU     Hennessy Capital Investment Corp. VI |
11645. | 9998    1843762      EQRXW                               EQRx, Inc. |
11646. | 9999    1845550      APTMU   Alpha Partners Technology Merger Corp. |
       +---------------------------------------------------------------------+

Comment

Paul Olz

Join Date: Nov 2022

Posts: 27
#5

15 Nov 2023, 04:47

Dear Joseph and Luis, thank you very much for your effort and the corresponding solutions!

Since the second approach from Luis seems to be the shortest and (from my point of view) the most logical one, I'll go with this one.
Comment

Luis Pecht

Join Date: May 2017
Posts: 151

15 Nov 2023, 05:02

Actually, insheetjson (from SSC) presents two problems here:

1) you will need to replace any "\" to "", externally, before parse it;
2) this JSON structure embodies 11645 different objects, sequentially numbered, not a array, so you would need to loop thru it;

Code:

. gen str10 cik_str = ""
. gen str10 ticker = ""
. gen str50 title = ""
. insheetjson cik_str ticker title using "company_tickers.json", columns("cik_str" "ticker" "title") tableselector("11645") replace
 1 observations updated/written.

. list

     +---------------------------------------+
     | cik_str   ticker                title |
     |---------------------------------------|
  1. |  753308   NEE-PP   NEXTERA ENERGY INC |
     +---------------------------------------+

Comment

Gordon Hughes

Join Date: Mar 2023

Posts: 28
#7

06 Aug 2024, 03:43

While insheetjson, jsonio and Python occasionally work, my experience is that they are more trouble than they are worth for any but the simplest JSON structures. The problem is files that are nested more than one level deep, especially when the data is not organised as a table, even if the core data is tabular. For some reason jsonio seems to generate Java errors in many cases (using StataNow 18.5) and similarly I get Python errors when trying to follow Joseph's example. I find that insheetjson only works when the JSON structure is strictly tabular. That is as advertised but what seems to be tabular data is often organised differently in JSON files. I have no doubt that part of the blame lies with database managers who generate excessively complicated JSON structures.

From experience, my preferred option is to convert any JSON file that I want to import into Stata to CSV format first and then use import delimited ... which is easy to control. [Of course, things would be so much easier if data providers would export CSV files but in some cases they appear to deliberately avoid that option. Nested tables need only some thought to export as CSV files but data managers prefer you to do the work rather than them.] How you do this will vary from whether you are carrying out an occasional conversion of small files, in which case online programs are an easy option. Since I have moderately complex and recurrent data extraction programs, I prefer to use a command line utility that can be called from Stata. These vary from the simple to the very powerful (but costly) such as jq.

My current utility of choice is provided by Gunamoi Software (https://www.gunamoi.com.au/soft/savejson2csv/index.html) which offers a program called SaveJson2Csv as a menu driven GUI or a command line interface. I have no connection with them. The program is freeware but with donations invited. No doubt, at some point I will get dissatisfied with it but with the amount of time that I have spent trying to get other procedures working, I can strongly recommend it.
Comment

Announcement

JSON File into Stata

Comment

Comment

Comment

Comment

Comment

Comment