Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Trouble Importing JSON Data into Stata

    Hi all,

    I'm trying to import data in JSON format into Stata to construct a time series and I've looked at the "insheetjson" guide but haven't been able to figure it out.

    Say I want to import from the following url: http://stats.grok.se/json/en/201308/Bitcoin

    My end goal is to have one variable where each row is a day (2013-08-01, 2013-08-02, etc), and one variable where each row is the number of page views.

    So far I have:

    gen str20 date=""
    gen str20 views=""
    insheetjson date views using "http://stats.grok.se/json/en/201308/Bitcoin", showresponse

    I get this back:

    Response from server:
    {
    "daily_views" :
    {
    "2013-08-26" : "7351",
    "2013-08-27" : "9386",
    "2013-08-28" : "9560",
    "2013-08-19" : "10826",
    "2013-08-18" : "6747",
    "2013-08-31" : "5259",
    "2013-08-29" : "8280",
    "2013-08-15" : "13482",
    "2013-08-14" : "15454",
    "2013-08-17" : "7780",
    "2013-08-16" : "11230",
    "2013-08-11" : "6069",
    "2013-08-10" : "5136",
    "2013-08-13" : "11157",
    "2013-08-12" : "10290",
    "2013-08-30" : "7050",
    "2013-08-20" : "11569",
    "2013-08-21" : "11741",
    "2013-08-22" : "10214",
    "2013-08-23" : "8863",
    "2013-08-24" : "6216",
    "2013-08-25" : "5688",
    "2013-08-08" : "11422",
    "2013-08-09" : "8161",
    "2013-08-06" : "6994",
    "2013-08-07" : "7421",
    "2013-08-04" : "5322",
    "2013-08-05" : "8729",
    "2013-08-02" : "6321",
    "2013-08-03" : "5310",
    "2013-08-01" : "6973"
    },
    "project" : "en",
    "month" : "201308",
    "rank" : "89",
    "title" : "Bitcoin"

    I'm not sure what to do from here. I want "daily_views" to correspond to all of the numbers denoting views, and I don't want 31 separate variables for each day, which is what I think I'd get if I used any of the dates as column names/selectors. If I add

    tableselector("daily_views")

    it tells me that "possible name candidates" for the columns are the dates ("2013-08-26", for instance). But when I try adding

    col("2013-08-26")

    it gives me an error that says:

    injson_sheet() 3200 conformability error
    <istmt>: - function returned error.

    I've looked for solutions to this error online, but if I'm not mistaken, the error refers to some sort of incorrect matrix multiplication in mata where the matrices are not conformable...so I'm confused.

    Thanks in advance! I really appreciate any help.

    Best,

    Daniel
    Last edited by Daniel Tartakovsky; 28 May 2014, 17:21.

  • #2
    Daniel,

    For what it's worth, here is how you can read in one row worth of data:

    Code:
    gen str20 views=""
    insheetjson views using "http://stats.grok.se/json/en/201308/Bitcoin", col("daily_views:2013-08-26")
    Unfortunately, I think you are stuck with querying each day's data one at a time. It look likes, however, you might be able to do a loop for this purpose:

    Code:
    gen str20 views=""
    forvalues day=0/30 {
       local mdy=mdy(8,1,2013)+`day'
       local key : display "daily_views:" %tdCCYY-NN-DD `mdy'
       local obs = `day'+1
       insheetjson views using "http://stats.grok.se/json/en/201308/Bitcoin", col("`key'") offset(`day') replace
    }
    gen date=mdy(8,1,2013)+_n-1
    format date %td
    Of course, this can be expanded to get multiple months and to account for varying numbers of days per month.

    Regards,
    Joe

    Comment


    • #3
      Daniel,

      Thanks for your post, I am interested in this subject. On number of occasions I had to revert to R in order to import linked data. In particular, I would like to issue SPARQL commands via Stata. Presently I am only able to do it in R via the CRAN SPARQL package. With respect to the open data and Stata, wbopendata (SSC) by World Bank proves that it is possible to marry open data frameworks with Stata. Ideally, I would like to do this for the data sets that I tend to use, like the Open Data Scotland project.
      Kind regards,
      Konrad
      Version: Stata/IC 13.1

      Comment


      • #4
        Thanks so much, Joe! I don't have much experience with loops yet - any chance you could link me to a guide to combining things like loops for months and days at the same time? Say I'd like to get the data from 2011 to 2014. You specified `day', so would I need to also do the same thing for `month' and `year' and somehow use "offset" but with intervals? Is this just a "loop within a loop"?

        Additionally, since certain months have different numbers of days, could I just specify 31 days for everything and then remove days with 0 page views observed? I feel like STATA would refuse to import JSON data for a non-existent date like February 31, though. Also, I would have to loop through URLs in order to get all the months? Is that possible in STATA? I was thinking of adding something like:

        Code:
        forvalues month=01/12 {
        forvalues day=0/30 { ......
        local URL = "http://stats.grok.se/json/en/2013"`month'"/Bitcoin"
        insheetjson views using `URL', ...
        Would something like that work or am I better off just creating a loop for each month separately?

        I appreciate the help!
        Last edited by Daniel Tartakovsky; 29 May 2014, 09:47.

        Comment


        • #5
          Daniel,

          Since Stata date variables are just the number of days since a fixed point in the past (1/1/1960) you could actually do this with one loop rather than loops within loops. This would assume that every day from 1/1/2011 to 12/31/2014 is available, but even if it isn't you can adjust the code to account for such errors. (Actually, it looks like insheetjson just returns an empty string when there is no data for a particular date.)

          Code:
           
           gen str20 views="" 
           local startdate=mdy(1,1,2011)
           local enddate=mdy(12,31,2014)
           local obs=0
           forvalues mdy=`startdate'/`enddate' {    
              local key : display "daily_views:" %tdCCYY-NN-DD `mdy'    
              local yrmon : display %tdCCYYNN `mdy'
              insheetjson views using "http://stats.grok.se/json/en/`yrmon'/Bitcoin", col("`key'") offset(`obs') replace 
              local ++obs
          } 
          gen date=`startdate'+_n-1 
          format date %td
          Regards,
          Joe

          P.S. If you ever do need to do loops within loops, here is the basic idea:

          Code:
          forvalues year=2011/2014 {
            forvalues month=1/12 {
              forvalues day=1/31 {
                local mdy=mdy(`month',`day',`year')
                etc...
              }
            }
          }

          Comment


          • #6
            In Joe's last code segment the key to non-existent days is just that mdy will be missing. Accordingly, jump out of the innermost loop when that is so.

            Comment


            • #7
              Thanks so much for the help! Loops make everything so simple...

              Comment

              Working...
              X