  • Downloading from Kaggle

    I wanna copy a zipfile into my directory (without using Python, though I will if I must).

    This is the dataset's homepage, a Walmart dataset, to be precise.

    When I try
    Code:
    copy "https://www.kaggle.com/datasets/yasserh/walmart-dataset/download?datasetVersionNumber=1" "archive.zip", replace
    I get a zipfile, but when I click to open it, the computer says it's empty, and Stata won't unzip it for the same reason.

    How might I solve this, aside from Python's requests/Selenium?

  • #2
    One issue I see is that Kaggle requires you to sign in before you can download the file, so I wouldn't say the files are openly accessible.

    • #3
      It seems your best bet is to use Kaggle's API, whose official client is a Python package (it also provides a kaggle command-line tool). GitHub - Kaggle/kaggle-api: Official Kaggle API
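A minimal sketch of the API route, assuming the official `kaggle` Python package is installed and that credentials are available in `~/.kaggle/kaggle.json` (or in the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables). The helper names `dataset_slug` and `download_dataset` are hypothetical and introduced only for illustration; the client calls reflect the package as commonly documented and may differ between versions.

```python
from urllib.parse import urlparse


def dataset_slug(url):
    """Extract 'owner/dataset' from a kaggle.com dataset URL (hypothetical helper)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expected path shape: /datasets/<owner>/<dataset>[/...]
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Kaggle dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def download_dataset(url, dest="."):
    """Download the dataset archive into dest using the Kaggle client (hypothetical helper)."""
    # Imported lazily: the kaggle package looks for credentials when it is
    # imported, so this keeps the rest of the module usable without them.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()  # assumes kaggle.json or the environment variables exist
    # unzip=False leaves the archive as a zip file in dest
    api.dataset_download_files(dataset_slug(url), path=dest, unzip=False)


if __name__ == "__main__":
    url = ("https://www.kaggle.com/datasets/yasserh/walmart-dataset/"
           "download?datasetVersionNumber=1")
    print(dataset_slug(url))
```

Calling `download_dataset(url)` with the URL from post #1 should then place the archive in the current directory, where Stata's `unzipfile` can take over.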

      • #4
        Okay Python it is then! Thank you both!

        • #5
          For anyone who finds this at a later date, let me correct a misunderstanding in the problem description from post #1.

          When I try

          copy "https://www.kaggle.com/datasets/yasserh/walmart-dataset/download?datasetVersionNumber=1" "archive.zip", replace

          I get a zipfile,
          To be precise, you create a file named archive.zip, but that is simply the name you assigned to the file that receives whatever the host sends in response to the URL.

          Perhaps in your browser that URL downloads a zipfile containing the requested data, but that likely reflects saved cookies or other browser data from previous visits to Kaggle - the sort of thing that supports "keep me logged in here". That is of course browser-specific. Having visited the URL using a particular browser does not mean other programs on your computer - including Stata's code for sending HTTP requests and receiving the responses - will have similar access.

          Opening "archive.zip" in a text editor displays not a zip file, but rather a text file containing the HTML for the Kaggle sign-in page; the first few lines of the version sent to me are displayed below.
          Code:
          
          <!DOCTYPE html>
          <html lang="en">
          
          <head>
            <title>Kaggle: Your Home for Data Science</title>
            <meta charset="utf-8" />
              <meta name="robots" content="index, follow" />
            <meta name="description" content="Kaggle is the world&#x2019;s largest data science community with powerful tools and resources to help you achieve your data science goals." />
            <meta name="turbolinks-cache-control" content="no-cache" />
              <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=5.0, minimum-scale=1.0">
            <meta name="theme-color" content="#008ABC" />
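The same diagnosis can be made without a text editor by testing the file's content rather than trusting its name. The sketch below is illustrative only: `describe_download` is a hypothetical helper, and it relies on nothing beyond Python's standard library.

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Classify a downloaded file by its content, not its extension (hypothetical helper)."""
    # A real zip archive carries an end-of-central-directory record,
    # which is what zipfile.is_zipfile looks for.
    if zipfile.is_zipfile(str(path)):
        return "zip"
    # HTML pages normally begin with a doctype declaration or an <html> tag.
    head = Path(path).read_bytes()[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


if __name__ == "__main__":
    # "archive.zip" is the file name used in post #1
    if Path("archive.zip").exists():
        print(describe_download("archive.zip"))
```

A result of "html" for archive.zip would match what is shown above: the host answered with a web page, most likely the sign-in page, instead of the dataset.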
          Last edited by William Lisowski; 27 Jul 2022, 10:25.
