I am interested in scraping Google Scholar citation counts for a list of 700+ papers I have in a CSV file. I've written some fairly simple Stata code to do the web scraping, but I've found that after about 40 queries Google blocks any further requests and I receive the error message:
web error 503
could not open url
Does anyone have tips for getting around the 503 error for several hundred queries? I've tried running the queries at non-regular intervals to simulate a human, using Stata's sleep command, but with no luck.
I've been using Stata's import delimited command. Is this the best option? The relevant portion of my code is:
foreach title in [list of paper titles] {
    local website "https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&q=`title'"
    import delimited using "`website'", [options]
}
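For completeness, the randomized-delay variant I tried looks roughly like this (the 5-30 second range is just what I happened to use; `[list of paper titles]` and `[options]` are placeholders as above):

foreach title in [list of paper titles] {
    local website "https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&q=`title'"
    import delimited using "`website'", [options]
    * pause a random 5-30 seconds; sleep takes milliseconds
    local pause = 5000 + runiformint(0, 25000)
    sleep `pause'
}

Even with these pauses, the 503 errors start after roughly the same number of queries.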