How to scrape data from a website?

Let’s learn how to scrape data from a website. The most helpful answer comes from Stack Overflow.

There are ten answers to this question.

Best solution

PHP: how to extract content or scrape data sets from website source page

I would like to know how to scrape content from a website's source code using PHP. I have tried http://simplehtmldom.sourceforge.net/ and also looked at "Best methods to parse HTML with PHP", but I'm still having a hard time getting information out of the source code. As you can see, the main page's source contains a list of author links, each including the year and the number of books written. <div id="fleft"> <ul> <li><a href="http://www.books.com/john...

Answer:

You should mention what approach you are using to get the HTML of the target page. I suppose that you have the HTML...

merrill at Stack Overflow
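
For readers who want something concrete, here is a minimal sketch of pulling a link list out of a container such as <div id="fleft">. It is written in Python with the standard library's html.parser, not in PHP and not with simplehtmldom, so it illustrates the idea and is not the approach from the answer above. The sample markup, the example.com URLs, and the names LinkListParser and extract_links are all hypothetical; the real page's links are truncated in the question and are not reproduced here.

```python
from html.parser import HTMLParser

# Hypothetical sample markup. The real page's links and text are not shown
# in full in the question, so placeholder values are used here.
SAMPLE_HTML = """
<div id="fleft">
  <ul>
    <li><a href="http://example.com/authors/a">Author A (1990) (3)</a></li>
    <li><a href="http://example.com/authors/b">Author B (2005) (12)</a></li>
  </ul>
</div>
<div id="other"><a href="http://example.com/ignored">Ignored</a></div>
"""


class LinkListParser(HTMLParser):
    """Collect (href, text) pairs for <a> tags inside <div id="fleft">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0        # 0 means we are outside the target div
        self.current_href = None  # href of the <a> currently being read
        self.current_text = []    # text fragments of that <a>
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1   # nested div inside the target
            elif attrs.get("id") == "fleft":
                self.div_depth = 1    # entering the target div
        elif tag == "a" and self.div_depth > 0 and "href" in attrs:
            self.current_href = attrs["href"]
            self.current_text = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            self.links.append((self.current_href, text))
            self.current_href = None

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)


def extract_links(html):
    """Return the (href, text) pairs found inside the target div."""
    parser = LinkListParser()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
```

Tracking the nesting depth of the div keeps links elsewhere on the page out of the result. The year and book count could then be split out of each link's text once its real format is known.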

Other solutions

How do I scrape data off a page displaying live scores on the CricInfo website, using BeautifulSoup, lxml (XPath or cssselect) and Python?

I want to scrape the teams, score, batsmen, bowlers, fall of wickets, etc. This screenshot is an example.

Answer:

This is the approach that I generally use for scraping. 1. Understand source code of the website/webpage...

Akshat Goel at Quora
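
As a rough illustration of the XPath style of extraction the question mentions, the sketch below queries a small scorecard fragment. It uses the standard library's xml.etree.ElementTree, whose limited XPath subset stands in for lxml here, and it only works on well-formed markup, which real pages often are not. The fragment, the class names, and the function batting_rows are hypothetical and do not reflect CricInfo's actual page structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed scorecard fragment; not CricInfo's real markup.
SAMPLE = """
<div>
  <table class="batting">
    <tr><td class="name">Batsman A</td><td class="runs">42</td></tr>
    <tr><td class="name">Batsman B</td><td class="runs">7</td></tr>
  </table>
</div>
"""


def batting_rows(markup):
    """Return (name, runs) tuples from the table with class 'batting'."""
    root = ET.fromstring(markup)
    rows = []
    # ElementTree supports a subset of XPath, including attribute predicates.
    for tr in root.findall(".//table[@class='batting']/tr"):
        name = tr.find("td[@class='name']").text
        runs = int(tr.find("td[@class='runs']").text)
        rows.append((name, runs))
    return rows


rows = batting_rows(SAMPLE)
```

With lxml or BeautifulSoup the same idea applies to messy real-world HTML: inspect the page source first, then write a selector for each field you need.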

I'm a candidate for a data scientist position. Is it a good idea to scrape the company's website and run an analysis on the data to show my worth?

Should note that the scraping will be done in an unobtrusive manner. Edit: Following up on an answer from Sean Owen, it's important to clarify that the company in question is a web business which centers around a community, and so there's a lot of data...

Answer:

No, it is almost certainly a waste of time. I can't think of an interesting stat you are likely to get...

Sean Owen at Quora

What is the best way to scrape data from a website?

My question is also available on Stack Overflow: What's the best way of scraping data from a website? If the question gets answered I'll update this question. I need to extract some information from a website, but the website doesn’t provide...

Answer:

ParseHub [1] is a recently launched company started by former Facebook Data Tools engineers that has...

Paul King at Quora

How to scrape the Google cache

Had a plan to scrape a website, but now it's down indefinitely. Google has the site cached, but this makes things kind of complicated. Newbie questions about scraping websites and using the Google cache inside. I've read this question - should I be trying...

Answer:

You may want to check archive.org's Wayback Machine as well, which could help cover question 2) above...

hot soup at Ask.Metafilter.Com
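
To make the Wayback Machine suggestion concrete, here is a small sketch that builds a query for the Internet Archive's availability endpoint and reads the closest snapshot URL out of its JSON reply. It makes no network call; SAMPLE_RESPONSE is a hypothetical reply shaped like the documented format, and the helper names availability_url and closest_snapshot are my own. Treat the endpoint and field names as assumptions to check against archive.org's current documentation.

```python
import json
from urllib.parse import urlencode

# Availability endpoint as documented by the Internet Archive (assumed current).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the closest snapshot URL from a JSON reply, or None if absent."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Hypothetical reply, shaped like the documented response format.
SAMPLE_RESPONSE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20140101000000/http://example.com/",
            "timestamp": "20140101000000",
            "status": "200",
        }
    }
})

query = availability_url("example.com", "20140101")
snapshot = closest_snapshot(SAMPLE_RESPONSE)
missing = closest_snapshot('{"archived_snapshots": {}}')
```

In practice the query URL would be fetched with an HTTP client and the reply passed to closest_snapshot. A result of None means no archived copy was found for that address.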

How to scrape a web forum?

How to scrape a web forum? I need help understanding the process. I want to figure out how to scrape a website, specifically a forum. It’s a site that’s been around for a long time with a lot of knowledge, but the...

Answer:

I've successfully used SiteScraper for something similar (grabbing a bunch of hiking pages for offline...

[insert clever name here] at Ask.Metafilter.Com

Web scraping of paginated data on a web page (C#)?

I have been given an interesting problem at work. We scrape the Dept. of Labor & Industries website to get information on contractors, which in turn is then used to populate some fields in our web portal for our insurance agents. Recently, they have...

Answer:

I would start by installing Firebug in Firefox and using its Net panel to look at exactly what is being...

Ratchetr at Yahoo! Answers
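
Once the browser's network panel shows which request returns each page of results, a paginated scrape usually reduces to a loop like the sketch below. It is written in Python, not C#, and it assumes the site pages by a simple page number and returns an empty page at the end. The question does not confirm either assumption. The names scrape_all_pages, fake_fetch, and FAKE_PAGES are hypothetical, and the fake fetcher stands in for a real HTTP request plus parsing.

```python
def scrape_all_pages(fetch_page, max_pages=100):
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty.

    fetch_page is any callable that takes a page number and returns a list
    of rows. max_pages is a safety cap so a misbehaving site cannot cause
    an endless loop.
    """
    results = []
    for page in range(1, max_pages + 1):
        rows = fetch_page(page)
        if not rows:
            break  # an empty page is treated as the end of the data
        results.extend(rows)
    return results


# Hypothetical stand-in for the real request: two pages of made-up rows.
FAKE_PAGES = {
    1: ["contractor 1", "contractor 2"],
    2: ["contractor 3"],
}


def fake_fetch(page):
    return FAKE_PAGES.get(page, [])


all_rows = scrape_all_pages(fake_fetch)
```

Passing the fetcher in as a parameter keeps the paging logic testable without touching the network. If the site pages through form posts or tokens instead of page numbers, only the fetcher needs to change.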

Is it illegal to scrape data resulting from someone else's unpermitted scrape?

SinglePlatform and Locu have scraped hundreds of thousands of restaurant websites, without permission, and have structured their menu data into their respective offerings. Since their APIs essentially prohibit commercial uses, why couldn't a startup...

Answer:

You have two things to worry about: copyright and the various rules that deal with scraping. Copyright...

Neil Aggarwal at Quora

Running a data-only website?

Any ideas on starting or running a website just for data and charts? I love data but hate the news. I always wonder, "How do they know Americans drove 50 million fewer miles last month?" Where do they get that data? I want to start a website...

Answer:

It's really not my desire to piss in your cornflakes, but there are already several websites that are...

gnossos at Ask.Metafilter.Com

Launching a website that contains public government data

Where can I find information about the legality of using public government data on my website? I came across the website EveryBlock.com, which takes data from city governments and posts them on its website in a nice user interface. These include things...

Answer:

All federal work is in the public domain. I assume this applies to state and local government, but I...

lunchbox at Ask.Metafilter.Com
