How to extract text from a web page?

Let’s look at how to extract text from a web page. The most helpful solution comes from Stack Overflow.

There are ten answers to this question.

Best solution

How to extract the first few sentences from a body of text on a web page

We are building a Digg-style site and want to automatically fetch a limited amount of text (2-3 sentences). It can be the last 3 sentences of the article, if that would be easier. At the moment we fetch web page content without a problem, but we want a universal script that gets a few sentences. We want to avoid writing a custom script for each website we pull content from. I was thinking of finding the text block by its periods: look for periods that sit close together, then take the words around them. That is the rough idea...


You could look for large portions of the document that have less markup and less vertical whitespace...

Croky at Stack Overflow
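
The "less markup" heuristic from the answer above can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not the answerer's code: the names `BlockCollector` and `first_sentences`, the list of block-level tags, the score (text length divided by one plus the number of inline tags) and the sentence-splitting regex are all hypothetical choices.

```python
import re
from html.parser import HTMLParser

# Assumption: these tags delimit candidate text blocks.
BLOCK_TAGS = {"p", "div", "td", "li", "article", "section", "blockquote"}
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Collect the text of each block and count the other tags inside it."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # list of (text, tag_count)
        self._text = []
        self._tags = 0
        self._skip = 0

    def _flush(self):
        # Close the current block, normalising whitespace.
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._tags))
        self._text, self._tags = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self._flush()
        else:
            self._tags += 1   # links, spans, images: "markup" inside the block

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip:
            self._text.append(data)

    def close(self):
        super().close()
        self._flush()


def first_sentences(html, count=3):
    """Return the first `count` sentences of the densest text block."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return ""
    # Long text with few tags scores high; navigation bars score low.
    best_text, _ = max(parser.blocks, key=lambda b: len(b[0]) / (1 + b[1]))
    sentences = re.split(r"(?<=[.!?])\s+", best_text)
    return " ".join(sentences[:count])
```

Calling `first_sentences(page_html, 3)` on a fetched page returns up to three sentences from the block that scored best. The regex treats every period followed by whitespace as a sentence end, so abbreviations such as "e.g." will split early; taking `sentences[-count:]` instead would give the last sentences of the block.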

Other solutions

Has anyone used NLTK to extract noun phrases embedded in web page content? How much work does it take to write NLTK code for extracting noun phrases from a web page? Also, do I need to train it in order to extract noun phrases from English text? Can I use the default module without any training? Tha

How would one compare NLTK to OpenNLP? I am a newbie to text analysis. I need to write code for extracting and identifying noun phrases from a web page. Any suggestions or input are appreciated. Thanks,


For Noun Phrase extraction, I think OpenNLP is much easier to use than NLTK. Since you are starting...

Sujit Pal at Quora
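
To make the idea of noun phrase extraction concrete, here is a minimal Python sketch of tag-pattern chunking: it groups an optional determiner, any adjectives and one or more nouns. It uses neither NLTK nor OpenNLP; the function name `chunk_noun_phrases` is hypothetical, and the input is assumed to be already tagged with Penn Treebank part-of-speech tags. As far as I know, NLTK offers a pretrained English tagger (`nltk.pos_tag`) and a `RegexpParser` that accepts a similar pattern, such as `NP: {<DT>?<JJ>*<NN.*>+}`, so no training of your own should be needed for this kind of chunking.

```python
def chunk_noun_phrases(tagged):
    """Group (word, tag) pairs that match: optional DT, any JJ*, one or more NN*."""
    phrases = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":                          # optional determiner
            j += 1
        while j < n and tagged[j][1].startswith("JJ"):    # adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):    # nouns
            k += 1
        if k > j:                                         # at least one noun matched
            phrases.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return phrases


# Hand-tagged example sentence (the tags are supplied by hand, not by a tagger).
tagged_sentence = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumped", "VBD"), ("over", "IN"),
    ("the", "DT"), ("lazy", "JJ"), ("dogs", "NNS"),
]
noun_phrases = chunk_noun_phrases(tagged_sentence)
```

In a real pipeline the tagged input would come from a part-of-speech tagger run over the text extracted from the page.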

PHP question: I know how to read the text on a web page through PHP code, but how can I extract the HTML...?

background code that generated the text on the web page? For instance, I would want to extract the http information from this: <h3 class="ens fontnormal"><a href=" y-The-Sweet-History-books -ISBN...


Right-click the page and choose the page source option, or open the View menu and choose Page Source.

Annie M at Yahoo! Answers
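
If the goal is to pull link targets out of the HTML in code rather than by viewing the page source, the idea can be sketched as follows. The question is about PHP; this sketch uses Python's standard `html.parser` purely for illustration, and the names `LinkCollector` and `extract_links` are hypothetical. PHP's `DOMDocument` class should allow a similar approach.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Record the href and the visible text of every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []     # list of (href, text)
        self._href = None   # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return (href, text) pairs for all anchors that carry an href."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

For a fragment like the `<h3 class="ens fontnormal"><a href=...>` markup in the question, `extract_links` would return one pair holding the link target and the link text.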

Extract Text and Images From a PDF

I'd like to extract the text and images from a multi-page PDF to use on the web. I've got a number of large (100 to 200 pages) PDFs that I need to extract the text and images from to use with a CMS for a website. I've looked at a number of PDF to HTML...


I've been through tons of PDF converters, and the one that I liked the most is ABBYY PDF Transformer...

backwards guitar at Ask.Metafilter.Com

Is there an API which can extract article text from any blog page?

I'd like to read clean article text into my own RSS reader. Is there an API which I can use to extract only the article text from any blog web page? Thanks


Yes, there are several ways. In any case, you need to extract full HTML source, then extract from it...

bababyt at Yahoo! Answers
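
The two steps named in the answer, getting the full HTML source and then extracting text from it, can be sketched with the Python standard library. This is a rough illustration, not an article-extraction API: the names `ArticleText`, `html_to_text` and `fetch_text` are hypothetical, and the list of tags to skip is an assumption about where blogs usually keep navigation and other non-article text.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Assumption: text inside these elements is not part of the article.
SKIP_TAGS = {"script", "style", "title", "nav", "header", "footer", "aside"}


class ArticleText(HTMLParser):
    """Collect text chunks that are not nested inside a skipped element."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(" ".join(data.split()))


def html_to_text(html):
    """Step 2: reduce an HTML document to its remaining visible text."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fetch_text(url):
    """Step 1 + 2: download the page, then extract its text."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

Because chunks are joined with single spaces, punctuation that directly follows an inline tag ends up separated from the preceding word. Sidebars and comment sections built from plain `div` elements are not filtered out by this sketch.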

How can I get the person's name, designation, and company name from a web page to store in a database?

I have to extract text from a web page via a web crawler, then parse the data for the name of the person along with their designation and company name to store in a database. There is no special tag in the HTML content to identify these, so... I am out of options .. please...


Hi, for this you need to learn a lot of things. I am providing you a link which explains all the things...

Sachin Sarawgi at Quora

Printing out selected text from a web page - text not realigning to fit page?

I want to print selected text from a page of a website and it is printing out "as is" on the screen instead of realigning to fit on the printed page. Up to now on this same website I have been able to print the text on the pages with it realigned...


I went to the web site, copied some of the text and pasted it into a word processor. When I printed...

Loki at Yahoo! Answers

HTML: Make text appear on web page when text is entered into a textbox?

Hello. I'm fairly new to HTML and I would like to see how the following would work. I've done this in VB using If statements and I want to do the same with web pages. So basically, suppose you have a code: XXXX. When you enter this code into a textbox...


Use JavaScript: check the contents of the textarea with a JavaScript function executed on the appropriate event (see...

Sarim at Yahoo! Answers

Internet Explorer 9 changes text on a web page?

Internet Explorer 9 changes text on a web page? So I have a Tumblr and I do most of my editing in Google Chrome because Google Chrome is what I use. I get it looking great there, but then I visited from Internet Explorer 9 and it changed my text. I had bullets...


It's happening because you're using non-standard HTML. Without seeing the code, I expect that Firefox...

Makayla at Yahoo! Answers

I get the Yahoo web page, but the email text is not on the page. I can only see the email in the email drop-down?

I am not getting any email on the web page, but I do get emails that I can see in the email drop-down box.


If you are having problems you may want to consider the "basic" version. "Classic"...

C46T3PQT7UG73FJVSRPRVJHHGY at Yahoo! Answers
