blogmarks.net

Adrian Holovaty releases templatemaker, a Python library for smart screen scraping

17 months ago

Andy Baio : Adrian Holovaty releases templatemaker, a Python library for smart screen scraping - given a large set of HTML documents, intelligently extracts the strings that change between them

Matthew M. Boedicker : templatemaker, Python screenscraping library - (via waxy)

joshua : Introducing templatemaker - back out templates from similar documents

Rod Begbie : Introducing templatemaker - Python library that analyses a corpus of web pages, works out where the dynamic values are in the template, then allows you to scrape out the juicy details. I can think of oh, so many uses for this.

philgyford : Introducing templatemaker | Holovaty.com - Python thing. Point it at some HTML files and it will make a template with holes for the unique strings in the pages. (via Daring Fireball)

Tags : dev python web adrianholovaty screenscraping html scraping templates templating top via:daringfireball webdevelopment

Zend Developer Zone | Tidying up your HTML with PHP 5

28 months ago

deusx : Zend Developer Zone | Tidying up your HTML with PHP 5 - "The Tidy extension is new in PHP 5, and is available from PHP version 5.0b3 upward. It is based on the TidyLib library, and allows the developer to validate, repair, and parse HTML, XHTML and XML documents from within PHP."

Tags : php scrapers scraping tidy webdev

Microsummaries in Firefox 2

31 months ago

Simon Willison : Microsummaries in Firefox 2 - Neat new feature: short summaries of pages extracted using XSLT.

deusx : Microsummaries - MozillaWiki - "Microsummaries are regularly-updated succinct compilations of the most important information on web pages."

Paul Hammond : Microsummaries - MozillaWiki - Microsummaries are regularly-updated succinct compilations of the most important information on web pages

Isofarro : Microsummaries - 'Microsummaries are regularly-updated succinct summaries of web pages. They are compact enough to fit in the space available to a bookmark label, provide more useful information about pages than static page titles, and are regularly updated as new informa

Tags : firefox mozilla scraping webdev xsl

HTML Screen Scraping: A How-To Document

35 months ago

deusx : HTML Screen Scraping: A How-To Document - "This document explains how to do HTML screen scraping. In effect it shows how to treat the Web as a resource by enabling you to retrieve and extract data from HTML Web pages."

Tags : programming scraping webdev

For GRDDL-heads: XSLT+Tidy from Mark Nottingham on 2005-10-19 (semantic-web@w3.org from October 2...

38 months ago

deusx : For GRDDL-heads: XSLT+Tidy from Mark Nottingham on 2005-10-19 (semantic-web@w3.org from October 2005) - "Let the scraping begin..."

Tags : grddl microformats scraping xslt

Read/Write Web: The danger of running a remix service

39 months ago

deusx : Read/Write Web: The danger of running a remix service - "Populicio.us still lost their service because their reliance on del.icio.us fell away, but the lesson here is that screen scraping HTML comes with those risks by nature."

Tags : del.icio.us hacks scraping webdev webservices

miscoranda: Link in a Soupstack

43 months ago

deusx : miscoranda: Link in a Soupstack - "The problem with getting links from HTML is that the HTML you find lying about on the web is often quite broken..."

Tags : html python scraping

Pop Goes the Gmail. SMTP/POP server for Gmail!

55 months ago

deusx : Pop Goes the Gmail. SMTP/POP server for Gmail!

Tags : mail scraping webdev

Beautiful Soup

55 months ago

Simon Willison : Beautiful Soup - Ultra Liberal Python HTML/XHTML parser.

Nelson Minar : Python screen scrape - Beautiful soup - Python library for screenscraping HTML

deusx : Beautiful Soup - "You didn't write that awful page. You're just trying to get some data out of it. Right now, you don't really care what HTML is supposed to look like."

Paul Hammond : Beautiful Soup - You didn't write that awful page. You're just trying to get some data out of it

Anne van Kesteren : Beautiful Soup - I want this, only with XPath or CSS Selectors. Learning a new selecting language again and again is annoying. Please base your product on standards.

Matthew M. Boedicker : Beautiful Soup, Python lib for screenscraping

Rod Begbie : Beautiful Soup - Python HTML parser which doesn't choke on malformed markup. Handy for screenscraping.

philgyford : Beautiful Soup: We called him Tortoise because he taught us. - "Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping."

factoryjoe : Beautiful Soup: We called him Tortoise because he taught us. - Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping.

Tags : python scraping html screenscraping beautifulsoup top webdevelopment
