17 months ago
Andy Baio : Adrian Holovaty releases templatemaker, a Python library for smart screen scraping - given a large set of HTML documents, intelligently extracts the strings that change between them
Matthew M. Boedicker : templatemaker, Python screenscraping library - (via waxy)
joshua : Introducing templatemaker - back out templates from similar documents
Rod Begbie : Introducing templatemaker - Python library that analyses a corpus of web pages, works out where the dynamic values are in the template, then allows you to scrape out the juicy details. I can think of oh, so many uses for this.
philgyford : Introducing templatemaker | Holovaty.com - Python thing. Point it at some HTML files and it will make a template with holes for the unique strings in the pages. (via Daring Fireball)
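The idea described in the notes above (compare similar documents, keep what they share, and leave holes where they differ) can be sketched with Python's standard library alone. This is a rough illustration built on `difflib`, not templatemaker's own algorithm or API; the names `tokenize`, `make_template`, `extract`, and `HOLE` are made up for this example.

```python
import re
from difflib import SequenceMatcher

HOLE = None  # hypothetical marker for a region that differs between documents


def tokenize(text):
    # Split into whitespace runs, words, and single punctuation characters.
    return re.findall(r"\s+|\w+|[^\w\s]", text)


def make_template(doc_a, doc_b):
    """Return the shared literal strings of two documents, with HOLE between them."""
    a, b = tokenize(doc_a), tokenize(doc_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    template = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            template.append("".join(a[i1:i2]))
        elif not template or template[-1] is not HOLE:
            template.append(HOLE)  # collapse any run of differences into one hole
    return template


def extract(template, doc):
    """Pull the hole contents out of a new document, or None if it does not fit."""
    pattern = "".join(
        "(.*?)" if part is HOLE else re.escape(part) for part in template
    )
    match = re.fullmatch(pattern, doc, flags=re.DOTALL)
    return list(match.groups()) if match else None


template = make_template(
    "<p>Name: Alice</p><p>Age: 30</p>",
    "<p>Name: Bob</p><p>Age: 41</p>",
)
values = extract(template, "<p>Name: Carol</p><p>Age: 25</p>")
```

Comparing at the token level rather than character by character keeps chance letter matches inside the varying values from splitting a hole in two. A real tool would learn from many documents, not just a pair.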
28 months ago
deusx : Zend Developer Zone | Tidying up your HTML with PHP 5 - "The Tidy extension is new in PHP 5, and is available from PHP version 5.0b3 upward. It is based on the TidyLib library, and allows the developer to validate, repair, and parse HTML, XHTML and XML documents from within PHP."
31 months ago
Simon Willison : Microsummaries in Firefox 2 - Neat new feature: short summaries of pages extracted using XSLT.
deusx : Microsummaries - MozillaWiki - "Microsummaries are regularly-updated succinct compilations of the most important information on web pages."
Paul Hammond : Microsummaries - MozillaWiki - Microsummaries are regularly-updated succinct compilations of the most important information on web pages
Isofarro : Microsummaries - 'Microsummaries are regularly-updated succinct summaries of web pages. They are compact enough to fit in the space available to a bookmark label, provide more useful information about pages than static page titles, and are regularly updated as new informa
35 months ago
deusx : HTML Screen Scraping: A How-To Document - "This document explains how to do HTML screen scraping. In effect it shows how to treat the Web as a resource by enabling you to retrieve and extract data from HTML Web pages."
38 months ago
deusx : For GRDDL-heads: XSLT+Tidy from Mark Nottingham on 2005-10-19 (semantic-web@w3.org from October 2005) - "Let the scraping begin..."
39 months ago
deusx : Read/Write Web: The danger of running a remix service - "Populicio.us still lost their service because their reliance on del.icio.us fell away, but the lesson here is that screen scraping HTML comes with those risks by nature."
43 months ago
deusx : miscoranda: Link in a Soupstack - "The problem with getting links from HTML is that the HTML you find lying about on the web is often quite broken..."
55 months ago
deusx : Pop Goes the Gmail. SMTP/POP server for Gmail!
55 months ago
Simon Willison : Beautiful Soup - Ultra Liberal Python HTML/XHTML parser.
Nelson Minar : Python screen scrape - Beautiful soup - Python library for screenscraping HTML
deusx : Beautiful Soup - "You didn't write that awful page. You're just trying to get some data out of it. Right now, you don't really care what HTML is supposed to look like."
Paul Hammond : Beautiful Soup - You didn't write that awful page. You're just trying to get some data out of it
Anne van Kesteren : Beautiful Soup - I want this, only with XPath or CSS Selectors. Learning a new selecting language again and again is annoying. Please base your product on standards.
Matthew M. Boedicker : Beautiful Soup, Python lib for screenscraping
Rod Begbie : Beautiful Soup - Python HTML parser which doesn't choke on malformed markup. Handy for screenscraping.
philgyford : Beautiful Soup: We called him Tortoise because he taught us. - "Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping."
factoryjoe : Beautiful Soup: We called him Tortoise because he taught us. - Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping.
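To make "doesn't choke on malformed markup" concrete, here is a small sketch of tolerant link extraction. It uses the standard library's `html.parser` rather than Beautiful Soup, so none of this is Beautiful Soup's API; the `LinkCollector` class and the sample markup are invented for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, ignoring how badly the page nests."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Unclosed <a> tags, a stray </b>, mixed quoting, and an unquoted attribute.
broken = '<p><a href="/one">One<a href=\'/two\'>Two</b><a href=three.html>Three'

collector = LinkCollector()
collector.feed(broken)
collector.close()
links = collector.links
```

The parser reports start tags as it meets them and never insists that they be closed, which is enough for pulling out links. Building a navigable tree from such a page is the harder job that a library like Beautiful Soup takes on.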