Scraping Links With PHP
By Justin Laing, 2008-01-06
Using PHP’s DOM Functions To Parse The HTML
PHP provides a really cool tool for working with HTML content: the DOM functions. They let you parse HTML (or XML) into an object structure, the DOM (Document Object Model). Let’s see how we do it:
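// $html holds the page content fetched in the previous step
$dom = new DOMDocument();
// The @ suppresses the warnings loadHTML() raises on malformed real-world markup
@$dom->loadHTML($html);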
Wow, is it really that easy? Yes! Now we have a nice DOMDocument object that we can use to access everything within the HTML in a nice clean way. I discovered this in Russell Beattie’s post, Using PHP To Scrape Sites As Feeds. Thanks, Russell!
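If you would rather not silence warnings with @, here is a minimal sketch of an alternative, not the method used in the rest of this tutorial. It collects the parser's complaints with libxml_use_internal_errors() and then walks the resulting DOMDocument to confirm the parse worked. The sample markup and the variable names ($previous, $loaded, $warnings, $links) are made up for illustration; in the real scraper $html would be the page content you fetched earlier.

```php
<?php
// Hypothetical sample markup standing in for the page fetched earlier.
// The <p> is deliberately left unclosed, as real-world HTML often is.
$html = '<html><body><p>Unclosed paragraph'
      . '<a href="http://example.com/">Example</a></body></html>';

// Ask libxml to collect parse problems internally instead of emitting
// PHP warnings. The call returns the previous setting so we can restore it.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Grab whatever the parser complained about (an array of LibXMLError
// objects), then clear the buffer and restore the old error handling.
$warnings = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Every <a> element in the parsed tree is now a DOMElement we can query.
$links = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $links[] = $anchor->getAttribute('href');
}

print_r($links);
```

This keeps the parser's messages in $warnings, so you can log them while debugging a stubborn page instead of throwing them away.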
Tutorial Pages:
» Scraping Links With PHP
» Get The Page Content
» Tip: Fake Your User Agent
» Using PHP’s DOM Functions To Parse The HTML
» XPath Makes Getting The Links You Want Easy
» Iterate And Store Your Links
» Your Completed Link Scraper
» What Else Could I Do With This Thing?
» Is Scraping Content Legal?
Originally posted on Makebeta
