Scraping Links With PHP

By Justin Laing
2008-01-06


XPath Makes Getting The Links You Want Easy

Now for the real magic of the DOM: XPath! XPath lets you select collections of DOM nodes (in an HTML document, mostly the elements created by tags). Say you only want links that sit inside unordered lists. All you have to do is write a query like “/html/body//ul//li//a” and pass it to the evaluate() method of a DOMXPath object. I’m not going to cover all the ways you can use XPath, because I’m still learning it myself and someone else has already put together a great list of examples: XPath Examples. Here’s a code snippet that gets every link on the page using XPath:

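// $dom is the DOMDocument that was loaded with the page's HTML
$xpath = new DOMXPath($dom);
// evaluate() returns a DOMNodeList holding every <a> element under <body>
$hrefs = $xpath->evaluate("/html/body//a");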
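
To make the unordered-list query mentioned above concrete, here is a minimal, self-contained sketch. The sample HTML, its link paths, and the variable names $html, $listLinks, $listHrefs, and $allLinks are made up for illustration and are not part of the tutorial's scraper; in the real scraper, $dom would hold the page you fetched.

```php
<?php
// Hypothetical sample page, used only so this sketch runs on its own.
$html = <<<EOT
<html><body>
  <p><a href="/about">About</a></p>
  <ul>
    <li><a href="/one">One</a></li>
    <li><a href="/two">Two</a></li>
  </ul>
</body></html>
EOT;

$dom = new DOMDocument();
// The @ suppresses warnings that loadHTML() can emit for sloppy markup.
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// Only the anchors nested inside list items of an unordered list.
$listLinks = $xpath->evaluate("/html/body//ul//li//a");

// Collect the href attribute of each matching anchor.
$listHrefs = array();
foreach ($listLinks as $link) {
    $listHrefs[] = $link->getAttribute('href');
}

// Every anchor under <body>, for comparison with the narrower query.
$allLinks = $xpath->evaluate("/html/body//a");

print_r($listHrefs);                                // the two list links: /one and /two
echo "All links: ", $allLinks->length, "\n";        // All links: 3
```

The narrower query skips the link in the paragraph, so changing the path expression is all it takes to change which links you collect.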



Originally posted on Makebeta

