Scraping Links With PHP
By Justin Laing, 2008-01-06
Next we’ll iterate through all the links we’ve gathered using XPath and store them in a database. First the code to iterate through the links:
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    storeLink($url, $target_url);
}
$hrefs is an object of type DOMNodeList, and item() is a method that returns the DOMNode at the specified index. Valid indexes run from 0 to $hrefs->length - 1. So we’ve got a loop that retrieves each link as a DOMNode object.
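As an aside, a DOMNodeList can also be walked with foreach, which avoids the index bookkeeping. Here is a minimal sketch, assuming a small hard-coded HTML string and the XPath query //a (both are stand-ins for illustration, not necessarily the page and query used in this tutorial):

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = '<html><body><a href="http://example.com/">One</a><a href="/about">Two</a></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);           // @ suppresses warnings from sloppy real-world HTML

$xpath = new DOMXPath($dom);
$hrefs = $xpath->query('//a');    // assumed query: every <a> element

$urls = array();
foreach ($hrefs as $href) {       // DOMNodeList can be traversed directly
    $urls[] = $href->getAttribute('href');
}

print_r($urls);
```

Both loop styles visit the same nodes in document order, so which one you use is mostly a matter of taste.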
$url = $href->getAttribute('href');
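getAttribute() is defined on the DOMElement class, which extends DOMNode. The nodes we gathered are <a> elements, so each one is actually a DOMElement and the call works. getAttribute() returns the value of the named attribute of the node (in this case an <a> tag with the href attribute). Now we’ve got our URL and we can store it in the database.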
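To make the class relationship concrete, here is a small sketch. It assumes a one-line sample document, and the variable names are only for illustration. It checks that the node is a DOMElement before calling getAttribute(), and shows that a missing attribute comes back as an empty string:

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><a href="/contact" title="Contact us">Contact</a></body></html>');

$node = $dom->getElementsByTagName('a')->item(0);

// For an <a> tag the concrete class of the node is DOMElement,
// which is where getAttribute() is defined.
$isElement = $node instanceof DOMElement;

$url   = $isElement ? $node->getAttribute('href')  : '';
$title = $isElement ? $node->getAttribute('title') : '';
$rel   = $isElement ? $node->getAttribute('rel')   : '';   // not set on this tag

var_dump($isElement, $url, $title, $rel);
```

If your XPath query could ever return something other than elements (text nodes, for example), a check like this keeps the loop from calling a method that does not exist on that node.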
We’ll want a database table that looks something like this:
CREATE TABLE `links` (
    `url` TEXT NOT NULL,
    `gathered_from` TEXT NOT NULL,
    `time_stamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
We’ll need a storeLink() function to put the links in the database. I’ll assume you know the basics of how to connect to a database (if not, grab a MySQL & PHP tutorial first).
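function storeLink($url, $gathered_from) {
    // Assumes $db is an open mysqli connection created elsewhere.
    global $db;
    // A prepared statement keeps quotes in scraped URLs from breaking the query.
    $stmt = $db->prepare("INSERT INTO links (url, gathered_from) VALUES (?, ?)");
    $stmt->bind_param('ss', $url, $gathered_from);
    $stmt->execute() or die('Error, insert query failed');
}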
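The insert can also be tried out without a MySQL server. The sketch below is an alternative, not the function used in this tutorial. It assumes the PDO SQLite driver is available, uses an in-memory database, and introduces a hypothetical storeLinkPdo() helper that takes the connection as its first argument. The table definition is adapted to SQLite syntax.

```php
<?php
// Hypothetical variant of storeLink() that receives the connection explicitly.
function storeLinkPdo(PDO $pdo, $url, $gathered_from) {
    $stmt = $pdo->prepare(
        'INSERT INTO links (url, gathered_from) VALUES (:url, :gathered_from)'
    );
    $stmt->execute(array(':url' => $url, ':gathered_from' => $gathered_from));
}

// In-memory SQLite database, used here only so the example runs standalone.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE links (
    url TEXT NOT NULL,
    gathered_from TEXT NOT NULL,
    time_stamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)');

// A URL containing a single quote would break a query built by string interpolation.
storeLinkPdo($pdo, "http://example.com/?q=it's", 'http://example.com/');

$rows = $pdo->query('SELECT url, gathered_from FROM links')->fetchAll(PDO::FETCH_ASSOC);
print_r($rows);
```

Because the values are bound as parameters rather than pasted into the SQL string, a scraped URL containing quotes is stored as-is instead of corrupting the statement.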
Originally posted on Makebeta

