
Scraping Links With PHP

By Justin Laing
2008-01-06

Iterate And Store Your Links

Next we’ll iterate through all the links we’ve gathered using XPath and store them in a database. First the code to iterate through the links:

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    storeLink($url, $target_url);
}

$hrefs is an object of type DOMNodeList, and item() is a method that returns the DOMNode object at the specified index. Valid indexes run from 0 to $hrefs->length - 1, which is why the loop condition is $i < $hrefs->length. So we’ve got a loop that retrieves each link as a DOMNode object.

$url = $href->getAttribute('href');

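Each node matched by our query is an element, so it is really a DOMElement (a subclass of DOMNode), and getAttribute() is a DOMElement method. getAttribute() returns the value of the named attribute of the element (in this case an <a> tag with the href attribute), or an empty string if the attribute is missing. Now we’ve got our URL and we can store it in the database.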
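To see the loop working in isolation, here is a minimal sketch that can be run on its own. The sample markup, the `//a` query, and the `$urls` array are assumptions made for illustration (they stand in for the page and XPath expression used earlier in this tutorial), and the sketch collects the URLs in an array rather than calling storeLink().

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = '<html><body>'
      . '<a href="http://example.com/one">One</a>'
      . '<a name="anchor-only">No href here</a>'
      . '<a href="/two">Two</a>'
      . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Assumed query: every <a> element in the document.
$xpath = new DOMXPath($dom);
$hrefs = $xpath->query('//a');

$urls = [];
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    // Skip anything that is not an element or has no href attribute.
    if ($href instanceof DOMElement && $href->hasAttribute('href')) {
        $urls[] = $href->getAttribute('href');
    }
}

print_r($urls);
```

The hasAttribute() check is a design choice: without it, an <a> tag that has no href would come back as an empty string and end up stored as a blank link.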

We’ll want a database table that looks something like this:

CREATE TABLE `links` (
    `url` TEXT NOT NULL,
    `gathered_from` TEXT NOT NULL,
    `time_stamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

We’ll need a storeLink() function to put the links in the database. I’ll assume you know the basics of how to connect to a database (if not, grab a MySQL & PHP tutorial first).

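function storeLink($url, $gathered_from) {
    // Escape both values so a quote in a URL cannot break (or inject into) the query.
    $url = mysql_real_escape_string($url);
    $gathered_from = mysql_real_escape_string($gathered_from);
    $query = "INSERT INTO links (url, gathered_from) VALUES ('$url', '$gathered_from')";
    // Note: the mysql_* functions were removed in PHP 7; use mysqli or PDO on current versions.
    mysql_query($query) or die('Error, insert query failed');
}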
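The storeLink() function above builds its query by dropping the values into a string and relies on the mysql_* functions, which are no longer available as of PHP 7. One possible alternative is a prepared statement through PDO. The sketch below is not part of the original tutorial code: the function name storeLinkPrepared(), the extra $db parameter, and the in-memory SQLite database (used only so the example runs without a MySQL server) are all assumptions.

```php
<?php
// Hypothetical variant of storeLink() that receives the connection as an argument.
function storeLinkPrepared(PDO $db, string $url, string $gatheredFrom): void
{
    // Placeholders keep the values out of the SQL text, so no manual escaping is needed.
    $stmt = $db->prepare(
        'INSERT INTO links (url, gathered_from) VALUES (:url, :gathered_from)'
    );
    $stmt->execute([':url' => $url, ':gathered_from' => $gatheredFrom]);
}

// Assumption: in-memory SQLite as a stand-in for the MySQL database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec(
    'CREATE TABLE links (
        url TEXT NOT NULL,
        gathered_from TEXT NOT NULL,
        time_stamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    )'
);

// A URL containing a single quote, which would break a naively interpolated query.
storeLinkPrepared($db, "http://example.com/it's-here", 'http://example.com/');

$count = (int) $db->query('SELECT COUNT(*) FROM links')->fetchColumn();
$storedUrl = $db->query('SELECT url FROM links')->fetchColumn();
echo $count, ' link stored: ', $storedUrl, "\n";
```

Against MySQL, the same function should work with a connection created from a MySQL DSN and the links table shown earlier; only the `new PDO(...)` line would need to change.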





Originally posted on Makebeta

