Scraping Links With PHP
By Justin Laing, 2008-01-06
Get The Page Content

cURL is a great tool for making requests to remote servers in PHP. It can imitate a browser in pretty much every way. Here’s the code to grab our target site content:
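The request can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact listing from the tutorial: the variable names $target_url and $html come from the text, while the fetch_page() helper, the user agent string, the timeout, and the choice of options other than CURLOPT_USERAGENT are assumptions made for illustration.

```php
<?php
/**
 * Hypothetical helper (not from the original article): fetch a page with cURL.
 *
 * @param string      $target_url URL to request.
 * @param string|null $error      Filled with cURL's error message on failure.
 * @return string|false           The page content, or false if the call failed.
 */
function fetch_page(string $target_url, ?string &$error = null)
{
    // Assumed user agent string; any browser-like value could be used here.
    $userAgent = 'Mozilla/5.0 (compatible; ExampleLinkScraper/1.0)';

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_URL, $target_url);      // the URL that will be requested
    curl_setopt($ch, CURLOPT_FAILONERROR, true);     // treat HTTP status >= 400 as a failure
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // assumed timeout, in seconds

    $html = curl_exec($ch);
    if ($html === false) {
        // curl_exec() returns false on failure; curl_error() describes why.
        $error = curl_error($ch);
        return false;
    }

    $error = '';
    return $html;
}
```

A caller would set $target_url, run $html = fetch_page($target_url, $error), and echo $error when $html === false, which matches the behaviour described in the text: $html holds the page content on success, and an error message is shown on failure.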
If the request is successful, $html will be filled with the content of $target_url. If the call fails, we’ll see an error message about the failure.
The $target_url assignment determines what URL will be requested. For example, if you wanted to scrape this site you’d have $target_url = "http://www.merchantos.com/makebeta/". I won’t go into the rest of the options that are set (except for CURLOPT_USERAGENT; see below). You can read an in-depth tutorial on PHP and cURL here.

Tutorial Pages:
» Scraping Links With PHP
» Get The Page Content
» Tip: Fake Your User Agent
» Using PHP’s DOM Functions To Parse The HTML
» XPath Makes Getting The Links You Want Easy
» Iterate And Store Your Links
» Your Completed Link Scraper
» What Else Could I Do With This Thing?
» Is Scraping Content Legal?

Originally posted on Makebeta