Scraping Links With PHP
By Justin Laing, 2008-01-06
Tip: Fake Your User Agent

Many websites won’t play nice with you if you come knocking with the wrong User Agent string. What’s a User Agent string? It’s the part of every request to a web server that tells the server what type of agent (browser, spider, etc.) is requesting the content. Some websites serve different content depending on the user agent, so you might want to experiment. You do this in cURL by calling curl_setopt() with CURLOPT_USERAGENT as the option and the user agent string as the value.
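As a rough sketch of that call, the snippet below configures a cURL handle with a Googlebot-style user agent. The helper name make_scraper_handle() and the example.com URL are assumptions made for illustration and are not part of the original tutorial. The user agent value is one of Googlebot's published strings.

```php
<?php
// Hypothetical helper (not from the original tutorial): builds a cURL
// handle that will send a custom User-Agent header with its request.
function make_scraper_handle($url, $userAgent)
{
    $ch = curl_init($url);

    // Send the given string as the User-Agent request header.
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);

    // Make curl_exec() return the page content instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    return $ch;
}

// One of Googlebot's published user agent strings.
$googlebot = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

// Placeholder URL; substitute the page you want to scrape.
$ch = make_scraper_handle('http://www.example.com/', $googlebot);

// Uncomment to actually perform the request:
// $html = curl_exec($ch);
// curl_close($ch);
```

The request itself is left commented out so the snippet only shows the configuration step. Swap in any other user agent string to see whether a site responds differently.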
For example, passing Googlebot’s user agent string as the value makes cURL identify itself as Google’s crawler. You can find a comprehensive list of user agents here: User Agents.

Search Engine User Agents
Browser User Agents
Tutorial Pages:
» Scraping Links With PHP
» Get The Page Content
» Tip: Fake Your User Agent
» Using PHP’s DOM Functions To Parse The HTML
» XPath Makes Getting The Links You Want Easy
» Iterate And Store Your Links
» Your Completed Link Scraper
» What Else Could I Do With This Thing?
» Is Scraping Content Legal?

Originally posted on Makebeta