PHP Site Search Made Easy
By Akash Mehta
2008-03-30
A brief crash course on search APIs
In this tutorial, we're going to build a site search system for a
website using the Yahoo search web services. The web services provided
by Yahoo are essentially web-based machine-readable interfaces to
Yahoo's various products. There are quite a few web services made
available; head over to developer.yahoo.com for a full list (scroll down to "Services" in the sidebar).
Let's get RESTful
The web search API, or application programming interface, falls into
the category of RESTful web services - that is, it's an API delivered
over the web that follows the REST architectural style. As far as you're concerned,
REST, or REpresentational State Transfer, just involves HTTP and URLs
(URIs, actually) - technologies and concepts you're already familiar with.
To demonstrate, here's a sample URL to access the search web service:
http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=madonna
A couple of things to note here. First, it's a perfectly normal URL:
you can load it up in your web browser and see the XML returned.
Second, there's clearly a query parameter in the URL that we can change. Here's a sample of what the results might look like:
<ResultSet>
<Result>
<Title>Madonna</Title>
<Summary>
Official site of pop diva Madonna, with news, music, media, and fan club.
</Summary>
<Url>http://www.madonna.com/</Url>
<DisplayUrl>www.madonna.com/</DisplayUrl>
<ModificationDate>1206428400</ModificationDate>
<MimeType>text/html</MimeType>
<Cache><Size>18519</Size></Cache>
</Result>
</ResultSet>
I've removed a fair bit of information, but this is enough to
demonstrate how the information is provided. If you load up a Yahoo
search page and search for "madonna", you'll get the same result. Now,
this is entirely machine readable - it's XML - but we can go one step
further. If we add another parameter to the URL, output, and give it a value of php, we get the following:
a:1:{s:9:"ResultSet";a:6:{s:6:"Result";
a:1:{i:0;a:8:{s:5:"Title";s:7:"Madonna";s:7:"Summary";
s:73:"Official site of pop diva Madonna, with news, music, media, and
fan club.";s:3:"Url";s:23:"http://www.madonna.com/";s:10:"DisplayUrl";
s:16:"www.madonna.com/";s:16:"ModificationDate";i:1206428400;
s:8:"MimeType";s:9:"text/html";s:5:"Cache";a:2:{s:3:"Url";
s:316:"http://uk.wrs.yahoo.com/_ylt=A0Je5VfTHupHWj4AgjbdmMwF;
_ylu=X3oDMTBwOHA5a2tvBGNvbG8DdwRwb3MDMQRzZWMDc3IEdnRpZAM-/SIG=15vvp3oak/EXP=
1206612051/**http%3A//66.218.69.11/search/cache%3Fei=UTF-8%26appid=
YahooDemo%26query=madonna%26results=1%26output=php%26u=
www.madonna.com/%26w=madonna%26d=A_opCvH_Qg-3%26icp=1%26.intl=
us";s:4:"Size";s:5:"18519";}}}}}
Doesn't look like much? Run it through the PHP unserialize() function, and then print_r().
[ResultSet] => Array
    (
        [Result] => Array
            (
                [0] => Array
                    (
                        [Title] => Madonna
                        [Summary] => Official site of pop diva Madonna, with news, music, media, and fan club.
                        [Url] => http://www.madonna.com/
                        [DisplayUrl] => www.madonna.com/
                        [ModificationDate] => 1206428400
                        [MimeType] => text/html
                        [Cache] => Array
                            (
                                [Size] => 18519
                            )
                    )
            )
    )
Now that looks easy to work with. But how did we get to
this stage? Well, as the APIs are accessed via a simple URL, we can
first fetch the data using file_get_contents() (this needs PHP's allow_url_fopen setting to be enabled). That gives us the mess of characters we saw earlier. We then run it through unserialize() and finally print_r(). Here's the code:
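<?php
// Fetch the serialized PHP response for a single result
$data = file_get_contents('http://search.yahooapis.com/'.
    'WebSearchService/V1/webSearch?'.
    'appid=YahooDemo&query=madonna'.
    '&results=1&output=php');
// Turn the response back into an array and dump it in readable form
echo '<pre>'.print_r(unserialize($data), true).'</pre>';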
Go ahead, run it on your web server. You'll see roughly the sample above, plus a few extra elements.
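One caveat with that snippet: unserialize() is fed whatever the remote server sends back. As a minimal sketch, assuming PHP 7.1 or later, the decoding step could be wrapped in a small helper that refuses to create objects and returns null for anything that isn't a serialized array. The helper name decode_serialized_response() is made up for illustration, and the sample string is built locally so the example runs without a network connection.

```php
<?php
// Hypothetical helper (not part of any Yahoo library): decode a serialized
// PHP response body, returning null when it is not a valid serialized array.
function decode_serialized_response(string $body): ?array
{
    // allowed_classes => false stops unserialize() from instantiating objects.
    // The @ hides the notice/warning PHP raises for malformed input.
    $decoded = @unserialize($body, ['allowed_classes' => false]);

    return is_array($decoded) ? $decoded : null;
}

// A tiny local stand-in for the real response.
$sample = serialize(['ResultSet' => ['Result' => [['Title' => 'Madonna']]]]);

$decoded = decode_serialized_response($sample);
echo $decoded['ResultSet']['Result'][0]['Title'], "\n"; // prints "Madonna"

// Anything that is not serialized data comes back as null.
var_dump(decode_serialized_response('not serialized data'));
```

Checking for null before looping over the results gives you one obvious place to show a "search is unavailable" message.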
Building a real site search system
Now, I've added a results=1 to our previous example, to
cut down on data here, but let's take that out (it will default to 10)
and do something real. Ignoring my multi-line file_get_contents() URL, we can build a functional web search in just five lines of code. You can experiment with that array (go ahead, a foreach works fine), but here's how I did it:
<?php
$data = file_get_contents('http://search.yahooapis.com/'.
    'WebSearchService/V1/webSearch?'.
    'appid=YahooDemo&query=madonna'.
    '&output=php');
$results = unserialize($data);
foreach ($results['ResultSet']['Result'] as $result) {
    echo "<h3><a href=\"{$result['Url']}\">{$result['Title']}</a></h3>\n";
    echo "<p>{$result['Summary']}</p>\n";
}
Load it up in your web browser or run it via CLI. Provided PHP can
connect to the Yahoo API server, you'll see something like the
following:
<h3><a href="http://www.madonna.com/">Madonna</a></h3>
<p>Official site of pop diva Madonna, with news, music, media, and fan club.</p>
<h3><a href="http://madonnalicious.typepad.com/">madonnalicious</a></h3>
<p>Pictures, articles, downloads, concert info, news, and more about Madonna.</p>
<h3><a href="http://www.myspace.com/madonna">MySpace.com - Madonna - Pop / Rock - www.myspace.com/madonna</a></h3>
<p>Madonna MySpace page with news, blog, music downloads, desktops, wallpapers, and more.</p>
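The loop above drops titles and summaries straight into the page. If a value contains characters such as & or <, the markup can break, so it may be worth escaping each value first. Here's a rough sketch, assuming the API hands back plain text rather than ready-made HTML (I haven't verified that for every field). The render_result() helper and the sample data are invented for illustration.

```php
<?php
// Hypothetical helper: render one search result as HTML, escaping every
// value so special characters cannot break the surrounding markup.
function render_result(array $result): string
{
    $url     = htmlspecialchars($result['Url'], ENT_QUOTES, 'UTF-8');
    $title   = htmlspecialchars($result['Title'], ENT_QUOTES, 'UTF-8');
    $summary = htmlspecialchars($result['Summary'], ENT_QUOTES, 'UTF-8');

    return "<h3><a href=\"{$url}\">{$title}</a></h3>\n<p>{$summary}</p>\n";
}

// Made-up result containing characters that need escaping.
$result = [
    'Url'     => 'http://www.example.com/?a=1&b=2',
    'Title'   => 'Tom & Jerry <fan site>',
    'Summary' => 'Cartoons, clips, and more.',
];

echo render_result($result);
```

Inside the foreach you would then write echo render_result($result); instead of the two echo lines.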
But wait - we're building a specific site search here, and chances
are you aren't terribly interested in Madonna. The web service has yet
another parameter up its sleeve: site (unsurprisingly).
Let's say I'm building a site search for engadget.com, and I need
to give users a way to actually choose what to search for. First, we
set the site parameter to engadget.com, and then we set
the actual query to a user-supplied value. We'll use a simple form for
the user to enter their search query, and then pass it to the Yahoo
APIs from $_GET. Here's what I came up with:
<form action="" method="get">
<input type="text" name="q" /><input type="submit" />
</form>
<?php
if (isset($_GET['q'])) {
    // URL-encode the user's input so spaces, & and # can't break the request URL
    $q = urlencode($_GET['q']);
    $data = file_get_contents('http://search.yahooapis.com/'.
        'WebSearchService/V1/webSearch?'.
        'appid=YahooDemo&query='.$q.
        '&output=php&site=engadget.com');
    $results = unserialize($data);
    foreach ($results['ResultSet']['Result'] as $result) {
        echo "<h3><a href=\"{$result['ClickUrl']}\">{$result['Title']}</a></h3>\n";
        echo "<p>{$result['Summary']}</p>\n";
    }
}
One more thing to note here - ClickUrl. If you looked closely at the
full output of the array we unserialized earlier, you would have seen
a 'ClickUrl' element. It's rather long, and not terribly interesting,
so I've left it out of the demonstrations, but when sending a user to
a link you fetch from the Yahoo services, you should use ClickUrl and
not just Url. By using ClickUrl, you help the great folks at Yahoo
analyse and improve the quality of their search service - which is
good for everyone. The ClickUrl points to a Yahoo server, which sends
the user straight on to the normal Url.
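If you'd rather not assume every result carries a ClickUrl, one option is to pick the link target in a tiny helper. This is only a sketch: the link_target() name and the fallback to Url are my own assumptions rather than anything Yahoo prescribes, and the sample addresses are example.com placeholders, not real tracking links.

```php
<?php
// Hypothetical helper: prefer ClickUrl for outgoing links and fall back to
// the plain Url when ClickUrl is missing or empty (the fallback is an
// assumption, not documented Yahoo behaviour).
function link_target(array $result): string
{
    if (isset($result['ClickUrl']) && $result['ClickUrl'] !== '') {
        return $result['ClickUrl'];
    }

    return $result['Url'];
}

// Placeholder data: the ClickUrl is an invented example.com address.
$withClickUrl = [
    'Url'      => 'http://www.example.com/',
    'ClickUrl' => 'http://redirect.example.com/?to=http%3A%2F%2Fwww.example.com%2F',
];
$withoutClickUrl = ['Url' => 'http://www.example.com/'];

echo link_target($withClickUrl), "\n";    // the ClickUrl value
echo link_target($withoutClickUrl), "\n"; // falls back to the Url value
```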
Anyway, this is not the most elegant solution, but it's probably one of
the simplest. Load it up in your web browser and search for 'iphone';
the summary of your first result will look something like this:
... history -- and that's saying a lot -- the iPhone has been
announced today. ... partnership with Yahoo will allow all iPhone
customers to hook up with free push ...
Compare that to searching for iphone site:engadget.com on a normal Yahoo search page.
Essentially, Yahoo just gave you the full power of their search system.
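The request URL in the form example is pieced together by string concatenation. As the parameter list grows, a small helper built on http_build_query() is one way to keep it tidy and make sure every value is URL-encoded. Treat this as a sketch under assumptions: build_search_url() is a name I've made up, and it only covers the parameters used in this tutorial.

```php
<?php
// Hypothetical helper: assemble the web search request URL from its parts.
// http_build_query() URL-encodes each value, so user input such as
// "a & b" cannot add or break query parameters.
function build_search_url(string $query, string $site, string $appId = 'YahooDemo'): string
{
    $params = [
        'appid'  => $appId,
        'query'  => $query,
        'output' => 'php',
        'site'   => $site,
    ];

    // Pass '&' explicitly so the result does not depend on php.ini settings.
    return 'http://search.yahooapis.com/WebSearchService/V1/webSearch?'
        . http_build_query($params, '', '&');
}

echo build_search_url('iphone 3g & accessories', 'engadget.com'), "\n";
// prints http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=iphone+3g+%26+accessories&output=php&site=engadget.com
```

In the form handler you could then call file_get_contents(build_search_url($_GET['q'], 'engadget.com')).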
A note on application IDs
You might have noticed the appid parameter in our calls
to the Yahoo web service. This parameter represents the application ID,
and allows Yahoo to tell your application apart from everyone else's.
While you're just testing, it's okay to use the 'YahooDemo' application ID,
but when you go to build a real application you should register it with Yahoo.
If something goes wrong with your application, they may need to shut
it down entirely and cut off your access to the APIs. By registering
with Yahoo, you provide them with some basic contact details and
details of your application. If they see something wrong with queries
coming from your application, they can then easily work out that you
are in charge of the application, and contact you before taking any
actions that might break your code.
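Once you have your own application ID, it's handy to keep it out of the URL strings scattered through your code. One possible approach, sketched here, is to read it from an environment variable and fall back to the demo ID while testing. The variable name YAHOO_APP_ID and the function yahoo_app_id() are assumptions for illustration, not anything defined by Yahoo.

```php
<?php
// Hypothetical helper: return the registered application ID from the
// YAHOO_APP_ID environment variable (an invented name), or the demo ID
// from this tutorial when the variable is not set.
function yahoo_app_id(): string
{
    $id = getenv('YAHOO_APP_ID');

    return ($id !== false && $id !== '') ? $id : 'YahooDemo';
}

// Build the appid part of the query string from whichever ID is in effect.
echo 'appid=' . urlencode(yahoo_app_id()), "\n";
```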