Google now lets web sites submit an XML file that lists URLs along with some information about how often each page changes and how important it is relative to the other pages on your site. Basically, it gets you to do part of the work for them, which we would hope helps everyone.
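
For illustration, a one-page file in the 0.84 format looks roughly like this. The example.com address, the timestamp, and the frequency and priority values are made up for the example, not taken from a real site:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.example.com/index.html</loc>
    <lastmod>2005-06-16T10:53:59+00:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
```

Each page gets its own url element; only loc is required, and the other three are hints.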

I do wish Google would extend this to include at least a "not about" property. I realize that Google isn’t going to let anyone tell them what a page IS about, but a "not about" property can’t be abused as easily and could help the accuracy of their search results.

Google provides a Python script that can produce the file for your site; I wrote a Perl script that does the same:


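# Run from the web root. $base and the default $freq/$priority are placeholders; change them.
$base="http://www.example.com";
@stuff=`find . -type f -name "*.html"`;
open(O,">sitemap") or die "Cannot write sitemap: $!";
print O <<EOF;
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
EOF
foreach (@stuff) {
chomp; s/^\.\///; $rfile=$_;   # drop the newline and the leading "./" that find adds
($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks)=stat $rfile;
($sec,$min,$hour,$mday,$mon,$year)=gmtime($mtime);   # UTC, to match the +00:00 offset below
$year +=1900; $mon++;
$mod=sprintf("%04d-%02d-%02dT%02d:%02d:%02d+00:00",$year,$mon,$mday,$hour,$min,$sec);
$freq="monthly";
$freq="daily" if /index\.html/;
$priority="0.5";
$priority="1.0" if /index\.html/;
print O <<EOF;
<url><loc>$base/$_</loc><lastmod>$mod</lastmod><changefreq>$freq</changefreq><priority>$priority</priority></url>
EOF
}
print O <<EOF;
</urlset>
EOF
close O;
system("gzip sitemap");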

Season to taste. See https://www.google.com/webmasters/sitemaps/

2010-05-26T10:53:59+00:00 June 16th, 2005

