RSS Tutorial

Introducing RSS

Think about all of the information that you access on the Web on a
day-to-day basis: news headlines, search results, “What’s New”, job
vacancies, and so forth. A large amount of this content can be thought of as
a list; although it probably isn’t in HTML <li> elements,
the information is list-oriented.

Most people need to track a number of these lists, but it becomes
difficult once there are more than a handful of sources. This is because they
have to go to each page, load it, remember how it’s formatted, and find where
they last left off in the list.

RSS is an XML-based format that allows the syndication
of lists of hyperlinks, along with other information, or metadata,
that helps viewers decide whether they want to follow the link.

This allows people’s computers to fetch and understand the information, so
that all of the lists they’re interested in can be tracked and personalized
for them. It is a format that’s intended for use by computers on behalf of
people, rather than being directly presented to them (like HTML).

To enable this, a Web site will make a feed, or channel,
available, just like any other file or resource on the server. Once a feed is
available, computers can regularly fetch the file to get the most recent
items on the list. Most often, people will do this with an
aggregator, a program that manages a number of lists and presents
them in a single interface.

Feeds can also be used for other kinds of list-oriented information, such as
syndicating the content itself (often weblogs) along with the links.
However, this tutorial focuses on the use of RSS for syndication of links.

What’s in a feed?

A feed contains a list of items or entries, each of which is
identified by a link. Each item can have any amount of other metadata associated
with it as well.

The most basic metadata for an entry includes a title for the link and
a description of it; when syndicating news headlines, these fields might be
used for the story title and the first paragraph or a summary, for example.
A simple entry might look like this:

<item>
<title>Earth Invaded</title>
<link>http://news.example.com/2004/12/17/invasion</link>
<description>The earth was attacked by an invasion fleet
from halfway across the galaxy; luckily, a fatal
miscalculation of scale resulted in the entire armada
being eaten by a small dog.</description>
</item>

Additionally, the feed itself can have metadata associated with it, so
that it can be given a title (e.g., “Bob’s news headlines”), description, and
other fields like publisher and copyright terms.
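
As a rough illustration of how feed-level and item-level metadata fit together, the following Python sketch builds a channel holding a single item with the standard library's xml.etree.ElementTree. The channel link and both descriptions are made up for the example, and a complete feed would still need the wrapper elements described under the individual formats.

```python
import xml.etree.ElementTree as ET

# Channel-level metadata describes the feed itself.
channel = ET.Element("channel")
ET.SubElement(channel, "title").text = "Bob's news headlines"
ET.SubElement(channel, "link").text = "http://news.example.com/"  # assumed URL
ET.SubElement(channel, "description").text = "Headlines from Bob (hypothetical feed)"

# Each item carries its own link, title and description.
item = ET.SubElement(channel, "item")
ET.SubElement(item, "title").text = "Earth Invaded"
ET.SubElement(item, "link").text = "http://news.example.com/2004/12/17/invasion"
ET.SubElement(item, "description").text = "The earth was attacked by an invasion fleet."

xml_text = ET.tostring(channel, encoding="unicode")
print(xml_text)
```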

For an idea of what full feeds look like, see ‘Format Versions and
Modules’.

How do people use feeds?

Aggregators are the most common use of feeds, and there are several
types. Web aggregators (sometimes called portals) make this view available in
a Web page; My Yahoo! is a well-known
example of this. Aggregators have also been integrated into e-mail clients
and users’ desktops, and are available as standalone, dedicated software.

Aggregators can offer a variety of special features, including combining
several related feeds into a single view, hiding entries that the viewer has
already seen, and categorizing feeds and entries.

Other uses of feeds include site tracking by search engines and other
software; because the feed is machine-readable, the search software doesn’t
have to figure out which parts of the site are important and which parts are
just the navigation and presentation. You may also choose to allow people to
republish your feeds on their Web sites, giving them the ability to represent
your content as they require.

Why should I make a feed available?

Your viewers will thank you, and there will be more of them, because it
allows them to see your site without going out of their way to visit.

While this seems bad at first glance, it actually improves your site’s
visibility; by making it easier for your users to keep up with your site —
allowing them to see it the way they want to — it’s more likely that they’ll
know when something that interests them is available on your site.

For example, imagine that your company announces a new product or feature
every month or two. Without a feed, your viewers have to remember to come to
your site and see if they find anything new — if they have time. If you
provide a feed for them, they can point their aggregator or other software at
it, and it will give them a link and a description of developments at your
site almost as soon as they happen.

News is similar; because there are so many sources of news on the
Web, most of your viewers won’t come to your site every day. By
providing a feed, you are in front of them constantly, improving the
chances that they’ll click through to an article that catches their eye.

But isn’t that giving away my content?

No! You still retain copyright on your content (if you wish to).

You also control what information is syndicated
in the feed, whether it’s a full article or just a teaser. Your content can
still be protected by your current access control mechanisms; only the links
and metadata are distributed. You can also protect the RSS feed itself with
SSL encryption and HTTP username/password authentication, if you’d
like.

In many ways, syndication is similar to the subscription newsletters that many
sites offer to keep viewers up-to-date. The big difference is that they don’t
have to supply an e-mail address, lowering the barrier of privacy concerns,
while still giving you a direct channel to your viewers. Also, they get to
see the content in the manner that’s most convenient to them, which means
that you get more eyes looking at your content.

Choosing Content for Your Feeds

Any list-oriented information on your site that your viewers might be
interested in tracking or reusing is a good candidate for a feed. This
can encompass news headlines and press releases, job listings, conference
calendars and rankings (like ‘top 10’ lists).

For example:

  • News & Announcements – headlines, notices and any
    list of announcements that are added to over time
  • Document listings – lists of added or changed pages,
    so that people don’t need to constantly check for different content
  • Bookmarks and other external links – while most people
    use RSS for sharing links from their own sites, it’s a natural fit for
    sharing lists of external links
  • Calendars – listings of past or upcoming events,
    deadlines or holidays
  • Mailing lists – to complement a Web-based archive of
    public or private e-mail lists
  • Search results – to let people track changing or new
    results to their searches
  • Databases – job listings, software releases, etc.

While it’s a good start to have a “master feed” for your site that lists
recent news and events, don’t stop there. Generally, each area of your site
that features a changing list of information should have a corresponding
feed; this allows viewers to precisely target their interests.

For example, if your news site has pages for world news, national news,
local news, business, sports, etc., there should be a feed for each of these
sections.

If your site offers a personalized view of data (e.g., people can choose
categories of information that will show up on their home page), offer this
as a feed, so that the viewers’ Web pages match the content of their
feeds.

A great example of this is the variety of feeds that Netflix provides; not only
can you keep track of new releases, but also personalized recommendations and
even a listing of the movies in your queue.

Another good example is Apple’s iTunes Music Store RSS feed generator; you can
customize it based on your preferences, and the views it allows match those
provided in the Music Store itself.

Finally, remember that feeds are just as — if not more — useful on an
Intranet as they are on the Internet. Syndication can be a powerful tool for sharing
and integrating information inside a company.

Publishing Your Feed

There are a number of ways to generate a feed from your content. First of
all, explore your content management system – it might already have an option
to generate an RSS feed.

If that option isn’t available, you have a number of choices:

  • Self-scraping — The easiest way to publish a feed from
    existing content. Scraping tools fetch your Web page and pull
    out the relevant parts for the feed, so that you don’t have to change
    your publishing system. Some use regular expressions or XPath
    expressions, while others require you to mark up your page with minimal
    hints (usually using <div> or <span> tags) that help them
    decide what should be put into the feed.
  • Feed integration — If your site is dynamically
    generated (using languages like Perl, Python or PHP), it may have an RSS
    library available, so that you can integrate the feed into your
    publishing process.
  • Starting with the feed — Alternatively, you can manage
    the list-oriented parts of your content in the RSS feed itself, and
    generate your Web pages (as well as other content, like e-mail lists)
    from the feed. This has the advantage of always having the correct
    information in the feed, and tools like XSLT make this option easy,
    especially if you’re starting from scratch.
  • Third party scraping — If none of these options work
    for you, some people on the Web will scrape your site for you and make
    the feed available. Be warned, however, that this is never as reliable or
    accurate as doing it yourself, because they don’t know the details of
    your content or your system. Also, using third parties introduces another
    point of failure in the delivery process; problems there (network, server
    or business) will cause your feed to be unavailable.
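
To make the self-scraping option more concrete, here is a small Python sketch of a scraper that relies on markup hints. It assumes, purely for illustration, that each entry on the page is wrapped in a div whose class is "feed-item"; that class name, the HintScraper class and the sample page are all hypothetical, and real scraping tools define their own conventions.

```python
from html.parser import HTMLParser

class HintScraper(HTMLParser):
    """Collect (title, link) pairs from <a> tags inside <div class="feed-item">."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._depth = 0      # nesting depth inside a hinted <div>
        self._href = None    # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Start counting at a hinted div, and keep counting nested divs.
            if self._depth or attrs.get("class") == "feed-item":
                self._depth += 1
        elif tag == "a" and self._depth and "href" in attrs:
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append(("".join(self._text).strip(), self._href))
            self._href = None
        elif tag == "div" and self._depth:
            self._depth -= 1

page = """
<div id="nav"><a href="/about">About</a></div>
<div class="feed-item"><a href="/2004/12/17/invasion">Earth Invaded</a></div>
<div class="feed-item"><a href="/2004/12/18/dog">Small Dog Honoured</a></div>
"""

scraper = HintScraper()
scraper.feed(page)
print(scraper.items)
```

The navigation link is skipped because it sits outside any hinted div, which is the point of the hints: the page tells the scraper what belongs in the feed.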

For more information about all of these options, see “Feed Tools” and “More Information”.

Telling People About Your Feed

An important step after publishing a feed is letting your viewers know
that it exists; there are a lot of feeds available on the Web now, but it’s
hard to find them, making it difficult for viewers to utilize them.

Pages that have an associated RSS feed should clearly indicate this to
viewers by using a link containing text like ‘RSS feed’. For example,

<a type="application/rss+xml" href="feed.rss">RSS feed for this page</a>

where ‘feed.rss’ is the URL for the feed. The ‘type’ attribute tells
browsers that this is a link to an RSS feed in a way that they understand.

Additionally, some programs look for a link in the <head> section of
your HTML. To support this, include a <link> tag:

<head>
<title>My Page</title>
<link rel="alternate" type="application/rss+xml"
href="feed.rss" title="RSS feed for My Page">
</head>

These links should be placed on the Web page that is most similar to the
feed content; this enables people to find them as they browse.

Note that Atom feeds should use application/atom+xml rather than
application/rss+xml in both styles of use.
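
The following Python sketch shows roughly how a program might look for such a link in a page's HTML. It is only an approximation of what real aggregators do; the FeedLinkFinder class is a name invented for this example.

```python
from html.parser import HTMLParser

# Media types that mark a <link> as pointing at a feed.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkFinder(HTMLParser):
    """Collect (href, title) for every <link rel="alternate"> with a feed type."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "alternate" in rel and attrs.get("type") in FEED_TYPES:
            self.feeds.append((attrs.get("href"), attrs.get("title")))

html = """<head>
<title>My Page</title>
<link rel="alternate" type="application/rss+xml"
href="feed.rss" title="RSS feed for My Page">
</head>"""

finder = FeedLinkFinder()
finder.feed(html)
print(finder.feeds)
```

Note that the href found this way may be relative, as it is here, so a client would still need to resolve it against the page's URL.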

Finally, there are a number of guides and registries for RSS feeds that
people can search and browse through, much like the Yahoo directory for Web
sites. It’s a good idea to register your feed; see “More Information”.

Format Versions and Modules

There are a number of different versions of the RSS format in use today,
but the main choices are RSS 1.0 and RSS 2.0. Each version has its
benefits and drawbacks; RSS 2.0 is known for its simplicity, while RSS 1.0
is more extensible and fully specified. Both formats are XML-based and have
the same basic structure.

There’s one more choice: Atom is an effort in the IETF (an Internet
standards body) to come up with a well-documented, standard syndication format.
Although it has a different name, it has the same basic functions as RSS, and
many people use the term “RSS” to refer to RSS or Atom syndication.

This section presents a quick
overview of each; for more information, see their specifications and
supporting materials.

RSS 2.0

RSS 2.0 is championed by UserLand’s Dave
Winer. In this version, RSS stands for “Really Simple Syndication,” and
simplicity is its focus.

This branch of RSS is based on RSS 0.91, which was first documented
at Netscape and later refined by UserLand.

RSS 2.0.1, the latest stable version of this branch, includes channel
metadata like link, title and description; image, which allows you to
specify a thumbnail image to display with the feed; webMaster and
managingEditor, to identify who’s responsible for the feed; and
lastBuildDate, which shows when the feed was last updated.

Items have the standard link,
title and description metadata, as
well as other, more experimental facilities like enclosure,
which allows attachments to be automatically downloaded (don’t expect these
features to be supported by all aggregators, however). Finally, items can have
a guid element that identifies the item uniquely; this allows some
advanced functionality in some aggregators.

Here’s an example of a minimal RSS 2.0 feed:

<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>Example Channel</title>
<link>http://example.com/</link>
<description>My example channel</description>
<item>
<title>News for September the Second</title>
<link>http://example.com/2002/09/02</link>
<description>other things happened today</description>
</item>
<item>
<title>News for September the First</title>
<link>http://example.com/2002/09/01</link>
</item>
</channel>
</rss>
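
To show how little is needed to consume a feed like this, here is a hedged Python sketch that reads a minimal RSS 2.0 feed with the standard library's xml.etree.ElementTree. It handles only the elements discussed here and is not a substitute for a full syndication library.

```python
import xml.etree.ElementTree as ET

feed_text = """<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>Example Channel</title>
<link>http://example.com/</link>
<description>My example channel</description>
<item>
<title>News for September the Second</title>
<link>http://example.com/2002/09/02</link>
<description>other things happened today</description>
</item>
<item>
<title>News for September the First</title>
<link>http://example.com/2002/09/01</link>
</item>
</channel>
</rss>"""

root = ET.fromstring(feed_text)
channel = root.find("channel")
print(channel.findtext("title"))

items = [
    {
        "title": item.findtext("title"),
        "link": item.findtext("link"),
        # description is optional, so fall back to an empty string
        "description": item.findtext("description", default=""),
    }
    for item in channel.findall("item")
]
for entry in items:
    print(entry["title"], entry["link"])
```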

In the RSS 2.0 roadmap, Winer
states that this branch is, for all practical purposes, frozen, except for
clarifications to the specification.

However, extensions to the format are allowed in separate modules, using XML
Namespaces to avoid conflicts in their names. For example, if you had an ISBN
module to track books, it might look like this:

<item xmlns:book="http://namespace.example.com/book/1.0">
<title>Excession</title>
<link>http://www.amazon.com/exec/obidos/tg/detail/-/0553575376</link>
<book:isbn>0553575376</book:isbn>
</item>

Generally, though, you should look for available RSS Modules, rather than
defining your own, unless you’re sure that what you need doesn’t exist.
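
As a sketch of how namespaces keep module elements apart from the core ones, the Python below reads the hypothetical book module from the example above. The namespace URI is the illustrative one used there, not a real module.

```python
import xml.etree.ElementTree as ET

# Illustrative namespace from the example; not a published module.
BOOK_NS = "http://namespace.example.com/book/1.0"

item_text = """<item xmlns:book="http://namespace.example.com/book/1.0">
<title>Excession</title>
<link>http://www.amazon.com/exec/obidos/tg/detail/-/0553575376</link>
<book:isbn>0553575376</book:isbn>
</item>"""

item = ET.fromstring(item_text)

# ElementTree addresses namespaced elements as {namespace-uri}localname.
isbn = item.findtext("{%s}isbn" % BOOK_NS)

# Software that doesn't know the module can simply skip namespaced children.
core = {child.tag: child.text for child in item if not child.tag.startswith("{")}

print(isbn)
print(sorted(core))
```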

RSS 1.0

RSS 1.0 stands for “RDF Site Summary.” This flavor of RSS incorporates
RDF, a Web standard for
metadata. Because RSS 1.0 uses RDF, any RDF processor can understand RSS
without knowing anything about it in particular. This allows syndicated
feeds to easily become part of the Semantic Web.

RSS 1.0 also uses XML Namespaces to allow extensions, in a manner similar to RSS 2.0.

RSS 1.0 feeds look very similar to RSS 2.0 feeds, with a few key
differences:

  • The entire feed is wrapped in <rdf:RDF>
    </rdf:RDF> elements (so that processors know that it’s
    RDF)
  • Each <item> has an rdf:about attribute
    that usually, but not always, matches the <link>; this
    assigns an identifier to each item
  • There’s an <items> element in the channel metadata
    that contains a list of items in the channel, so that RDF processors can
    keep track of the relationship between the items
  • Some metadata uses the rdf:resource attribute to carry
    links, instead of putting them inside the element.

RSS 1.0 is developed and maintained by an ad hoc group of interested
people; see their Web site for more information about RSS 1.0 and RSS Modules.
See below for an example of an RSS 1.0 feed.

Dublin Core Module

The best-known example of an RSS 1.0 Module is the Dublin Core Module. The
Dublin Core is a standard set of common metadata, developed by librarians and
information scientists, that is useful for describing documents, among other
things. The Dublin Core Module uses these metadata to attach information to
both feeds (in the channel metadata) and individual items.

This module includes useful elements like dc:date, for
associating dates with items, dc:subject, which can be useful
for categorizing items or feeds, and dc:rights, for dictating
the intellectual property rights associated with an item or a feed.

Here’s an example of a minimal RSS 1.0 feed that uses the Dublin Core
Module:

<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
>
<channel rdf:about="http://example.com/news.rss">
<title>Example Channel</title>
<link>http://example.com/</link>
<description>My example channel</description>
<items>
<rdf:Seq>
<rdf:li resource="http://example.com/2002/09/01/"/>
<rdf:li resource="http://example.com/2002/09/02/"/>
</rdf:Seq>
</items>
</channel>
<item rdf:about="http://example.com/2002/09/01/">
<title>News for September the First</title>
<link>http://example.com/2002/09/01/</link>
<description>other things happened today</description>
<dc:date>2002-09-01</dc:date>
</item>
<item rdf:about="http://example.com/2002/09/02/">
<title>News for September the Second</title>
<link>http://example.com/2002/09/02/</link>
<dc:date>2002-09-02</dc:date>
</item>
</rdf:RDF>

As you can see, RSS 1.0 is a bit more verbose than 2.0, mostly because it
needs to be compatible with other versions of RSS while containing the markup
that RDF processors need.
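
The relationship between the items list in the channel and the rdf:about identifiers on each item can be checked mechanically. The Python sketch below does so for the example feed using the standard library's xml.etree.ElementTree. It is a simple consistency check under the assumption that every listed resource should have a matching item, not an RDF processor.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
RDF = "{%s}" % NS["rdf"]

feed_text = """<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
>
<channel rdf:about="http://example.com/news.rss">
<title>Example Channel</title>
<link>http://example.com/</link>
<description>My example channel</description>
<items>
<rdf:Seq>
<rdf:li resource="http://example.com/2002/09/01/"/>
<rdf:li resource="http://example.com/2002/09/02/"/>
</rdf:Seq>
</items>
</channel>
<item rdf:about="http://example.com/2002/09/01/">
<title>News for September the First</title>
<link>http://example.com/2002/09/01/</link>
<description>other things happened today</description>
<dc:date>2002-09-01</dc:date>
</item>
<item rdf:about="http://example.com/2002/09/02/">
<title>News for September the Second</title>
<link>http://example.com/2002/09/02/</link>
<dc:date>2002-09-02</dc:date>
</item>
</rdf:RDF>"""

root = ET.fromstring(feed_text)

# URIs listed in the channel's <items> sequence, in document order.
listed = [
    li.get("resource") or li.get(RDF + "resource")
    for li in root.findall("rss:channel/rss:items/rdf:Seq/rdf:li", NS)
]

# Items keyed by their rdf:about identifier.
items = {item.get(RDF + "about"): item for item in root.findall("rss:item", NS)}

# Listed resources that have no matching <item>.
missing = [uri for uri in listed if uri not in items]

# Dublin Core dates, read through the dc namespace.
dates = {uri: item.findtext("dc:date", namespaces=NS) for uri, item in items.items()}

print(listed, missing, dates)
```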

Atom

Both RSS 1.0 and 2.0 are informal specifications; that is, they aren’t
published by a well-known standards body or industry consortium, but instead by a
small group of people.

Some people are concerned by this, because such specifications can be changed
at the whim of the people who control them. Standards bodies bring stability, by
limiting change and having well-established procedures for introducing it. To
introduce such stability to syndication, a group of people established an IETF
Working Group to standardise a format called Atom.

Atom is functionally similar to both branches of RSS, and is also an XML-based
format.

For example:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Example Feed</title>
<link href="http://example.org/"/>
<updated>2003-12-13T18:30:02Z</updated>
<author>
<name>John Doe</name>
</author>
<id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>

<entry>
<title>Atom-Powered Robots Run Amok</title>
<link href="http://example.org/2003/12/13/atom03"/>
<id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
<updated>2003-12-13T18:30:02Z</updated>
<summary>Some text.</summary>
</entry>
</feed>

As you can see, Atom has a feed element that contains both the feed-level
metadata and the entry elements (analogous to RSS’ items).
Each entry can contain similar metadata, such as title,
link, id (instead of RSS 1.0’s rdf:about or RSS 2.0’s
guid), and a short textual summary (instead of RSS’
description).
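
To illustrate the correspondence between Atom and RSS fields, this Python sketch reads the entries from an Atom feed like the one above. Note that Atom carries the link in an href attribute rather than as element text. The dictionary keys are just names chosen for this example.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

feed_bytes = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Example Feed</title>
<link href="http://example.org/"/>
<updated>2003-12-13T18:30:02Z</updated>
<author>
<name>John Doe</name>
</author>
<id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
<entry>
<title>Atom-Powered Robots Run Amok</title>
<link href="http://example.org/2003/12/13/atom03"/>
<id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
<updated>2003-12-13T18:30:02Z</updated>
<summary>Some text.</summary>
</entry>
</feed>""".encode("utf-8")

root = ET.fromstring(feed_bytes)

entries = [
    {
        "id": entry.findtext("atom:id", namespaces=ATOM),        # like guid / rdf:about
        "title": entry.findtext("atom:title", namespaces=ATOM),
        "link": entry.find("atom:link", ATOM).get("href"),       # link lives in an attribute
        "summary": entry.findtext("atom:summary", default="", namespaces=ATOM),
    }
    for entry in root.findall("atom:entry", ATOM)
]
print(entries)
```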

Generally, Atom isn’t as widely supported as RSS 1.0 or 2.0 right now, because it’s relatively new.
However, it should catch up quickly, because of the broad base of vendors supporting the standardisation
effort.

Which Format Should I Choose?

One of the most confusing and unfortunate problems in syndication is the
large number of formats in use. In addition to those listed above, there are
many other formats (e.g., RSS 0.9, 0.91, 0.92) that are commonly encountered on
the Web.

For better or worse, the decision isn’t as critical as you might think. Most
aggregators and other software use syndication libraries which abstract
out the particular format that a feed is in, so that they can consume any popular
syndication feed.
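
As a simplified illustration of how a library might tell the formats apart before normalizing them, the Python sketch below inspects only the root element. The detect_format function is a name invented for this example; real syndication libraries are considerably more tolerant and cover the older RSS versions as well.

```python
import xml.etree.ElementTree as ET

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"
RSS10_CHANNEL = "{http://purl.org/rss/1.0/}channel"
ATOM_ROOT = "{http://www.w3.org/2005/Atom}feed"

def detect_format(xml_text):
    """Guess the syndication format from the document's root element."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        return "RSS " + root.get("version", "unknown")
    if root.tag == RDF_ROOT:
        # RSS 1.0 puts its channel element in its own namespace.
        if root.find(RSS10_CHANNEL) is not None:
            return "RSS 1.0"
        return "RDF (unrecognized vocabulary)"
    if root.tag == ATOM_ROOT:
        return "Atom"
    return "unknown"

rss2 = '<rss version="2.0"><channel/></rss>'
rss1 = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns="http://purl.org/rss/1.0/"><channel/></rdf:RDF>')
atom = '<feed xmlns="http://www.w3.org/2005/Atom"/>'

for sample in (rss2, rss1, atom):
    print(detect_format(sample))
```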

As a result, which format to choose is a matter of personal taste. RSS 1.0 is
very extensible, and useful if you want to integrate it into Semantic Web systems.
RSS 2.0 is very simple and easy to author by hand. Atom is now an IETF Standard,
bringing stability and a natural community to support its use.

Tips for Generating Good Feeds

RSS and Atom are easy to work with, but like any new format, you may encounter some
problems in using them. This section attempts to address the most common issues
that arise when generating a feed.

  • Distinct Entries — Make sure that aggregators can tell
    your entries apart, by using different identifiers in rdf:about (RSS 1.0),
    guid (RSS 2.0) and id (Atom). This will save a lot of
    headaches down the road.
  • Meaningful Metadata — Try to make the metadata useful
    on its own; for example, if you only include a short
    <title>, people may not know what the link is about.
    By the same token, if you shove an entire article into
    <description>, it’ll crowd people’s view of the feed,
    and they’re less likely to stay interested in what you have to say.
    Generally, you want to put enough into the feed to help someone decide
    whether they should follow the link.
  • Encoding HTML — Although it’s tempting, refrain from
    including HTML markup (like <a href="...">,
    <b> or <p>) in your RSS feed;
    because you don’t know how it will be presented, doing so can prevent
    your feed from being displayed correctly. If you need to include a tag
    in the text of the feed (e.g., the title of an entry is “Ode to
    <title>”), make sure you escape ampersands and angle brackets (so
    that it would be “Ode to &lt;title&gt;”).
  • XML Entities — Remember that XML doesn’t predefine
    entities like HTML does (only &lt;, &gt;, &amp;, &quot; and &apos; are
    built in); therefore, you won’t have &nbsp;, &copy; and other common
    entities available. You can define them in the XML, or alternatively just
    use a character encoding that makes what you need available.
  • Character Encoding — Some software generates feeds
    using Windows character sets, and sometimes mislabels them. The safest
    thing to do is to encode your feed as UTF-8 and check it by parsing it
    with an XML parser.
  • Communicating with Viewers — Don’t use entries in your
    feed to communicate to your users; for example, some feeds have been
    known to use the <description> to dictate copyright
    terms. Use the appropriate element or module.
  • Communicating with Machines — Likewise, use the
    appropriate HTTP status codes if your feed has relocated (usually,
    301 Moved Permanently) or is no longer available (410 Gone or
    404 Not Found).
  • Making your Feed Cache-Friendly — Successful feeds
    see a fair amount of traffic because clients poll them often to see if
    they’ve changed. To support the load, Web Caching can help; see the caching tutorial.
  • Validate — use the Feed Validator to catch any problems in your feed; it works
    with RSS and Atom. Also, don’t just run it once; make sure you regularly check
    your feed, so that you can catch transient errors.
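
Several of these tips can be tried out in a few lines of Python. The sketch below escapes markup characters in a title, encodes the feed as UTF-8, parses it back as a basic well-formedness check, and shows that an HTML-only entity is rejected by an XML parser. The feed contents are invented for the example, and a parse check like this is no replacement for the Feed Validator.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Escape &, < and > in text before placing it in the feed.
title = "Ode to <title> & other poems"
escaped = escape(title)

feed_text = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<rss version="2.0"><channel>'
    "<title>Example Channel</title>"
    "<link>http://example.com/</link>"
    "<description>Caf\u00e9 news, \u00a9 2004</description>"
    "<item><title>" + escaped + "</title>"
    "<link>http://example.com/ode</link></item>"
    "</channel></rss>"
)

# Encode as UTF-8, matching the encoding named in the XML declaration.
feed_bytes = feed_text.encode("utf-8")

# Parsing raises ET.ParseError if the feed is not well-formed.
root = ET.fromstring(feed_bytes)
parsed_title = root.findtext("channel/item/title")

# HTML-only entities such as &nbsp; are undefined in plain XML.
try:
    ET.fromstring("<description>a&nbsp;b</description>")
    nbsp_accepted = True
except ET.ParseError:
    nbsp_accepted = False

print(escaped)
print(parsed_title == title, nbsp_accepted)
```

Here the non-ASCII characters are written directly into the UTF-8 text rather than as named entities, which is the approach the XML Entities tip recommends.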

Feed Tools

This is an incomplete list of tools for creating feeds and checking
them to make sure that you’ve done so correctly. Note that there are many
more libraries that help with parsing feeds; these haven’t been included here
because this tutorial focuses on the Webmaster, not consumers of feeds.

  • xpath2rss – Tool for scraping Web sites using XPath expressions (a method of
    selecting parts of HTML and XML documents).
  • Site Summaries in XHTML – Online service (also available as an XSLT
    stylesheet) that uses hints in your HTML to generate a feed.
  • myRSS – An online, third-party automated scraping service. Doesn’t require
    any special markup.
  • RSS.py – Python library for generating and parsing RSS.
  • ROME – Java library for parsing and generating RSS and Atom feeds, as well
    as translating between formats.
  • XML::RSS – Perl module for generating and parsing RSS.
  • Online Validator – Check your RSS 1.0, 2.0 and Atom feeds.

More Information

  • Syndicated content – Good list of best practices for creating an RSS feed.
  • Syndic8 – A community effort to gather, validate and search feeds with lots
    of other information.
  • RSS Workshop – A well-regarded introduction to publishing RSS feeds, from
    the State of Utah Online Services division.
  • RSS Devcenter – O’Reilly’s Web portal for all things RSS.
December 23rd, 2007

About the Author:

Mark Nottingham is a Principal Technical Yahoo!, putting together Web-based infrastructure for sites like Yahoo! Finance, Sports, Tech, TV and Movies.

He has spent the last decade designing, debugging, serving and caching Web content, with past stints at Merrill Lynch, Akamai and BEA Systems, along with scars from writing specifications like the Atom Syndication Format, WS-Policy and the WS-I Basic Profile, and chairing both IETF and W3C Working Groups.

Right now, his focus is on using HTTP for what the rest of the industry calls Web Services.
