News Aggregators As Denial of Service Clients

Every once in a while I see the developer of a news aggregator decide to add a 'feature' that needlessly eats a web server's bandwidth in a manner one could classify as rude. The first I remember was Syndirella, which had a feature that let you syndicate an HTML page and then specify regular expressions for which parts of the page it should treat as titles and content. There are three reasons I consider this rude:

  • If a site hasn't put up an RSS feed, it may be because they don't want to deal with the bandwidth costs of clients repeatedly hitting their site on behalf of a few users.
  • An HTML page is often larger than the corresponding RSS feed. The Slashdot RSS feed is about 2K, while just the raw HTML of the Slashdot front page is about 40K.
  • An HTML page can change far more often than the corresponding RSS feed [e.g. rotating ads, trackback links in blogs, etc.], in situations where the RSS feed would not change at all.
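
To put rough numbers on the size difference, here is a back-of-the-envelope sketch. The 2K and 40K figures come from the Slashdot example above; the client count and polling rate are made-up assumptions for illustration, not measurements, and the helper `daily_transfer_kb` is a hypothetical name. It also assumes every poll downloads the full document.

```python
# Sizes taken from the Slashdot example in the text (approximate).
FEED_KB = 2    # RSS feed
HTML_KB = 40   # raw HTML of the front page


def daily_transfer_kb(size_kb, clients, polls_per_day):
    """Total KB served per day if every poll downloads the full document."""
    return size_kb * clients * polls_per_day


# Hypothetical assumptions: 1,000 aggregator users, each polling once an hour.
clients = 1000
polls_per_day = 24

feed_total = daily_transfer_kb(FEED_KB, clients, polls_per_day)
html_total = daily_transfer_kb(HTML_KB, clients, polls_per_day)

print(f"RSS feed:  {feed_total / 1024:.1f} MB/day")
print(f"HTML page: {html_total / 1024:.1f} MB/day")
print(f"Ratio:     {html_total / feed_total:.0f}x")
```

Whatever numbers you plug in for users and polling frequency, the ratio stays the same: scraping the HTML costs the server about 20 times what serving the feed would.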