Chris's coding blog

Simplified C# Atom and RSS feed parser

February 08, 2010

.NET already has quite a few open source RSS and Atom libraries for parsing feeds. The most complete one is Argotic, but there are also 3-4 others on CodePlex and Google Code.

Update
.NET 4 now has built-in RSS and Atom support in the framework, in the System.ServiceModel.Syndication namespace.
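As a rough sketch of what the built-in route looks like, the snippet below loads a feed from a string with SyndicationFeed.Load. The sample XML and the SyndicationExample class are made up for illustration, and it assumes a reference to the assembly (or, on newer runtimes, the NuGet package) that provides System.ServiceModel.Syndication.

```csharp
using System;
using System.IO;
using System.Linq;
using System.ServiceModel.Syndication;
using System.Xml;

public static class SyndicationExample
{
    // Parses feed XML that has already been downloaded into a string.
    public static SyndicationFeed LoadFeed(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // SyndicationFeed.Load handles both RSS 2.0 and Atom 1.0 input.
            return SyndicationFeed.Load(reader);
        }
    }

    public static void Main()
    {
        // Hypothetical sample feed, used here instead of a live URL.
        string xml =
            "<rss version=\"2.0\"><channel><title>Sample</title>" +
            "<link>http://example.com/</link><description>Sample feed</description>" +
            "<item><title>First post</title><link>http://example.com/1</link>" +
            "<description>Hello</description>" +
            "<pubDate>Mon, 08 Feb 2010 10:00:00 GMT</pubDate></item>" +
            "</channel></rss>";

        foreach (SyndicationItem item in LoadFeed(xml).Items)
        {
            // An item can carry several links; take the first one if there is any.
            Uri link = item.Links.Select(l => l.Uri).FirstOrDefault();
            Console.WriteLine("{0} | {1} | {2}", item.Title.Text, link, item.PublishDate);
        }
    }
}
```

Note that PublishDate here is a DateTimeOffset rather than a DateTime, so it keeps the offset given in the feed.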

A lot of the time these APIs are overkill for what you need: the link, title, content and publish date. With LINQ to XML this is easily achievable with the XDocument object and some basic node searching.

The only drawback with this method is that it doesn’t strip the namespaces from the nodes, which is why there are Where lookups on LocalName instead of straight XName comparisons. When the XML document declares a default namespace at the root, every XElement’s name then becomes “{namespace}node”. XName has an overloaded equality operator and an implicit conversion from string, but unfortunately the comparison uses this full expanded name rather than the LocalName property, which would just be “node”.
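The namespace behaviour described above can be seen with a few lines of LINQ to XML. This is only an illustration: the Atom snippet and the NamespaceExample class are made up for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceExample
{
    // Hypothetical Atom snippet with a default namespace declared on the root.
    public static XDocument LoadSample()
    {
        return XDocument.Parse(
            "<feed xmlns=\"http://www.w3.org/2005/Atom\">" +
            "<entry><title>Hello</title></entry></feed>");
    }

    public static void Main()
    {
        XElement root = LoadSample().Root;

        // The string "entry" converts to an XName with no namespace, so nothing matches.
        Console.WriteLine(root.Elements("entry").Count()); // 0

        // Comparing LocalName ignores the namespace, which is what the parser in this post does.
        Console.WriteLine(root.Elements().Count(e => e.Name.LocalName == "entry")); // 1

        // The namespace-aware alternative: build the full XName from an XNamespace.
        XNamespace atom = "http://www.w3.org/2005/Atom";
        Console.WriteLine(root.Elements(atom + "entry").Count()); // 1

        // The expanded name includes the namespace in braces.
        Console.WriteLine(root.Name); // {http://www.w3.org/2005/Atom}feed
    }
}
```

The XNamespace version is stricter, but it means knowing each format's namespace up front, whereas the LocalName comparison works for any of them.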

I haven’t done any performance comparisons, but my guess is that it’s faster than most feed APIs at parsing, as it ignores all the unwanted nodes. For downloading the feed it may be slower than a custom HttpWebRequest, as it uses the XDocument.Load method.

Usage

FeedParser parser = new FeedParser();
var items = parser.Parse("http://www.ft.com/rss/home/uk", FeedType.RSS);

Source

First, the Item class, which holds a feed item’s metadata. You might want to add a Site property to it for grouping per site.

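// Namespaces used by the code in this post.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

/// <summary>
/// Represents a feed item.
/// </summary>
public class Item
{
    public string Link { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
    public DateTime PublishDate { get; set; }
    public FeedType FeedType { get; set; }

    public Item()
    {
        // Default to empty strings so callers never see null.
        Link = "";
        Title = "";
        Content = "";
        PublishDate = DateTime.Today;
        FeedType = FeedType.RSS;
    }
}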
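As a sketch of the Site suggestion, grouping could be done with a LINQ GroupBy. SiteItem, TitlesBySite and the sample values below are hypothetical and are not part of the parser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cut-down variant of Item with the suggested Site property added.
public class SiteItem
{
    public string Site { get; set; }
    public string Title { get; set; }
}

public static class GroupingExample
{
    // Groups item titles by the site they came from.
    public static Dictionary<string, List<string>> TitlesBySite(IEnumerable<SiteItem> items)
    {
        return items
            .GroupBy(i => i.Site)
            .ToDictionary(g => g.Key, g => g.Select(i => i.Title).ToList());
    }

    public static void Main()
    {
        var items = new List<SiteItem>
        {
            new SiteItem { Site = "ft.com", Title = "Markets open higher" },
            new SiteItem { Site = "ft.com", Title = "Rates held" },
            new SiteItem { Site = "example.com", Title = "Hello world" }
        };

        foreach (var pair in TitlesBySite(items))
        {
            Console.WriteLine("{0}: {1}", pair.Key, string.Join(", ", pair.Value));
        }
    }
}
```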

Any bad publish dates are turned into DateTime.MinValue. The idea behind the virtual methods is that you can subclass the parser and do something more. At present, though, these methods return the entire list rather than a single Item, so there isn’t much to gain from overriding them. I will probably change that sometime in the future.

/// <summary>
/// A simple RSS, RDF and ATOM feed parser.
/// </summary>
public class FeedParser
{
    /// <summary>
    /// Parses the given <see cref="FeedType"/> and returns a <see cref="IList&lt;Item&gt;"/>.
    /// </summary>
    /// <returns>The parsed items, or an empty list if the feed could not be loaded or parsed.</returns>
    public IList<Item> Parse(string url, FeedType feedType)
    {
        switch (feedType)
        {
            case FeedType.RSS:
                return ParseRss(url);
            case FeedType.RDF:
                return ParseRdf(url);
            case FeedType.Atom:
                return ParseAtom(url);
            default:
                throw new NotSupportedException(string.Format("{0} is not supported", feedType.ToString()));
        }
    }

    /// <summary>
    /// Parses an Atom feed and returns a <see cref="IList&lt;Item&gt;"/>.
    /// </summary>
    public virtual IList<Item> ParseAtom(string url)
    {
        try
        {
            XDocument doc = XDocument.Load(url);
            // Feed/Entry
            var entries = from item in doc.Root.Elements().Where(i => i.Name.LocalName == "entry")
                          select new Item
                          {
                              FeedType = FeedType.Atom,
                              Content = item.Elements().First(i => i.Name.LocalName == "content").Value,
                              Link = item.Elements().First(i => i.Name.LocalName == "link").Attribute("href").Value,
                              PublishDate = ParseDate(item.Elements().First(i => i.Name.LocalName == "published").Value),
                              Title = item.Elements().First(i => i.Name.LocalName == "title").Value
                          };
            return entries.ToList();
        }
        catch
        {
            // Any download or parse failure, including a missing element, yields an empty list.
            return new List<Item>();
        }
    }

    /// <summary>
    /// Parses an RSS feed and returns a <see cref="IList&lt;Item&gt;"/>.
    /// </summary>
    public virtual IList<Item> ParseRss(string url)
    {
        try
        {
            XDocument doc = XDocument.Load(url);
            // RSS/Channel/item
            var entries = from item in doc.Root.Descendants()
                              .First(i => i.Name.LocalName == "channel")
                              .Elements()
                              .Where(i => i.Name.LocalName == "item")
                          select new Item
                          {
                              FeedType = FeedType.RSS,
                              Content = item.Elements().First(i => i.Name.LocalName == "description").Value,
                              Link = item.Elements().First(i => i.Name.LocalName == "link").Value,
                              PublishDate = ParseDate(item.Elements().First(i => i.Name.LocalName == "pubDate").Value),
                              Title = item.Elements().First(i => i.Name.LocalName == "title").Value
                          };
            return entries.ToList();
        }
        catch
        {
            // Any download or parse failure, including a missing element, yields an empty list.
            return new List<Item>();
        }
    }

    /// <summary>
    /// Parses an RDF feed and returns a <see cref="IList&lt;Item&gt;"/>.
    /// </summary>
    public virtual IList<Item> ParseRdf(string url)
    {
        try
        {
            XDocument doc = XDocument.Load(url);
            // <item> is under the root
            var entries = from item in doc.Root.Descendants().Where(i => i.Name.LocalName == "item")
                          select new Item
                          {
                              FeedType = FeedType.RDF,
                              Content = item.Elements().First(i => i.Name.LocalName == "description").Value,
                              Link = item.Elements().First(i => i.Name.LocalName == "link").Value,
                              PublishDate = ParseDate(item.Elements().First(i => i.Name.LocalName == "date").Value),
                              Title = item.Elements().First(i => i.Name.LocalName == "title").Value
                          };
            return entries.ToList();
        }
        catch
        {
            // Any download or parse failure, including a missing element, yields an empty list.
            return new List<Item>();
        }
    }

    // Returns DateTime.MinValue when the date string cannot be parsed.
    private DateTime ParseDate(string date)
    {
        DateTime result;
        if (DateTime.TryParse(date, out result))
            return result;
        else
            return DateTime.MinValue;
    }
}

/// <summary>
/// Represents the XML format of a feed.
/// </summary>
public enum FeedType
{
    /// <summary>
    /// Really Simple Syndication format.
    /// </summary>
    RSS,
    /// <summary>
    /// RDF site summary format.
    /// </summary>
    RDF,
    /// <summary>
    /// Atom Syndication format.
    /// </summary>
    Atom
}
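The date fallback can be tried in isolation. The sketch below is a variant rather than the parser's own method: pinning the parse to the invariant culture and converting to UTC are assumptions added here so the result doesn't depend on the machine's regional settings, and DateExample is a made-up name.

```csharp
using System;
using System.Globalization;

public static class DateExample
{
    // Returns the parsed date in UTC, or DateTime.MinValue when the text is not a valid date.
    public static DateTime ParseDate(string date)
    {
        DateTime result;
        bool ok = DateTime.TryParse(
            date,
            CultureInfo.InvariantCulture,
            DateTimeStyles.AdjustToUniversal,
            out result);
        return ok ? result : DateTime.MinValue;
    }

    public static void Main()
    {
        Console.WriteLine(ParseDate("2010-02-08T10:00:00Z").Year);       // 2010
        Console.WriteLine(ParseDate("not a date") == DateTime.MinValue); // True
    }
}
```

Callers that care about bad dates can compare against DateTime.MinValue, as the second line of Main does.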


I'm Chris Small, a software engineer working in London. This is my tech blog. Find out more about me via GitHub, Stack Overflow and my resume.