Wednesday, March 24, 2010

Converting RSS Feed through URL to XPathNavigator

In one of my projects, I was required to read an RSS feed from a website, cache it, and then output the feed on the website I am working on using XSLT.

To accomplish this, I used two open source libraries: RSS.Net and CacheHelper.

It took me some time to get this working, as I ran into a couple of problems:

1. XmlDocument reports that the root element is missing
2. The last XML element is not closed properly, which causes a memory leak

After several rounds of trial and error, I found out that RssWriter in RSS.Net works fine if you output the XML to a file, but not if you output it to a Stream.

So the following code, which writes the RSS to a file, works fine:

var writer = new RssWriter(@"c:\test.xml");
writer.Write(feed.Channels[0]);
writer.Close();

However, the following code, which writes to a memory stream, doesn't work:

var rssXmlDoc = new XmlDocument();
var memStream = new MemoryStream();
var writer = new RssWriter(memStream, Encoding.UTF8);
writer.Write(feed.Channels[0]);
writer.Close();
rssXmlDoc.Load(memStream);

-- This will not work, since writer.Close() also closes the underlying stream, so there is nothing left to load from.

Because of that, I thought the order should be reversed:

rssXmlDoc.Load(memStream);
writer.Close();

-- But this doesn't work either, because without writer.Close() the last elements are never closed, so the XML is incomplete.

So I modified RSS.Net and added a method, CloseDocument(), that writes the closing elements without closing the writer itself.

The code then becomes:

var rssXmlDoc = new XmlDocument();
var memStream = new MemoryStream();
var writer = new RssWriter(memStream, Encoding.UTF8);
writer.Write(feed.Channels[0]);
writer.CloseDocument();
rssXmlDoc.Load(memStream);
writer.Close();

-- We are almost there. This still throws an error, this time saying that the root element is missing.

We need to add one more line, memStream.Position = 0;, just before loading the document:

var rssXmlDoc = new XmlDocument();
var memStream = new MemoryStream();
var writer = new RssWriter(memStream, Encoding.UTF8);
writer.Write(feed.Channels[0]);
writer.CloseDocument();
memStream.Position = 0;
rssXmlDoc.Load(memStream);
writer.Close();

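This moves the read/write position back to the start of the stream. After writing, the position sits at the end of the stream, so without this line XmlDocument has nothing to read and reports a missing root element.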
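To see the effect of the stream position in isolation, here is a minimal sketch that uses the standard XmlTextWriter as a stand-in for RssWriter. The class name StreamRewindDemo and the method WriteAndLoad are made up for this illustration and are not part of RSS.Net.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class StreamRewindDemo
{
    // Hypothetical helper: writes a tiny RSS-like document into a MemoryStream
    // and loads it back into an XmlDocument, optionally rewinding first.
    public static XmlDocument WriteAndLoad(bool rewind)
    {
        var memStream = new MemoryStream();
        var writer = new XmlTextWriter(memStream, Encoding.UTF8);
        writer.WriteStartDocument();
        writer.WriteStartElement("rss");
        writer.WriteStartElement("channel");
        writer.WriteElementString("title", "demo");
        writer.WriteEndElement(); // </channel>
        writer.WriteEndElement(); // </rss>
        writer.Flush();           // push buffered output into the stream, keep it open

        if (rewind)
        {
            // Without this, the position stays at the end of the written data.
            memStream.Position = 0;
        }

        var doc = new XmlDocument();
        doc.Load(memStream); // throws XmlException when the stream was not rewound
        writer.Close();
        return doc;
    }
}
```

Calling WriteAndLoad(true) returns a document whose root element is rss, while WriteAndLoad(false) throws an XmlException because the reader starts at the end of the stream.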

Finally, the class I came up with to go from a feed URL to an XPathNavigator is:

public class ExternalFeed
{
    // Default cache time is 1 minute
    public const int DefaultCacheTime = 60;
    public string FeedUrl { get; private set; }
    public int CacheTimeInSeconds { get; private set; }

    public ExternalFeed(BaseItem definitionItem)
    {
        if (definitionItem == null)
        {
            Log.Error("ExternalFeed: definition item is null", this);
            return;
        }
        FeedUrl = definitionItem[Constants.MeltwaterFields.FeedUrl];
        var secondString = definitionItem[Constants.MeltwaterFields.CacheTime];
        int second;
        // Fall back to the default when the cache time field is not a valid integer
        CacheTimeInSeconds = int.TryParse(secondString, out second) ? second : DefaultCacheTime;
    }

    /// <summary>
    /// Load data from the external source. To be called after construction.
    /// </summary>
    /// <returns>True if loaded successfully, otherwise false</returns>
    public bool Load()
    {
        // Ensure that the URL is valid
        if (string.IsNullOrEmpty(FeedUrl))
        {
            Log.Error("ExternalFeed: Feed Url is empty", this);
            return false;
        }
        var feed = RssFeed.Read(FeedUrl);
        CacheHelper.Add(feed, FeedUrl, CacheTimeInSeconds);
        return true;
    }

    /// <summary>
    /// Get the data of the feed
    /// </summary>
    /// <returns>The feed data, or null if the feed could not be loaded</returns>
    public XPathNavigator GetData()
    {
        RssFeed feed;
        if (!CacheHelper.Get(FeedUrl, out feed))
        {
            // Not cached yet (or expired): reload, and give up if that fails too
            if (!Load() || !CacheHelper.Get(FeedUrl, out feed))
            {
                Log.Error("ExternalFeed: unable to load feed", this);
                return null;
            }
        }
        var rssXmlDoc = new XmlDocument();
        var memStream = new MemoryStream();
        var writer = new RssWriter(memStream, Encoding.UTF8);
        writer.Write(feed.Channels[0]);
        writer.CloseDocument();
        // Move the read/write position back to the start of the stream so that
        // the XmlDocument can read the entire contents
        memStream.Position = 0;
        rssXmlDoc.Load(memStream);
        writer.Close();
        return rssXmlDoc.CreateNavigator();
    }
}
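As a rough idea of what a caller can do with the returned navigator, here is a small sketch that pulls the item titles out of a feed. It assumes the output follows the usual RSS 2.0 layout (rss/channel/item/title). FeedNavigatorDemo, GetItemTitles and CreateSampleNavigator are hypothetical names for this example, and the sample XML stands in for a real feed.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

public static class FeedNavigatorDemo
{
    // Collects the text of every item title reachable from the navigator.
    // Assumes an RSS 2.0 style layout: /rss/channel/item/title.
    public static List<string> GetItemTitles(XPathNavigator navigator)
    {
        var titles = new List<string>();
        XPathNodeIterator items = navigator.Select("/rss/channel/item/title");
        while (items.MoveNext())
        {
            titles.Add(items.Current.Value);
        }
        return titles;
    }

    // Builds a navigator over a hard-coded sample feed, standing in for
    // the result of ExternalFeed.GetData().
    public static XPathNavigator CreateSampleNavigator()
    {
        var doc = new XmlDocument();
        doc.LoadXml(
            "<rss version=\"2.0\"><channel><title>Demo</title>" +
            "<item><title>First</title></item>" +
            "<item><title>Second</title></item>" +
            "</channel></rss>");
        return doc.CreateNavigator();
    }
}
```

The same navigator can also be handed to an XSLT transform, which is what I needed it for in the first place.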

Here is the code for closing the document in RSS.Net. The code below is from the RssWriter class in "RssWriter.cs":

/// <summary>Closes instance of RssWriter.</summary>
/// <remarks>Writes end elements, and releases connections</remarks>
/// <exception cref="InvalidOperationException">Occurs if the RssWriter is already closed or the caller is attempting to close before writing a channel.</exception>
public void Close()
{
    writer.Close();
    writer = null;
}

/// <summary>
/// Writes the closing elements and flushes the output, without closing the underlying writer.
/// </summary>
public void CloseDocument()
{
    if (writer == null)
        throw new InvalidOperationException("RssWriter has been closed, and can not be closed again.");
    if (!wroteChannel)
        throw new InvalidOperationException("Can't close RssWriter without first writing a channel.");
    writer.WriteEndElement(); // </channel>
    writer.WriteEndElement(); // </rss> or </rdf:RDF>
    writer.Flush();
}
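If patching RSS.Net is not an option, one possible alternative is to let the writer close normally and then copy the bytes out with MemoryStream.ToArray(), which is documented to work even after the stream has been closed. I have not tried this with RssWriter itself. The sketch below uses the standard XmlTextWriter as a stand-in, and ClosedStreamDemo and WriteCloseAndLoad are made-up names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ClosedStreamDemo
{
    // Hypothetical helper: writes XML, closes the writer (and with it the
    // stream), then loads the document from a copy of the written bytes.
    public static XmlDocument WriteCloseAndLoad()
    {
        var memStream = new MemoryStream();
        var writer = new XmlTextWriter(memStream, Encoding.UTF8);
        writer.WriteStartElement("rss");
        writer.WriteStartElement("channel");
        writer.WriteElementString("title", "demo");
        // Close() ends any elements still open and closes the underlying stream.
        writer.Close();

        // ToArray() still works on a closed MemoryStream.
        byte[] bytes = memStream.ToArray();

        var doc = new XmlDocument();
        using (var readStream = new MemoryStream(bytes))
        {
            // A fresh stream starts at position 0, so no rewind is needed.
            doc.Load(readStream);
        }
        return doc;
    }
}
```

The trade-off is an extra copy of the feed in memory, which should not matter for a typical RSS feed.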
