Technical Article: Generating RSS Using Virtuoso and SQL/XML

Difficulty Level: Intermediate

SUMMARY: Organizations everywhere are looking to embrace XML technologies but are faced with the  arduous task of moving across their entire existing technologies. One of these problem areas is in the case of Relational Databases. There are now choices available for organizations looking at embracing both XML and their RDBMS. It does not involve moving away from your legacy systems, instead it means your Database administrator simply needs to learn a few more SQL operators.

Related Articles:

o        Create a Dynamic Website Using Virtuoso & VSP

Topics Covered:

o        VSP

o                    SQL/XML(SQLX)

o        WebDAV

o                    Event Scheduler

o        XML

o                    Stored Procedures & Triggers

One possible reason why organizations have still not wholly deployed XML, in spite of its versatility, across an organisation, can be attributed to their existing internal infrastructures, in particular legacy databases. A vast amount of business data is still stored within expensively assembled relational databases.

The demand for these systems is sill high, howeverthe question of how to represent this data, taking advantage of XML so that it can be consumed by various clients is still proving a headache for many. Furthermore, there is a growing need for disparate systems to exchange data at both a high and low level which makes the link between accessing content stored within relational databases a priority The way non traditional data (documents) is being produced in XML format puts relational data to shame.

There are several methods that can be used to retrieve XML data from relational databases. This paper will discuss one such method. SQL-XML, or SQLX or its most recent incarnation, SQL/XML (for the purpose of this paper we will use SQLX) is a simple process of running SQL statements and retrieving the results in XML as opposed to the common method of ASCII data. This is made possible by the addition of SQL operators which when combined with normal SQL syntax produce the desired results.

Figure 1 shows the five operators currently in operation:

Figure 1: SQL-XML(SQLX) Operators

Function/Operator

Description

XMLElement()

Creates an XML Element

XMLAttributes()

to XMLElement. Attributes are specified by an expression and an optional alias or names expression

XMLForest()

Creates an XML Fragment from passed-in components

XMLConcat()

Takes as input a series of XMLType instances, concatenates the series of elements for each row, and returns the concatenated series

XMLAGG()

An aggregate function that creates a forest of XML elements from a collection of XML elements

 

Please refer to the Virtuoso Function Guide documentation for more information on these operators

To demonstrate SQLX, we will reference the tables and content provided in the How-to create a dynamic Online webpage using Virtuoso & VSP Article. The table schemas, source code and sample data are all provided in a zip file.

This system will be used to to insert a daily syndication of news and information to customers via RSS feeds. RSS feeds. (Really Simple Syndication.)

Stage 1: Getting Started

In Figure 2, shows a diagrammatic representation of the stages involved when generating our RSS feeds using SQLX:

From the left of the diagram, we can follow the different stages involved from when we first insert our data via a web browser right through to the content been delivered as a RSS feed. The blue box represents the Virtuoso Server which can and hosts both the local and remote tables that contain our data

Figure 2:

The first stage involves generating the content which will make up the RSS feed. This can be done in two ways.

          i.            A feature of Virtuoso is its SQL-92 database. This means, it can host the content within its own database.

         ii.            Alternatively, if the content already exists in a different location, i.e. a remote database. Using Virtuoso's Virtual database capabilities, it is possible to link in these tables from remote database (Oracle, SQLServer, DB2, MySQL, Postgres, Progress etc... using ODBC, JDBC or OLE-DB)

Furthermore, the content from these tables does not have to be from a single remote source. It is possible to mix and match different database types simultaneously provided there is a common link between them. This is because, once linked into Virtuoso, they are seen as as local tables.

For this project, we will be using the local Virtuoso tables created earlier.

Stage 2: Creating the Results Page

In this stage we need to create the results page. The purpose of this page is to provide a template which will be the basis for the RSS feeds. As we're catering for two versions of RSS, v0.91 and v2.0, there will be two output pages. These pages will contain the SQLX functionality plus the logic required to produce the XML content.

Using the Virtuoso PL as the basis, a breakdown of the code is as follows:

Figure 3: XML and Variable Declaration.

<?xml version=”1.0”?>

<?vsp

  declare theserver, _chanid varchar;

  declare _xml any;

 

  http_header ('Content-type: text/xml'); 

  theserver := left(soap_current_url(), strrchr(ltrim(soap_current_url(), 'http://'), '/'));

  _chanid := get_keyword('channel', params, '');

  _xml := string_output();

In Figure 3 we first add our XML declaration. This is always the first item in any XML document. We next declare our variables along with their datatypes. At this point we're adding Virtuoso PL logic(vsp) so this needs to be within a vsp section.

 

The http_header function is used to add additional HTTP header lines to the server response. In this context, as we are dealing with XML content, we give a content type of XML which the Virtuoso server will use as default. This also means Virtuoso WebDAV will identify the mime-type and be able to use the content in an appropriate way. We follow this by making some initial placeholder declarations. 

theserver: uses the soap function soap_current_url to get the full url of the current server.  As the page  may wellbe accessed from another virtual directory with another port number any hard-coded server and Port number address could be wrong. As a result we use this function to dynamically find the proper address.

_chanid: provides the parameter used in our where clause in our SQL query. This value is derived from the parameter passed by rss_generate in its url String.

Figure 4: sqlx_rss20.vsp

  if ('' = _chanid) signal('rss001', 'Channel does not exist...');

  for (select        xmlelement ( "rss",

                        xmlattributes ( '2.0' as "version", 'http://purl.org/dc/elements/1.1/' as "xmlns:dc"),

                        xmlelement       ("channel",

                        xmlforest   ("chantitle" as title,

                                                "chandesc" as description,

                                                "chanlink" as link),

                        XMLAGG (xmlelement ("item",

                        xmlforest           ("itemtitle" as title,

                                                "itemlink" as link,

                                                "itemdesc" as description,

                                                "catname" as category,

                                                "subname" as subject,

                                                "authname" as author,

soap_print_box(cast(itempubdate as datetime), '', 1) as pubdate)))))  res

            from                 rss..item a,

                                    rss..subject b,

                                    rss..category c,

                                    rss..authors d,

                                    rss..channel e

            and                   a.itemsub =  b.subid

            where               a.itemcat = c.catid

            and                   a.itemauth = d.authid 

            and                   a.itemchanid = e.chanid

            and                   a.itemchanid = _chanid) do

In the section we first check that a value has been passed to the _chanid variable. If not, a custom error is sent instead. If there is a value, we issue a select statement.

This select statement contains our usual SQL but we have now added the SQLX operators. A breakdown of the SQL statement is as follows.

Figure 5

if ('' = _chanid) signal('rss001', 'Channel does not exist...');

  for (select        xmlelement ( "rss",        xmlattributes ( '2.0' as "version", 'http://purl.org/dc/elements/1.1/' as "xmlns:dc"),

In Figure 5 we begin by declaring an error handler. This checks for the existence of a valid channel ID. If none exists, we return the error message “Channel Does not Exist”.

 

In our select statement, we issue an initial XMLELEMENT. This creates the root element with an alias of rss. We follow this with the XMLATTRIBUTES operator. The atributes of rss are version and namespace declaration. We have a manually created attribute of version with a default value of ‘2.0'.  The same applies to our namespace declaration, xmlns:dc which also has a value of http://purl.org/dc/elements/1.1/'

 

Figure 6

xmlelement ("channel",
xmlforest ("chantitle" as title,
"chandesc" as description,
"chanlink" as link),

In Figure 6 We add another XMLELEMENT operator. This will be nested wihin our existing element, RSS. This new element is called “channel” and has no attributes. The Channel data is derived from our channel table so to return its values we use the XMLFOREST operator. This is represented as a fragment, hence the use of the XMLFOREST operator. Again, we see the use of aliasing to create our RSS elements e.g… “chandesc” (table column name) will now be displayed as “description” in our XML document.

Figure 7

XMLAGG (xmlelement ("item",
xmlforest ("itemtitle" as title,
"itemlink" as link,
"itemdesc" as description,
"catname" as category,
"subname" as subject,
"authname" as author,
soap_print_box(cast(itempubdate as datetime), '', 1) as pubdate))))) res

As we have more than one item in our table, as opposed to the Channel tag which should always return one record, we will combine all these item elements. This is made possible by the XMLAGG operator as we can see in Figure 7. Initially we add a nested XMLELEMENT, item, followed by a list of child elements which make up the item element. This is depicted by the XMLFOREST tag. We then enclose all this within the XMLAGG tag so we can combine them into one result.

We also see an onother useful function in practice. soap_print_box is a Virtuoso function for converting database datetime values into a format RSS can use. We simply take the values of our item column, itempubdate, and whatever datetime values exist, Virtuoso will convert them into the correct format.

TIP

There's no reason why more complex joins cannot be used. In this example, a basic equi-join query has been used for simplicity

Finally, we alias the whole select statement into a variable called resres will be used later when we output our results.

 

Finally, we provide some functionality which will enable us to display our results. There are two actions involved here. First, we need to write the output to a stream.

Figure 8

{

    http_value(res, null, _xml);

    http(string_output_string(_xml));
?>

In Figure 8, we make use of some important Functions for outputtig results to a client in this case browser. With http_value (val_expr, tag, stream). The first parameter takes our SQLX statement, rss; our expected result will be in XML. We first write it to the stream _xml; What is written into this stream will be a serialization of the entire XML fragment as text including all the markup, i.e. elements, attributes, namespaces, text nodes, etc. 

TIP

A string_output_stream is a special object within Virtuoso that is used to buffer arbitrarily long streams of data. This is useful here as we could end up dealing with huge rows of data. This is typically used when the normal data cannot fit within normal varchar size limitations.

http takes an input parameter of string_output_stream, which in this case will be _xml.

 

 

Stage 3: Preparing the Engine

On completing our results templates pages, the next stage involves creating a storage area within Virtuoso. This will be done via Virtuoso's Content Management feature and will act as the staging area where all the feeds can be accessed.

To create the WebDAV folders we execute the following:

DAV_COL_CREATE ('/DAV/rssfeeds/','110100100R', 'dav','dav','dav','dav');

This will create a folder called rssfeeds in the Virtuoso Server, e.g. http://localhost:8890/DAV/rssfeeds, with read permissions for all users.

The real engine in this project are three stored Procedures and the event trigger. Please run the script file to create these procedures and triggers.

Figure 9:  STORED PROCEDURE: filename_of

Create procedure filename_of(in _chanid integer)

{  declare fn varchar;

  select replace(chantitle, ' ', '_') into fn from rss..channel where chanid = _chanid;
  return fn;
};

This procedure derives the name of the channel from the Channel table and replaces any spaces with an underscore. On completion, the name will be the name of the actual RSS feed.

In Figure 10 this procedure rss_generate is used to select the correct XML template to be used. In this case, we will use the sqlx_rss20.vsp page we created. The logic in this code states that if the Channel ID is less than 2.0, we return, via the http_get function, the sqlx_rss91.vsp page, else we return the sqlx_rss20vsp page.

 

Figure 10: STORED PROCEDURE:  rss_generate

create procedure rss_generate (in _chanid integer, in schname varchar)

{ declare _rssversion varchar;

 

select rssversion into _rssversion from rss..channel where chanid = _chanid;

 

  if (atoi(_rssversion) < 2 )

 {

    DB.DBA.DAV_RES_UPLOAD (concat('/DAV/rssfeeds/', filename_of(_chanid),'.rss'),

    http_get(concat('http://localhost:8825/rssfeed/sqlx_rss91.vsp?channel=', cast(_chanid as varchar))),

    'text/xml','110100100','dav','dav','dav','dav');

}

  else

{
  DB.DBA.DAV_RES_UPLOAD (concat('/DAV/rssfeeds/', filename_of(_chanid),'.rss'),

    http_get(concat('http://localhost:8825/rssfeed/sqlx_rss20.vsp?channel=', cast(_chanid as varchar))),

    'text/xml','110100100','dav','dav','dav','dav');

}

  if (schname)

    delete from DB.DBA.SYS_SCHEDULED_EVENT

      where SE_NAME = schname;};

The name of the resultant RSS feed page is derived from the initial function, filename_of. The Virtuoso function DAV_RES_UPLOAD is then used to upload this file. We also check for the existence of the same scheduled event. If one exists, it is deleted from the list of scheduled events and replaced with this one .

Figure 11: STORED PROCEDURE: schedule_rss

create procedure schedule_rss (in chanid integer)

{

  declare schname varchar;

  schname := Concat('Generate RSS for ', cast(chanid as varchar));

   

  insert replacing DB.DBA.SYS_SCHEDULED_EVENT(SE_NAME, SE_START, SE_SQL, SE_INTERVAL)

    values (schname, dateadd('minute', 2, now()), sprintf('rss_generate(%d, \'%s\')',cast(chanid as integer), schname), 100);

};

In Figure11, our procedure schedule_rss creates an entry in the list of scheduled events in Virtuoso.  By scheduling an event, we get Virtuoso to perform the automatic updates. These updates are scheduled at intervals of 10mins. For the schedule to occur, it requires an event to happen. This event will be a trigger.

 

 

 The trigger, rss_generate occurs after every insert statement on the Item Table.

Figure 12: TRIGGER STATEMENT: rss_generate

 create trigger rss_generate after insert on rss..item

{

  schedule_rss(itemchanid);

};

TIP

 

At the moment this trigger occurs only after an insert. The same logic could be applied if the operation was an update or delete. In the createprocs.sql file are trigger statements for update and delete

 

Stage 4: Testing the system

Using the sample data provided in the zipattachment, we are ready to populate our system via the items.vsp page. On completion, we can check that the event we've scheduled, via our Stored Procedures and triggers, has infact been scheduled. This is done by issuing the following SQL statement below.

select * from db.dba.SYS_SCHEDULED_EVENT

As shown in Figure 13, the results will show in a list of scheduled events. On a successful execution of the event, a timestamp will be added to the table indicating when it was last run.

Figure 13: List of Scheduled Events

SE_NAME

SE_START

SE_SQL

SE_LAST_COMPLETED

SE_INTERVAL

Generate RSS for 2

2003-10-07 16:55:38.000000

rss_generate(2, 'Generate RSS for 2')

 

100

On completion, the RSS file will be stored within WebDAV (Content Management). From the Admin Assistant: Figure 14

Administration -> WebDAV -> Content Management

Figure 14: Dav Listing

 

Figure 15: Pressroom RSS Feed

Conclusion:

 

With our RSS feed complete, we've not only shown one method in retrieving XML data and converting it to XML, we' have also shown the versatility of Virtuoso. So far, we' have  explored Virtuoso's Database capabilities for hosting tables, HTTP server for hosting Web Content, WebDAV for Content Management, Event Scheduler for scheduling our RSS feed executions. Along the way we also looked at Virtuoso's PL langage , VSP, plus its Stored Procedure and Trigger support.

 

As the content is stored within a WebDAV folder, it is now accessible over the internet, provided the machine is accessible on the internet. As Virtuoso has its own Web Server, a domain can be registered to display the results. Alternatively, the same file can be added to an existing webserver. To reference this page, so News Aggregators can access this feed, we simply provide a link within our webpage, similar to:

 

<link rel="relatedLink" title="RSS"  href="http://localhost:8890/DAV/rssfeeds/pressroom-rss.xml" />

 

Alternatively, if we had stylesheets (.css), we could use the Virtuoso XSLT engine to transform this XML file into a different format. A virtuoso function exists for this task. In the next paper, we will look at another method of retrieving RSS.