Technical Article: Generating RSS Using Virtuoso and SQL/XML

Difficulty Level: Intermediate

SUMMARY: Organizations everywhere are looking to embrace XML technologies but are faced with the arduous task of migrating their entire existing technology base. Relational databases are one such problem area. There are now options for organizations that want to embrace XML while keeping their RDBMS. Doing so does not involve moving away from your legacy systems; instead, your database administrator simply needs to learn a few more SQL operators.

o Create a Dynamic Website Using Virtuoso & VSP

Topics Covered:
o VSP
o SQL/XML (SQLX)
o WebDAV
o Event Scheduler
o XML
o Stored Procedures & Triggers

One possible reason why organizations have still not deployed XML across the board, in spite of its versatility, is their existing internal infrastructure, in particular legacy databases. A vast amount of business data is still stored within expensively assembled relational databases.

The demand for these systems is still high; however, the question of how to represent this data in XML, so that it can be consumed by various clients, is still proving a headache for many. Furthermore, there is a growing need for disparate systems to exchange data at both a high and a low level, which makes access to content stored within relational databases a priority. Meanwhile, non-traditional data (documents) is already routinely produced in XML format, which puts relational data to shame.

There are several methods that can be used to retrieve XML data from relational databases. This paper will discuss one such method: SQL-XML, also known as SQLX or, in its most recent incarnation, SQL/XML (for the purpose of this paper we will use SQLX). SQLX is a simple process of running SQL statements and retrieving the results as XML, as opposed to the usual plain-text (ASCII) result data. This is made possible by a set of additional SQL operators which, when combined with normal SQL syntax, produce the desired results.

Figure 1 shows the five operators currently available:

Figure 1: SQL-XML(SQLX) Operators
Function/Operator	Description
XMLElement()	Creates an XML Element
XMLAttributes()	Adds attributes to the enclosing XMLElement. Attributes are specified by an expression and an optional alias or name expression
XMLForest()	Creates an XML Fragment from passed-in components
XMLConcat()	Takes as input a series of XMLType instances, concatenates the series of elements for each row, and returns the concatenated series
XMLAGG()	An aggregate function that creates a forest of XML elements from a collection of XML elements

Please refer to the Virtuoso Function Guide documentation for more information on these operators.
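To make the shape of the output concrete before turning to Virtuoso itself, the sketch below mimics the five operators in Python using the standard xml.etree.ElementTree module. It is only an analogy, not Virtuoso code: the helper names (xml_element, xml_forest, xml_concat, xml_agg), the lang attribute and the sample rows are all hypothetical, and the choice to skip NULL values in xml_forest follows the SQL/XML standard's description and should be checked against the Virtuoso documentation.

```python
import xml.etree.ElementTree as ET


def xml_element(name, attributes=None, children=()):
    """Analogue of XMLElement(name, XMLAttributes(...), children...)."""
    node = ET.Element(name, dict(attributes or {}))
    node.extend(children)
    return node


def xml_forest(**columns):
    """Analogue of XMLForest(): one child element per non-null column."""
    forest = []
    for tag, value in columns.items():
        if value is None:  # assumption: NULL columns produce no element
            continue
        element = ET.Element(tag)
        element.text = str(value)
        forest.append(element)
    return forest


def xml_concat(*fragments):
    """Analogue of XMLConcat(): join several fragments into one sequence."""
    return [element for fragment in fragments for element in fragment]


def xml_agg(rows, build):
    """Analogue of XMLAGG(): build one element per row and collect them."""
    return [build(row) for row in rows]


# Hypothetical sample rows standing in for a query result.
rows = [
    {"title": "First story", "link": "http://example.com/1"},
    {"title": "Second story", "link": None},
]
items = xml_agg(rows, lambda row: xml_element("item", children=xml_forest(**row)))
channel = xml_element(
    "channel",
    {"lang": "en"},
    xml_concat(xml_forest(title="Demo channel"), items),
)
document = ET.tostring(channel, encoding="unicode")
print(document)
```

The second row has no link, so its item element contains only a title. This is the practical difference between building a forest from columns and writing each child element out by hand.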

To demonstrate SQLX, we will reference the tables and content provided in the article "How-to create a dynamic Online webpage using Virtuoso & VSP". The table schemas, source code and sample data are all provided in a zip file.

This system will be used to deliver a daily syndication of news and information to customers via RSS (Really Simple Syndication) feeds.

Stage 1: Getting Started

Figure 2 shows a diagrammatic representation of the stages involved when generating our RSS feeds using SQLX:

From the left of the diagram, we can follow the different stages involved, from when we first insert our data via a web browser right through to the content being delivered as an RSS feed. The blue box represents the Virtuoso Server, which can host both the local and remote tables that contain our data.

Figure 2:

The first stage involves generating the content which will make up the RSS feed. This can be done in two ways.

i. Virtuoso includes its own SQL-92 database, so it can host the content itself.

ii. Alternatively, if the content already exists in a different location, i.e. a remote database, Virtuoso's Virtual Database capabilities make it possible to link in these tables from the remote database (Oracle, SQL Server, DB2, MySQL, Postgres, Progress, etc., using ODBC, JDBC or OLE-DB).

Furthermore, the content from these tables does not have to come from a single remote source. It is possible to mix and match different database types simultaneously, provided there is a common link between them. This is because, once linked into Virtuoso, they are seen as local tables.

For this project, we will be using the local Virtuoso tables created earlier.

Stage 2: Creating the Results Page

In this stage we need to create the results page. The purpose of this page is to provide a template which will be the basis for the RSS feeds. As we're catering for two versions of RSS, v0.91 and v2.0, there will be two output pages. These pages will contain the SQLX functionality plus the logic required to produce the XML content.

Using the Virtuoso PL as the basis, a breakdown of the code is as follows:

Figure 3: XML and Variable Declaration.

<?xml version="1.0"?>

<?vsp

declare theserver, _chanid varchar;

declare _xml any;

http_header ('Content-type: text/xml');

theserver := left(soap_current_url(), strrchr(ltrim(soap_current_url(), 'http://'), '/'));

_chanid := get_keyword('channel', params, '');

_xml := string_output();

In Figure 3 we first add our XML declaration. This is always the first item in any XML document. At this point we are adding Virtuoso PL logic (VSP), so it needs to be within a vsp section. Inside it we declare our variables along with their datatypes.

The http_header function is used to add additional HTTP header lines to the server response. In this context, as we are dealing with XML content, we set the content type to text/xml so that the response is served as XML. This also means Virtuoso WebDAV will identify the MIME type and be able to use the content in an appropriate way. We follow this with some initial assignments.

theserver: uses the SOAP function soap_current_url to get the full URL of the current server. As the page may well be accessed from another virtual directory with another port number, any hard-coded server and port number could be wrong. We therefore use this function to find the proper address dynamically.

_chanid: provides the parameter used in the where clause of our SQL query. This value is derived from the channel parameter passed by rss_generate in its URL string.
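The intent of these two assignments can be illustrated outside Virtuoso. The Python sketch below uses the standard urllib.parse module to pull a channel parameter out of a request URL, falling back to an empty string as get_keyword does with its third argument, and to derive the server and directory part of the URL. The function name channel_and_base is hypothetical, and the exact string that the theserver expression yields in Virtuoso is not shown in this article, so the base value here is only our reading of its purpose.

```python
from urllib.parse import parse_qs, urlsplit


def channel_and_base(url):
    """Return (channel id, base URL) for a request URL."""
    parts = urlsplit(url)
    # parse_qs maps each key to a list of values; default to '' when absent,
    # mirroring get_keyword('channel', params, '').
    chanid = parse_qs(parts.query).get("channel", [""])[0]
    # Drop the page name so only scheme, host:port and directory remain.
    directory = parts.path.rsplit("/", 1)[0]
    base = f"{parts.scheme}://{parts.netloc}{directory}"
    return chanid, base


with_channel = channel_and_base(
    "http://localhost:8825/rssfeed/sqlx_rss20.vsp?channel=2"
)
without_channel = channel_and_base("http://localhost:8825/rssfeed/sqlx_rss20.vsp")
print(with_channel)     # ('2', 'http://localhost:8825/rssfeed')
print(without_channel)  # ('', 'http://localhost:8825/rssfeed')
```

Because the host and port come from the incoming request rather than from a constant, the same page keeps working when it is served from a different listener.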

Figure 4: sqlx_rss20.vsp

if ('' = _chanid) signal('rss001', 'Channel does not exist...');

for (select xmlelement ( "rss",

xmlattributes ( '2.0' as "version", 'http://purl.org/dc/elements/1.1/' as "xmlns:dc"),

xmlelement ("channel",

xmlforest ("chantitle" as title,

"chandesc" as description,

"chanlink" as link),

XMLAGG (xmlelement ("item",

xmlforest ("itemtitle" as title,

"itemlink" as link,

"itemdesc" as description,

"catname" as category,

"subname" as subject,

"authname" as author,

soap_print_box(cast(itempubdate as datetime), '', 1) as pubdate))))) res

from rss..item a,

rss..subject b,

rss..category c,

rss..authors d,

rss..channel e

where a.itemsub = b.subid

and a.itemcat = c.catid

and a.itemauth = d.authid

and a.itemchanid = e.chanid

and a.itemchanid = _chanid) do

In this section we first check that a value has been passed to the _chanid variable. If not, a custom error is signalled instead. If there is a value, we issue a select statement.

This select statement contains our usual SQL but we have now added the SQLX operators. A breakdown of the SQL statement is as follows.

Figure 5

if ('' = _chanid) signal('rss001', 'Channel does not exist...');

for (select xmlelement ( "rss", xmlattributes ( '2.0' as "version", 'http://purl.org/dc/elements/1.1/' as "xmlns:dc"),

In Figure 5 we begin with an error check. This checks that a channel ID has been supplied. If none exists, we signal the error message 'Channel does not exist...'.

In our select statement, we issue an initial XMLELEMENT. This creates the root element, named rss. We follow this with the XMLATTRIBUTES operator. The attributes of rss are the version and a namespace declaration. The version attribute is created manually with a literal value of '2.0'. The same applies to our namespace declaration, xmlns:dc, which has a value of 'http://purl.org/dc/elements/1.1/'.

Figure 6

xmlelement ("channel",
xmlforest ("chantitle" as title,
"chandesc" as description,
"chanlink" as link),

In Figure 6 we add another XMLELEMENT operator. This will be nested within our existing element, rss. This new element is called "channel" and has no attributes. The channel data is derived from our channel table, and since it is returned as a fragment of sibling elements we use the XMLFOREST operator. Again, we see the use of aliasing to create our RSS elements, e.g. "chandesc" (table column name) will now be displayed as "description" in our XML document.

Figure 7

XMLAGG (xmlelement ("item",
xmlforest ("itemtitle" as title,
"itemlink" as link,
"itemdesc" as description,
"catname" as category,
"subname" as subject,
"authname" as author,
soap_print_box(cast(itempubdate as datetime), '', 1) as pubdate))))) res

As we have more than one item in our table, as opposed to the channel, which should always return one record, we need to combine all these item elements. This is made possible by the XMLAGG operator, as we can see in Figure 7. First we add a nested XMLELEMENT, item, followed by the list of child elements which make up the item element, produced by the XMLFOREST operator. We then enclose all this within the XMLAGG operator so we can combine them into one result.
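The one-channel, many-items shape is easy to see with a flat join result, where the channel columns repeat on every item row. The Python sketch below, again using xml.etree.ElementTree as a stand-in for the SQLX operators, takes such rows and folds them into a single rss document. The column names follow the article's schema, but the function name rss_from_joined_rows and the sample rows are hypothetical.

```python
import xml.etree.ElementTree as ET

CHANNEL_COLUMNS = (("chantitle", "title"), ("chandesc", "description"), ("chanlink", "link"))
ITEM_COLUMNS = (("itemtitle", "title"), ("itemlink", "link"), ("itemdesc", "description"))


def rss_from_joined_rows(rows):
    """Fold flat join rows (channel columns repeated per item) into one feed."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    # The channel columns are identical on every row, so read them once.
    for column, tag in CHANNEL_COLUMNS:
        ET.SubElement(channel, tag).text = rows[0][column]
    # One <item> per row: this is the part XMLAGG performs in the query.
    for row in rows:
        item = ET.SubElement(channel, "item")
        for column, tag in ITEM_COLUMNS:
            ET.SubElement(item, tag).text = row[column]
    return rss


# Hypothetical join output for a single channel with two items.
shared = {"chantitle": "Press Room", "chandesc": "Company news", "chanlink": "http://example.com/"}
joined = [
    dict(shared, itemtitle="Launch", itemlink="http://example.com/1", itemdesc="Product launch"),
    dict(shared, itemtitle="Meetup", itemlink="http://example.com/2", itemdesc="User meetup"),
]
feed = rss_from_joined_rows(joined)
print(ET.tostring(feed, encoding="unicode"))
```

Without the aggregation step, each joined row would produce its own rss document; with it, the query returns a single row holding the whole feed.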

We also see another useful function in practice. soap_print_box is a Virtuoso function that we use here to convert database datetime values into a format RSS can use. We simply pass in the values of our item column, itempubdate, and Virtuoso will convert whatever datetime values exist into the correct format.
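For readers generating feeds outside Virtuoso, the same concern applies: a raw database timestamp is not a valid feed date. The RSS 2.0 specification asks for RFC 822 style dates in pubDate. The Python sketch below produces one with the standard email.utils module. It does not attempt to reproduce what soap_print_box emits, which this article does not show; the helper name rss_pubdate and the decision to treat naive timestamps as UTC are assumptions.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_pubdate(value):
    """Format a datetime as an RFC 822 style date string."""
    if value.tzinfo is None:
        # Assumption: timestamps stored without a zone are in UTC.
        value = value.replace(tzinfo=timezone.utc)
    return format_datetime(value)


stamp = rss_pubdate(datetime(2003, 10, 7, 16, 55, 38))
print(stamp)  # Tue, 07 Oct 2003 16:55:38 +0000
```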

TIP	There's no reason why more complex joins cannot be used. In this example, a basic equi-join query has been used for simplicity.
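To illustrate what a more complex join changes, the sketch below runs three queries against an in-memory SQLite database through Python's sqlite3 module. This is SQLite rather than Virtuoso, the two tables are cut-down versions of the article's item and category tables, and the data is made up. The comma-style equi-join and the explicit inner join return the same rows, while a left outer join also keeps an item that has no category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table category (catid integer primary key, catname text);
    create table item (itemid integer primary key, itemtitle text, itemcat integer);
    insert into category values (1, 'News');
    insert into category values (2, 'Events');
    insert into item values (10, 'Launch', 1);
    insert into item values (11, 'Meetup', 2);
    insert into item values (12, 'Draft', null);
    """
)

# Equi-join written the way the article's query writes it.
implicit = conn.execute(
    "select itemtitle, catname from item a, category c "
    "where a.itemcat = c.catid order by a.itemid"
).fetchall()

# The same join in explicit syntax.
explicit = conn.execute(
    "select itemtitle, catname from item a "
    "join category c on a.itemcat = c.catid order by a.itemid"
).fetchall()

# An outer join keeps items whose category is missing.
outer = conn.execute(
    "select itemtitle, catname from item a "
    "left join category c on a.itemcat = c.catid order by a.itemid"
).fetchall()

print(implicit)
print(outer)
conn.close()
```

With an equi-join across every lookup table, an item that lacks a category, subject or author silently disappears from the feed, which is worth keeping in mind when choosing the join type.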

We then alias the whole XML expression as res. res will be used later when we output our results.

Finally, we provide some functionality which will enable us to display our results. There are two actions involved here: first we write the output to a stream, then we send the contents of that stream to the client.

Figure 8

{

http_value(res, null, _xml);

http(string_output_string(_xml));

}
?>

In Figure 8, we make use of some important functions for outputting results to a client, in this case a browser. The first is http_value (val_expr, tag, stream). Its first parameter takes the result of our SQLX statement, res, which we expect to be XML. We write it to the stream _xml. What is written into this stream will be a serialization of the entire XML fragment as text, including all the markup, i.e. elements, attributes, namespaces, text nodes, etc.

TIP	A string output stream (created with string_output) is a special object within Virtuoso that is used to buffer arbitrarily long streams of data. This is useful here as we could end up dealing with a large number of rows. It is typically used when the data cannot fit within normal varchar size limitations.

The second is http, which sends text to the client. Here it is given string_output_string(_xml), the buffered contents of the _xml stream as a string.
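The buffer-then-send pattern is common to most languages. As a loose analogy only, the Python sketch below uses io.StringIO in the role of the string output stream: pieces of markup are written into the buffer as they are produced, and the complete text is read back once at the end. The function name render and the sample chunks are hypothetical, and the mapping to the Virtuoso calls is our own reading.

```python
import io


def render(chunks):
    """Collect pieces of markup in a buffer and return them as one string."""
    buffer = io.StringIO()        # plays the role of string_output()
    for chunk in chunks:
        buffer.write(chunk)       # plays the role of writing to the stream
    return buffer.getvalue()      # plays the role of string_output_string()


page = render(["<rss>", "<channel>", "</channel>", "</rss>"])
print(page)  # <rss><channel></channel></rss>
```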

Stage 3: Preparing the Engine

On completing our results template pages, the next stage involves creating a storage area within Virtuoso. This will be done via Virtuoso's WebDAV Content Management feature and will act as the staging area where all the feeds can be accessed.

To create the WebDAV folders we execute the following:

DAV_COL_CREATE ('/DAV/rssfeeds/','110100100R', 'dav','dav','dav','dav');

This will create a folder called rssfeeds in the Virtuoso Server, e.g. http://localhost:8890/DAV/rssfeeds, with read permissions for all users.
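The permission string passed to DAV_COL_CREATE can be read much like Unix file modes. The Python sketch below decodes it under the assumption that the first nine characters are read, write and execute flags for the owner, the group and everyone else, in that order. The trailing R is left uninterpreted here, and the helper name describe_permissions is hypothetical; consult the Virtuoso WebDAV documentation for the authoritative format.

```python
def describe_permissions(perms):
    """Decode the first nine flags of a DAV permission string into rwx form."""
    decoded = {}
    for index, who in enumerate(("owner", "group", "others")):
        triple = perms[index * 3 : index * 3 + 3]
        decoded[who] = "".join(
            flag if bit == "1" else "-" for flag, bit in zip("rwx", triple)
        )
    return decoded


folder = describe_permissions("110100100R")
print(folder)  # {'owner': 'rw-', 'group': 'r--', 'others': 'r--'}
```

Read this way, the folder is writable only by its owner and readable by everyone, which matches the description above.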

The real engine in this project is a set of three stored procedures and the event trigger. Please run the script file to create these procedures and triggers.

Figure 9: STORED PROCEDURE: filename_of

Create procedure filename_of(in _chanid integer)

{ declare fn varchar;

select replace(chantitle, ' ', '_') into fn from rss..channel where chanid = _chanid;
return fn; };

This procedure looks up the title of the channel in the channel table and replaces any spaces with underscores. The result is used as the file name of the actual RSS feed.
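The same transformation, together with the WebDAV path that rss_generate later builds from it, can be sketched in a few lines of Python. The function names and the sample channel title are hypothetical. Like the stored procedure, the sketch only replaces spaces; titles containing other characters that are awkward in a URL are not handled.

```python
def filename_of(chantitle):
    """Turn a channel title into a file name by replacing spaces."""
    return chantitle.replace(" ", "_")


def feed_path(chantitle, folder="/DAV/rssfeeds/"):
    """Build the WebDAV resource path for a channel's feed."""
    return f"{folder}{filename_of(chantitle)}.rss"


path = feed_path("Press Room News")
print(path)  # /DAV/rssfeeds/Press_Room_News.rss
```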

The procedure rss_generate, shown in Figure 10, is used to select the correct XML template. In this case, we will use the sqlx_rss20.vsp page we created. The logic in this code states that if the channel's RSS version is less than 2, we fetch, via the http_get function, the output of the sqlx_rss91.vsp page; otherwise we fetch the output of the sqlx_rss20.vsp page.

Figure 10: STORED PROCEDURE: rss_generate

create procedure rss_generate (in _chanid integer, in schname varchar)

{ declare _rssversion varchar;

select rssversion into _rssversion from rss..channel where chanid = _chanid;

if (atoi(_rssversion) < 2 )

{

DB.DBA.DAV_RES_UPLOAD (concat('/DAV/rssfeeds/', filename_of(_chanid),'.rss'),

http_get(concat('http://localhost:8825/rssfeed/sqlx_rss91.vsp?channel=', cast(_chanid as varchar))),

'text/xml','110100100','dav','dav','dav','dav');

}

else

{
DB.DBA.DAV_RES_UPLOAD (concat('/DAV/rssfeeds/', filename_of(_chanid),'.rss'),

http_get(concat('http://localhost:8825/rssfeed/sqlx_rss20.vsp?channel=', cast(_chanid as varchar))),

'text/xml','110100100','dav','dav','dav','dav');

}

if (schname)

delete from DB.DBA.SYS_SCHEDULED_EVENT

where SE_NAME = schname;};

The name of the resultant RSS feed file is derived from the initial function, filename_of. The Virtuoso function DAV_RES_UPLOAD is then used to upload this file. We also check whether a scheduled event name was passed in. If so, that event is deleted from the list of scheduled events; a later insert will replace it via schedule_rss.
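The version test relies on atoi reading only the leading integer of the version string, so that '0.91' becomes 0 and '2.0' becomes 2. The Python sketch below mirrors that selection logic. The atoi helper here is our own C-style approximation and is assumed, not confirmed, to match Virtuoso's behaviour; template_url is a hypothetical name, and the base URL is the one used in Figure 10.

```python
import re


def atoi(text):
    """C-style conversion: read a leading integer, or 0 if there is none."""
    match = re.match(r"\s*[+-]?\d+", text or "")
    return int(match.group()) if match else 0


def template_url(rssversion, chanid, base="http://localhost:8825/rssfeed"):
    """Pick the template page for a channel based on its RSS version."""
    page = "sqlx_rss91.vsp" if atoi(rssversion) < 2 else "sqlx_rss20.vsp"
    return f"{base}/{page}?channel={chanid}"


old_style = template_url("0.91", 1)
new_style = template_url("2.0", 2)
print(old_style)  # http://localhost:8825/rssfeed/sqlx_rss91.vsp?channel=1
print(new_style)  # http://localhost:8825/rssfeed/sqlx_rss20.vsp?channel=2
```

Both branches of the stored procedure differ only in the page name, so computing the page first and building the upload call once would remove the duplicated block.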

Figure 11: STORED PROCEDURE: schedule_rss

create procedure schedule_rss (in chanid integer)

{

declare schname varchar;

schname := Concat('Generate RSS for ', cast(chanid as varchar));

insert replacing DB.DBA.SYS_SCHEDULED_EVENT(SE_NAME, SE_START, SE_SQL, SE_INTERVAL)

values (schname, dateadd('minute', 2, now()), sprintf('rss_generate(%d, \'%s\')',cast(chanid as integer), schname), 100);

};

In Figure 11, our procedure schedule_rss creates an entry in the list of scheduled events in Virtuoso. By scheduling an event, we get Virtuoso to perform the automatic updates. The first run is scheduled two minutes after the insert, with a repeat interval (SE_INTERVAL) of 100, which Virtuoso's scheduler interprets in minutes. For the scheduling to occur, it requires an event to happen. This event will be a trigger.
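The effect of insert replacing keyed on the event name is that repeated inserts for the same channel do not pile up events; they overwrite one entry and push its start time forward. The Python sketch below models the scheduler table as a dictionary to show this. It is a simplified model, not Virtuoso's scheduler, and the function signature is our own.

```python
from datetime import datetime, timedelta


def schedule_rss(events, chanid, now):
    """Create or overwrite the scheduled event for one channel."""
    name = f"Generate RSS for {chanid}"
    events[name] = {
        "SE_START": now + timedelta(minutes=2),   # first run two minutes out
        "SE_SQL": f"rss_generate({chanid}, '{name}')",
        "SE_INTERVAL": 100,
    }
    return name


events = {}
first = datetime(2003, 10, 7, 16, 53, 38)
name = schedule_rss(events, 2, first)
# A second insert one minute later replaces the entry instead of adding one.
schedule_rss(events, 2, first + timedelta(minutes=1))
print(len(events), events[name]["SE_START"], events[name]["SE_SQL"])
```

A burst of inserts therefore results in a single feed regeneration shortly after the last one, rather than one regeneration per inserted row.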

The trigger, rss_generate, fires after every insert statement on the item table.

Figure 12: TRIGGER STATEMENT: rss_generate

create trigger rss_generate after insert on rss..item

{

schedule_rss(itemchanid);

};

TIP	At the moment this trigger fires only after an insert. The same logic could be applied if the operation was an update or delete. The createprocs.sql file also contains trigger statements for update and delete.

Stage 4: Testing the system

Using the sample data provided in the zip attachment, we are ready to populate our system via the items.vsp page. On completion, we can check that the event we've scheduled, via our stored procedures and triggers, has in fact been scheduled. This is done by issuing the following SQL statement.

select * from db.dba.SYS_SCHEDULED_EVENT

As shown in Figure 13, the results will show a list of scheduled events. On successful execution of the event, a timestamp will be added to the table indicating when it was last run.

Figure 13: List of Scheduled Events
SE_NAME	SE_START	SE_SQL	SE_LAST_COMPLETED	SE_INTERVAL
Generate RSS for 2	2003-10-07 16:55:38.000000	rss_generate(2, 'Generate RSS for 2')		100

On completion, the RSS file will be stored within WebDAV (Content Management). It can be viewed from the Admin Assistant, as shown in Figure 14:

Administration -> WebDAV -> Content Management

Figure 14: Dav Listing

Figure 15: Pressroom RSS Feed

Conclusion:

With our RSS feed complete, we have not only shown one method of retrieving relational data and converting it to XML, we have also shown the versatility of Virtuoso. So far, we have explored Virtuoso's database capabilities for hosting tables, its HTTP server for hosting web content, WebDAV for content management, and the Event Scheduler for scheduling our RSS feed executions. Along the way we also looked at Virtuoso's PL language, VSP, plus its stored procedure and trigger support.

As the content is stored within a WebDAV folder, it is now accessible over the internet, provided the machine is accessible on the internet. As Virtuoso has its own Web Server, a domain can be registered to display the results. Alternatively, the same file can be added to an existing webserver. To reference this page, so News Aggregators can access this feed, we simply provide a link within our webpage, similar to:

Alternatively, if we had stylesheets (.xsl), we could use the Virtuoso XSLT engine to transform this XML file into a different format. A Virtuoso function exists for this task. In the next paper, we will look at another method of retrieving RSS.

Author and publisher: OpenLink Software
Published: 2015-05-29
Modified: 2016-11-02, 2017-10-28
License: http://creativecommons.org/licenses/by/4.0/deed.en_US