Details

OpenLink Software
Burlington, United States

Subscribe

Post Categories

Recent Articles

Community Member Blogs

Display Settings

articles per page.
order.

Translate

Showing posts in all categories RefreshRefresh
Some Interesting VLDB 2009 Papers (2 of 5) [ Orri Erling ]

Intel on Hash Join

Intel and Oracle had measured hash and sort merge joins on Intel Core i7. The result was that hash join with both tables partitioned to match CPU cache was still the best but that sort/merge would catch up with more SIMD instructions in the future.

We should probably experiment with this but the most important partitioning of hash joins is still between cluster nodes. Within the process, we will see. The tradeoff of doing all in cache-sized partitions is larger intermediate results which in turn will impact the working set of disk pages in RAM. For one-off queries this is OK; for online use this has an effect.

1000 TABLE Queries

SAP presented a paper about federating relational databases. Queries would be expressed against VIEWs defined over remote TABLEs, UNIONed together and so forth. Traditional methods of optimization would run out of memory; a single 1000 TABLE plan is already a big thing. Enumerating multiple variations of such is not possible in practice. So the solution was to plan in two stages — first arrange the subqueries and derived TABLEs, and then do the JOIN orders locally. Further, local JOIN orders could even be adjusted at run time based on the actual data. Nice.

Oracle Subqueries and New Implementation of LOBs

Oracle presented some new SQL optimizations, combining and inlining subqueries and derived TABLEs. We do fairly similar things and might extend the repertoire of tricks in the direction outlined by Oracle as and when the need presents itself. This further confirms that SQL and other query optimization is really an incremental collection of specially recognized patterns. We still have not found any other way of doing it.

Another interesting piece by Oracle was about their re-implementation of large object support, where they compared LOB loading to file system and raw device speeds.

Amadeus CRS booking system, steady query time for arbitrary single table queries

There was a paper about a memory-resident database that could give steady time for any kind of single-table scan query. The innovation was to not use indices, but to have one partition of the table per processor core, all in memory. Then each core would have exactly two cursors — one reading, the other writing. The write cursor should keep ahead of the read cursor. Like this, there would be no read/write contention on pages, no locking, no multiple threads splitting a tree at different points, none of the complexity of a multithreaded database engine. Then, when the cursor would hit a row, it would look at the set of queries or updates and add the result to the output if there was a result. The data indexes the queries, not the other way around. We have done something similar for detecting changes in a full text corpus but never thought of doing queries this way.

Well, we are all about JOINs so this is not for us, but it deserves a mention for being original and clever. And indeed, anything one can ask about a table will likely be served with great predictability.

Greenplum

Google's chief economist said that the winning career choice would be to pick a scarce skill that made value from something that was plentiful. For the 2010s this career is that of the statistician/data analyst. We've said it before — the next web is analytics for all. The Greenplum talk was divided between the Fox use case, with 200TB of data about ads, web site traffic, and other things, growing 5TB a day. The message was that cubes and drill down are passé, that it is about complex statistical methods that have to run in the database, that the new kind of geek is the data geek, whose vocation it is to consume and spit out data, discover things in it, and so forth.

The technical part was about Greenplum, a SQL database running on a cluster with a PostgreSQL back-end. The interesting points were embedding MapReduce into SQL, and using relational tables for arrays and complex data types — pretty much what we also do. Greenplum emphasized scale-out and found column orientation more like a nice-to-have.

MonetDB, optimizing database for CPU cache

The MonetDB people from CWI in Amsterdam gave a 10 year best paper award talk about optimizing database for CPU cache. The key point was that if data is stored as columns, it ought also to be transferred as columns inside the execution engine. Materialize big chunks of state to cut down on interpretation overhead and use cache to best effect. They vector for CPU cache; we vector for scale-out, since the only way to ship operations is to ship many at a time. So we might as well vector also in single servers. This could be worth an experiment. Also we regularly visit the topic of column storage. But we are not yet convinced that it would be better than row-style covering indices for RDF quads. But something could certainly be tried, given time.

# PermaLink Comments [0]
09/01/2009 11:46 GMT Modified: 09/01/2009 17:32 GMT
Some Interesting VLDB 2009 Papers (2 of 5) [ Virtuso Data Space Bot ]

Intel on Hash Join

Intel and Oracle had measured hash and sort merge joins on Intel Core i7. The result was that hash join with both tables partitioned to match CPU cache was still the best but that sort/merge would catch up with more SIMD instructions in the future.

We should probably experiment with this but the most important partitioning of hash joins is still between cluster nodes. Within the process, we will see. The tradeoff of doing all in cache-sized partitions is larger intermediate results which in turn will impact the working set of disk pages in RAM. For one-off queries this is OK; for online use this has an effect.

1000 TABLE Queries

SAP presented a paper about federating relational databases. Queries would be expressed against VIEWs defined over remote TABLEs, UNIONed together and so forth. Traditional methods of optimization would run out of memory; a single 1000 TABLE plan is already a big thing. Enumerating multiple variations of such is not possible in practice. So the solution was to plan in two stages — first arrange the subqueries and derived TABLEs, and then do the JOIN orders locally. Further, local JOIN orders could even be adjusted at run time based on the actual data. Nice.

Oracle Subqueries and New Implementation of LOBs

Oracle presented some new SQL optimizations, combining and inlining subqueries and derived TABLEs. We do fairly similar things and might extend the repertoire of tricks in the direction outlined by Oracle as and when the need presents itself. This further confirms that SQL and other query optimization is really an incremental collection of specially recognized patterns. We still have not found any other way of doing it.

Another interesting piece by Oracle was about their re-implementation of large object support, where they compared LOB loading to file system and raw device speeds.

Amadeus CRS booking system, steady query time for arbitrary single table queries

There was a paper about a memory-resident database that could give steady time for any kind of single-table scan query. The innovation was to not use indices, but to have one partition of the table per processor core, all in memory. Then each core would have exactly two cursors — one reading, the other writing. The write cursor should keep ahead of the read cursor. Like this, there would be no read/write contention on pages, no locking, no multiple threads splitting a tree at different points, none of the complexity of a multithreaded database engine. Then, when the cursor would hit a row, it would look at the set of queries or updates and add the result to the output if there was a result. The data indexes the queries, not the other way around. We have done something similar for detecting changes in a full text corpus but never thought of doing queries this way.

Well, we are all about JOINs so this is not for us, but it deserves a mention for being original and clever. And indeed, anything one can ask about a table will likely be served with great predictability.

Greenplum

Google's chief economist said that the winning career choice would be to pick a scarce skill that made value from something that was plentiful. For the 2010s this career is that of the statistician/data analyst. We've said it before — the next web is analytics for all. The Greenplum talk was divided between the Fox use case, with 200TB of data about ads, web site traffic, and other things, growing 5TB a day. The message was that cubes and drill down are passé, that it is about complex statistical methods that have to run in the database, that the new kind of geek is the data geek, whose vocation it is to consume and spit out data, discover things in it, and so forth.

The technical part was about Greenplum, a SQL database running on a cluster with a PostgreSQL back-end. The interesting points were embedding MapReduce into SQL, and using relational tables for arrays and complex data types — pretty much what we also do. Greenplum emphasized scale-out and found column orientation more like a nice-to-have.

MonetDB, optimizing database for CPU cache

The MonetDB people from CWI in Amsterdam gave a 10 year best paper award talk about optimizing database for CPU cache. The key point was that if data is stored as columns, it ought also to be transferred as columns inside the execution engine. Materialize big chunks of state to cut down on interpretation overhead and use cache to best effect. They vector for CPU cache; we vector for scale-out, since the only way to ship operations is to ship many at a time. So we might as well vector also in single servers. This could be worth an experiment. Also we regularly visit the topic of column storage. But we are not yet convinced that it would be better than row-style covering indices for RDF quads. But something could certainly be tried, given time.

# PermaLink Comments [0]
09/01/2009 11:46 GMT Modified: 09/01/2009 17:32 GMT
New ADO.NET 3.x Provider for Virtuoso Released (Update 2) [ Kingsley Uyi Idehen ]

I am pleased to announce the immediate availability of the Virtuoso ADO.NET 3.5 data provider for Microsoft's .NET platform.

What is it?

A data access driver/provider that provides conceptual entity oriented access to RDBMS data managed by Virtuoso. Naturally, it also uses Virtuoso's in-built virtual / federated database layer to provide access to ODBC and JDBC accessible RDBMS engines such as: Oracle (7.x to latest), SQL Server (4.2 to latest), Sybase, IBM Informix (5.x to latest), IBM DB2, Ingres (6.x to latest), Progress (7.x to OpenEdge), MySQL, PostgreSQL, Firebird, and others using our ODBC or JDBC bridge drivers.

Benefits?

Technical:

It delivers an Entity-Attribute-Value + Classes & Relationships model over disparate data sources that are materialized as .NET Entity Framework Objects, which are then consumable via ADO.NET Data Object Services, LINQ for Entities, and other ADO.NET data consumers.

The provider is fully integrated into Visual Studio 2008 and delivers the same "ease of use" offered by Microsoft's own SQL Server provider, but across Virtuoso, Oracle, Sybase, DB2, Informix, Ingres, Progress (OpenEdge), MySQL, PostgreSQL, Firebird, and others. The same benefits also apply uniformly to Entity Frameworks compatibility.

Bearing in mind that Virtuoso is a multi-model (hybrid) data manager, this also implies that you can use .NET Entity Frameworks against all data managed by Virtuoso. Remember, Virtuoso's SQL channel is a conduit to Virtuoso's core; thus, RDF (courtesy of SPASQL as already implemented re. Jena/Sesame/Redland providers), XML, and other data forms stored in Virtuoso also become accessible via .NET's Entity Frameworks.


Strategic:

You can choose which entity oriented data access model works best for you: RDF Linked Data & SPARQL or .NET Entity Frameworks & Entity SQL. Either way, Virtuoso delivers a commercial grade, high-performance, secure, and scalable solution.


How do I use it?

Simply follow one of guides below:

Note: When working with external or 3rd party databases, simply use the Virtuoso Conductor to link the external data source into Virtuoso. Once linked, the remote tables will simply be treated as though they are native Virtuoso tables leaving the virtual database engine to handle the rest. This is similar to the role the Microsoft JET engine played in the early days of ODBC, so if you've ever linked an ODBC data source into Microsoft Access, you are ready to do the same using Virtuoso.

Related

# PermaLink Comments [0]
01/08/2009 04:36 GMT Modified: 01/08/2009 09:12 GMT
Comparison of Open Source Databases with TPC D Queries [ Orri Erling ]

Last time we talked about database engine and transactions. Now we have come to the realm of query processing in our revisiting of the DBMS side of Virtuoso.

Now the well established, respectable standard benchmark for the basics of query processing is TPC D with its derivatives H and R. So we have, for testing how different SQL optimizers manage the 22 queries, run a mini version of the D queries with a 1% scale database, some 30M of data, all in memory. This basically catches whether SQL implementations miss some of the expected tricks and how efficient in memory loop and hash joins and aggregation are.

When we get to our next stop, high volume I/O, we will run the same with D databases in the 10G ballpark.

The databases were tested on the same machine, with warm cache, taking the best run of 3. All had full statistics and were running with read committed isolation, where applicable. The data was generated using the procedures from the Virtuoso test suite. The Virtuoso version tested was 5.0, to be released shortly. The MySQL was 5.0.27, the PostgreSQL was 8.1.6.

Query Query Times in Milliseconds
Virtuoso PostgreSQL MySQL MySQL with InnoDB
Q1 206 763 312 198
Q2 4 6 3 3
Q3 13 51 254 64
Q4 4 16 24 60
Q5 15 22 64 68
Q6 9 70 189 65
Q7 52 143 211 84
Q8 29 31 13 11
Q9 36 114 97 61
Q10 32 51 117 57
Q11 16 9 12 10
Q12 8 21 18 130
Q13 18 74 - -
Q14 7 21 418 1425
Q15 14 43 389 122
Q16 16 22 18 25
Q17 1 54 26 10
Q18 82 120 - -
Q19 19 8 2 17
Q20 7 15 66 52
Q21 34 86 524 278
Q22 4 323 3311 805
Total (msec) 626 2063 6068 3545

We lead by a fair margin but MySQL is hampered by obviously getting some execution plans wrong and not doing Q13 and Q18 at all, at least not under several tens of seconds; so we left these out of the table in the interest of having comparable totals.

As usual, we also ran the workload on Oracle 10g R2. Since Oracle does not like their numbers being published without explicit approval, we will just say that we are even with them within the parameters described above. Oracle has a more efficient decimal type so it wins where that is central, as on Q1. Also it seems to notice that the GROUP BYs of Q18 are produced in order of grouping columns, so it needs no intermediate table for storing the aggregates. If we addressed these matters, we'd lead by some 15% whereas now we are even. A faster decimal arithmetic implementation may be in the release after next.

In the next posts, we will look at IO and disk allocation, and also return to RDF and LUBM.

# PermaLink Comments [0]
02/05/2007 11:45 GMT Modified: 04/17/2008 21:04 GMT
Comparison of Open Source Databases with TPC D Queries [ Virtuso Data Space Bot ]
Comparison of Open Source Databases with TPC D Queries

Last time we talked about database engine and transactions. Now we have come to the realm of query processing in our revisiting of the DBMS side of Virtuoso.

Now the well established, respectable standard benchmark for the basics of query processing is TPC D with its derivatives H and R. So we have, for testing how different SQL optimizers manage the 22 queries, run a mini version of the D queries with a 1% scale database, some 30M of data, all in memory. This basically catches whether SQL implementations miss some of the expected tricks and how efficient in memory loop and hash joins and aggregation are.

When we get to our next stop, high volume I/O, we will run the same with D databases in the 10G ballpark.

The databases were tested on the same machine, with warm cache, taking the best run of 3. All had full statistics and were running with read committed isolation, where applicable. The data was generated using the procedures from the Virtuoso test suite. The Virtuoso version tested was 5.0, to be released shortly. The MySQL was 5.0.27, the PostgreSQL was 8.1.6.

Query Query Times in Milliseconds
Virtuoso PostgreSQL MySQL MySQL with InnoDB
Q1 206 763 312 198
Q2 4 6 3 3
Q3 13 51 254 64
Q4 4 16 24 60
Q5 15 22 64 68
Q6 9 70 189 65
Q7 52 143 211 84
Q8 29 31 13 11
Q9 36 114 97 61
Q10 32 51 117 57
Q11 16 9 12 10
Q12 8 21 18 130
Q13 18 74 - -
Q14 7 21 418 1425
Q15 14 43 389 122
Q16 16 22 18 25
Q17 1 54 26 10
Q18 82 120 - -
Q19 19 8 2 17
Q20 7 15 66 52
Q21 34 86 524 278
Q22 4 323 3311 805
Total (msec) 626 2063 6068 3545

We lead by a fair margin but MySQL is hampered by obviously getting some execution plans wrong and not doing Q13 and Q18 at all, at least not under several tens of seconds; so we left these out of the table in the interest of having comparable totals.

As usual, we also ran the workload on Oracle 10g R2. Since Oracle does not like their numbers being published without explicit approval, we will just say that we are even with them within the parameters described above. Oracle has a more efficient decimal type so it wins where that is central, as on Q1. Also it seems to notice that the GROUP BYs of Q18 are produced in order of grouping columns, so it needs no intermediate table for storing the aggregates. If we addressed these matters, we'd lead by some 15% whereas now we are even. A faster decimal arithmetic implementation may be in the release after next.

In the next posts, we will look at IO and disk allocation, and also return to RDF and LUBM.

# PermaLink Comments [1]
02/05/2007 06:44 GMT Modified: 04/17/2008 21:04 GMT
"Free" Databases: Express vs. Open-Source RDBMSs [ Kingsley Uyi Idehen ]

Very detailed and insightful peek into the state of affairs re. database engines (Open & Closed Source).

I added the missing piece regarding the "Virtuoso Conductor" (the Web based Admin UI for Virtuoso) to the original post below. I also added a link to our live SPARQL Demo so that anyone interested can start playing around with SPARQL and SPARQL integrated into SQL right away.

Another good thing about this post is the vast amount of valuable links that it contains. To really appreciate this point simply visit my Linkblog (excuse the current layout :-) - a Tab if you come in via the front door of this Data Space (what I used to call My Weblog Home Page).

"Free" Databases: Express vs. Open-Source RDBMSs: "Open-source relational database management systems (RDBMSs) are gaining IT mindshare at a rapid pace. As an example, BusinessWeek's February 6, 2006 ' Taking On the Database Giants ' article asks 'Can open-source upstarts compete with Oracle, IBM, and Microsoft?' and then provides the answer: 'It's an uphill battle, but customers are starting to look at the alternatives.'

There's no shortage of open-source alternatives to look at. The BusinessWeek article concentrates on MySQL, which BW says 'is trying to be the Ikea of the database world: cheap, needs some assembly, but has a sleek, modern design and does the job.' The article also discusses Postgre[SQL] and Ingres, as well as EnterpriseDB, an Oracle clone created from PostgreSQL code*. Sun includes PostgreSQL with Solaris 10 and, as of April 6, 2006, with Solaris Express.**

*Frank Batten, Jr., the investor who originally funded Red Hat, invested a reported $16 million into Great Bridge with the hope of making a business out of providing paid support to PostgreSQL users. Great Bridge stayed in business only 18 months , having missed an opportunity to sell the business to Red Hat and finding that selling $50,000-per-year support packages for an open-source database wasn't easy. As Batten concluded, 'We could not get customers to pay us big dollars for support contracts.' Perhaps EnterpriseDB will be more successful with a choice of $5,000, $3,000, or $1,000 annual support subscriptions .

**Interestingly, Oracle announced in November 2005 that Solaris 10 is 'its preferred development and deployment platform for most x64 architectures, including x64 (x86, 64-bit) AMD Opteron and Intel Xeon processor-based systems and Sun's UltraSPARC(R)-based systems.'

There is a surfeit of reviews of current MySQL, PostgreSQL and—to a lesser extent—Ingres implementations. These three open-source RDBMSs come with their own or third-party management tools. These systems compete against free versions of commercial (proprietary) databases: SQL Server 2005 Express Edition (and its MSDE 2000 and 1.0 predecessors), Oracle Database 10g Express Edition, IBM DB2 Express-C, and Sybase ASE Express Edition for Linux where database size and processor count limitations aren't important. Click here for a summary of recent InfoWorld reviews of the full versions of these four databases plus MySQL, which should be valid for Express editions also. The FTPOnline Special Report article, 'Microsoft SQL Server Turns 17,' that contains the preceding table is here (requires registration.)

SQL Server 2005 Express Edition SP-1 Advanced Features

SQL Server 2005 Express Edition with Advanced Features enhances SQL Server 2005 Express Edition (SQL Express or SSX) dramatically, so it deserves special treatment here. SQL Express gains full text indexing and now supports SQL Server Reporting Services (SSRS) on the local SSX instance. The SP-1 with Advanced Features setup package, which Microsoft released on April 18, 2006, installs the release version of SQL Server Management Studio Express (SSMSE) and the full version of Business Intelligence Development Studio (BIDS) for designing and editing SSRS reports. My 'Install SP-1 for SQL Server 2005 and Express' article for FTPOnline's SQL Server Special Report provides detailed, illustrated installation instructions for and related information about the release version of SP-1. SP-1 makes SSX the most capable of all currently available Express editions of commercial RDBMSs for Windows.

OpenLink Software's Virtuoso Open-Source Edition

OpenLink Software announced an open-source version of it's Virtuoso Universal Server commercial DBMS on April 11, 2006. On the initial date of this post, May 2, 2006, Virtuoso Open-Source Edition (VOS) was virtually under the radar as an open-source product. According to this press release, the new edition includes:

  • SPARQL compliant RDF Triple Store
  • SQL-200n Object-Relational Database Engine (SQL, XML, and Free Text)
  • Integrated BPEL Server and Enterprise Service Bus
  • WebDAV and Native File Server
  • Web Application Server that supports PHP, Perl, Python, ASP.NET, JSP, etc.
  • Runtime Hosting for Microsoft .NET, Mono, and Java
VOS only lacks the virtual server and replication features that are offered by the commercial edition. VOS includes a Web-based administration tool called the "Virtuoso Conductor" According to Kingsley Idehen's Weblog, 'The Virtuoso build scripts have been successfully tested on Mac OS X (Universal Binary Target), Linux, FreeBSD, and Solaris (AIX, HP-UX, and True64 UNIX will follow soon). A Windows Visual Studio project file is also in the works (ETA some time this week).'

InfoWorld's Jon Udell has tracked Virtuoso's progress since 2002, with an additional article in 2003 and a one-hour podcast with Kingsley Idehen on April 26, 2006. A major talking point for Virtuoso is its support for Atom 0.3 syndication and publication, Atom 1.0 syndication and (forthcoming) publication, and future support for Google's GData protocol, as mentioned in this Idehen post. Yahoo!'s Jeremy Zawodny points out that the 'fingerprints' of Adam Bosworth, Google's VP of Engineering and the primary force behind the development of Microsoft Access, 'are all over GData.' Click here to display a list of all OakLeaf posts that mention Adam Bosworth.

One application for the GData protocol is querying and updating the Google Base database independently of the Google Web client, as mentioned by Jeremy: 'It's not about building an easier onramp to Google Base. ... Well, it is. But, again, that's the small stuff.' Click here for a list of posts about my experiences with Google Base. Watch for a future OakLeaf post on the subject as the GData APIs gain ground.

Open-Source and Free Embedded Database Contenders

Open-source and free embedded SQL databases are gaining importance as the number and types of mobile devices and OSs proliferate. Embedded databases usually consist of Java classes or Windows DLLs that are designed to minimize file size and memory consumption. Embedded databases avoid the installation hassles, heavy resource usage and maintenance cost associated with client/server RDBMSs that run as an operating system service.

Andrew Hudson's December 2005 'Open Source databases rounded up and rodeoed' review for The Enquirer provides brief descriptions of one commercial and eight open source database purveyors/products: Sleepycat, MySQL, PostgreSQL, Ingres, InnoBase, Firebird, IBM Cloudscape (a.k.a, Derby), Genezzo, and Oracle. Oracle Sleepycat* isn't an SQL Database, Oracle InnoDB* is an OEM database engine that's used by MySQL, and Genezzo is a multi-user, multi-server distributed database engine written in Perl. These special-purpose databases are beyond the scope of this post.

* Oracle purchased Sleepycat Software, Inc. in February 2006 and purchased Innobase OY in October 2005 . The press release states: 'Oracle intends to continue developing the InnoDB technology and expand our commitment to open source software.'

Derby is an open-source release by the Apache Software Foundation of the Cloudscape Java-based database that IBM acquired when it bought Informix in 2001. IBM offers a commercial release of Derby as IBM Cloudscape 10.1. Derby is a Java class library that has a relatively light footprint (2 MB), which make it suitable for client/server synchronization with the IBM DB2 Everyplace Sync Server in mobile applications. The IBM DB2 Everyplace Express Edition isn't open source or free*, so it doesn't qualify for this post. The same is true for the corresponding Sybase SQL Anywhere components.**


* IBM DB2 Everyplace Express Edition with synchronization costs $379 per server (up to two processors) and $79 per user. DB2 Everyplace Database Edition (without DB2 synchronization) is $49 per user. (Prices are based on those when IBM announced version 8 in November 2003.)

** Sybase's iAnywhere subsidiary calls SQL Anywhere 'the industry's leading mobile database.' A Sybase SQL Anywhere Personal DB seat license with synchronization to SQL Anywhere Server is $119; the cost without synchronization wasn't available from the Sybase Web site. Sybase SQL Anywhere and IBM DB2 Everyplace perform similar replication functions.

Sun's Java DB, another commercial version of Derby, comes with the Solaris Enterprise Edition, which bundles Solaris 10, the Java Enterprise System, developer tools, desktop infrastructure and N1 management software. A recent Between the Lines blog entry by ZDNet's David Berlind waxes enthusiastic over the use of Java DB embedded in a browser to provide offline persistence. RedMonk analyst James Governor and eWeek's Lisa Vaas wrote about the use of Java DB as a local data store when Tim Bray announced Sun's Derby derivative and Francois Orsini demonstrated Java DB embedded in the Firefox browser at the ApacheCon 2005 conference.

Firebird is derived from Borland's InterBase 6.0 code, the first commercial relational database management system (RDBMS) to be released as open source. Firebird has excellent support for SQL-92 and comes in three versions: Classic, SuperServer and Embedded for Windows, Linux, Solaris, HP-UX, FreeBSD and MacOS X. The embedded version has a 1.4-MB footprint. Release Candidate 1 for Firebird 2.0 became available on March 30, 2006 and is a major improvement over earlier versions. Borland continues to promote InterBase, now at version 7.5, as a small-footprint, embedded database with commercial Server and Client licenses.

SQLite is a featherweight C library for an embedded database that implements most SQL-92 entry- and transitional-level requirements (some through the JDBC driver) and supports transactions within a tiny 250-KB code footprint. Wrappers support a multitude of languages and operating systems, including Windows CE, SmartPhone, Windows Mobile, and Win32. SQLite's primary SQL-92 limitations are lack of nested transactions, inability to alter a table design once committed (other than with RENAME TABLE and ADD COLUMN operations), and foreign-key constraints. SQLite provides read-only views, triggers, and 256-bit encryption of database files. A downside is the the entire database file is locked when while a transaction is in progress. SQLite uses file access permissions in lieu of GRANT and REVOKE commands. Using SQLite involves no license; its code is entirely in the public domain.

The Mozilla Foundation's Unified Storage wiki says this about SQLite: 'SQLite will be the back end for the unified store [for Firefox]. Because it implements a SQL engine, we get querying 'for free', without having to invent our own query language or query execution system. Its code-size footprint is moderate (250k), but it will hopefully simplify much existing code so that the net code-size change should be smaller. It has exceptional performance, and supports concurrent access to the database. Finally, it is released into the public domain, meaning that we will have no licensing issues.'

Vieka Technology, Inc.'s eSQL 2.11 is a port of SQLite to Windows Mobile (Pocket PC and Smartphone) and Win32, and includes development tools for Windows devices and PCs, as well as a .NET native data provider. A conventional ODBC driver also is available. eSQL for Windows (Win32) is free for personal and commercial use; eSQL for Windows Mobile requires a license for commercial (for-profit or business) use.

HSQLDB isn't on most reviewers' radar, which is surprising because it's the default database for OpenOffice.org (OOo) 2.0's Base suite member. HSQLDB 1.8.0.1 is an open-source (BSD license) Java dembedded database engine based on Thomas Mueller's original Hypersonic SQL Project. Using OOo's Base feature requires installing the Java 2.0 Runtime Engine (which is not open-source) or the presence of an alternative open-source engine, such as Kaffe. My prior posts about OOo Base and HSQLDB are here, here and here.

The HSQLDB 1.8.0 documentation on SourceForge states the following regarding SQL-92 and later conformance:

HSQLDB 1.8.0 supports the dialect of SQL defined by SQL standards 92, 99 and 2003. This means where a feature of the standard is supported, e.g. left outer join, the syntax is that specified by the standard text. Many features of SQL92 and 99 up to Advanced Level are supported and here is support for most of SQL 2003 Foundation and several optional features of this standard. However, certain features of the Standards are not supported so no claim is made for full support of any level of the standards.

Other less well-known embedded databases designed for or suited to mobile deployment are Mimer SQL Mobile and VistaDB 2.1 . Neither product is open-source and require paid licensing; VistaDB requires a small up-front payment by developers but offers royalty-free distribution.

Java DB, Firebird embedded, SQLite and eSQL 2.11 are contenders for lightweight PC and mobile device database projects that aren't Windows-only.

SQL Server 2005 Everywhere

If you're a Windows developer, SQL Server Mobile is the logical embedded database choice for mobile applications for Pocket PCs and Smartphones. Microsoft's April 19, 2006 press release delivered the news that SQL Server 2005 Mobile Editon (SQL Mobile or SSM) would gain a big brother—SQL Server 2005 Everywhere Edition.

Currently, the SSM client is licensed (at no charge) to run in production on devices with Windows CE 5.0, Windows Mobile 2003 for Pocket PC or Windows Mobile 5.0, or on PCs with Windows XP Tablet Edition only. SSM also is licensed for development purposes on PCs running Visual Studio 2005. Smart Device replication with SQL Server 2000 SP3 and later databases has been the most common application so far for SSM.

By the end of 2006, Microsoft will license SSE for use on all PCs running any Win32 version or the preceding device OSs. A version of SQL Server Management Studio Express (SSMSE)—updated to support SSE—is expected to release by the end of the year. These features will qualify SSE as the universal embedded database for Windows client and smart-device applications.

For more details on SSE, read John Galloway's April 11, 2006 blog post and my 'SQL Server 2005 Mobile Goes Everywhere' article for the FTPOnline Special Report on SQL Server."

(Via OakLeaf Systems.)

# PermaLink Comments [1]
05/05/2006 16:02 GMT Modified: 07/21/2006 07:21 GMT
Virtuoso is Officially Open Source! [ Kingsley Uyi Idehen ]

I am pleased to unveil (officially) the fact that Virtuoso is now available in Open Source form.

What Is Virtuoso?

A powerful next generation server product that implements otherwise distinct server functionality within a single server product. Think of Virtuoso as the server software analog of a dual core processor where each core represents a traditional server functionality realm.

Where did it come from?

The Virtuoso History page tells the whole story.

What Functionality Does It Provide?

The following:
    1. Object-Relational DBMS Engine (ORDBMS like PostgreSQL and DBMS engine like MySQL)
    2. XML Data Management (with support for XQuery, XPath, XSLT, and XML Schema)
    3. RDF Triple Store (or Database) that supports SPARQL (Query Language, Transport Protocol, and XML Results Serialization format)
    4. Service Oriented Architecture (it combines a BPEL Engine with an ESB)
    5. Web Application Server (supports HTTP/WebDAV)
    6. NNTP compliant Discussion Server
And more. (see: Virtuoso Web Site)

90% of the aforementioned functionality has been available in Virtuoso since 2000 with the RDF Triple Store being the only 2006 item.

What Platforms are Supported

The Virtuoso build scripts have been successfully tested on Mac OS X (Universal Binary Target), Linux, FreeBSD, and Solaris (AIX, HP-UX, and True64 UNIX will follow soon). A Windows Visual Studio project file is also in the works (ETA some time this week).

Why Open Source?

Simple, there is no value in a product of this magnitude remaining the "best kept secret". That status works well for our competitors, but absolutely works against the legions of new generation developers, systems integrators, and knowledge workers that need to be aware of what is actually achievable today with the right server architecture.

What Open Source License is it under?

GPL version 2.

What's the business model?

Dual licensing.

The Open Source version of Virtuoso includes all of the functionality listed above. While the Virtual Database (distributed heterogeneous join engine) and Replication Engine (across heterogeneous data sources) functionality will only be available in the commercial version.

Where is the Project Hosted?

On SourceForge.

Is there a product Blog?

Of course!

Up until this point, the Virtuoso Product Blog has been a covert live demonstration of some aspects of Virtuoso (Content Management). My Personal Blog and the Virtuoso Product Blog are actual Virtuoso instances, and have been so since I started blogging in 2003.

Is There a product Wiki?

Sure! The Virtuoso Product Wiki is also an instance of Virtuoso demonstrating another aspect of the Content Management prowess of Virtuoso.

What About Online Documentation?

Yep! Virtuoso Online Documentation is hosted via yet another Virtuoso instance. This particular instance also attempts to demonstrate Free Text search combined with the ability to repurpose well formed content in a myriad of forms (Atom, RSS, RDF, OPML, and OCS).

What about Tutorials and Demos?

The Virtuoso Online Tutorial Site has operated as a live demonstration and tutorial portal for a numbers of years. During the same timeframe (circa. 2001) we also assembled a few Screencast style demos (their look feel certainly show their age; updates are in the works).

BTW - We have also updated the Virtuoso FAQ and also released a number of missing Virtuoso White Papers (amongst many long overdue action items).

# PermaLink Comments [1]
04/11/2006 18:01 GMT Modified: 07/21/2006 07:22 GMT
MacSOS Releases SyBrowser 6.2 [ Kingsley Uyi Idehen ]
MacSOS Releases SyBrowser 6.2 Dr. Gerard Hammond of MacSOS announced the release of SyBrowser 6.2, a Macintosh application that can query Sybase, FrontBase, PostgreSQL, Oracle, ODBC and MS SQL databases hosted on OSX, UNIX, Linux, and Windows servers. SyBrowser v6.2 features include: - Added FrontBase database support. - Added "Bachman" style ERD features (Tridents for 'Many' arms of a relationship, open circles for optional entities) - The arms of a relationship now track their entities correctly in all directions. - The arm of the selected relationship can be moved using the mouse or the keyboard - Fixed bug with SQL auto-completion popup with multiple monitors. - Enhanced the "Edit Relationships Info..." dialog. This dialog allows the properties of the selected relationship to be edited. - The mouse cursor changes to reflect the draggable direction when resizing ERD tables, or dragging the various arms of a relationship. The selected arm of the selected relationship now has a circle for a handle. - Documentation added. - Printing the ERD panel has been improved. - The Find dialog allows searching the returned datasets on the SQL panel and Results windows as well as the code in Sybase stored procedures. SyBrowser Overview SyBrowser is a table browser and alternative "isql" client for Sybase databases. It facilitates SQL generation thorough a point and click interface. SyBrowser also provides an overview of the tables in ODBC, MySQL, Oracle, FrontBase, PostgreSQL and MS SQL databases. Complex queries can be saved to disk for reuse. An ERD module allows the creation of visual representations of data models. $89 Shareware from MacSOS, Australia $49 upgrade from any previous version http://www.macsos.com.au
# PermaLink Comments [0]
11/19/2004 17:40 GMT Modified: 06/22/2006 08:56 GMT
Demo Hell and back [ Kingsley Uyi Idehen ]

This piece links to a great Mono presentation (bar the reference placement of MySQL/PostgreSQL in a box somewhat adjacent to ADO.NET (see slide 7). When ADO.NET should have be associated with Data Providers for ODBC, MySQL, PostgresSQL, and others for clarity (the natural goal of the presentation).

We have got to take time to understand the Data Access Layer, if we don't we will utlimately pay a hefty price (IMHO).

This blog post is also hillarious, especially if you have encountered the mercurial "Murphy" during live product demos.

So, today I went to hell. And then I came back. It was a short trip.

This year, I am giving a presentation on Mono at Brainshare in Salt Lake City, an intro to Mono for developers. I got a pretty good turnout with a few ximian people in the back (including Joe whom I saw for the first time without a hat).

 

So I plug in my PowerBook 12" as I always do but for some reason I have a hard time getting the projector to display its output. After struggling a little I resort to using the desktop provided by Novell, running Ximian Desktop 2 (and some version Suse Linux).

So I upload my presentation to www.frenchguys.com from my mac and then download it back to the desktop. Now I can make my presentation, which goes well. Then I get to a slide that just says : DEMO. Hmmm. Demo. I don't have Mono installed on that generic machine I was just given. I am going to need magic. So to magic I resort.

[via Monologue]

# PermaLink Comments [0]
03/23/2004 15:04 GMT Modified: 06/22/2006 08:56 GMT
How Databases Changed The World [ Kingsley Uyi Idehen ]

How Databases Changed The World by Tim DiChiara, Site Editor (SearchDatabase.com)

How did the database industry get started? How has it changed the face of business? What were the key milestones, the big obstacles and the lessons learned? I recently came across an interesting panel discussion addressing these very issues, featuring many of the database pioneers and leaders of the last 30 years:

Chris Date
Herb Edelstein
Bob Epstein (Sybase who shared code with Microsoft for remarketing on SQL Server on OS/2 which inevitably lead to the Microsoft SQL Server we know today)
Ken Jacobs (Oracle's Dr. DBA)
Pat Selinger  (DB2 precursor called System R)
Roger Sippl (Informix)
Michael Stonebraker (Ingres, Postgres, and Mariposa)

The event is available via streaming video and was recorded in February at the Computer History Museum in Mountain View, California. After a chatty and lengthy (45 minutes!) introduction only interesting to hardcore insiders, you can see Chris Date waxing eloquent about Ted Codd (complete with quotes from Shakespeare, no less), Herb Edelstein waxing eloquent about Chris Date, and Michael Stonebraker at his geeky best. There's also interesting trivia about the beginnings of SQL, the role of INGRES, why the relational model will stand the test of time and some friendly Oracle and IBM bashing (and Microsoft and Sybase and...). I urge all you data management pros interested in broadening your knowledge of the field to check it out! If you're still not satiated, don't forget about our collection of backgrounders about the DBMS and the data management industry.
# PermaLink Comments [0]
06/09/2003 05:28 GMT Modified: 06/22/2006 08:56 GMT
 <<     | 1 | 2 |     >>
Powered by OpenLink Virtuoso Universal Server
Running on Linux platform