OpenLink Software
Burlington, United States
New ADO.NET 3.x Provider for Virtuoso Released (Update 2)
[Kingsley Uyi Idehen]
I am pleased to announce the immediate availability of the Virtuoso ADO.NET 3.5 data provider for Microsoft's .NET platform.
What is it?
A data access driver/provider that provides conceptual entity oriented access to RDBMS data managed by Virtuoso. Naturally, it also uses Virtuoso's in-built virtual / federated database layer to provide access to ODBC and JDBC accessible RDBMS engines such as: Oracle (7.x to latest), SQL Server (4.2 to latest), Sybase, IBM Informix (5.x to latest), IBM DB2, Ingres (6.x to latest), Progress (7.x to OpenEdge), MySQL, PostgreSQL, Firebird, and others using our ODBC or JDBC bridge drivers.
Benefits?
Technical:
It delivers an Entity-Attribute-Value + Classes & Relationships model over disparate data sources that are materialized as .NET Entity Framework Objects, which are then consumable via ADO.NET Data Object Services, LINQ for Entities, and other ADO.NET data consumers.
The provider is fully integrated into Visual Studio 2008 and delivers the same "ease of use" offered by Microsoft's own SQL Server provider, but across Virtuoso, Oracle, Sybase, DB2, Informix, Ingres, Progress (OpenEdge), MySQL, PostgreSQL, Firebird, and others. The same benefits also apply uniformly to Entity Frameworks compatibility.
Bearing in mind that Virtuoso is a multi-model (hybrid) data manager, this also implies that you can use .NET Entity Frameworks against all data managed by Virtuoso. Remember, Virtuoso's SQL channel is a conduit to Virtuoso's core; thus, RDF (courtesy of SPASQL as already implemented re. Jena/Sesame/Redland providers), XML, and other data forms stored in Virtuoso also become accessible via .NET's Entity Frameworks.
Strategic:
You can choose which entity oriented data access model works best for you: RDF Linked Data & SPARQL or .NET Entity Frameworks & Entity SQL. Either way, Virtuoso delivers a commercial grade, high-performance, secure, and scalable solution.
How do I use it?
Simply follow one of the guides below:
Note: When working with external or 3rd party databases, simply use the Virtuoso Conductor to link the external data source into Virtuoso. Once linked, the remote tables will simply be treated as though they are native Virtuoso tables leaving the virtual database engine to handle the rest. This is similar to the role the Microsoft JET engine played in the early days of ODBC, so if you've ever linked an ODBC data source into Microsoft Access, you are ready to do the same using Virtuoso.
01/08/2009 04:36 GMT | Modified: 01/08/2009 09:12 GMT
Crunchbase & Semantic Web Interview (Remix - Update 1)
[Kingsley Uyi Idehen]
After reading Bengee's interview with CrunchBase, I decided to knock up a quick interview remix as part of my usual attempt to add to the developing discourse.
CrunchBase: When we released the CrunchBase API, you were one of the first developers to step up and quickly released a CrunchBase Sponger Cartridge. Can you explain what a CrunchBase Sponger Cartridge is?
Me: A Sponger Cartridge is a data access driver for Web Resources that plugs into our Virtuoso Universal Server (DBMS and Linked Data Web Server combo amongst other things). It uses the internal structure of a resource and/or a web service associated with a resource, to materialize an RDF based Linked Data graph that essentially describes the resource via its properties (Attributes & Relationships).
CrunchBase: And what inspired you to create it?
Me: Bengee built a new space with your data, and we've built a space on the fly from your data which still resides in your domain. Either solution extols the virtues of Linked Data i.e. the ability to explore relationships across data items with high degrees of serendipity (also colloquially known as: following-your-nose pattern in Semantic Web circles).
Bengee posted a notice to the Linking Open Data Community's public mailing list announcing his effort. Bearing in mind the fact that we've been using middleware to mesh the realms of Web 2.0 and the Linked Data Web for a while, it was a no-brainer to knock something up based on the conceptual similarities between Wikicompany and CrunchBase. In a sense, a quadrant of orthogonality is what immediately came to mind re. Wikicompany, CrunchBase, Bengee's RDFization efforts, and ours.
Bengee created an RDF based Linked Data warehouse based on the data exposed by your API, which is exposed via the Semantic CrunchBase data space. In our case we've taken the "RDFization on the fly" approach, which produces a transient Linked Data View of the CrunchBase data exposed by your APIs. Our approach is in line with our world view: all resources on the Web are data sources, and the Linked Data Web is about incorporating HTTP into the naming scheme of these data sources so that the conventional URL based hyperlinking mechanism can be used to access a structured description of a resource, which is then transmitted using a range of negotiable representation formats. In addition, based on the fact that we house and publish a lot of Linked Data on the Web (e.g. DBpedia, PingTheSemanticWeb, and others), we've also automatically meshed CrunchBase data with related data in DBpedia and Wikicompany.
CrunchBase: Do you know of any apps that are using CrunchBase Cartridge to enhance their functionality?
Me: Yes, the OpenLink Data Explorer which provides CrunchBase site visitors with the option to explore the Linked Data in the CrunchBase data space. It also allows them to "Mesh" (rather than "Mash") CrunchBase data with other Linked Data sources on the Web without writing a single line of code.
CrunchBase: You have been immersed in the Semantic Web movement for a while now. How did you first get interested in the Semantic Web?
Me: We saw the Semantic Web as a vehicle for standardizing conceptual views of heterogeneous data sources via context lenses (URIs). In 1998, as part of our strategy to expand our business beyond the development and deployment of ODBC, JDBC, and OLE-DB data providers, we decided to build a Virtual Database Engine (see: Virtuoso History), and in doing so we sought a standards based mechanism for the conceptual output of the data virtualization effort. At the time of the seminal unveiling of the Semantic Web in 1998, we were clear about two things in relation to the effects of the Web and Internet data management infrastructure inflections: 1) existing DBMS technology had reached its limits; 2) Web Servers would ultimately hit their functional limits. These fundamental realities compelled us to develop Virtuoso with an eye to leveraging the Semantic Web as a vehicle for completing its technical roadmap.
CrunchBase: Can you put into layman’s terms exactly what RDF and SPARQL are and why they are important? Do they only matter for developers or will they extend past developers at some point and be used by website visitors as well?
Me: RDF (Resource Description Framework) is a Graph based Data Model that facilitates resource description using the Subject, Predicate, and Object principle. Associated with the core data model, as part of the overall framework, are a number of markup languages for expressing your descriptions (just as you express presentation markup semantics in HTML or document structure semantics in XML) that include: RDFa (simple extension of HTML markup for embedding descriptions of things in a page), N3 (a human friendly markup for describing resources), RDF/XML (a machine friendly markup for describing resources).
SPARQL is the query language associated with the RDF Data Model, just as SQL is the query language associated with the Relational Database Model. Thus, when you have RDF based structured and linked data on the Web, you can query against the Web using SPARQL just as you would against an Oracle/SQL Server/DB2/Informix/Ingres/MySQL/etc. DBMS using SQL. That's it in a nutshell.
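To make the subject/predicate/object idea concrete, here is a minimal sketch in Python of the RDF triple model and SPARQL-style pattern matching. It uses plain tuples rather than a real RDF store, and the `ex:` data below is invented for illustration, not taken from CrunchBase or DBpedia.

```python
# Each fact is a (subject, predicate, object) triple -- the RDF model.
triples = {
    ("ex:OpenLink", "ex:locatedIn", "ex:Burlington"),
    ("ex:OpenLink", "ex:makes", "ex:Virtuoso"),
    ("ex:Virtuoso", "ex:supports", "ex:SPARQL"),
}

def match(pattern):
    """Return triples matching a (s, p, o) pattern.
    None acts as a variable, like ?s / ?p / ?o in SPARQL."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Rough analogue of: SELECT ?p ?o WHERE { ex:OpenLink ?p ?o }
results = match(("ex:OpenLink", None, None))
```

A real SPARQL engine adds joins over multiple patterns, filters, and graph names, but the variable-binding idea is the same.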
CrunchBase: On your website you wrote that “RDF and SPARQL as productivity boosters in everyday web development”. Can you elaborate on why you believe that to be true?
Me: I think the ability to discern a formal description of anything via its discrete properties is of immense value re. productivity, especially when the capability in question results in a graph of Linked Data that isn't confined to a specific host operating system, database engine, application or service, programming language, or development framework. RDF Linked Data is about infrastructure for the true materialization of the "Information at Your Fingertips" vision of yore. Even though it's taken the emergence of RDF Linked Data to make the aforementioned vision tractable, the comprehension of the vision's intrinsic value has been clear for a very long time. Most organizations and/or individuals are quite familiar with the adage "Knowledge is Power"; well, there isn't any knowledge without accessible Information, and there isn't any accessible Information without accessible Data. The Web has always been grounded in accessibility to data (albeit via compound container documents called Web Pages). Bottom line, RDF based Linked Data is about Open Data access by reference using URIs (HTTP based Entity IDs / Data Object IDs / Data Source Names), and as I said earlier, the intrinsic value is pretty obvious bearing in mind the costs associated with integrating disparate and heterogeneous data sources -- across intranets, extranets, and the Internet.
CrunchBase: In his definition of Web 3.0, Nova Spivack proposes that the Semantic Web, or Semantic Web technologies, will be the force behind much of the innovation that will occur during Web 3.0. Do you agree with Nova Spivack? What role, if any, do you feel the Semantic Web will play in Web 3.0?
Me: I agree with Nova. But I see Web 3.0 as a phase within the Semantic Web innovation continuum. Web 3.0 exists because Web 2.0 exists. Both of these Web versions express usage and technology focus patterns. Web 2.0 is about the use of Open Source technologies to fashion Web Services that are ultimately used to drive proprietary Software as Service (SaaS) style solutions. Web 3.0 is about the use of "Smart Data Access" to fashion a new generation of Linked Data aware Web Services and solutions that exploit the federated nature of the Web to maximum effect; proprietary branding will simply be conveyed via quality of data (cleanliness, context fidelity, and comprehension of privacy) exposed by URIs.
Here are some examples of the CrunchBase Linked Data Space, as projected via our CrunchBase Sponger Cartridge:
- Amazon.com
- Microsoft
- Google
- Apple
08/27/2008 18:16 GMT | Modified: 08/27/2008 20:35 GMT
Linked Data enabling PHP Applications
[Kingsley Uyi Idehen]
Daniel Lewis has penned a variation of my post about Linked Data enabling PHP applications such as WordPress, phpBB3, and MediaWiki.
Daniel simplifies my post by using diagrams to depict the different paths for PHP based applications exposing Linked Data - especially those that already provide a significant amount of the content that drives Web 2.0.
If all the content in Web 2.0 information resources is distillable into discrete data objects endowed with HTTP based IDs (URIs), with zero "RDF handcrafting Tax", what do we end up with? A Giant Global Graph of Linked Data; the Web as a Database. So, what used to apply exclusively within enterprise settings re. Oracle, DB2, Informix, Ingres, Sybase, Microsoft SQL Server, MySQL, PostgreSQL, Progress OpenEdge, Firebird, and others, now applies to the Web. The Web becomes the "Distributed Database Bus" that connects database records across disparate databases (or Data Spaces). These databases manage and expose records that are remotely accessible "by reference" via HTTP.
As I've stated at every opportunity in the past, Web 2.0 is the greatest thing that ever happened to the Semantic Web vision :-) Without the "Web 2.0 Data Silo Conundrum" we wouldn't have the cry for "Data Portability" that brings a lot of clarity to some fundamental Web 2.0 limitations that end-users ultimately find unacceptable.
In the late '80s, the SQL Access Group (now part of X/Open) addressed a similar problem with RDBMS silos within the enterprise, which led to the SAG CLI that exists today as Open Database Connectivity (ODBC).
In a sense we now have WODBC (Web Open Database Connectivity), comprised of Web Services based CLIs and/or traditional back-end DBMS CLIs (ODBC, JDBC, ADO.NET, OLE-DB, or Native), Query Language (SPARQL Query Language), and a Wire Protocol (HTTP based SPARQL Protocol) delivering Web infrastructure equivalents of SQL and RDA, but much better, and with much broader scope for delivering profound value due to the Web's inherent openness. Today's PHP, Python, Ruby, Tcl, Perl, ASP.NET developer is the enterprise 4GL developer of yore, without enterprise confinement. We could even be talking about 5GL development once the Linked Data interaction is meshed with dynamic languages (delivering higher levels of abstraction at the language and data interaction levels). Even the underlying schemas and basic design will evolve from Closed World (solely) to a mesh of Closed & Open World view schemas.
04/10/2008 18:09 GMT | Modified: 04/10/2008 14:12 GMT
Virtuoso Cluster Preview
[Virtuoso Data Space Bot]
I wrote the basics of the Virtuoso clustering support over the past three weeks. It can now manage connections, decide where things go, do two phase commits, and insert and select data from tables partitioned over multiple Virtuoso instances. It now works well enough to be measured, which I will blog more about over the next two weeks.
I will in the following give a features preview of what will be in the Virtuoso clustering support when it is released in the fall of this year (2007).
Data Partitioning
A Virtuoso database consists of indices only, so that the row of a table is stored together with the primary key. Blobs are stored on separate pages when they do not fit inline within the row. With clustering, partitioning can be specified index by index. Partitioning means that values of specific columns are used for determining where the containing index entry will be stored. Virtuoso partitions by hash and allows specifying what parts of partitioning columns are used for the hash, for example bits 14-6 of an integer or the first 5 characters of a string. Like this, key compression gains are not lost by storing consecutive values on different partitions.
Once the partitioning is specified, we specify which set of cluster nodes stores this index. Not every index has to be split evenly across all nodes. Also, all nodes do not have to have equal slices of the partitioned index, accommodating differences in capacity between cluster nodes.
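The partition-key hashing described above can be sketched in a few lines of Python. The bit range, prefix length, and node names below are taken from the example in the text or invented for illustration; the actual Virtuoso hash function is not specified here.

```python
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster

def partition_key(value):
    """Reduce a column value to the part that feeds the hash:
    bits 14..6 of an integer, or the first 5 characters of a string."""
    if isinstance(value, int):
        return (value >> 6) & 0x1FF   # keep bits 14-6
    return str(value)[:5]             # keep first 5 characters

def node_for(value):
    """Pick the node holding a key's partition."""
    return NODES[hash(partition_key(value)) % len(NODES)]

# Because the low 6 bits are ignored, keys 0..63 share a partition key,
# so consecutive values stay together and key compression is preserved:
same = {node_for(k) for k in range(64)}
```

The point of masking off the low bits is exactly what the text notes: runs of consecutive keys land on the same node, so prefix/key compression on a page is not destroyed by scattering neighbors across partitions.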
Each Virtuoso instance can manage up to 32TB of data. A cluster has no definite size limit.
Load Balancing and Fault Tolerance
When data is partitioned, an operation on the data goes where the data is. This provides a certain natural parallelism but we will discuss this further below.
Some data may be stored multiple times in the cluster, either for fail-over or for splitting read load. Some data, such as database schema, is replicated on all nodes. When specifying a set of nodes for storing the partitions of a key, it is possible to specify multiple nodes for the same partition. If this is the case, updates go to all nodes and reads go to a randomly picked node from the group.
If one of the nodes in the group fails, operation can resume with the surviving node. The failed node can be brought back online from the transaction logs of the surviving nodes. A few transactions may be rolled back at the time of failure, and again at the time of the failed node rejoining the cluster, but these are aborts as in the case of deadlock and lose no committed data.
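The read/write routing described above (updates to every replica, reads to one replica picked at random) can be sketched as follows. The partition numbers, node names, and in-memory dictionaries are stand-ins for real cluster state, purely for illustration.

```python
import random

# partition -> its replica group (made-up names)
replicas = {0: ["a1", "a2"], 1: ["b1"]}
# per-node key/value state standing in for real storage
state = {n: {} for group in replicas.values() for n in group}

def write(part, key, value):
    for node in replicas[part]:           # updates go to all replicas
        state[node][key] = value

def read(part, key):
    node = random.choice(replicas[part])  # reads go to a random replica
    return state[node][key]

write(0, "k", 42)
```

Since every replica sees every update, any replica can serve a read, which is what lets reads be load-balanced and lets operation continue on a surviving node after a failure.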
Shared Nothing
The Virtuoso architecture does not require a SAN for disk sharing across nodes. This is reasonable since a few disks on a local controller can easily provide 300MB/s of read throughput, and passing this over an interconnect fabric that would also have to carry inter-node messages could saturate even a fast network.
Client View
A SQL or HTTP client can connect to any node of the cluster and get an identical view of all data with full transactional semantics. DDL operations like table creation and package installation are limited to one node, though.
Applications such as ODS will run unmodified. They are installed on all nodes with a single install command. After this, the data partitioning must be declared, which is a one time operation to be done cluster by cluster. The only application change is specifying the partitioning columns for each index. The gain is optional redundant storage and capacity not limited to a single machine. The penalty is that single operations may take a little longer when not all data is managed by the same process but then the parallel throughput is increased. We note that the main ODS performance factor is web page logic and not database access. Thus splitting the web server logic over multiple nodes gives basically linear scaling.
Parallel Query Execution
Message latency is the principal performance factor in a clustered database. Due to this, Virtuoso packs the maximum number of operations into a single message. For example, when doing a loop join that reads one table sequentially and retrieves a row of another table for each row of the outer table, a large number of the inner loop's joins are run in parallel. So, if there is a join of five tables that gets one row from each table and all rows are on different nodes, the time will be spent on message latency. If each step of the join gets 10 rows, for a total of 100000 results, the message latency is not a significant factor and the cluster will clearly outperform a single node.
Also, if the workload consists of large numbers of concurrent short updates or queries, the message latencies will even out and throughput will scale up, even if a single transaction would be faster on a single node.
Parallel SQL
There are SQL extensions for stored procedures allowing parallelizing operations. For example, if a procedure has a loop doing inserts, the inserted rows can be buffered until a sufficient number is available, at which point they are sent in batches to the nodes concerned. Transactional semantics are kept but error detection is deferred to the actual execution.
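The batched-insert idea above amounts to buffering rows per target node and flushing a full buffer as one message. A toy sketch, with a made-up batch size and a `sent` list standing in for the real cluster messages:

```python
BATCH = 3             # hypothetical batch size
buffers = {}          # node -> rows awaiting dispatch
sent = []             # (node, rows) messages actually "sent"

def buffered_insert(node, row):
    """Buffer a row; dispatch the whole batch once it fills."""
    buffers.setdefault(node, []).append(row)
    if len(buffers[node]) >= BATCH:
        flush(node)

def flush(node):
    """Send whatever is pending for a node as one message."""
    if buffers.get(node):
        sent.append((node, buffers.pop(node)))

for i in range(7):
    buffered_insert("node1", i)
flush("node1")        # drain the remainder at end of the procedure
```

Seven rows go out as three messages instead of seven, which is the latency win; the deferred error detection mentioned above follows from the fact that a row is only validated when its batch actually executes.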
Transactions
Each transaction is owned by one node of the cluster, the node to which the client is connected. When more than one node besides the owner of the transaction is updated, two phase commit is used. This is transparent to the application code. No external transaction monitor is required, the Virtuoso instances perform these functions internally. There is a distributed deadlock detection scheme based on the nodes periodically sharing transaction waiting information.
Since read transactions can operate without locks, reading the last committed state of uncommitted updated rows, waiting for locks is not very common.
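The two phase commit mentioned above can be sketched as a toy coordinator, with the transaction-owning node driving the protocol. This is only an illustration of the prepare/commit shape; the real Virtuoso internals (logging, recovery, timeouts) are far more involved.

```python
class Participant:
    """A node touched by the transaction (simplified)."""
    def __init__(self, fail=False):
        self.fail, self.log = fail, []
    def prepare(self):          # phase 1: vote yes/no
        self.log.append("prepare")
        return not self.fail
    def commit(self):           # phase 2: make changes durable
        self.log.append("commit")
    def rollback(self):
        self.log.append("rollback")

def two_phase_commit(participants):
    """Commit only if every participant votes yes in phase 1."""
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:      # any "no" vote aborts everywhere
        p.rollback()
    return "aborted"
```

The key property is that no participant commits until all have promised they can, so a failure during phase 1 costs only an abort, never a half-committed transaction.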
Interconnect and Threading
Virtuoso uses TCP to connect between instances. A single instance can have multiple listeners at different network interfaces for cluster activity. The interfaces will be used in a round-robin fashion by the peers, spreading the load over all network interfaces. A separate thread is created for monitoring each interface. Long messages, such as transfers of blobs are done on a separate thread, thus allowing normal service on the cluster node while the transfer is proceeding.
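The round-robin use of interfaces described above is simple to picture; a sketch with placeholder addresses (not real cluster configuration):

```python
from itertools import cycle

# Hypothetical listener addresses on two network interfaces.
interfaces = ["10.0.0.1:4100", "10.0.1.1:4100"]

# Peers rotate through the interfaces, spreading traffic evenly.
next_interface = cycle(interfaces).__next__
picks = [next_interface() for _ in range(4)]
```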
We will have to test the performance of TCP over InfiniBand to see if there is a clear gain in going to a lower level interface like MPI. The Virtuoso architecture is based on streams connecting cluster nodes point to point. The design does not per se gain from remote DMA or other features provided by MPI. Typically, messages are quite short, under 100K. Flow control for transfer of blobs is however nice to have, but can be written at the application level if needed. We will get real data on the performance of different interconnects in the next weeks.
Deployment and Management
Configuring is quite simple, with each process sharing a copy of the same configuration file. One line in the file differs from host to host, telling it which one it is. Otherwise the database configuration files are individual per host, accommodating different file system layouts etc. Setting up a node requires copying the executable and two configuration files, no more. All functionality is contained in a single process. There are no installers to be run or such.
Changing the number or network interface of cluster nodes requires a cluster restart. Changing data partitioning requires copying the data into a new table and renaming this over the old one. This is time consuming and does not mix well with updates. Splitting an existing cluster node requires no copying with repartitioning but shifting data between partitions does.
A consolidated status report shows the general state and level of intra-cluster traffic as count of messages and count of bytes.
Start, shutdown, backup, and package installation commands can only be issued from a single master node. Otherwise all is symmetrical.
Present State and Next Developments
The basics are now in place. Some code remains to be written for such things as distributed deadlock detection, 2-phase commit recovery cycle, management functions, etc. Some SQL operations like text index, statistics sampling, and index intersection need special support, yet to be written.
The RDF capabilities are not specifically affected by clustering except in a couple of places. Loading will be slightly revised to use larger batches of rows to minimize latency, for example.
There is a pretty much infinite world of SQL optimizations for splitting aggregates, taking advantage of co-located joins etc. These will be added gradually. These are however not really central to the first application of RDF storage but are quite important for business intelligence, for example.
We will run some benchmarks for comparing single host and clustered Virtuoso instances over the next weeks. Some of this will be with real data, giving an estimate on when we can move some of the RDF data we presently host to the new platform. We will benchmark against Oracle and DB2 later but first we get things to work and compare against ourselves.
We roughly expect a halving in space consumption and a significant increase in single query performance and linearly scaling parallel throughput through addition of cluster nodes.
The next update will be on this blog within two weeks.
08/27/2007 05:51 GMT | Modified: 04/25/2008 11:59 GMT
Virtuoso Cluster Preview
[
Orri Erling
]
I wrote the basics of the Virtuoso clustering support over the past three weeks. It can now manage connections, decide where things go, do two phase commits, insert and select data from tables partitioned over multiple Virtuoso instances. It works about enough to be measured, of which I will blog more over the next two weeks.
I will in the following give a features preview of what will be in the Virtuoso clustering support when it is released in the fall of this year (2007).
Data Partitioning
A Virtuoso database consists of indices only, so that the row of a table is stored together with the primary key. Blobs are stored on separate pages when they do not fit inline within the row. With clustering, partitioning can be specified index by index. Partitioning means that values of specific columns are used for determining where the containing index entry will be stored. Virtuoso partitions by hash and allows specifying what parts of partitioning columns are used for the hash, for example bits 14-6 of an integer or the first 5 characters of a string. Like this, key compression gains are not lost by storing consecutive values on different partitions.
Once the partitioning is specified, we specify which set of cluster nodes stores this index. Not every index has to be split evenly across all nodes. Also, all nodes do not have to have equal slices of the partitioned index, accommodating differences in capacity between cluster nodes.
Each Virtuoso instance can manage up to 32TB of data. A cluster has no definite size limit.
Load Balancing and Fault Tolerance
When data is partitioned, an operation on the data goes where the data is. This provides a certain natural parallelism but we will discuss this further below.
Some data may be stored multiple times in the cluster, either for fail-over or for splitting read load. Some data, such as database schema, is replicated on all nodes. When specifying a set of nodes for storing the partitions of a key, it is possible to specify multiple nodes for the same partition. If this is the case, updates go to all nodes and reads go to a randomly picked node from the group.
If one of the nodes in the group fails, operation can resume with the surviving node. The failed node can be brought back online from the transaction logs of the surviving nodes. A few transactions may be rolled back at the time of failure and again at the time of the failed node rejoining the cluster but these are aborts as in the case of deadlock and lose no committed data.
Shared Nothing
The Virtuoso architecture does not require a SAN for disk sharing across nodes. This is reasonable since a few disks on a local controller can easily provide 300MB/s of read and passing this over an interconnect fabric that would also have to carry inter-node messages could saturate even a fast network.
Client View
A SQL or HTTP client can connect to any node of the cluster and get an identical view of all data with full transactional semantics. DDL operations like table creation and package installation are limited to one node, though.
Applications such as ODS will run unmodified. They are installed on all nodes with a single install command. After this, the data partitioning must be declared, which is a one time operation to be done cluster by cluster. The only application change is specifying the partitioning columns for each index. The gain is optional redundant storage and capacity not limited to a single machine. The penalty is that single operations may take a little longer when not all data is managed by the same process but then the parallel throughput is increased. We note that the main ODS performance factor is web page logic and not database access. Thus splitting the web server logic over multiple nodes gives basically linear scaling.
Parallel Query Execution
Message latency is the principal performance factor in a clustered database. Due to this, Virtuoso packs the maximum number of operations in a single message. For example, when doing a loop join that reads one table sequentially and retrieves a row of another table for each row of the outer table, a large number of the join of the inner loop are run in parallel. So, if there is a join of five tables that gets one row from each table and all rows are on different nodes, the time will be spent on message latency. If each step of the join gets 10 rows, for a total of 100000 results, the message latency is not a significant factor and the cluster will clearly outperform a single node.
Also, if the workload consists of large numbers of concurrent short updates or queries, the message latencies will even out and throughput will scale up even if doing a single transaction were faster on a single node. Parallel SQL There are SQL extensions for stored procedures allowing parallelizing operations. For example, if a procedure has a loop doing inserts, the inserted rows can be buffered until a sufficient number is available, at which point they are sent in batches to the nodes concerned. Transactional semantics are kept but error detection is deferred to the actual execution.
Transactions
Each transaction is owned by one node of the cluster, the node to which the client is connected. When more than one node besides the owner of the transaction is updated, two phase commit is used. This is transparent to the application code. No external transaction monitor is required, the Virtuoso instances perform these functions internally. There is a distributed deadlock detection scheme based on the nodes periodically sharing transaction waiting information.
Since read transactions can operate without locks, reading the last committed state of uncommitted updated rows, waiting for locks is not very common.
Interconnect and Threading
Virtuoso uses TCP to connect between instances. A single instance can have multiple listeners at different network interfaces for cluster activity. The interfaces will be used in a round-robin fashion by the peers, spreading the load over all network interfaces. A separate thread is created for monitoring each interface. Long messages, such as transfers of blobs are done on a separate thread, thus allowing normal service on the cluster node while the transfer is proceeding.
We will have to test the performance of TCP over Infiniband to see if there is clear gain in going to a lower level interface like MPI. The Virtuoso architecture is based on streams connecting cluster nodes point to point. The design does not per se gain from remote DMA or other features provided by MPI. Typically, messages are quite short, under 100K. Flow control for transfer of blobs is however nice to have but can be written at the application level if needed. We will get real data on the performance of different interconnects in the next weeks.
Deployment and Management
Configuring is quite simple, with each process sharing a copy of the same configuration file. One line in the file differs from host to host, telling it which one it is. Otherwise the database configuration files are individual per host, accommodating different file system layouts etc. Setting up a node requires copying the executable and two configuration files, no more. All functionality is contained in a single process. There are no installers to be run or such.
Changing the number or network interface of cluster nodes requires a cluster restart. Changing data partitioning requires copying the data into a new table and renaming this over the old one. This is time consuming and does not mix well with updates. Splitting an existing cluster node requires no copying with repartitioning but shifting data between partitions does.
A consolidated status report shows the general state and level of intra-cluster traffic as count of messages and count of bytes.
Start, shutdown, backup, and package installation commands can only be issued from a single master node. Otherwise all is symmetrical.
Present State and Next Developments
The basics are now in place. Some code remains to be written for such things as distributed deadlock detection, 2-phase commit recovery cycle, management functions, etc. Some SQL operations like text index, statistics sampling, and index intersection need special support, yet to be written.
The RDF capabilities are not specifically affected by clustering except in a couple of places. Loading will be slightly revised to use larger batches of rows to minimize latency, for example.
There is a pretty much infinite world of SQL optimizations for splitting aggregates, taking advantage of co-located joins etc. These will be added gradually. These are however not really central to the first application of RDF storage but are quite important for business intelligence, for example.
We will run some benchmarks for comparing single host and clustered Virtuoso instances over the next weeks. Some of this will be with real data, giving an estimate on when we can move some of the RDF data we presently host to the new platform. We will benchmark against Oracle and DB2 later but first we get things to work and compare against ourselves.
We roughly expect a halving of space consumption, a significant increase in single-query performance, and parallel throughput that scales linearly with the addition of cluster nodes.
The next update will be on this blog within two weeks.
|
08/27/2007 09:44 GMT
|
Modified:
04/25/2008 11:59 GMT
|
Virtuoso Cluster
[
Orri Erling
]
We often get questions on clustering support, especially around RDF, where databases quickly get rather large. So we will answer them here.
But first, some supporting technology. We have an entirely new disk allocation and I/O system. It is basically operational but needs some further tuning. It offers much better locality and much better sequential access speeds.
Especially for dealing with large RDF databases, we will introduce data compression. We have over the years looked at different key compression possibilities but have never been very excited by them, since they complicate random access to index pages, make for longer execution paths, require scraping data for one logical thing from many places, and so on. Instead, we will now compress pages before writing them to disk, so the cache stays in machine byte order and alignment while the disk image is compressed. Since multiple processors are commonplace on servers, they can well be used for compression, that being such a nicely local operation, all in cache and requiring no serialization with other work.
Of course, what was fixed length now becomes variable length, but if the compression ratio is fairly constant, we can reserve space for the expected compressed size and deal with the rare overflows separately. So there is no complicated shifting of data around when something grows.
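The reserved-slot scheme above can be sketched in a few lines of Python. This is an illustration only: zlib stands in for whatever page codec the engine actually uses, and `PAGE_SIZE`, `EXPECTED_RATIO`, and `write_page` are invented names.

```python
import zlib

PAGE_SIZE = 8192         # uncompressed page size in the buffer cache
EXPECTED_RATIO = 0.5     # assumed typical compression ratio
SLOT_SIZE = int(PAGE_SIZE * EXPECTED_RATIO)  # fixed on-disk slot per page

def write_page(page: bytes):
    """Compress a cache page on its way to disk.

    If the compressed image fits the reserved slot, it is stored there;
    the rare overflow goes to a separate area, so nothing has to be
    shifted around when a page compresses worse than expected.
    """
    compressed = zlib.compress(page)
    if len(compressed) <= SLOT_SIZE:
        return ("slot", compressed)
    return ("overflow", compressed)

# A highly repetitive page compresses far below the reserved slot size.
page = b"spo" * 2000 + b"\x00" * (PAGE_SIZE - 6000)
kind, data = write_page(page)
```

Because the cache keeps the uncompressed image, a warm page never pays a decompression cost on lookup.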
Once we are done with this, this could well be a separate intermediate release.
Now about clusters. We have for a long time had various plans for clusters but have not seen the immediate need for execution. With the rapid growth in the Linking Open Data movement and questions on web scale knowledge systems, it is time to get going.
How will it work? Virtuoso remains a generic DBMS, thus the clustering support is an across the board feature, not something for RDF only. So we can join Oracle, IBM DB2, and others at the multi-terabyte TPC races.
We introduce hash partitioning at the index level and allow for redundancy, where multiple nodes can serve the same partition, allowing for load-balanced reads, replacement of failing nodes, and growth of the cluster without interruption of service.
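In outline, hash partitioning with redundant replicas might look like the following sketch. The hash function, node names, and replica map are purely illustrative, not Virtuoso's actual scheme.

```python
import zlib  # crc32 serves here as a cheap, stable hash

def partition_of(key: bytes, n_partitions: int) -> int:
    # The partition is a pure function of the key, so every node can
    # compute an index entry's home partition locally, with no lookup.
    return zlib.crc32(key) % n_partitions

# Redundancy: each partition is served by a set of nodes, so reads can be
# load-balanced across replicas and a failing node replaced without
# interrupting service.
replicas = {
    0: ["node-a", "node-b"],
    1: ["node-b", "node-c"],
    2: ["node-c", "node-a"],
}

p = partition_of(b"subject-42", 3)
servers = replicas[p]  # any of these replicas can answer a read for this key
```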
The SQL compiler, SPARQL, and database engine all stay the same. There is a little change in the SQL run time, not so different from what we do with remote databases at present in the context of our virtual database federation. There is a little extra complexity for distributed deadlock detection and sometimes multiple threads per transaction. Remembering that one RPC round trip costs as much as 3-4 index lookups, we pipeline things so as to move requests in batches, a few dozen at a time.
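The batching idea can be sketched as follows; `batched` is a hypothetical helper, and 32 simply stands in for "a few dozen".

```python
def batched(keys, batch_size=32):
    """Group index lookups into batches so that one RPC round trip
    carries many requests instead of one."""
    batch = []
    for k in keys:
        batch.append(k)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, possibly partial, batch

keys = list(range(100))
messages = list(batched(keys, 32))
# 100 lookups now travel in 4 messages rather than 100 round trips,
# amortizing the per-message latency across the whole batch.
```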
The cluster support will be in the same executable and will be enabled by configuration file settings. Administration is limited to one node, but Web and SQL clients can connect to any node and see the same data. There is no balancing between storage and control nodes because clients can simply be allocated round robin for statistically even usage. In relational applications, as exemplified by TPC-C, if one partitions by fields with an application meaning (such as warehouse ID), and if clients have an affinity to a particular chunk of data, they will of course preferentially connect to nodes hosting this data. With RDF, such affinity is unlikely, so nodes are basically interchangeable.
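Round-robin client allocation needs nothing more elaborate than cycling through the node list; the node names below are invented for illustration.

```python
import itertools

nodes = ["node-a", "node-b", "node-c"]
assign = itertools.cycle(nodes)

# With no data affinity (as with RDF), nodes are interchangeable, so
# handing out connections in rotation gives statistically even usage.
assignments = [next(assign) for _ in range(6)]
```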
In practice, we develop in June and July. Then we can rent a supercomputer maybe from Amazon EC2 and experiment away.
We should just come up with a name for this. Maybe something astronomical, like star cluster. Big, bright but in this case not far away.
|
05/23/2007 14:11 GMT
|
Modified:
04/24/2008 09:52 GMT
|
"Free" Databases: Express vs. Open-Source RDBMSs
[
Kingsley Uyi Idehen
]
Very detailed and insightful peek into the state of affairs re. database engines (Open & Closed Source). I added the missing piece regarding the "Virtuoso Conductor" (the Web based Admin UI for Virtuoso) to the original post below. I also added a link to our live SPARQL Demo so that anyone interested can start playing around with SPARQL and SPARQL integrated into SQL right away. Another good thing about this post is the vast amount of valuable links that it contains. To really appreciate this point simply visit my Linkblog (excuse the current layout :-) - a Tab if you come in via the front door of this Data Space (what I used to call My Weblog Home Page). "Free" Databases: Express vs. Open-Source RDBMSs: "Open-source relational database management systems (RDBMSs) are gaining IT mindshare at a rapid pace. As an example, BusinessWeek's February 6, 2006 ' Taking On the Database Giants ' article asks 'Can open-source upstarts compete with Oracle, IBM, and Microsoft?' and then provides the answer: 'It's an uphill battle, but customers are starting to look at the alternatives.' There's no shortage of open-source alternatives to look at. The BusinessWeek article concentrates on MySQL, which BW says 'is trying to be the Ikea of the database world: cheap, needs some assembly, but has a sleek, modern design and does the job.' The article also discusses Postgre[SQL] and Ingres, as well as EnterpriseDB, an Oracle clone created from PostgreSQL code*. Sun includes PostgreSQL with Solaris 10 and, as of April 6, 2006, with Solaris Express.** *Frank Batten, Jr., the investor who originally funded Red Hat, invested a reported $16 million into Great Bridge with the hope of making a business out of providing paid support to PostgreSQL users. Great Bridge stayed in business only 18 months , having missed an opportunity to sell the business to Red Hat and finding that selling $50,000-per-year support packages for an open-source database wasn't easy. 
As Batten concluded, 'We could not get customers to pay us big dollars for support contracts.' Perhaps EnterpriseDB will be more successful with a choice of $5,000, $3,000, or $1,000 annual support subscriptions . **Interestingly, Oracle announced in November 2005 that Solaris 10 is 'its preferred development and deployment platform for most x64 architectures, including x64 (x86, 64-bit) AMD Opteron and Intel Xeon processor-based systems and Sun's UltraSPARC(R)-based systems.' There is a surfeit of reviews of current MySQL, PostgreSQL and—to a lesser extent—Ingres implementations. These three open-source RDBMSs come with their own or third-party management tools. These systems compete against free versions of commercial (proprietary) databases: SQL Server 2005 Express Edition (and its MSDE 2000 and 1.0 predecessors), Oracle Database 10g Express Edition, IBM DB2 Express-C, and Sybase ASE Express Edition for Linux where database size and processor count limitations aren't important. Click here for a summary of recent InfoWorld reviews of the full versions of these four databases plus MySQL, which should be valid for Express editions also. The FTPOnline Special Report article, 'Microsoft SQL Server Turns 17,' that contains the preceding table is here (requires registration.) SQL Server 2005 Express Edition SP-1 Advanced Features SQL Server 2005 Express Edition with Advanced Features enhances SQL Server 2005 Express Edition (SQL Express or SSX) dramatically, so it deserves special treatment here. SQL Express gains full text indexing and now supports SQL Server Reporting Services (SSRS) on the local SSX instance. The SP-1 with Advanced Features setup package, which Microsoft released on April 18, 2006, installs the release version of SQL Server Management Studio Express (SSMSE) and the full version of Business Intelligence Development Studio (BIDS) for designing and editing SSRS reports. 
My 'Install SP-1 for SQL Server 2005 and Express' article for FTPOnline's SQL Server Special Report provides detailed, illustrated installation instructions for and related information about the release version of SP-1. SP-1 makes SSX the most capable of all currently available Express editions of commercial RDBMSs for Windows. OpenLink Software's Virtuoso Open-Source Edition OpenLink Software announced an open-source version of its Virtuoso Universal Server commercial DBMS on April 11, 2006. On the initial date of this post, May 2, 2006, Virtuoso Open-Source Edition (VOS) was virtually under the radar as an open-source product. According to this press release, the new edition includes: - SPARQL compliant RDF Triple Store
- SQL-200n Object-Relational Database Engine (SQL, XML, and Free Text)
- Integrated BPEL Server and Enterprise Service Bus
- WebDAV and Native File Server
- Web Application Server that supports PHP, Perl, Python, ASP.NET, JSP, etc.
- Runtime Hosting for Microsoft .NET, Mono, and Java
VOS only lacks the virtual server and replication features that are offered by the commercial edition. VOS includes a Web-based administration tool called the "Virtuoso Conductor". According to Kingsley Idehen's Weblog, 'The Virtuoso build scripts have been successfully tested on Mac OS X (Universal Binary Target), Linux, FreeBSD, and Solaris (AIX, HP-UX, and Tru64 UNIX will follow soon). A Windows Visual Studio project file is also in the works (ETA some time this week).' InfoWorld's Jon Udell has tracked Virtuoso's progress since 2002, with an additional article in 2003 and a one-hour podcast with Kingsley Idehen on April 26, 2006. A major talking point for Virtuoso is its support for Atom 0.3 syndication and publication, Atom 1.0 syndication and (forthcoming) publication, and future support for Google's GData protocol, as mentioned in this Idehen post. Yahoo!'s Jeremy Zawodny points out that the 'fingerprints' of Adam Bosworth, Google's VP of Engineering and the primary force behind the development of Microsoft Access, 'are all over GData.' Click here to display a list of all OakLeaf posts that mention Adam Bosworth. One application for the GData protocol is querying and updating the Google Base database independently of the Google Web client, as mentioned by Jeremy: 'It's not about building an easier onramp to Google Base. ... Well, it is. But, again, that's the small stuff.' Click here for a list of posts about my experiences with Google Base. Watch for a future OakLeaf post on the subject as the GData APIs gain ground. Open-Source and Free Embedded Database Contenders Open-source and free embedded SQL databases are gaining importance as the number and types of mobile devices and OSs proliferate. Embedded databases usually consist of Java classes or Windows DLLs that are designed to minimize file size and memory consumption.
Embedded databases avoid the installation hassles, heavy resource usage and maintenance cost associated with client/server RDBMSs that run as an operating system service. Andrew Hudson's December 2005 'Open Source databases rounded up and rodeoed' review for The Inquirer provides brief descriptions of one commercial and eight open source database purveyors/products: Sleepycat, MySQL, PostgreSQL, Ingres, InnoBase, Firebird, IBM Cloudscape (a.k.a., Derby), Genezzo, and Oracle. Oracle Sleepycat* isn't an SQL Database, Oracle InnoDB* is an OEM database engine that's used by MySQL, and Genezzo is a multi-user, multi-server distributed database engine written in Perl. These special-purpose databases are beyond the scope of this post. * Oracle purchased Sleepycat Software, Inc. in February 2006 and purchased Innobase OY in October 2005. The press release states: 'Oracle intends to continue developing the InnoDB technology and expand our commitment to open source software.' Derby is an open-source release by the Apache Software Foundation of the Cloudscape Java-based database that IBM acquired when it bought Informix in 2001. IBM offers a commercial release of Derby as IBM Cloudscape 10.1. Derby is a Java class library that has a relatively light footprint (2 MB), which makes it suitable for client/server synchronization with the IBM DB2 Everyplace Sync Server in mobile applications. The IBM DB2 Everyplace Express Edition isn't open source or free*, so it doesn't qualify for this post. The same is true for the corresponding Sybase SQL Anywhere components.** * IBM DB2 Everyplace Express Edition with synchronization costs $379 per server (up to two processors) and $79 per user. DB2 Everyplace Database Edition (without DB2 synchronization) is $49 per user. (Prices are based on those when IBM announced version 8 in November 2003.) ** Sybase's iAnywhere subsidiary calls SQL Anywhere 'the industry's leading mobile database.'
A Sybase SQL Anywhere Personal DB seat license with synchronization to SQL Anywhere Server is $119; the cost without synchronization wasn't available from the Sybase Web site. Sybase SQL Anywhere and IBM DB2 Everyplace perform similar replication functions. Sun's Java DB, another commercial version of Derby, comes with the Solaris Enterprise Edition, which bundles Solaris 10, the Java Enterprise System, developer tools, desktop infrastructure and N1 management software. A recent Between the Lines blog entry by ZDNet's David Berlind waxes enthusiastic over the use of Java DB embedded in a browser to provide offline persistence. RedMonk analyst James Governor and eWeek's Lisa Vaas wrote about the use of Java DB as a local data store when Tim Bray announced Sun's Derby derivative and Francois Orsini demonstrated Java DB embedded in the Firefox browser at the ApacheCon 2005 conference. Firebird is derived from Borland's InterBase 6.0 code, the first commercial relational database management system (RDBMS) to be released as open source. Firebird has excellent support for SQL-92 and comes in three versions: Classic, SuperServer and Embedded for Windows, Linux, Solaris, HP-UX, FreeBSD and MacOS X. The embedded version has a 1.4-MB footprint. Release Candidate 1 for Firebird 2.0 became available on March 30, 2006 and is a major improvement over earlier versions. Borland continues to promote InterBase, now at version 7.5, as a small-footprint, embedded database with commercial Server and Client licenses. SQLite is a featherweight C library for an embedded database that implements most SQL-92 entry- and transitional-level requirements (some through the JDBC driver) and supports transactions within a tiny 250-KB code footprint. Wrappers support a multitude of languages and operating systems, including Windows CE, SmartPhone, Windows Mobile, and Win32. 
SQLite's primary SQL-92 limitations are lack of nested transactions, inability to alter a table design once committed (other than with RENAME TABLE and ADD COLUMN operations), and lack of foreign-key constraint enforcement. SQLite provides read-only views, triggers, and 256-bit encryption of database files. A downside is that the entire database file is locked while a transaction is in progress. SQLite uses file access permissions in lieu of GRANT and REVOKE commands. Using SQLite involves no license; its code is entirely in the public domain. The Mozilla Foundation's Unified Storage wiki says this about SQLite: 'SQLite will be the back end for the unified store [for Firefox]. Because it implements a SQL engine, we get querying 'for free', without having to invent our own query language or query execution system. Its code-size footprint is moderate (250k), but it will hopefully simplify much existing code so that the net code-size change should be smaller. It has exceptional performance, and supports concurrent access to the database. Finally, it is released into the public domain, meaning that we will have no licensing issues.' Vieka Technology, Inc.'s eSQL 2.11 is a port of SQLite to Windows Mobile (Pocket PC and Smartphone) and Win32, and includes development tools for Windows devices and PCs, as well as a .NET native data provider. A conventional ODBC driver also is available. eSQL for Windows (Win32) is free for personal and commercial use; eSQL for Windows Mobile requires a license for commercial (for-profit or business) use. HSQLDB isn't on most reviewers' radar, which is surprising because it's the default database for OpenOffice.org (OOo) 2.0's Base suite member. HSQLDB 1.8.0.1 is an open-source (BSD license) Java embedded database engine based on Thomas Mueller's original Hypersonic SQL Project. Using OOo's Base feature requires installing the Java 2.0 Runtime Engine (which is not open-source) or the presence of an alternative open-source engine, such as Kaffe.
My prior posts about OOo Base and HSQLDB are here, here and here. The HSQLDB 1.8.0 documentation on SourceForge states the following regarding SQL-92 and later conformance: HSQLDB 1.8.0 supports the dialect of SQL defined by SQL standards 92, 99 and 2003. This means where a feature of the standard is supported, e.g. left outer join, the syntax is that specified by the standard text. Many features of SQL92 and 99 up to Advanced Level are supported and there is support for most of SQL 2003 Foundation and several optional features of this standard. However, certain features of the Standards are not supported so no claim is made for full support of any level of the standards. Other less well-known embedded databases designed for or suited to mobile deployment are Mimer SQL Mobile and VistaDB 2.1. Neither product is open-source, and both require paid licensing; VistaDB requires a small up-front payment by developers but offers royalty-free distribution. Java DB, Firebird embedded, SQLite and eSQL 2.11 are contenders for lightweight PC and mobile device database projects that aren't Windows-only. SQL Server 2005 Everywhere If you're a Windows developer, SQL Server Mobile is the logical embedded database choice for mobile applications for Pocket PCs and Smartphones. Microsoft's April 19, 2006 press release delivered the news that SQL Server 2005 Mobile Edition (SQL Mobile or SSM) would gain a big brother—SQL Server 2005 Everywhere Edition. Currently, the SSM client is licensed (at no charge) to run in production on devices with Windows CE 5.0, Windows Mobile 2003 for Pocket PC or Windows Mobile 5.0, or on PCs with Windows XP Tablet Edition only. SSM also is licensed for development purposes on PCs running Visual Studio 2005. Smart Device replication with SQL Server 2000 SP3 and later databases has been the most common application so far for SSM. By the end of 2006, Microsoft will license SSE for use on all PCs running any Win32 version or the preceding device OSs.
A version of SQL Server Management Studio Express (SSMSE)—updated to support SSE—is expected to release by the end of the year. These features will qualify SSE as the universal embedded database for Windows client and smart-device applications. For more details on SSE, read John Galloway's April 11, 2006 blog post and my 'SQL Server 2005 Mobile Goes Everywhere' article for the FTPOnline Special Report on SQL Server." (Via OakLeaf Systems.)
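The SQLite schema-change limitation the review describes is easy to see with Python's built-in sqlite3 module (table and column names here are invented): ADD COLUMN works in place, while dropping or retyping a column historically meant rebuilding the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")
# Once a table exists, the review-era schema changes were limited to
# RENAME TABLE and ADD COLUMN; adding a column is an in-place operation.
conn.execute("ALTER TABLE contacts ADD COLUMN email TEXT")
conn.execute("INSERT INTO contacts (name, email) VALUES ('Ada', 'ada@example.org')")
row = conn.execute("SELECT name, email FROM contacts").fetchone()
```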
|
05/05/2006 16:02 GMT
|
Modified:
07/21/2006 07:21 GMT
|
Prerelational DBMS vendors — a quick overview
[
Kingsley Uyi Idehen
]
Prerelational DBMS vendors — a quick overview: "
IBM. With BOMP and D-BOMP, IBM was probably the first company to commercialize precursors to DBMS. (BOMP stood for Bill Of Materials Planning, foreshadowing the hierarchical architecture of IMS.) Out of those grew DL/1 and IMS, IBM’s flagship hierarchical DBMS, and the world’s first dominant DBMS product(s). Of course, IBM also innovated relational DBMS, via the research of E. F. ‘Ted’ Codd, then some prototype products, and eventually the mainframe version of DB2. To this day DB2 on the mainframe remains one of the world’s major DBMS, as does the separate but related product of DB2 for ‘open systems.’
Cincom. In the 1970s, Cincom was probably the most successful independent software product company. Its flagship product was Total, a shallow-network DBMS that was a little more general than the strictly hierarchical IMS. What’s more, Total ran on almost any brand of computer hardware. Cincom remains independent and privately held to this day.
Cullinane/Cullinet. Charlie Bachman innovated a true network DBMS at Honeywell, but it didn’t turn into a serious product at that time. B. F. Goodrich, however, ran a version. This is what John Cullinane’s company bought and turned into IDMS, which at least on the mainframe supplanted Total as the technical, mind share, and probably revenue market leader. Cullinet (as it was then called) ran into technical difficulties, however, losing ground to the more flexible index-based DBMS. It was eventually sold to Computer Associates.
A lot of software industry leaders cut their teeth at Cullinet, notably Andrew ‘Flip’ Filipowski, later the colorful founder of Platinum. Other alumni include Renato ‘Ron’ Zambonini, Dave Litwack, Dave Ireland, and the original PowerBuilder development team. John Landry and Bob Weiler ran the firm for a while toward the end, but they don’t really count; rather, they’re the most prominent alumni of applications pioneer McCormack & Dodge.
Note: Index-based is a term I used in and probably coined for my first report in 1982, comprising both inverted-list and relational DBMS, as opposed to the link(ed)-list hierarchical and network products such as IMS, Total, and IDMS. The companies that beat Cullinet were long-time rival Software AG, and then especially Applied Data Research; then all three of those independents were blown out by IBM’s DB2. And then the whole mainframe DBMS business was in turn obsoleted by the rise of UNIX … but I’m getting ahead of my story.
Software AG. Like Cincom, Germany-based Software AG is a 1970s DBMS pioneer that has always remained independent and privately held. Sort of. Twice, Software AG of North America was spun off as a separate, eventually public company. Software AG’s flagship DBMS was the inverted list product ADABAS. SAP’s MaxDB was also owned by Software AG for a while (and seemingly by every other significant German computer company as well – or more precisely, by Nixdorf where it was developed, and by Siemens after it bought Nixdorf).
I actually visited Software AG in Darmstadt once. Founder Peter Schnell and key techie Peter Page were both gracious hosts. Schnell was proud of their new building, and especially of the hexagon-based wooden dual desks he’d personally designed. General analytic rule – when the CEO is focused on the décor, this is not a good sign for the company’s near-term prospects. (I call this having an ‘edifice complex.’)
Applied Data Research (ADR). ADR is often credited as being the first independent software company, having introduced products in the late 1960s and prevailed in antitrust struggles against IBM to allow the business to survive. Basically, it sold programmer productivity tools. This led it to acquire Datacom/DB, an inverted-list DBMS developed in the Dallas area. In the early 1980s, Datacom/DB began to boom, and was on a track to surpass both IDMS and ADABAS in market share until DB2 showed up and blew them all away. ADR was particularly aided by its fourth-generation language (4GL) IDEAL, which was an excellent product notwithstanding the famous State of New Jersey fiasco. (As John Landry said to me about that one, ‘4GLs are powerful tools. In particular, they allow you to write bad programs really quickly.’)
ADR was an underappreciated powerhouse, boasting all of the Fortune 100 as customers way back in the early 1980s (yes, even archrival IBM). When the DBMS business stalled, however, ADR was quickly sold — first to Ameritech (the Illinois-based Baby Bell company), and soon thereafter to Computer Associates.
Computer Corporation of America (CCA). CCA’s DBMS Model 204 may have been the best of the prerelational products, boasting an inverted-list architecture akin to that of ADABAS and Datacom/DB. The company was also interesting in that it was first and foremost a government contract research shop, and hence did all sorts of interesting prototype work that sadly never got commercialized. In about 1983 it became clear that the company wasn’t going anywhere, and it put itself up for sale.
I was personally instrumental in that decision. Our investment banker pretended he was considering taking CCA public. CCA President Jim Rothnie showed us revenue projections. I asked how he had gotten them. He replied that he had taken the market size projection 5 years out, assumed 10%, and drawn a ‘plausible curve.’ However, I quickly got Socratic with him. ‘How many salesmen do you have?’ ‘How much revenue does the average experienced salesman produce?’ ‘How many experienced salesmen do you expect to have next year?’ ‘How high do you think their average productivity can grow?’ ‘Let us multiply.’ (Yes, I really said that. I can be a jerk. And anyway Jim was the sort of analytic guy one can say that to without giving serious offense.)
CCA was sold to a Canadian insurance company whose name I’ve now forgotten. Eventually, it was spun back out (perhaps after some intermediate changes of ownership), and resurfaced as primarily a data integration company, called Praxis.
In the real old days (mid 1970s, perhaps), Model 204 was resold by Informatics (later Informatics General, later the hostile takeover that became the guts of Sterling Software, which like so many other companies was eventually absorbed into Computer Associates). I know this because Richard Currier used to sell the product when he worked at Informatics. That probably makes Richard and me about the only two people who still remember the fact.
Hmm. I forgot to mention Intel’s System 2000. Well, truth be told it was a dying product even back when I first became an analyst in 1981, and I recall nothing about it, except Gene Lowenthal’s observation that Intel had had trouble selling chips and DBMS through the same salesforce. I think Al Sisto, who I probably met when he was head of sales at RTI (Relational Technology, Inc. — later called Ingres), came out of that business, but I’m not 100% sure. I remember Pete Tierney from that RTI management team more clearly anyway, although that’s mainly because we stayed in touch at subsequent companies over the years. "
(Via Software Memories.)
|
04/13/2006 19:04 GMT
|
Modified:
06/22/2006 08:56 GMT
|
|
|