Hugh Williams and I (Orri Erling) went to the Open PHACTS Steering Committee meeting in Vienna last week. I am a great fan of Open PHACTS; the meetings are fun, with a great team spirit, and there is always something new to learn.

Paul Groth gave a talk about the stellar success of the initial term of Open PHACTS.

  • Three releases of platform and data
  • 18 applications
  • Open PHACTS Foundation for sustainable exploitation and further development of the platform
  • Superb culture of collaboration
    • great team spirit
    • great output from distributed organization
    • lots of face-to-face time
    • example to every other big collaborative project

"The reincarnation of Steve Jobs," commented someone from the audience. "Except I am a nice guy," retorted Paul.

Commented one attendee, "The semantic web… I was just in Boston at a semantic web meeting – so nerdy, something to make you walk out of the room… so it is a definite victory for Open PHACTS, and why not also for the semantic web, that something based on these principles actually works."

It is a win anyhow, so I did not say anything at the meeting. I will say something here instead, where I have more space, as the message bears repeating.

We share part of the perception, so we hardly ever say "semantic web." The word is "linked data," and it means flexible schema and global identifiers. Flexible schema means that not everything has to be modeled up front. Global identifiers mean that data, when transferred out of its silo of origin, remains interpretable and self-describing, so you can mix it with other data without things getting confused. "Desiloization" is a wonderful new word for describing this.
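To make the two properties concrete, here is a minimal sketch in plain Python. It is not how Open PHACTS or Virtuoso represents data; the `example.org` namespaces, the sample rows, and the helper names `globalize` and `describe` are all made up for illustration. Two silos both use the local id "42" for unrelated things; once the ids are turned into global identifiers, the data can be mixed without confusion, and a new property can be added later without changing any schema.

```python
# Hypothetical namespaces; these URIs are illustrative only.
SILO_A = "http://example.org/silo-a/"
SILO_B = "http://example.org/silo-b/"


def globalize(namespace, rows):
    """Turn silo-local (id, property, value) rows into triples with global identifiers."""
    return {(namespace + local_id, namespace + prop, value)
            for local_id, prop, value in rows}


def describe(triples, subject):
    """Collect every property known about one subject."""
    return {p: o for s, p, o in triples if s == subject}


# Both silos use local id "42" for unrelated things (made-up sample data).
silo_a_rows = [("42", "name", "aspirin"), ("42", "molecular_weight", 180.16)]
silo_b_rows = [("42", "name", "kinase assay"), ("42", "lab", "Vienna")]

# Global identifiers: a plain set union mixes the data without collisions.
merged = globalize(SILO_A, silo_a_rows) | globalize(SILO_B, silo_b_rows)

# Flexible schema: a property nobody modeled up front is just one more triple.
merged.add((SILO_A + "42", SILO_A + "approved", True))

compound = describe(merged, SILO_A + "42")
assay = describe(merged, SILO_B + "42")
print(compound[SILO_A + "name"], "/", assay[SILO_B + "name"])
```

Had the two silos been merged on their local ids, the two "42" records would have overwritten each other; with the namespace prefix they stay distinct.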

This ties right into FAIRport and FAIR data: Findable, Accessible, Interoperable, Reusable. Barend Mons talked a lot about this: open just means downloadable; FAIR means something you can do science with. Barend’s take is that RDF with a URI for everything is the super wire format for exchanging data. When you process it, you will cook it in diverse ways, so an RDF store is one destination but not the only possibility. It has been said before: there is a range of choices between storing triples verbatim and making application-specific extractions, including ones with a schema, whether graph DB or relational.
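One end of that range can be sketched as follows, under heavy assumptions: the triples, the `example.org` vocabulary, the `compound` table, and the helper `extract_rows` are hypothetical, and SQLite stands in for whatever relational destination one might actually use. The triples arrive as the wire format and are "cooked" into a table with a fixed schema for one particular application.

```python
import sqlite3

# Hypothetical vocabulary and sample triples, for illustration only.
NAME = "http://example.org/vocab/name"
WEIGHT = "http://example.org/vocab/weight"

triples = [
    ("http://example.org/compound/1", NAME, "aspirin"),
    ("http://example.org/compound/1", WEIGHT, 180.16),
    ("http://example.org/compound/2", NAME, "caffeine"),
    ("http://example.org/compound/2", WEIGHT, 194.19),
]


def extract_rows(triples):
    """Pivot triples into one (uri, name, weight) row per subject."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    # A property missing for a subject becomes NULL in the table.
    return [(s, props.get(NAME), props.get(WEIGHT))
            for s, props in sorted(by_subject.items())]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compound (uri TEXT PRIMARY KEY, name TEXT, weight REAL)")
conn.executemany("INSERT INTO compound VALUES (?, ?, ?)", extract_rows(triples))

# The application now queries an ordinary table; the URI is kept as the key.
heavy = [row[0] for row in conn.execute(
    "SELECT name FROM compound WHERE weight > 190 ORDER BY name")]
print(heavy)
```

Keeping the URI as the primary key means the extracted rows can still be joined back to the original linked data.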

Nanopublications are also moving ahead. Christine Chichester told me about pending publications involving Open PHACTS nanopublications about post-translational modification of proteins and their expression in different tissues. So there are nanopublications out there and they can be joined, just as intended. A victory for e-science and data integration.

The Open PHACTS project is now officially extended for another two-year term, bringing the total duration to five years. The Open PHACTS Foundation exists as a legal entity and has its first members. It is meant to be a non-profit industry association for sharing pre-competitive data, and services around that data, between players in the pharma space, in industry as well as academia. There are press releases to follow in due time.

I am looking forward to more Open PHACTS. From the OpenLink and Virtuoso side, there are directly relevant developments that will enter production in the next few months, including the query caching discussed earlier on this blog, as well as running on the TPC-H tuned analytics branch for overall better query optimization. Adaptive schema is something of evident value to Open PHACTS, as much of the integrated data comes from relational sources and so is quite regular. Taking advantage of this regularity for storage cannot hurt. We should still see this within the scope of the project extension.

Otherwise, more cooperation in formulating the queries for the business questions will also help.

All in all, Open PHACTS is the celebrated beauty queen of the whole Innovative Medicines Initiative, it would seem: superbly connected, with an unparalleled logo cloud and data integration that actually works and is useful, delivering on time on business questions that are in fact very complex.