Mapping Relational Databases to RDF with OpenLink Virtuoso


Who Wants to Map



Semantic Web Scalers
Expose whatever there is as RDF; the next person will unify terms and build search and apps
Data Warehouse Keepers
Data is spread out, with implicit semantics, complex schemas, heterogeneous sources, and ambiguous terms, but we must make it join and aggregate cleanly


Present State



SPARQL-to-SQL mapping exists, but complex integrations are still built as data warehouses
We'd really like to map, but...
Can it be otherwise?


Why RDF Data Warehouse


Pros
Even query performance across all data
Possibility of forward-chaining inference
Some SPARQL features may be better supported, e.g. unspecified predicates
Cons
Keeping data up-to-date
Complex setup that needs dedicated servers: you don't build them on a whim


Why Map



No copying, no timeliness issues
RDBMS outperforms RDF for analytics workloads
Agile reconfiguration without reloading data


Virtuoso


Mapping of SPARQL to SQL against any existing schema - whether stored in Virtuoso or elsewhere
Physical quad store
Federated/local RDBMS


For Mapping to Deliver


Tackle any SQL analytics workload in SPARQL without extra cost
Deal with arbitrary SQL schema
Produce single SQL statements, optimizable by the target RDBMS
Have intelligence for cases where one RDF entity can come from many relational sources
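
The "single SQL statement" requirement can be illustrated with a toy translator. This is only a sketch under stated assumptions, not Virtuoso's mapping engine: the `MAPPING` table, the `ex:` predicates, the `customer` table and the `bgp_to_sql` helper are all hypothetical, and only triple patterns that share one subject variable are handled.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, key column, value column).
MAPPING = {
    "ex:name": ("customer", "c_id", "c_name"),
    "ex:nation": ("customer", "c_id", "c_nation"),
}

def bgp_to_sql(patterns):
    """Translate triple patterns that share one subject into ONE SELECT.

    patterns: list of (predicate, obj); obj is '?var' or a literal value.
    Returns (sql, params) with literals passed as bound parameters.
    """
    selects, froms, wheres, params = [], [], [], []
    first_key = None
    for i, (pred, obj) in enumerate(patterns):
        table, key, col = MAPPING[pred]
        alias = f"t{i}"
        froms.append(f"{table} AS {alias}")
        if first_key is None:
            # The first pattern supplies the subject key.
            first_key = f"{alias}.{key}"
            selects.append(f"{first_key} AS s")
        else:
            # Later patterns join back on the same subject key.
            wheres.append(f"{alias}.{key} = {first_key}")
        if obj.startswith("?"):
            selects.append(f"{alias}.{col} AS {obj[1:]}")
        else:
            wheres.append(f"{alias}.{col} = ?")
            params.append(obj)
    sql = "SELECT " + ", ".join(selects) + " FROM " + ", ".join(froms)
    if wheres:
        sql += " WHERE " + " AND ".join(wheres)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (c_id INTEGER PRIMARY KEY, c_name TEXT, c_nation TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Alice", "FR"), (2, "Bob", "DE")],
)

# Roughly: SELECT ?s ?name WHERE { ?s ex:name ?name ; ex:nation "FR" }
sql, params = bgp_to_sql([("ex:name", "?name"), ("ex:nation", "FR")])
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Because the whole pattern ends up in one statement, the self-join on `c_id` is left for the target database's optimizer to plan or eliminate.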


The Cases of Integration


Bring similar but heterogeneous schemas into a unified ontology - Union View
Translate FKs of one schema to PKs in another - Distributed Join
Hide differences in normalization - Views for hiding joins
- Unit/Terminology conversions
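
A minimal sketch of two of these cases, the union view and a unit conversion, using SQLite purely for illustration. The two shop schemas, the column names and the `a/` and `b/` key prefixes are assumptions made for the example, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two hypothetical sources describing the same kind of thing differently.
CREATE TABLE shop_a_product (id INTEGER PRIMARY KEY, title TEXT, weight_g INTEGER);
CREATE TABLE shop_b_item (sku TEXT PRIMARY KEY, label TEXT, weight_kg REAL);
INSERT INTO shop_a_product VALUES (1, 'anvil', 2500);
INSERT INTO shop_b_item VALUES ('B-7', 'hammer', 0.5);

-- Union view: one unified shape, keys prefixed by source,
-- grams converted to kilograms so the units agree.
CREATE VIEW product AS
  SELECT 'a/' || id AS product_key, title AS name, weight_g / 1000.0 AS weight_kg
  FROM shop_a_product
  UNION ALL
  SELECT 'b/' || sku, label, weight_kg
  FROM shop_b_item;
""")

rows = conn.execute(
    "SELECT product_key, name, weight_kg FROM product ORDER BY product_key"
).fetchall()
print(rows)
```

Prefixing the key with its source keeps identifiers from the two schemas from colliding once they are exposed under one class.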


Defining a Mapping


Define URI formats and their subclass relations
Define which key-column-value combinations make a triple
Arbitrary SQL is allowed for mapping values and filtering
A single RDF node can be a composite of many columns, e.g. multipart key
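
The steps above can be sketched in plain Python. This is not Virtuoso's mapping language: the `example.com` URIs, the `PROPERTY_COLUMNS` table and the helper names are hypothetical, and the NULL check stands in for the SQL filtering the slide mentions. The column names follow the TPC-H `lineitem` table, whose key is the pair (`l_orderkey`, `l_linenumber`).

```python
# Hypothetical URI format for a node built from a two-part key.
def lineitem_uri(orderkey, linenumber):
    return f"http://example.com/tpch/lineitem/{orderkey}/{linenumber}"

SCHEMA = "http://example.com/schema#"  # assumed vocabulary namespace

# Which columns become which properties (column -> predicate URI).
PROPERTY_COLUMNS = {
    "l_quantity": SCHEMA + "quantity",
    "l_comment": SCHEMA + "comment",
}

def row_to_triples(row):
    """Turn one relational row into (subject, predicate, object) triples."""
    subject = lineitem_uri(row["l_orderkey"], row["l_linenumber"])
    return [
        (subject, predicate, row[column])
        for column, predicate in PROPERTY_COLUMNS.items()
        if row[column] is not None  # filtering: NULL columns yield no triple
    ]

row = {"l_orderkey": 7, "l_linenumber": 2, "l_quantity": 17, "l_comment": None}
triples = row_to_triples(row)
print(triples)
```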


The TPC-H Case




The 22 queries as extended SPARQL
Each generates a single SQL statement, executable by Virtuoso, Oracle, and others
Next: make several TPC-H databases on different servers and run the queries against the union


Where Problems Begin


In OpenLink Data Spaces, 6 collaborative apps are all mapped to SIOC:
Trivially, this becomes a union of everything: 1000+ lines of SQL
Intelligently (once per app), it becomes a union of:


What One Must Know


Mapping for integration is not trivial
Be careful when mapping multiple tables/columns to one class/property
Design URI schemes that encode type and source, so that senseless joins are not attempted when types are not specified in the query
Understand what the mapping logic can and cannot optimize
Understand what SQL can and cannot optimize
View resulting SQL for sanity check
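
The point about URI schemes can be shown with a small sketch. It assumes each union branch produces URIs under its own prefix; the prefixes, table names and helper functions are hypothetical. When the query names a subject URI, every branch whose prefix cannot match is dropped before any SQL is generated.

```python
# Hypothetical union branches, keyed by the URI prefix each one produces.
BRANCHES = {
    "http://example.com/blog/post/": "blog_post",
    "http://example.com/wiki/page/": "wiki_page",
    "http://example.com/feeds/item/": "feed_item",
}

def candidate_branches(subject_uri=None):
    """Return the tables that could produce the given subject URI."""
    if subject_uri is None:
        # Nothing known about the subject: every branch stays in the union.
        return sorted(BRANCHES.values())
    return sorted(
        table for prefix, table in BRANCHES.items()
        if subject_uri.startswith(prefix)
    )

def union_sql(subject_uri=None):
    """Build the union over only those branches that can match."""
    parts = [f"SELECT id, title FROM {t}" for t in candidate_branches(subject_uri)]
    return " UNION ALL ".join(parts)

print(union_sql())                                  # all three branches
print(union_sql("http://example.com/wiki/page/7"))  # a single branch
```

Printing the generated SQL in both cases is also a cheap version of the sanity check recommended above.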


SQL Extensions


Mapping must work against any RDBMS/Schema, as is
But Virtuoso SQL sits between the mapping and the target RDBMS(s)
Location- and latency-conscious distributed cost model
Breakup for turning a wide result set into one row per property
Inverse functions
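
The last two ideas can be sketched in Python. This shows the concepts only and is not Virtuoso's SQL syntax; the `example.com` URI format and all function names are assumptions. A breakup turns one wide row into one row per property, and an inverse function recovers the key from a URI so that a condition on the URI can become a condition on the key column.

```python
import re

def customer_uri(key):
    """Hypothetical URI format: key -> URI."""
    return f"http://example.com/customer/{key}"

_CUSTOMER_URI = re.compile(r"^http://example\.com/customer/(\d+)$")

def customer_uri_inverse(uri):
    """Inverse of customer_uri: URI -> key, or None if the URI is foreign."""
    match = _CUSTOMER_URI.match(uri)
    return int(match.group(1)) if match else None

def breakup(row, key_column):
    """Split one wide row into (key, property, value) rows."""
    return [
        (row[key_column], column, value)
        for column, value in row.items()
        if column != key_column
    ]

wide = {"id": 1, "name": "Alice", "city": "Paris"}
narrow = breakup(wide, "id")
key = customer_uri_inverse(customer_uri(42))
print(narrow)
print(key)
```

With the inverse available, a filter such as "subject equals this URI" can be rewritten as `id = 42`, which the target database can answer from an index instead of computing the URI for every row.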


Use Cases


OpenLink Data Spaces - Blog, Wiki, News, Social Network, Feed Aggregation, Tag Clouds, Bookmarks etc.
OpenLink's own MIS - “total information awareness”: URI for any CRM Object, Account, Product, Support Case, Email, etc.
MusicBrainz
phpBB, Drupal, MediaWiki, WordPress, Bugzilla, and others.


OpenLink Software
