Virtuoso_Deploying_Linked_Data


OpenLink Virtuoso Linked Data

Linked Data


“Linked Data” – Title of a Web Design Issues Note by Tim Berners-Lee
An effort to evolve current “Web of Documents” into a “Web of Linked Data”
Describes recommended best practice for injecting data into the Web
Use the RDF data model
Name real or abstract things (resources) in your ‘universe of discourse’ (Data Spaces), using URIs as unique IDs
Make URIs accessible via HTTP so people can discover and explore your data via the Web
Expose useful information via your URIs
Enhance your URIs by adding links to other data on the Web using their URIs, enhancing the link density and richness of the Web

Common Web & Different Nature of URIs


‘Linked Data Web’ and the ‘Document Web’:
- two dimensions of the Web separated by a common element
- the Uniform Resource Identifier (URI)
Document Web URIs
These always point to “physical” Web documents (aka information resources)
URI = a URL when it specifies a location
URI = a URN when it specifies a name (i.e. when not location bound)
Linked Data Web
URIs identify physical or abstract resources

What are Resources


Web parlance for a Data Object or Entity that may be physical or abstract
Document Web Resources are physical units of information (containers of contextualized data)
Linked Data Web Resources are generic real-world data objects or entities that include:
People, Places, and other Things
Abstract concepts (e.g. Emotion)
Subject Matter (e.g. Science, Geography, Economics etc.)

Resource Identity, Representation and Access


Identity (URI) of an Object or Entity should be unambiguous and globally unique
On the Web a URI should provide an unambiguous data access path
Reference to abstract (physically inaccessible) Objects or Entities is only achievable via conduit documents that carry representations of entity descriptions (which at best are facets of an entire description)
The descriptive representations of an Object or Entity must be distinct from their URIs
Data Access mechanisms must be independent and facilitate negotiation of representation.

Linked Data Deployment Requirements


To establish real-world object URIs in the Linked Data Web realm, a Linked Data Server needs to honour the following requirements:
Unique Global Identity for Resources using HTTP-based URIs
Deployment platform needs ability to generate proxy Web resources to convey descriptions of real-world (possibly abstract) resources
Challenges:
Separation of Identity and Representation within the context of HTTP protocol mechanics
Negotiable representation of resource descriptions through Transparent Content Negotiation and client-side or server-side QoS algorithms
URL rewriting and query association

Real-World Object Naming URI Schemes


Linked Data Web URIs can take two forms:
‘Slash’ URIs - don’t contain a fragment identifier (#)
http://demo.openlinksw.com/Northwind/Customer/ALFKI/id
http://demo.openlinksw.com/Northwind/Customer/ALFKI/page
http://demo.openlinksw.com/Northwind/Customer/ALFKI/data
Identify an entity, its HTML representation (document), and its RDF representation (document) respectively
‘Hash’ URIs - contain a fragment identifier
http://demo.openlinksw.com/Northwind/Customer/ALFKI#this
Identifies the entity ALFKI, distinct from its representation
(http://demo.openlinksw.com/Northwind/Customer/ALFKI)
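
A minimal Python sketch of why a hash URI can keep identity and representation apart without a redirect: an HTTP client drops the fragment before sending the request, so the server only ever sees the document URL. The helper name document_url is an assumption made for illustration; the URIs are the ones used on the slide.

```python
from urllib.parse import urldefrag


def document_url(entity_uri: str) -> str:
    """Return the URL an HTTP client would actually request for an entity URI."""
    url, _fragment = urldefrag(entity_uri)  # the fragment never reaches the server
    return url


hash_uri = "http://demo.openlinksw.com/Northwind/Customer/ALFKI#this"
slash_uri = "http://demo.openlinksw.com/Northwind/Customer/ALFKI/id"

# The hash URI maps to its document; the slash URI is requested unchanged,
# which is why the slash scheme needs a redirect to reach a representation.
print(document_url(hash_uri))
print(document_url(slash_uri))
```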

Slash URI Semantics


Hash URI Semantics


Handling Identity with Slash URIs


For this URI scheme HTTP redirection (30X response) is required in order for resource “Identity” to be separated from “representation”. Examples:

http://demo.openlinksw.com/Northwind/Customer/ALFKI/id
- URI of an Organization Entity
http://demo.openlinksw.com/Northwind/Customer/ALFKI/page
- HTML representation of Entity description
http://demo.openlinksw.com/Northwind/Customer/ALFKI/data
- RDF representation of the Entity description, serialized as Turtle, N3, RDF/XML etc.

Handling Identity with Hash URIs


For this URI scheme HTTP redirection isn’t required in order for resource “Identity” to be separated from “representation”. Examples:

http://demo.openlinksw.com/Northwind/Customer/ALFKI#this
- URI of an Organization Entity
http://demo.openlinksw.com/Northwind/Customer/ALFKI
- a document (HTML, Turtle, N3, RDF/XML) representation of Entity description

Negotiable Representation of Resource Descriptions


Use HTTP’s in-built Content Negotiation mechanism to:

Serve different format variants of the same resource description from one location
Enable user agent (client-side) specification of preferred description representations by order of preference
Enable server-side specification of preferred description representations by order of preference

Content Negotiation Example


HTTP Request:
HTML browser requests an HTML/XHTML document in English or French
GET /whitepapers/data_mngmnt HTTP/1.1
Host: www.openlinksw.com
Accept: text/html, application/xhtml+xml
Accept-Language: en, fr
Accept header indicates preferred MIME types
RDF browser might instead stipulate a MIME type of
application/rdf+xml or application/rdf+n3
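
A rough Python sketch of the server side of this exchange: parse the Accept header into media types with q-values, then pick the first acceptable variant. The function names and the VARIANTS table are assumptions for illustration (only the .en.html file name appears in the slides), and wildcard types are deliberately not handled.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, most preferred first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type, q = parts[0], 1.0  # q defaults to 1.0 when omitted
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        if media_type:
            prefs.append((media_type, q))
    # sorted() is stable, so types with equal q keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_variant(header: str, variants: dict[str, str]):
    """Return the variant for the best acceptable media type, or None."""
    for media_type, q in parse_accept(header):
        if q > 0 and media_type in variants:
            return variants[media_type]
    return None


# Hypothetical variant table for /whitepapers/data_mngmnt
VARIANTS = {
    "text/html": "data_mngmnt.en.html",
    "application/rdf+xml": "data_mngmnt.rdf",
}

print(choose_variant("text/html, application/xhtml+xml", VARIANTS))
print(choose_variant("application/rdf+xml;q=0.9, text/html;q=0.5", VARIANTS))
```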

Slide 14


HTTP Response:
Server redirects to a URL where the appropriate version can be found
HTTP/1.1 302 Found
Location: http://www.openlinksw.com/whitepapers/data_mngmnt.en.html
Redirect is indicated by HTTP status code 302 (Found)
Client then sends another HTTP request to the new URL
HTTP defines several 3xx status codes for redirection

Content Negotiation Decision Table

no text exists for this slide

Dynamic RDF Renderings


If entity descriptions are held in an RDF quad store:
To provide a dynamic RDF rendering of the entity being dereferenced by the client:
Use SPARQL DESCRIBE or CONSTRUCT
DESCRIBE <entity-uri> FROM <graph-uri>
‘Unconstrained’ – DESCRIBE output not prescribed by SPARQL specification
Virtuoso supports custom procedures for generating output through SPARQL define sql:describe-mode
CONSTRUCT { <entity-uri> ?p ?o } FROM <graph-uri> WHERE { <entity-uri> ?p ?o }
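
A small Python sketch of how a server might fill in these two query templates for the entity being dereferenced. The function names and the IRI check are assumptions for illustration; the query shapes are the ones given on the slide.

```python
def _iri(value: str) -> str:
    """Wrap an IRI in angle brackets, rejecting characters that would break the query."""
    if any(ch in value for ch in '<>" \t\n'):
        raise ValueError(f"not a safe IRI: {value!r}")
    return f"<{value}>"


def describe_query(entity_uri: str, graph_uri: str) -> str:
    """DESCRIBE form: the server decides which triples describe the entity."""
    return f"DESCRIBE {_iri(entity_uri)} FROM {_iri(graph_uri)}"


def construct_query(entity_uri: str, graph_uri: str) -> str:
    """CONSTRUCT form: explicitly returns every triple with the entity as subject."""
    subject = _iri(entity_uri)
    return (f"CONSTRUCT {{ {subject} ?p ?o }} "
            f"FROM {_iri(graph_uri)} "
            f"WHERE {{ {subject} ?p ?o }}")


ENTITY = "http://demo.openlinksw.com/Northwind/Customer/ALFKI#this"
GRAPH = "http://demo.openlinksw.com/Northwind"

print(describe_query(ENTITY, GRAPH))
print(construct_query(ENTITY, GRAPH))
```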

URL Rewriting


Is the act of modifying a URL prior to final processing by a Web server
Provides a means to build, ‘on the fly’, the URL that a 303 redirection points to: the resource in the required representation format
Ideal solution is a rules-based URL rewriting processing pipeline using regular expression or sprintf substitutions

URL Rewriting Example Pipeline

no text exists for this slide

Content negotiation for RDF representation

no text exists for this slide

Deploying Linked Data Using Virtuoso


Virtuoso’s approach is to implement the generic solution outlined so far, using
Content negotiation
URL rewriting
Virtuoso includes a Rules-based URL Rewriter
Can be used to inject Linked Data into the Document Web

Virtuoso URL Rewriter Key Elements


Rewriting Rule
Describes how to parse a source URL and compose the URL of the resource returned in “Location:” response headers
Two types: sprintf-based and regex-based
Rewriting Rule List
Named, ordered list of rewriting rules or rule lists
Tried from top to bottom, first matching rule is applied
Conductor UI for rewriting rule configuration
Configuration API – alternative to Conductor UI, for scripts
Functions for creating, dropping, enumerating rules & rule lists
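
The 'first matching rule is applied' behaviour of a rule list can be sketched in Python as follows. This is a toy model, not Virtuoso's implementation: the rule names, patterns and destination templates are hypothetical, and a rule here matches on both the request path and the Accept header.

```python
import re

# Hypothetical ordered rule list:
# (rule name, request-path regex, Accept-header regex, destination template)
RULES = [
    ("rdf_rule", r"^/Northwind/Customer/(\w+)/id$", r"application/rdf\+xml",
     "/Northwind/Customer/{0}/data"),
    ("html_rule", r"^/Northwind/Customer/(\w+)/id$", r"text/html|\*/\*",
     "/Northwind/Customer/{0}/page"),
]


def rewrite(path: str, accept: str, rules=RULES):
    """Try rules top to bottom; return (rule name, Location) for the first match."""
    for name, path_re, accept_re, template in rules:
        match = re.match(path_re, path)
        if match and re.search(accept_re, accept):
            return name, template.format(*match.groups())
    return None  # no rule matched; the request is processed unchanged


print(rewrite("/Northwind/Customer/ALFKI/id", "application/rdf+xml"))
print(rewrite("/Northwind/Customer/ALFKI/id", "text/html"))
```

Because the list is ordered, a request that accepts both RDF/XML and HTML is handled by rdf_rule; swapping the two entries would favour HTML instead.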

Conductor UI for URL Rewriter

no text exists for this slide

Rewrite Rule Components in Conductor UI


Request Path Pattern e.g. (/[^#]*)
a regular expression matched against the input path
Substitution parameters
Each successive pair of parentheses in the regex denotes a parameter referred to elsewhere in the rewrite rule as $U1, $U2, $U3 … or $s1, $s2, $s3 …
Can be used to substitute the part of the input path that was matched into the new URL being composed
$accept parameter substitutes matched content types specified in Accept header
‘U’ format specifier – URL encodes inserted text
‘s’ format specifier – inserts matched text ‘as is’
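
A Python sketch of the substitution step, assuming that the 'U' specifier percent-encodes every reserved character (including '/') and that 's' copies the match unchanged. The compose helper and the example.com templates are illustrative only; the request path pattern is the one from the slide.

```python
import re
from urllib.parse import quote


def compose(template: str, match: re.Match) -> str:
    """Replace $sN with group N as is, and $UN with group N URL-encoded."""
    def fill(token: re.Match) -> str:
        kind, index = token.group(1), int(token.group(2))
        text = match.group(index)
        return quote(text, safe="") if kind == "U" else text
    return re.sub(r"\$([Us])(\d+)", fill, template)


# Request path pattern from the slide: one parenthesised group, so one parameter
m = re.match(r"(/[^#]*)", "/Northwind/Customer/ALFKI")

print(compose("/about/html/http://example.com$s1", m))
print(compose("/sparql?query=DESCRIBE+%3Chttp%3A//example.com$U1%23this%3E", m))
```

The encoded form is what belongs inside a query string, while the 'as is' form suits a value that is appended to a path.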

URL Rewriter URIQADefaultHost Macro


URIQADefaultHost Macro
Makes rewriting rules (& RDF View definitions) more portable
Each occurrence is substituted with the value of the DefaultHost parameter in the [URIQA] section of the virtuoso.ini configuration file
DefaultHost ::= server name. e.g. www.example.com:8890
DESCRIBE <http://^{URIQADefaultHost}^$U1#this>
FROM <http://^{URIQADefaultHost}^/Northwind>
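
The macro substitution itself amounts to a plain text replacement. The Python sketch below reads DefaultHost from an ini fragment and expands the macro in a query template; the ini text, the function name and the template are assumptions for illustration, not Virtuoso code.

```python
import configparser

MACRO = "^{URIQADefaultHost}^"

# Hypothetical fragment of virtuoso.ini
INI_TEXT = """
[URIQA]
DefaultHost = www.example.com:8890
"""


def expand_default_host(text: str, ini_text: str = INI_TEXT) -> str:
    """Replace every occurrence of the macro with [URIQA] DefaultHost."""
    config = configparser.ConfigParser(delimiters=("=",))
    config.read_string(ini_text)
    return text.replace(MACRO, config["URIQA"]["DefaultHost"])


template = ("DESCRIBE <http://^{URIQADefaultHost}^/Northwind/Customer/ALFKI#this> "
            "FROM <http://^{URIQADefaultHost}^/Northwind>")
print(expand_default_host(template))
```

Moving the rules to another server then only requires changing DefaultHost, not each rule.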

URL Rewriting Process for RDF Requests

no text exists for this slide

URL Rewriting Process for HTML Requests

no text exists for this slide

description.vsp: Rendering RDF as HTML


Destination path in rewrite rule for HTML requests:
/about/html/http://^{URIQADefaultHost}^$s1
Redirects client to the Virtuoso ‘Page Description Service’ via proxy interface /about/html
The Page Description Service invokes description.vsp, which in turn invokes the Virtuoso Sponger
Sponger: a customizable RDFizer with pluggable cartridges
Extracts RDF from the target URL
Native RDF sources: RDF is returned ‘as is’
Non-RDF sources: Metadata is extracted and converted to RDF using ontology mapping and XSLT
description.vsp renders the extracted RDF as HTML
Substitutes RDF ‘hyperdata’ links with HTML hyperlinks

Exporting URL Rewriting Rules from Conductor


Rewrite rules configured in Conductor can be exported as Virtuoso PL for backup, use on another system etc.
Exported script recreates rules using Virtuoso’s URL Rewriting Configuration API

Example Exported Rule Definitions

no text exists for this slide

URL Rewriter API: Enabling Rewriting


Enabled through vhost_define( ) function
vhost_define( ) defines a virtual host or virtual path
opts parameter is a vector of field-value pairs
Field url_rewrite controls / enables URL rewriting
Field value is the IRI of the rule list to apply
e.g.
DB.DBA.VHOST_DEFINE (
   lhost=>'*ini*', vhost=>'*ini*',
   lpath=>'/Northwind', ppath=>'/DAV/home/demo/',
   is_dav=>1, vsp_user=>'dba', ses_vars=>0,
   opts=>vector ('url_rewrite', 'demo_nw_rule_list1'),
   is_default_host=>0);

URL Rewriter API Summary


Functions in DB.DBA schema:
URLREWRITE_CREATE_SPRINTF_RULE
URLREWRITE_CREATE_REGEX_RULE
URLREWRITE_CREATE_RULELIST
URLREWRITE_DROP_RULE
URLREWRITE_DROP_RULELIST
URLREWRITE_ENUMERATE_RULES
URLREWRITE_ENUMERATE_RULELISTS

Nice URLs vs Long URLs


The Rewriter was developed with broader objectives than Linked Data, which influenced its terminology
Rewriter takes a ‘nice’ URL and rewrites it as a ‘long’ URL
‘Nice’ URL
Free from parameters, typically short
‘Long’ URL
Typically contains query string with named parameters
Often ignored by web crawlers (viewed as highly dynamic) => low page ranking

Sprintf Rules vs Regex Rules


Rewrite rules take two forms: sprintf-based & regex-based:
For ‘nice’ to ‘long’ URL conversion
Functionally equivalent
Only difference is syntax of match pattern definition
For ‘long’ to ‘nice’ URL conversion
Only works for sprintf-based rules
Regex-based rules are unidirectional

URLREWRITE_CREATE_REGEX_RULE


URLREWRITE_CREATE_REGEX_RULE (
rule_iri, allow_update, nice_match, nice_params, nice_min_params, target_compose, target_params, target_expn := null,
accept_pattern := null, do_not_continue := 0,
http_redirect_code := null, http_headers := null) ;
rule_iri: rule’s name / identifier
nice_match: regex to parse URL into a vector of ‘occurrences’
nice_params: vector of names of the parsed parameters.
Length of vector equals # of ‘(…)’ specifiers in the regex
target_compose: ‘compose’ regex for the destination URL
target_params: vector of names of parameters to pass to the ‘compose’ expression as $1, $2 etc
target_expn: optional SQL text to execute instead of a regex compose
accept_pattern: regex to match against the HTTP Accept header
do_not_continue: on a match, try / don’t try next rule in rule list
http_redirect_code: null, 301, 302 or 303. 30x => HTTP redirect
http_headers: HTTP headers to supply with the rewritten request

URL Rewriter Verification with curl


curl utility provides a useful tool for verifying HTTP server responses and rewriting rules
$ curl -I -H "Accept: application/rdf+xml" http://demo.openlinksw.com/Northwind/Customer/ALFKI
HTTP/1.1 303 See Other
Server: Virtuoso/05.09.3037 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Thu, 12 Feb 2009 11:23:31 GMT
Accept-Ranges: bytes
Location: http://demo.openlinksw.com/sparql?query=DESCRIBE+%3Chttp%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%23this%3E+%3Chttp%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%3E+FROM+%3Chttp%3A//demo.openlinksw.com/Northwind%3E&format=application%2Frdf%2Bxml
Content-Length: 0
Note: default rule for RDF requests changed to return HTTP response 303, rather than use an internal redirect, to allow the generated SPARQL query to be viewed and checked with curl
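
To check the generated SPARQL query, the Location header can be decoded rather than read in its percent-encoded form. A short Python sketch, using the header value from the curl output above; the function name is an assumption.

```python
from urllib.parse import parse_qs, urlsplit

# Location header value from the curl output, split across lines for readability
LOCATION = (
    "http://demo.openlinksw.com/sparql?query=DESCRIBE+%3Chttp"
    "%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%23this%3E+%3Chttp"
    "%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%3E+FROM+%3Chttp"
    "%3A//demo.openlinksw.com/Northwind%3E&format=application%2Frdf%2Bxml"
)


def decode_sparql_redirect(location: str) -> tuple[str, str]:
    """Return the decoded (query, format) parameters of a /sparql redirect URL."""
    params = parse_qs(urlsplit(location).query)
    return params["query"][0], params["format"][0]


query, fmt = decode_sparql_redirect(LOCATION)
print(query)
print(fmt)
```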

Browsing & Exploring Linked Data


OpenLink Data Explorer (ODE)
Browser extension (Firefox, support for others to follow)
See http://ode.openlinksw.com
RDF and HTML views of Linked Data
RDF view incorporates ‘hyperdata’ links between entities
HTML view substitutes hyperlinks
Also available as a hosted service
E.g. http://demo.openlinksw.com/ode
iSparql Query Tool
Interactive SPARQL Query Builder
E.g. http://demo.openlinksw.com/isparql
See http://wikis.openlinksw.com/dataspace/owiki/wiki/OATWikiWeb/InteractiveSparqlQueryBuilder

Content Negotiation Revisited: TCN


Virtuoso supports two flavours of content negotiation:
HTTP/1.1 style content negotiation (introduced earlier)
Server-driven negotiation only
Transparent Content Negotiation (TCN)
Server-driven or agent-driven negotiation
Suitably enabled user agents / browsers can take advantage of TCN
Non-TCN capable user agents continue to be handled using HTTP/1.1 content negotiation

Transparent Content Negotiation


A protocol defined by RFC2295, layered on top of HTTP/1.1
Addresses deficiencies in HTTP/1.1 content negotiation
Limited to server selecting best variant (server-driven negotiation)
Server doesn’t always know/select best variant
User agent might often be better placed to decide what is best for its needs
Inefficient
Sending details of user agent's capabilities and preferences with every request is inefficient
Large number of Accept headers required
Very few Web resources have multiple variants

Slide 42


Supports variant selection by user agent or by server
Transparent - all variants on server are visible to the agent
Variant Selection by User Agent:
User agent chooses best variant itself from variant list sent by server
Requires sending fewer/smaller ‘Accept’ headers
Variant Selection by Server:
User agent can instruct server to select best variant on its behalf
Server uses ‘remote variant selection algorithm’ (RFC2296)

TCN Basic Mechanics


Client
Supplies Negotiate* request header
Content negotiation directives include:
"trans" - user agent supports TCN for the current request
"vlist" - user agent wants a variant list for the resource (expressed as an Alternates header); implies "trans"
"*" - user agent allows servers and proxies to run any remote variant selection algorithm
Server
Returns a TCN* response header signalling that the resource is transparently negotiated and either a choice or a list response as appropriate
*New headers introduced by RFC2295

Example: Preferred format XML


Assumes Virtuoso WebDAV server contains 3 variants of resource named ‘page’:
/DAV/TCN/page.xml
/DAV/TCN/page.html
/DAV/TCN/page.txt
User agent indicates preference for XML
$ curl -i -H "Accept: text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3" \
         -H "Negotiate: *" http://demo.openlinksw.com/DAV/TCN/page
HTTP/1.1 200 OK
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: Keep-Alive
Date: Wed, 31 Oct 2009 15:44:07 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: page.xml
Content-Type: text/xml
ETag: "8b09f4b8e358fcb7fd1f0f8fa918973a"
Content-Length: 39
<?xml version="1.0" ?>
<a>some xml</a>

Example: Preferred format HTML


User agent indicates preference for HTML
$ curl -i -H "Accept: text/xml;q=0.3,text/html;q=1.0,text/plain;q=0.5,*/*;q=0.3" \
         -H "Negotiate: *" http://demo.openlinksw.com/DAV/TCN/page
HTTP/1.1 200 OK
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: Keep-Alive
Date: Wed, 31 Oct 2009 15:43:18 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: page.html
Content-Type: text/html
ETag: "14056a25c066a6e0a6e65889754a0602"
Content-Length: 49
<html>
   <body>
      some html
   </body>
</html>

Example: Variant list request


User agent asks for a list of variants
$ curl -i -H "Accept: text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3" \
   -H "Negotiate: vlist" http://localhost:8890/DAV/TCN/page
HTTP/1.1 300 Multiple Choices
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Wed, 31 Oct 2009 15:44:35 GMT
Accept-Ranges: bytes
TCN: list
Vary: negotiate,accept
Alternates: {"page.html" 0.900000 {type text/html}}, {"page.txt" 0.500000 {type text/plain}}, {"page.xml" 1.000000 {type text/xml}}
Content-Length: 368
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head><title>300 Multiple Choices</title></head>
<body><h1>Multiple Choices</h1>Available variants:<ul>
<li><a href="page.html">HTML variant</a>, type text/html</li>
<li><a href="page.txt">Text document</a>, type text/plain</li>
<li><a href="page.xml">XML variant</a>, type text/xml</li>
</ul></body></html>
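
A user agent that receives such a list response has to pick a variant itself. The Python sketch below parses the Alternates header shown above and scores each variant as source quality times the agent's q-value for that media type. This is a simplification made for illustration: it considers only the type attribute, whereas the selection algorithm of RFC 2296 also weighs other dimensions such as language and charset. The function names are assumptions.

```python
import re

# Alternates header value from the 300 response above
ALTERNATES = ('{"page.html" 0.900000 {type text/html}}, '
              '{"page.txt" 0.500000 {type text/plain}}, '
              '{"page.xml" 1.000000 {type text/xml}}')


def parse_alternates(header: str) -> list[tuple[str, float, str]]:
    """Return (variant URI, source quality, media type) for each listed variant."""
    pattern = r'\{"([^"]+)"\s+([\d.]+)\s+\{type\s+([^}\s]+)\}\}'
    return [(uri, float(qs), mtype)
            for uri, qs, mtype in re.findall(pattern, header)]


def best_variant(header: str, accept_q: dict[str, float]) -> str:
    """Pick the variant that maximises source quality x the agent's q-value."""
    def score(variant):
        _uri, qs, mtype = variant
        return qs * accept_q.get(mtype, accept_q.get("*/*", 0.0))
    return max(parse_alternates(header), key=score)[0]


# q-values taken from the Accept header of the request above
prefers_xml = {"text/xml": 1.0, "text/html": 0.7, "text/plain": 0.5, "*/*": 0.3}
print(best_variant(ALTERNATES, prefers_xml))
```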

TCN Configuration: Variant Description


Variant descriptions held in SQL table HTTP_VARIANT_MAP
Added/updated/removed through Virtuoso/PL or Conductor UI
create table DB.DBA.HTTP_VARIANT_MAP (
   VM_ID integer identity, -- unique ID
   VM_RULELIST varchar, -- HTTP rule list name
   VM_URI varchar, -- name of requested resource e.g. 'page'
   VM_VARIANT_URI varchar, -- name of variant e.g. 'page.xml','page.de.html' etc.
   VM_QS float, -- Source quality, number in the range 0.001-1.000, with 3 digit precision
   VM_TYPE varchar, -- Content type of the variant e.g. text/xml
   VM_LANG varchar, -- Content language e.g. 'en', 'de' etc.
   VM_ENC varchar, -- Content encoding e.g. 'utf-8', 'ISO-8892' etc.
   VM_DESCRIPTION long varchar, -- human readable variant description
                                -- e.g. 'Profile in RDF format'
   VM_ALGO int default 0, -- reserved for future use
   primary key (VM_RULELIST, VM_URI, VM_VARIANT_URI)
);
create unique index HTTP_VARIANT_MAP_ID on DB.DBA.HTTP_VARIANT_MAP (VM_ID);

TCN Configuration via Conductor UI

no text exists for this slide

TCN Configuration via Virtuoso/PL


Adding or Updating a Resource Variant
DB.DBA.HTTP_VARIANT_ADD (
   in rulelist_uri varchar, -- HTTP rule list name
   in uri varchar, -- Requested resource name e.g. 'page'
   in variant_uri varchar,    -- Variant name e.g. 'page.xml', 'page.de.html' etc.
   in mime varchar, -- Content type of the variant e.g. text/xml
   in qs float := 1.0, -- Source quality, a floating point number with 3
                       -- digit precision in 0.001-1.000 range
   in description varchar := null, -- a human readable description of the
                                   -- variant e.g. 'Profile in RDF format'
   in lang varchar := null, -- Content language e.g. 'en', 'bg', 'de' etc.
   in enc varchar := null -- Content encoding e.g. 'utf-8', 'ISO-8892' etc.
   )
Removing a Resource Variant
DB.DBA.HTTP_VARIANT_REMOVE (
   in rulelist_uri varchar, -- HTTP rule list name
   in uri varchar, -- Name of requested resource e.g. 'page'
   in variant_uri varchar := '%' -- Variant name filter   
   )

Slide 50


Adding resource variant descriptions
Define variant descriptions & associate them with a rule list
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.html', 'text/html',
   0.900000, 'HTML variant');
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.txt', 'text/plain',
   0.500000, 'Text document');
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.xml', 'text/xml',
   1.000000, 'XML variant');
Define a virtual directory & associate the rule list with it
DB.DBA.VHOST_DEFINE (lpath=>'/DAV/TCN/', ppath=>'/DAV/TCN/', is_dav=>1,
   vsp_user=>'dba', opts=>vector ('url_rewrite', 'http_rule_list_1'));
