OpenLink Virtuoso Linked Data
Linked Data
"Linked Data": Title of a Web Design Issues Note by Tim Berners-Lee
An effort to evolve the current "Web of Documents" into a "Web of Linked Data"
Describes recommended best practice for injecting data into the Web
Use the RDF data model
Name real or abstract things (resources) in your "universe of discourse" (Data Spaces), using URIs as unique IDs
Make URIs accessible via HTTP so people can discover and explore your data via the Web
Expose useful information via your URIs
Enrich the data behind your URIs with links to other data on the Web (using their URIs), increasing the link density and richness of the Web
Common Web & Different Nature of URIs
"Linked Data Web" and the "Document Web":
- two dimensions of the Web separated by a common element: the Uniform Resource Identifier (URI)
Document Web URIs
These always point to "physical" Web documents (aka information resources)
URI = a URL when it specifies a location
URI = a URN when it specifies a name (i.e. when not location bound)
Linked Data Web
URIs identify physical or abstract resources
What are Resources?
Web parlance for a Data Object or Entity that may be physical or abstract
Document Web Resources are physical units of information (containers of contextualized data)
Linked Data Web Resources are generic real-world data objects or entities that include:
People, Places, and other Things
Abstract concepts (e.g. Emotion)
Subject Matter (e.g. Science, Geography, Economics etc.)
Resource Identity, Representation and Access
Identity (URI) of an Object or Entity should be unambiguous and globally unique
On the Web a URI should provide an unambiguous data access path
Reference to abstract (physically inaccessible) Objects or Entities is only achievable via conduit documents that carry representations of entity descriptions (which at best are facets of an entire description)
The descriptive representations of an Object or Entity must be distinct from its URI
Data Access mechanisms must be independent and facilitate negotiation of representation.
Linked Data Deployment Requirements
To establish real-world object URIs in the Linked Data Web realm, a Linked Data Server needs to honour the following requirements:
Unique Global Identity for Resources using HTTP-based URIs
Deployment platform needs ability to generate proxy Web resources to convey descriptions of real-world (possibly abstract) resources
Challenges:
Separation of Identity and Representation within the context of HTTP protocol mechanics
Negotiable representation of resource descriptions through Transparent Content Negotiation and client-side or server-side QoS algorithms
URL rewriting and query association
Real-World Object Naming: URI Schemes
Linked Data Web URIs can take two forms:
"Slash" URIs - don't contain a fragment identifier (#)
http://demo.openlinksw.com/Northwind/Customer/ALFKI/id
http://demo.openlinksw.com/Northwind/Customer/ALFKI/page
http://demo.openlinksw.com/Northwind/Customer/ALFKI/data
Identify an entity, its HTML representation (document), and its RDF representation (document) respectively
"Hash" URIs - contain a fragment identifier
http://demo.openlinksw.com/Northwind/Customer/ALFKI#this
Identifies the entity ALFKI, distinct from its representation
(http://demo.openlinksw.com/Northwind/Customer/ALFKI)
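The relationship between each URI form and the document a client actually retrieves can be sketched in Python. This is a minimal illustration: the helper names are hypothetical, and the `/id`, `/page`, `/data` suffixes simply follow the Northwind examples above. For hash URIs the key detail is that HTTP clients strip the fragment before sending a request, so the server only ever sees the document URI.

```python
from urllib.parse import urldefrag


def document_for_hash_uri(entity_uri: str) -> str:
    """Return the document URI that is actually requested for a hash URI.

    HTTP clients do not send the fragment (e.g. '#this') to the server,
    so dereferencing the entity URI fetches the fragment-less document.
    """
    document_uri, _fragment = urldefrag(entity_uri)
    return document_uri


def documents_for_slash_uri(entity_uri: str) -> dict:
    """Map a slash entity URI ending in '/id' to its HTML and RDF documents.

    Follows the '/id' -> '/page' and '/data' naming used in the Northwind
    examples; other deployments may name these documents differently.
    """
    if not entity_uri.endswith("/id"):
        raise ValueError("expected an entity URI ending in '/id'")
    base = entity_uri[: -len("/id")]
    return {"html": base + "/page", "rdf": base + "/data"}


print(document_for_hash_uri("http://demo.openlinksw.com/Northwind/Customer/ALFKI#this"))
print(documents_for_slash_uri("http://demo.openlinksw.com/Northwind/Customer/ALFKI/id"))
```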
Slash URI Semantics
Hash URI Semantics
Handling Identity with Slash URIs
For this URI scheme, HTTP redirection (30X response, typically 303 See Other) is required in order for resource "Identity" to be separated from "representation". Examples:
http://demo.openlinksw.com/Northwind/Customer/ALFKI/id
- URI of an Organization Entity
http://demo.openlinksw.com/Northwind/Customer/ALFKI/page
- HTML representation of Entity description
http://demo.openlinksw.com/Northwind/Customer/ALFKI/data
- RDF representation that describes the Entity which could be: Turtle, N3, RDF/XML etc. based data serialization
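Here is a minimal sketch of the 303 step for slash URIs. It assumes a server that follows the `/id`, `/page`, `/data` layout above. The function name and the short list of RDF media types are illustrative assumptions, and the Accept check is a naive substring test rather than full content negotiation.

```python
from typing import Optional, Tuple

# Media types treated as "RDF" in this sketch (an assumption, not a fixed list).
RDF_TYPES = ("application/rdf+xml", "text/turtle", "text/n3")


def resolve_slash_uri(path: str, accept: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for a request against a slash URI.

    A request for the entity URI ('.../id') is not answered with a document:
    the server replies 303 See Other and points the client at a
    representation, keeping identity and representation apart.
    """
    if path.endswith("/id"):
        base = path[: -len("/id")]
        wants_rdf = any(media_type in accept for media_type in RDF_TYPES)
        return 303, base + ("/data" if wants_rdf else "/page")
    # '/page' and '/data' are ordinary documents, served with 200 OK.
    return 200, None


print(resolve_slash_uri("/Northwind/Customer/ALFKI/id", "application/rdf+xml"))
```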
Handling Identity with Hash URIs
For this URI scheme, HTTP redirection isn't required in order for resource "Identity" to be separated from "representation". Examples:
http://demo.openlinksw.com/Northwind/Customer/ALFKI#this
- URI of an Organization Entity
http://demo.openlinksw.com/Northwind/Customer/ALFKI
- a document (HTML, Turtle, N3, RDF/XML) representation of Entity description
Negotiable Representation of Resource Descriptions
Use HTTP's in-built Content Negotiation mechanism to:
Serve different format variants of the same resource description from one location
Enable user agent (client-side) specification of preferred description representations by order of preference
Enable server-side specification of preferred description representations by order of preference
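The selection described above can be sketched as follows. The client's Accept q-values are combined with a server-side source quality (qs) per variant, and the highest score wins. This is a simplified illustration under stated assumptions: the variant table and its qs values are made up, and language, charset and encoding negotiation are ignored.

```python
from typing import Dict, Optional


def parse_accept(header: str) -> Dict[str, float]:
    """Parse an Accept header into {media type: q-value}."""
    prefs: Dict[str, float] = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when absent
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs[media_type] = q
    return prefs


def client_q(media_type: str, prefs: Dict[str, float]) -> float:
    """Most specific match wins: exact type, then 'type/*', then '*/*'."""
    if media_type in prefs:
        return prefs[media_type]
    wildcard = media_type.split("/")[0] + "/*"
    if wildcard in prefs:
        return prefs[wildcard]
    return prefs.get("*/*", 0.0)


def choose_variant(accept: str, variants: Dict[str, float]) -> Optional[str]:
    """Pick the variant with the highest (client q x server qs) score."""
    prefs = parse_accept(accept)
    best, best_score = None, 0.0
    for media_type, qs in variants.items():
        score = client_q(media_type, prefs) * qs
        if score > best_score:
            best, best_score = media_type, score
    return best


# Hypothetical server-side variants and their source qualities.
SERVER_VARIANTS = {"text/html": 1.0, "application/rdf+xml": 0.9}
print(choose_variant("text/html, application/xhtml+xml", SERVER_VARIANTS))
print(choose_variant("application/rdf+xml", SERVER_VARIANTS))
```

A variant whose score is 0 (for example, a type the client did not list and no `*/*` fallback) is never chosen, so the function returns `None` when nothing acceptable exists.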
Content Negotiation Example
HTTP Request:
HTML browser requests an HTML/XHTML document in English or French
GET /whitepapers/data_mngmnt HTTP/1.1
Host: www.openlinksw.com
Accept: text/html, application/xhtml+xml
Accept-Language: en, fr
Accept header indicates preferred MIME types
RDF browser might instead stipulate a MIME type of
application/rdf+xml or text/n3
Content Negotiation Example (continued)
HTTP Response:
Server redirects to a URL where the appropriate version can be found
HTTP/1.1 302 Found
Location: http://www.openlinksw.com/whitepapers/data_mngmnt.en.html
Redirect is indicated by HTTP status code 302 (Found)
Client then sends another HTTP request to the new URL
HTTP defines several 3xx status codes for redirection
Content Negotiation Decision Table
Dynamic RDF Renderings
If entity descriptions are held in an RDF quad store:
To provide a dynamic RDF rendering of the entity being dereferenced by the client:
Use SPARQL DESCRIBE or CONSTRUCT
DESCRIBE <entity-uri> FROM <graph-uri>
"Unconstrained": DESCRIBE output not prescribed by SPARQL specification
Virtuoso supports custom procedures for generating output through SPARQL define sql:describe-mode
CONSTRUCT { <entity-uri> ?p ?o } FROM <graph-uri> WHERE { <entity-uri> ?p ?o }
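Here is a sketch of how the two query forms above could be assembled and packed into a SPARQL endpoint GET URL. The `query` and `format` parameter names match the Location header in the curl verification example later in this deck. The helper functions themselves are hypothetical.

```python
from urllib.parse import urlencode


def describe_query(entity_uri: str, graph_uri: str) -> str:
    return f"DESCRIBE <{entity_uri}> FROM <{graph_uri}>"


def construct_query(entity_uri: str, graph_uri: str) -> str:
    # Doubled braces produce literal '{' and '}' inside an f-string.
    return (f"CONSTRUCT {{ <{entity_uri}> ?p ?o }} FROM <{graph_uri}> "
            f"WHERE {{ <{entity_uri}> ?p ?o }}")


def endpoint_url(endpoint: str, query: str, fmt: str = "application/rdf+xml") -> str:
    """Build a GET URL for a SPARQL endpoint; urlencode percent-encodes the query."""
    return endpoint + "?" + urlencode({"query": query, "format": fmt})


q = describe_query("http://demo.openlinksw.com/Northwind/Customer/ALFKI#this",
                   "http://demo.openlinksw.com/Northwind")
print(endpoint_url("http://demo.openlinksw.com/sparql", q))
```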
URL Rewriting
Is the act of modifying a URL prior to final processing by a Web server
Provides a means to build, on the fly, the URL of the resource in the required representation format; the client is referred to this URL by a 303 redirect
Ideal solution is a rules-based URL rewriting processing pipeline using regular expression or sprintf substitutions
URL Rewriting Example Pipeline
Content negotiation for RDF representation
Deploying Linked Data Using Virtuoso
Virtuoso's approach is to implement the generic solution outlined so far, using:
Content negotiation
URL rewriting
Virtuoso includes a Rules-based URL Rewriter
Can be used to inject Linked Data into the Document Web
Virtuoso URL Rewriter Key Elements
Rewriting Rule
Describes how to parse a source URL and compose the URL of the resource returned in the "Location:" response header
Two types: sprintf-based and regex-based
Rewriting Rule List
Named, ordered list of rewriting rules or rule lists
Tried from top to bottom, first matching rule is applied
Conductor UI for rewriting rule configuration
Configuration API: alternative to Conductor UI, for scripts
Functions for creating, dropping, enumerating rules & rule lists
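The rule list behaviour (rules tried top to bottom, first match applied) can be sketched in Python. This is not Virtuoso's implementation. Rules here use Python's `re` replacement syntax (`\g<1>`) rather than Virtuoso's `$U1`/`$s1` placeholders. The rule names and target URLs (`customer.vsp`, `/static`) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RegexRule:
    name: str
    pattern: str   # regular expression matched against the whole request path
    template: str  # Python re replacement template, e.g. r"/x?id=\g<1>"


def apply_rule_list(rules: List[RegexRule], path: str) -> Optional[Tuple[str, str]]:
    """Try rules top to bottom; the first matching rule wins.

    Returns (rule name, rewritten URL), or None if no rule matches.
    """
    for rule in rules:
        match = re.fullmatch(rule.pattern, path)
        if match:
            return rule.name, match.expand(rule.template)
    return None


# Hypothetical rule list: a specific rule first, then a catch-all.
RULES = [
    RegexRule("customer", r"/Northwind/Customer/([^/#]+)",
              r"/Northwind/customer.vsp?id=\g<1>"),
    RegexRule("catch_all", r"(/[^#]*)", r"/static\g<1>"),
]

print(apply_rule_list(RULES, "/Northwind/Customer/ALFKI"))
print(apply_rule_list(RULES, "/Northwind/Orders"))
```

Putting the catch-all first would shadow the customer rule, which is why the order of rules within a rule list matters.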
Conductor UI for URL Rewriter
Rewrite Rule Components in Conductor UI
Request Path Pattern e.g. (/[^#]*)
a regular expression matched against the input path
Substitution parameters
Each successive pair of parentheses in the regex denotes a parameter referred to elsewhere in the rewrite rule as $U1, $U2, $U3 ... or $s1, $s2, $s3 ...
Can be used to substitute the part of the input path that was matched into the new URL being composed
$accept parameter substitutes matched content types specified in Accept header
"U" format specifier: URL-encodes inserted text
"s" format specifier: inserts matched text "as is"
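The difference between the `U` and `s` specifiers can be illustrated with a small Python composer. It is a hypothetical sketch. It understands only `$Un` and `$sn` placeholders (not `$accept`), and the target URLs are invented for the example.

```python
import re
from urllib.parse import quote


def compose(template, groups):
    """Fill $Un / $sn placeholders from regex groups (1-based).

    $Un inserts group n URL-encoded; $sn inserts it as is.
    """
    def substitute(m):
        spec, index = m.group(1), int(m.group(2))
        value = groups[index - 1]
        return quote(value, safe="") if spec == "U" else value

    return re.sub(r"\$([Us])(\d+)", substitute, template)


# Match the example request path pattern (/[^#]*) against an input path.
path_match = re.fullmatch(r"(/[^#]*)", "/Northwind/Customer/ALFKI")
groups = path_match.groups()
print(compose("/about/html/http://example.com$s1", groups))  # inserted as is
print(compose("/describe?uri=$U1", groups))                  # '/' becomes %2F
```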
URL Rewriter URIQADefaultHost Macro
URIQADefaultHost Macro
Makes rewriting rules (& RDF View definitions) more portable
Each occurrence is substituted with the value of the DefaultHost parameter in URIQA section of virtuoso.ini configuration file
DefaultHost ::= server name. e.g. www.example.com:8890
DESCRIBE <http://^{URIQADefaultHost}^$U1#this>
FROM <http://^{URIQADefaultHost}^/Northwind>
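The following sketches the macro expansion. It assumes the `[URIQA]` section of virtuoso.ini parses as a standard INI section. Python's `configparser` is used here only as a stand-in; it is not how Virtuoso itself reads the file, and a full virtuoso.ini may contain entries it does not accept. The `$U1` placeholder is left untouched because the rewriter fills it in separately.

```python
import configparser

# Hypothetical virtuoso.ini fragment; only the [URIQA] section is shown.
INI_SNIPPET = """
[URIQA]
DefaultHost = www.example.com:8890
"""

MACRO = "^{URIQADefaultHost}^"


def default_host_from_ini(ini_text: str) -> str:
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    return config["URIQA"]["DefaultHost"]  # option lookup is case-insensitive


def expand_macro(text: str, default_host: str) -> str:
    """Replace every occurrence of the macro with the configured host."""
    return text.replace(MACRO, default_host)


host = default_host_from_ini(INI_SNIPPET)
print(expand_macro("DESCRIBE <http://^{URIQADefaultHost}^$U1#this>", host))
print(expand_macro("FROM <http://^{URIQADefaultHost}^/Northwind>", host))
```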
URL Rewriting Process for RDF Requests
URL Rewriting Process for HTML Requests
description.vsp: Rendering RDF as HTML
Destination path in rewrite rule for HTML requests:
/about/html/http://^{URIQADefaultHost}^$s1
Redirects client to the Virtuoso "Page Description Service" via proxy interface /about/html
The Page Description Service invokes description.vsp, which in turn invokes the Virtuoso Sponger
Sponger: a customizable RDFizer with pluggable cartridges
Extracts RDF from the target URL
Native RDF sources: RDF is returned "as is"
Non-RDF sources: Metadata is extracted and converted to RDF using ontology mapping and XSLT
description.vsp renders the extracted RDF as HTML
Substitutes RDF "hyperdata" links with HTML hyperlinks
Exporting URL Rewriting Rules from Conductor
Rewrite rules configured in Conductor can be exported as Virtuoso PL for backup, use on another system etc.
Exported script recreates rules using Virtuoso's URL Rewriting Configuration API
Example Exported Rule Definitions
URL Rewriter API Enabling Rewriting
Enabled through vhost_define( ) function
vhost_define( ) defines a virtual host or virtual path
opts parameter is a vector of field-value pairs
Field url_rewrite controls / enables URL rewriting
Field value is the IRI of the rule list to apply
e.g.
DB.DBA.VHOST_DEFINE (
lhost=>'*ini*', vhost=>'*ini*',
lpath=>'/Northwind',ppath=>'/DAV/home/demo/',
is_dav=>1, vsp_user=>'dba',
ses_vars=>0, opts=>vector ('url_rewrite', 'demo_nw_rule_list1'), is_default_host=>0);
URL Rewriter API Summary
Functions in DB.DBA schema:
URLREWRITE_CREATE_SPRINTF_RULE
URLREWRITE_CREATE_REGEX_RULE
URLREWRITE_CREATE_RULELIST
URLREWRITE_DROP_RULE
URLREWRITE_DROP_RULELIST
URLREWRITE_ENUMERATE_RULES
URLREWRITE_ENUMERATE_RULELISTS
Nice URLs vs Long URLs
The rewriter was developed with broader objectives than Linked Data, which influenced its terminology
Rewriter takes a "nice" URL and rewrites it as a "long" URL
"Nice" URL
Free from parameters, typically short
"Long" URL
Typically contains query string with named parameters
Often ignored by web crawlers (viewed as highly dynamic) => low page ranking
Sprintf Rules vs Regex Rules
Rewrite rules take two forms: sprintf-based & regex-based:
For âniceâ to âlongâ URL conversion
Functionally equivalent
Only difference is syntax of match pattern definition
For âlongâ to âniceâ URL conversion
Only works for sprintf-based rules
Regex-based rules are unidirectional
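A toy converter shows why sprintf-style rules can run in both directions while arbitrary regex rules cannot. A pattern made only of literal text and `%s` slots can parse one URL form and format the other, and the roles simply swap for the reverse trip. This is a hypothetical Python illustration (the `customer.vsp` target is invented), not Virtuoso's sprintf rule engine.

```python
import re
from typing import Optional

NICE = "/Northwind/Customer/%s"          # hypothetical "nice" form
LONG = "/Northwind/customer.vsp?id=%s"   # hypothetical "long" form


def _to_regex(fmt: str) -> str:
    """Turn a sprintf-style format into a regex: literals escaped, %s -> group."""
    literals = fmt.split("%s")
    return "([^/?&#]+)".join(re.escape(part) for part in literals)


def convert(url: str, src_fmt: str, dst_fmt: str) -> Optional[str]:
    """Parse url with src_fmt, then format the captured values with dst_fmt."""
    match = re.fullmatch(_to_regex(src_fmt), url)
    if match is None:
        return None
    return dst_fmt % match.groups()


print(convert("/Northwind/Customer/ALFKI", NICE, LONG))         # nice -> long
print(convert("/Northwind/customer.vsp?id=ALFKI", LONG, NICE))  # long -> nice
```

A regex rule has no such inverse in general. A pattern like `(/[^#]*)` plus an arbitrary compose expression says how to go forward, but not how to take the composed URL apart again.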
URLREWRITE_CREATE_REGEX_RULE
URLREWRITE_CREATE_REGEX_RULE (
rule_iri, allow_update, nice_match, nice_params, nice_min_params, target_compose, target_params, target_expn := null,
accept_pattern := null, do_not_continue := 0,
http_redirect_code := null, http_headers := null) ;
rule_iri: rule's name / identifier
nice_match: regex to parse URL into a vector of "occurrences"
nice_params: vector of names of the parsed parameters.
Length of vector equals # of "(...)" specifiers in the regex
target_compose: "compose" regex for the destination URL
target_params: vector of names of parameters to pass to the "compose" expression as $1, $2 etc
target_expn: optional SQL text to execute instead of a regex compose
accept_pattern: regular expression to match against the HTTP Accept header
do_not_continue: on a match, whether the next rule in the rule list is still tried (0) or not (1)
http_redirect_code: null (internal redirect), or 301, 302 or 303 (HTTP redirect to the rewritten URL)
http_headers: HTTP headers to supply with the rewritten request
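Below is a simplified Python model of a single regex rule. It covers `nice_match`, a compose step, `accept_pattern` and `http_redirect_code`. It is a sketch only: named parameters (`nice_params`, `target_params`) are collapsed into positional regex groups, `target_expn` and `http_headers` are omitted, and the example rules are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RewriteRule:
    nice_match: str                           # regex matched against the whole path
    target_compose: str                       # Python re template using \g<n>
    accept_pattern: Optional[str] = None      # regex tested against the Accept header
    http_redirect_code: Optional[int] = None  # None => internal rewrite


def apply_rule(rule: RewriteRule, path: str,
               accept: str) -> Optional[Tuple[Optional[int], str]]:
    """Return (redirect code or None, target URL), or None if the rule does not apply."""
    match = re.fullmatch(rule.nice_match, path)
    if match is None:
        return None
    # A rule with an accept_pattern only fires when the Accept header matches it.
    if rule.accept_pattern is not None and not re.search(rule.accept_pattern, accept):
        return None
    target = match.expand(rule.target_compose)
    if rule.http_redirect_code in (301, 302, 303):
        return rule.http_redirect_code, target  # client gets a Location header
    return None, target  # server processes the rewritten URL internally


# Hypothetical rules for the Northwind customer URIs.
RDF_RULE = RewriteRule(r"/Northwind/Customer/([^/#]+)", r"/Northwind/Customer/\g<1>/data",
                       accept_pattern=r"application/rdf\+xml", http_redirect_code=303)
HTML_RULE = RewriteRule(r"/Northwind/Customer/([^/#]+)", r"/Northwind/customer.vsp?id=\g<1>")

print(apply_rule(RDF_RULE, "/Northwind/Customer/ALFKI", "application/rdf+xml"))
print(apply_rule(HTML_RULE, "/Northwind/Customer/ALFKI", "text/html"))
```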
URL Rewriter Verification with curl
curl utility provides a useful tool for verifying HTTP server responses and rewriting rules
$ curl -I -H "Accept: application/rdf+xml" http://demo.openlinksw.com/Northwind/Customer/ALFKI
HTTP/1.1 303 See Other
Server: Virtuoso/05.09.3037 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Thu, 12 Feb 2009 11:23:31 GMT
Accept-Ranges: bytes
Location: http://demo.openlinksw.com/sparql?query=DESCRIBE+%3Chttp%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%23this%3E+%3Chttp%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%3E+FROM+%3Chttp%3A//demo.openlinksw.com/Northwind%3E&format=application%2Frdf%2Bxml
Content-Length: 0
Note: default rule for RDF requests changed to return HTTP response 303, rather than use an internal redirect, to allow the generated SPARQL query to be viewed and checked with curl
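The percent-encoded Location header above is hard to read. Decoding its query string recovers the generated SPARQL query. Here is a short Python sketch using only the standard library, with the Location value copied from the response above:

```python
from urllib.parse import parse_qs, urlsplit

# Location header value from the 303 response above.
LOCATION = ("http://demo.openlinksw.com/sparql?query=DESCRIBE+%3Chttp"
            "%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%23this%3E+%3Chttp"
            "%3A//demo.openlinksw.com%2FNorthwind%2FCustomer%2FALFKI%3E+FROM+%3Chttp"
            "%3A//demo.openlinksw.com/Northwind%3E&format=application%2Frdf%2Bxml")

# parse_qs decodes '+' to a space and %XX escapes to their characters.
params = parse_qs(urlsplit(LOCATION).query)
print(params["query"][0])
print(params["format"][0])
```

Decoded, the query describes both the entity URI (`...ALFKI#this`) and its document URI (`...ALFKI`), drawn from the Northwind graph.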
Browsing & Exploring Linked Data
OpenLink Data Explorer (ODE)
Browser extension (Firefox, support for others to follow)
See http://ode.openlinksw.com
RDF and HTML views of Linked Data
RDF view incorporates âhyperdataâ links between entities
HTML view substitutes hyperlinks
Also available as a hosted service
E.g. http://demo.openlinksw.com/ode
iSparql Query Tool
Interactive SPARQL Query Builder
E.g. http://demo.openlinksw.com/isparql
See http://wikis.openlinksw.com/dataspace/owiki/wiki/OATWikiWeb/InteractiveSparqlQueryBuilder
Content Negotiation Revisited: TCN
Virtuoso supports two flavours of content negotiation:
HTTP/1.1 style content negotiation (introduced earlier)
Server-driven negotiation only
Transparent Content Negotiation (TCN)
Server-driven or agent-driven negotiation
Suitably enabled user agents / browsers can take advantage of TCN
Non-TCN capable user agents continue to be handled using HTTP/1.1 content negotiation
Transparent Content Negotiation
A protocol defined by RFC2295, layered on top of HTTP/1.1
Addresses deficiencies in HTTP/1.1 content negotiation
Limited to server selecting best variant (server-driven negotiation)
Server doesn't always know/select best variant
User agent might often be better placed to decide what is best for its needs
Inefficient
Sending details of user agent's capabilities and preferences with every request is inefficient
Large number of Accept headers required
Very few Web resources have multiple variants
Transparent Content Negotiation (continued)
Supports variant selection by user agent or by server
Transparent - all variants on server are visible to the agent
Variant Selection by User Agent:
User agent chooses best variant itself from variant list sent by server
Requires sending fewer/smaller "Accept" headers
Variant Selection by Server:
User agent can instruct server to select best variant on its behalf
Server uses "remote variant selection algorithm" (RFC2296)
TCN Basic Mechanics
Client
Supplies Negotiate* request header
Content negotiation directives include:
"trans" => user agent supports TCN for the current request
"vlist" - user agent wants a variant list for the resource
Variant list is expressed as an Alternates header.
Implies "trans".
"*" - user agent allows servers and proxies to run any remote variant selection algorithm
Server
Returns a TCN* response header signalling that the resource is transparently negotiated and either a choice or a list response as appropriate
*New headers introduced by RFC2295
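Here is a deliberately simplified reading of the Negotiate directives listed above, sketched in Python. It reproduces the behaviours shown in the examples that follow ("*" yields a choice response, "vlist" yields a list response). It ignores RFC 2295 details such as RVSA version tokens and "guess-small". The function name is hypothetical.

```python
def tcn_response_kind(negotiate_header: str) -> str:
    """Classify a request by its Negotiate header (simplified RFC 2295 reading).

    Returns "none" (plain HTTP/1.1 negotiation), "list" (300 Multiple Choices
    with an Alternates header) or "choice" (server picks a variant itself).
    """
    directives = {d.strip() for d in negotiate_header.split(",") if d.strip()}
    if not directives:
        return "none"    # agent does not use TCN for this request
    if "vlist" in directives:
        return "list"    # agent explicitly asked for the variant list
    if "*" in directives:
        return "choice"  # any remote variant selection algorithm is allowed
    return "list"        # e.g. "trans" alone: the agent chooses from the list


for header in ("*", "vlist", "trans", ""):
    print(repr(header), "->", tcn_response_kind(header))
```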
Example: Preferred format XML
Assumes Virtuoso WebDAV server contains 3 variants of resource named "page":
/DAV/TCN/page.xml
/DAV/TCN/page.html
/DAV/TCN/page.txt
User agent indicates preference for XML
$ curl -i -H "Accept: text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3" \
-H "Negotiate: *" http://demo.openlinksw.com/DAV/TCN/page
HTTP/1.1 200 OK
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: Keep-Alive
Date: Wed, 31 Oct 2009 15:44:07 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: page.xml
Content-Type: text/xml
ETag: "8b09f4b8e358fcb7fd1f0f8fa918973a"
Content-Length: 39
<?xml version="1.0" ?>
<a>some xml</a>
Example: Preferred format HTML
User agent indicates preference for HTML
$ curl -i -H "Accept: text/xml;q=0.3,text/html;q=1.0,text/plain;q=0.5,*/*;q=0.3" \
-H "Negotiate: *" http://demo.openlinksw.com/DAV/TCN/page
HTTP/1.1 200 OK
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: Keep-Alive
Date: Wed, 31 Oct 2009 15:43:18 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: page.html
Content-Type: text/html
ETag: "14056a25c066a6e0a6e65889754a0602"
Content-Length: 49
<html>
<body>
some html
</body>
</html>
Example: Variant list request
User agent asks for a list of variants
$ curl -i -H "Accept: text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3" \
-H "Negotiate: vlist" http://localhost:8890/DAV/TCN/page
HTTP/1.1 300 Multiple Choices
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Wed, 31 Oct 2009 15:44:35 GMT
Accept-Ranges: bytes
TCN: list
Vary: negotiate,accept
Alternates: {"page.html" 0.900000 {type text/html}}, {"page.txt" 0.500000 {type text/plain}}, {"page.xml" 1.000000 {type text/xml}}
Content-Length: 368
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head><title>300 Multiple Choices</title></head>
<body><h1>Multiple Choices</h1>Available variants:<ul>
<li><a href="page.html">HTML variant</a>, type text/html</li>
<li><a href="page.txt">Text document</a>, type text/plain</li>
<li><a href="page.xml">XML variant</a>, type text/xml</li>
</ul></body></html>
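With the Alternates header in hand, a TCN-capable agent can choose a variant itself. Here is a rough Python sketch. It assumes the overall score is simply the variant's source quality multiplied by the agent's own q-value for its type. The RFC 2296 algorithm also weighs language, charset and features, which are ignored here.

```python
import re
from typing import Dict, List, Optional, Tuple

# Alternates header from the 300 Multiple Choices response above.
ALTERNATES = ('{"page.html" 0.900000 {type text/html}}, {"page.txt" 0.500000 {type '
              'text/plain}}, {"page.xml" 1.000000 {type text/xml}}')

# One variant description: {"uri" source-quality {type media/type}}
_VARIANT = re.compile(r'\{"([^"]+)"\s+([\d.]+)\s+\{type\s+([^}]+)\}\}')


def parse_alternates(header: str) -> List[Tuple[str, float, str]]:
    return [(uri, float(qs), mime.strip()) for uri, qs, mime in _VARIANT.findall(header)]


def pick_variant(header: str, prefs: Dict[str, float]) -> Optional[str]:
    """Agent-side choice: maximise source quality x the agent's own q-value."""
    scored = [(qs * prefs.get(mime, prefs.get("*/*", 0.0)), uri)
              for uri, qs, mime in parse_alternates(header)]
    if not scored:
        return None
    best_score, best_uri = max(scored)
    return best_uri if best_score > 0 else None


# Agent preferences taken from the Accept headers of the two earlier examples.
XML_FIRST = {"text/xml": 1.0, "text/html": 0.7, "text/plain": 0.5, "*/*": 0.3}
HTML_FIRST = {"text/xml": 0.3, "text/html": 1.0, "text/plain": 0.5, "*/*": 0.3}
print(pick_variant(ALTERNATES, XML_FIRST))
print(pick_variant(ALTERNATES, HTML_FIRST))
```

With the XML-first preferences this picks page.xml (1.0 x 1.0 beats 0.9 x 0.7). With the HTML-first preferences it picks page.html (0.9 x 1.0). Both agree with the server's choices in the two "TCN: choice" responses above.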
TCN Configuration: Variant Description
Variant descriptions held in SQL table HTTP_VARIANT_MAP
Added/updated/removed through Virtuoso/PL or Conductor UI
create table DB.DBA.HTTP_VARIANT_MAP (
VM_ID integer identity, -- unique ID
VM_RULELIST varchar, -- HTTP rule list name
VM_URI varchar, -- name of requested resource e.g. 'page'
VM_VARIANT_URI varchar, -- name of variant e.g. 'page.xml','page.de.html' etc.
VM_QS float, -- Source quality, number in the range 0.001-1.000, with 3 digit precision
VM_TYPE varchar, -- Content type of the variant e.g. text/xml
VM_LANG varchar, -- Content language e.g. 'en', 'de' etc.
VM_ENC varchar, -- Content encoding e.g. 'utf-8', 'ISO-8859-1' etc.
VM_DESCRIPTION long varchar, -- human readable variant description e.g. 'Profile in RDF format'
VM_ALGO int default 0, -- reserved for future use
primary key (VM_RULELIST, VM_URI, VM_VARIANT_URI)
);
create unique index HTTP_VARIANT_MAP_ID on DB.DBA.HTTP_VARIANT_MAP (VM_ID);
TCN Configuration via Conductor UI
TCN Configuration via Virtuoso/PL
Adding or Updating a Resource Variant
DB.DBA.HTTP_VARIANT_ADD (
in rulelist_uri varchar, -- HTTP rule list name
in uri varchar, -- Requested resource name e.g. 'page'
in variant_uri varchar, -- Variant name e.g. 'page.xml', 'page.de.html' etc.
in mime varchar, -- Content type of the variant e.g. text/xml
in qs float := 1.0, -- Source quality, a floating point number with 3 digit precision in 0.001-1.000 range
in description varchar := null, -- a human readable description of the variant e.g. 'Profile in RDF format'
in lang varchar := null, -- Content language e.g. 'en', 'bg', 'de' etc.
in enc varchar := null -- Content encoding e.g. 'utf-8', 'ISO-8859-1' etc.
)
Removing a Resource Variant
DB.DBA.HTTP_VARIANT_REMOVE (
in rulelist_uri varchar, -- HTTP rule list name
in uri varchar, -- Name of requested resource e.g. 'page'
in variant_uri varchar := '%' -- Variant name filter
)
TCN Configuration Example
Adding resource variant descriptions
Define variant descriptions & associate them with a rule list
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.html', 'text/html',
0.900000, 'HTML variant');
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.txt', 'text/plain',
0.500000, 'Text document');
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.xml', 'text/xml',
1.000000, 'XML variant');
Define a virtual directory & associate the rule list with it
DB.DBA.VHOST_DEFINE (lpath=>'/DAV/TCN/', ppath=>'/DAV/TCN/', is_dav=>1,
vsp_user=>'dba', opts=>vector ('url_rewrite', 'http_rule_list_1'));
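Variant rows like these could yield the Alternates header seen in the variant-list example. The sketch below uses an in-memory SQLite table as a stand-in for DB.DBA.HTTP_VARIANT_MAP. Only a few columns are modelled, and `variant_add` merely mimics the add-or-update behaviour of DB.DBA.HTTP_VARIANT_ADD; neither is Virtuoso code. Rows are sorted by variant name to give a stable order.

```python
import sqlite3

# In-memory stand-in for DB.DBA.HTTP_VARIANT_MAP (subset of columns only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE http_variant_map (
           vm_rulelist TEXT, vm_uri TEXT, vm_variant_uri TEXT,
           vm_type TEXT, vm_qs REAL, vm_description TEXT,
           PRIMARY KEY (vm_rulelist, vm_uri, vm_variant_uri))"""
)


def variant_add(rulelist, uri, variant_uri, mime, qs=1.0, description=None):
    """Add or update one variant row (INSERT OR REPLACE on the primary key)."""
    conn.execute(
        "INSERT OR REPLACE INTO http_variant_map VALUES (?, ?, ?, ?, ?, ?)",
        (rulelist, uri, variant_uri, mime, qs, description),
    )


def alternates_header(rulelist, uri):
    """Render the variants of one resource in Alternates-header form."""
    rows = conn.execute(
        "SELECT vm_variant_uri, vm_qs, vm_type FROM http_variant_map "
        "WHERE vm_rulelist = ? AND vm_uri = ? ORDER BY vm_variant_uri",
        (rulelist, uri),
    ).fetchall()
    return ", ".join('{"%s" %f {type %s}}' % row for row in rows)


variant_add("http_rule_list_1", "page", "page.html", "text/html", 0.9, "HTML variant")
variant_add("http_rule_list_1", "page", "page.txt", "text/plain", 0.5, "Text document")
variant_add("http_rule_list_1", "page", "page.xml", "text/xml", 1.0, "XML variant")
print(alternates_header("http_rule_list_1", "page"))
```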
