TODO
TODO, citing [[RDFSTARFOUNDATION]]
TODO (the purpose of this section will be to provide an informal introduction to the approach for practitioners)
The syntax of RDF is defined in two layers:
Similarly, this document defines the abstract syntax of RDF*, and one concrete syntax based on Turtle [[TURTLE]].
An RDF* graph is a set of RDF* triples.
An RDF* triple is a 3-tuple defined recursively as follows:
As for RDF triples, we call the 3 components of an RDF* triple its subject, predicate and object, respectively. From the definitions above, it follows that any RDF graph is also an RDF* graph. Note also that, by definition, an RDF* triple cannot contain itself and cannot be nested infinitely.
IRIs, literals, blank nodes and RDF* triples are collectively known as RDF* terms.
For every RDF* triple t, we define its constituent terms (or simply constituents) as the set containing its subject, its predicate, its object, plus all the constituent terms of its subject and/or its object if they are themselves RDF* triples. By extension, we define the constituent terms of an RDF* graph to be the union set of the constituent terms of all its triples.
Its set of constituent terms comprises the IRIs `:name`, `:statedBy`, `:bob`, the blank node `_:a`, the literal `"Alice"`, and the triple `<< _:a :name "Alice" >>`.
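The recursive definition of constituent terms can be sketched in Python. This is only an illustration under assumed conventions that are not part of this document: terms are plain strings (IRIs such as `:name`, blank nodes prefixed with `_:`, literals kept in double quotes), an RDF* triple is a Python tuple, and the function names are hypothetical.

```python
def constituents(triple):
    """Return the set of constituent terms of an RDF* triple (modelled as a tuple)."""
    s, p, o = triple
    terms = {s, p, o}
    for component in (s, o):
        if isinstance(component, tuple):  # subject or object is itself a triple
            terms |= constituents(component)
    return terms


def graph_constituents(graph):
    """Union of the constituent terms of all triples of an RDF* graph."""
    result = set()
    for triple in graph:
        result |= constituents(triple)
    return result


embedded = ("_:a", ":name", '"Alice"')
asserted = (embedded, ":statedBy", ":bob")
print(constituents(asserted))
```

With these inputs, the computed set contains the three IRIs, the blank node, the literal and the embedded triple itself.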
An RDF* triple used as the subject or object of another RDF* triple is called an embedded triple. An RDF* triple that is an element of an RDF* graph is called an asserted triple. Note that, in a given RDF* graph, the same triple MAY be both embedded and asserted.
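The distinction between embedded and asserted triples can be illustrated with a short Python sketch. It assumes, for illustration only, that an RDF* graph is a Python set of tuples and that terms are plain strings; the names `embedded_triples`, `claim` and the example IRIs are hypothetical.

```python
def embedded_triples(graph):
    """Collect every triple used as subject or object of another triple, at any depth."""
    found = set()

    def visit(triple):
        s, _, o = triple
        for component in (s, o):
            if isinstance(component, tuple):
                found.add(component)
                visit(component)

    for triple in graph:
        visit(triple)
    return found


claim = (":alice", ":age", '"42"')
graph = {
    (claim, ":statedBy", ":bob"),  # claim is embedded here
    claim,                         # and asserted here
}

# Asserted triples are the elements of the graph; a triple may be in both sets.
both = embedded_triples(graph) & graph
print(both)
```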
An RDF* dataset is a collection of RDF* graphs, and comprises:
Again, this definition is an extension of the notion of RDF dataset, hence it follows that any RDF dataset is also an RDF* dataset.
In this section, we present Turtle*, an extension of the Turtle format [[TURTLE]] that allows RDF* graphs to be represented. For the sake of conciseness, we only describe here the differences between Turtle* and Turtle.
Turtle* is defined to follow the same grammar as Turtle, except for the EBNF productions specified below, which replace the productions with the same number (if any) in the original grammar.
[10]  `subject`  ::=  iri | BlankNode | collection | embTriple
[12]  `object`  ::=  iri | BlankNode | collection | blankNodePropertyList | literal | embTriple
[27]  `embTriple`  ::=  '<<' embSubject verb embObject '>>'
[28]  `embSubject`  ::=  iri | BlankNode | embTriple
[29]  `embObject`  ::=  iri | BlankNode | literal | embTriple
The only changes are that `subject` and `object` productions have been extended to accept embedded triples, which are described by the new productions 27 to 29. Note that embedded triples accept a more restricted range of subject and object expressions than asserted triples.
A Turtle* parser is similar to a Turtle parser as defined in Section 7 of the Turtle specification [[TURTLE]], with an additional item in its state, the curObject.
Additionally, the curSubject can be bound to any RDF* term (including an embedded triple).
A Turtle* document defines an RDF* graph composed of a set of RDF* triples. The `subject` and `embSubject` productions set the curSubject. The `verb` production sets the curPredicate. The `embObject` production sets the curObject. For each `object` N, an RDF* triple curSubject curPredicate N is generated and added to the RDF* graph.
Beginning the `embTriple` production records the curSubject and curPredicate. Finishing the `embTriple` production yields the RDF* triple curSubject curPredicate curObject and restores the recorded values of curSubject and curPredicate.
All other productions MUST be handled as specified by Section 7 of the Turtle specification [[TURTLE]], while still applying the changes above recursively.
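As an informal illustration of productions 27 to 29, the following Python sketch parses a single statement whose subject and object may be embedded triples. It is a recursive-descent simplification, not the stateful parser described above: the call stack plays the role of recording and restoring curSubject and curPredicate. The function names are hypothetical, and the sketch assumes a tiny token set (IRIs in angle brackets or prefixed names, blank node labels, plain string literals), with no prefix handling, collections, property lists, or validation of the verb.

```python
import re

# '<<' and '>>' must be tried before the '<...>' IRI alternative.
TOKEN = re.compile(r'<<|>>|<[^<>\s]*>|"[^"]*"|_:\w+|\w*:\w+|\.')


def tokenize(text):
    return TOKEN.findall(text)


def parse_term(tokens, pos, allow_literal):
    """Parse an IRI, blank node, literal or embedded triple starting at tokens[pos]."""
    tok = tokens[pos]
    if tok == "<<":
        # embTriple ::= '<<' embSubject verb embObject '>>'
        subject, pos = parse_term(tokens, pos + 1, allow_literal=False)
        predicate, pos = tokens[pos], pos + 1
        obj, pos = parse_term(tokens, pos, allow_literal=True)
        if tokens[pos] != ">>":
            raise SyntaxError("expected '>>'")
        return (subject, predicate, obj), pos + 1
    if tok.startswith('"') and not allow_literal:
        raise SyntaxError("a literal is not allowed in subject position")
    if tok in (">>", "."):
        raise SyntaxError(f"unexpected token {tok!r}")
    return tok, pos + 1


def parse_statement(text):
    """Parse one 'subject verb object .' statement into a nested tuple."""
    tokens = tokenize(text)
    subject, pos = parse_term(tokens, 0, allow_literal=False)
    predicate, pos = tokens[pos], pos + 1
    obj, pos = parse_term(tokens, pos, allow_literal=True)
    if tokens[pos:] != ["."]:
        raise SyntaxError("expected '.' at end of statement")
    return (subject, predicate, obj)


print(parse_statement('<< _:a :name "Alice" >> :statedBy :bob .'))
```

Embedded triples nest to any depth because `parse_term` calls itself for `embSubject` and `embObject`.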
While this document specifies only one concrete syntax, nothing prevents other concrete syntaxes of RDF* from being proposed. For instance, other existing concrete syntaxes for RDF, such as RDF/XML [[RDFSYNTAXGRAMMAR]], could be extended to support RDF*. In particular, since the N-Triples syntax [[NTRIPLES]] is a subset of Turtle, an appropriate subset of Turtle* could be defined to extend N-Triples accordingly.
This section introduces SPARQL*, which is an RDF*-aware extension of the RDF query language SPARQL [[SPARQL11QUERY]]; i.e., SPARQL* can be used to query RDF* graphs.
In the following, we introduce a number of SPARQL*-specific definitions, which rely on the following notions, defined in [[[SPARQL11QUERY]]] [[SPARQL11QUERY]]: RDF term, query variable, triple pattern, property path pattern, property path expression, and solution mapping.
A SPARQL* triple pattern is a 3-tuple that is defined recursively as follows:
As for RDF* triples, a SPARQL* triple pattern MUST NOT contain itself.
A SPARQL* basic graph pattern (BGP*) is a set of SPARQL* triple patterns.
A SPARQL* property path pattern is a 3-tuple (s,p,o) where
A SPARQL* solution mapping μ is a partial function from the set of all query variables to the union set of all RDF* terms. The domain of μ, denoted by dom(μ), is the set of query variables for which μ is defined.
The notion of a SPARQL* solution mapping extends the notion of a standard SPARQL solution mapping; that is, every SPARQL solution mapping is a SPARQL* solution mapping. However, in contrast to SPARQL solution mappings, SPARQL* solution mappings may also map variables to RDF* triples.
SPARQL* is defined to follow the same grammar as SPARQL, except for the EBNF productions specified below, which replace the productions with the same number (if any) in the original grammar.
[60]  `Bind`  ::=  'BIND' '(' ( Expression | EmbTP ) 'AS' Var ')'
[75]  `TriplesSameSubject`  ::=  VarOrTermOrEmbTP PropertyListNotEmpty | TriplesNode PropertyList
[80]  `Object`  ::=  GraphNode | EmbTP
[81]  `TriplesSameSubjectPath`  ::=  VarOrTermOrEmbTP PropertyListPathNotEmpty | TriplesNode PropertyListPath
[105]  `GraphNodePath`  ::=  VarOrTermOrEmbTP | TriplesNodePath
[174]  `EmbTP`  ::=  '<<' EmbSubjectOrObject Verb EmbSubjectOrObject '>>'
[175]  `EmbSubjectOrObject`  ::=  Var | BlankNode | iri | RDFLiteral | NumericLiteral | BooleanLiteral | EmbTP
[176]  `VarOrTermOrEmbTP`  ::=  Var | GraphTerm | EmbTP
This introduces a notation for embedded triple patterns (productions [174] and following), which is similar to the one defined for embedded triples in Turtle*, but also accepts variables. These embedded triple patterns are allowed in the subject ([75], [81]) and object ([80], [105]) positions of SPARQL* triple patterns, as well as in BIND statements ([60]).
Based on the SPARQL grammar, the SPARQL specification defines the process of converting graph patterns and solution modifiers in a SPARQL query string into a SPARQL algebra expression [SPARQL11QUERY, Section 18.2]. This process must be adjusted to consider the extended grammar introduced above. In the following, any step of the conversion process that requires adjustment is discussed.
As a basis of the translation, the SPARQL specification introduces a notion of in-scope variables. To cover the new syntax elements introduced by the extended grammar, this notion MUST be extended as follows.
not [be] in-scope from the preceeding elements in the group graph pattern in which [the BIND clause] is used [SPARQL11QUERY, Section 18.2.1].
The translation process starts with expanding abbreviations for IRIs and triple patterns [SPARQL11QUERY, Section 18.2.2.1]. This step MUST be extended in two ways:
Abbreviations for triple patterns with embedded triple patterns MUST be expanded as if each embedded triple pattern were a variable (or an RDF term).
must be expanded to
Abbreviations for IRIs in all embedded triple patterns MUST be expanded.
must be expanded to
The translation of property path patterns has to be adjusted because the extended grammar allows for property path patterns whose subject or object is an embedded triple pattern.
The translation as specified in the W3C specification distinguishes four cases. The first three of these cases do not require adjustment because they are taken care of either by recursion or by the adjusted translation of basic graph patterns (as defined below). However, the fourth case MUST be adjusted as follows.
Let X P Y be a string that corresponds to the fourth case in [SPARQL11QUERY, Section 18.2.2.4]. Given the extended grammar, X and Y may each be an RDF term, a variable, or an embedded triple pattern (and P is a property path expression). The string X P Y is translated to the algebra expression `Path`(X’,P,Y’) where X’ and Y’ are the results of calling a function named `Lift` for X and Y, respectively. For some input string Z (such as X or Y) that can be an RDF term, a variable, or an embedded triple pattern, the function `Lift` is defined as follows:
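One possible reading of `Lift` is sketched below in Python. It assumes that `Lift` turns the syntax of an embedded triple pattern into a nested SPARQL* triple pattern by lifting its subject and object recursively, and returns any other input unchanged. The class `EmbTP`, the function names and the tuple encoding of algebra expressions are hypothetical and serve only this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbTP:
    """Syntax node for an embedded triple pattern << s p o >> (hypothetical name)."""
    s: object
    p: object
    o: object


def lift(z):
    """Assumed behaviour of Lift: recurse into embedded triple patterns, keep the rest."""
    if isinstance(z, EmbTP):
        return (lift(z.s), z.p, lift(z.o))
    return z  # RDF term or variable


def translate_path(x, p, y):
    """Fourth case of the property path translation: Path(X', P, Y')."""
    return ("Path", lift(x), p, lift(y))


pattern = EmbTP(EmbTP("?s", ":p", "?o"), ":q", ":b")
print(lift(pattern))
print(translate_path(EmbTP("?s", ":p", "?o"), ":knows+", "?x"))
```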
After translating property path patterns, the translation process collects any adjacent triple patterns [...] to form a basic graph pattern [SPARQL11QUERY, Section 18.2.2.5]. This step has to be adjusted because triple patterns in the extended syntax may have an embedded triple pattern in their subject position or in their object position (or in both). To ensure that every result of this step is a BGP*, before adding a triple pattern to its corresponding collection, its subject and object MUST be replaced by the result of calling function `Lift` for the subject and the object, respectively.
The extended grammar allows for BIND clauses with an embedded triple pattern. The translation of such a BIND clause to a SPARQL algebra expression requires a new algebra symbol:
Note that this symbol corresponds to SPARQL* expressions of the form (`tp` AS `?v`).
Then, any string of the form `BIND( T AS v )` with T being an embedded triple pattern (i.e., not a standard BIND expression) is translated to the algebra expression `TR`(T’, v) where T’ is the result of the function `Lift` for T.
Note that the translation of BIND clauses with an embedded triple pattern as defined in this section is used during the translation of group graph patterns. In that translation, BIND clauses with an embedded triple pattern are covered by the last, “catch all other” `IF` statement (i.e., the `IF` statement with the condition `E is any other form`) and not by the `IF` statement for BIND clauses with an expression.
TODO here the definitions of SPARQL* expression, evaluation...
The SPARQL specification defines a function eval(D(G), algebra expression) as the evaluation of an algebra expression with respect to a dataset D having active graph G [SPARQL11QUERY, Section 18.6]. Recall that the active graph G in the context of SPARQL* is an RDF* graph, and so is any other graph in dataset D. The definition of function eval is recursive; the two base cases of this definition for SPARQL* are given as follows:
For any other algebra expression, the SPARQL specification defines algebra operators [[SPARQL11QUERY]]. These definitions can be extended naturally to operate over multisets of SPARQL* solution mappings (instead of ordinary solution mappings). Given this extension, the recursive steps of the definition of function eval for SPARQL* are the same as in the SPARQL specification.
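To give an intuition of how a BGP* might be evaluated over an RDF* graph, the following Python sketch computes the solution mappings under which every pattern of the BGP* becomes an asserted triple of the graph. It is a naive illustration, not the normative definition of eval: it assumes that patterns contain no blank nodes, that variables are strings starting with `?`, that triples and triple patterns are tuples, and that a solution mapping is a Python dict. All names and the example data are hypothetical.

```python
def match_term(pattern, term, mu):
    """Extend mapping mu so that pattern matches term; return None if impossible."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in mu:
            return mu if mu[pattern] == term else None
        return {**mu, pattern: term}  # a variable may be bound to an RDF* triple
    if isinstance(pattern, tuple):
        if not isinstance(term, tuple):
            return None
        for sub_pattern, sub_term in zip(pattern, term):
            mu = match_term(sub_pattern, sub_term, mu)
            if mu is None:
                return None
        return mu
    return mu if pattern == term else None


def eval_bgp(graph, bgp):
    """Return the list of solution mappings for a BGP* over an RDF* graph."""
    solutions = [{}]
    for pattern in bgp:
        solutions = [
            extended
            for mu in solutions
            for triple in graph  # only asserted triples are candidates
            if (extended := match_term(pattern, triple, mu)) is not None
        ]
    return solutions


graph = {((":bob", ":name", '"Charlie"'), ":believedBy", ":alice")}
print(eval_bgp(graph, [(":bob", ":name", "?n")]))
print(eval_bgp(graph, [(("?s", ":name", "?n"), ":believedBy", ":alice")]))
print(eval_bgp(graph, [("?t", ":believedBy", ":alice")]))
```

The first query finds nothing because the embedded triple is not asserted, while the last one binds `?t` to the embedded triple itself.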
In SPARQL, queries can take four forms: SELECT, CONSTRUCT, DESCRIBE, and ASK (see SPARQL 1.1 Query, Section 16 [[SPARQL11QUERY]]). The first of these returns a query solution as a set of variable bindings. The second and third both return an RDF graph, and the last returns a boolean value.
The result of the ASK query form is not changed by the introduction of RDF*, and the result of the CONSTRUCT and DESCRIBE forms can be represented by Turtle*. However, since the SELECT form deals with returning individual RDF terms, the specific serialization formats for representing such query results need to be extended so that the new embedded triple RDF term can be represented. In this section, we propose extensions for the two most common formats for this purpose: [[[sparql11resultsjson]]], and [[[rdfsparqlXMLres]]].
The result of a SPARQL SELECT query is serialized in JSON as defined in [[[sparql11resultsjson]]], which specifies a JSON representation of variable bindings to RDF terms (see [sparql11resultsjson, Section 3.2]). To accommodate the new RDF term for embedded triples that RDF* introduces, the table of RDF term JSON representations in [sparql11resultsjson, Section 3.2.2] is extended with the following entry:
{ "type": "triple", "value": { "subject": S, "predicate": P, "object": O } }
where `S`, `P` and `O` are encoded using the same format, recursively.
This term is represented in JSON as follows:
{ "type": "triple", "value": { "subject": { "type": "uri", "value": "http://example.org/alice" }, "predicate": { "type": "uri", "value": "http://example.org/name" }, "object": { "type": "literal", "value": "Alice", "datatype": "http://www.w3.org/2001/XMLSchema#string" } } }
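The recursive encoding can be sketched in Python as follows. The sketch assumes that an embedded triple is modelled as a tuple and that every other term is already given as a dictionary in the SPARQL JSON results format; the function name `term_to_json` is hypothetical.

```python
import json


def term_to_json(term):
    """Encode an RDF* term; triples (tuples) are encoded recursively."""
    if isinstance(term, tuple):
        s, p, o = term
        return {
            "type": "triple",
            "value": {
                "subject": term_to_json(s),
                "predicate": term_to_json(p),
                "object": term_to_json(o),
            },
        }
    return term  # assumed to be an already encoded IRI, literal or blank node


alice = {"type": "uri", "value": "http://example.org/alice"}
name = {"type": "uri", "value": "http://example.org/name"}
label = {
    "type": "literal",
    "value": "Alice",
    "datatype": "http://www.w3.org/2001/XMLSchema#string",
}

encoded = term_to_json((alice, name, label))
print(json.dumps(encoded, indent=2))
```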
The result of a SPARQL SELECT query is serialized in XML as defined in [[[rdfsparqlXMLres]]]. This format proposes an XML representation of variable bindings to RDF terms.
To accommodate the new RDF term for embedded triples that RDF* introduces, the list of RDF terms and their XML representations in [rdfsparqlXMLres, Section 2.3.1] is extended as follows:
where `S`, `P` and `O` are encoded recursively, using the same format, without the enclosing `<binding>` tag.
This term is represented in XML as follows:
In this section, we provide a model-theoretic semantics for RDF*, by extending the one defined in [[[RDF11MT]]] [[RDF11MT]].
An RDF* simple interpretation I is a structure consisting of:
This definition is identical to the definition of simple interpretation [[RDF11MT]] up to and including item 5. Items 6 and 7 extend it to support RDF* triples. As a consequence, any RDF simple interpretation can be considered as an RDF* simple interpretation with IT=ITEXT=∅.
We say that an RDF* graph is ground if its set of constituent terms contains no blank node. This is a generalization of the notion of ground RDF graph [[RDF11MT]]. The denotation of a ground RDF* graph in an RDF* simple interpretation I is then given by the following rules, where the interpretation is also treated as a function from expressions (terms, triples and graphs) to elements of the universe and truth values:
Since IL and IT are partial mappings, I(E) may be undefined for some literal or embedded triple E. In that case, E has no semantic value in I, so any triple containing it will be false, hence any graph containing that triple will also be false.
An invertible mapping from the set of blank nodes into itself is called a blank node renaming. By extension, we define the application of a blank node renaming R to other RDF* terms and to RDF* graphs as follows:
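Applying a blank node renaming can be sketched in Python. The sketch assumes that the renaming acts on blank nodes, is propagated recursively into embedded triples, and leaves every other term unchanged. It further assumes that a renaming is given as a dict that permutes a finite set of blank node labels and is the identity elsewhere, that blank nodes are strings starting with `_:`, and that triples are tuples; all names are hypothetical.

```python
def is_renaming(r):
    """A dict extended by the identity is invertible iff it permutes its own keys."""
    return set(r.keys()) == set(r.values())


def rename(term, r):
    """Apply renaming r to an RDF* term, recursing into embedded triples."""
    if isinstance(term, tuple):
        return tuple(rename(component, r) for component in term)
    if isinstance(term, str) and term.startswith("_:"):
        return r.get(term, term)
    return term  # IRIs and literals are left untouched


def rename_graph(graph, r):
    """Apply renaming r to every triple of an RDF* graph."""
    return {rename(triple, r) for triple in graph}


swap = {"_:y": "_:x", "_:x": "_:y"}
triple = (("_:y", ":name", '"Alice"'), ":statedBy", ":bob")
print(is_renaming(swap))
print(rename(triple, swap))
```

Note that mapping `_:y` to `_:x` alone would not be invertible once extended by the identity, which is why the example swaps the two labels.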
Suppose I is an RDF* simple interpretation and A is a mapping from a set of blank nodes to the universe IR of I. Define the mapping [I+A] of RDF* terms into IR to be A on blank nodes of the set, and I on any other term; and extend this mapping to RDF* triples and RDF* graphs using the rules given above for ground graphs. Then the denotation of any RDF* graph in I is given by:
Following [[[RDF11MT]]], we extend the notions of satisfiability and entailment. An RDF* simple interpretation I satisfies E when I(E)=true. E is (simply) satisfiable when an RDF* simple interpretation exists which satisfies it, otherwise (simply) unsatisfiable. An RDF* graph G simply entails an RDF* graph H when every interpretation which satisfies G also satisfies H. If two RDF* graphs G and H each entail the other then they are logically equivalent.
Any semantic extension of RDF MAY be extended to RDF* by replacing the semantic conditions, the notion of satisfiability and the notion of entailment, defined in [[[RDF11MT]]], with their corresponding extensions defined above. This is notably the case for Datatype entailment and RDFS entailment.
In this section, we discuss a number of desired features of RDF* semantics, in order to shed light on the design choices made in the previous section.
RDF* must be able to quote a triple without asserting it, so that we can represent people's beliefs without endorsing them, or facts that are no longer or not yet true. This is ensured by the fact that the semantic condition on embedded triples (introduced in RDF*) is different from the one on asserted triples (inherited from RDF).
For example, the following graph: does not entail `:bob :name "Charlie"`, and the SPARQL* query below executed against the graph above would return no result.
Blank nodes in embedded triples have the same scope as blank nodes used in the subject or object position of asserted triples (usually the whole graph or the whole dataset in which they appear). This means that the same blank node identifier used in different embedded triples, or at different levels of nesting, will refer to the same thing.
For example, in the following graph:
the three occurrences of `_:x` must refer to the same resource in every interpretation of the graph. In other words, it must be the same resource that Alice knows, that she believes is named "Charlie", and that she believes works for ACME.
As a consequence, the following query will return `"Charlie"`:
As another consequence, the following graph does not entail the graph above (because the graph below allows the resource known by Alice to be different from the one about which she has beliefs).
Formally, the second graph is satisfied by an interpretation where:
but this interpretation cannot satisfy the first graph, because the mapping `A` must map `_:x` to both Y (to satisfy IEXT(K)) and to X (to satisfy ITEXT(T1) and ITEXT(T2)).
Embedded triples are referentially opaque, meaning that triples using different terms are considered different, even if their terms can be inferred to be synonyms. Although RDF* simple entailment has no means to entail any kind of synonymy, it is possible in some semantic extensions, such as OWL [[OWL2RDFBASEDSEMANTICS]].
A well-known example is the Superman problem:
Intuitively, this graph states that Superman and Clark Kent are the same person, so if Superman can fly, then it follows that Clark Kent can as well. So, under OWL 2 entailment, this graph entails `:clarkKent :can :fly`. However, Lois Lane does not know that Superman and Clark Kent are the same person. So from her point of view, the two triples are not equivalent, and she can believe one without believing the other.
Referential opacity is ensured by differentiating the intension of embedded triples (represented by the IT mapping) from their extension (represented by ITEXT). Since IT is based solely on the syntax of triples, two syntactically different triples can always have different intensions, even if they are semantically equivalent (i.e. their extensions are identical).
In standard RDF, renaming a blank node does not change the semantics of a graph (provided that the new name was not already in use in that graph). This is ensured by the fact that RDF simple interpretations do not depend on blank node identifiers at all. In RDF*, however, because of the need for referential opacity, the IT mapping does depend on the blank node identifiers used in the embedded triple. This is why the notion of satisfiability in RDF* additionally relies on a blank node renaming.
Consider the following graph:
Every interpretation satisfying this graph must have, for some A, B, C, N, T and X:
Any such interpretation can be shown to also satisfy the graph below:
For this, we must choose R such that R(`_:y`)=`_:x`, and A such that A(`_:x`)=X.