.. include:: common.inc.rst

======================
Web and Semantic Web
======================

Since the Web was invented some 25 years ago, its pervasiveness has grown to the point that remarking on it has become trite.
The Web has become a primary means of communication between colleagues, family members and friends; it is used for work, leisure, shopping, paying taxes, finding a restaurant, a job, or a date...
It connects our computers, our phones, our TV sets, and a growing number of other (and sometimes unexpected) items, such as fridges, cars, electricity meters or flowerpots.
Not only has it changed the amount of information available to us, it has dramatically changed how we acquire, handle and use this information, and turn it into knowledge.
It therefore presents both a challenge and an opportunity: to better understand and to assist these new practices `[@berners-lee:2006framework, chap.5]`.
Since these practices are computer-mediated, using digital traces seems a natural way to achieve those goals.
And since the Web is so pervasive, the available information is bound to be interpreted differently by different agents, or even by the same agent in different contexts, hence the need to take ambivalence into account.

This chapter gathers a variety of works that we have done in a research context, but also in the context of standardization groups, more precisely groups in the W3C, of which Université de Lyon has been a member since 2012.
Those works are presented along three dimensions: the use of activity traces on the Web, the acknowledgment of ambivalence in Web technologies and Web standards, and how those aspects may lead to new paradigms for designing Web applications.

.. index:: MOOC

Putting Web traces to good use
==============================

As stated above, our activity on the Web accounts for so much of our lives that the traces of that activity can provide a huge amount of knowledge about us.
Indeed, Web companies such as Facebook `[@Kramer2012]` or Google `[@Bonneau2015]` have a long history of tapping the traces of their users to provide better targeted services (and advertisements).
Some of them, such as Netflix\ [#netflixprize]_ or Yahoo\ [#yahooopendata]_, have even opened some of their data to a wider research community, in order to find better ways to capture the collective knowledge of a large number of users.
However, as stated in `Section 2.2`:ref:, our approach is more focused on capturing the individual experience of each traced user.
In her presentation of the "small data" approach, `@EstrinSmall2014` explains how health problems could be detected earlier through changes in individual behavioral patterns.
While of primary importance for the users themselves, those changes may be less relevant for the companies currently holding and exploiting the users' traces.
Furthermore, many users would have concerns about those companies monitoring their health status.

Another effect of our traces being exploited out of our control is what `@PariserBubble2011` calls the "filter bubble": the content provided to us by search engines and social networks is tailored to meet our preferences, as computed from our activity traces.
Of course this can be seen as a benefit, helping us find what we are looking for in an overwhelming amount of information.
But on the other hand, those social tools paradoxically *isolate* us from whole parts of the Web, and this is all the more pernicious as they keep an aura of exhaustiveness and objectivity (after all, *they* have access to the whole Web).

Querying the Web
++++++++++++++++

Pariser advocates a way to disable the personalization mechanisms, in order to be able to access the Web more objectively.
This has been supported by some search engines, such as DuckDuckGo\ [#duckduckgo]_ or Qwant\ [#qwant]_.
An alternative, and a way to compensate for the lack of that feature in other systems, is to provide users with access to their traces, and tools to analyze them.
While this would not suppress the filter bubble, it would at least allow users to know which bubble they are in, and how their behavior alters that bubble.
The work on Heystaks presented in `Chapter 2`:ref: provides such an alternative: by choosing a given stak as the context of their Web searches, users consciously select their filter bubble.
Furthermore, they can register to as many staks as they like, allowing them to choose the bias on each of their search results\ [#information_vegetable]_.
Finally, the stak curation tools give users access to the full history of searches performed in a given stak, providing insight into which parts of one's behavior contribute to stak recommendations.

In the PhD work of Rakebul `@HasanPredicting2014`, we focused on analyzing and synthesizing the information of traces, to help users understand and predict the outcome of querying linked data on the Web.
First, machine learning techniques have been applied to a set of SPARQL query evaluations, in order to identify which features of a query are predictive of its execution time.
This differs from other approaches, which rather rely on the structure of the queried data.
Second, by tracing the execution of the query engine itself, explanations in the form of *why-provenance* are generated and provided to users, in order to help them understand the query results, especially when inferences are involved (see `Fig. %s-a <fig:rakeb_explanation>`:numref:).
Such explanations can in turn be published as linked data, using RQ4 `[@HasanGenerating2014]`, an extension of the PROV-O vocabulary `[@LeboProv2013]`.
Finally, as those explanations can become quite verbose for complex queries, a summarization process for explanations (illustrated in `Fig. %s-b <fig:rakeb_explanation>`:numref:) has been proposed and evaluated.

.. figure:: _static/rakeb_explanation.*
   :name: fig:rakeb_explanation
   :figclass: wide

   Examples of a full explanation and a summarized explanation `[@HasanPredicting2014]`

Traces for learning on the Web
++++++++++++++++++++++++++++++

We have also proposed a number of innovative uses of Web traces in the context of COAT\ [#coat]_, an exploratory project aimed at studying the research opportunities of e-learning, more specifically of Massive Open Online Courses (MOOCs).
In those systems, learners are so numerous and so heterogeneous that standard indicators and monitoring tools cannot be used `[@MilleMooc2015]`; more flexible and scalable ones must be proposed.
Furthermore, with |MOOC|\s, the learning activity is no longer confined to the hosting platform, as learners are often pointed to external content.
Meaningful indicators can therefore only be computed by monitoring the learners' activity inside *and outside* the |MOOC| platform.

TraceMe\ [#traceme]_ is a browser extension designed to trace the whole browsing activity of a user.
Since it runs on the client side, it is not restricted to tracing the activity on a given server.
In addition, it can trace interactions that would otherwise be invisible to the server (such as navigation through internal links inside a page, interactions with an embedded video or audio player, |etc|).
But most importantly, TraceMe has to be installed voluntarily by the user, who may enable or disable the tracing at any moment.
TraceMe can also be configured with several |MTMS|\s, and the user can choose on which of them the traces should be collected.
For example, a user may collect her traces on the |MTMS| provided by the |MOOC| when she is browsing content related to the course, and on a personal |MTMS| when she is browsing for other purposes.
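
To make this concrete, the following sketch shows how such per-store routing of browsing events might look.
It is only an illustration, not TraceMe's actual implementation: the obsel attributes, the domain list, the user identifier and the store names are all hypothetical, and far simpler than what a real |MTMS| would receive.

.. code-block:: python

    from urllib.parse import urlparse

    # Hypothetical course-related domains; in TraceMe the choice of
    # target MTMS is configured by the user, not hard-coded like this.
    COURSE_DOMAINS = {"mooc.example.org", "courseware.example.edu"}

    def route_obsel(url, timestamp, event_type):
        """Build an obsel for a browsing event and pick a target MTMS.

        Returns a (store, obsel) pair, where `store` names the MTMS the
        obsel should be collected on.
        """
        obsel = {
            "subject": "learner42",   # hypothetical user identifier
            "begin": timestamp,
            "type": event_type,       # e.g. "PageVisit", "VideoPlay"
            "page": url,
        }
        host = urlparse(url).hostname
        store = "course-mtms" if host in COURSE_DOMAINS else "personal-mtms"
        return store, obsel

    store, obsel = route_obsel("https://mooc.example.org/week1", 1000, "PageVisit")
    # store == "course-mtms": this event goes to the MOOC's MTMS

The point of the sketch is the separation of concerns: the obsel is built the same way regardless of its destination, and the routing decision is a user-controlled policy applied afterwards.
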
This kind of practice is actually encouraged in the emerging standard Tin Can (also known as the Experience API\ [#xapi]_), in which the notion of Learning Record Store (LRS) closely resembles our notion of |MTMS|.
We are currently working on making TraceMe and kTBS (our |MTMS| reference implementation) interoperable with Tin Can\ [#tincan-interoperability]_.

SamoTraceMe\ [#samotraceme]_ is a Web application designed as a companion application to TraceMe.
As illustrated by `fig:samotraceme`:numref:, it provides various ways to visualize one's trace, as well as tools to customize those visualizations.
In order to help learners and teachers analyze traces, SamoTraceMe also provides tools to build, execute and share indicators, |ie| synthetic representations of the information conveyed by the trace.
More precisely, the "Indicator Back-Office" provides user-friendly tools to transform traces (as described in `Chapter 2`:ref:), and to query them using the natural-language interface proposed by `@KongNaturel2015`.
That way, users can explore and design new indicators, better suited to |MOOC|\s than those available in the literature, as emphasized above.
For the same reason, SamoTraceMe not only encourages the building of new indicators, but also their sharing with others (in the last tab, "Indicator Store").
In that sense, it is a tool for learners and teachers as much as for researchers in education sciences, making all of them actors of that research.

.. figure:: _static/samotraceme.png
   :name: fig:samotraceme
   :figclass: wide

   The main screen of SamoTraceMe `[@MilleMooc2015]`.
   It contains (from top to bottom): tabs providing access to the various functionalities; a graphical timeline representing the whole trace; a graphical timeline zooming in on the time window selected in the timeline above; and a hyper-textual representation of the selected time window, as a list of events (obsels).
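
As a minimal illustration of what an indicator is, the sketch below computes the total time spent on each page from a list of obsels.
The obsel attributes are hypothetical and deliberately simplistic; in SamoTraceMe, such indicators are built interactively through the Indicator Back-Office rather than hand-coded.

.. code-block:: python

    from collections import defaultdict

    def time_per_page(obsels):
        """A toy indicator: total time (in seconds) spent on each page.

        Each obsel is assumed to carry hypothetical attributes
        "page", "begin" and "end" (timestamps in seconds).
        """
        totals = defaultdict(int)
        for o in obsels:
            totals[o["page"]] += o["end"] - o["begin"]
        return dict(totals)

    trace = [
        {"page": "/week1/video", "begin": 0,   "end": 120},
        {"page": "/week1/quiz",  "begin": 120, "end": 180},
        {"page": "/week1/video", "begin": 180, "end": 240},
    ]
    # time_per_page(trace) == {"/week1/video": 180, "/week1/quiz": 60}

Even such a trivial aggregation already illustrates the general pattern: an indicator is a transformation from a time-anchored trace to a synthetic representation that learners and teachers can interpret at a glance.
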
Recently, we have started working on Taaabs\ [#taaabs]_, a set of reusable components for visualizing and interacting with traces, aimed at capitalizing on the experience acquired with SamoTraceMe and other works `[@BarazzuttiTransmute2015;@KongNaturel2015]`.
Taaabs relies on Web Components, an upcoming W3C standard `[@GlazkovCustom2016]`, so that each component is available as a custom |HTML| element.
The goal is to make it as easy as possible for developers to add trace-based functionalities to their applications.
A longer-term goal is to allow end users to interactively build their customized dashboard by dragging and dropping visual components.

.. _interoperability:

Interoperability
++++++++++++++++

.. figure:: _static/prov.*
   :name: fig:prov

   PROV Core Structure `[@LeboProv2013]`

We also aim to improve the integration of our trace meta-model (as described in `Chapter 2`:doc:) with other models gaining momentum on the Web.
One of them is PROV `[@MoreauProv2013;@LeboProv2013]`, a standard data model for representing provenance information on the Web, hence concerned with *traceability*.
A central element of this data model (depicted in `fig:prov`:numref:) is the notion of *Activity*, during which the object of the provenance information (the *Entity*) was produced or altered.
This notion of Activity has an obvious kinship with our notion of obsel.
PROV also defines interesting relations between entities.
An entity *specializes* another entity "if it shares all aspects of the latter, and additionally presents more specific aspects"; for example, the second edition of a book is a specialization of that book (as a work); P-A. Champin as a researcher is a specialization of P-A. Champin as a person; P-A. Champin as mentioned in this document is a specialization of P-A. Champin as a researcher.
While the specialized entity inherits the properties of the general one, the opposite is not true.
This makes it possible to make assertions with a limited scope, and hence to have different interpretations coexist in the same knowledge base.
PROV has its own data model, but defines a mapping to |RDF|.

Another model for representing traces on the Web is Activity Streams, an independent format `[@SnellActivity2011]` recently endorsed by the W3C Social Web Working Group `[@SnellActivity2016]`.
This format is intended to represent actions performed by a user, typically in the context of a social network application.
It is extensible, and its most recent version is based on JSON-LD `[@SpornyJson2014]`, making it interoperable with |RDF| and other Semantic Web technologies.
As mentioned above, Tin Can\ [#xapi]_ is another format for capturing traces, focused on the domain of e-learning.
Originally based on Activity Streams, it has since slightly diverged from it.
In particular, it is not based on the |RDF| data model, but `@NiesTincan2015` have proposed an approach to bridge this gap, by mapping it to PROV.
Tin Can is currently being considered in the Hubble project\ [#hubble]_ as an interoperability layer between the different platforms of the partners (including our own).

In his master's thesis, `@CazenaveInteroperability2016` has compared those emerging standards (and others) to our meta-model.
While the latter focuses exclusively on time-anchored elements (obsels), the others allow describing a number of objects that are not (or at least not explicitly) related to time.
We have therefore proposed an extension of our meta-model, where additional information can be attached to a trace, or to an obsel.
The first case is useful for representing background knowledge, which is assumed to hold for the whole duration of the trace (such as the names and addresses of the persons involved in the activity).
The second case is useful for representing contextual information captured at the same time as the obsel, which is only assumed to hold for the duration of that obsel (such as the heart rate or mood of the person performing an observed action).
We have shown that those extensions allow us to capture the semantics of PROV and Tin Can in our meta-model, and hence to integrate existing Web traces in an |MTBS|.

Ambivalence on the Web
======================

.. _ambivalent-documents:

Ambivalent documents
++++++++++++++++++++

The distinction between the physical structure and the logical structure of a document has long been identified and theorized.
This is, in particular, why the original |HTML| (mixing concerns about both structures) was later split into |CSS| (addressing the physical structure, |ie| presentation) and the cleaner |HTML| 4 (restricted to the logical structure).
But this dichotomy, however useful, is not always sufficient to capture the different overlapping structures of more complex documents, such as acrostics\ [#acrostics]_, multimedia or hypermedia documents.
In such cases, the multiple structures can lead to multiple interpretations.

.. _multi-structured-documents:

In 2003, a working group funded by the Rhône-Alpes region was formed in Lyon to investigate that topic.
We proposed a formal model and an |XML|-based syntax for representing documents with an arbitrary number of structures `[@AbascalMultiple2003;@AbascalMultiples2004]`.
This model allowed us to represent not only a multi-structured document, but also a curated corpus of such documents, where the corpus is itself considered as a document, with its own additional structures spanning the documents it contains (for example, a thematic index).
It was also a way to capture both the original structures of a document and the *annotations* added afterwards by a community of readers.
This last point was further explored in the works described in `the next chapter