2. Experiential knowledge and trace-based systems

This chapter is mostly based on the papers by Champin, Mille, and Prié (2013) and Cordier et al. (2013), and presents a synthesis of our work on modeling experiential knowledge and reasoning with it.

2.1. Motivation

By design, computers continuously produce and use traces. Every computational process works with data stored in more or less volatile memories, and produces new data that is in turn stored in those memories. Every digital inscription is therefore by definition a trace of the processes that allowed it to be produced. Those digital inscriptions account for computational processes, insofar as they result from the execution of programs (Deransart, Ducassé, and Langevine 2002), but they also account for the interactive processes between the human and the machine, insofar as they pertain to computer applications that are used by humans, in the context of their activity which is mediated by the computer (Kaptelinin 1996).

A computer-mediated activity produces traces linked to every computational process taking part in that activity. For example, a typical workday on a computer connected to the World Wide Web would produce different kinds of traces both on that computer and on other involved machines. Should we consider the resulting traces at the end of that day, we would have all documents, created or received, e-mail and instant messages, browsing history, to name only those inscriptions handled by the user. But we would also have the log files of all involved applications and servers. Should we be interested in the traces related to the proceeding of the activity, we would add different data structures handled by those applications: messages being written, open windows, load indicators for CPU, memory or network, etc.

Those traces are digital traces in the broad sense (Laflaquiere et al. 2006), that can be of any kind: document, data structure, log file, etc. We can make two remarks about them. First, their status as traces is only marginally taken into account by computer systems, and the interpretation of such inscriptions as traces is usually performed outside the system that produced them. Consider for example a contact in an address book, interpreted as a trace of the activity resulting in its recording. The trace status of these inscriptions is nevertheless acknowledged through the recording of temporal information regarding the processes producing and altering them, e.g. the creation and modification dates of a file, or the timestamp of each entry in a log file. Second, each application handling traces as such has its own dedicated models and formats for representing traces, for example Learning Management Systems (George, Michel, and Ollagnier-Beldame 2013) or browser history. Despite some recent efforts to unify digital traces in some domains, such as social applications (Snell and Prodromou 2016), health (Hsieh et al. 2013), or data provenance (Moreau and Missier 2013), there is not yet a general model that would allow a cross-application and cross-domain use of digital traces, and provide generic processes for manipulating them.

Our goal, as ambitious as it may seem, is however to make traces a first-class citizen of computer systems. That way, we aim at capturing an important, and often overlooked, kind of knowledge: the experience that results from remembering and reusing past activities. We need to define a new digital trace object, which includes all the features (especially temporal ones) allowing it to be explicitly treated as a trace by applications, while remaining generic enough to be usable across various application domains.

In this chapter, we present the knowledge-engineering approach to digital traces that we have been developing for a number of years. This approach aims at building modeled-trace based systems (MTBS). Modeled-traces (or m-traces for short) are made of timestamped elements named obsels (contraction of “observed element”) and are associated with a trace-model. The trace-model provides a guideline for building and interpreting the m-trace. Computations on m-traces are most of the time transformations into new m-traces, that can be seen as a form of automated interpretation of the source m-trace.

2.2. A knowledge based approach for modeling and transforming digital traces

Ad-hoc uses of digital traces for observing

Observing an activity in order to understand it consists in collecting observable elements related to that activity, in order to build evidence guiding an interpretation. The sequence of pieces of evidence forming a trace can therefore be used to support, justify and explain interpretations. When the activity is computer-mediated, it is relatively easy to instrument the computer environment so that it collects digital traces made of potentially meaningful elements.

Digital traces were first used to ease the debugging of computer programs, with the idea that a knowledgeable observer (usually the programmer) could analyze the collected observations and interpret those traces, in order to understand the program’s behavior and fix it if needed. Computer systems have long been able to produce a memory dump whenever an exception[1] is raised during the execution of a program. The produced trace can be completely standard or customized by analysts, who can set up tracing tools in order to follow only the elements that they deem relevant (Deransart, Ducassé, and Langevine 2002).

Like programmers, who can analyze the behavior of a computational process they designed, one can observe computer-mediated processes or activity as soon as the environment is instrumented in order to leave persistent traces. Such an analyst can be a professional one, or simply somebody willing to review or understand that activity—possibly the very person having performed that activity.

Then they need an interpretable representation of the collected traces. Such representations are always the result of a computation, either elementary (e.g. the hexadecimal representation of a memory dump) or more complex (e.g. a histogram of the time spent on that activity per day). A statistical processing of those elements can also be performed, using heuristics that depend on the intended interpretation. A notable example is digital trace mining, which seeks to detect structural recurring patterns, in order to identify relevant behaviors or processes (Song, Günther, and Van der Aalst 2009; Van der Aalst et al. 2003; Cook and Wolf 1998). An important and well established use case of those techniques is to provide recommendations and personalization to the users of the traced system. This is applied in various contexts, such as Learning Management Systems (Marty, Carron, and Heraud 2009) or web sites[2] (Sachan et al. 2012).

While trace mining is usually associated with big-data, and the analysis of trends among a large population of users, digital traces may prove valuable at a much smaller scale, namely that of the individual traced user. Deborah Estrin (2014) captures this idea with the concept of “small data, where n=me”, advocating for the extensive use of personal traces for providing insight on one’s behaviour (implying, among other things, the right for users to access their traces collected by third-party applications).

Whatever the techniques or the scale of trace analysis, the analyst has a fundamental role to play in the process of knowledge discovery. The use of sequential interaction traces has been studied by Fisher and Sanderson (1996) who showed that the crucial task for the analyst is to find which transformations to apply to the raw observations in order to discover useful descriptions for explaining the observed process. More recently, Amer-Yahia et al. (2014) have proposed a formal algebra to describe those transformations, for preparing data produced by social applications before applying data mining techniques. Indeed, raw observations are piecemeal and expressed from the perspective of collecting devices, i.e. in a low level register. Knowledge, on the other hand, is expressed in the register of the activity. Hence the need, to interpret traces, for a transformation carrying the skills and knowledge of the analyst, so as to rephrase sequences of raw observations into sequences of meaningful activity elements.

The research works and practices described above suggest that collecting, modeling, transforming, rephrasing and interactively exploring are necessary steps whenever the observation requires multiple interpretations. We have proposed a unifying approach in order to integrate those steps with a rich representation structure dedicated to observation traces.

Modeling digital traces: associating an interpretation model to observed elements

In numerous applications processing traces and sequential data, the semantics of those data is mostly implicit. Even when documented, it is often loosely defined, reducing developers and analysts to hazardous guesswork based on data labels and sample values. The outcome of knowledge discovery processes could be used to improve this situation, but it cannot in general be reliably attached to the original traces, as their format is not designed to allow such linking[3].

We propose a new perspective on traces, considering them as knowledge inscriptions meant to carry not only the collected information, but also the elements allowing their interpretation by humans as well as computers. This brings traces into the domain of knowledge engineering, thus significantly widening the range of available tools for processing, transforming, sharing and reusing them, thanks to an explicit and operational semantics. We choose to express this semantics as a trace-model associated with the set of observed elements, and playing three roles. First, it plays the role of a vocabulary used to describe the observed elements, unambiguously relating them to the model. Second, it plays the role of a schema, constraining the structure of the observed elements. As such, it can be used to distinguish valid (or consistent) observations from invalid ones. Third, it plays the role of an ontology (Bachimont 2004, p.160-), allowing new information to be inferred from what has actually been observed.

But obviously, a single model cannot be sufficient to describe all computer-mediated activities, not to mention the multiplicity of perspectives on a given activity. We therefore propose a generic meta-model specifying how trace-models can be described; it is presented in Section 2.3.

We consider the interpretation of a trace, expressed using an initial trace-model, as a “rephrasing” of that trace into another model, working at a different level of abstraction. For example, it would seem natural to interpret the sequence [click icon foo], [word processor starting], [foo loading], [window displayed] as the user opening a document named “foo”. Hence, that sequence could be rephrased into a single observation [open document foo]. The observed element in this new trace belongs to a new trace-model, which has a higher level of abstraction than the one of the initial sequence.
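As an illustration only, such a rephrasing can be sketched as a pattern match over a sequence of low-level observations. The encoding of observations as (event, subject) pairs and the `rephrase` function are assumptions made for this example; they are not part of the meta-model.

```python
# Low-level pattern that we choose to interpret as "the user opens a document".
# Event names mirror the example in the text; the encoding is hypothetical.
OPEN_PATTERN = ["click icon", "word processor starting", "loading", "window displayed"]


def rephrase(observations):
    """Rewrite each occurrence of OPEN_PATTERN into one higher-level observation.

    `observations` is a list of (event, subject) pairs. Observations that are
    not part of a complete match are left out of the higher-level trace.
    """
    result = []
    i = 0
    n = len(OPEN_PATTERN)
    while i < len(observations):
        window = observations[i:i + n]
        if [event for event, _ in window] == OPEN_PATTERN:
            # The document name is carried by the first observation (the click).
            result.append(("open document", window[0][1]))
            i += n
        else:
            i += 1
    return result


low_level = [
    ("click icon", "foo"),
    ("word processor starting", None),
    ("loading", "foo"),
    ("window displayed", None),
]
high_level = rephrase(low_level)
```

Here `high_level` holds a single observation, at a higher level of abstraction than the four it was derived from.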

2.3. Modeled-trace based systems

I now present the meta-model that we have proposed for representing and processing m-traces in dedicated knowledge-based systems, MTBSs.

Example

In the rest of this chapter, I will illustrate the presented notions with the following running example: Alice uses an e-mail application to communicate and exchange documents with her colleagues.

Modeled-trace

The central notion of our meta-model is that of modeled-trace (m-trace), but we first need to define the notions of obsel and trace-model.

Every traced activity is represented by a list of observed elements or obsels. This neologism is inspired by the word “pixel” (picture element), and was coined to insist on the fact that the content of any trace is the result of an observation, hence unavoidably biased[4]. Every obsel has:

  • a begin time-stamp and an end time-stamp, anchoring the obsel in the time of the activity; both time-stamps can be equal, in the case of an instantaneous observation;

  • a type, associating this particular obsel to an explicit category from the trace-model;

  • a set of attributes, of the form <attribute-type, value>.

Let us note that the components of an obsel are, on purpose, only loosely specified by the meta-model. They are highly dependent on the represented activity, which should therefore be described by a trace-model. That model must specify:

  • how time is to be represented (simply a time unit, as discussed in the next subsection “Representing time”);

  • the obsel types that can be used to describe the activity;

  • for each obsel type, which attribute types can be used, and what type of value they may have;

  • a set of binary relation types that may exist between obsels;

  • a set of integrity constraints that an m-trace and its obsels must satisfy to comply with this trace-model.
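The two lists above can be read as a pair of data structures. The following is a minimal sketch, assuming Python dataclasses; the class names, field names and the `accepts` method are choices made for this illustration and do not come from the meta-model or from any existing implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Obsel:
    begin: int      # begin time-stamp, in units of the trace-model
    end: int        # end time-stamp (equal to begin if instantaneous)
    type: str       # name of an obsel type defined by the trace-model
    attributes: dict = field(default_factory=dict)   # attribute type -> value


@dataclass
class TraceModel:
    unit: str                                          # how time is represented
    obsel_types: dict = field(default_factory=dict)    # type -> {attribute: value type}
    relation_types: set = field(default_factory=set)   # names of binary relation types
    constraints: list = field(default_factory=list)    # integrity-constraint predicates

    def accepts(self, obsel):
        """Check an obsel against the declared obsel types and attribute types."""
        allowed = self.obsel_types.get(obsel.type)
        if allowed is None:
            return False  # obsel type unknown to this model
        return all(
            name in allowed and isinstance(value, allowed[name])
            for name, value in obsel.attributes.items()
        )


model = TraceModel(unit="s", obsel_types={"SaveAtt": {"filename": str}})
ok = model.accepts(Obsel(34260, 34260, "SaveAtt", {"filename": "report.docx"}))
bad = model.accepts(Obsel(34260, 34260, "SaveAtt", {"size": 12}))
```

In this sketch the model acts as a vocabulary (known type and attribute names) and as a schema (expected value types); relation types and integrity constraints are stored but not checked.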

Example

In Alice’s “e-mail” activity, we decide to measure time to the second.

There are three obsel types: the receiving of a message (RecvMsg), the sending of a message (SendMsg) and the saving of an attachment (SaveAtt). Obsels of types RecvMsg and SendMsg have two attributes in common: the content of the message, and the content of their attachment if any. Moreover, obsels of type RecvMsg have an extra attribute holding the e-mail address of the sender, while obsels of type SendMsg have an attribute holding the e-mail address(es) of the recipient(s) of the message, and one holding the path of the attached file, if any. Finally, obsels of type SaveAtt have an attribute holding the name under which the attachment was saved.

The trace-model also defines three relation types. The first two, RepliesTo and Forwards, both link an obsel of type SendMsg to one of type RecvMsg, to indicate that the sent message was, respectively, a response to the received message, or its forwarding to another recipient. The third relation type, From, links an obsel of type SaveAtt to one of type RecvMsg to indicate which message the saved attachment came from.

This trace-model constrains all obsels to be instantaneous, i.e. to have the same begin and end time-stamps. Furthermore, the second member of a From relation must have an attachment, i.e. the corresponding attribute must not be empty. Finally, in a SendMsg obsel, the two attributes holding the attachment content and its file-name must be either both empty or both non-empty.
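The three integrity constraints of this example are simple enough to be written as executable checks. The sketch below assumes a dictionary-based encoding of obsels and relations, and attribute names (`attachment`, `attachment_path`) that are invented for the illustration.

```python
def violations(obsels, relations):
    """Return human-readable violations of the three constraints of the example.

    `obsels` maps an obsel id to a dict with keys "type", "begin", "end" and
    "attrs"; `relations` is a list of (source id, relation type, target id).
    """
    found = []
    for oid, o in obsels.items():
        # Constraint 1: every obsel is instantaneous.
        if o["begin"] != o["end"]:
            found.append(f"{oid}: not instantaneous")
        # Constraint 3: in SendMsg, attachment content and path go together.
        if o["type"] == "SendMsg":
            has_content = bool(o["attrs"].get("attachment"))
            has_path = bool(o["attrs"].get("attachment_path"))
            if has_content != has_path:
                found.append(f"{oid}: attachment content and path mismatch")
    for source, rel, target in relations:
        # Constraint 2: the target of a From relation must carry an attachment.
        if rel == "From" and not obsels[target]["attrs"].get("attachment"):
            found.append(f"{target}: target of From has no attachment")
    return found


obsels = {
    "o1": {"type": "RecvMsg", "begin": 33300, "end": 33300,
           "attrs": {"sender": "bob@example.org", "attachment": b"..."}},
    "o2": {"type": "SaveAtt", "begin": 34260, "end": 34260,
           "attrs": {"filename": "report.docx"}},
    "o3": {"type": "SendMsg", "begin": 35220, "end": 35220,
           "attrs": {"attachment": b"...", "attachment_path": ""}},
}
relations = [("o2", "From", "o1")]
problems = violations(obsels, relations)
```

Only `o3` is reported here, because it carries attachment content without a path.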

An obsel type in a trace-model can also be associated to one or more parent type(s). This relationship has the standard subclass semantics (also called “a kind of”), and is interesting at several levels. At the syntax level, it allows the children types to inherit attribute definitions from their parent types, and encourages modularity in the design of the trace-model. At the semantic level, it implies that all obsels of the children types will also belong to the parent types, enabling more reasoning (and hence transformations) on the m-traces. Relation types can also have parent relation types.

Example

In our trace-model above, the common attributes of RecvMsg and SendMsg can be moved up in a parent type, which we can call Message. The resulting trace-model is represented in Fig. 2.1.

_images/trace-model.svg

Fig. 2.1 An example trace-model (Champin, Mille, and Prié 2013).
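The two levels at which type inheritance operates can be sketched in a few lines. The attribute names below are illustrative and may differ from the labels used in Fig. 2.1; the helper functions are hypothetical.

```python
# Obsel types of the "e-mail" model, with their parent types and own attributes.
TYPES = {
    "Message": {"parents": [], "attrs": {"content", "attachment"}},
    "RecvMsg": {"parents": ["Message"], "attrs": {"sender"}},
    "SendMsg": {"parents": ["Message"], "attrs": {"recipients", "attachment_path"}},
    "SaveAtt": {"parents": [], "attrs": {"filename"}},
}


def ancestors(type_name):
    """Return the type itself plus all its direct and indirect parent types."""
    seen = {type_name}
    for parent in TYPES[type_name]["parents"]:
        seen |= ancestors(parent)
    return seen


def all_attributes(type_name):
    """Syntax level: own attributes plus those inherited from parent types."""
    return set().union(*(TYPES[t]["attrs"] for t in ancestors(type_name)))


def is_a(type_name, expected):
    """Semantic level: an obsel of a child type also belongs to its parent types."""
    return expected in ancestors(type_name)
```

With these definitions, a selection on type Message also matches RecvMsg and SendMsg obsels, since `is_a("RecvMsg", "Message")` holds.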

Finally, a trace-model can be linked to a number of parent trace-models, provided that they all share the same representation of time. In that case, the child trace-model will inherit all obsel types, attribute types, relation types and integrity constraints of all its parents. This is valuable from a knowledge engineering perspective, as it encourages the reuse of previously defined trace-models, together with the reasoning processes and transformations associated with them.

Example

The “e-mail” trace-model described above could be inherited by a broader trace-model, also inheriting a trace-model for “word processing”, providing a more holistic view on Alice’s (or any office worker’s) activity. Another trace-model, dedicated to a specific e-mail application, could also inherit our example trace-model, and extend it with functionalities that are specific to this application (e.g. contact management, message folders…).

We are now ready to precisely define a modeled-trace. It is specified by:

  • a reference to a trace-model,

  • a time interval called the temporal extension of the trace,

  • a set of obsels,

  • a set of typed binary relations between those obsels.

The temporal extension is the period of time during which the traced activity was recorded. While the obsels of the m-trace must all be between the bounds of the temporal extension, the time-stamp of the first and last obsel may not match exactly these bounds. Indeed, the absence of obsels, in some parts of the temporal extension, may be relevant for interpreting the trace.

The temporal extension is described using the time representation specified by the trace-model. Of course, the obsels and their relations are also described according to the trace-model.
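A minimal sketch of this definition follows, with obsels reduced to (begin, end, type) triples. The `MTrace` class and its two methods are assumptions made for the illustration: the first checks that obsels stay within the temporal extension, the second lists the periods without any obsel, which may themselves be relevant for interpretation.

```python
from dataclasses import dataclass, field


@dataclass
class MTrace:
    model: str          # reference to a trace-model (here, just its name)
    extension: tuple    # (begin, end) bounds of the temporal extension
    obsels: list = field(default_factory=list)      # (begin, end, type) triples
    relations: list = field(default_factory=list)   # (source, relation type, target)

    def within_extension(self):
        """True if every obsel lies between the bounds of the temporal extension."""
        lo, hi = self.extension
        return all(lo <= b <= e <= hi for b, e, _ in self.obsels)

    def idle_gaps(self):
        """Periods of the extension not covered by any obsel, as (start, end) pairs."""
        gaps, cursor = [], self.extension[0]
        for b, e, _ in sorted(self.obsels):
            if b > cursor:
                gaps.append((cursor, b))
            cursor = max(cursor, e)
        if cursor < self.extension[1]:
            gaps.append((cursor, self.extension[1]))
        return gaps


trace = MTrace("demo", (0, 100), [(10, 10, "A"), (40, 60, "B")])
```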

Example

Fig. 2.2 shows an m-trace representing Alice’s e-mail activity. It refers to the “e-mail” trace-model described above. Its temporal extension spans from Monday 9:00 AM to 11:00 AM.

It is composed of four obsels. To keep it simple, we have not represented the end time-stamps (as they are always equal to the begin time-stamp). At 9:15, Alice receives an e-mail from Bob. At 9:31, she saves the attached file as report.docx, then replies to Bob at 9:32. At 9:47, she sends a message to Charlie, attaching a file named report-summary.docx.

_images/trace1a.svg

Fig. 2.2 An example modeled-trace, complying with the model from Fig. 2.1 (Champin, Mille, and Prié 2013).

Representing time

The goal of our meta-model is to represent a wide range of activities, requiring different ways of representing time. In our running example, a granularity of one second seemed appropriate; but in other domains, such as traces of car driving or eye tracking, one might want more precise time-stamps. On the contrary, other activities may only require a granularity of one hour or one day, and in some cases, more precise timing information is not even available.

Besides, in some contexts, one may only have a relative measure of time for the collected obsels. For example, the m-trace depicted in Fig. 2.2 spans from Monday 9:00 to 11:00, but there is no indication of which Monday it is. This information may be unavailable for several reasons: either it was not recorded (some log files do not store a complete date), or it was removed on purpose, for example for privacy reasons. In other contexts, the temporal information may be even scarcer, obsels being merely ordered in a sequence.

To account for all those situations, our meta-model requires that:

  • per its definition, a trace-model specifies a time unit \(u\);

  • every m-trace has an origin \(o\) (see below);

  • every time-stamp in an m-trace (its temporal extension and its obsels) is an integer \(t\), representing the instant \(o + tu\).

The origin is a character string. If it is a standard representation of an instant, e.g. using the RFC 3339 format (Klyne and Newman 2002), at least as precise as the unit of the trace-model, then the temporal extension and the obsels can be absolutely dated. Their time-stamps can be converted to other time formats, and compared with any other absolute time-stamp[5]. On the other hand, an origin not complying with a standard format is called an opaque origin. The time-stamps of the corresponding trace can be compared with each other, but not with any arbitrary other time-stamp. Note however that an opaque origin is assumed to always identify the same instant, so if two m-traces have the same opaque origin, their time-stamps are assumed to be comparable[6]. As most transformations do not alter time-stamps, they usually preserve the origin of the m-trace, making the source and the transformed trace comparable with each other.
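The rule \(o + tu\) and the distinction between absolute and opaque origins can be sketched as follows. This is an illustration under assumptions: the unit table and function names are invented, the date used in the usage example is arbitrary, and `datetime.fromisoformat` stands in for a full RFC 3339 parser (older Python versions reject some RFC 3339 variants, such as a trailing `Z`).

```python
from datetime import datetime, timedelta

# Duration of one time unit, for a few units a trace-model might declare.
UNITS = {
    "ms": timedelta(milliseconds=1),
    "s": timedelta(seconds=1),
    "h": timedelta(hours=1),
}


def parse_origin(origin):
    """Return a datetime if the origin is a parsable instant, else None (opaque)."""
    try:
        return datetime.fromisoformat(origin)
    except ValueError:
        return None


def absolute(origin, unit, t):
    """Compute the instant o + t*u; fails for an opaque origin."""
    o = parse_origin(origin)
    if o is None:
        raise ValueError("opaque origin: time-stamp cannot be absolutely dated")
    return o + t * UNITS[unit]


def comparable(origin_a, origin_b):
    """Time-stamps are comparable if the origins are identical or both absolute."""
    if origin_a == origin_b:
        return True
    return parse_origin(origin_a) is not None and parse_origin(origin_b) is not None


nine_am = absolute("2014-03-10T00:00:00+00:00", "s", 32400)
```

Two traces sharing the opaque origin "Monday" remain comparable with each other, but neither can be compared with a trace whose origin is an absolute date.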

Example

The trace from Fig. 2.2 must be represented with an opaque origin, as we do not know on which Monday it was recorded. We chose to keep “Monday” in the origin to provide a hint to users. The temporal extension spans from 32400 (i.e. 9 hours) to 39600 (i.e. 11 hours). All time-stamps of the obsels are converted accordingly. The resulting m-trace is depicted in Fig. 2.3.

_images/trace1b.svg

Fig. 2.3 The example from Fig. 2.2, with a unified representation of time (Champin, Mille, and Prié 2013).
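The conversion used in this example is plain arithmetic, spelled out below for the clock times given earlier for Alice's trace. It assumes that the opaque origin stands for midnight of the unknown Monday, which is what the value 32400 for 9:00 implies; the helper name is hypothetical.

```python
def to_timestamp(hours, minutes, seconds=0):
    """Number of seconds elapsed since the origin (midnight of the unknown Monday)."""
    return hours * 3600 + minutes * 60 + seconds


extension = (to_timestamp(9, 0), to_timestamp(11, 0))
obsel_times = [
    to_timestamp(9, 15),   # message received from Bob
    to_timestamp(9, 31),   # attachment saved as report.docx
    to_timestamp(9, 32),   # reply to Bob
    to_timestamp(9, 47),   # message sent to Charlie
]
```

This gives the extension (32400, 39600) and the obsel time-stamps 33300, 34260, 34320 and 35220.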

Finally, to represent a sheer ordered sequence of obsels without any quantifiable temporal information, we define the special time unit sequence. This unit imposes the following constraints:

  • the origin of the m-trace must be opaque;

  • every obsel must have equal begin and end time-stamps, and all obsels of the m-trace must have different time-stamps;

  • only the order of the time-stamps is significant; their absolute value gives no information about duration. One cannot assume, for example, that the duration between time-stamps 1 and 2 is the same as between 2 and 3.

This makes it possible to handle cases where the only information about the obsels is a total ordering. Other special units could be proposed to handle other kinds of limited temporal information.
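The constraints attached to the sequence unit can be checked mechanically. The sketch below assumes obsels reduced to (begin, end) integer pairs; both function names are hypothetical. The second function illustrates that two numberings carrying the same order are interchangeable under this unit.

```python
def check_sequence_trace(origin_is_opaque, obsels):
    """Check the constraints of the special `sequence` unit.

    `obsels` is a list of (begin, end) integer pairs.
    """
    if not origin_is_opaque:
        return False
    if any(begin != end for begin, end in obsels):
        return False  # every obsel must be instantaneous
    stamps = [begin for begin, _ in obsels]
    return len(stamps) == len(set(stamps))  # all time-stamps must differ


def same_order(stamps_a, stamps_b):
    """Two numberings are equivalent if they rank the obsels identically."""
    def rank(stamps):
        return sorted(range(len(stamps)), key=stamps.__getitem__)
    return rank(stamps_a) == rank(stamps_b)
```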

Architecture of an MTBS

We are now ready to describe the overall architecture of an MTBS, illustrated in Fig. 2.4.

_images/mtbs.svg

Fig. 2.4 General architecture of an MTBS built around an MTMS (Champin, Mille, and Prié 2013)

The core component of such a system is the modeled-trace management system (MTMS). It plays a role similar to that of the database management system in a standard application, but manages instead m-traces complying with the meta-model presented above. It must be flexible enough to allow several trace-models to coexist (and evolve). It must also support the intrinsic dynamics of traces. Finally, it must be able to handle modeled-trace transformations (that will be discussed in more detail in Section 2.4).

The MTMS is fed by a number of collectors, whose role is to gather the information required to build one or several m-traces. That information can be gathered synchronously, by observing the traced activity while it is taking place, or a posteriori, for example by examining log files. The trace-model of the collected m-trace determines which part of the available information is kept, and how it is organized to constitute the obsels of the m-trace[7]. Any m-trace produced by a collector is called a primary trace, as opposed to the transformed traces that are computed by the MTMS from other m-traces (either primary or transformed).
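As an illustration of an a posteriori collector, the sketch below builds obsels from the lines of a log file. The log format, the event names and the mapping to obsel types are all invented for this example; a real collector would depend on the traced application.

```python
import re

# Hypothetical log format, invented for this sketch: "HH:MM:SS EVENT detail"
LINE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}) (\w+) (.*)$")

# Events that the chosen trace-model retains, mapped to its obsel types.
EVENT_TO_TYPE = {"RECV": "RecvMsg", "SEND": "SendMsg", "SAVE": "SaveAtt"}


def collect(lines):
    """Build obsels (as dicts) from log lines, keeping only modeled events."""
    obsels = []
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # not a well-formed entry
        h, m, s, event, detail = match.groups()
        if event not in EVENT_TO_TYPE:
            continue  # information that the trace-model does not retain
        t = int(h) * 3600 + int(m) * 60 + int(s)
        obsels.append({"type": EVENT_TO_TYPE[event], "begin": t, "end": t,
                       "detail": detail})
    return obsels


log = [
    "09:15:00 RECV from=bob@example.org",
    "09:20:12 SYNC folder=inbox",
    "09:31:00 SAVE report.docx",
]
primary = collect(log)
```

The `SYNC` entry is dropped because the trace-model has no obsel type for it, which is the sense in which the model determines what part of the available information is kept.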

Finally, all m-traces can be used by application modules. Some of them can be used to display m-traces to the user in different ways, either very generic (a table listing all the obsels) or specific to a given trace-model, or even to a specific task. Other modules will process the m-traces in order to alter their own behavior, such as an assistance system reusing past experiences of the user.

2.4. Trace based reasoning

Transformed m-traces

Most of the time, primary traces are not directly (or easily) usable by application modules; it is necessary to pre-process and transform them. One of the key roles of the MTMS is to perform those transformations, in order to support multiple interpretations and reasoning with m-traces.

A transformed trace is specified by:

  • one or more source m-traces (which can be either primary or transformed),

  • a reference to a transformation method,

  • optionally one or more parameters influencing the execution of the transformation method.

All the properties of the transformed trace (its model, its temporal extension, its obsels and their relations) are deterministically computed by the transformation method, provided with the source traces and the parameters. Also, note that transformations can be chained (as the sources of a transformed trace can be transformed traces themselves), in order to produce complex workflows.
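The specification above (sources, method, parameters) and the possibility of chaining can be sketched as follows. The `TransformedTrace` class and the two toy methods are assumptions made for the illustration, with obsels reduced to dictionaries; they are not the interface of any existing MTMS.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TransformedTrace:
    sources: list        # source traces: lists of obsels or TransformedTrace objects
    method: Callable     # transformation method, called as method(inputs, **params)
    params: dict = field(default_factory=dict)

    def obsels(self):
        """Recompute the obsels from the sources; same inputs give the same output."""
        inputs = [s.obsels() if isinstance(s, TransformedTrace) else s
                  for s in self.sources]
        return self.method(inputs, **self.params)


def keep_type(inputs, type_name):
    """Toy method: keep the obsels of a single source having the given type."""
    (source,) = inputs
    return [o for o in source if o["type"] == type_name]


def count(inputs):
    """Toy method: summarize a single source into one obsel holding its size."""
    (source,) = inputs
    return [{"type": "Count", "value": len(source)}]


primary = [{"type": "SendMsg"}, {"type": "RecvMsg"}, {"type": "SendMsg"}]
sent = TransformedTrace([primary], keep_type, {"type_name": "SendMsg"})
summary = TransformedTrace([sent], count)   # chained on a transformed trace
```

Because each transformed trace only stores its sources, method and parameters, its content can always be recomputed, and `summary` depends on `primary` through `sent`.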

While the range of possible transformation methods is very large, we can distinguish three main classes of elementary methods.

  • Selection methods keep only a subset of the obsels of a unique source trace, with respect to a given criterion. The model of the transformed trace is usually the same as the source trace, as well as the temporal extension (unless the criterion is about time-stamps).

    Example

    In the “e-mail” trace-model of our running example, the following selections can be considered: keep only obsels between 9:30 and 9:40 (temporal criterion), keep only obsels of type SendMsg (typology criterion), keep only obsels with a non-empty attachment (attribute criterion), keep only obsels that have been replied to (relation criterion).

  • Fusion methods gather in the transformed trace all obsels from several source m-traces. If the sources have different trace-models, the model of the transformed trace should inherit all their models (which implies that they have the same representation of time).

    Example

    We could combine the trace of our running example with another of Alice’s traces, also complying with the “e-mail” trace-model, but covering the period between 11:00 and 13:00 that same day, to analyze a longer part of her activity. We could also merge her trace with the “e-mail” trace from Bob at the same time, in order to study more precisely how the two of them communicate. Finally, we could combine that trace with Alice’s “word processing” trace, to analyse her office activity in a larger context. That larger context could in particular provide insight on the “e-mail” part of the activity, for example by showing that report-summary.docx is a modified version of report.docx.

  • Rewriting methods populate the transformed trace with new obsels, that are derived from the obsels of a unique source trace. It may consist in copying those obsels with less information (removing or altering some of their attributes) or more (inferring new attributes or relations from the content of the source trace or from external knowledge). But rewriting is not necessarily injective; obsels in the transformed trace may be derived from several source obsels, collectively satisfying a number of constraints.

    Example

    A trace complying with the “e-mail” trace-model can be anonymized by removing all sender and recipient attributes[8]. On the other hand, we could imagine enriching the source trace by tagging obsels with an emotion detection algorithm (which would require extending the original trace-model). Another rewriting could consist in summarizing e-mail activity, by generating one obsel per day, its attributes indicating the number of sent and received messages (this would of course require a dedicated trace-model, different from the one presented earlier). Finally, we can imagine a more elaborate kind of summary, where a sequence of messages replying to each other would be rewritten into a single obsel of type Conversation, while a sequence of sent messages with the same content to different recipients would be rewritten into a single obsel of type Broadcast.
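Selection and fusion, the first two classes of methods listed above, can be sketched over obsels encoded as dictionaries. The encoding is an assumption of this example: in particular, the RepliesTo relation is flattened into a `replies_to` key, the attachment into a `has_attachment` flag, and the content of the second trace is invented.

```python
def select(obsels, criterion):
    """Selection: keep the obsels of one source trace that satisfy `criterion`."""
    return [o for o in obsels if criterion(o)]


def fuse(*traces):
    """Fusion: gather the obsels of several source traces, ordered by begin time."""
    merged = [o for trace in traces for o in trace]
    return sorted(merged, key=lambda o: o["begin"])


morning = [
    {"id": "o1", "type": "RecvMsg", "begin": 33300, "has_attachment": True},
    {"id": "o2", "type": "SaveAtt", "begin": 34260},
    {"id": "o3", "type": "SendMsg", "begin": 34320, "replies_to": "o1"},
    {"id": "o4", "type": "SendMsg", "begin": 35220, "has_attachment": True},
]
midday = [{"id": "o5", "type": "RecvMsg", "begin": 40000}]

# The four kinds of criteria from the selection example.
temporal = select(morning, lambda o: 34200 <= o["begin"] <= 34800)  # 9:30 to 9:40
by_type = select(morning, lambda o: o["type"] == "SendMsg")
with_attachment = select(morning, lambda o: o.get("has_attachment", False))
replied_ids = {o["replies_to"] for o in morning if "replies_to" in o}
replied_to = select(morning, lambda o: o["id"] in replied_ids)

# Fusion of two traces sharing the same trace-model.
whole_day = fuse(morning, midday)
```

Both traces here share the same model and origin; fusing traces with different models would additionally require a model inheriting from all of them.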

Note that rewriting transformations may apply not only to obsel attributes, but also to their time-stamps, as well as those of the m-trace. Reducing their precision or changing an absolute origin to an opaque one may be necessary to efficiently anonymise the m-trace. It could also be used to align two m-traces originally captured at different times, in order to compare them. For example, one may want to compare the execution of the same task in two different contexts.
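Both uses of time-stamp rewriting can be sketched with integer arithmetic. The function names and the second series of time-stamps are invented for this illustration; the first series reuses values from the running example.

```python
def coarsen(timestamps, factor):
    """Reduce precision, e.g. factor=3600 turns seconds into whole hours.

    The transformed trace needs a trace-model whose unit is `factor` times larger.
    """
    return [t // factor for t in timestamps]


def align(timestamps):
    """Shift a trace so that its first time-stamp becomes 0.

    The result is only meaningful relative to a new, opaque origin.
    """
    start = min(timestamps)
    return [t - start for t in timestamps]


run_a = [33300, 34260, 34320]   # a task performed in one context
run_b = [50400, 51500, 51520]   # the same task, captured at another time
aligned_a, aligned_b = align(run_a), align(run_b)
hours_a = coarsen(run_a, 3600)
```

Once aligned, the two runs can be compared step by step (`[0, 960, 1020]` against `[0, 1100, 1120]`), while the coarsened trace only reveals that everything happened during the same hour.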

Fig. 2.5 illustrates those notions. It also points out that not only are transformed traces linked to their source trace, but every obsel in a transformed trace can keep track of its corresponding source obsels. Thus, any obsel at any level can be explained by the process (transformation methods) and the data (source obsels) from which it was produced.

_images/transformations.svg

Fig. 2.5 Transformed traces.

This figure shows how the three kinds of transformations can be applied in our running example. The first (from the bottom) transformed trace is a fusion of the two primary traces, which represent the same “e-mail” activity at different periods of time. The second transformed trace is a selection, keeping only obsels of type Message (recall that RecvMsg and SendMsg both inherit that obsel type). The third transformed trace is a rewriting into a more synthetic trace-model, classifying sequences of obsels into different communication patterns. Note how each transformed obsel is linked to one or more source obsels (dotted arrows).
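A rewriting of this kind, including the links from each transformed obsel back to its source obsels, can be sketched as follows. The dictionary encoding and the `sources` key are assumptions of the example, and the grouping rule (a reply joins the conversation of the message it replies to, provided that message occurs earlier in the trace) is a simplification.

```python
def rewrite_conversations(obsels):
    """Group messages linked by replies into Conversation obsels.

    Each input obsel is a dict with "id", "begin" and an optional "replies_to".
    Each output obsel records the ids of its source obsels, so that it can be
    explained by the data it was derived from.
    """
    root_of = {}
    for o in sorted(obsels, key=lambda o: o["begin"]):
        parent = o.get("replies_to")
        # A reply joins the conversation of the message it replies to;
        # any other message starts a conversation of its own.
        root_of[o["id"]] = root_of.get(parent, o["id"])
    groups = {}
    for o in obsels:
        groups.setdefault(root_of[o["id"]], []).append(o)
    return [
        {
            "type": "Conversation",
            "begin": min(o["begin"] for o in members),
            "end": max(o["begin"] for o in members),
            "sources": [o["id"] for o in members],
        }
        for members in groups.values()
        if len(members) > 1   # a lone message is not a conversation
    ]


messages = [
    {"id": "o1", "begin": 10},
    {"id": "o2", "begin": 20, "replies_to": "o1"},
    {"id": "o3", "begin": 30, "replies_to": "o2"},
    {"id": "o4", "begin": 40},
]
conversations = rewrite_conversations(messages)
```

Unlike the source obsels, the resulting Conversation obsel has a duration: it spans from the first to the last message it aggregates.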

Reasoning with transformations

A transformation chain, such as the one depicted in Fig. 2.5, can arguably be considered as a reasoning process. It involves different knowledge containers: the factual knowledge contained in the primary traces, the structural knowledge contained in the various trace-models, and the inferential knowledge contained in the different transformation methods. Moreover, that particular arrangement of transformations (with the parameters of each transformed trace) also carries some knowledge which can be either general (e.g. SaveAtt obsels are not relevant for rewriting to an “e-mail summary”, so they can be filtered out) or specific to a given context (e.g. those two primary traces are related to each other, so they should be merged).

We have proposed (Cordier et al. 2013) that trace based reasoning (TBR) can be structured as an interactive cycle of three steps – inspired by the CBR cycle described by Aamodt and Plaza (1994).

  • The elaboration step consists in setting up the transformation chain that is relevant to solve the problem at hand. This usually amounts to identifying in m-traces reusable episodes, i.e. sets of obsels that meet a number of criteria; those episodes will typically appear as aggregate obsels in a transformed trace (such as the Conversation and Broadcast obsels in Fig. 2.5). For the classes of problems anticipated by the MTBS designers, the appropriate transformations will be provided with the system. However, nothing prevents users from adding their own transformations to answer unanticipated questions. Indeed, MTMSs can handle multiple concurrent transformations of the same m-trace, in order to support multiple (and sometimes contradictory) interpretations of the primary traces.

  • In the retrieval step, the MTMS executes the transformations specified before. Depending on the kind of transformation, the user can subject it to a number of constraints: on the number of episodes to retrieve, on the search algorithm to use, on the minimum certainty degree to apply… Note that, contrary to traditional CBR, the episodes in TBR are never isolated as self-sufficient structures, but remain linked to the original obsels. They are always part of a “bigger picture”, and their context of occurrence can always be tapped whenever their content itself is not enough to decide on the most relevant episode(s) to retrieve.

  • The reuse step is when the retrieved episodes are effectively used (possibly with some adaptation) to solve the problem at hand. This can be done in various different ways, typically outside the MTMS itself (in the application modules of Fig. 2.4). Whatever the functions of those application modules are, they are integrated in the user’s traced activity, and therefore fed back to the MTMS. This is what closes the cycle, even if we do not have an explicit “retain” phase as in the CBR cycle. Indeed, Ollagnier-Beldame (2011) and Terrat (2015) have shown that even the simple fact of displaying the m-trace to the user can help improve their appropriation of the system, an effect called reflexivity. Of course, knowledge extraction can also be the explicit goal of that step, either for an external analyst or as an advanced form of reflexivity. This has been studied by Mathern et al. (2012) and Barazzutti, Cordier, and Fuchs (2015).
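The retrieval step, with its user constraints and its access to the context of an episode, can be sketched as below. Everything here is hypothetical: the `certainty` score attached to each episode, the function names and the sample data are assumptions made to illustrate the idea, not a description of how an MTMS ranks episodes.

```python
def retrieve(episodes, min_certainty=0.0, limit=None):
    """Rank candidate episodes and apply the constraints set by the user."""
    kept = [e for e in episodes if e["certainty"] >= min_certainty]
    kept.sort(key=lambda e: e["certainty"], reverse=True)
    return kept[:limit]   # limit=None keeps every remaining episode


def context(episode, primary, margin):
    """Obsels of the primary trace occurring within `margin` of an episode."""
    return [o for o in primary
            if episode["begin"] - margin <= o["begin"] <= episode["end"] + margin]


episodes = [
    {"id": "e1", "begin": 100, "end": 200, "certainty": 0.4},
    {"id": "e2", "begin": 300, "end": 350, "certainty": 0.9},
    {"id": "e3", "begin": 500, "end": 520, "certainty": 0.7},
]
primary = [{"id": "o1", "begin": 290}, {"id": "o2", "begin": 320},
           {"id": "o3", "begin": 480}]

best = retrieve(episodes, min_certainty=0.5, limit=1)
around_best = context(best[0], primary, margin=20)
```

The second function reflects the point made above: an episode is never cut off from the trace it comes from, so neighbouring obsels can help decide between candidates.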

The user is therefore at the center of the cycle, being strongly involved in each step. Knowledge is dynamically co-constructed, the system sustaining pre-defined interpretations and providing reflexivity, and the user assessing those interpretations in context, and testing new ones when the former are not satisfactory. As such, the system can continuously adapt to changes.
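The three steps above can be illustrated with a minimal Python sketch. It is not taken from any MTMS implementation: the Obsel class, the "Message" and "Click" obsel types, and the functions group_conversations, retrieve and context are hypothetical names introduced only for illustration. The sketch shows a transformation that aggregates message obsels into Conversation episodes, a retrieval constrained by the user, and the way an episode stays linked to its source obsels and to its context of occurrence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obsel:
    """A simplified observed element (hypothetical structure)."""
    id: str
    type: str
    begin: int
    end: int
    sources: tuple = ()  # obsels of the source trace this obsel derives from


def group_conversations(trace, max_gap):
    """Elaboration: a transformation aggregating 'Message' obsels separated
    by at most max_gap time units into 'Conversation' obsels."""
    messages = sorted((o for o in trace if o.type == "Message"),
                      key=lambda o: o.begin)
    groups, current = [], []
    for msg in messages:
        if current and msg.begin - max(m.end for m in current) > max_gap:
            groups.append(current)
            current = []
        current.append(msg)
    if current:
        groups.append(current)
    return [
        Obsel(id=f"conv{i}", type="Conversation",
              begin=group[0].begin, end=max(m.end for m in group),
              sources=tuple(group))  # the episode keeps links to its origin
        for i, group in enumerate(groups)
    ]


def retrieve(transformed, min_size, limit):
    """Retrieval: run over the transformed trace under user constraints."""
    candidates = [o for o in transformed if len(o.sources) >= min_size]
    return candidates[:limit]


def context(primary, episode):
    """Obsels of the primary trace overlapping the episode in time,
    excluding the obsels the episode is built from."""
    return [o for o in primary
            if o not in episode.sources
            and o.begin <= episode.end and o.end >= episode.begin]


primary = [
    Obsel("m1", "Message", 0, 2),
    Obsel("c1", "Click", 3, 3),
    Obsel("m2", "Message", 5, 6),
    Obsel("m3", "Message", 100, 101),
    Obsel("m4", "Message", 103, 104),
]

episodes = retrieve(group_conversations(primary, max_gap=10),
                    min_size=2, limit=5)
for ep in episodes:
    print(ep.id, [s.id for s in ep.sources],
          [c.id for c in context(primary, ep)])
```

In this sketch, the reuse step would take place in the calling application, which can inspect both the sources of an episode and its surrounding obsels before deciding how to adapt it.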

Of course, to support this level of interactivity, MTBSs must be equipped with intelligible user interfaces, both for presenting m-traces and for designing new transformations. Determining which features make such interfaces efficient is still an open question, and probably one of the key points in the future development of MTBSs; we have started to investigate it (Besnaci, Guin, and Champin 2015; Kong Win Chang et al. 2015).

An open-source reference implementation

The meta-model presented in this chapter is the result of many discussions and iterations. Since 2009, we have been working on a reference implementation, whose first goal was to help stabilize the meta-model (as many problems only appear with concrete use cases). Its second goal was to ease and speed up the development of experiments aiming at validating and/or extending the meta-model. This implementation is named kTBS (a kernel for trace-based systems), and is available at http://tbs-platform.org/ktbs.

kTBS is open-source, in order to foster its reuse both in and outside our research group. It is designed as a RESTful Web service (Fielding 2000), in order to be easily integrated with other systems, regardless of their own architecture or programming language (see note 9). Internally, it stores all its data using the RDF data model (Schreiber and Raimond 2014), which meets the flexibility requirements of our meta-model. RDF also comes with a powerful query language (Harris and Seaborne 2013), and expressive ontology languages (Hitzler et al. 2009). Externally, kTBS exposes and consumes data in the JSON-LD format (Sporny, Kellogg, and Lanthaler 2014). As stated above, the next step is to provide kTBS with intuitive and intelligible user interfaces, as TBR heavily relies on the user interacting with the MTMS.
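To give an idea of how a JSON-LD document relates to the RDF data model, here is a minimal Python sketch. The namespace http://example.org/trace-model#, the property names and the SendMsg type are invented for illustration and are not the actual kTBS vocabulary; the to_triples function is a drastically simplified stand-in for real JSON-LD processing, handling only a flat node whose context maps terms to IRIs.

```python
import json

# Hypothetical vocabulary, NOT the actual kTBS namespace or property names.
EX = "http://example.org/trace-model#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# An obsel described as a flat JSON-LD document: the @context maps
# short JSON keys to the IRIs of RDF properties.
obsel = {
    "@context": {
        "begin": EX + "begin",
        "end": EX + "end",
        "subject": EX + "subject",
    },
    "@id": "http://example.org/traces/t01/obs1",
    "@type": EX + "SendMsg",
    "begin": 1000,
    "end": 1005,
    "subject": "alice",
}


def to_triples(doc):
    """Naively map a flat JSON-LD document to (subject, predicate, object)
    triples. Real JSON-LD processing (nested nodes, prefixes, typed
    literals, etc.) is much richer than this."""
    ctx = doc["@context"]
    node = doc["@id"]
    triples = [(node, RDF_TYPE, doc["@type"])]
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # JSON-LD keywords are handled separately
        triples.append((node, ctx[key], value))
    return triples


payload = json.dumps(obsel)  # plain JSON text, as it could travel over HTTP
triples = to_triples(json.loads(payload))
for triple in triples:
    print(triple)
```

The point of the sketch is that a client only needs a JSON library to produce or consume such a document, while the server can still interpret it as RDF statements.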

The meta-model presented in this chapter, as well as the notion of Trace-Based Reasoning (TBR), underlie a number of the works presented in the following chapters. Those will further demonstrate how this meta-model supports the user-centric co-construction of knowledge, taking into account the various contexts of use of that knowledge, and allowing multiple interpretations to coexist.

Notes

1

An exception is a case that the computer cannot handle, such as a division by zero or an access to a non-existing memory address. When an exception is raised by a program, that program is suspended and an exception handler is started, provided with the context in which the exception occurred. The term “error” is sometimes used instead of “exception”, but the very notion of error relates to an interpretation, even an appraisal.

2

For example, services such as Google Analytics offer tools to precisely analyze the visits on a web site: http://www.google.com/intl/en/analytics/

3

This assessment is based on legacy data, such as log files. Recent efforts proposing generic trace formats (Moreau and Missier 2013; Snell and Prodromou 2016) build on semantic web technologies and linked data principles, and are much more similar to our proposal.

4

In this respect, let us point out how misleading the term “data” can be. It originally means “given”, which gives it an aura of neutrality or objectivity. In fact the data we get are not so much given as they are taken (observed, measured, extracted, captured, selected…) and therefore never independent of the processes set up to obtain them.

5

Note that time-stamps in most operating systems are represented that way: as a number of time units (usually the second) since a given origin or “epoch” (typically 1970-01-01 on UNIX systems).

6

There is another consequence: while it might seem trivial to convert time-stamps from a fine-grained unit to a coarser-grained unit, it is actually not always possible when using an opaque origin. For example, converting days to months cannot be done accurately if we don’t have an absolute origin, as we do not know after how many days to change month. Less obviously, converting from hours (or minutes, or seconds) to days cannot be done either: because of Daylight Saving Time, some days have 23 hours, and some have 25.
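Both cases can be checked with a short Python sketch. The function months_elapsed is a hypothetical helper, and the UTC offsets used for the second case (+1 hour in winter, +2 hours in summer, as in Paris in 2021) are hard-coded assumptions rather than read from a time zone database.

```python
from datetime import date, datetime, timedelta, timezone


def months_elapsed(origin: date, days: int) -> int:
    """Number of month changes between `origin` and `origin + days`."""
    target = origin + timedelta(days=days)
    return (target.year - origin.year) * 12 + (target.month - origin.month)


# The same time-stamp (59 days) converts to a different number of months
# depending on the absolute origin, here 1 January of a common or leap year.
common_year = months_elapsed(date(2023, 1, 1), 59)  # lands on 2023-03-01
leap_year = months_elapsed(date(2024, 1, 1), 59)    # lands on 2024-02-29
print(common_year, leap_year)

# Assumed offsets for Paris in 2021: UTC+1 in winter, UTC+2 in summer.
WINTER = timezone(timedelta(hours=1))
SUMMER = timezone(timedelta(hours=2))

# Duration from local midnight to the next local midnight on the two
# days of 2021 where the offset changed.
spring_day = (datetime(2021, 3, 29, tzinfo=SUMMER)
              - datetime(2021, 3, 28, tzinfo=WINTER))
autumn_day = (datetime(2021, 11, 1, tzinfo=WINTER)
              - datetime(2021, 10, 31, tzinfo=SUMMER))
print(spring_day, autumn_day)
```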

7

Most collectors will be dedicated to one specific trace-model, with the constraints of that model hard-coded in them. However, one could imagine more generic collectors, able to inspect richly described trace-models in order to comply with them dynamically. Another perspective would be collectors able to dynamically edit trace-models, whenever they encounter a situation that the model cannot represent.

8

An efficient anonymization would actually require more complex processing, as the message bodies and attachments may also contain information that allows the persons involved to be identified. Even so, those complex processes would still qualify as a rewriting transformation, according to our definition.

9

The rationale of this choice is of course not to suggest that kTBS, or any MTMS, should in general be offshored to an external provider. Traces typically contain privacy-sensitive information, and should obviously be kept in trustworthy locations. In a typical setup, the kTBS service would be deployed on the same server as the application.

Chapter bibliography

Aamodt, Agnar, and Enric Plaza. 1994. “Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches.” AI Communications 7 (1): 39–59.

Amer-Yahia, Sihem, Noha Ibrahim, Christiane Kamdem Kengne, Federico Ulliana, and Marie-Christine Rousset. 2014. “SOCLE: Towards a Framework for Data Preparation in Social Applications.” Ingénierie Des Systèmes d’Information 19 (3): 49–72. https://doi.org/10.3166/isi.19.3.49-72.

Bachimont, B. 2004. “Arts et Sciences Du Numérique : Ingénierie Des Connaissances et Critique de La Raison Computationnelle.” HDR, Université de Technologie de Compiègne.

Barazzutti, Pierre-Loup, Amélie Cordier, and Béatrice Fuchs. 2015. “Transmute: An Interactive Tool for Assisting Knowledge Discovery in Interaction Traces.” Research Report. Université Claude Bernard Lyon 1; Université Jean Moulin Lyon 3. https://hal.archives-ouvertes.fr/hal-01172013.

Besnaci, Mohamed, Nathalie Guin, and Pierre-Antoine Champin. 2015. “Acquisition de Connaissances Pour Importer Des Traces Existantes Dans Un Système de Gestion de Bases de Traces.” In Journées Francophones d’Ingénierie Des Connaissances. Rennes, France: AFIA. https://hal.archives-ouvertes.fr/hal-01164384.

Champin, Pierre-Antoine, Alain Mille, and Yannick Prié. 2013. “Vers des traces numériques comme objets informatiques de premier niveau : une approche par les traces modélisées.” Intellectica, no. 59 (June): 171–204.

Cook, Jonathan E., and Alexander L. Wolf. 1998. “Discovering Models of Software Processes from Event-Based Data.” ACM Transactions on Software Engineering and Methodology (TOSEM) 7 (3): 215–249.

Cordier, Amélie, Marie Lefevre, Pierre-Antoine Champin, Olivier Georgeon, and Alain Mille. 2013. “Trace-Based Reasoning — Modeling Interaction Traces for Reasoning on Experiences.” In The 26th International FLAIRS Conference. http://liris.cnrs.fr/publis/?id=5955.

Deransart, Pierre, Mireille Ducassé, and Ludovic Langevine. 2002. “A Generic Trace Model for Finite Domain.” In User-Interaction in Constraint Satisfaction, edited by Barry O’Sullivan and Eugene C. Freuder. New York, NY, USA.

Estrin, Deborah. 2014. “Small Data, Where n = Me.” Communications of the ACM 57 (4): 32–34. https://doi.org/10.1145/2580944.

Fielding, Roy Thomas. 2000. “Architectural Styles and the Design of Network-Based Software Architectures.” Doctoral dissertation, University of California, Irvine. http://www.ics.uci.edu/%7Efielding/pubs/dissertation/top.htm.

Fisher, Carolanne, and Penelope Sanderson. 1996. “Exploratory Sequential Data Analysis: Exploring Continuous Observational Data.” Interactions 3 (2): 25–34.

George, Sébastien, Christine Michel, and Magali Ollagnier-Beldame. 2013. “Usages Réflexifs Des Traces Dans Les Environnements Informatiques Pour l’apprentissage Humain.” Intellectica, no. 59: 205–241.

Harris, Steve, and Andy Seaborne. 2013. “SPARQL 1.1 Query Language.” W3C Recommendation. W3C. http://www.w3.org/TR/sparql11-query/.

Hitzler, Pascal, Markus Krötzsch, Bijan Parsia, Peter F. Patel-Schneider, and Sebastian Rudolph. 2009. “OWL 2 Web Ontology Language Primer.” W3C Recommendation. W3C. http://www.w3.org/TR/owl2-primer/.

Hsieh, Cheng-Kang, Hongsuda Tangmunarunkit, Faisal Alquaddoomi, John Jenkins, Jinha Kang, Cameron Ketcham, Brent Longstaff, et al. 2013. “Lifestreams: A Modular Sense-Making Toolset for Identifying Important Patterns from Everyday Life.” In Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems, 5:1–5:13. SenSys ’13. New York, NY, USA: ACM. https://doi.org/10.1145/2517351.2517368.

Kaptelinin, Victor. 1996. “Computer-Mediated Activity: Functional Organs in Social and Developmental Contexts.” Context and Consciousness: Activity Theory and Human-Computer Interaction, 45–68.

Klyne, Graham, and Chris Newman. 2002. “Date and Time on the Internet: Timestamps.” RFC 3339. IETF. https://tools.ietf.org/html/rfc3339.

Kong Win Chang, Bryan, Marie Lefevre, Nathalie Guin, and Pierre-Antoine Champin. 2015. “SPARE-LNC : Un Langage Naturel Contrôlé Pour l’interrogation de Traces d’interactions Stockées Dans Une Base RDF.” In Journées Francophones d’Ingénierie Des Connaissances. Rennes, France: AFIA. https://hal.archives-ouvertes.fr/hal-01164383.

Laflaquiere, Julien, Lotfi S. Settouti, Yannick Prié, and Alain Mille. 2006. “Trace-Based Framework for Experience Management and Engineering.” In Knowledge-Based Intelligent Information and Engineering Systems, 1171–1178. Springer. http://link.springer.com/chapter/10.1007/11892960_141.

Marty, Jean-Charles, Thibault Carron, and Jean-Mathias Heraud. 2009. “Traces and Indicators: Fundamentals for Regulating Learning Activities.” In Teachers and Teaching: Strategies, Innovations and Problem Solving, 323–349. Nova Science. http://brainsharepubliconlinelibrary.tk/education/Teachers%20and%20Teaching%20Strategies.pdf#page=325.

Mathern, Benoît, Alain Mille, Amélie Cordier, Damien Cram, and Raafat Zarka. 2012. “Towards a Knowledge-Intensive and Interactive Knowledge Discovery Cycle.” In 20th ICCBR Workshop Proceedings, edited by Juan A. Recio-Garcia and Luc Lamontagne, 151–162. http://liris.cnrs.fr/publis/?id=5916.

Moreau, Luc, and Paolo Missier. 2013. “PROV-DM: The PROV Data Model.” W3C Recommendation. W3C. http://www.w3.org/TR/prov-dm/.

Ollagnier-Beldame, Magali. 2011. “The Use of Digital Traces: A Promising Basis for the Design of Adapted Information Systems?” International Journal on Computer Science and Information Systems. Special Issue ”Users and Information Systems” 6 (2): 24–45.

Sachan, Mrinmaya, Danish Contractor, Tanveer A. Faruquie, and L. Venkata Subramaniam. 2012. “Using Content and Interactions for Discovering Communities in Social Networks.” In 21st International Conference on World Wide Web, 331–340. Lyon, France: ACM. http://dl.acm.org/citation.cfm?id=2187882.

Schreiber, Guus, and Yves Raimond. 2014. “RDF 1.1 Primer.” W3C Working Group Note. W3C. http://www.w3.org/TR/rdf-primer/.

Snell, James, and Evan Prodromou. 2016. “Activity Streams 2.0.” W3C Candidate Recommendation. W3C. http://www.w3.org/TR/activitystreams-core/.

Song, Minseok, Christian W. Günther, and Wil MP Van der Aalst. 2009. “Trace Clustering in Process Mining.” In Business Process Management Workshops, 109–120. Springer. http://link.springer.com/chapter/10.1007/978-3-642-00328-8_11.

Sporny, Manu, Gregg Kellogg, and Markus Lanthaler. 2014. “JSON-LD 1.0 – A JSON-Based Serialization for Linked Data.” W3C Recommendation. W3C. http://www.w3.org/TR/json-ld-syntax/.

Terrat, Helene. 2015. “Apports et Limites Des TICE Dans Les Apprentissages de La Langue Chez Les Élèves Handicapés Moteurs Présentant Des Troubles Associés : Utilisation Des Traces Numériques Pour Favoriser l’apprentissage de La Langue Écrite.” PhD thesis, Université Lumière Lyon 2. http://www.theses.fr/s62275.

Van der Aalst, Wil MP, Boudewijn F. van Dongen, Joachim Herbst, Laura Maruster, Guido Schimm, and Anton JMM Weijters. 2003. “Workflow Mining: A Survey of Issues and Approaches.” Data & Knowledge Engineering 47 (2): 237–267.