.. include:: common.inc.rst

================================================
 Experiential knowledge and trace-based systems
================================================

This chapter is mostly based on the papers by `@ChampinVers2013` and `@CordierTrace2013`,
and presents a synthesis of our work on modeling experiential knowledge and reasoning with it.

Motivation
==========

By design, computers continuously produce and use traces.
Every computational process works
with data stored in more or less volatile memories,
and produces new data that is in turn stored in those memories.
Every digital inscription is therefore by definition
a trace of the processes that allowed it to be produced.
Those digital inscriptions account for computational processes,
insofar as they result
from the execution of programs `[@DeransartGeneric2002]`,
but they also account
for the interactive processes between the human and the machine,
insofar as they pertain to computer applications that are used by humans,
in the context of their activity which is *mediated*
by the computer `[@KaptelininMediated1996]`.

A computer-mediated activity produces
traces linked to every computational process taking part in that activity.
For example, a typical workday
on a computer connected to the World Wide Web would produce
different kinds of traces both on that computer and
on other involved machines.
Should we consider the resulting traces at the end of that day,
we would have all documents, created or received, e-mail and instant messages,
browsing history, to name only those inscriptions handled by the user.
But we would also have the log files of all involved applications and servers.
Should we be interested in the traces related
to the proceeding of the activity, we would add
different data structures handled by those applications:
messages being written, open windows,
load indicators for CPU, memory or network, |etc|

Those traces are digital traces in the broad sense `[@LaflaquiereTrace2006]`,
that can be of any kind: document, data structure, log file, |etc|
We can make two remarks about them.
First, their status of trace is only marginally taken into account
by computer systems,
and the interpretation of such inscriptions as traces is usually performed
outside the system that produced them.
Consider for example a contact in an address book,
interpreted as a trace of the activity resulting in its recording.
The trace status of those inscriptions is nevertheless acknowledged through
the recording of temporal information regarding
the processes producing and altering these inscriptions,
|eg| the creation and modification dates of a file,
or the timestamp of each entry in a log file.
Second,
each application handling traces as such has its own dedicated models and formats for representing traces,
for example Learning Management Systems `[@GeorgeUsages2013]` or browser history.
Despite some recent efforts to unify digital traces in some domains,
such as social applications `[@SnellStreams2016]`,
health `[@EstrinLifestreams2013]`,
or data provenance `[@MoreauProv2013]`,
there is not yet a general model that would allow
a cross-application and cross-domain use of digital traces,
and provide generic processes for manipulating them.

Our goal, as ambitious as it may seem, is however
to make traces a first-class citizen of computer systems.
That way, we aim at capturing an important, and often overlooked, kind of knowledge:
the *experience* that results from remembering and reusing past activities.
We need to define a new *digital trace* object,
which includes all the features (especially temporal ones) allowing it
to be explicitly treated as a trace by applications,
while remaining generic enough to be usable across various application domains.

.. index:: modeled-trace based system, modeled-trace
   see: MTBS; modeled-trace based system

In this chapter,
we present the knowledge-engineering approach to digital traces that we have been developing for a number of years.
This approach aims at building modeled-trace based systems (MTBS).
Modeled-traces (or m-traces for short) are made
of timestamped elements named *obsels*
(contraction of "observed element")
and are associated with a *trace-model*.
The trace-model provides a guideline for building and interpreting the m-trace.
Computations on m-traces are most of the time
*transformations* into new m-traces,
that can be seen as a form of automated interpretation of the source m-trace.


A knowledge based approach for modeling and transforming digital traces
=======================================================================

Ad-hoc uses of digital traces for observing
+++++++++++++++++++++++++++++++++++++++++++

Observing an activity in order to understand it consists in collecting
observable elements related to that activity, in order to build *evidence*
guiding an interpretation.
The sequence of pieces of evidence forming a trace can therefore be used
to support, justify and explain interpretations.
When the activity is computer-mediated,
it is relatively easy to instrument the computer environment so that
it collects digital traces made of potentially meaningful elements.

Digital traces were first used to ease the debugging of computer programs,
with the idea that a knowledgeable observer (usually the programmer)
could analyze the collected observations
and interpret those traces,
in order to understand the program's behavior and fix it if needed.
Computer systems have long been able to produce a memory dump
whenever an exception\ [#exception]_ is raised
during the execution of a program.
The produced trace can be completely standard
or customized by analysts, who can set up tracing tools
in order to follow only the elements that
they deem relevant `[@DeransartGeneric2002]`.

Like programmers,
who can analyze the behavior of a computational process they designed,
one can observe computer-mediated processes or activity as soon as
the environment is instrumented in order to leave persistent traces.
Such an analyst can be a professional one,
or simply somebody willing to review or understand that activity\
---possibly the very person having performed that activity.

Then they need an interpretable representation of the collected traces.
Such representations are always the result of a computation,
either elementary (|eg| the hexadecimal representation of a memory dump)
or more complex
(|eg| a histogram of the time spent on that activity per day).
A statistical processing of those elements can also be performed,
using heuristics that depend on the intended interpretation.
A notable example is digital trace mining,
which seeks to detect structural recurring patterns,
in order to identify relevant behaviors or processes `[@SongClustering2009;@AalstSurvey2003;@CookDiscovering1998]`.
An important and well established use case of those techniques is to provide
recommendations and personalization to the users of the traced system.
This is applied in various contexts,
such as Learning Management Systems `[@MartyTraces2009]`
or web sites\ [#google_analytics]_ `[@SachanInteractions2012]`.

.. _small-data:

While trace mining is usually associated with big-data,
and the analysis of trends among a large population of users,
digital traces may prove valuable at a much smaller scale,
namely that of the individual traced user.
Deborah `@EstrinSmall2014` captures this idea with the concept of "small data, where n=me",
advocating for the extensive use of personal traces for providing insight on one's behaviour
(implying, among other things,
the right for users to access their traces collected by third-party applications).

Whatever the techniques or the scale of trace analysis,
the analyst has a fundamental role to play in the process of knowledge discovery.
The use of sequential interaction traces has been studied
by `@FisherExploratory1996` who showed that
the crucial task for the analyst is to find which transformations
to apply to the raw observations in order to discover useful descriptions
for explaining the observed process.
More recently,
`@YahiaSocle2014` have proposed a formal algebra to describe those transformations,
for preparing data produced by social applications
before applying data mining techniques.
Indeed, raw observations are piecemeal and expressed from the perspective of collecting devices,
|ie| in a low level register.
Knowledge, on the other hand, is expressed in the register of the activity.
Interpreting traces therefore requires a transformation that carries
the skills and knowledge of the analyst,
so as to rephrase sequences of raw observations
into sequences of meaningful activity elements.


The research works and practices described above suggest that collecting,
modeling, transforming,
rephrasing and interactively exploring are necessary steps
whenever the observation requires multiple interpretations.
We have proposed a unifying approach in order to integrate those steps
with a rich representation structure dedicated to observation traces.

Modeling digital traces: associating an interpretation model to observed elements
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

.. NB: compared to Intellectica,
   I revised this section a lot, because:

   1. I disagree that nowadays, traces don't have semantics
   (Atom has one, even if minimal; AcivityStreams and PROV definitely have one)

   2. I think it sells to much the Model = Ontology stance,
   which is not so clearly used in our work.

In numerous applications processing traces and sequential data,
the semantics of those data is mostly implicit.
Even when documented, it is often loosely defined,
reducing developers and analysts to hazardous guesswork based on data labels and sample values.
The outcome of knowledge discovery processes could be used to improve this situation,
but it cannot in general be reliably attached to the original traces,
as their format is not designed to allow such linking\ [#modern_traces]_.

We propose a new perspective on traces,
considering them as *knowledge inscriptions* meant to carry not only the collected information,
but also the elements allowing their interpretation by humans as well as computers.
This brings traces into the domain of knowledge engineering,
thus significantly widening the range of available tools for processing,
transforming, sharing and reusing them,
thanks to an explicit and operational semantics.
We choose to express this semantics as a *trace-model* associated with the set of observed elements, and playing three roles.
First, it plays the role of a *vocabulary* used to describe the observed elements,
unambiguously relating them to the model.
Second, it plays the role of a *schema*,
constraining the structure of the observed elements.
As such, it can be used to distinguish valid (or consistent)
observations from invalid ones.
Third, it plays the role of an *ontology* `[@BachimontArts2004 p.160-]`,
allowing new information to be inferred from what has actually been observed.

But obviously,
a single model cannot suffice to describe all computer-mediated activities,
not to mention the multiplicity of perspectives on a given activity.
We therefore propose a generic *meta-model* specifying how trace-models can be described,
which is presented in `mtbs`:numref:.

We consider the interpretation of a trace,
expressed using an initial trace-model,
as a "rephrasing" of that trace into another model,
working at a different level of abstraction.
For example, it would seem natural to interpret the sequence
``[click icon foo], [word processor starting], [foo loading], [window displayed]``
as the user opening a document named "foo".
Hence, that sequence could be rephrased into a single observation
``[open document foo]``.
The observed element in this new trace belongs to a new trace-model,
which has a higher level of abstraction than the one of the initial sequence.
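
As a minimal sketch of such a rephrasing, assuming that obsels are reduced to plain labels and that a rule is a fixed contiguous pattern
(the names ``PATTERN``, ``REPHRASED`` and ``rephrase`` are hypothetical and not part of our meta-model),
the example above could be written as follows:

```python
# Hypothetical rule: a contiguous low-level pattern and its higher-level rephrasing.
PATTERN = ["click icon foo", "word processor starting", "foo loading", "window displayed"]
REPHRASED = "open document foo"


def rephrase(labels, pattern=PATTERN, replacement=REPHRASED):
    """Build a higher-level sequence: one `replacement` per occurrence of `pattern`."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    result, i, n = [], 0, len(pattern)
    while i < len(labels):
        if labels[i:i + n] == pattern:
            result.append(replacement)
            i += n
        else:
            # Labels matching no rule have no counterpart in the
            # higher-level model, so they are left out of the new trace.
            i += 1
    return result


# "mouse moved" and "key pressed" are invented low-level labels.
low_level = ["mouse moved"] + PATTERN + ["key pressed"]
high_level = rephrase(low_level)
```

Here ``high_level`` contains the single label ``open document foo``;
a realistic transformation would of course also carry time-stamps and attributes over to the new obsel.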


.. _mtbs:

Modeled-trace based systems
===========================

I now present the meta-model that we have proposed for representing and processing m-traces in dedicated knowledge-based systems, |MTBS|\ s.

.. admonition:: Example

   In the rest of this chapter,
   I will illustrate the presented notions with the following running example:
   Alice uses an e-mail application to communicate and exchange documents with her colleagues.

.. index:: modeled-trace, !obsel, !trace-model
   see: m-trace; modeled-trace
   see: observed element; obsel

Modeled-trace
+++++++++++++

The central notion of our meta-model is that of *modeled-trace* (m-trace),
but we first need to define the notions of *obsel* and *trace-model*.

Every traced activity is represented by a list of *observed elements* or `obsels`:def:.
This neologism is inspired by the word "pixel" (picture element),
and was coined to insist on the fact that the content of any trace is the result of an observation,
hence unavoidably biased\ [#data-etymology]_.
Every obsel has:

* a begin time-stamp and an end time-stamp,
  anchoring the obsel in the time of the activity;
  both time-stamps can be equal,
  in the case of an *instantaneous* observation;

* a type,
  associating this particular obsel with an explicit category from the trace-model;

* a set of attributes, of the form <attribute-type, value>.

Let us note that the components of an obsel are, on purpose,
only loosely specified by the meta-model.
They are highly dependent on the represented activity,
which should therefore be described by a `trace-model`:def:.
That model must specify:

* how *time* is to be represented
  (simply a time unit,
  as discussed in the next subsection "`representing-time`:ref:");

* the obsel types that can be used to describe the activity;

* for each obsel type, which attribute types can be used,
  and what type of value they may have;

* a set of binary relation types that may exist between obsels;

* a set of integrity constraints that an m-trace and its obsels must satisfy to comply with this trace-model.

.. admonition:: Example

   In Alice's "e-mail" activity, we decide to measure time to the second.

   There are three obsel types: the receiving of a message (``RecvMsg``),
   the sending of a message (``SendMsg``)
   and the saving of an attachment (``SaveAtt``).
   Obsels of types ``RecvMsg`` and ``SendMsg`` have two attributes in common:
   the content of the message, and the content of their attachment if any.
   Moreover,
   obsels of type ``RecvMsg`` have an extra attribute holding the e-mail address of the sender,
   while obsels of type ``SendMsg`` have an attribute holding the e-mail address(es) of the recipient(s) of the message,
   and one holding the path of the attached file, if any.
   Finally,
   obsels of type ``SaveAtt`` have an attribute holding the name under which the attachment was saved.

   The trace-model also defines three relation types.
   The first two, ``RepliesTo`` and ``Forwards``,
   both link an obsel of type ``SendMsg`` to one of type ``RecvMsg``,
   to indicate that the sent message was, respectively,
   a response to the received message,
   or its forwarding to another recipient.
   The third relation type, ``From``,
   links an obsel of type ``SaveAtt`` to one of type ``RecvMsg``
   to indicate which message the saved attachment came from.

   This trace-model constrains all obsels to be instantaneous,
   |ie| to have the same begin and end time-stamps.
   Furthermore, the second member of a ``From`` relation must have an attachment,
   |ie| the corresponding attribute must not be empty.
   Finally, in a ``SendMsg`` obsel,
   the two attributes holding the attachment content and its file-name must be either both empty or both non-empty.
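
The "e-mail" trace-model can be sketched in a few lines of Python.
This is only an illustration under simplifying assumptions:
the attribute names (``content``, ``attachment``, ``sender``, ``recipients``, ``attachment_path``, ``filename``)
and the function ``violations`` are hypothetical,
and the trace-model is hard-coded rather than expressed with a generic meta-model.

```python
from dataclasses import dataclass, field


@dataclass
class Obsel:
    type: str                                   # obsel type from the trace-model
    begin: int                                  # begin time-stamp, in seconds
    end: int                                    # end time-stamp, in seconds
    attrs: dict = field(default_factory=dict)   # attribute type -> value


# Attribute types allowed for each obsel type (hypothetical names).
ATTRIBUTES = {
    "RecvMsg": {"content", "attachment", "sender"},
    "SendMsg": {"content", "attachment", "recipients", "attachment_path"},
    "SaveAtt": {"filename"},
}
# Relation types, with the obsel types of their first and second members.
RELATIONS = {
    "RepliesTo": ("SendMsg", "RecvMsg"),
    "Forwards": ("SendMsg", "RecvMsg"),
    "From": ("SaveAtt", "RecvMsg"),
}


def violations(obsels, relations):
    """List the violations of the "e-mail" trace-model.

    `relations` is a list of (relation type, first obsel, second obsel) triples.
    """
    errors = []
    for o in obsels:
        if o.type not in ATTRIBUTES:
            errors.append(f"unknown obsel type {o.type}")
            continue
        unknown = set(o.attrs) - ATTRIBUTES[o.type]
        if unknown:
            errors.append(f"{o.type}: unexpected attributes {sorted(unknown)}")
        if o.begin != o.end:
            errors.append(f"{o.type}: obsel is not instantaneous")
        if o.type == "SendMsg":
            has_content = bool(o.attrs.get("attachment"))
            has_path = bool(o.attrs.get("attachment_path"))
            if has_content != has_path:
                errors.append("SendMsg: attachment content and path must go together")
    for rtype, first, second in relations:
        if RELATIONS.get(rtype) != (first.type, second.type):
            errors.append(f"{rtype}: unknown relation type or wrong member types")
        elif rtype == "From" and not second.attrs.get("attachment"):
            errors.append("From: the received message has no attachment")
    return errors


msg = Obsel("RecvMsg", 33300, 33300, {"sender": "bob@example.org", "content": "Hi"})
save = Obsel("SaveAtt", 34260, 34260, {"filename": "report.docx"})
errors = violations([msg, save], [("From", save, msg)])
```

In this usage example the received message carries no attachment,
so the ``From`` relation is reported as violating the last-but-one constraint stated above.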

An obsel type in a trace-model can also be associated with one or more parent type(s).
This relationship has the standard subclass semantics (also called "a kind of"),
and is interesting at several levels.
At the syntax level,
it allows the children types to inherit attribute definitions from their parent types,
and encourages modularity in the design of the trace-model.
At the semantic level,
it implies that all obsels of the children types will also belong to the parent types,
enabling more reasoning (and hence transformations) on the m-traces.
Relation types can also have parent relation types.

.. admonition:: Example

   In our trace-model above,
   the common attributes of ``RecvMsg`` and ``SendMsg`` can be moved up in a parent type,
   which we can call ``Message``.
   The resulting trace-model is represented in `fig:trace-model`:numref:.
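
The two levels of this inheritance can be sketched as follows.
The dictionary layout, the attribute names and the helper functions are assumptions made for the sake of illustration;
only the obsel type names come from the running example.

```python
# Each obsel type maps to (parent types, attribute types declared on that type).
# Attribute names are hypothetical.
OBSEL_TYPES = {
    "Message": ((), {"content", "attachment"}),
    "RecvMsg": (("Message",), {"sender"}),
    "SendMsg": (("Message",), {"recipients", "attachment_path"}),
    "SaveAtt": ((), {"filename"}),
}


def ancestors(obsel_type):
    """Return the type itself plus all its direct and indirect parent types."""
    found = {obsel_type}
    for parent in OBSEL_TYPES[obsel_type][0]:
        found |= ancestors(parent)
    return found


def allowed_attributes(obsel_type):
    """Syntax level: attribute types usable on a type, inherited ones included."""
    return set().union(*(OBSEL_TYPES[t][1] for t in ancestors(obsel_type)))


def is_a(obsel_type, candidate):
    """Semantic level: do all obsels of `obsel_type` also belong to `candidate`?"""
    return candidate in ancestors(obsel_type)
```

With these definitions, ``allowed_attributes("RecvMsg")`` includes the attributes declared on ``Message``,
and ``is_a("SendMsg", "Message")`` holds while ``is_a("SaveAtt", "Message")`` does not.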


.. figure:: _static/trace-model.*
   :name: fig:trace-model

   An example trace-model `[@ChampinVers2013]`.


Finally, a trace-model can be linked to a number of parent trace-models,
provided that they all share the same representation of time.
In that case, the child trace-model will inherit all obsel types,
attribute types, relation types and integrity constraints of all its parents.
This is valuable from a knowledge engineering perspective,
as it encourages the reuse of previously defined trace-models,
together with the reasoning processes and transformations associated with them.

.. admonition:: Example

   The "e-mail" trace-model described above could be inherited by a broader trace-model,
   also inheriting a trace-model for "word processing",
   providing a more holistic view on Alice's (or any office worker's) activity.
   Another trace-model, dedicated to a specific e-mail application,
   could also inherit our example trace-model,
   and extend it with functionalities that are specific to this application
   (|eg| contact management, message folders...).

.. index:: !modeled-trace

We are now ready to precisely define a `modeled-trace`:def:.
It is specified by:

* a reference to a trace-model,
* a time interval called the *temporal extension* of the trace,
* a set of obsels,
* a set of typed binary relations between those obsels.

The temporal extension is the period of time during which the traced activity was recorded.
While the obsels of the m-trace must all lie within the bounds of the temporal extension,
the time-stamps of the first and last obsels need not exactly match these bounds.
Indeed,
the *absence* of obsels, in some parts of the temporal extension,
may be relevant for interpreting the trace.

The temporal extension is described using the time representation specified by the trace-model.
Of course, the obsels and their relations are also described according to the trace-model.

.. admonition:: Example

   :numref:`fig:trace1a` shows an m-trace representing Alice's e-mail activity.
   It refers to the "e-mail" trace-model described above.
   Its temporal extension spans from Monday 9:00 AM to 11:00 AM.

   It is composed of four obsels.
   To keep it simple,
   we have not represented the end time-stamps
   (as they are always equal to the begin time-stamp).
   At 9:15, Alice receives an e-mail from Bob.
   At 9:31, she saves the attached file as ``report.docx``,
   then replies to Bob at 9:32.
   At 9:47, she sends a message to Charlie,
   attaching a file named ``report-summary.docx``.
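
The four components of an m-trace, and the role of the temporal extension, can be sketched as below.
This is a simplified illustration: the class ``MTrace``, its methods and the helper ``hm`` are hypothetical,
obsels are reduced to (type, begin, end) triples without attributes,
and the two relations are those suggested by the description of the example.

```python
from dataclasses import dataclass, field


def hm(hours, minutes):
    """Clock time of the example's Monday, in seconds (the unit of the trace-model)."""
    return hours * 3600 + minutes * 60


@dataclass
class MTrace:
    model: str                                      # reference to a trace-model (a mere name here)
    extension: tuple                                # (begin, end) of the temporal extension
    obsels: list = field(default_factory=list)      # (type, begin, end) triples
    relations: list = field(default_factory=list)   # (relation type, source index, target index)

    def out_of_bounds(self):
        """Obsels that do not lie within the temporal extension."""
        lo, hi = self.extension
        return [o for o in self.obsels if not lo <= o[1] <= o[2] <= hi]

    def silences(self):
        """Periods of the temporal extension during which nothing was observed."""
        lo, hi = self.extension
        gaps, cursor = [], lo
        for _, begin, end in sorted(self.obsels, key=lambda o: o[1]):
            if begin > cursor:
                gaps.append((cursor, begin))
            cursor = max(cursor, end)
        if cursor < hi:
            gaps.append((cursor, hi))
        return gaps


trace = MTrace(
    model="e-mail",
    extension=(hm(9, 0), hm(11, 0)),
    obsels=[
        ("RecvMsg", hm(9, 15), hm(9, 15)),
        ("SaveAtt", hm(9, 31), hm(9, 31)),
        ("SendMsg", hm(9, 32), hm(9, 32)),
        ("SendMsg", hm(9, 47), hm(9, 47)),
    ],
    relations=[("From", 1, 0), ("RepliesTo", 2, 0)],
)
longest = max(trace.silences(), key=lambda gap: gap[1] - gap[0])
```

The longest silence runs from 9:47 to 11:00:
it is only because the temporal extension is explicit that this absence of obsels can be observed at all.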


.. figure:: _static/trace1a.*
   :name: fig:trace1a
   :figclass: wide

   An example modeled-trace,
   complying with the model from `fig:trace-model`:numref:
   `[@ChampinVers2013]`.

.. _representing-time:

Representing time
+++++++++++++++++

The goal of our meta-model is to represent a wide range of activities,
requiring different ways of representing time.
In our running example, a granularity of one second seemed appropriate;
but in other domains, such as traces of car driving or eye tracking,
one might want more precise time-stamps.
On the contrary,
other activities may only require a granularity of one hour or one day,
and in some cases, more precise timing information is not even available.

Besides, in some contexts,
one may only have a *relative* measure of time for the collected obsels.
For example,
the m-trace depicted in `fig:trace1a`:numref: spans from Monday 9:00 to 11:00,
but there is no indication on *which* Monday it is.
This information may be unavailable for several reasons:
either it was not recorded (some log files do not store a complete date),
or it was removed on purpose, for example for privacy reasons.
In other contexts,
the temporal information may be even scanter,
obsels being merely ordered in a sequence.


.. index:: !origin, !origin; opaque

To account for all those situations, our meta-model requires that:

* per its definition, a trace-model specifies a time unit `u`:m:;
* every m-trace has an origin `o`:m: (see below);
* every time-stamp in an m-trace (in its temporal extension and in its obsels)
  is an integer `t`:m:,
  representing the instant `o + tu`:m:.

The `origin`:def: is a character string.
If it is a standard representation of an instant,
|eg| using the RFC 3339 format `[@KlyneDate2002]`,
at least as precise as the unit of the trace-model,
then the temporal extension and the obsels can be absolutely dated.
Their time-stamps can be converted to other time formats,
and compared with any other absolute time-stamp\ [#unix-epoch]_.
On the other hand,
an origin not complying with a standard format is called an `opaque origin`:def:.
The time-stamps of the corresponding trace can be compared with each other,
but not with any arbitrary other time-stamp.
Note however that an opaque origin is assumed to always identify the same instant,
so if two m-traces have the same opaque origin,
their time-stamps are assumed to be comparable\ [#converting-units]_.
As most transformations do not alter time-stamps,
they usually preserve the origin of the m-trace,
making the source and the transformed trace comparable with each other.

.. admonition:: Example

   The trace from `fig:trace1a`:numref: must be represented with an opaque origin,
   as we do not know on which Monday it was recorded.
   We chose to keep "Monday" in the origin to provide a hint to users.
   The temporal extension spans from 32400 (|ie| 9 hours)
   to 39600 (|ie| 11 hours).
   All time-stamps of the obsels are converted accordingly.
   The resulting m-trace is depicted in `fig:trace1b`:numref:.
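
The arithmetic behind this representation is simple enough to be sketched in Python.
The sketch below makes several assumptions:
the function names are hypothetical,
``datetime.fromisoformat`` is used as an approximation of "a standard representation of an instant"
(it accepts some strings that are not valid RFC 3339),
the absolute origin is an arbitrary date,
and unit conversion between trace-models is ignored.

```python
from datetime import datetime, timedelta


def is_opaque(origin):
    """An origin is treated as opaque when it cannot be parsed as an instant."""
    try:
        datetime.fromisoformat(origin)
        return False
    except ValueError:
        return True


def to_datetime(origin, t, unit_seconds=1):
    """Return the instant o + t*u, which is only defined for an absolute origin."""
    if is_opaque(origin):
        raise ValueError(f"opaque origin {origin!r}: no absolute date available")
    return datetime.fromisoformat(origin) + timedelta(seconds=t * unit_seconds)


def comparable(origin_a, origin_b):
    """Can time-stamps of two m-traces with the same unit be compared?"""
    if is_opaque(origin_a) or is_opaque(origin_b):
        # An opaque origin always identifies the same instant,
        # so it is only comparable with itself.
        return origin_a == origin_b
    return True


nine_am = 9 * 3600        # 32400, lower bound of the temporal extension
eleven_am = 11 * 3600     # 39600, upper bound of the temporal extension
absolute = to_datetime("2024-01-01T00:00:00+00:00", nine_am)
```

With the opaque origin ``"Monday"``, ``to_datetime`` raises an error,
while ``comparable("Monday", "Monday")`` still holds.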


.. figure:: _static/trace1b.*
   :name: fig:trace1b
   :figclass: wide

   The example from `fig:trace1a`:numref:,
   with a unified representation of time `[@ChampinVers2013]`.


Finally, to represent a sheer ordered sequence of obsels without any quantifiable temporal information,
we define the special time unit ``sequence``.
This unit imposes the following constraints:

* the origin of the m-trace must be opaque;
* every obsel must have equal begin and end time-stamps,
  and all obsels of the m-trace must have different time-stamps;
* only the *order* of the time-stamps is significant;
  their absolute value gives no information of duration.
  One can not assume, for example,
  that the duration between time-stamps 1 and 2 is the same as between 2 and 3.

This makes it possible to handle cases where the only information about the obsels is a total ordering.
Other special units could be proposed to handle other kinds of limited temporal information.
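
The constraints imposed by the ``sequence`` unit can be checked mechanically.
The following sketch assumes that obsels are given as (begin, end) pairs of integers
and that the opacity of the origin has been determined beforehand;
the function names are hypothetical.

```python
def check_sequence_trace(origin_is_opaque, obsels):
    """Check the constraints of the special ``sequence`` unit.

    `obsels` is a list of (begin, end) integer time-stamp pairs.
    """
    problems = []
    if not origin_is_opaque:
        problems.append("origin must be opaque")
    if any(begin != end for begin, end in obsels):
        problems.append("obsels must be instantaneous")
    stamps = [begin for begin, _ in obsels]
    if len(set(stamps)) != len(stamps):
        problems.append("time-stamps must be distinct")
    return problems


def precedes(a, b):
    """Only the order is meaningful: differences of time-stamps are not durations."""
    return a[0] < b[0]
```

For instance, ``check_sequence_trace(True, [(1, 1), (2, 2), (5, 5)])`` reports no problem,
even though the time-stamps are not consecutive.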


.. index:: !modeled-trace management system, !collector, !primary trace

Architecture of an |MTBS|
+++++++++++++++++++++++++

We are now ready to describe the overall architecture of an |MTBS|,
illustrated in `fig:mtbs`:numref:.


.. figure:: _static/mtbs.*
   :name: fig:mtbs

   General architecture of an |MTBS| built around an |MTMS|
   `[@ChampinVers2013]`


.. index::
   see: MTMS; modeled-trace management system

The core component of such a system is the `modeled-trace management system`:def:
(MTMS).
It plays a role similar to that of a database management system in a standard application,
but instead manages m-traces complying with the meta-model presented above.
It must be flexible enough to allow several trace-models to coexist (and evolve).
It must also support the intrinsic dynamics of traces.
Finally, it must be able to handle modeled-trace transformations
(that will be discussed in more detail in `tbr`:numref:).

The |MTMS| is fed by a number of `collectors`:def:,
whose role is to gather the information required to build one or several m-traces.
That information can be gathered synchronously,
by observing the traced activity while it is taking place,
or `a posteriori`:l:, for example by examining log files.
The trace-model of the collected m-trace determines which part of the available information is kept,
and how it is organized to constitute the obsels of the m-trace\ [#collector-model]_.
Any m-trace produced by a collector is called a `primary trace`:def:,
as opposed to the *transformed traces* that are computed by the |MTMS| from other m-traces
(either primary or transformed).

Finally, all m-traces can be used by application modules.
Some of them can be used to display m-traces to the user in different ways,
either very generic (a table listing all the obsels)
or specific to a given trace-model, or even to a specific task.
Other modules will process the m-traces in order to alter their own behavior,
such as an assistance system reusing past experiences of the user.


.. index:: trace based reasoning

.. _tbr:

Trace based reasoning
=====================

.. index:: !transformed trace
   see: transformation; transformed trace

Transformed m-traces
++++++++++++++++++++

Most of the time, primary traces are not directly (or easily) usable by application modules;
it is necessary to pre-process and transform them.
One of the key roles of the |MTMS| is to perform those transformations,
in order to support multiple interpretations and reasoning with m-traces.

A `transformed trace`:def: is specified by:

* one or more *source* m-traces (which can be either primary or transformed),
* a reference to a transformation method,
* optionally one or more parameters influencing the execution of the transformation method.

All the properties of the transformed trace
(its model, its temporal extension, its obsels and their relations)
are deterministically computed by the transformation method,
provided with the source traces and the parameters.
Also, note that transformations can be chained
(as the sources of a transformed trace can be transformed traces themselves),
in order to produce complex workflows.
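
A minimal sketch of this specification is given below.
It is not how an actual |MTMS| is implemented:
the classes ``Primary`` and ``Transformed``, the two transformation methods and the obsel layout are hypothetical,
and only the obsels (not the model or the temporal extension) are computed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Primary:
    obsels: list                      # (time-stamp, obsel type) pairs

    def compute(self):
        return list(self.obsels)


@dataclass
class Transformed:
    sources: list                     # Primary or Transformed traces
    method: Callable                  # reference to a transformation method
    params: dict = field(default_factory=dict)

    def compute(self):
        # The obsels depend only on the sources, the method and the parameters.
        return self.method([s.compute() for s in self.sources], **self.params)


def merge(sources):
    """Hypothetical method: gather all obsels, ordered by time-stamp."""
    return sorted(o for source in sources for o in source)


def keep_types(sources, types):
    """Hypothetical method: keep the obsels of a single source whose type is in `types`."""
    (source,) = sources
    return [o for o in source if o[1] in types]


morning = Primary([(33300, "RecvMsg"), (34260, "SaveAtt")])
later = Primary([(41400, "SendMsg")])
fused = Transformed([morning, later], merge)
messages = Transformed([fused], keep_types, {"types": {"RecvMsg", "SendMsg"}})
result = messages.compute()
```

Since ``messages`` takes ``fused`` as its source, the two transformations are chained;
in this naive sketch the whole chain is recomputed at every call to ``compute``.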

While the range of possible transformation methods is very large,
we can distinguish three main classes of elementary methods.

* *Selection* methods keep only a subset of the obsels of a unique source trace,
  with respect to a given criterion.
  The model of the transformed trace is usually the same as the source trace,
  as well as the temporal extension (unless the criterion is about time-stamps).

  .. admonition:: Example

     In the "e-mail" trace-model of our running example,
     the following selections can be considered:
     keep only obsels between 9:30 and 9:40 (temporal criterion),
     keep only obsels of type ``SendMsg`` (typology criterion),
     keep only obsels with a non-empty attachment (attribute criterion),
     keep only obsels that have been replied to (relation criterion).

* *Fusion* methods gather in the transformed trace all obsels from several source m-traces.
  If the sources have different trace-models,
  the model of the transformed trace should inherit all their models
  (which implies that they have the same representation of time).

  .. admonition:: Example

     We could combine the trace of our running example with another of Alice's traces,
     also complying with the "e-mail" trace-model,
     but covering the period between 11:00 and 13:00 that same day,
     to analyze a longer part of her activity.
     We could also merge her trace with the "e-mail" trace from Bob at the same time,
     in order to study more precisely how the two of them communicate.
     Finally, we could combine that trace with Alice's "word processing" trace,
     to analyse her office activity in a larger context.
     That larger context could in particular provide insight on the "e-mail" part of the activity,
     for example by showing that ``report-summary.docx`` is a modified version of ``report.docx``.

* *Rewriting* methods populate the transformed trace with new obsels,
  that are derived from the obsels of a unique source trace.
  It may consist in copying those obsels with less information
  (removing or altering some of their attributes)
  or more
  (inferring new attributes or relations from the content of the source trace or from external knowledge).
  But rewriting is not necessarily injective;
  obsels in the transformed trace may be derived from several source obsels,
  collectively satisfying a number of constraints.

  .. admonition:: Example

     A trace complying with the "e-mail" trace-model can be anonymized by removing all sender and recipient attributes\ [#anonymize]_.
     On the other hand,
     we could imagine enriching the source trace by tagging obsels with an emotion detection algorithm
     (which would require extending the original trace-model).
     Another rewriting could consist in summarizing e-mail activity,
     by generating one obsel per day,
     its attributes indicating the number of sent and received messages
     (this would of course require a dedicated trace-model,
     different from the one presented earlier).
     Finally,
     we can imagine a more elaborate kind of summary,
     where a sequence of messages replying to each other would be rewritten into a single obsel of type ``Conversation``,
     while a sequence of sent messages with the same content to different recipients would be rewritten into a single obsel of type ``Broadcast``.

Note that rewriting transformations may apply not only to obsel attributes,
but also to their time-stamps, as well as those of the m-trace.
Reducing their precision or changing an absolute origin to an opaque one may be necessary to effectively anonymize the m-trace.
It could also be used to align two m-traces originally captured at different times,
in order to compare them.
For example, one may want to compare the execution of the same task in two different contexts.
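
The three classes of elementary methods can be illustrated by the following sketch,
in which obsels are plain dictionaries with a ``type`` and a time-stamp ``t`` in seconds.
All function names, attribute names and the ``Summary`` obsel type are assumptions made for this example.

```python
def select(trace, criterion):
    """Selection: keep the obsels of a single source trace satisfying `criterion`."""
    return [o for o in trace if criterion(o)]


def fuse(*traces):
    """Fusion: gather the obsels of several source traces, ordered by time-stamp."""
    return sorted((o for t in traces for o in t), key=lambda o: o["t"])


def anonymize(trace, secret=("sender", "recipients")):
    """Rewriting (one-to-one): copy each obsel without its sender/recipient attributes."""
    return [{k: v for k, v in o.items() if k not in secret} for o in trace]


def summarize(trace, period=86400):
    """Rewriting (many-to-one): one ``Summary`` obsel per period, counting messages."""
    counts = {}
    for o in trace:
        if o["type"] not in ("SendMsg", "RecvMsg"):
            continue
        day = o["t"] // period
        sent, received = counts.get(day, (0, 0))
        if o["type"] == "SendMsg":
            sent += 1
        else:
            received += 1
        counts[day] = (sent, received)
    return [{"type": "Summary", "t": day * period, "sent": s, "received": r}
            for day, (s, r) in sorted(counts.items())]


morning = [
    {"type": "RecvMsg", "t": 33300, "sender": "bob@example.org"},
    {"type": "SaveAtt", "t": 34260, "filename": "report.docx"},
    {"type": "SendMsg", "t": 34320, "recipients": ["bob@example.org"]},
    {"type": "SendMsg", "t": 35220, "recipients": ["charlie@example.org"]},
]
midday = [{"type": "RecvMsg", "t": 41400, "sender": "charlie@example.org"}]

sent = select(morning, lambda o: o["type"] == "SendMsg")
whole = fuse(morning, midday)
summary = summarize(whole)
```

Note that ``summarize`` produces obsels of a type that does not exist in the "e-mail" trace-model,
which is why such a rewriting needs a dedicated trace-model.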

:numref:`fig:transformations` illustrates those notions.
It also points out that not only are transformed traces linked to their source trace,
but every obsel in a transformed trace can keep track of its corresponding source obsels.
Thus,
any obsel at any level can be *explained* by the process
(transformation methods)
and the data (source obsels) from which it was produced.
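
This explanation capability can be sketched as follows,
assuming that each obsel simply stores the name of the method that produced it and references to its source obsels
(the class ``Obsel`` and the functions ``explain`` and ``primary_sources`` are hypothetical).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obsel:
    label: str
    method: str = "collected"   # transformation method that produced this obsel
    sources: tuple = ()         # source obsels it was derived from


def explain(obsel, depth=0):
    """Return indented lines tracing an obsel back to the primary obsels."""
    lines = ["  " * depth + f"{obsel.label} <- {obsel.method}"]
    for source in obsel.sources:
        lines.extend(explain(source, depth + 1))
    return lines


def primary_sources(obsel):
    """Return the primary obsels from which `obsel` was ultimately derived."""
    if not obsel.sources:
        return [obsel]
    return [p for source in obsel.sources for p in primary_sources(source)]


recv = Obsel("RecvMsg 9:15")
reply = Obsel("SendMsg 9:32")
kept_recv = Obsel("RecvMsg 9:15", "selection", (recv,))
kept_reply = Obsel("SendMsg 9:32", "selection", (reply,))
conversation = Obsel("Conversation", "rewriting", (kept_recv, kept_reply))
explanation = explain(conversation)
```

Each line of ``explanation`` names an obsel and the method that produced it,
one indentation level per transformation step.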


.. figure:: _static/transformations.*
   :name: fig:transformations
   :figclass: wide

   Transformed traces.

   This figure shows how the three kinds of transformations can be applied in our running example.
   The first (from the bottom) transformed trace is a fusion of the two primary traces,
   which represent the same "e-mail" activity at different periods of time.
   The second transformed trace is a selection, keeping only obsels of type ``Message``
   (recall that ``RecvMsg`` and ``SendMsg`` both inherit that obsel type).
   The third transformed trace is a rewriting into a more synthetic trace-model,
   classifying sequences of obsels into different communication patterns.
   Note how each transformed obsel is linked to one or more source obsels
   (dotted arrows).


Reasoning with transformations
++++++++++++++++++++++++++++++

A transformation chain, such as the one depicted in `fig:transformations`:numref:,
can arguably be considered as a reasoning process.
It involves different knowledge containers:
the factual knowledge contained in the primary traces,
the structural knowledge contained in the various trace-models,
and the inferential knowledge contained in the different transformation methods.
Moreover, that particular arrangement of transformations
(with the parameters of each transformed trace)
also carries some knowledge which can be either general
(|eg| ``SaveAtt`` obsels are not relevant for rewriting to an "e-mail summary",
so they can be filtered out)
or specific to a given context
(|eg| those two primary traces are related to each other,
so they should be merged).

.. index:: episode
   see: TBR; trace based reasoning
   double: elaboration; trace based reasoning
   double: retrieval; trace based reasoning
   double: reuse; trace based reasoning

We have proposed `[@CordierTrace2013]`
that trace based reasoning (TBR)
can be structured as an interactive cycle of three steps
-- inspired by the |CBR| cycle described by `@AamodtCase1994`.

.. _reflexivity:

* The *elaboration* step consists in setting up the transformation chain that is relevant to solve the problem at hand.
  This usually amounts to identifying in m-traces reusable *episodes*,
  |ie| sets of obsels that meet a number of criteria;
  those episodes will typically appear as aggregate obsels in a transformed trace
  (such as the ``Conversation`` and ``Broadcast`` obsels in `fig:transformations`:numref:).
  For the classes of problems anticipated by the |MTBS| designers,
  the appropriate transformations will be provided with the system.
  However,
  nothing prevents users from adding their own transformations to answer unanticipated questions.
  Indeed, |MTMS|\s can handle multiple concurrent transformations of the same m-trace,
  in order to support multiple (and sometimes contradictory)
  interpretations of the primary traces.

* In the *retrieval* step, the |MTMS| executes the transformations specified before.
  Depending on the kind of transformation,
  the user can subject it to a number of constraints:
  on the number of episodes to retrieve,
  on the search algorithm to use,
  on the minimum certainty degree to apply...
  Note that, contrary to cases in traditional CBR,
  the episodes in TBR are never isolated as self-sufficient structures,
  but remain linked to the original obsels.
  They are always part of a "bigger picture",
  and their context of occurrence can always be tapped whenever their content itself is not enough to decide on the most relevant episode(s) to retrieve.

* The *reuse* step is when the retrieved episodes are effectively used
  (possibly with some adaptation) to solve the problem at hand.
  This can be done in various ways,
  typically outside the |MTMS| itself
  (in the application modules of `fig:mtbs`:numref:).
  Whatever the functions of those application modules are,
  they are integrated in the user's traced activity,
  and therefore fed back to the |MTMS|.
  This is what closes the cycle,
  even if we do not have an explicit "retain" phase as in the |CBR| cycle.
  Indeed, `@OllagnierTraces2011` and `@TerratApports2015`
  have shown that even the simple fact of displaying the m-trace to the user can help improve their appropriation of the system,
  an effect called `reflexivity`:i:.
  Of course, knowledge extraction can also be the explicit goal of that step,
  either for an external analyst or as an advanced form of reflexivity.
  This has been studied by `@MathernIntensive2012` and `@BarazzuttiTransmute2015`.
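
The retrieval step and its constraints can be sketched as follows.
This is only an illustration under simplifying assumptions,
not the way an actual |MTMS| works:
the ``Episode`` class, its ``certainty`` score,
the ``retrieve`` and ``context`` functions and their parameters are all hypothetical.
The point is that a retrieved episode only *refers* to obsels of the m-trace,
so that its context of occurrence remains available.

.. code-block:: python

   from dataclasses import dataclass
   from typing import List, Optional, Tuple


   @dataclass(frozen=True)
   class Episode:
       label: str
       certainty: float             # hypothetical matching score in [0, 1]
       obsel_ids: Tuple[int, ...]   # positions of the episode's obsels in the m-trace


   def retrieve(episodes: List[Episode],
                min_certainty: float = 0.0,
                max_episodes: Optional[int] = None) -> List[Episode]:
       """Return the most certain episodes that satisfy the user's constraints."""
       kept = [e for e in episodes if e.certainty >= min_certainty]
       kept.sort(key=lambda e: e.certainty, reverse=True)
       return kept[:max_episodes]  # a None bound keeps every remaining episode


   def context(trace: List[str], episode: Episode, margin: int = 1) -> List[str]:
       """Return the span of the trace covered by the episode, widened by `margin`."""
       first = max(min(episode.obsel_ids) - margin, 0)
       last = min(max(episode.obsel_ids) + margin, len(trace) - 1)
       return trace[first:last + 1]


   # Obsels are reduced to their type name to keep the example short.
   trace = ["RecvMsg", "SaveAtt", "SendMsg", "RecvMsg", "SendMsg", "SendMsg"]
   episodes = [
       Episode("Conversation", 0.9, (0, 2)),
       Episode("Broadcast", 0.4, (4, 5)),
       Episode("Conversation", 0.7, (3, 4)),
   ]

   best = retrieve(episodes, min_certainty=0.5, max_episodes=1)
   surroundings = context(trace, best[0])

Here ``surroundings`` contains the ``SaveAtt`` obsel that occurred within the retrieved conversation,
although it is not part of the episode itself:
this is the "bigger picture" that can be tapped when the content of the episode is not enough.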


.. _co-construction:

.. index:: !co-construction

The user is therefore at the center of the cycle,
being strongly involved in each step.
Knowledge is dynamically **co-constructed**,
the system sustaining pre-defined interpretations and providing reflexivity,
and the user assessing those interpretations in *context*,
and testing new ones when the former are not satisfactory.
As such, the system can continuously adapt to changes.


Of course, to support this level of interactivity,
|MTBS|\s must be equipped with intelligible user interfaces,
both for presenting m-traces and for designing new transformations.
Determining which features make such interfaces efficient
is still an open question and probably one of the key points in the future development of |MTBS|\s,
which we have started to investigate `[@BesnaciAcquisition2015;@KongNaturel2015]`.


.. _ktbs:

An open-source reference implementation
+++++++++++++++++++++++++++++++++++++++

The meta-model presented in this chapter is the result of many discussions and iterations.
Since 2009, we have been working on a reference implementation,
whose first goal was to help stabilize the meta-model
(as many problems only appear with concrete use cases).
Its second goal was to ease and speed up the development of experiments aiming at validating and/or extending the meta-model.
This implementation is named kTBS (a kernel for trace based systems),
and is available at http://tbs-platform.org/ktbs\ .

kTBS is open-source,
in order to foster its reuse both in and outside our research group.
It is designed as a RESTful Web service `[@FieldingArchitecture2000]`,
in order to be easily integrated with other systems,
regardless of their own architecture or programming language\ [#rest-privacy]_.
Internally,
it stores all its data using the |RDF| data model `[@SchreiberPrimer2014]`,
which meets the requirements of flexibility of our meta-model.
|RDF| also comes with a powerful query language `[@HarrisSparql2013]`,
and expressive ontology languages `[@HitzlerOwl2009]`.
Externally,
kTBS exposes and consumes data in the JSON-LD format `[@SpornyJSON2014]`.
As stated above,
the next step is to provide kTBS with intuitive and intelligible user interfaces,
as |TBR| heavily relies on the user interacting with the |MTMS|.
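
To give an idea of what integrating with such a RESTful service may look like,
the sketch below prepares an HTTP request that would add an obsel,
described in JSON, to a trace.
It should not be read as the documented kTBS API:
the trace URL, the property names (``begin``, ``end``, ``subject``),
the ``m:`` prefix for the trace-model and the ``application/ld+json`` content type
are assumptions chosen for the example.
The request is only built, not sent, so the code runs without any server.

.. code-block:: python

   import json
   from urllib.request import Request

   # Hypothetical URL of a trace hosted by a local kTBS instance.
   TRACE_URL = "http://localhost:8001/base1/trace1/"


   def build_obsel_request(trace_url, obsel_type, begin, end, subject, **attributes):
       """Prepare (without sending) a POST request describing a new obsel."""
       payload = {
           "@type": obsel_type,   # obsel type, defined in the trace-model
           "begin": begin,        # time-stamps, relative to the origin of the trace
           "end": end,
           "subject": subject,    # the user whose activity is traced
       }
       payload.update(attributes)  # attributes specific to the obsel type
       return Request(
           trace_url,
           data=json.dumps(payload).encode("utf-8"),
           headers={"Content-Type": "application/ld+json"},
           method="POST",
       )


   request = build_obsel_request(
       TRACE_URL, "m:SendMsg", begin=10, end=12, subject="alice",
       **{"m:recipient": "bob"},
   )

Passing ``request`` to ``urllib.request.urlopen`` would actually send it,
provided that a service is listening at that address.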


.. rst-class:: conclusion

   The meta-model presented in this chapter,
   as well as the notion of Trace-Based Reasoning (TBR),
   underlie a number of the works presented in the following chapters.
   Those will further demonstrate how this meta-model supports the user-centric co-construction of knowledge,
   taking into account the various contexts of use of that knowledge,
   and allowing multiple interpretations to coexist.


.. rubric:: Notes

.. [#exception] An exception is a case that the computer cannot handle,
   such as a division by zero or an access to a non-existing memory address.
   When an exception is raised by a program, that program is suspended
   and an exception handler is started,
   provided with the context in which the exception occurred.
   The term "error" is sometimes used instead of "exception",
   but the very notion of error relates to an interpretation,
   even an appraisal.

.. [#google_analytics] For example, services such as Google Analytics offer
   tools to precisely analyze the visits on a web site:
   http://www.google.com/intl/en/analytics/

.. [#modern_traces] This assessment is based on legacy data, such as log files.
   Recent efforts proposing generic trace formats `[@MoreauProv2013;@SnellStreams2016]`
   build on semantic web technologies and linked data principles,
   and are much more similar to our proposal.

.. [#data-etymology] In this respect,
   let us point out how misleading the term "data" can be.
   It originally means "given",
   which gives it an aura of neutrality or objectivity.
   In fact the data we get are not so much given as they are *taken*
   (observed, measured, extracted, captured, selected...)
   and therefore never independent of the processes set up to obtain them.

.. [#unix-epoch] Note that time-stamps in most operating systems are represented that way:
   as a number of time units (usually the second)
   since a given origin or "epoch" (typically 1970-01-01 on UNIX systems).

.. [#converting-units] There is another consequence:
   while it might seem trivial to convert time-stamps from a fine-grained unit to a coarser-grained unit,
   it is actually not always possible when using an opaque origin.
   For example, converting days to months cannot be done accurately if we don't have an absolute origin,
   as we do not know after how many days to change month.
   Less obviously, converting from hours (or minutes, or seconds)
   to days cannot be done either: because of Daylight Saving Time,
   some days have 23 hours, and some have 25.

.. [#collector-model] Most collectors will be dedicated to one specific trace-model,
   with the constraints of that model hard-coded in them.
   However,
   one could imagine more generic collectors,
   able to inspect richly described trace-models in order to comply with them dynamically.
   Another perspective would be collectors able to dynamically edit trace-models,
   whenever they encounter a situation that the model cannot represent.

.. [#anonymize] An efficient anonymization would actually require a more complex processing,
   as the message bodies and attachments may also contain information that identifies the persons involved.
   Nevertheless, those complex processes would still qualify as a rewriting transformation,
   according to our definition.

.. [#rest-privacy] The rationale of this choice is of course not to suggest that kTBS,
   or any |MTMS|, should in general be offshored to an external provider.
   Traces typically contain privacy-sensitive information,
   and should obviously be kept in trustworthy locations.
   In a typical setup,
   the kTBS service would be deployed on the *same* server as the application.

.. rubric:: Chapter bibliography

.. bibliography::