The dmt4sp prototype is a command-line tool that extracts episodes and episode rules under the minimal occurrence semantics defined in [1]. It supports various constraints and works over a single sequence or several sequences of events.

Contact: Christophe Rigotti

Three kinds of patterns can be extracted:

  • serial episodes
  • serial episode rules having a single event type in the consequent
  • quantitative episodes: groups of “homogeneous” occurrences of a serial episode with respect to the time elapsed between the event types (these patterns are defined in [2])

Support constraints (minimal occurrence semantics)

  • minimum number of occurrences
  • minimum number of sequences in which the pattern must occur
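
To make the two support measures concrete, here is a minimal Python sketch. It is not the dmt4sp implementation. It assumes that a sequence is a time-ordered list of (timestamp, event type) pairs, that consecutive event types of a serial episode must occur at strictly increasing timestamps, and that an occurrence interval is minimal when no shorter interval inside it also contains the episode, in the spirit of [1]. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, str]  # (timestamp, event type); an assumed input representation


def minimal_occurrences(seq: Sequence[Event], episode: Sequence[str]) -> List[Tuple[int, int]]:
    """Return the (start, end) intervals of the minimal occurrences of a
    serial episode in a time-ordered sequence."""
    candidates = []
    for i, (t_start, etype) in enumerate(seq):
        if etype != episode[0]:
            continue
        # Greedily match the remaining event types as early as possible.
        pos, t_prev, j = 1, t_start, i + 1
        while pos < len(episode) and j < len(seq):
            t, e = seq[j]
            if e == episode[pos] and t > t_prev:
                pos, t_prev = pos + 1, t
            j += 1
        if pos == len(episode):
            candidates.append((t_start, t_prev))
    # Earliest ends never decrease as starts move right, so an interval is
    # non-minimal exactly when the next candidate ends no later than it does.
    result = []
    for k, (s, e) in enumerate(candidates):
        if k + 1 < len(candidates) and candidates[k + 1][1] <= e:
            continue
        result.append((s, e))
    return result


def occurrence_support(sequences: Sequence[Sequence[Event]], episode: Sequence[str]) -> int:
    """Total number of minimal occurrences over all sequences."""
    return sum(len(minimal_occurrences(s, episode)) for s in sequences)


def sequence_support(sequences: Sequence[Sequence[Event]], episode: Sequence[str]) -> int:
    """Number of sequences containing at least one occurrence."""
    return sum(1 for s in sequences if minimal_occurrences(s, episode))


seq1 = [(1, "A"), (2, "A"), (3, "B"), (5, "A"), (6, "C"), (7, "B"), (9, "C")]
seq2 = [(1, "B"), (2, "A")]
print(minimal_occurrences(seq1, ["A", "B", "C"]))    # [(2, 6), (5, 9)]
print(occurrence_support([seq1, seq2], ["A", "B"]))  # 2
print(sequence_support([seq1, seq2], ["A", "B"]))    # 1
```

In this sketch, a pattern would pass the first constraint when occurrence_support reaches the chosen minimum, and the second when sequence_support does.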

Temporal constraints

  • maximum window size
  • minimum gap
  • a kind of maximum gap (not the standard max gap constraint used for sequential patterns)
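
The first two temporal constraints can be sketched as simple filters. This is an illustration under stated assumptions, not the tool's actual behavior: the window width is taken to be end minus start, the gap is taken to be the time elapsed between consecutive events of one occurrence, and both bounds are treated as inclusive. The tool's own variant of the maximum gap is not covered, since its definition is not given here. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def filter_by_window(occurrences: Sequence[Tuple[int, int]], max_window: int) -> List[Tuple[int, int]]:
    """Keep the (start, end) occurrences whose width is at most max_window."""
    return [(s, e) for s, e in occurrences if e - s <= max_window]


def satisfies_min_gap(timestamps: Sequence[int], min_gap: int) -> bool:
    """Check that consecutive events of one occurrence are at least min_gap apart."""
    return all(b - a >= min_gap for a, b in zip(timestamps, timestamps[1:]))


print(filter_by_window([(2, 6), (5, 9), (10, 20)], max_window=4))  # [(2, 6), (5, 9)]
print(satisfies_min_gap([2, 3, 6], min_gap=2))  # False
print(satisfies_min_gap([2, 4, 6], min_gap=2))  # True
```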

Syntactic constraints

  • minimum pattern length
  • maximum pattern length
  • last event type of the pattern (for episode rules, this sets the event type of the consequent and discards all other rules)
  • prefix of the pattern (with a wildcard placeholder)
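
The syntactic constraints amount to a predicate over a candidate pattern, which the following Python sketch illustrates. The pattern is assumed to be a list of event types, and the wildcard symbol "*" is a hypothetical choice made for this example; the syntax actually accepted by dmt4sp may differ.

```python
from typing import Optional, Sequence

WILDCARD = "*"  # hypothetical placeholder symbol, chosen for this sketch only


def satisfies_syntax(
    pattern: Sequence[str],
    min_len: int = 1,
    max_len: Optional[int] = None,
    last: Optional[str] = None,
    prefix: Sequence[str] = (),
) -> bool:
    """Check the length, last-event-type, and prefix constraints on a pattern."""
    if len(pattern) < min_len:
        return False
    if max_len is not None and len(pattern) > max_len:
        return False
    if last is not None and (not pattern or pattern[-1] != last):
        return False
    if len(prefix) > len(pattern):
        return False
    # A wildcard position in the prefix matches any event type.
    return all(p == WILDCARD or p == e for p, e in zip(prefix, pattern))


print(satisfies_syntax(["A", "B", "C"], min_len=2, max_len=3, last="C", prefix=["A", "*"]))  # True
print(satisfies_syntax(["A", "B", "C"], prefix=["B"]))  # False
```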

Some other options

  • a threshold to discard event types that are too frequent
  • for rules: minimum confidence
  • for groups of occurrences (quantitative episodes): parameters to define the homogeneity of the groups and their minimum size
  • several input and output formats (the input can be a single long sequence or a set of sequences)
  • output of occurrence locations
  • and more ...
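
As an illustration of the minimum confidence option, the sketch below computes the confidence of a serial episode rule with a single event type in the consequent. It assumes that confidence is the support of the whole episode divided by the support of its antecedent (the episode without its last event type). This is a common definition, but it is an assumption here and may not match the exact measure used by dmt4sp. The support values are made-up numbers for the example.

```python
from typing import Dict, Tuple

Episode = Tuple[str, ...]


def rule_confidence(support: Dict[Episode, int], episode: Episode) -> float:
    """Confidence of the rule episode[:-1] => episode[-1], as a support ratio."""
    antecedent = episode[:-1]
    return support[episode] / support[antecedent]


# Made-up occurrence counts for the rule A -> B => C.
support = {("A", "B"): 4, ("A", "B", "C"): 3}
conf = rule_confidence(support, ("A", "B", "C"))
print(conf)          # 0.75
print(conf >= 0.7)   # True: the rule passes a minimum confidence of 0.7
```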

[1] Discovery of frequent episodes in event sequences. Mannila, H., Toivonen, H. and Verkamo, A.I. DMKD Journal, volume 1, pp. 259-289, 1997.
[2] Extracting Trees of Quantitative Serial Episodes. M. Nanni and C. Rigotti. Knowledge Discovery in Inductive Databases, 5th International Workshop, KDID'06, Revised Selected and Invited Papers. LNCS 4747, pp. 170-188, 2007.