dmt4sp

The dmt4sp prototype is a command-line tool that extracts episodes and episode rules under the minimal occurrence semantics defined in [1]. It supports various constraints and works on either a single sequence or several sequences of events.

Contact: Christophe Rigotti

Three kinds of patterns can be extracted:

  • serial episodes

  • serial episode rules having a single event type in the consequent

  • quantitative episodes, which group “homogeneous” occurrences of a serial episode with respect to the time elapsed between its event types (defined in [2])
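To make the notion of a minimal occurrence concrete, here is a minimal Python sketch that finds the minimal occurrences of a serial episode in a single sequence. It is not the dmt4sp implementation: the function name, the `(time, event_type)` input representation, and the requirement that timestamps strictly increase along an occurrence are all assumptions made for illustration.

```python
from bisect import bisect_right


def minimal_occurrences(sequence, episode):
    """Return the minimal occurrences of a serial episode as (start, end) pairs.

    sequence: iterable of (time, event_type) pairs (hypothetical format).
    episode:  list of event types that must occur in this order.
    An occurrence is minimal if no proper sub-interval also contains one.
    """
    # Index the sorted timestamps of each event type.
    times = {}
    for t, etype in sorted(sequence):
        times.setdefault(etype, []).append(t)

    # For each start, greedily find the earliest possible end.
    candidates = []
    for start in sorted(set(times.get(episode[0], []))):
        t = start
        for etype in episode[1:]:
            stamps = times.get(etype, [])
            i = bisect_right(stamps, t)  # first timestamp strictly after t
            if i == len(stamps):
                t = None
                break
            t = stamps[i]
        if t is None:
            break  # a later start cannot be completed either
        candidates.append((start, t))

    # Earliest ends never decrease as the start grows, so an interval is
    # minimal exactly when no later start reaches the same end.
    result = []
    for start, end in candidates:
        if result and result[-1][1] == end:
            result[-1] = (start, end)
        else:
            result.append((start, end))
    return result


events = [(1, "A"), (2, "A"), (3, "B"), (5, "A"), (6, "C"), (7, "B"), (9, "C")]
print(minimal_occurrences(events, ["A", "B", "C"]))
```

In this example the interval (1, 6) also contains the episode, but it is not minimal because the shorter interval (2, 6) does too.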

Support constraints (minimal occurrence semantics)

  • minimum number of occurrences

  • minimum number of sequences in which the pattern must occur
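The second support constraint can be illustrated with a short Python sketch that counts the sequences in which a serial episode occurs and compares the count with a threshold. The function names and the representation of a sequence as an ordered list of event types are assumptions for illustration, not the tool's input format or options.

```python
def occurs_in(sequence, episode):
    """True if the event types of `episode` appear in `sequence` in order."""
    remaining = iter(sequence)
    # Each `any` consumes the iterator up to the next matching event.
    return all(any(event == target for event in remaining) for target in episode)


def sequence_support(sequences, episode):
    """Number of sequences that contain at least one occurrence of the episode."""
    return sum(1 for sequence in sequences if occurs_in(sequence, episode))


sequences = [list("ABCA"), list("BAC"), list("CAB"), list("AXB")]
min_sequences = 3  # hypothetical threshold

for episode in (["A", "B"], ["A", "C"]):
    support = sequence_support(sequences, episode)
    print(episode, support, support >= min_sequences)
```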

Temporal constraints

  • maximum window size

  • minimum gap

  • a kind of maximum gap (not the standard max gap constraint used for sequential patterns)
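As a rough illustration of the first two temporal constraints, the Python sketch below checks one occurrence, given as the timestamps of its events, against a maximum window size and a minimum gap. The function name, the parameter names, and the choice of inclusive bounds are assumptions; dmt4sp may define the bounds differently, and its particular maximum gap constraint is not modeled here.

```python
def satisfies_temporal_constraints(timestamps, max_window=None, min_gap=None):
    """Check one occurrence against a window bound and a minimum gap.

    timestamps: increasing times of the events of the occurrence.
    max_window: largest allowed span between first and last event (assumed inclusive).
    min_gap:    smallest allowed delay between consecutive events (assumed inclusive).
    """
    if max_window is not None and timestamps[-1] - timestamps[0] > max_window:
        return False
    if min_gap is not None:
        for earlier, later in zip(timestamps, timestamps[1:]):
            if later - earlier < min_gap:
                return False
    return True


occurrence = [2, 3, 6]
print(satisfies_temporal_constraints(occurrence, max_window=4, min_gap=1))
print(satisfies_temporal_constraints(occurrence, max_window=3))
print(satisfies_temporal_constraints(occurrence, min_gap=2))
```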

Syntactic constraints

  • minimum pattern length

  • maximum pattern length

  • last event type of the pattern (for episode rules, this sets the event type of the consequent and discards all other rules)

  • prefix of the pattern (with a wildcard placeholder)

Some other options

  • a threshold to discard event types that are too frequent

  • for rules: minimum confidence

  • for groups of occurrences (quantitative episodes): parameters to define the homogeneity of the groups and their minimum size

  • several input and output formats (the input can be a single long sequence or a set of sequences)

  • output of occurrence locations

  • and more …
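To show how a minimum confidence threshold filters episode rules with a single event type in the consequent, here is a simplified Python sketch. It assumes that confidence is the support of the whole episode divided by the support of its prefix; the confidence actually computed by dmt4sp may differ, for example by taking time bounds into account. The function name and the dictionary of supports are hypothetical.

```python
def episode_rules(supports, min_confidence):
    """Derive rules `prefix => last event type` from episode supports.

    supports: dict mapping an episode (tuple of event types) to its support.
    Returns (antecedent, consequent, confidence) triples that pass the threshold.
    """
    rules = []
    for episode, support in supports.items():
        if len(episode) < 2:
            continue  # a rule needs an antecedent and a consequent
        antecedent, consequent = episode[:-1], episode[-1]
        antecedent_support = supports.get(antecedent, 0)
        if antecedent_support == 0:
            continue  # antecedent support unknown, no confidence available
        confidence = support / antecedent_support
        if confidence >= min_confidence:
            rules.append((antecedent, consequent, confidence))
    return rules


# Made-up supports, for illustration only.
supports = {("A",): 10, ("A", "B"): 8, ("A", "B", "C"): 2, ("A", "C"): 5}
for antecedent, consequent, confidence in episode_rules(supports, 0.5):
    print(" -> ".join(antecedent), "=>", consequent, confidence)
```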

[1] Discovery of frequent episodes in event sequences. Mannila, H., Toivonen, H. and Verkamo, A.I. DMKD Journal, volume 1, pp. 259-289, 1997.

[2] Extracting Trees of Quantitative Serial Episodes. Nanni, M. and Rigotti, C. Knowledge Discovery in Inductive Databases, 5th International Workshop, KDID'06, Revised Selected and Invited Papers. LNCS 4747, pp. 170-188, 2007.