Library reference

TODO: check that all docstrings are Sphinx-friendly

Package treefic

A library for converting specialized textual syntaxes to XML.

Mini-languages are simple ad-hoc languages designed for a specific task. An important feature of these languages is that they are easy to read and write, so textual formats, usually described by BNF grammars, are more suitable than verbose XML formats.

Although XML formats are now a de facto standard for structuring and exchanging digital data, they are usually verbose and not always easy for human users to read or write. In many situations, plain text is more suitable, usually structured according to a grammar expressed in a variant of BNF. These syntaxes, which we call specialized textual syntaxes, reduce the amount of markup in favour of the actual textual data.

Nevertheless, textual syntaxes are not incompatible with XML; as a matter of fact, they are usually parsed using a tree structure very similar to an XML tree.

The idea of this package is to provide the best of both worlds: it parses a text according to a specialized textual syntax and produces an equivalent XML tree, which can then be used with standard XML tools (XPath, XQuery, XSLT...).
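
To make the idea concrete, here is a minimal sketch that uses only the Python standard library, not treefic itself. The "key = value" mini-language, the LINE regular expression and the text_to_xml function are all assumptions made up for this illustration; treefic would derive the equivalent logic from a grammar instead of hard-coding it.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mini-language: one "key = value" entry per line.
LINE = re.compile(r"^\s*(?P<key>\w+)\s*=\s*(?P<value>.*?)\s*$")


def text_to_xml(text):
    """Convert "key = value" lines into a <config> element."""
    root = ET.Element("config")
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # skip blank or malformed lines in this sketch
        entry = ET.SubElement(root, "entry", name=match.group("key"))
        entry.text = match.group("value")
    return root


config = text_to_xml("host = example.org\nport = 8080\n")
# ElementTree supports a subset of XPath for queries like this one.
port = config.find("entry[@name='port']").text
```

Once the text is available as an XML tree, the query on the last line no longer depends on the original textual syntax.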

exception treefic.GrammarParseError(context, index, arg=None, logger=None)
Specialized parse error that is raised when the grammar cannot be parsed.
treefic.parse(grammar, data, meta_grammar=True)

I parse data according to grammar, and return an XML ElementTree.

grammar and data may be file-like or unicode-like objects. In addition, grammar may be an XML ElementTree or a Grammar as returned by gparse.

If meta_grammar is not provided, grammar is assumed to comply with the treefic textual syntax. If set to a Grammar as returned by gparse, this grammar is used instead. If set to None, grammar is assumed to be formatted in the treefic XML syntax.

If grammar is an XML ElementTree or a Grammar, meta_grammar is ignored.
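
The rules above can be read as a small decision procedure. The following sketch only restates them in code; it is not how treefic.parse is implemented. The Grammar class is a hypothetical stand-in for treefic.grammar.Grammar, and grammar_source_kind and its return strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


class Grammar:
    """Hypothetical stand-in for treefic.grammar.Grammar."""


def grammar_source_kind(grammar, meta_grammar=True):
    """Describe how `grammar` would be interpreted, per the rules above."""
    # An already-built grammar or an XML tree needs no meta-grammar.
    if isinstance(grammar, Grammar):
        return "grammar object, used as is"
    if isinstance(grammar, (ET.ElementTree, ET.Element)):
        return "treefic XML syntax (meta_grammar ignored)"
    # Otherwise `grammar` is text or a file-like object.
    if meta_grammar is None:
        return "treefic XML syntax"
    if isinstance(meta_grammar, Grammar):
        return "custom textual syntax described by meta_grammar"
    return "treefic textual syntax"
```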

Module treefic.gparser

I parse XML-encoded context-free grammars.

See share/treefic-xml.{rnc|rng|xsd} for a schema of the XML format expected by this module.

treefic.gparser.parse(source)

Parse a file-like object or ElementTree and return a Grammar.

Pre-condition: the source is valid w.r.t. the treefic XML syntax.

treefic.gparser.parse_pattern(element)

Parse an XML element representing a pattern and return an instance of the appropriate class.

Pre-condition: element is not a comment node.

treefic.gparser.parse_rule(element)
Parse an XML <treefic:rule> element and return a Rule.

Module treefic.grammar

I implement context-free grammars as a set of rules with decorators.

class treefic.grammar.Grammar(*rules, **kw)

I represent a context-free grammar.

parse(text)
Parse the given text and return a Node tree and a ParsingContext.
class treefic.grammar.NonTerm(value)

I implement a non-terminal pattern.

This is basically a unicode string with a pattern attribute set by the grammar to point to the Pattern corresponding to this non-terminal symbol.

iter_matches(context, start)
Abstract method implementation.
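
The description of NonTerm as a string carrying a pattern attribute can be illustrated with a plain str subclass. This is a sketch under assumptions, not the library's code: the resolve helper and the use of KeyError (where treefic presumably raises UndefinedNonTerminal) are hypothetical, and the pattern is represented by a placeholder string.

```python
class NonTerm(str):
    """Sketch: a string naming a non-terminal, plus a late-bound pattern."""

    # Set later by the grammar, once all rules are known.
    pattern = None


def resolve(non_terminals, rules):
    """Point each non-terminal at the pattern defined for its name."""
    for nt in non_terminals:
        if nt not in rules:
            # Stand-in for an "undefined non-terminal" error.
            raise KeyError("undefined non-terminal: %s" % nt)
        nt.pattern = rules[nt]


expr = NonTerm("expr")
resolve([expr], {"expr": "<pattern for expr>"})
```

Because NonTerm is still a string, it can be used directly as a dictionary key when looking up the rule it refers to.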
class treefic.grammar.Rule(head, *body, **kw)

I represent a rule in a context-free grammar.

Parameters:
  • head (anything coercible to unicode) – the name of the rule
  • body (non-empty list of Pattern) – interpreted as a sequence if its length is > 1
  • kw – decorators for the rule
exception treefic.grammar.UndefinedNonTerminal
The grammar uses an undefined non-terminal symbol.

Module treefic.parsing

I provide building blocks for the parsing of text according to context-free grammars.

class treefic.parsing.Node(start, end, pattern, children=())
A node in the parse tree.
exception treefic.parsing.ParseError(context, index, arg=None, logger=None)

An exception raised during parsing.

It holds information about where the parsing error occurred (index, line, column), a message, and possibly an embedded exception (embeded).

Note that index is 0-based, while line and column are 1-based.

column
column in the data where the error occurred
embeded
embedded exception, if any
index
index in the data where the error occurred
line
line in the data where the error occurred
message
error message
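
The relation between the 0-based index and the 1-based line and column can be sketched as follows. The function index_to_coord is a hypothetical helper written for this illustration; it assumes lines are separated by "\n" and is not taken from treefic.

```python
def index_to_coord(text, index):
    """Return the 1-based (line, column) of a 0-based character index."""
    # Lines are counted from 1, so add 1 to the number of preceding newlines.
    line = text.count("\n", 0, index) + 1
    # rfind returns -1 when there is no earlier newline, which makes the
    # column of the first character on the first line equal to 1.
    last_newline = text.rfind("\n", 0, index)
    column = index - last_newline
    return line, column
```

For the text "ab\ncd", index 0 maps to (1, 1) and index 3, the character "c", maps to (2, 1).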
class treefic.parsing.ParsingContext(grammar, text)

A parsing context holds a reference to the grammar, the text, and a cache of previously computed matches.

It also provides utility methods:

  • get_coord converts text position to line/column coordinates
  • add_error and get_error are useful for diagnostics on failure
add_error(start, rule_name)

Register a failed rule at a given position.

See also: get_error
get_error()

Return an error diagnostic as (position, candidate).

  • position is the highest position that the parsing reached
  • candidate is a set containing the names of the rules that failed at that position.
See also: add_error
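
One plausible way for add_error and get_error to cooperate is to keep only the failures registered at the furthest position. The sketch below assumes that behaviour; the ErrorTracker class and its private attributes are hypothetical and are not part of treefic.

```python
class ErrorTracker:
    """Hypothetical stand-in for the error bookkeeping of a parsing context."""

    def __init__(self):
        self._position = -1
        self._candidates = set()

    def add_error(self, start, rule_name):
        """Register a failed rule, keeping only the furthest failures."""
        if start > self._position:
            # A failure further in the text supersedes earlier ones.
            self._position = start
            self._candidates = {rule_name}
        elif start == self._position:
            self._candidates.add(rule_name)

    def get_error(self):
        """Return (position, candidates) for the furthest failure."""
        return self._position, set(self._candidates)


tracker = ErrorTracker()
tracker.add_error(3, "number")
tracker.add_error(7, "operator")
tracker.add_error(7, "paren")
tracker.add_error(5, "ident")
diagnostic = tracker.get_error()
```

Here diagnostic is (7, {"operator", "paren"}): the failures at positions 3 and 5 are dropped because parsing got further than that.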

Module treefic.patterns

I define elementary patterns of context-free grammars.

class treefic.patterns.Alternative(*args)

I implement an alternative pattern, resolving to any of its children.

iter_matches(context, start)
Abstract method implementation.
class treefic.patterns.Pattern

I implement the common behaviour of a pattern.

iter_matches(context, start)

Iterate over all the possible matches at index start.

The empty match must be the last one to be yielded; this ensures that we can distinguish an unsafe recursion from a pseudo_recursion (see IDEAS).

This is an abstract method that subclasses must implement.
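
The ordering constraint on the empty match can be illustrated with a standalone generator. This is a simplified sketch: it works on a plain string instead of a parsing context, represents each match by its end index only, and the name iter_optional_literal is made up for the example.

```python
def iter_optional_literal(text, start, literal):
    """Yield the end index of each match of an optional literal at `start`."""
    # Non-empty match first, if the literal is present at this position.
    if literal and text.startswith(literal, start):
        yield start + len(literal)
    # The empty match (end == start) always comes last.
    yield start
```

For example, list(iter_optional_literal("abc", 0, "ab")) gives [2, 0]: the match consuming "ab" is yielded before the empty one.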

class treefic.patterns.Regexp(regexp, flags='')

I implement a regular expression pattern.

iter_matches(context, start)
Abstract method implementation.
class treefic.patterns.Repetition(pattern, cmin=1, cmax=-1)

I implement a repetition pattern, parameterized by the minimum and maximum number of repetitions of the child pattern.

iter_matches(context, start)
Abstract method implementation.
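
The effect of cmin and cmax can be sketched with a literal string standing in for the child pattern. Several points are assumptions here: that cmax=-1 means "no upper bound", that longer matches are yielded first, and that a match is represented by its end index. The function iter_repetitions is hypothetical and is not the library's implementation.

```python
def iter_repetitions(text, start, literal, cmin=1, cmax=-1):
    """Yield end indices of `literal` repeated between cmin and cmax times.

    cmax=-1 is taken to mean "no upper bound" (an assumption).
    """
    if not literal:
        raise ValueError("literal must be non-empty")
    ends = [start]  # ends[n] is the end index after n repetitions
    while text.startswith(literal, ends[-1]):
        if cmax >= 0 and len(ends) - 1 >= cmax:
            break
        ends.append(ends[-1] + len(literal))
    # Longest match first, so that the empty match (n == 0) comes last.
    for count in range(len(ends) - 1, cmin - 1, -1):
        yield ends[count]
```

With the text "aaab", the defaults yield 3, 2, 1; cmin=0 additionally yields the empty match 0 at the end, and cmax=2 restricts the results to 2, 1.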
class treefic.patterns.Sequence(*args)

I implement a sequence of patterns.

iter_matches(context, start)
Abstract method implementation.
class treefic.patterns.Terminal

I implement a terminal pattern.

iter_matches(context, start)
Abstract method implementation.

Module treefic.xmlbuilder

I implement XML generation from a parsing tree of nodes.

treefic.xmlbuilder.build_xml(node, context, parent_elt)

Convert the parsing tree rooted in this node to XML, and attach it to the given ElementTree parent_elt.

NB: this Node may not always result in an XML element: some nodes will be ignored, others will add text to the parent node, others will generate an attribute...
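
The different outcomes listed in the note (element, text, attribute, nothing) can be sketched with the standard library alone. SketchNode, its kind field and the build function are hypothetical simplifications: in treefic the outcome is presumably driven by the rule decorators, not by an explicit field on the node.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class SketchNode:
    """Hypothetical parse-tree node; `kind` stands in for rule decorators."""
    kind: str            # "element", "attribute", "text" or "ignore"
    name: str = ""
    text: str = ""
    children: tuple = ()


def build(node, parent_elt):
    """Attach the XML produced by `node` to `parent_elt`."""
    if node.kind == "ignore":
        return  # e.g. whitespace or punctuation: no XML output
    if node.kind == "attribute":
        parent_elt.set(node.name, node.text)
    elif node.kind == "text":
        if len(parent_elt):
            # Text after a child element goes into that child's tail.
            last = parent_elt[-1]
            last.tail = (last.tail or "") + node.text
        else:
            parent_elt.text = (parent_elt.text or "") + node.text
    else:
        elt = ET.SubElement(parent_elt, node.name)
        for child in node.children:
            build(child, elt)


tree = SketchNode("element", "item", children=(
    SketchNode("attribute", "id", "42"),
    SketchNode("ignore", text=" "),
    SketchNode("text", text="hello"),
))
root = ET.Element("root")
build(tree, root)
xml_output = ET.tostring(root, encoding="unicode")
```

In this example only the outer node produces an element; its three children contribute an attribute, nothing, and text respectively.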

treefic.xmlbuilder.parse(grammar, text)

Parse the given text with the given grammar into an XML ElementTree.

  • text: a unicode string
  • grammar: an instance of Grammar
