TODO: check that all docstrings are Sphinx-friendly
A library for converting specialized textual syntaxes to XML.
Mini-languages are simple ad-hoc languages designed for a specific task. An important feature of these languages is that they are easy to read and write, so textual formats, usually described by BNF grammars, are more suitable than verbose XML formats.
Although XML formats are now a de facto standard for structuring and exchanging digital data, they are usually verbose and not always easy for human users to read or write. In many situations, plain text is more suitable, usually structured according to a grammar expressed in a variant of BNF. These syntaxes, which we call specialized textual syntaxes, reduce the amount of markup in favour of the actual textual data.
Nevertheless, textual syntaxes are not incompatible with XML; as a matter of fact, they are usually parsed using a tree structure very similar to an XML tree.
The idea of this package is to provide the best of both worlds: it parses a text according to a specialized textual syntax and produces an equivalent XML tree, so that it can further be used with standard XML tools (XPath, XQuery, XSLT...).
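To make the idea concrete without relying on this package's API, here is a minimal sketch that uses only the Python standard library. The "key = value" mini-language, the text_to_xml function and the config/entry element names are all invented for illustration; the package itself derives the tree from a grammar rather than from a hard-coded regular expression.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mini-language: one "key = value" pair per line.
LINE = re.compile(r"^\s*(?P<key>\w+)\s*=\s*(?P<value>.*?)\s*$")

def text_to_xml(text):
    """Build an XML tree from "key = value" lines (illustrative only)."""
    root = ET.Element("config")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        match = LINE.match(line)
        if match is None:
            raise ValueError("line %d: expected 'key = value'" % lineno)
        entry = ET.SubElement(root, "entry", name=match.group("key"))
        entry.text = match.group("value")
    return root

root = text_to_xml("host = example.org\nport = 8080\n")
# ElementTree supports a limited XPath subset, which is enough here.
port = root.find("./entry[@name='port']").text
```

Once the data is an XML tree, the query on the last line needs no knowledge of the original textual syntax.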
I parse data according to grammar, and return an XML ElementTree.
grammar and data may be file-like or unicode-like objects. In addition, grammar may be an XML ElementTree or a Grammar as returned by gparse.
If meta_grammar is not provided, grammar is assumed to comply with the treefic textual syntax. If set to a Grammar as returned by gparse, this grammar is used instead. If set to None, grammar is assumed to be formatted in the treefic XML syntax.
If grammar is an XML ElementTree or a Grammar, meta_grammar is ignored.
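Accepting either a file-like or a unicode-like object usually comes down to a small normalization step. The sketch below shows one way to do it; read_text is a hypothetical helper, not a function of this package, and the UTF-8 decoding of byte streams is an assumption.

```python
import io

def read_text(source):
    """Return the text held by a file-like or string-like source."""
    if hasattr(source, "read"):
        content = source.read()
        # Assumption: byte streams are UTF-8 encoded.
        if isinstance(content, bytes):
            return content.decode("utf-8")
        return content
    return str(source)

from_file = read_text(io.StringIO("a = 1"))
from_string = read_text("a = 1")
```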
I parse XML-encoded context-free grammars.
See share/treefic-xml.{rnc|rng|xsd} for a schema of the XML format expected by this module.
Parse a file-like object or ElementTree and return a Grammar.
Pre-condition: the source is valid w.r.t. the treefic XML syntax.
Parse an XML element representing a pattern and return an instance of the appropriate class.
Pre-condition: element is not a comment node.
I implement context-free grammars as a set of rules with decorators.
I represent a context-free grammar.
I implement a non-terminal pattern.
This is basically a unicode string with a pattern attribute set by the grammar to point to the Pattern corresponding to this non-terminal symbol.
I represent a rule in a context-free grammar.
I provide building blocks for the parsing of text according to context-free grammars.
An exception raised during parsing.
It holds information about where the parsing error occurred (index, line, column), a message, and possibly an embedded exception (embeded).
Note that index is 0-based, while line and column are 1-based.
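The relation between the 0-based index and the 1-based line and column can be sketched as follows. The function name locate is hypothetical, and the sketch assumes lines are separated by "\n" only.

```python
def locate(text, index):
    """Return the 1-based (line, column) of a 0-based character index."""
    if not 0 <= index <= len(text):
        raise IndexError("index out of range")
    # Every newline before the index starts a new line.
    line = text.count("\n", 0, index) + 1
    # rfind returns -1 on the first line, which makes the column 1-based.
    last_newline = text.rfind("\n", 0, index)
    column = index - last_newline
    return line, column

text = "ab\ncd"
position = locate(text, 3)  # index 3 is the character "c"
```

Here position is (2, 1): the character at index 3 is the first one on the second line.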
A parsing context holds a reference to the grammar, the text, and a cache of previously computed matches.
It also provides utility methods:
Register a failed rule at a given position.
See also: get_error
Return an error diagnostic as (position, candidate).
See also: add_error
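A parsing context of this kind could look like the sketch below. SketchContext and its cached method are hypothetical names, and the policy of reporting the failure that occurred farthest into the text is an assumption (a common heuristic), not something this documentation confirms.

```python
class SketchContext:
    """Hypothetical stand-in for a parsing context."""

    def __init__(self, grammar, text):
        self.grammar = grammar
        self.text = text
        self.cache = {}          # (rule, index) -> previously computed matches
        self._position = -1      # farthest index at which a rule failed
        self._candidate = None   # rule that failed at that index

    def cached(self, rule, index, compute):
        """Return the matches of rule at index, computing them only once."""
        key = (rule, index)
        if key not in self.cache:
            self.cache[key] = compute()
        return self.cache[key]

    def add_error(self, rule, index):
        """Register a failed rule, keeping only the farthest failure."""
        if index > self._position:
            self._position = index
            self._candidate = rule

    def get_error(self):
        """Return the diagnostic as (position, candidate)."""
        return self._position, self._candidate

ctx = SketchContext(grammar=None, text="abc")
ctx.add_error("rule_a", 1)
ctx.add_error("rule_b", 3)
ctx.add_error("rule_c", 2)
diagnostic = ctx.get_error()
```

With this policy, diagnostic refers to rule_b at position 3, since failures registered at earlier positions are ignored.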
I define elementary patterns of context-free grammars.
I implement an alternative pattern, resolving to any of its children.
I implement the common behaviour of a pattern.
Iterate over all the possible matches at index start.
The empty match must be the last one to be yielded; this ensures that we can distinguish an unsafe recursion from a pseudo_recursion (see IDEAS).
This is an abstract method that subclasses must implement.
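The contract above (yield every possible match, with the empty match last) can be sketched as follows. The class names, the method name matches and the choice of representing a match by its end index are assumptions made for illustration; they are not this package's actual interface.

```python
import re

class Pattern:
    def matches(self, text, start):
        """Yield the end index of each possible match at index start."""
        raise NotImplementedError

class Regex(Pattern):
    def __init__(self, expr):
        self._re = re.compile(expr)

    def matches(self, text, start):
        m = self._re.match(text, start)
        if m is not None:
            yield m.end()

class Alternative(Pattern):
    def __init__(self, *children):
        self.children = children

    def matches(self, text, start):
        saw_empty = False
        for child in self.children:
            for end in child.matches(text, start):
                if end == start:
                    saw_empty = True  # defer the empty match
                else:
                    yield end
        if saw_empty:
            yield start  # the empty match always comes last

alt = Alternative(Regex("x?"), Regex("ab"))
ends = list(alt.matches("abc", 0))
```

In this example the first child only matches the empty string, yet the resulting list is [2, 0]: the non-empty match of the second child is yielded first.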
I implement a regular expression pattern.
I implement a repetition pattern, parameterized by the minimum and maximum number of repetitions of the child pattern.
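A bounded repetition can be sketched as a recursive generator. The function below is a hypothetical illustration rather than this package's implementation: child is assumed to be any callable taking (text, start) and yielding end indices, and empty child matches are skipped to keep the recursion finite.

```python
def repeat(child, text, start, minimum=0, maximum=None, count=0):
    """Yield end indices for minimum..maximum repetitions of child."""
    if maximum is None or count < maximum:
        for end in child(text, start):
            if end == start:
                continue  # an empty repetition would recurse forever
            # Try to extend the match first, so longer matches come first.
            yield from repeat(child, text, end, minimum, maximum, count + 1)
    if count >= minimum:
        yield start  # stop repeating here; at count 0 this is the empty match

def letter_a(text, start):
    """Hypothetical child pattern matching a single "a"."""
    if text.startswith("a", start):
        yield start + 1

unbounded = list(repeat(letter_a, "aaab", 0))
bounded = list(repeat(letter_a, "aaaa", 0, minimum=2, maximum=3))
```

Matches are produced from longest to shortest, so when the minimum is 0 the empty match is the last one yielded.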
I implement XML generation from a parsing tree of nodes.
Convert to XML the parsing tree rooted in this node, and root it to the given ElementTree parent_elt.
NB: this Node may not always result in an XML element: some nodes will be ignored, others will add text to the parent node, others will generate an attribute...
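The different outcomes listed above can be sketched with the standard ElementTree API. The SketchNode class, its kind values ("element", "text", "attribute", "ignore") and the to_xml method name are assumptions made for illustration; how the package actually decides what a node produces is not described here.

```python
import xml.etree.ElementTree as ET

class SketchNode:
    """Hypothetical parse-tree node."""

    def __init__(self, kind, name=None, text="", children=()):
        self.kind = kind
        self.name = name
        self.text = text
        self.children = list(children)

    def to_xml(self, parent_elt):
        if self.kind == "ignore":
            return  # e.g. punctuation: produces nothing
        if self.kind == "attribute":
            parent_elt.set(self.name, self.text)
        elif self.kind == "text":
            if len(parent_elt):
                # Text following a child element goes into that child's tail.
                last = parent_elt[-1]
                last.tail = (last.tail or "") + self.text
            else:
                parent_elt.text = (parent_elt.text or "") + self.text
        else:
            elt = ET.SubElement(parent_elt, self.name)
            for child in self.children:
                child.to_xml(elt)

root = ET.Element("root")
pair = SketchNode("element", "pair", children=[
    SketchNode("attribute", "key", "host"),
    SketchNode("ignore", text="="),
    SketchNode("text", text="example.org"),
])
pair.to_xml(root)
xml = ET.tostring(root, encoding="unicode")
```

Only one of the four nodes becomes an XML element; the others contribute an attribute, some text, or nothing at all.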
Parse the given text with the given grammar into an XML ElementTree.
text: a unicode string
grammar: an instance of Grammar