Basic Features

The minimum features the parser needs in order to be useful are:

  • Parser subclasses can define their own grammar using Ruby syntax as much as possible. E.g.,
    grammar {
      def_rule :foo, "production" do |match|
        do_something_with(match)
      end
    }
    
  • Parsers that inherit from a subclass of OOParser also inherit its grammar, with rule polymorphism: a rule redefined in a subclass takes precedence over the inherited one. (E.g., HTMLParser < XMLParser < SGMLParser)
  • Parse failures should produce human-readable errors that describe the failure in detail, including the line number, what the parser expected to find, what it actually found, etc.
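
As a rough illustration of the inheritance requirement, the following is a minimal sketch of how per-class rule tables could be merged down a class hierarchy. It is not OOParser's implementation: the class names (SketchParser, SGMLSketch, XMLSketch), the own_rules and rules methods, and the production strings are all hypothetical; only grammar and def_rule come from the example above.

```ruby
# Hypothetical sketch; not the actual OOParser code.
class SketchParser
  # Rules defined directly on this class (a class-level instance
  # variable, so each subclass gets its own table).
  def self.own_rules
    @own_rules ||= {}
  end

  # Evaluate the block with the class as self so def_rule is in scope.
  def self.grammar(&block)
    class_eval(&block)
  end

  # Record a rule: its production string and the action block.
  def self.def_rule(name, production, &action)
    own_rules[name] = { production: production, action: action }
  end

  # Merge rule tables from the top of the hierarchy downwards, so a
  # rule redefined in a subclass replaces the inherited one.
  def self.rules
    inherited = superclass.respond_to?(:rules) ? superclass.rules : {}
    inherited.merge(own_rules)
  end
end

class SGMLSketch < SketchParser
  grammar do
    def_rule(:tag, "<name>") { |match| match }
    def_rule(:comment, "<text>") { |match| match }
  end
end

class XMLSketch < SGMLSketch
  grammar do
    # Overrides SGMLSketch's :tag; :comment is inherited unchanged.
    def_rule(:tag, "<qname>") { |match| match }
  end
end
```

Here XMLSketch.rules would contain its own :tag rule alongside the :comment rule inherited from SGMLSketch, while SGMLSketch.rules is unaffected by the subclass.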

Initial Production Items

Productions (currently) can contain three basic kinds of items:

Literals/Terminals
Matches exactly whatever is specified.
Example: /literal/xi or "literal" (which are equivalent)
Directives
Matches using some other, more complex construct.
Only a subset of the planned production directives will be handled for the first release. Directives are of the form: <identifier>. The currently implemented ones are:
Subrules
Matches using another rule in the grammar.
<subrule>
Pre-defined Subrules
Subrules that are pre-defined by OOParser.
<WHITESPACE>, <CRLF>, <LT>
[perhaps pre-define all the HTML entities? Grab most of Perl6::Rules's predefined named rules at the very least.]
Functions
Functions change the behavior of the parser, define some custom matching behavior, or execute code in the middle of a production. There are several built-in functions, but you can add more by defining methods whose names end in _function in the parser class.
<&set_skip(//)>
Turns off whitespace skipping (the skip pattern is set to an empty regular expression).
<&debug("Got here.")>
Logs a message at DEBUG level.
<&foo("bar")>
Calls the #foo_function method in the current parser, passing it the current ParseState object and the string "bar". A non-nil, non-false return value is considered a successful match.
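
The dispatch described above could look roughly like the sketch below. This is an assumption-laden illustration, not OOParser's code: the FunctionSketch class, the call_function method, and the regular expression used to pick the directive apart are hypothetical, and a plain Hash stands in for the ParseState object.

```ruby
# Hypothetical sketch of <&name(arg)> dispatch; not the actual OOParser code.
class FunctionSketch
  # Assumed textual shape of a function directive: <&name(argument)>
  FUNCTION_DIRECTIVE = /\A<&(\w+)\((.*)\)>\z/

  # Look up "#{name}_function" and call it with the state and argument.
  # Returns true for any non-nil, non-false result, false otherwise.
  def call_function(directive, state)
    m = FUNCTION_DIRECTIVE.match(directive)
    raise ArgumentError, "not a function directive: #{directive}" unless m

    name = m[1]
    # Strip one pair of surrounding double quotes, if present.
    arg = m[2].sub(/\A"(.*)"\z/, '\1')
    public_send("#{name}_function", state, arg) ? true : false
  end

  # Example user-defined function: succeeds when the remaining input
  # (state[:input], an assumed field) starts with the given string.
  def foo_function(state, arg)
    state[:input].start_with?(arg)
  end
end

parser = FunctionSketch.new
matched = parser.call_function('<&foo("bar")>', { input: "barbaz" })
```

Relying on a naming convention keeps user-defined functions as ordinary methods, so they are inherited and overridden along with the rest of the parser class.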
Quantifiers
By default, directives match exactly once. Quantifiers can be used to change this behavior.
<foo>+ (one or more), <foo>? (zero or one), <foo>* (zero or more)
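
One way to read the three quantifiers is as minimum and maximum repetition counts. The sketch below applies that idea to a literal string rather than a real directive; the QuantifierSketch module, the BOUNDS table, and match_repeated are hypothetical names used only for illustration, and the literal is assumed to be non-empty.

```ruby
# Hypothetical sketch of quantifier handling; not the actual OOParser code.
module QuantifierSketch
  # Quantifier => [minimum, maximum] number of repetitions.
  BOUNDS = {
    nil => [1, 1],                # no quantifier: exactly once
    "+" => [1, Float::INFINITY],  # one or more
    "?" => [0, 1],                # zero or one
    "*" => [0, Float::INFINITY]   # zero or more
  }.freeze

  module_function

  # Greedily match `literal` (assumed non-empty) at the start of `input`.
  # Returns the number of repetitions matched, or nil if fewer than the
  # minimum were found.
  def match_repeated(input, literal, quantifier = nil)
    min, max = BOUNDS.fetch(quantifier)
    count = 0
    pos = 0
    while count < max && input[pos, literal.length] == literal
      pos += literal.length
      count += 1
    end
    count >= min ? count : nil
  end
end
```

With this reading, "?" and "*" never fail on their own, since zero repetitions satisfy their minimum, whereas "+" and the unquantified form fail when the item is absent.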