==========
Change Log
==========

RELEASE PLANNING NOTES:

In the pyparsing release 3.3.0, use of many of the pre-PEP8 methods (such as
`ParserElement.parseString`) will start to raise `DeprecationWarnings`. I plan to
completely drop the pre-PEP8 methods in pyparsing 4.0, though we won't see that release
until some time in 2026. So there is plenty of time to convert existing parsers to
the new function names before the old functions are completely removed. (Big help from
Devin J. Pohly in structuring the code to enable this peaceful transition.)

===========================================================================================
The version 3.3.0 release will begin emitting `DeprecationWarnings` for pyparsing methods
that have been renamed to PEP8-compliant names (introduced in pyparsing 3.0.0, in August,
2021, with legacy names retained as aliases). In preparation, I have added in pyparsing
3.2.2 a utility for finding and replacing the legacy method names with the new names.
This utility is located at `pyparsing/tools/cvt_pyparsing_pep8_names.py`. This script will
scan all Python files specified on the command line, and if the `-u` option is selected, will
replace all occurrences of the old method names with the new PEP8-compliant names,
updating the files in place.

Here is an example that converts all the files in the pyparsing `/examples` directory:

    python -m pyparsing.tools.cvt_pyparsing_pep8_names -u examples/*.py

The new names are compatible with pyparsing versions 3.0.0 and later.
===========================================================================================


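As a rough illustration of what an in-place rename pass involves, the sketch below rewrites a few legacy names using whole-word regex matching. The three-entry mapping and the function name `convert_source` are hypothetical, chosen only for this example; the real `cvt_pyparsing_pep8_names` utility covers the full set of renamed methods and may treat edge cases (such as names inside strings or comments) differently.

```python
import re

# Hypothetical subset of the legacy -> PEP8 name mapping, for illustration only.
LEGACY_TO_PEP8 = {
    "parseString": "parse_string",
    "setParseAction": "set_parse_action",
    "setName": "set_name",
}

# Match any legacy name as a whole word, so "parseStringX" is left untouched.
_LEGACY_NAME = re.compile(
    r"\b(" + "|".join(map(re.escape, LEGACY_TO_PEP8)) + r")\b"
)


def convert_source(source: str) -> str:
    """Return source with each legacy name replaced by its PEP8 equivalent."""
    return _LEGACY_NAME.sub(lambda m: LEGACY_TO_PEP8[m.group(1)], source)


old_code = 'result = expr.setParseAction(convert).parseString("abc")'
new_code = convert_source(old_code)
print(new_code)
```

Whole-word matching is the important design choice here: a plain `str.replace` would also rewrite longer identifiers that merely contain a legacy name.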
Required Python versions by pyparsing version
---------------------------------------------

+--------------------------------------------------+-------------------+
| pyparsing version                                | Required Python   |
+==================================================+===================+
| 3.2.0 - later                                    | 3.9 or later      |
| 3.0.8 - 3.1.4                                    | 3.6.8 or later    |
| 3.0.0 - 3.0.7 (these versions are discouraged)   | 3.6 or later      |
| 2.4.7                                            | 2.7 or later      |
| 1.5.7                                            | 2.6 - 2.7         |
+--------------------------------------------------+-------------------+


Version 3.2.3 - March, 2025
---------------------------
- Fixed bug released in 3.2.2 in which `nested_expr` could overwrite parse actions
  for defined content, and could truncate list of items within a nested list.
  Fixes Issue #600, reported by hoxbro and luisglft, with helpful diag logs and
  repro code.


Version 3.2.2 - March, 2025
---------------------------
- Released `cvt_pyparsing_pep8_names.py` conversion utility to upgrade pyparsing-based
  programs and libraries that use legacy camelCase names to use the new PEP8-compliant
  snake_case method names. The converter can also be imported into other scripts as

      from pyparsing.tools.cvt_pyparsing_pep8_names import pep8_converter

- Fixed bug in `nested_expr` where nested contents were stripped of whitespace when
  the default whitespace characters were cleared (raised in this StackOverflow
  question https://stackoverflow.com/questions/79327649 by Ben Alan). Also addressed
  bug in resolving PEP8 compliant argument name and legacy argument name.

- Fixed bug in `rest_of_line` and the underlying `Regex` class, in which matching a
  pattern that could match an empty string (such as `".*"` or `"[A-Z]*"`) would not raise
  a `ParseException` at or beyond the end of the input string. This could cause an
  infinite parsing loop when parsing `rest_of_line` at the end of the input string.
  Reported by user Kylotan, thanks! (Issue #593)

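The empty-match hazard behind the `rest_of_line` fix can be reproduced with the standard `re` module alone. The sketch below is not pyparsing's code: `scan_all` is a hypothetical helper showing why a scanner must treat a zero-width match at the end of the input as a failure, or it will never stop.

```python
import re

pattern = re.compile(r".*")
text = "abc"

# At the end of the input, ".*" still "succeeds" with a zero-width match.
match_at_end = pattern.match(text, len(text))
print(repr(match_at_end.group()))


def scan_all(regex, s: str) -> list:
    """Collect successive matches, stopping instead of looping forever
    on an empty match at or beyond the end of the input."""
    found = []
    pos = 0
    while True:
        m = regex.match(s, pos)
        if m is None or (m.end() == m.start() and pos >= len(s)):
            break  # no progress is possible here: treat as a failed match
        found.append(m.group())
        # Always advance at least one character to guarantee termination.
        pos = max(m.end(), pos + 1)
    return found


print(scan_all(pattern, text))
```

Without the end-of-input check, the loop would keep receiving the same empty match at position 3.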
- Enhancements and extra input validation for `pyparsing.util.make_compressed_re` - see
  usage in `examples/complex_chemical_formulas.py` and result in the generated railroad
  diagram `examples/complex_chemical_formulas_diagram.html`. Properly escapes characters
  like "." and "*" that have special meaning in regular expressions.

- Fixed bug in `one_of()` to properly escape characters that are regular expression markers
  (such as '*', '+', '?', etc.) before building the internal regex.

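The two escaping fixes above rest on the same idea: literal strings must be escaped before they are joined into a regular expression. This is a minimal sketch using only `re.escape` from the standard library; the `symbols` list is made up for the example, and the code is not taken from `one_of` or `make_compressed_re`.

```python
import re

symbols = ["*", "+", "?", "a.b"]

# Put longer alternatives first so they are not shadowed by shorter ones,
# and escape each literal so regex metacharacters match as plain text.
ordered = sorted(symbols, key=len, reverse=True)
literal_re = re.compile("|".join(re.escape(s) for s in ordered))

matches = literal_re.findall("a.b * axb ?")
print(matches)
```

Joining the raw strings instead (`"a.b|*|+|?"`) would fail to compile, and an unescaped "." would also match the "x" in "axb".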
- Better exception message for `MatchFirst` and `Or` expressions, showing all alternatives
  rather than just the first one. Fixes Issue #592, reported by Focke, thanks!

- Added return type annotation of "-> None" for all `__init__()` methods, to satisfy
  `mypy --strict` type checking. PR submitted by FeRD, thank you!

- Added optional argument `show_hidden` to `create_diagram` to show
  elements that are used internally by pyparsing, but are not part of the actual
  parser grammar. For instance, the `Tag` class can insert values into the parsed
  results but it does not actually parse any input, so by default it is not included
  in a railroad diagram. By calling `create_diagram` with `show_hidden` = `True`,
  these internal elements will be included. (You can see this in the tag_metadata.py
  script in the examples directory.)

- Fixed bug in `number_words.py` example. Also added `ebnf_number_words.py` to demonstrate
  using the `ebnf.py` EBNF parser generator to build a similar parser directly from
  EBNF.

- Fixed syntax warning raised in `bigquery_view_parser.py`, invalid escape sequence "\s".
  Reported by sameer-google, nice catch! (Issue #598)

- Added support for Python 3.14.


Version 3.2.1 - December, 2024
------------------------------
- Updated generated railroad diagrams to make non-terminal elements links to their related
  sub-diagrams. This _greatly_ improves navigation of the diagram, especially for
  large, complex parsers.

- Simplified railroad diagrams emitted for parsers using `infix_notation`, by hiding
  lookahead terms. Renamed internally generated expressions for clarity, and improved
  diagramming.

- Improved performance of `cpp_style_comment`, `c_style_comment`, `common.fnumber`
  and `common.ieee_float` Regex expressions. PRs submitted by Gabriel Gerlero,
  nice work, thanks!

- Add missing type annotations to `match_only_at_col`, `replace_with`, `remove_quotes`,
  `with_attribute`, and `with_class`. Issue #585 reported by rafrafrek.

- Added generated diagrams for many of the examples.

- Replaced old `examples/0README.html` file with `examples/README.md` file.


Version 3.2.0 - October, 2024
-----------------------------
- Discontinued support for Python 3.6, 3.7, and 3.8. Adopted new Python features from
  Python versions 3.7-3.9:
  - Updated type annotations to use built-in container types instead of names
    imported from the `typing` module (e.g., `list[str]` vs `List[str]`).
  - Reworked portions of the packrat cache to leverage insertion-preserving ordering
    in dicts (including removal of uses of `OrderedDict`).
  - Changed `pdb.set_trace()` call in `ParserElement.set_break()` to `breakpoint()`.
  - Converted `typing.NamedTuple` to `dataclasses.dataclass` in railroad diagramming
    code.
  - Added `from __future__ import annotations` to clean up some type annotations.
  (with assistance from ISyncWithFoo, issue #535, thanks for the help!)

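To show why insertion-ordered dicts can stand in for `OrderedDict` in a cache, here is a minimal sketch of a bounded first-in-first-out cache built on a plain dict. `BoundedFifoCache` is a hypothetical class written for this note; it is not pyparsing's packrat cache, and it assumes a capacity of at least 1.

```python
class BoundedFifoCache:
    """Tiny FIFO cache on a plain dict (hypothetical, for illustration)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # assumed to be >= 1
        self._data: dict = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value) -> None:
        # Dicts preserve insertion order (guaranteed since Python 3.7), so the
        # first key yielded by iteration is always the oldest entry.
        if key not in self._data and len(self._data) >= self.capacity:
            oldest = next(iter(self._data))
            del self._data[oldest]
        self._data[key] = value


cache = BoundedFifoCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.set("c", 3)  # the cache is full, so the oldest key "a" is evicted
remaining_keys = list(cache._data)
print(remaining_keys)
```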
- POSSIBLE BREAKING CHANGES

  The following bugfixes may result in subtle changes in the results returned or
  exceptions raised by pyparsing.

  - Fixed code in `ParseElementEnhance` subclasses that
    replaced detailed exception messages raised in contained expressions with a
    less-specific and less-informative generic exception message and location.

    If your code has conditional logic based on the message content in raised
    `ParseExceptions`, this bugfix may require changes in your code.

  - Fixed bug in `transform_string()` where whitespace
    in the input string was not properly preserved in the output string.

    If your code uses `transform_string`, this bugfix may require changes in
    your code.

  - Fixed bug where an `IndexError` raised in a parse action was
    incorrectly handled as an `IndexError` raised as part of the `ParserElement`
    parsing methods, and reraised as a `ParseException`. Now an `IndexError`
    raised inside a parse action will properly propagate out as an `IndexError`.
    (Issue #573, reported by August Karlstedt, thanks!)

    If your code raises `IndexError`s in parse actions, this bugfix may require
    changes in your code.

- FIXES AND NEW FEATURES

  - Added type annotations to remainder of `pyparsing` package, and added `mypy`
    run to `tox.ini`, so that type annotations are now checked as part of pyparsing's CI.
    Addresses Issue #373, raised by Iwan Aucamp, thanks!

  - Exception message format can now be customized, by overriding
    `ParseBaseException.formatted_message`:

        def custom_exception_message(exc) -> str:
            found_phrase = f", found {exc.found}" if exc.found else ""
            return f"{exc.lineno}:{exc.column} {exc.msg}{found_phrase}"

        ParseBaseException.formatted_message = custom_exception_message

    (PR #571 submitted by Odysseyas Krystalakos, nice work!)

  - `run_tests` now detects if an exception is raised in a parse action, and will
    report it with an enhanced error message, with the exception type, string,
    and parse action name.

  - `QuotedString` now handles translation of escaped integer, hex, octal, and
    Unicode sequences to their corresponding characters.

  - Fixed the displayed output of `Regex` terms to deduplicate repeated backslashes,
    for easier reading in debugging, printing, and railroad diagrams.

  - Fixed (or at least reduced) elusive bug when generating railroad diagrams,
    where some diagram elements were just empty blocks. Fix submitted by RoDuth,
    thanks a ton!

  - Fixed railroad diagrams that get generated with a parser containing a Regex element
    defined using a verbose pattern - the pattern gets flattened and comments removed
    before creating the corresponding diagram element.

  - Defined a more performant regular expression used internally by `common_html_entity`.

  - `Regex` instances can now be created using a callable that takes no arguments
    and just returns a string or a compiled regular expression, so that creating complex
    regular expression patterns can be deferred until they are actually used for the first
    time in the parser.

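The deferral idea behind the callable form of `Regex` can be sketched with the standard library alone. `LazyPattern` and `build_big_pattern` below are hypothetical names invented for this note, not part of pyparsing; the sketch only shows how a pattern factory is left uncalled until the compiled pattern is first needed.

```python
import re
from typing import Callable, Optional, Union

PatternSource = Union[str, Callable[[], str]]


class LazyPattern:
    """Hypothetical helper: compiles its regex on first use, not at creation."""

    def __init__(self, source: PatternSource) -> None:
        self._source = source
        self._compiled: Optional[re.Pattern] = None

    @property
    def compiled(self) -> re.Pattern:
        if self._compiled is None:
            # Call the factory (if one was given) only when first needed.
            text = self._source() if callable(self._source) else self._source
            self._compiled = re.compile(text)
        return self._compiled


calls = []


def build_big_pattern() -> str:
    calls.append("built")  # record that the expensive work actually ran
    return "|".join(sorted(["alpha", "beta", "gamma"], key=len, reverse=True))


lazy = LazyPattern(build_big_pattern)
calls_before_use = len(calls)  # the factory has not been called yet
first_match = lazy.compiled.fullmatch("beta")
print(calls_before_use, bool(first_match), len(calls))
```

A parser module that defines many large patterns pays the construction cost only for those that a given run actually uses.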
  - Added optional `flatten` Boolean argument to `ParseResults.as_list()`, to
    return the parsed values in a flattened list.

  - Added `indent` and `base_1` arguments to `pyparsing.testing.with_line_numbers`. When
    using `with_line_numbers` inside a parse action, set `base_1`=False, since the
    reported `loc` value is 0-based. `indent` can be a leading string (typically of
    spaces or tabs) to indent the numbered string passed to `with_line_numbers`.
    Added while working on #557, reported by Bernd Wechner.

- NEW/ENHANCED EXAMPLES

  - Added query syntax to `mongodb_query_expression.py` with:
    - better support for array fields ("contains all",
      "contains any", and "contains none")
    - "like" and "not like" operators to support SQL "%" wildcard matching
      and "=~" operator to support regex matching
    - text search using "search for"
    - dates and datetimes as query values
    - `a[0]` style array referencing

  - Added `lox_parser.py` example, a parser for the Lox language used as a tutorial in
    Robert Nystrom's "Crafting Interpreters" (http://craftinginterpreters.com/).
    With helpful corrections from RoDuth.

  - Added `complex_chemical_formulas.py` example, to add parsing capability for
    formulas such as "3(C₆H₅OH)₂".

  - Updated `tag_emitter.py` to use new `Tag` class, introduced in pyparsing
    3.1.3.


Version 3.1.4 - August, 2024
----------------------------
- Fixed a regression introduced in pyparsing 3.1.3, addition of a type annotation that
  referenced `re.Pattern`. Since this type was introduced in Python 3.7, using this type
  definition broke Python 3.6 installs of pyparsing 3.1.3. PR submitted by Felix Fontein,
  nice work!


Version 3.1.3 - August, 2024
----------------------------
- Added new `Tag` ParserElement, for inserting metadata into the parsed results.
  This allows a parser to add metadata or annotations to the parsed tokens.
  The `Tag` element also accepts an optional `value` parameter, defaulting to `True`.
  See the new `tag_metadata.py` example in the `examples` directory.

  Example:

      # add tag indicating mood
      end_punc = "." | ("!" + Tag("enthusiastic"))
      greeting = "Hello" + Word(alphas) + end_punc

      result = greeting.parse_string("Hello World.")
      print(result.dump())

      result = greeting.parse_string("Hello World!")
      print(result.dump())

  prints:

      ['Hello', 'World', '.']

      ['Hello', 'World', '!']
      - enthusiastic: True

- Added example `mongodb_query_expression.py`, to parse human-readable infix query
  expressions (such as `a==100 and b>=200`) and transform them into the equivalent
  query argument for the pymongo package (`{'$and': [{'a': 100}, {'b': {'$gte': 200}}]}`).
  Supports many equality and inequality operators - see the docstring for the
  `transform_query` function for more examples.

- Fixed issue where PEP8 compatibility names for `ParserElement` static methods were
  not themselves defined as `staticmethods`. When called using a `ParserElement` instance,
  this resulted in a `TypeError` exception. Reported by eylenburg (#548).

- To address a compatibility issue in RDFLib, added a property setter for the
  `ParserElement.name` property, to call `ParserElement.set_name`.

- Modified `ParserElement.set_name()` to accept a None value, to clear the defined
  name and corresponding error message for a `ParserElement`.

- Updated railroad diagram generation for `ZeroOrMore` and `OneOrMore` expressions with
  `stop_on` expressions, while investigating #558, reported by user Gu_f.

- Added `<META>` tag to HTML generated for railroad diagrams to force UTF-8 encoding
  with older browsers, to better display Unicode parser characters.

- Fixed some cosmetics/bugs in railroad diagrams:
  - fixed groups being shown even when `show_groups`=False
  - show results names as quoted strings when `show_results_names`=True
  - only use integer loop counter if repetition > 2

- Some type annotations added for parse action related methods, thanks August
  Karlstedt (#551).

- Added exception type to `trace_parse_action` exception output, while investigating
  SO question posted by medihack.

- Added `set_name` calls to internal expressions generated in `infix_notation`, for
  improved railroad diagramming.

- `delta_time`, `lua_parser`, `decaf_parser`, and `roman_numerals` examples cleaned up
  to use latest PEP8 names and add minor enhancements.

- Fixed bug (and corresponding test code) in `delta_time` example that did not handle
  weekday references in time expressions (like "Monday at 4pm") when the weekday was
  the same as the current weekday.

- Minor performance speedup in `trim_arity`, to benefit any parsers using parse actions.

- Added early testing support for Python 3.13 with JIT enabled.


Version 3.1.2 - March, 2024
---------------------------
- Added `ieee_float` expression to `pyparsing.common`, which parses float values,
  plus "NaN", "Inf", "Infinity". PR submitted by Bob Peterson (#538).

- Updated pep8 synonym wrappers for better type checking compatibility. PR submitted
  by Ricardo Coccioli (#507).

- Fixed empty error message bug, PR submitted by InSync (#534). This _should_ return
  pyparsing's exception messages to a former, more helpful form. If you have code that
  parses the exception messages returned by pyparsing, this may require some code
  changes.

- Added unit tests to test for exception message contents, with enhancement to
  `pyparsing.testing.assertRaisesParseException` to accept an expected exception message.

- Updated example `select_parser.py` to use PEP8 names and added Groups for better retrieval
  of parsed values from multiple SELECT clauses.

- Added example `email_address_parser.py`, as suggested by John Byrd (#539).

- Added example `directx_x_file_parser.py` to parse DirectX template definitions, and
  generate a Pyparsing parser from a template to parse .x files.

- Some code refactoring to reduce code nesting, PRs submitted by InSync.

- All internal string expressions using '%' string interpolation and `str.format()`
  converted to f-strings.


Version 3.1.1 - July, 2023
--------------------------
- Fixed regression in `Word(min)`, reported by Ricardo Coccioli, good catch! (Issue #502)

- Fixed bug in bad exception messages raised by `Forward` expressions. PR submitted
  by Kyle Sunden, thanks for your patience and collaboration on this (#493).

- Fixed regression in `SkipTo`, where ignored expressions were not checked when looking
  for the target expression. Reported by catcombo, Issue #500.

- Fixed type annotation for `enable_packrat`, PR submitted by Mike Urbach, thanks! (Issue #498)

- Some general internal code cleanup. (Instigated by Michal Čihař, Issue #488)


Version 3.1.0 - June, 2023
--------------------------
- Added `tag_emitter.py` to examples. This example demonstrates how to insert
  tags into your parsed results that are not part of the original parsed text.


Version 3.1.0b2 - May, 2023
---------------------------
- Updated `create_diagram()` code to be compatible with railroad-diagrams package
  version 3.0. Fixes Issue #477 (railroad diagrams generated with black bars),
  reported by Sam Morley-Short.

- Fixed bug in `NotAny`, where parse actions on the negated expr were not being run.
  This could cause `NotAny` to incorrectly fail if the expr would normally match,
  but would fail to match if a condition used as a parse action returned False.
  Fixes Issue #482, raised by byaka, thank you!

- Fixed `create_diagram()` to accept keyword args, to be passed through to the
  `template.render()` method to generate the output HTML (PR submitted by Aussie Schnore,
  good catch!)

- Fixed bug in `python_quoted_string` regex.

- Added `examples/bf.py` Brainf*ck parser/executor example. Illustrates using
  a pyparsing grammar to parse language syntax, and attach executable AST nodes to
  the parsed results.


Version 3.1.0b1 - April, 2023
-----------------------------
- Added support for Python 3.12.

- API CHANGE: A slight change has been implemented when unquoting a quoted string
  parsed using the `QuotedString` class. Formerly, when unquoting and processing
  whitespace markers such as \t and \n, these substitutions would occur first, and
  then any additional '\' escaping would be done on the resulting string. This would
  parse "\\n" as "\<newline>". Now escapes and whitespace markers are all processed
  in a single pass working left to right, so the quoted string "\\n" would get unquoted
  to "\n" (a backslash followed by "n"). Fixes issue #474 raised by jakeanq,
  thanks!

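The difference between the two processing orders described above can be seen in a small standalone sketch. Both functions are hypothetical simplifications written for this note using `re`; they are not `QuotedString`'s implementation and handle only three whitespace markers.

```python
import re

WHITESPACE_MARKERS = {"n": "\n", "t": "\t", "r": "\r"}


def unquote_single_pass(body: str) -> str:
    """Process each backslash escape once, scanning left to right."""

    def replace(m):
        ch = m.group(1)
        # A whitespace marker maps to its control character; any other
        # escaped character (including a backslash) stands for itself.
        return WHITESPACE_MARKERS.get(ch, ch)

    return re.sub(r"\\(.)", replace, body, flags=re.DOTALL)


def unquote_two_pass(body: str) -> str:
    """Older-style ordering: whitespace markers first, then other escapes."""
    for ch, actual in WHITESPACE_MARKERS.items():
        body = body.replace("\\" + ch, actual)
    # Any backslash left in front of an ordinary character is then dropped.
    return re.sub(r"\\(.)", r"\1", body)


raw = "\\\\n"  # three characters: backslash, backslash, n
print(repr(unquote_single_pass(raw)))
print(repr(unquote_two_pass(raw)))
```

In the single pass, the first backslash consumes the second one, so the trailing "n" is never treated as part of a newline marker.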
- Added named field "url" to `pyparsing.common.url`, returning the entire
  parsed URL string.

- Fixed bug in which the results name was not saved when a parse action returned
  an empty string for an expression that had a results name. That is:

      expr = Literal("X").add_parse_action(lambda tokens: "")("value")
      result = expr.parse_string("X")
      print(result["value"])

  would raise a `KeyError`. Now empty strings will be saved with the associated
  results name. Raised in Issue #470 by Nicco Kunzmann, thank you.

- Fixed bug in `SkipTo` where ignore expressions were not properly handled while
  scanning for the target expression. Issue #475, reported by elkniwt, thanks
  (this bug has been there for a looooong time!).

- Updated `ci.yml` permissions to limit default access to source - submitted by Joyce
  Brum of Google. Thanks so much!

- Updated the `lucene_grammar.py` example (better support for '*' and '?' wildcards)
  and corrected the test cases - brought to my attention by Elijah Nicol, good catch!


Version 3.1.0a1 - March, 2023
-----------------------------
- API ENHANCEMENT: `Optional(expr)` may now be written as `expr | ""`

  This will make this code:

      "{" + Optional(Literal("A") | Literal("a")) + "}"

  writable as:

      "{" + (Literal("A") | Literal("a") | "") + "}"

  Some related changes implemented as part of this work:
  - `Literal("")` now internally generates an `Empty()` (and no longer raises an exception)
  - `Empty` is now a subclass of `Literal`

  Suggested by Antony Lee (issue #412), PR (#413) by Devin J. Pohly.

- Added new class property `identifier` to all Unicode set classes in `pyparsing.unicode`,
  using the class's values for `cls.identchars` and `cls.identbodychars`. Now Unicode-aware
  parsers that formerly wrote:

      ppu = pyparsing.unicode
      ident = Word(ppu.Greek.identchars, ppu.Greek.identbodychars)

  can now write:

      ident = ppu.Greek.identifier
      # or
      # ident = ppu.Ελληνικά.identifier

- `ParseResults` now has a new method `deepcopy()`, in addition to the current
  `copy()` method. `copy()` only makes a shallow copy - any contained `ParseResults`
  are copied as references - changes in the copy will be seen as changes in the original.
  In many cases, a shallow copy is sufficient, but some applications require a deep copy.
  `deepcopy()` makes a deeper copy: any contained `ParseResults` or other mappings or
  containers are built with copies from the original, and do not get changed if the
  original is later changed. Addresses issue #463, reported by Bryn Pickering.

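The shallow-versus-deep distinction drawn above for `ParseResults.copy()` and `ParseResults.deepcopy()` mirrors the behavior of Python's own `copy` module. The following analogy uses ordinary dicts and lists rather than `ParseResults`, so it runs without pyparsing; the data is made up for the example.

```python
import copy

original = {"name": "result", "items": [["a", "b"], ["c"]]}

shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Mutate a nested container through the original object.
original["items"][0].append("z")

# The shallow copy shares the nested list, so it sees the change.
print(shallow["items"][0])
# The deep copy has its own independent nested list.
print(deep["items"][0])
```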
- Reworked `delimited_list` function into the new `DelimitedList` class.
  `DelimitedList` has the same constructor interface as `delimited_list`, and
  in this release, `delimited_list` changes from a function to a synonym for
  `DelimitedList`. `delimited_list` and the older `delimitedList` method will be
  deprecated in a future release, in favor of `DelimitedList`.

- Error messages from `MatchFirst` and `Or` expressions will try to give more details
  if one of the alternatives matches better than the others, but still fails.
  Question raised in Issue #464 by msdemlei, thanks!

- Added new class method `ParserElement.using_each`, to simplify code
  that creates a sequence of `Literals`, `Keywords`, or other `ParserElement`
  subclasses.

  For instance, to define suppressible punctuation, you would previously
  write:

      LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")

  You can now write:

      LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")

  `using_each` will also accept optional keyword args, which it will
  pass through to the class initializer. Here is an expression for
  single-letter variable names that might be used in an algebraic
  expression:

      algebra_var = MatchFirst(
          Char.using_each(string.ascii_lowercase, as_keyword=True)
      )

- Added new builtin `python_quoted_string`, which will match any form
  of single-line or multiline quoted strings defined in Python. (Inspired
  by discussion with Andreas Schörgenhumer in Issue #421.)

- Extended `expr[]` notation for repetition of `expr` to accept a
  slice, where the slice's stop value indicates a `stop_on`
  expression:

      test = "BEGIN aaa bbb ccc END"
      BEGIN, END = Keyword.using_each("BEGIN END".split())
      body_word = Word(alphas)

      expr = BEGIN + Group(body_word[...:END]) + END
      # equivalent to
      # expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END

      print(expr.parse_string(test))

  Prints:

      ['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']

- `ParserElement.validate()` is deprecated. It predates the support for left-recursive
  parsers, and was prone to false positives (warning that a grammar was invalid when
  it was in fact valid). It will be removed in a future pyparsing release. In its
  place, developers should use debugging and analytical tools, such as `ParserElement.set_debug()`
  and `ParserElement.create_diagram()`.
  (Raised in Issue #444, thanks Andrea Micheli!)

- Added bool `embed` argument to `ParserElement.create_diagram()`.
  When passed as True, the resulting diagram will omit the `<DOCTYPE>`,
  `<HEAD>`, and `<BODY>` tags so that it can be embedded in other
  HTML source. (Useful when embedding a call to `create_diagram()` in
  a PyScript HTML page.)

- Added `recurse` argument to `ParserElement.set_debug` to set the
  debug flag on an expression and all of its sub-expressions. Requested
  by multimeric in Issue #399.

- Added '·' (Unicode MIDDLE DOT) to the set of `Latin1.identbodychars`.

- Fixed bug in `Word` when `max=2`. Also added performance enhancement
  when specifying `exact` argument. Reported in issue #409 by
  panda-34, nice catch!

- `Word` arguments are now validated if `min` and `max` are both
  given, that `min` <= `max`; raises `ValueError` if values are invalid.

- Fixed bug in `srange`, when parsing escaped '/' and '\' inside a
  range set.

- Fixed exception messages for some `ParserElements` with custom names,
  which instead showed their contained expression names.

- Fixed bug in `pyparsing.common.url`, when input URL is not alone
  on an input line. Fixes Issue #459, reported by David Kennedy.

- Multiple added and corrected type annotations. With much help from
  Stephen Rosen, thanks!

- Some documentation and error message clarifications on pyparsing's
  keyword logic, cited by Basil Peace.

- General docstring cleanup for Sphinx doc generation, PRs submitted
  by Devin J. Pohly. A dirty job, but someone has to do it - much
  appreciated!

- `invRegex.py` example renamed to `inv_regex.py` and updated to PEP-8
  variable and method naming. PR submitted by Ross J. Duff, thanks!

- Removed examples `sparser.py` and `pymicko.py`, since each included its
  own GPL license in the header. Since this conflicts with pyparsing's
  MIT license, they were removed from the distribution to avoid
  confusion among those making use of them in their own projects.


Version 3.0.9 - May, 2022
-------------------------
- Added Unicode set `BasicMultilingualPlane` (may also be referenced
  as `BMP`) representing the Basic Multilingual Plane (Unicode
  characters up to code point 65535). Can be used to parse
  most language characters, but omits emojis, wingdings, etc.
  Raised in discussion with Dave Tapley (issue #392).

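The code point boundary mentioned above is easy to check with built-in functions. This small sketch does not use pyparsing's `BasicMultilingualPlane` set; `in_bmp` is a hypothetical helper that only shows which sample characters fall at or below code point 65535.

```python
BMP_MAX = 0xFFFF  # 65535, the highest code point in the Basic Multilingual Plane


def in_bmp(ch: str) -> bool:
    """Return True if the single character ch lies within the BMP."""
    return ord(ch) <= BMP_MAX


for ch in ["A", "é", "中", "😀"]:
    print(f"U+{ord(ch):04X} in BMP: {in_bmp(ch)}")
```

The emoji is the only sample character above the boundary, which is why emoji fall outside a BMP-only character set.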
- To address mypy confusion of `pyparsing.Optional` and `typing.Optional`
  resulting in `error: "_SpecialForm" not callable` message
  reported in issue #365, fixed the import in `exceptions.py`. Nice
  sleuthing by Iwan Aucamp and Dominic Davis-Foster, thank you!
  (Removed definitions of `OptionalType`, `DictType`, and `IterableType`
  and replaced them with `typing.Optional`, `typing.Dict`, and
  `typing.Iterable` throughout.)

- Fixed typo in jinja2 template for railroad diagrams, thanks for the
  catch Nioub (issue #388).

- Removed use of deprecated `pkg_resources` package in
  railroad diagramming code (issue #391).

- Updated `bigquery_view_parser.py` example to parse examples at
  https://cloud.google.com/bigquery/docs/reference/legacy-sql


Version 3.0.8 - April, 2022
---------------------------
- API CHANGE: modified `pyproject.toml` to require Python version
  3.6.8 or later for pyparsing 3.x. Earlier minor versions of 3.6
  fail in evaluating the `version_info` class (implemented using
  `typing.NamedTuple`). If you are using an earlier version of Python
  3.6, you will need to use pyparsing 2.4.7.

- Improved pyparsing import time by deferring regex pattern compiles.
  PR submitted by Anthony Sottile to fix issue #362, thanks!

- Updated build to use flit, PR by Michał Górny, added `BUILDING.md`
  doc and removed old Windows build scripts - nice cleanup work!

- More type-hinting added for all arithmetic and logical operator
  methods in `ParserElement`. PR from Kazantcev Andrey, thank you.

- Fixed `infix_notation`'s definitions of `lpar` and `rpar`, to accept
  parse expressions such that they do not get suppressed in the parsed
  results. PR submitted by Philippe Prados, nice work.

- Fixed bug in railroad diagramming with expressions containing `Combine`
  elements. Reported by Jeremy White, thanks!

- Added `show_groups` argument to `create_diagram` to highlight grouped
  elements with an unlabeled bounding box.

- Added `unicode_denormalizer.py` to the examples as a demonstration
  of how Python's interpreter will accept Unicode characters in
  identifiers, but normalizes them back to ASCII so that identifiers
  `print` and `𝕡𝓻ᵢ𝓃𝘁` and `𝖕𝒓𝗂𝑛ᵗ` are all equivalent.

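The identifier equivalence described above can be checked with the standard `unicodedata` module, since Python applies NFKC normalization to identifiers when it parses source code. This sketch is independent of the `unicode_denormalizer.py` example and simply normalizes the two decorated spellings quoted above.

```python
import unicodedata

fancy_names = ["𝕡𝓻ᵢ𝓃𝘁", "𝖕𝒓𝗂𝑛ᵗ"]

normalized_names = []
for name in fancy_names:
    # NFKC folds compatibility characters (styled math letters, subscripts,
    # superscripts) to their plain equivalents.
    normalized = unicodedata.normalize("NFKC", name)
    normalized_names.append(normalized)
    print(f"{name!r} -> {normalized!r}")
```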
- Removed imports of deprecated `sre_constants` module for catching
  exceptions when compiling regular expressions. PR submitted by
  Serhiy Storchaka, thank you.


Version 3.0.7 - January, 2022
-----------------------------
- Fixed bug #345, in which `delimitedList` changed expressions in place
  using `expr.streamline()`. Reported by Kim Gräsman, thanks!

- Fixed bug #346, when a string of word characters was passed to `WordStart`
  or `WordEnd` instead of just taking the default value. Originally posted
  as a question by Parag on StackOverflow, good catch!

- Fixed bug #350, in which `White` expressions could fail to match due to
  unintended whitespace-skipping. Reported by Fu Hanxi, thank you!

- Fixed bug #355, when a `QuotedString` is defined with characters in its
  quoteChar string containing regex-significant characters such as ., *,
  ?, [, ], etc.

- Fixed bug in `ParserElement.run_tests` where comments would be displayed
  using `with_line_numbers`.

- Added optional "min" and "max" arguments to `delimited_list`. PR
  submitted by Marius, thanks!

- Added new API change note in `whats_new_in_pyparsing_3_0_0`, regarding
  a bug fix in the `bool()` behavior of `ParseResults`.

  Prior to pyparsing 3.0.x, the `ParseResults` class implementation of
  `__bool__` would return `False` if the `ParseResults` item list was empty,
  even if it contained named results. In 3.0.0 and later, `ParseResults` will
  return `True` if either the item list is not empty *or* if the named
  results dict is not empty.

      # generate an empty ParseResults by parsing a blank string with
      # a ZeroOrMore
      result = Word(alphas)[...].parse_string("")
      print(result.as_list())
      print(result.as_dict())
      print(bool(result))

      # add a results name to the result
      result["name"] = "empty result"
      print(result.as_list())
      print(result.as_dict())
      print(bool(result))

  Prints:

      []
      {}
      False

      []
      {'name': 'empty result'}
      True

  In previous versions, the second call to `bool()` would return `False`.

- Minor enhancement to Word generation of internal regular expression, to
|
||
emit consecutive characters in range, such as "ab", as "ab", not "a-b".
|
||
|
||
- Fixed character ranges for search terms using non-Western characters
|
||
in booleansearchparser, PR submitted by tc-yu, nice work!
|
||
|
||
- Additional type annotations on public methods.
|
||
|
||
|
||
Version 3.0.6 - November, 2021
------------------------------
- Added `suppress_warning()` method to individually suppress a warning on a
  specific ParserElement. Used to refactor `original_text_for` to preserve
  internal results names, which, while undocumented, had been adopted by
  some projects.

- Fixed bug when `delimited_list` was called with a str literal instead of a
  parse expression.


Version 3.0.5 - November, 2021
------------------------------
- Added return type annotations for `col`, `line`, and `lineno`.

- Fixed bug when `warn_ungrouped_named_tokens_in_collection` warning was raised
  when assigning a results name to an `original_text_for` expression.
  (Issue #110, would raise warning in packaging.)

- Fixed internal bug where `ParserElement.streamline()` would not return self if
  already streamlined.

- Changed `run_tests()` output to default to not showing line and column numbers.
  If line numbering is desired, call with `with_line_numbers=True`. Also fixed
  minor bug where separating line was not included after a test failure.


Version 3.0.4 - October, 2021
-----------------------------
- Fixed bug in which `Dict` classes did not correctly return tokens as nested
  `ParseResults`, reported by and fix identified by Bu Sun Kim, many thanks!!!

- Documented API-changing side-effect of converting `ParseResults` to use `__slots__`
  to pre-define instance attributes. This means that code written like this (which
  was allowed in pyparsing 2.4.7):

      result = Word(alphas).parseString("abc")
      result.xyz = 100

  now raises this Python exception:

      AttributeError: 'ParseResults' object has no attribute 'xyz'

  To add new attribute values to a `ParseResults` object in 3.0.0 and later, you must
  assign them using indexed notation:

      result["xyz"] = 100

  You will still be able to access this new value as an attribute or as an
  indexed item.

- Fixed bug in railroad diagramming where the vertical limit would count all
  expressions in a group, not just those that would create visible railroad
  elements.


Version 3.0.3 - October, 2021
-----------------------------
- Fixed regex typo in `one_of` fix for `as_keyword=True`.

- Fixed a whitespace-skipping bug, Issue #319, introduced as part of the revert
  of the `LineStart` changes. Reported by Marc-Alexandre Côté,
  thanks!

- Added header column labeling > 100 in `with_line_numbers` - some input lines
  are longer than others.


Version 3.0.2 - October, 2021
-----------------------------
- Reverted change in behavior with `LineStart` and `StringStart`, which changed the
  interpretation of when and how `LineStart` and `StringStart` should match when
  a line starts with spaces. In 3.0.0, the `xxxStart` expressions were not
  really treated like expressions in their own right, but as modifiers to the
  following expression when used like `LineStart() + expr`, so that if there
  were whitespace on the line before `expr` (which would match in versions prior
  to 3.0.0), the match would fail.

  3.0.0 implemented this by automatically promoting `LineStart() + expr` to
  `AtLineStart(expr)`, which broke existing parsers that did not expect `expr` to
  necessarily be right at the start of the line, but only be the first token
  found on the line. This was reported as a regression in Issue #317.

  In 3.0.2, pyparsing reverts to the previous behavior, but will retain the new
  `AtLineStart` and `AtStringStart` expression classes, so that parsers can choose
  whichever behavior applies in their specific instance. Specifically:

      # matches expr if it is the first token on the line
      # (allows for leading whitespace)
      LineStart() + expr

      # matches only if expr is found in column 1
      AtLineStart(expr)

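  The difference between the two forms can be sketched with a line that
  starts with spaces. The sample input and variable names are made up for
  illustration, and the behavior shown assumes pyparsing 3.0.3 or later.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
indented = "   abc"  # sample input with leading spaces

# LineStart() + expr: expr only needs to be the first token on the line
first_token = pp.LineStart() + word
print(first_token.parse_string(indented).as_list())

# AtLineStart(expr): expr must begin in column 1, so this should fail
column_one = pp.AtLineStart(word)
try:
    column_one.parse_string(indented)
    matched_indented = True
except pp.ParseException:
    matched_indented = False
print(matched_indented)
```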
- Performance enhancement to `one_of` to always generate an internal `Regex`,
  even if `caseless` or `as_keyword` args are given as `True` (unless explicitly
  disabled by passing `use_regex=False`).

- `IndentedBlock` class now works with `recursive` flag. By default, the
  results parsed by an `IndentedBlock` are grouped. This can be disabled by constructing
  the `IndentedBlock` with `grouped=False`.


Version 3.0.1 - October, 2021
-----------------------------
- Fixed bug where `Word(max=n)` did not match word groups less than length 'n'.
  Thanks to Joachim Metz for catching this!

- Fixed bug where `ParseResults` accidentally created recursive contents.
  Joachim Metz on this one also!

- Fixed bug where `warn_on_multiple_string_args_to_oneof` warning is raised
  even when not enabled.


Version 3.0.0 - October, 2021
-----------------------------
- A consolidated list of all the changes in the 3.0.0 release can be found in
  `docs/whats_new_in_3_0_0.rst`.
  (https://github.com/pyparsing/pyparsing/blob/master/docs/whats_new_in_3_0_0.rst)


Version 3.0.0.final - October, 2021
-----------------------------------
- Added support for python `-W` warning option to call `enable_all_warnings()` at startup.
  Also detects setting of `PYPARSINGENABLEALLWARNINGS` environment variable to any non-blank
  value. (If using `-Wd` for testing, but wishing to disable pyparsing warnings, add
  `-Wi:::pyparsing`.)

- Fixed named results returned by `url` to match fields as they would be parsed
  using `urllib.parse.urlparse`.

- Early response to `with_line_numbers` was positive, with some requested enhancements:
  . added a trailing "|" at the end of each line (to show presence of trailing spaces);
    can be customized using `eol_mark` argument
  . added expand_tabs argument, to control calling str.expandtabs (defaults to True
    to match `parseString`)
  . added mark_spaces argument to support display of a printing character in place of
    spaces, or Unicode symbols for space and tab characters
  . added mark_control argument to support highlighting of control characters using
    '.' or Unicode symbols, such as "␍" and "␊".

- Modified helpers `common_html_entity` and `replace_html_entity()` to use the HTML
  entity definitions from `html.entities.html5`.

- Updated the class diagram in the pyparsing docs directory, along with the supporting
  .puml file (PlantUML markup) used to create the diagram.

- Added global method `autoname_elements()` to call `set_name()` on all locally
  defined `ParserElements` that haven't been explicitly named using `set_name()`, using
  their local variable name. Useful for setting names on multiple elements when
  creating a railroad diagram.

      a = pp.Literal("a")
      b = pp.Literal("b").set_name("bbb")
      pp.autoname_elements()

  `a` will get named "a", while `b` will keep its name "bbb".


Version 3.0.0rc2 - October, 2021
--------------------------------
- Added `url` expression to `pyparsing_common`. (Sample code posted by Wolfgang Fahl,
  very nice!)

  This new expression has been added to the `urlExtractorNew.py` example, to show how
  it extracts URL fields into separate results names.

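  A short sketch of reading the URL fields by results name. The sample URL
  is made up, and the results names used here (`scheme`, `host`, `path`,
  `query`, `fragment`) are assumed to be those defined by the 3.0.0 final
  release and later.

```python
import pyparsing as pp

url = pp.pyparsing_common.url
result = url.parse_string(
    "https://www.example.com/docs/index.html?lang=en#top"
)

# dump() lists the matched text together with each named field
print(result.dump())
print(result["scheme"], result["host"], result["path"])
```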
- Added method to `pyparsing_test` to help debugging, `with_line_numbers`.
  Returns a string with line and column numbers corresponding to values shown
  when parsing with expr.set_debug():

      data = """\
         A
            100"""
      expr = pp.Word(pp.alphanums).set_name("word").set_debug()
      print(ppt.with_line_numbers(data))
      expr[...].parseString(data)

  prints:

                 1
        1234567890
      1:   A
      2:      100
      Match word at loc 3(1,4)
           A
           ^
      Matched word -> ['A']
      Match word at loc 11(2,7)
              100
              ^
      Matched word -> ['100']

- Added new example `cuneiform_python.py` to demonstrate creating a new Unicode
  range, and writing a Cuneiform->Python transformer (inspired by zhpy).

- Fixed issue #272, reported by PhasecoreX, when `LineStart()` expressions would match
  input text that was not necessarily at the beginning of a line.

  As part of this fix, two new classes have been added: AtLineStart and AtStringStart.
  The following expressions are equivalent:

      LineStart() + expr      and     AtLineStart(expr)
      StringStart() + expr    and     AtStringStart(expr)

  [`LineStart` and `StringStart` changes reverted in 3.0.2.]

- Fixed `ParseFatalExceptions` failing to override normal exceptions or expression
  matches in `MatchFirst` expressions. Addresses issue #251, reported by zyp-rgb.

- Fixed bug in which `ParseResults` replaces a collection type value with an invalid
  type annotation (as a result of changed behavior in Python 3.9). Addresses issue #276, reported by
  Rob Shuler, thanks.

- Fixed bug in `ParseResults` when calling `__getattr__` for special double-underscored
  methods. Now raises `AttributeError` for non-existent results when accessing a
  name starting with '__'. Addresses issue #208, reported by Joachim Metz.

- Modified debug fail messages to include the expression name to make it easier to sync
  up match vs success/fail debug messages.


Version 3.0.0rc1 - September, 2021
----------------------------------
- Railroad diagrams have been reformatted:
  . creating diagrams is easier - call

        expr.create_diagram("diagram_output.html")

    create_diagram() takes 3 arguments:
    . the filename to write the diagram HTML
    . optional 'vertical' argument, to specify the minimum number of items in a path
      to be shown vertically; default=3
    . optional 'show_results_names' argument, to specify whether results name
      annotations should be shown; default=False
  . every expression that gets a name using `setName()` gets separated out as
    a separate subdiagram
  . results names can be shown as annotations to diagram items
  . `Each`, `FollowedBy`, and `PrecededBy` elements get [ALL], [LOOKAHEAD], and [LOOKBEHIND]
    annotations
  . removed annotations for Suppress elements
  . some diagram cleanup when a grammar contains Forward elements
  . check out the examples make_diagram.py and railroad_diagram_demo.py

- Type annotations have been added to most public API methods and classes.

- Better exception messages to show full word where an exception occurred.

      Word(alphas, alphanums)[...].parseString("ab1 123", parseAll=True)

  Was:
      pyparsing.ParseException: Expected end of text, found '1' (at char 4), (line:1, col:5)
  Now:
      pyparsing.exceptions.ParseException: Expected end of text, found '123' (at char 4), (line:1, col:5)

- Suppress can be used to suppress text skipped using "...".

      source = "lead in START relevant text END trailing text"
      start_marker = Keyword("START")
      end_marker = Keyword("END")
      find_body = Suppress(...) + start_marker + ... + end_marker
      print(find_body.parseString(source).dump())

  Prints:

      ['START', 'relevant text ', 'END']
      - _skipped: ['relevant text ']

- New string constants `identchars` and `identbodychars` to help in defining identifier Word expressions

  Two new module-level strings have been added to help when defining identifiers, `identchars` and `identbodychars`.

  Instead of writing::

      import pyparsing as pp
      identifier = pp.Word(pp.alphas + "_", pp.alphanums + "_")

  you will be able to write::

      identifier = pp.Word(pp.identchars, pp.identbodychars)

  Those constants have also been added to all the Unicode string classes::

      import pyparsing as pp
      ppu = pp.pyparsing_unicode

      cjk_identifier = pp.Word(ppu.CJK.identchars, ppu.CJK.identbodychars)
      greek_identifier = pp.Word(ppu.Greek.identchars, ppu.Greek.identbodychars)

- Added a caseless parameter to the `CloseMatch` class to allow for casing to be
  ignored when checking for close matches. (Issue #281) (PR by Adrian Edwards, thanks!)

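  A small sketch of the caseless option. The target sequence and inputs are
  made up, and the `mismatches` results name is assumed to hold the list of
  mismatched positions, as in the `CloseMatch` class documentation.

```python
import pyparsing as pp

# Hypothetical target sequence; allow at most one mismatched character.
target = pp.CloseMatch("ATCATCGAATGGA", max_mismatches=1, caseless=True)

# Same letters in lower case: no mismatches when casing is ignored
exact = target.parse_string("atcatcgaatgga")
print(list(exact["mismatches"]))

# One differing character (the 'x' at position 9) is still a close match
near = target.parse_string("atcatcgaaxgga")
print(list(near["mismatches"]))
```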
- Fixed bug in Located class when used with a results name. (Issue #294)

- Fixed bug in `QuotedString` class when the escaped quote string is not a
  repeated character. (Issue #263)

- `parseFile()` and `create_diagram()` methods now will accept `pathlib.Path`
  arguments.


Version 3.0.0b3 - August, 2021
------------------------------
- PEP-8 compatible names are being introduced in pyparsing version 3.0!
  Methods such as `parseString` have been renamed to their PEP-8
  compliant forms, such as `parse_string`. In addition, arguments such as `parseAll`
  have been renamed to `parse_all`. For backward-compatibility, synonyms for
  all renamed methods and arguments have been added, so that existing
  pyparsing parsers will not break. These synonyms will be removed in a future
  release.

  In addition, the Optional class has been renamed to Opt, since it clashes
  with the common typing.Optional type specifier that is used in the Python
  type annotations. A compatibility synonym is defined for now, but will be
  removed in a future release.

- HUGE NEW FEATURE - Support for left-recursive parsers!
  Following the method used in Python's PEG parser, pyparsing now supports
  left-recursive parsers when left recursion is enabled.

      import pyparsing as pp
      pp.ParserElement.enable_left_recursion()

      # a common left-recursion definition
      # define a list of items as 'list + item | item'
      # BNF:
      #   item_list := item_list item | item
      #   item := word of alphas
      item_list = pp.Forward()
      item = pp.Word(pp.alphas)
      item_list <<= item_list + item | item

      item_list.run_tests("""\
          To parse or not to parse that is the question
          """)
  Prints:

      ['To', 'parse', 'or', 'not', 'to', 'parse', 'that', 'is', 'the', 'question']

  Great work contributed by Max Fischer!

- `delimited_list` now supports an additional flag `allow_trailing_delim`,
  to optionally parse an additional delimiter at the end of the list.
  Contributed by Kazantcev Andrey, thanks!

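  A minimal sketch of `allow_trailing_delim`, using a made-up list of
  integers. Parsing with `parse_all=True` is one way to confirm that the
  trailing comma is consumed rather than left over.

```python
import pyparsing as pp

numbers = pp.delimited_list(pp.Word(pp.nums), allow_trailing_delim=True)

# The trailing delimiter is accepted and suppressed like the others
with_trailing = numbers.parse_string("1, 2, 3,", parse_all=True).as_list()
without_trailing = numbers.parse_string("1, 2, 3", parse_all=True).as_list()
print(with_trailing, without_trailing)
```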
- Removed internal comparison of results values against b"", which
  raised a `BytesWarning` when run with `python -bb`. Fixes issue #271 reported
  by Florian Bruhin, thank you!

- Fixed STUDENTS table in sql2dot.py example, fixes issue #261 reported by
  legrandlegrand - much better.

- Python 3.5 will not be supported in the pyparsing 3 releases. This will allow
  for future pyparsing releases to add parameter type annotations, and to take
  advantage of dict key ordering in internal results name tracking.


Version 3.0.0b2 - December, 2020
--------------------------------
- API CHANGE
  `locatedExpr` is being replaced by the class `Located`. `Located` has the same
  constructor interface as `locatedExpr`, but fixes bugs in the returned
  `ParseResults` when the searched expression contains multiple tokens, or
  has internal results names.

  `locatedExpr` is deprecated, and will be removed in a future release.


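  A brief sketch of what `Located` returns, using a made-up input string.
  The results names `locn_start`, `value`, and `locn_end` are assumed from
  the `Located` class documentation.

```python
import pyparsing as pp

located_word = pp.Located(pp.Word(pp.alphas))
result = located_word.parse_string("abc def")

# The result holds the start location, the matched tokens, and the end location
print(result.as_list())
print(result["locn_start"], result["value"].as_list(), result["locn_end"])
```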
Version 3.0.0b1 - November, 2020
--------------------------------
- API CHANGE
  Diagnostic flags have been moved to an enum, `pyparsing.Diagnostics`, and
  they are enabled through module-level methods:
  - `pyparsing.enable_diag()`
  - `pyparsing.disable_diag()`
  - `pyparsing.enable_all_warnings()`

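  A small sketch of toggling one diagnostic with the module-level methods
  listed above. Reading the flag back through `pp.__diag__` is only done
  here to show the effect, and assumes that namespace still mirrors the
  enum names.

```python
import pyparsing as pp

flag = pp.Diagnostics.warn_multiple_tokens_in_named_alternation

pp.enable_diag(flag)
enabled = pp.__diag__.warn_multiple_tokens_in_named_alternation

pp.disable_diag(flag)
disabled = pp.__diag__.warn_multiple_tokens_in_named_alternation

print(enabled, disabled)
```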
- API CHANGE
  Most previous `SyntaxWarnings` that were issued when using pyparsing
  classes incorrectly have been converted to `TypeError` and `ValueError` exceptions,
  consistent with Python calling conventions. All warnings issued by diagnostic
  flags have been converted from `SyntaxWarnings` to `UserWarnings`.

- To support parsers that are intended to generate native Python collection
  types such as lists and dicts, the `Group` and `Dict` classes now accept an
  additional boolean keyword argument, `aslist` and `asdict` respectively. See
  the `jsonParser.py` example in the `pyparsing/examples` source directory for
  how to return types as `ParseResults` and as Python collection types, and the
  distinctions in working with the different types.

  In addition, parse actions that must return a value of list type (which would
  normally be converted internally to a `ParseResults`) can override this default
  behavior by returning their list wrapped in the new `ParseResults.List` class:

      # this parse action tries to return a list, but pyparsing
      # will convert to a ParseResults
      def return_as_list_but_still_get_parse_results(tokens):
          return tokens.asList()

      # this parse action returns the tokens as a list, and pyparsing will
      # maintain its list type in the final parsing results
      def return_as_list(tokens):
          return ParseResults.List(tokens.asList())

  This is the mechanism used internally by the `Group` class when defined
  using `aslist=True`.

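  A minimal sketch contrasting a default `Group` with one built using
  `aslist=True`. The grammar (a run of integers) is made up for
  illustration.

```python
import pyparsing as pp

integer = pp.Word(pp.nums)

as_results = pp.Group(integer[...])
as_list = pp.Group(integer[...], aslist=True)

grouped_results = as_results.parse_string("1 2 3")[0]
grouped_list = as_list.parse_string("1 2 3")[0]

# The first is a nested ParseResults, the second a native Python list
print(type(grouped_results).__name__, type(grouped_list).__name__)
```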
- A new `IndentedBlock` class is introduced, to eventually replace the
  current `indentedBlock` helper method. The interface is largely the same;
  however, the new class manages its own internal indentation stack, so
  it is no longer necessary to maintain an external `indentStack` variable.

- API CHANGE
  Added `cache_hit` keyword argument to debug actions. Previously, if packrat
  parsing was enabled, the debug methods were not called in the event of cache
  hits. Now these methods will be called, with an added argument
  `cache_hit=True`.

  If you are using packrat parsing and enable debug on expressions using a
  custom debug method, you can add the `cache_hit=False` keyword argument,
  and your method will be called on packrat cache hits. If you choose not
  to add this keyword argument, the debug methods will fail silently,
  behaving as they did previously.

- When using `setDebug` with packrat parsing enabled, packrat cache hits will
  now be included in the output, shown with a leading '*'. (Previously, cache
  hits and responses were not included in debug output.) For those using custom
  debug actions, see the previous item regarding an optional API change
  for those methods.

- `setDebug` output will also show more details about what expression
  is about to be parsed (the current line of text being parsed, and
  the current parse position):

      Match integer at loc 0(1,1)
        1 2 3
        ^
      Matched integer -> ['1']

  The current debug location will also be indicated after whitespace
  has been skipped (was previously inconsistent, reported in Issue #244,
  by Frank Goyens, thanks!).

- Modified the repr() output for `ParseResults` to include the class
  name as part of the output. This is to clarify for new pyparsing users
  who misread the repr output as a tuple of a list and a dict. pyparsing
  results will now read like:

      ParseResults(['abc', 'def'], {'qty': 100})

  instead of just:

      (['abc', 'def'], {'qty': 100})

- Fixed bugs in Each when passed `OneOrMore` or `ZeroOrMore` expressions:
  . first expression match could be enclosed in an extra nesting level
  . out-of-order expressions now handled correctly if mixed with required
    expressions
  . results names are maintained correctly for these expressions

- Fixed traceback trimming, and added `ParserElement.verbose_traceback`
  save/restore to `reset_pyparsing_context()`.

- Default string for `Word` expressions now also includes indications of
  `min` and `max` length specification, if applicable, similar to regex length
  specifications:

      Word(alphas)             -> "W:(A-Za-z)"
      Word(nums)               -> "W:(0-9)"
      Word(nums, exact=3)      -> "W:(0-9){3}"
      Word(nums, min=2)        -> "W:(0-9){2,...}"
      Word(nums, max=3)        -> "W:(0-9){1,3}"
      Word(nums, min=2, max=3) -> "W:(0-9){2,3}"

  For expressions of the `Char` class (similar to `Word(..., exact=1)`), the expression
  is simply the character range in parentheses:

      Char(nums)               -> "(0-9)"
      Char(alphas)             -> "(A-Za-z)"

- Removed `copy()` override in `Keyword` class which did not preserve definition
  of ident chars from the original expression. PR #233 submitted by jgrey4296,
  thanks!

- In addition to `pyparsing.__version__`, there is now also a `pyparsing.__version_info__`,
  following the same structure and field names as in `sys.version_info`.


Version 3.0.0a2 - June, 2020
----------------------------
- Summary of changes for 3.0.0 can be found in "What's New in Pyparsing 3.0.0"
  documentation.

- API CHANGE
  Changed result returned when parsing using `countedArray`;
  the array items are no longer returned in a doubly-nested
  list.

- An excellent new enhancement is the new railroad diagram
  generator for documenting pyparsing parsers:

      import pyparsing as pp
      from pyparsing.diagram import to_railroad, railroad_to_html
      from pathlib import Path

      # define a simple grammar for parsing street addresses such
      # as "123 Main Street"
      #     number word...
      number = pp.Word(pp.nums).setName("number")
      name = pp.Word(pp.alphas).setName("word")[1, ...]

      parser = number("house_number") + name("street")
      parser.setName("street address")

      # construct railroad track diagram for this parser and
      # save as HTML
      rr = to_railroad(parser)
      Path('parser_rr_diag.html').write_text(railroad_to_html(rr))

  Very nice work provided by Michael Milton, thanks a ton!

- Enhanced default strings created for Word expressions, now showing
  string ranges if possible. `Word(alphas)` would formerly
  print as `W:(ABCD...)`, now prints as `W:(A-Za-z)`.

- Added `ignoreWhitespace(recurse:bool = True)` and added a
  recurse argument to `leaveWhitespace`, both added to provide finer
  control over pyparsing's whitespace skipping. Also contributed
  by Michael Milton.

- The unicode range definitions for the various languages were
  recalculated by interrogating the unicodedata module by character
  name, selecting characters that contained that language in their
  Unicode name. (Issue #227)

  Also, pyparsing_unicode.Korean was renamed to Hangul (Korean
  is also defined as a synonym for compatibility).

- Enhanced `ParseResults` dump() to show both results names and list
  subitems. Fixes bug where adding a results name would hide
  lower-level structures in the `ParseResults`.

- Added new __diag__ warnings:

    "warn_on_parse_using_empty_Forward" - warns that a Forward
    has been included in a grammar, but no expression was
    attached to it using '<<=' or '<<'

    "warn_on_assignment_to_Forward" - warns that a Forward has
    been created, but was probably later overwritten by
    erroneously using '=' instead of '<<=' (this is a common
    mistake when using Forwards)
    (**currently not working on PyPy**)

- Added `ParserElement.recurse()` method to make it simpler for
  grammar utilities to navigate through the tree of expressions in
  a pyparsing grammar.

- Fixed bug in `ParseResults` repr() which showed all matching
  entries for a results name, even if `listAllMatches` was set
  to False when creating the `ParseResults` originally. Reported
  by Nicholas42 on GitHub, good catch! (Issue #205)

- Modified refactored modules to use relative imports, as
  pointed out by setuptools project member jaraco, thank you!

- Off-by-one bug found in the roman_numerals.py example, a bug
  that has been there for about 14 years! PR submitted by
  Jay Pedersen, nice catch!

- A simplified Lua parser has been added to the examples
  (lua_parser.py).

- Added make_diagram.py to the examples directory to demonstrate
  creation of railroad diagrams for selected pyparsing examples.
  Also restructured some examples to make their parsers importable
  without running their embedded tests.


Version 3.0.0a1 - April, 2020
-----------------------------
- Removed Py2.x support and other deprecated features. Pyparsing
  now requires Python 3.5 or later. If you are using an earlier
  version of Python, you must use a Pyparsing 2.4.x version.

  Deprecated features removed:
  . `ParseResults.asXML()` - if used for debugging, switch
    to using `ParseResults.dump()`; if used for data transfer,
    use `ParseResults.asDict()` to convert to a nested Python
    dict, which can then be converted to XML or JSON or
    other transfer format

  . `operatorPrecedence` synonym for `infixNotation` -
    convert to calling `infixNotation`

  . `commaSeparatedList` - convert to using
    pyparsing_common.comma_separated_list

  . `upcaseTokens` and `downcaseTokens` - convert to using
    `pyparsing_common.upcaseTokens` and `downcaseTokens`

  . __compat__.collect_all_And_tokens will not be settable to
    False to revert to pre-2.3.1 results name behavior -
    review use of names for `MatchFirst` and Or expressions
    containing And expressions, as they will return the
    complete list of parsed tokens, not just the first one.
    Use `__diag__.warn_multiple_tokens_in_named_alternation`
    to help identify those expressions in your parsers that
    will have changed as a result.

- Removed support for running `python setup.py test`. The setuptools
  maintainers consider the test command deprecated (see
  <https://github.com/pypa/setuptools/issues/1684>). To run the Pyparsing tests,
  use the command `tox`.

- API CHANGE:
  The staticmethod `ParseException.explain` has been moved to
  `ParseBaseException.explain_exception`, and a new `explain` instance
  method added to `ParseBaseException`. This will make calls to `explain`
  much more natural:

      try:
          expr.parseString("...")
      except ParseException as pe:
          print(pe.explain())

- POTENTIAL API CHANGE:
  `ZeroOrMore` expressions that have results names will now
  include empty lists for their name if no matches are found.
  Previously, no named result would be present. Code that tested
  for the presence of any expressions using "if name in results:"
  will now always return True. This code will need to change to
  "if name in results and results[name]:" or just
  "if results[name]:". Also, any parser unit tests that check the
  `asDict()` contents will now see additional entries for parsers
  having named `ZeroOrMore` expressions, whose values will be `[]`.

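  The change described above can be sketched as follows. The grammar and
  the results names `label` and `values` are made up for illustration.

```python
import pyparsing as pp

expr = pp.Word(pp.alphas)("label") + pp.ZeroOrMore(pp.Word(pp.nums))("values")

# No numbers follow the label, so the ZeroOrMore matches nothing
results = expr.parse_string("abc")

name_present = "values" in results    # presence no longer signals a match
has_values = bool(results["values"])  # testing for emptiness still works
print(name_present, has_values)
```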
- POTENTIAL API CHANGE:
  Fixed a bug in which calls to `ParserElement.setDefaultWhitespaceChars`
  did not change whitespace definitions on any pyparsing built-in
  expressions defined at import time (such as `quotedString`, or those
  defined in pyparsing_common). This would lead to confusion when
  built-in expressions would not use updated default whitespace
  characters. Now a call to `ParserElement.setDefaultWhitespaceChars`
  will also go and update all pyparsing built-ins to use the new
  default whitespace characters. (Note that this will only modify
  expressions defined within the pyparsing module.) Prompted by
  work on a StackOverflow question posted by jtiai.

- Expanded __diag__ and __compat__ to actual classes instead of
  just namespaces, to add some helpful behavior:
  - .enable() and .disable() methods to give extra
    help when setting or clearing flags (detects invalid
    flag names, detects when trying to set a __compat__ flag
    that is no longer settable). Use these methods now to
    set or clear flags, instead of directly setting to True or
    False.

        import pyparsing as pp
        pp.__diag__.enable("warn_multiple_tokens_in_named_alternation")

  - __diag__.enable_all_warnings() is another helper that sets
    all "warn*" diagnostics to True.

        pp.__diag__.enable_all_warnings()

  - added new warning, "warn_on_match_first_with_lshift_operator" to
    warn when using '<<' with a '|' `MatchFirst` operator, which will
    create an unintended expression due to precedence of operations.

    Example: This statement will erroneously define the `fwd` expression
    as just `expr_a`, even though `expr_a | expr_b` was intended,
    since the '<<' operator has precedence over '|':

        fwd << expr_a | expr_b

    To correct this, use the '<<=' operator (preferred) or parentheses
    to override operator precedence:

        fwd <<= expr_a | expr_b
    or
        fwd << (expr_a | expr_b)

- Cleaned up default tracebacks when getting a `ParseException` when calling
  `parseString`. Exception traces should now stop at the call in `parseString`,
  and not include the internal traceback frames. (If the full traceback
  is desired, then set `ParserElement.verbose_traceback` to True.)

- Fixed `FutureWarnings` that sometimes are raised when '[' passed as a
  character to Word.

- New namespace, assert methods and classes added to support writing
  unit tests.
  - `assertParseResultsEquals`
  - `assertParseAndCheckList`
  - `assertParseAndCheckDict`
  - `assertRunTestResults`
  - `assertRaisesParseException`
  - `reset_pyparsing_context` context manager, to restore pyparsing
    config settings

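  A short sketch of how these helpers might be used in a `unittest` test
  case. The test class, grammar, and inputs are made up, and the sketch
  assumes the helpers are reached through `pyparsing.pyparsing_test` and
  its `TestParseResultsAsserts` mixin.

```python
import unittest

import pyparsing as pp
from pyparsing import pyparsing_test as ppt


class IntegerListTest(ppt.TestParseResultsAsserts, unittest.TestCase):
    def test_integer_list(self):
        integers = pp.Word(pp.nums)[1, ...]
        # pyparsing config settings are restored when the block exits
        with ppt.reset_pyparsing_context():
            self.assertParseAndCheckList(integers, "1 2 3", ["1", "2", "3"])

    def test_rejects_letters(self):
        integers = pp.Word(pp.nums)[1, ...]
        with self.assertRaisesParseException():
            integers.parse_string("abc")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(IntegerListTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```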
- Enhanced error messages and error locations when parsing fails on
|
||
the Keyword or `CaselessKeyword` classes due to the presence of a
|
||
preceding or trailing keyword character. Surfaced while
|
||
working with metaperl on issue #201.
|
||
|
||
- Enhanced the Regex class to be compatible with re's compiled with the
  re-equivalent regex module. Individual expressions can be built with
  regex compiled expressions using:

      import pyparsing as pp
      import regex

      # would use regex for this expression
      integer_parser = pp.Regex(regex.compile(r'\d+'))

  Inspired by PR submitted by bjrnfrdnnd on GitHub, very nice!

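One common way to accept compiled patterns from any re-compatible module is a duck-type check instead of a type check. The sketch below illustrates that idea with the standard `re` module only; the helper name `as_compiled` is hypothetical, and this is not presented as the code pyparsing actually uses.

```python
import re


def as_compiled(pattern, flags=0):
    """Accept either a pattern string or an already-compiled pattern object."""
    if isinstance(pattern, str):
        return re.compile(pattern, flags)
    # Duck-type check: anything exposing .pattern and .match is used as-is,
    # so compiled objects from re-compatible modules are accepted too
    if hasattr(pattern, "pattern") and hasattr(pattern, "match"):
        return pattern
    raise TypeError("expected a pattern string or a compiled pattern")


from_string = as_compiled(r"\d+")
precompiled = re.compile(r"[a-z]+")
from_object = as_compiled(precompiled)   # returned unchanged

print(from_string.match("123").group())
print(from_object is precompiled)
```
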
- Fixed handling of `ParseSyntaxExceptions` raised as part of Each
  expressions, when sub-expressions contain '-' backtrack
  suppression. As part of resolution to a question posted by John
  Greene on StackOverflow.

- Potentially *huge* performance enhancement when parsing Word
  expressions built from pyparsing_unicode character sets. Word now
  internally converts ranges of consecutive characters to regex
  character ranges (converting "0123456789" to "0-9" for instance),
  resulting in as much as 50X improvement in performance! Work
  inspired by a question posted by Midnighter on StackOverflow.

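The range conversion mentioned above can be sketched in a few lines. This is an illustrative version only, assuming a hypothetical helper named `collapse_to_regex_ranges`; it is not the code used inside Word, and it only collapses runs of three or more consecutive characters.

```python
import re
from itertools import groupby


def collapse_to_regex_ranges(chars):
    """Collapse runs of consecutive code points into 'a-z' style ranges."""
    codes = sorted(set(map(ord, chars)))
    parts = []
    # Consecutive code points share the same (code - index) difference
    for _, run in groupby(enumerate(codes), key=lambda pair: pair[1] - pair[0]):
        run = [code for _, code in run]
        if len(run) > 2:
            parts.append(f"{re.escape(chr(run[0]))}-{re.escape(chr(run[-1]))}")
        else:
            parts.extend(re.escape(chr(code)) for code in run)
    return "".join(parts)


digits = collapse_to_regex_ranges("0123456789")
mixed = collapse_to_regex_ranges("abcxyz_")
word_re = re.compile(f"[{mixed}]+")

print(digits)   # 0-9
print(mixed)
```

A character class built this way stays short even for large unicode sets, which is where the regex engine benefits most.
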
- Improvements in select_parser.py, to include new SQL syntax
  from SQLite. PR submitted by Robert Coup, nice work!

- Fixed bug in `PrecededBy` which caused infinite recursion, issue #127
  submitted by EdwardJB.

- Fixed bug in `CloseMatch` where end location was incorrectly
  computed; and updated partial_gene_match.py example.

- Fixed bug in `indentedBlock` with a parser using two different
  types of nested indented blocks with different indent values,
  but sharing the same indent stack, submitted by renzbagaporo.

- Fixed bug in Each when using Regex, when Regex expression would
  get parsed twice; issue #183 submitted by scauligi, thanks!

- `BigQueryViewParser.py` added to examples directory, PR submitted
  by Michael Smedberg, nice work!

- booleansearchparser.py added to examples directory, PR submitted
  by xecgr. Builds on searchparser.py, adding support for '*'
  wildcards and non-Western alphabets.

- Fixed bug in delta_time.py example, when using a quantity
  of seconds/minutes/hours/days > 999.

- Fixed bug in regex definitions for real and sci_real expressions in
  pyparsing_common. Issue #194, reported by Michael Wayne Goodman, thanks!

- Fixed `FutureWarning` raised beginning in Python 3.7 for Regex expressions
  containing '[' within a regex set.

- Minor reformatting of output from `runTests` to make embedded
  comments more visible.

- And finally, many thanks to those who helped in the restructuring
  of the pyparsing code base as part of this release. Pyparsing now
  has more standard package structure, more standard unit tests,
  and more standard code formatting (using black). Special thanks
  to jdufresne, klahnakoski, mattcarmody, and ckeygusuz, to name just
  a few.


Version 2.4.7 - April, 2020
---------------------------
- Backport of selected fixes from 3.0.0 work:
  . Each bug with Regex expressions
  . And expressions not properly constructing with generator
  . Traceback abbreviation
  . Bug in delta_time example
  . Fix regexen in pyparsing_common.real and .sci_real
  . Avoid FutureWarning on Python 3.7 or later
  . Cleanup output in runTests if comments are embedded in test string


Version 2.4.6 - December, 2019
------------------------------
- Fixed typos in White mapping of whitespace characters, to use
  correct "\u" prefix instead of "u\".

- Fix bug in left-associative ternary operators defined using
  infixNotation. First reported on StackOverflow by user Jeronimo.

- Backport of pyparsing_test namespace from 3.0.0, including
  TestParseResultsAsserts mixin class defining unittest-helper
  methods:
  . def assertParseResultsEquals(
            self, result, expected_list=None, expected_dict=None, msg=None)
  . def assertParseAndCheckList(
            self, expr, test_string, expected_list, msg=None, verbose=True)
  . def assertParseAndCheckDict(
            self, expr, test_string, expected_dict, msg=None, verbose=True)
  . def assertRunTestResults(
            self, run_tests_report, expected_parse_results=None, msg=None)
  . def assertRaisesParseException(self, exc_type=ParseException, msg=None)

  To use the methods in this mixin class, declare your unittest classes as:

      import unittest
      from pyparsing import pyparsing_test as ppt

      class MyParserTest(ppt.TestParseResultsAsserts, unittest.TestCase):
          ...


Version 2.4.5 - November, 2019
------------------------------
- NOTE: final release compatible with Python 2.x.

- Fixed issue with reading README.rst as part of setup.py's
  initialization of the project's long_description, with a
  non-ASCII space character causing errors when installing from
  source on platforms where UTF-8 is not the default encoding.


Version 2.4.4 - November, 2019
------------------------------
- Unresolved symbol reference in 2.4.3 release was masked by stdout
  buffering in unit tests, thanks for the prompt heads-up, Ned
  Batchelder!


Version 2.4.3 - November, 2019
------------------------------
- Fixed a bug in ParserElement.__eq__ that would for some parsers
  create a recursion error at parser definition time. Thanks to
  Michael Clerx for the assist. (Addresses issue #123)

- Fixed bug in indentedBlock where a block that ended at the end
  of the input string could cause pyparsing to loop forever. Raised
  as part of discussion on StackOverflow with geckos.

- Backports from pyparsing 3.0.0:
  . __diag__.enable_all_warnings()
  . Fixed bug in PrecededBy which caused infinite recursion, issue #127
  . support for using regex-compiled RE to construct Regex expressions


Version 2.4.2 - July, 2019
--------------------------
- Updated the shorthand notation that has been added for repetition
  expressions: expr[min, max], with '...' valid as a min or max value:
  - expr[...] and expr[0, ...] are equivalent to ZeroOrMore(expr)
  - expr[1, ...] is equivalent to OneOrMore(expr)
  - expr[n, ...] or expr[n,] is equivalent
    to expr*n + ZeroOrMore(expr)
    (read as "n or more instances of expr")
  - expr[..., n] is equivalent to expr*(0, n)
  - expr[m, n] is equivalent to expr*(m, n)
  Note that expr[..., n] and expr[m, n] do not raise an exception
  if more than n exprs exist in the input stream. If this
  behavior is desired, then write expr[..., n] + ~expr.

  Better interpretation of [...] as ZeroOrMore raised by crowsonkb,
  thanks for keeping me in line!

  If upgrading from 2.4.1 or 2.4.1.1 and you have used `expr[...]`
  for `OneOrMore(expr)`, it must be updated to `expr[1, ...]`.

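To see how the bracket forms listed above reach `__getitem__`, here is a small sketch that decodes each key into a (min, max) pair, with `None` standing for "no upper bound". The names `decode_repetition` and `Repeatable` are hypothetical and exist only for this illustration; the sketch covers only the forms listed in this entry.

```python
def decode_repetition(key):
    """Turn the key from bracket-style repetition into a (min, max) pair."""
    if key is Ellipsis:
        return 0, None                             # expr[...]
    if not isinstance(key, tuple) or not 1 <= len(key) <= 2:
        raise TypeError("this sketch only covers the forms listed above")
    lo = key[0]
    hi = key[1] if len(key) == 2 else Ellipsis     # expr[n,] leaves the maximum open
    lo = 0 if lo is Ellipsis else lo
    hi = None if hi is Ellipsis else hi
    return lo, hi


class Repeatable:
    """Hypothetical element that only reports the repetition it was asked for."""

    def __getitem__(self, key):
        return decode_repetition(key)


expr = Repeatable()
print(expr[...])       # (0, None)
print(expr[1, ...])    # (1, None)
print(expr[3,])        # (3, None)
print(expr[..., 4])    # (0, 4)
print(expr[2, 5])      # (2, 5)
```
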
- The defaults on all the `__diag__` switches have been set to False,
  to avoid getting alarming warnings. To use these diagnostics, set
  them to True after importing pyparsing.

  Example:

      import pyparsing as pp
      pp.__diag__.warn_multiple_tokens_in_named_alternation = True

- Fixed bug introduced by the use of __getitem__ for repetition,
  overlooking Python's legacy implementation of iteration
  by sequentially calling __getitem__ with increasing numbers until
  getting an IndexError. Found during investigation of problem
  reported by murlock, merci!


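The legacy iteration protocol mentioned in the last entry is easy to trigger by accident. The sketch below uses two hypothetical classes, `Repeater` and `SafeRepeater`, to show the problem and one possible way of opting out; it is an illustration of the Python behavior, not a copy of the pyparsing fix.

```python
from itertools import islice


class Repeater:
    """Hypothetical class that overloads indexing for something other than lookup."""

    def __getitem__(self, key):
        # Never raises IndexError, so legacy iteration would never terminate
        return f"repeat({key})"


class SafeRepeater(Repeater):
    def __iter__(self):
        # Opt out of the legacy protocol explicitly
        raise TypeError(f"{type(self).__name__} is not iterable")


# Python falls back to calling __getitem__(0), __getitem__(1), ... when iterating
first_three = list(islice(iter(Repeater()), 3))

try:
    iter(SafeRepeater())
    iterable = True
except TypeError:
    iterable = False

print(first_three)
print(iterable)
```

Without the `islice` guard, iterating over a `Repeater` would loop forever, since no index ever raises `IndexError`.
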
Version 2.4.2a1 - July, 2019
----------------------------
It turns out I got the meaning of `[...]` absolutely backwards,
so I've deleted 2.4.1 and am repushing this release as 2.4.2a1
for people to give it a try before I can call it ready to go.

The `expr[...]` notation was pushed out to be synonymous with
`OneOrMore(expr)`, but this is really counter to most Python
notations (and even other internal pyparsing notations as well).
It should have been defined to be equivalent to ZeroOrMore(expr).

- Changed [...] to emit ZeroOrMore instead of OneOrMore.

- Removed code that treats ParserElements like iterables.

- Change all __diag__ switches to False.


Version 2.4.1.1 - July 24, 2019
-------------------------------
This is a re-release of version 2.4.1 to restore the release history
in PyPI, since the 2.4.1 release was deleted.

There are 3 known issues in this release, which are fixed in
the upcoming 2.4.2:

- API change adding support for `expr[...]` - the original
  code in 2.4.1 incorrectly implemented this as OneOrMore.
  Code using this feature under this release should explicitly
  use `expr[0, ...]` for ZeroOrMore and `expr[1, ...]` for
  OneOrMore. In 2.4.2 you will be able to write `expr[...]`
  equivalent to `ZeroOrMore(expr)`.

- Bug if composing And, Or, MatchFirst, or Each expressions
  using a single expression instead of a list of expressions. This
  only affects code which uses explicit expression construction using
  the And, Or, etc. classes instead of using overloaded operators '+',
  '^', and so on. If constructing an And using a single expression,
  you may get an error that "cannot multiply ParserElement by
  0 or (0, 0)" or a Python `IndexError`. Change code like

      cmd = Or(Word(alphas))

  to

      cmd = Or([Word(alphas)])

  (Note that this is not the recommended style for constructing
  Or expressions.)

- Some newly-added `__diag__` switches are enabled by default,
  which may give rise to noisy user warnings for existing parsers.
  You can disable them using:

      import pyparsing as pp
      pp.__diag__.warn_multiple_tokens_in_named_alternation = False
      pp.__diag__.warn_ungrouped_named_tokens_in_collection = False
      pp.__diag__.warn_name_set_on_empty_Forward = False
      pp.__diag__.warn_on_multiple_string_args_to_oneof = False
      pp.__diag__.enable_debug_on_named_expressions = False

  In 2.4.2 these will all be set to False by default.


Version 2.4.1 - July, 2019
--------------------------
- NOTE: Deprecated functions and features that will be dropped
  in pyparsing 2.5.0 (planned next release):

  . support for Python 2 - ongoing users running with
    Python 2 can continue to use pyparsing 2.4.1

  . ParseResults.asXML() - if used for debugging, switch
    to using ParseResults.dump(); if used for data transfer,
    use ParseResults.asDict() to convert to a nested Python
    dict, which can then be converted to XML or JSON or
    other transfer format

  . operatorPrecedence synonym for infixNotation -
    convert to calling infixNotation

  . commaSeparatedList - convert to using
    pyparsing_common.comma_separated_list

  . upcaseTokens and downcaseTokens - convert to using
    pyparsing_common.upcaseTokens and downcaseTokens

  . __compat__.collect_all_And_tokens will not be settable to
    False to revert to pre-2.3.1 results name behavior -
    review use of names for MatchFirst and Or expressions
    containing And expressions, as they will return the
    complete list of parsed tokens, not just the first one.
    Use __diag__.warn_multiple_tokens_in_named_alternation
    (described below) to help identify those expressions
    in your parsers that will have changed as a result.

- A new shorthand notation has been added for repetition
  expressions: expr[min, max], with '...' valid as a min
  or max value:
  - expr[...] is equivalent to OneOrMore(expr)
  - expr[0, ...] is equivalent to ZeroOrMore(expr)
  - expr[1, ...] is equivalent to OneOrMore(expr)
  - expr[n, ...] or expr[n,] is equivalent
    to expr*n + ZeroOrMore(expr)
    (read as "n or more instances of expr")
  - expr[..., n] is equivalent to expr*(0, n)
  - expr[m, n] is equivalent to expr*(m, n)
  Note that expr[..., n] and expr[m, n] do not raise an exception
  if more than n exprs exist in the input stream. If this
  behavior is desired, then write expr[..., n] + ~expr.

- '...' can also be used as shorthand for SkipTo when used
  in adding parse expressions to compose an And expression.

      Literal('start') + ... + Literal('end')
      And(['start', ..., 'end'])

  are both equivalent to:

      Literal('start') + SkipTo('end')("_skipped*") + Literal('end')

  The '...' form has the added benefit of not requiring repeating
  the skip target expression. Note that the skipped text is
  returned with '_skipped' as a results name, and that the contents of
  `_skipped` will contain a list of text from all `...`s in the expression.

- '...' can also be used as a "skip forward in case of error" expression:

      expr = "start" + (Word(nums).setName("int") | ...) + "end"

      expr.parseString("start 456 end")
      ['start', '456', 'end']

      expr.parseString("start 456 foo 789 end")
      ['start', '456', 'foo 789 ', 'end']
      - _skipped: ['foo 789 ']

      expr.parseString("start foo end")
      ['start', 'foo ', 'end']
      - _skipped: ['foo ']

      expr.parseString("start end")
      ['start', '', 'end']
      - _skipped: ['missing <int>']

  Note that in all the error cases, the '_skipped' results name is
  present, showing a list of the extra or missing items.

  This form is only valid when used with the '|' operator.

- Improved exception messages to show what was actually found, not
  just what was expected.

      word = pp.Word(pp.alphas)
      pp.OneOrMore(word).parseString("aaa bbb 123", parseAll=True)

  Former exception message:

      pyparsing.ParseException: Expected end of text (at char 8), (line:1, col:9)

  New exception message:

      pyparsing.ParseException: Expected end of text, found '1' (at char 8), (line:1, col:9)

- Added diagnostic switches to help detect and warn about common
  parser construction mistakes, or enable additional parse
  debugging. Switches are attached to the pyparsing.__diag__
  namespace object:
  - warn_multiple_tokens_in_named_alternation - flag to enable warnings when a results
    name is defined on a MatchFirst or Or expression with one or more And subexpressions
    (default=True)
  - warn_ungrouped_named_tokens_in_collection - flag to enable warnings when a results
    name is defined on a containing expression with ungrouped subexpressions that also
    have results names (default=True)
  - warn_name_set_on_empty_Forward - flag to enable warnings when a Forward is defined
    with a results name, but has no contents defined (default=False)
  - warn_on_multiple_string_args_to_oneof - flag to enable warnings when oneOf is
    incorrectly called with multiple str arguments (default=True)
  - enable_debug_on_named_expressions - flag to auto-enable debug on all subsequent
    calls to ParserElement.setName() (default=False)

  warn_multiple_tokens_in_named_alternation is intended to help
  those who currently have set __compat__.collect_all_And_tokens to
  False as a workaround for using the pre-2.3.1 code with named
  MatchFirst or Or expressions containing an And expression.

- Added ParseResults.from_dict classmethod, to simplify creation
  of a ParseResults with results names using a dict, which may be nested.
  This makes it easy to add a sub-level of named items to the parsed
  tokens in a parse action.

- Added asKeyword argument (default=False) to oneOf, to force
  keyword-style matching on the generated expressions.

- ParserElement.runTests now accepts an optional 'file' argument to
  redirect test output to a file-like object (such as a StringIO,
  or opened file). Default is to write to sys.stdout.

- conditionAsParseAction is a helper method for constructing a
  parse action method from a predicate function that simply
  returns a boolean result. Useful for those places where a
  predicate cannot be added using addCondition, but must be
  converted to a parse action (such as in infixNotation). May be
  used as a decorator if default message and exception types
  can be used. See ParserElement.addCondition for more details
  about the expected signature and behavior for predicate condition
  methods.

- While investigating issue #93, I found that Or and
  addCondition could interact to select an alternative that
  is not the longest match. This is because Or first checks
  all alternatives for matches without running attached
  parse actions or conditions, orders by longest match, and
  then rechecks for matches with conditions and parse actions.
  Some expressions, when checking with conditions, may end
  up matching on a shorter token list than originally matched,
  but would be selected because of its original priority.
  This matching code has been expanded to do more extensive
  searching for matches when a second-pass check matches a
  smaller list than in the first pass.

- Fixed issue #87, a regression in indented block.
  Reported by Renz Bagaporo, who submitted a very nice repro
  example, which makes the bug-fixing process a lot easier,
  thanks!

- Fixed MemoryError issue #85 and #91 with str generation for
  Forwards. Thanks decalage2 and Harmon758 for your patience.

- Modified setParseAction to accept None as an argument,
  indicating that all previously-defined parse actions for the
  expression should be cleared.

- Modified pyparsing_common.real and sci_real to parse reals
  without leading integer digits before the decimal point,
  consistent with Python real number formats. Original PR #98
  submitted by ansobolev.

- Modified runTests to call postParse function before dumping out
  the parsed results - allows for postParse to add further results,
  such as indications of additional validation success/failure.

- Updated statemachine example: refactored state transitions to use
  overridden classmethods; added <statename>Mixin class to simplify
  definition of application classes that "own" the state object and
  delegate to it to model state-specific properties and behavior.

- Added example nested_markup.py, showing a simple wiki markup with
  nested markup directives, and illustrating the use of '...' for
  skipping over input to match the next expression. (This example
  uses syntax that is not valid under Python 2.)

- Rewrote delta_time.py example (renamed from deltaTime.py) to
  fix some omitted formats and upgrade to latest pyparsing idioms,
  beginning with writing an actual BNF.

- With the help and encouragement from several contributors, including
  Matěj Cepl and Cengiz Kaygusuz, I've started cleaning up the internal
  coding styles in core pyparsing, bringing it up to modern coding
  practices from pyparsing's early development days dating back to
  2003. Whitespace has been largely standardized along PEP8 guidelines,
  removing extra spaces around parentheses, and adding them around
  arithmetic operators and after colons and commas. I was going to hold
  off on doing this work until after 2.4.1, but after cleaning up a
  few trial classes, the difference was so significant that I continued
  on to the rest of the core code base. This should facilitate future
  work and submitted PRs, allowing them to focus on substantive code
  changes, and not get sidetracked by whitespace issues.


Version 2.4.0 - April, 2019
---------------------------
- Well, it looks like the API change that was introduced in 2.3.1 was more
  drastic than expected, so for a friendlier forward upgrade path, this
  release:
  . Bumps the current version number to 2.4.0, to reflect this
    incompatible change.
  . Adds a pyparsing.__compat__ object for specifying compatibility with
    future breaking changes.
  . Conditionalizes the API-breaking behavior, based on the value
    pyparsing.__compat__.collect_all_And_tokens. By default, this value
    will be set to True, reflecting the new bugfixed behavior. To set this
    value to False, add to your code:

        import pyparsing
        pyparsing.__compat__.collect_all_And_tokens = False

  . User code that is dependent on the pre-bugfix behavior can restore
    it by setting this value to False.

  In 2.5 and later versions, the conditional code will be removed and
  setting the flag to True or False in these later versions will have no
  effect.

- Updated unitTests.py and simple_unit_tests.py to be compatible with
  "python setup.py test". To run tests using setup, do:

      python setup.py test
      python setup.py test -s unitTests.suite
      python setup.py test -s simple_unit_tests.suite

  Prompted by issue #83 and PR submitted by bdragon28, thanks.

- Fixed bug in runTests handling '\n' literals in quoted strings.

- Added tag_body attribute to the start tag expressions generated by
  makeHTMLTags, so that you can avoid using SkipTo to roll your own
  tag body expression:

      a, aEnd = pp.makeHTMLTags('a')
      link = a + a.tag_body("displayed_text") + aEnd
      for t in link.searchString(html_page):
          print(t.displayed_text, '->', t.startA.href)

- indentedBlock failure handling was improved; PR submitted by TMiguelT,
  thanks!

- Address Py2 incompatibility in simpleUnitTests, plus explain() and
  Forward str() cleanup; PRs graciously provided by eswald.

- Fixed docstring with embedded '\w', which creates SyntaxWarnings in
  Py3.8, issue #80.

- Examples:

  - Added example parser for rosettacode.org tutorial compiler.

  - Added example to show how an HTML table can be parsed into a
    collection of Python lists or dicts, one per row.

  - Updated SimpleSQL.py example to handle nested selects, reworked
    'where' expression to use infixNotation.

  - Added include_preprocessor.py, similar to macroExpander.py.

  - Examples using makeHTMLTags use new tag_body expression when
    retrieving a tag's body text.

  - Updated examples that are runnable as unit tests:

        python setup.py test -s examples.antlr_grammar_tests
        python setup.py test -s examples.test_bibparse


Version 2.3.1 - January, 2019
-----------------------------
- POSSIBLE API CHANGE: this release fixes a bug when results names were
  attached to a MatchFirst or Or object containing an And object.
  Previously, a results name on an And object within an enclosing MatchFirst
  or Or could return just the first token in the And. Now, all the tokens
  matched by the And are correctly returned. This may result in subtle
  changes in the tokens returned if you have this condition in your pyparsing
  scripts.

- New staticmethod ParseException.explain() to help diagnose parse exceptions
  by showing the failing input line and the trace of ParserElements in
  the parser leading up to the exception. explain() returns a multiline
  string listing each element by name. (This is still an experimental
  method, and the method signature and format of the returned string may
  evolve over the next few releases.)

  Example:
      # define a parser to parse an integer followed by an
      # alphabetic word
      expr = (pp.Word(pp.nums).setName("int")
              + pp.Word(pp.alphas).setName("word"))
      try:
          # parse a string with a numeric second value instead of alpha
          expr.parseString("123 355")
      except pp.ParseException as pe:
          print(pp.ParseException.explain(pe))

  Prints:
      123 355
          ^
      ParseException: Expected word (at char 4), (line:1, col:5)
      __main__.ExplainExceptionTest
      pyparsing.And - {int word}
      pyparsing.Word - word

  explain() will accept any exception type and will list the function
  names and parse expressions in the stack trace. This is especially
  useful when an exception is raised in a parse action.

  Note: explain() is only supported under Python 3.

- Fix bug in dictOf which could match an empty sequence, making it
  infinitely loop if wrapped in a OneOrMore.

- Added unicode sets to pyparsing_unicode for Latin-A and Latin-B ranges.

- Added ability to define custom unicode sets as combinations of other sets
  using multiple inheritance.

      class Turkish_set(pp.pyparsing_unicode.Latin1, pp.pyparsing_unicode.LatinA):
          pass

      turkish_word = pp.Word(Turkish_set.alphas)

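Combining character sets through multiple inheritance works naturally if each set class lists its own code point ranges and a shared classmethod gathers ranges from every class in the method resolution order. The sketch below shows that idea with hypothetical classes (`CharSet`, `Latin1`, `LatinA`, `TurkishSet`) and assumed ranges; it is not the pyparsing_unicode implementation.

```python
class CharSet:
    """Hypothetical base class: subclasses list (first, last) code point ranges."""

    _ranges = []

    @classmethod
    def chars(cls):
        # Walk the MRO so that ranges from every parent set are included
        found = set()
        for klass in cls.__mro__:
            for first, last in vars(klass).get("_ranges", []):
                found.update(chr(c) for c in range(first, last + 1))
        return "".join(sorted(found))

    @classmethod
    def alphas(cls):
        return "".join(c for c in cls.chars() if c.isalpha())


class Latin1(CharSet):
    _ranges = [(0x0020, 0x007E), (0x00A0, 0x00FF)]   # assumed ranges


class LatinA(CharSet):
    _ranges = [(0x0100, 0x017F)]                     # assumed range


class TurkishSet(Latin1, LatinA):
    pass


turkish_alphas = TurkishSet.alphas()
print("\u015f" in turkish_alphas)      # s with cedilla, from the LatinA range
print("\u015f" in Latin1.alphas())
```

Reading `_ranges` from `vars(klass)` rather than through normal attribute lookup keeps a parent's ranges from being counted once per subclass.
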
- Updated state machine import examples, with state machine demos for:
  . traffic light
  . library book checkin/checkout
  . document review/approval

  In the traffic light example, you can use the custom 'statemachine' keyword
  to define the states for a traffic light, and have the state classes
  auto-generated for you:

      statemachine TrafficLightState:
          Red -> Green
          Green -> Yellow
          Yellow -> Red

  Similar for state machines with named transitions, like the library book
  state example:

      statemachine LibraryBookState:
          New -(shelve)-> Available
          Available -(reserve)-> OnHold
          OnHold -(release)-> Available
          Available -(checkout)-> CheckedOut
          CheckedOut -(checkin)-> Available

  Once the classes are defined, then additional Python code can reference those
  classes to add class attributes, instance methods, etc.

  See the examples in examples/statemachine

- Added an example parser for the decaf language. This language is used in
  CS compiler classes in many colleges and universities.

- Fixup of docstrings to Sphinx format, inclusion of test files in the source
  package, and convert markdown to rst throughout the distribution, great job
  by Matěj Cepl!

- Expanded the whitespace characters recognized by the White class to include
  all unicode defined spaces. Suggested in Issue #51 by rtkjbillo.

- Added optional postParse argument to ParserElement.runTests() to add a
  custom callback to be called for test strings that parse successfully. Useful
  for running tests that do additional validation or processing on the parsed
  results. See updated chemicalFormulas.py example.

- Removed distutils fallback in setup.py. If installing the package fails,
  please update to the latest version of setuptools. Plus overall project code
  cleanup (CRLFs, whitespace, imports, etc.), thanks Jon Dufresne!

- Fix bug in CaselessKeyword, to make its behavior consistent with
  Keyword(caseless=True). Fixes Issue #65 reported by telesphore.


Version 2.3.0 - October, 2018
-----------------------------
- NEW SUPPORT FOR UNICODE CHARACTER RANGES
  This release introduces the pyparsing_unicode namespace class, defining
  a series of language character sets to simplify the definition of alphas,
  nums, alphanums, and printables in the following language sets:
  . Arabic
  . Chinese
  . Cyrillic
  . Devanagari
  . Greek
  . Hebrew
  . Japanese (including Kanji, Katakana, and Hiragana subsets)
  . Korean
  . Latin1 (includes 7 and 8-bit Latin characters)
  . Thai
  . CJK (combination of Chinese, Japanese, and Korean sets)

  For example, your code can define words using:

      korean_word = Word(pyparsing_unicode.Korean.alphas)

  See their use in the updated examples greetingInGreek.py and
  greetingInKorean.py.

  This namespace class also offers access to these sets using their
  unicode identifiers.

- POSSIBLE API CHANGE: Fixed bug where a parse action that explicitly
  returned the input ParseResults could add another nesting level in
  the results if the current expression had a results name.

      vals = pp.OneOrMore(pp.pyparsing_common.integer)("int_values")

      def add_total(tokens):
          tokens['total'] = sum(tokens)
          return tokens  # this line can be removed

      vals.addParseAction(add_total)
      print(vals.parseString("244 23 13 2343").dump())

  Before the fix, this code would print (note the extra nesting level):

      [244, 23, 13, 2343]
      - int_values: [244, 23, 13, 2343]
        - int_values: [244, 23, 13, 2343]
        - total: 2623
      - total: 2623

  With the fix, this code now prints:

      [244, 23, 13, 2343]
      - int_values: [244, 23, 13, 2343]
      - total: 2623

  This fix will change the structure of ParseResults returned if a
  program defines a parse action that returns the tokens that were
  sent in. This is not necessary, and statements like "return tokens"
  in the example above can be safely deleted prior to upgrading to
  this release, in order to avoid the bug and get the new behavior.

  Reported by seron in Issue #22, nice catch!

- POSSIBLE API CHANGE: Fixed a related bug where a results name
  erroneously created a second level of hierarchy in the returned
  ParseResults. The intent for accumulating results names into ParseResults
  is that, in the absence of Group'ing, all names get merged into a
  common namespace. This allows us to write:

      key_value_expr = (Word(alphas)("key") + '=' + Word(nums)("value"))
      result = key_value_expr.parseString("a = 100")

  and have result structured as {"key": "a", "value": "100"}
  instead of [{"key": "a"}, {"value": "100"}].

  However, if a named expression is used in a higher-level non-Group
  expression that *also* has a name, a false sub-level would be created
  in the namespace:

      num = pp.Word(pp.nums)
      num_pair = ("[" + (num("A") + num("B"))("values") + "]")
      U = num_pair.parseString("[ 10 20 ]")
      print(U.dump())

  Since there is no grouping, "A", "B", and "values" should all appear
  at the same level in the results, as:

      ['[', '10', '20', ']']
      - A: '10'
      - B: '20'
      - values: ['10', '20']

  Instead, an extra level of "A" and "B" show up under "values":

      ['[', '10', '20', ']']
      - A: '10'
      - B: '20'
      - values: ['10', '20']
        - A: '10'
        - B: '20'

  This bug has been fixed. Now, if this hierarchy is desired, then a
  Group should be added:

      num_pair = ("[" + pp.Group(num("A") + num("B"))("values") + "]")

  Giving:

      ['[', ['10', '20'], ']']
      - values: ['10', '20']
        - A: '10'
        - B: '20'

  But in no case should "A" and "B" appear in multiple levels. This bug-fix
  fixes that.

  If you have current code which relies on this behavior, then add or remove
  Groups as necessary to get your intended results structure.

  Reported by Athanasios Anastasiou.

- IndexErrors raised in parse actions will get explicitly reraised
  as ParseExceptions that wrap the original IndexError. Since
  IndexError sometimes occurs as part of pyparsing's normal parsing
  logic, IndexErrors that are raised during a parse action may have
  gotten silently reinterpreted as parsing errors. To retain the
  information from the IndexError, these exceptions will now be
  raised as ParseExceptions that reference the original IndexError.
  This wrapping will only be visible when run under Python3, since it
  emulates "raise ... from ..." syntax.

  Addresses Issue #4, reported by guswns0528.

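The "raise ... from ..." wrapping described above can be illustrated in plain Python. The names `ParseError`, `run_parse_action`, and `buggy_action` below are hypothetical stand-ins, not pyparsing names; the point is only that the original IndexError stays reachable through `__cause__`.

```python
class ParseError(Exception):
    """Hypothetical stand-in for a parser's own exception type."""


def run_parse_action(action, tokens):
    try:
        return action(tokens)
    except IndexError as err:
        # Re-raise as a parse error, keeping the original exception as __cause__
        raise ParseError("exception raised in parse action") from err


def buggy_action(tokens):
    return tokens[5]   # the token list is shorter than this, so IndexError


try:
    run_parse_action(buggy_action, ["a", "b"])
except ParseError as pe:
    cause = pe.__cause__

print(type(cause).__name__)   # IndexError
```
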
- Added Char class to simplify defining expressions of a single
|
||
character. (Char("abc") is equivalent to Word("abc", exact=1))
|
||
|
||
- Added class PrecededBy to perform lookbehind tests. PrecededBy is
|
||
used in the same way as FollowedBy, passing in an expression that
|
||
must occur just prior to the current parse location.
|
||
|
||
For fixed-length expressions like a Literal, Keyword, Char, or a
|
||
Word with an `exact` or `maxLen` length given, `PrecededBy(expr)`
|
||
is sufficient. For varying length expressions like a Word with no
|
||
given maximum length, `PrecededBy` must be constructed with an
|
||
integer `retreat` argument, as in
|
||
`PrecededBy(Word(alphas, nums), retreat=10)`, to specify the maximum
|
||
number of characters pyparsing must look backward to make a match.
|
||
pyparsing will check all the values from 1 up to retreat characters
|
||
back from the current parse location.
|
||
|
||
When stepping backwards through the input string, PrecededBy does
|
||
*not* skip over whitespace.
|
||
|
||
PrecededBy can be created with a results name so that, even though
|
||
it always returns an empty parse result, the result *can* include
|
||
named results.
|
||
|
||
Idea first suggested in Issue #30 by Freakwill.
|
||
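A minimal sketch of a fixed-length lookbehind, where no `retreat` argument is needed because the preceding expression is a one-character literal. The name `issue_ref` and the sample text are made up for the example.

```python
import pyparsing as pp

# Match digits only when the character immediately before them is "#".
issue_ref = pp.PrecededBy("#") + pp.Word(pp.nums)

# "17" is skipped because it is preceded by a space, not by "#".
matches = issue_ref.searchString("see #42 and 17").asList()
print(matches)
```

The PrecededBy part contributes no tokens of its own, so each match contains only the digits.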
|
||
- Updated FollowedBy to accept expressions that contain named results,
|
||
so that results names defined in the lookahead expression will be
|
||
returned, even though FollowedBy always returns an empty list.
|
||
Inspired by the same feature implemented in PrecededBy.
|
||
|
||
|
||
Version 2.2.2 - September, 2018
|
||
-------------------------------
|
||
- Fixed bug in SkipTo: if a SkipTo expression was skipping to
an expression that returned a list (such as an And), and the
SkipTo was saved as a named result, the named result could be
saved as a ParseResults - it should always be saved as a string.
|
||
Issue #28, reported by seron.
|
||
|
||
- Added simple_unit_tests.py, as a collection of easy-to-follow unit
|
||
tests for various classes and features of the pyparsing library.
|
||
The primary intent is to be instructional rather than to test
rigorously. Complex tests can still be added in the unitTests.py file.
|
||
|
||
- New features added to the Regex class:
|
||
- optional asGroupList parameter, returns all the capture groups as
|
||
a list
|
||
- optional asMatch parameter, returns the raw re.match result
|
||
- new sub(repl) method, which adds a parse action calling
|
||
re.sub(pattern, repl, parsed_result). Simplifies creating
|
||
Regex expressions to be used with transformString. Like re.sub,
|
||
repl may be an ordinary string (similar to using pyparsing's
|
||
replaceWith), or may contain references to capture groups by group
|
||
number, or may be a callable that takes an re match group and
|
||
returns a string.
|
||
|
||
For instance:
|
||
expr = pp.Regex(r"([Hh]\d):\s*(.*)").sub(r"<\1>\2</\1>")
|
||
expr.transformString("h1: This is the title")
|
||
|
||
will return
|
||
<h1>This is the title</h1>
|
||
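The two new constructor parameters can be sketched in the same way. The pattern and input below are invented for illustration, and the shape of the returned token (a tuple of groups in the first case, a match object in the second) is what I would expect from the description above rather than a documented guarantee.

```python
import pyparsing as pp

# asGroupList: the single returned token holds the capture groups.
pair = pp.Regex(r"(\d+)-(\d+)", asGroupList=True)
groups = pair.parseString("10-20")[0]

# asMatch: the single returned token is the raw re match object.
raw = pp.Regex(r"(\d+)-(\d+)", asMatch=True)
match = raw.parseString("10-20")[0]

print(groups, match.group(2))
```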
|
||
- Fixed omission of LICENSE file in source tarball, also added
|
||
CODE_OF_CONDUCT.md per GitHub community standards.
|
||
|
||
|
||
Version 2.2.1 - September, 2018
|
||
-------------------------------
|
||
- Applied changes necessary to migrate hosting of pyparsing source
|
||
over to GitHub. Many thanks for help and contributions from hugovk,
|
||
jdufresne, and cngkaygusuz among others through this transition,
|
||
sorry it took me so long!
|
||
|
||
- Fixed import of collections.abc to address DeprecationWarnings
|
||
in Python 3.7.
|
||
|
||
- Updated oc.py example to support function calls in arithmetic
|
||
expressions; fixed regex for '==' operator; and added packrat
|
||
parsing. Raised on the pyparsing wiki by Boris Marin, thanks!
|
||
|
||
- Fixed bug in select_parser.py example, group_by_terms was not
|
||
reported. Reported on SF bugs by Adam Groszer, thanks Adam!
|
||
|
||
- Added "Getting Started" section to the module docstring, to
|
||
guide new users to the most common starting points in pyparsing's
|
||
API.
|
||
|
||
- Fixed bug in Literal and Keyword classes, which erroneously
|
||
raised IndexError instead of ParseException.
|
||
|
||
|
||
Version 2.2.0 - March, 2017
|
||
---------------------------
|
||
- Bumped minor version number to reflect compatibility issues with
|
||
OneOrMore and ZeroOrMore bugfixes in 2.1.10. (2.1.10 fixed a bug
|
||
that was introduced in 2.1.4, but the fix could break code
|
||
written against 2.1.4 - 2.1.9.)
|
||
|
||
- Updated setup.py to address recursive import problems now
|
||
that pyparsing is part of 'packaging' (used by setuptools).
|
||
Patch submitted by Joshua Root, much thanks!
|
||
|
||
- Fixed KeyError issue reported by Yann Bizeul when using packrat
|
||
parsing in the Graphite time series database, thanks Yann!
|
||
|
||
- Fixed incorrect usages of '\' in literals, as described in
|
||
https://docs.python.org/3/whatsnew/3.6.html#deprecated-python-behavior
|
||
Patch submitted by Ville Skyttä - thanks!
|
||
|
||
- Minor internal change when using '-' operator, to be compatible
|
||
with ParserElement.streamline() method.
|
||
|
||
- Expanded infixNotation to accept a list or tuple of parse actions
|
||
to attach to an operation.
|
||
|
||
- New unit test added for dill support for storing pyparsing parsers.
|
||
Ordinary Python pickle can be used to pickle pyparsing parsers as
|
||
long as they do not use any parse actions. The 'dill' module is an
|
||
extension to pickle which *does* support pickling of attached
|
||
parse actions.
|
||
|
||
|
||
Version 2.1.10 - October, 2016
|
||
-------------------------------
|
||
- Fixed bug in reporting named parse results for ZeroOrMore
|
||
expressions, thanks Ethan Nash for reporting this!
|
||
|
||
- Fixed behavior of LineStart to be much more predictable.
|
||
LineStart can now be used to detect if the next parse position
|
||
is col 1, factoring in potential leading whitespace (which would
|
||
cause LineStart to fail). Also fixed a bug in col, which is
|
||
used in LineStart, where '\n's were erroneously considered to
|
||
be column 1.
|
||
|
||
- Added support for multiline test strings in runTests.
|
||
|
||
- Fixed bug in ParseResults.dump when keys were not strings.
|
||
Also changed display of string values to show them in quotes,
|
||
to help distinguish parsed numeric strings from parsed integers
|
||
that have been converted to Python ints.
|
||
|
||
|
||
Version 2.1.9 - September, 2016
|
||
-------------------------------
|
||
- Added class CloseMatch, a variation on Literal which matches
|
||
"close" matches, that is, strings with at most 'n' mismatching
|
||
characters.
|
||
|
||
- Fixed bug in Keyword.setDefaultKeywordChars(), reported by Kobayashi
|
||
Shinji - nice catch, thanks!
|
||
|
||
- Minor API change in pyparsing_common. Renamed some of the common
|
||
expressions to PEP8 format (to be consistent with the other
|
||
pyparsing_common expressions):
|
||
. signedInteger -> signed_integer
|
||
. sciReal -> sci_real
|
||
|
||
Also, in trying to stem the API bloat of pyparsing, I've copied
|
||
some of the global expressions and helper parse actions into
|
||
pyparsing_common, with the originals to be deprecated and removed
|
||
in a future release:
|
||
. commaSeparatedList -> pyparsing_common.comma_separated_list
|
||
. upcaseTokens -> pyparsing_common.upcaseTokens
|
||
. downcaseTokens -> pyparsing_common.downcaseTokens
|
||
|
||
(I don't expect any other expressions, like the comment expressions,
|
||
quotedString, or the Word-helping strings like alphas, nums, etc.
|
||
to migrate to pyparsing_common - they are just too pervasive. As for
|
||
the PEP8 vs camelCase naming, all the expressions are PEP8, while
|
||
the parse actions in pyparsing_common are still camelCase. It's a
|
||
small step - when pyparsing 3.0 comes around, everything will change
|
||
to PEP8 snake case.)
|
||
|
||
- Fixed Python3 compatibility bug when using dict keys() and values()
|
||
in ParseResults.getName().
|
||
|
||
- After some prodding, I've reworked the unitTests.py file for
|
||
pyparsing over the past few releases. It uses some variations on
|
||
unittest to handle my testing style. The test now:
|
||
. auto-discovers its test classes (while maintaining their order
|
||
of definition)
|
||
. suppresses voluminous 'print' output for tests that pass
|
||
|
||
|
||
Version 2.1.8 - August, 2016
|
||
----------------------------
|
||
- Fixed issue in the optimization to _trim_arity, when the full
|
||
stacktrace is retrieved to determine if a TypeError is raised in
|
||
pyparsing or in the caller's parse action. Code was traversing
|
||
the full stacktrace, and potentially encountering UnicodeDecodeError.
|
||
|
||
- Fixed bug in ParserElement.inlineLiteralsUsing, causing infinite
|
||
loop with Suppress.
|
||
|
||
- Fixed bug in Each, when merging named results from multiple
|
||
expressions in a ZeroOrMore or OneOrMore. Also fixed bug when
|
||
ZeroOrMore expressions were erroneously treated as required
|
||
expressions in an Each expression.
|
||
|
||
- Added a few more inline doc examples.
|
||
|
||
- Improved use of runTests in several example scripts.
|
||
|
||
|
||
Version 2.1.7 - August, 2016
|
||
----------------------------
|
||
- Fixed regression reported by Andrea Censi (surfaced in PyContracts
|
||
tests) when using ParseSyntaxExceptions (raised when using operator '-')
|
||
with packrat parsing.
|
||
|
||
- Minor fix to oneOf, to accept all iterables, not just space-delimited
|
||
strings and lists. (If you have a list or set of strings, it is
|
||
not necessary to concat them using ' '.join to pass them to oneOf,
|
||
oneOf will accept the list or set or generator directly.)
|
||
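For example, a set or a generator can be passed straight to oneOf (the color names are arbitrary sample data):

```python
import pyparsing as pp

color_names = {"red", "green", "blue"}   # a set, not a space-delimited string

color = pp.oneOf(color_names)
shade = pp.oneOf(name.upper() for name in sorted(color_names))   # a generator

print(color.parseString("green"), shade.parseString("BLUE"))
```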
|
||
|
||
Version 2.1.6 - August, 2016
|
||
----------------------------
|
||
- *Major packrat upgrade*, inspired by patch provided by Tal Einat -
|
||
many, many, thanks to Tal for working on this! Tal's tests show
|
||
faster parsing performance (2X in some tests), *and* memory reduction
|
||
from 3GB down to ~100MB! Requires no changes to existing code using
|
||
packratting. (Uses OrderedDict, available in Python 2.7 and later.
|
||
For Python 2.6 users, will attempt to import from ordereddict
|
||
backport. If not present, will implement pure-Python Fifo dict.)
|
||
|
||
- Minor API change - to better distinguish between the flexible
|
||
numeric types defined in pyparsing_common, I've changed "numeric"
|
||
(which parsed numbers of different types and returned int for ints,
|
||
float for floats, etc.) and "number" (which parsed numbers of int
|
||
or float type, and returned all floats) to "number" and "fnumber"
|
||
respectively. I hope the "f" prefix of "fnumber" will be a better
|
||
indicator of its internal conversion of parsed values to floats,
|
||
while the generic "number" is similar to the flexible number syntax
|
||
in other languages. Also fixed a bug in pyparsing_common.numeric
|
||
(now renamed to pyparsing_common.number), integers were parsed and
|
||
returned as floats instead of being retained as ints.
|
||
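A short sketch of the distinction between the two renamed expressions, using the names as they stand after this change:

```python
import pyparsing as pp

ppc = pp.pyparsing_common

as_int = ppc.number.parseString("42")[0]     # stays an int
as_float = ppc.number.parseString("3.5")[0]  # parsed as a float
forced = ppc.fnumber.parseString("42")[0]    # always converted to float

print(type(as_int).__name__, type(as_float).__name__, type(forced).__name__)
```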
|
||
- Fixed bug in upcaseTokens and downcaseTokens introduced in 2.1.5,
|
||
when the parse action was used in conjunction with results names.
|
||
Reported by Steven Arcangeli from the dql project, thanks for your
|
||
patience, Steven!
|
||
|
||
- Major change to docs! After seeing some comments on reddit about
|
||
general issues with docs of Python modules, and thinking that I'm a
little overdue in doing some doc tuneup on pyparsing, I decided to
follow the suggestions of the redditor and add more inline examples
|
||
to the pyparsing reference documentation. I hope this addition
|
||
will clarify some of the more common questions people have, especially
|
||
when first starting with pyparsing/Python.
|
||
|
||
- Deprecated ParseResults.asXML. I've never been too happy with this
|
||
method, and it usually forces some unnatural code in the parsers in
|
||
order to get decent tag names. The amount of guesswork that asXML
|
||
has to do to try to match names with values should have been a red
|
||
flag from day one. If you are using asXML, you will need to implement
|
||
your own ParseResults->XML serialization. Or consider migrating to
|
||
a more current format such as JSON (which is very easy to do:
|
||
results_as_json = json.dumps(parse_result.asDict())). Hopefully, when
|
||
I remove this code in a future version, I'll also be able to simplify
|
||
some of the craziness in ParseResults, which IIRC was only there to try
|
||
to make asXML work.
|
||
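The JSON route mentioned above might look like the following sketch. The `person` grammar is hypothetical and exists only to give asDict() some named results to serialize.

```python
import json
import pyparsing as pp

# Hypothetical grammar for illustration: a name followed by an age.
person = pp.Word(pp.alphas)("name") + pp.Word(pp.nums)("age")
parse_result = person.parseString("bob 42")

# Only named results appear in asDict(); sort_keys keeps the output stable.
results_as_json = json.dumps(parse_result.asDict(), sort_keys=True)
print(results_as_json)
```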
|
||
- Updated traceParseAction parse action decorator to show the repr
|
||
of the input and output tokens, instead of the str format, since
|
||
str has been simplified to just show the token list content.
|
||
|
||
(The change to ParseResults.__str__ occurred in pyparsing 2.0.4, but
|
||
it seems that didn't make it into the release notes - sorry! Too
|
||
many users, especially beginners, were confused by the
|
||
"([token_list], {names_dict})" str format for ParseResults, thinking
|
||
they were getting a tuple containing a list and a dict. The full form
|
||
can be seen if using repr().)
|
||
|
||
For tracing tokens in and out of parse actions, the more complete
|
||
repr form provides important information when debugging parse actions.
|
||
|
||
|
||
Version 2.1.5 - June, 2016
|
||
------------------------------
|
||
- Added ParserElement.split() generator method, similar to re.split().
|
||
Includes optional arguments maxsplit (to limit the number of splits),
|
||
and includeSeparators (to include the separating matched text in the
|
||
returned output, default=False).
|
||
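A minimal sketch of split() and its two optional arguments, using a comma literal as the separator (the input string is sample data):

```python
import pyparsing as pp

comma = pp.Literal(",")

# split() is a generator, so wrap it in list() to see all the pieces.
parts = list(comma.split("a,b,c"))
first_only = list(comma.split("a,b,c", maxsplit=1))
with_seps = list(comma.split("a,b,c", includeSeparators=True))

print(parts, first_only, with_seps)
```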
|
||
- Added a new parse action construction helper tokenMap, which will
|
||
apply a function and optional arguments to each element in a
|
||
ParseResults. So this parse action:
|
||
|
||
def lowercase_all(tokens):
|
||
return [str(t).lower() for t in tokens]
|
||
OneOrMore(Word(alphas)).setParseAction(lowercase_all)
|
||
|
||
can now be written:
|
||
|
||
OneOrMore(Word(alphas)).setParseAction(tokenMap(str.lower))
|
||
|
||
Also simplifies writing conversion parse actions like:
|
||
|
||
integer = Word(nums).setParseAction(lambda t: int(t[0]))
|
||
|
||
to just:
|
||
|
||
integer = Word(nums).setParseAction(tokenMap(int))
|
||
|
||
If additional arguments are necessary, they can be included in the
|
||
call to tokenMap, as in:
|
||
|
||
hex_integer = Word(hexnums).setParseAction(tokenMap(int, 16))
|
||
|
||
- Added more expressions to pyparsing_common:
|
||
. IPv4 and IPv6 addresses (including long, short, and mixed forms
|
||
of IPv6)
|
||
. MAC address
|
||
. ISO8601 date and date time strings (with named fields for year, month, etc.)
|
||
. UUID (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
|
||
. hex integer (returned as int)
|
||
. fraction (integer '/' integer, returned as float)
|
||
. mixed integer (integer '-' fraction, or just fraction, returned as float)
|
||
. stripHTMLTags (parse action to remove tags from HTML source)
|
||
. parse action helpers convertToDate and convertToDatetime to do custom parse
|
||
time conversions of parsed ISO8601 strings
|
||
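A few of these expressions in use, as a sketch. The attribute names below (`ipv4_address`, `hex_integer`, `fraction`, `mixed_integer`, `iso8601_date`) are the snake_case names I believe pyparsing_common uses for the items listed above; the sample inputs are invented.

```python
import datetime
import pyparsing as pp

ppc = pp.pyparsing_common

ip = ppc.ipv4_address.parseString("192.168.0.1")[0]
hex_value = ppc.hex_integer.parseString("FF")[0]    # returned as int
half = ppc.fraction.parseString("1/2")[0]           # returned as float
mixed = ppc.mixed_integer.parseString("1-3/4")[0]   # 1 + 3/4

# Copy the shared expression before attaching a parse action to it.
date_expr = ppc.iso8601_date.copy().setParseAction(ppc.convertToDate())
day = date_expr.parseString("1999-12-31")[0]

print(ip, hex_value, half, mixed, day)
```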
|
||
- runTests now returns a two-tuple: success if all tests succeed,
|
||
and an output list of each test and its output lines.
|
||
|
||
- Added failureTests argument (default=False) to runTests, so that
|
||
tests can be run that are expected failures, and runTests' success
|
||
value will return True only if all tests *fail* as expected. Also,
|
||
parseAll now defaults to True.
|
||
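The two-tuple return value and the failureTests flag can be sketched as follows. The `printResults=False` argument is used here only to keep the sketch quiet and is assumed to be available in the installed version.

```python
import pyparsing as pp

integer = pp.Word(pp.nums)

# Passing cases: success is True only if every test line parses.
ok, results = integer.runTests("""
    # plain integers
    100
    42
    """, printResults=False)

# Expected failures: success is True only if every test line fails to parse.
all_failed, _ = integer.runTests("""
    abc
    3.14
    """, failureTests=True, printResults=False)

print(ok, all_failed, len(results))
```

Because parseAll now defaults to True, "3.14" counts as a failure for `integer` even though its leading "3" would match.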
|
||
- New example numerics.py, shows samples of parsing integer and real
|
||
numbers using locale-dependent formats:
|
||
|
||
4.294.967.295,000
|
||
4 294 967 295,000
|
||
4,294,967,295.000
|
||
|
||
|
||
Version 2.1.4 - May, 2016
|
||
------------------------------
|
||
- Split out the '==' behavior in ParserElement, now implemented
|
||
as the ParserElement.matches() method. Using '==' for string test
|
||
purposes will be removed in a future release.
|
||
|
||
- Expanded capabilities of runTests(). Will now accept embedded
|
||
comments (default is Python style, leading '#' character, but
|
||
customizable). Comments will be emitted along with the tests and
|
||
test output. Useful during test development, to create a test string
|
||
consisting only of test case description comments separated by
|
||
blank lines, and then fill in the test cases. Will also highlight
|
||
ParseFatalExceptions with "(FATAL)".
|
||
|
||
- Added a 'pyparsing_common' class containing common/helpful little
|
||
expressions such as integer, float, identifier, etc. I used this
|
||
class as a sort of embedded namespace, to contain these helpers
|
||
without further adding to pyparsing's namespace bloat.
|
||
|
||
- Minor enhancement to traceParseAction decorator, to retain the
|
||
parse action's name for the trace output.
|
||
|
||
- Added optional 'fatal' keyword arg to addCondition, to indicate that
|
||
a condition failure should halt parsing immediately.
|
||
|
||
|
||
Version 2.1.3 - May, 2016
|
||
------------------------------
|
||
- _trim_arity fix in 2.1.2 was very version-dependent on Py 3.5.0.
|
||
Now works for Python 2.x, 3.3, 3.4, 3.5.0, and 3.5.1 (and hopefully
|
||
beyond).
|
||
|
||
|
||
Version 2.1.2 - May, 2016
|
||
------------------------------
|
||
- Fixed bug in _trim_arity when pyparsing code is included in a
|
||
PyInstaller bundle, reported by maluwa.
|
||
|
||
- Fixed catastrophic regex backtracking in implementation of the
|
||
quoted string expressions (dblQuotedString, sglQuotedString, and
|
||
quotedString). Reported on the pyparsing wiki by webpentest,
|
||
good catch! (Also tuned up some other expressions susceptible to the
|
||
same backtracking problem, such as cStyleComment, cppStyleComment,
|
||
etc.)
|
||
|
||
|
||
Version 2.1.1 - March, 2016
|
||
---------------------------
|
||
- Added support for assigning to ParseResults using slices.
|
||
|
||
- Fixed bug in ParseResults.asDict(), in which dict values were always
|
||
converted to dicts, even if they were just unkeyed lists of tokens.
|
||
Reported on SO by Gerald Thibault, thanks Gerald!
|
||
|
||
- Fixed bug in SkipTo when using failOn, reported by robyschek, thanks!
|
||
|
||
- Fixed bug in Each introduced in 2.1.0, reported by AND patch and
|
||
unit test submitted by robyschek, well done!
|
||
|
||
- Removed use of functools.partial in replaceWith, as this creates
|
||
an ambiguous signature for the generated parse action, which fails in
|
||
PyPy. Reported by Evan Hubinger, thanks Evan!
|
||
|
||
- Added default behavior to QuotedString to convert embedded '\t', '\n',
|
||
etc. characters to their whitespace counterparts. Found during Q&A
|
||
exchange on SO with Maxim.
|
||
|
||
|
||
Version 2.1.0 - February, 2016
|
||
------------------------------
|
||
- Modified the internal _trim_arity method to distinguish between
|
||
TypeError's raised while trying to determine parse action arity and
|
||
those raised within the parse action itself. This will clear up those
|
||
confusing "<lambda>() takes exactly 1 argument (0 given)" error
|
||
messages when there is an actual TypeError in the body of the parse
|
||
action. Thanks to all who have raised this issue in the past, and
|
||
most recently to Michael Cohen, who sent in a proposed patch, and got
|
||
me to finally tackle this problem.
|
||
|
||
- Added compatibility for pickle protocols 2-4 when pickling ParseResults.
|
||
In Python 2.x, protocol 0 was the default, and protocol 2 did not work.
|
||
In Python 3.x, protocol 3 is the default, so explicitly naming
|
||
protocol 0 or 1 was required to pickle ParseResults. With this release,
|
||
all protocols 0-4 are supported. Thanks for reporting this on StackOverflow,
|
||
Arne Wolframm, and for providing a nice simple test case!
|
||
|
||
- Added optional 'stopOn' argument to ZeroOrMore and OneOrMore, to
|
||
simplify breaking on stop tokens that would match the repetition
|
||
expression.
|
||
|
||
It is a common problem to fail to look ahead when matching repetitive
|
||
tokens if the sentinel at the end also matches the repetition
|
||
expression, as when parsing "BEGIN aaa bbb ccc END" with:
|
||
|
||
"BEGIN" + OneOrMore(Word(alphas)) + "END"
|
||
|
||
Since "END" matches the repetition expression "Word(alphas)", it will
|
||
never get parsed as the terminating sentinel. Up until now, this had
|
||
to be resolved by the user inserting their own negative lookahead:
|
||
|
||
"BEGIN" + OneOrMore(~Literal("END") + Word(alphas)) + "END"
|
||
|
||
Using stopOn, they can more easily write:
|
||
|
||
"BEGIN" + OneOrMore(Word(alphas), stopOn="END") + "END"
|
||
|
||
The stopOn argument can be a literal string or a pyparsing expression.
|
||
Inspired by a question by Lamakaha on StackOverflow (and many previous
|
||
questions with the same negative-lookahead resolution).
|
||
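Put together as a runnable sketch, the greedy form fails on the sample input while the stopOn form parses it:

```python
import pyparsing as pp

words = pp.Word(pp.alphas)

greedy = "BEGIN" + pp.OneOrMore(words) + "END"
bounded = "BEGIN" + pp.OneOrMore(words, stopOn="END") + "END"

try:
    greedy.parseString("BEGIN aaa bbb ccc END")
    greedy_failed = False
except pp.ParseException:
    greedy_failed = True   # "END" was consumed as just another word

tokens = bounded.parseString("BEGIN aaa bbb ccc END").asList()
print(greedy_failed, tokens)
```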
|
||
- Added expression names for many internal and builtin expressions, to
|
||
reduce name and error message overhead during parsing.
|
||
|
||
- Converted helper lambdas to functions to refactor and add docstring
|
||
support.
|
||
|
||
- Fixed ParseResults.asDict() to correctly convert nested ParseResults
|
||
values to dicts.
|
||
|
||
- Cleaned up some examples, fixed typo in fourFn.py identified by
|
||
aristotle2600 on reddit.
|
||
|
||
- Removed keepOriginalText helper method, which was deprecated ages ago.
|
||
Superseded by originalTextFor.
|
||
|
||
- Same for the Upcase class, which was long ago deprecated and replaced
|
||
with the upcaseTokens method.
|
||
|
||
|
||
|
||
Version 2.0.7 - December, 2015
|
||
------------------------------
|
||
- Simplified string representation of Forward class, to avoid memory
|
||
and performance errors while building ParseException messages. Thanks,
|
||
Will McGugan, Andrea Censi, and Martijn Vermaat for the bug reports and
|
||
test code.
|
||
|
||
- Cleaned up additional issues from enhancing the error messages for
|
||
Or and MatchFirst, handling Unicode values in expressions. Fixes Unicode
|
||
encoding issues in Python 2, thanks to Evan Hubinger for the bug report.
|
||
|
||
- Fixed implementation of dir() for ParseResults - was leaving out all the
|
||
defined methods and just adding the custom results names.
|
||
|
||
- Fixed bug in ignore() that was introduced in pyparsing 1.5.3, that would
|
||
not accept a string literal as the ignore expression.
|
||
|
||
- Added new example parseTabularData.py to illustrate parsing of data
|
||
formatted in columns, with detection of empty cells.
|
||
|
||
- Updated a number of examples to more current Python and pyparsing
|
||
forms.
|
||
|
||
|
||
Version 2.0.6 - November, 2015
|
||
------------------------------
|
||
- Fixed a bug in Each when multiple Optional elements are present.
|
||
Thanks for reporting this, whereswalden on SO.
|
||
|
||
- Fixed another bug in Each, when Optional elements have results names
|
||
or parse actions, reported by Max Rothman - thank you, Max!
|
||
|
||
- Added optional parseAll argument to runTests, whether tests should
|
||
require the entire input string to be parsed or not (similar to
|
||
parseAll argument to parseString). Plus a little neaten-up of the
|
||
output on Python 2 (no stray ()'s).
|
||
|
||
- Modified exception messages from MatchFirst and Or expressions. These
|
||
were formerly misleading as they would only give the first or longest
|
||
exception mismatch error message. Now the error message includes all
|
||
the alternatives that were possible matches. Originally proposed by
|
||
a pyparsing user, but I've lost the email thread - finally figured out
|
||
a fairly clean way to do this.
|
||
|
||
- Fixed a bug in Or, when a parse action on an alternative raises an
|
||
exception, other potentially matching alternatives were not always tried.
|
||
Reported by TheVeryOmni on the pyparsing wiki, thanks!
|
||
|
||
- Fixed a bug in dump() introduced in 2.0.4, where list values were shown
|
||
in duplicate.
|
||
|
||
|
||
Version 2.0.5 - October, 2015
|
||
-----------------------------
|
||
- (&$(@#&$(@!!!! Some "print" statements snuck into pyparsing v2.0.4,
|
||
breaking Python 3 compatibility! Fixed. Reported by jenshn, thanks!
|
||
|
||
|
||
Version 2.0.4 - October, 2015
|
||
-----------------------------
|
||
- Added ParserElement.addCondition, to simplify adding parse actions
|
||
that act primarily as filters. If the given condition evaluates False,
|
||
pyparsing will raise a ParseException. The condition should be a method
|
||
with the same method signature as a parse action, but should return a
|
||
boolean. Suggested by Victor Porton, nice idea Victor, thanks!
|
||
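A small sketch of a condition used as a filter. The `year` expression and the cutoff of 2000 are arbitrary choices for the example.

```python
import pyparsing as pp

year = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))

# The condition takes the same arguments as a parse action but returns a boolean.
year.addCondition(lambda t: t[0] >= 2000, message="only years 2000 and later are accepted")

accepted = year.parseString("2015")[0]
try:
    year.parseString("1999")
    rejected = False
except pp.ParseException:
    rejected = True

print(accepted, rejected)
```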
|
||
- Slight mod to srange to accept unicode literals for the input string,
|
||
such as "[а-яА-Я]" instead of "[\u0430-\u044f\u0410-\u042f]". Thanks
|
||
to Alexandr Suchkov for the patch!
|
||
|
||
- Enhanced implementation of replaceWith.
|
||
|
||
- Fixed enhanced ParseResults.dump() method when the results consist
|
||
only of an unnamed array of sub-structure results. Reported by Robin
|
||
Siebler, thanks for your patience and persistence, Robin!
|
||
|
||
- Fixed bug in fourFn.py example code, where pi and e were defined using
|
||
CaselessLiteral instead of CaselessKeyword. This was not a problem until
|
||
adding a new function 'exp', and the leading 'e' of 'exp' was accidentally
|
||
parsed as the mathematical constant 'e'. Nice catch, Tom Grydeland - thanks!
|
||
|
||
- Adopt new-fangled Python features, like decorators and ternary expressions,
|
||
per suggestions from Williamzjc - thanks William! (Oh yeah, I'm not
|
||
supporting Python 2.3 with this code any more...) Plus, some additional
|
||
code fixes/cleanup - thanks again!
|
||
|
||
- Added ParserElement.runTests, a little test bench for quickly running
|
||
an expression against a list of sample input strings. Basically, I got
|
||
tired of writing the same test code over and over, and finally added it
|
||
as a test point method on ParserElement.
|
||
|
||
- Added withClass helper method, a simplified version of withAttribute for
|
||
the common but annoying case when defining a filter on a div's class -
|
||
made difficult because 'class' is a Python reserved word.
|
||
|
||
|
||
Version 2.0.3 - October, 2014
|
||
-----------------------------
|
||
- Fixed escaping behavior in QuotedString. Formerly, only quotation
|
||
marks (or characters designated as quotation marks in the QuotedString
|
||
constructor) would be unescaped. Now all escaped characters will be
unescaped, and the escaping backslashes will be removed.
|
||
|
||
- Fixed regression in ParseResults.pop() - pop() was pretty much
|
||
broken after I added *improvements* in 2.0.2. Reported by Iain
|
||
Shelvington, thanks Iain!
|
||
|
||
- Fixed bug in And class when initializing using a generator.
|
||
|
||
- Enhanced ParseResults.dump() method to list out nested ParseResults that
|
||
are unnamed arrays of sub-structures.
|
||
|
||
- Fixed UnboundLocalError under Python 3.4 in oneOf method, reported
|
||
on Sourceforge by aldanor, thanks!
|
||
|
||
- Fixed bug in ParseResults __init__ method, when returning non-ParseResults
|
||
types from parse actions that implement __eq__. Raised during discussion
|
||
on the pyparsing wiki with cyrfer.
|
||
|
||
|
||
Version 2.0.2 - April, 2014
|
||
---------------------------
|
||
- Extended "expr(name)" shortcut (same as "expr.setResultsName(name)")
|
||
to accept "expr()" as a shortcut for "expr.copy()".
|
||
|
||
- Added "locatedExpr(expr)" helper, to decorate any returned tokens
|
||
with their location within the input string. Adds the results names
|
||
locn_start and locn_end to the output parse results.
|
||
|
||
- Added "pprint()" method to ParseResults, to simplify troubleshooting
|
||
and prettified output. Now instead of importing the pprint module
|
||
and then writing "pprint.pprint(result)", you can just write
|
||
"result.pprint()". This method also accepts additional positional and
|
||
keyword arguments (such as indent, width, etc.), which get passed
|
||
through directly to the pprint method
|
||
(see https://docs.python.org/2/library/pprint.html#pprint.pprint).
|
||
|
||
- Removed deprecation warnings when using '<<' for Forward expression
|
||
assignment. '<<=' is still preferred, but '<<' will be retained
|
||
for cases where '<<=' operator is not suitable (such as in defining
|
||
lambda expressions).
|
||
|
||
- Expanded argument compatibility for classes and functions that
|
||
take list arguments, to now accept generators as well.
|
||
|
||
- Extended list-like behavior of ParseResults, adding support for
|
||
append and extend. NOTE: if you have existing applications using
|
||
these names as results names, you will have to access them using
|
||
dict-style syntax: res["append"] and res["extend"]
|
||
|
||
- ParseResults emulates the change in list vs. iterator semantics for
|
||
methods like keys(), values(), and items(). Under Python 2.x, these
|
||
methods will return lists, under Python 3.x, these methods will
|
||
return iterators.
|
||
|
||
- ParseResults now has a method haskeys() which returns True or False
|
||
depending on whether any results names have been defined. This simplifies
|
||
testing for the existence of results names under Python 3.x, which
|
||
returns keys() as an iterator, not a list.
|
||
|
||
- ParseResults now supports both list and dict semantics for pop().
|
||
If passed no argument or an integer argument, it will use list semantics
|
||
and pop tokens from the list of parsed tokens. If passed a non-integer
|
||
argument (most likely a string), it will use dict semantics and
|
||
pop the corresponding value from any defined results names. A
|
||
second default return value argument is supported, just as in
|
||
dict.pop().
|
||
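The list and dict forms of pop(), together with haskeys(), can be sketched like this. The `person` grammar is hypothetical and only supplies some named results to work with.

```python
import pyparsing as pp

# Hypothetical grammar for illustration: a name followed by an age.
person = pp.Word(pp.alphas)("name") + pp.Word(pp.nums)("age")
res = person.parseString("bob 42")

has_names = res.haskeys()               # True: results names are defined
name = res.pop("name")                  # dict semantics: remove a named result
nickname = res.pop("nickname", "n/a")   # dict semantics with a default value
last = res.pop()                        # list semantics: remove the last token

print(has_names, name, nickname, last, res.asList())
```

Popping by name removes the results name but leaves the token list alone, which is why "bob" is still in the list at the end.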
|
||
- Fixed bug in markInputline, thanks for reporting this, Matt Grant!
|
||
|
||
- Cleaned up my unit test environment, now runs with Python 2.6 and
|
||
3.3.
|
||
|
||
|
||
Version 2.0.1 - July, 2013
|
||
--------------------------
|
||
- Removed use of "nonlocal" that prevented using this version of
|
||
pyparsing with Python 2.6 and 2.7. This will make it easier to
|
||
install for packages that depend on pyparsing, under Python
|
||
versions 2.6 and later. Those using older versions of Python
|
||
will have to manually install pyparsing 1.5.7.
|
||
|
||
- Fixed implementation of <<= operator to return self; reported by
|
||
Luc J. Bourhis, with patch fix by Mathias Mamsch - thanks, Luc
|
||
and Mathias!
|
||
|
||
|
||
Version 2.0.0 - November, 2012
|
||
------------------------------
|
||
- Rather than release another combined Python 2.x/3.x release
|
||
I've decided to start a new major version that is only
|
||
compatible with Python 3.x (and consequently Python 2.7 as
|
||
well due to backporting of key features). This version will
|
||
be the main development path from now on, with little follow-on
|
||
development on the 1.5.x path.
|
||
|
||
- Operator '<<' is now deprecated, in favor of operator '<<=' for
|
||
attaching parsing expressions to Forward() expressions. This is
|
||
being done to address precedence of operations problems with '<<'.
|
||
Operator '<<' will be removed in a future version of pyparsing.
|
||
|
||
|
||
Version 1.5.7 - November, 2012
|
||
------------------------------
|
||
- NOTE: This is the last release of pyparsing that will try to
|
||
maintain compatibility with Python versions < 2.6. The next
|
||
release of pyparsing will be version 2.0.0, using new Python
|
||
syntax that will not be compatible with Python version 2.5 or
|
||
older.
|
||
|
||
- An awesome new example is included in this release, submitted
|
||
by Luca DellOlio, for parsing ANTLR grammar definitions, nice
|
||
work Luca!
|
||
|
||
- Fixed implementation of ParseResults.__str__ to use Pythonic
|
||
''.join() instead of repeated string concatenation. This
|
||
purportedly has been a performance issue under PyPy.
|
||
|
||
- Fixed bug in ParseResults.__dir__ under Python 3, reported by
|
||
Thomas Kluyver, thank you Thomas!
|
||
|
||
- Added ParserElement.inlineLiteralsUsing static method, to
|
||
override pyparsing's default behavior of converting string
|
||
literals to Literal instances, to use other classes (such
|
||
as Suppress or CaselessLiteral).
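
A minimal sketch of the effect, assuming pyparsing is importable as `pp` (the date grammar and variable names are illustrative only). Because the setting is global, the sketch restores the default Literal class afterwards:

```python
import pyparsing as pp

# Default behavior: the "/" strings become Literal instances and appear in the results.
date_with_slashes = pp.Word(pp.nums) + "/" + pp.Word(pp.nums) + "/" + pp.Word(pp.nums)
default_tokens = date_with_slashes.parseString("1999/12/31").asList()

# Switch string literals to Suppress, build the same expression, then restore the default.
pp.ParserElement.inlineLiteralsUsing(pp.Suppress)
try:
    date_suppressed = pp.Word(pp.nums) + "/" + pp.Word(pp.nums) + "/" + pp.Word(pp.nums)
finally:
    pp.ParserElement.inlineLiteralsUsing(pp.Literal)
suppressed_tokens = date_suppressed.parseString("1999/12/31").asList()
```

The class in effect when an expression is built is the one that gets used, so expressions defined before the call keep their Literal instances.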
|
||
|
||
- Added new operator '<<=', which will eventually replace '<<' for
|
||
storing the contents of a Forward(). '<<=' does not have the same
|
||
operator precedence problems that '<<' does.
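
As a small illustration (the grammar below is a made-up example, not taken from the pyparsing sources), '<<=' attaches the whole alternation to the Forward without needing extra parentheses:

```python
import pyparsing as pp

number = pp.Word(pp.nums)
item = pp.Forward()
bracketed = pp.Group(pp.Suppress("[") + pp.ZeroOrMore(item) + pp.Suppress("]"))

# The complete right-hand side (number | bracketed) is stored in the Forward.
item <<= number | bracketed

nested_tokens = item.parseString("[1 [2 3] 4]").asList()
plain_tokens = item.parseString("42").asList()
```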
|
||
|
||
- 'operatorPrecedence' is being renamed 'infixNotation' as a better
|
||
description of what this helper function creates. 'operatorPrecedence'
|
||
is deprecated, and will be dropped entirely in a future release.
|
||
|
||
- Added optional arguments lpar and rpar to operatorPrecedence, so that
|
||
expressions that use it can override the default suppression of the
|
||
grouping characters.
|
||
|
||
- Added support for using single argument builtin functions as parse
|
||
actions. Now you can write 'expr.setParseAction(len)' and get back
|
||
the length of the list of matched tokens. Supported builtins are:
|
||
sum, len, sorted, reversed, list, tuple, set, any, all, min, and max.
|
||
A script demonstrating this feature is included in the examples
|
||
directory.
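
A short sketch of the idea, separate from the script shipped in the examples directory (names are illustrative). The numbers are converted to ints first so that sum has something to add:

```python
import pyparsing as pp

# Convert each matched number to an int before the builtin sees the token list.
integer = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))

count_of_numbers = pp.OneOrMore(integer).setParseAction(len)
sum_of_numbers = pp.OneOrMore(integer).setParseAction(sum)

count_result = count_of_numbers.parseString("10 20 30")[0]
sum_result = sum_of_numbers.parseString("10 20 30")[0]
```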
|
||
|
||
- Improved linking in generated docs, proposed on the pyparsing wiki
|
||
by techtonik, thanks!
|
||
|
||
- Fixed a bug in the definition of 'alphas', which was based on the
|
||
string.uppercase and string.lowercase "constants", which in fact
|
||
*aren't* constant, but vary with locale settings. This could make
|
||
parsers locale-sensitive in a subtle way. Thanks to Kef Schecter for
|
||
his diligence in following through on reporting and monitoring
|
||
this bugfix!
|
||
|
||
- Fixed a bug in the Py3 version of pyparsing, during exception
|
||
handling with packrat parsing enabled, reported by Catherine
|
||
Devlin - thanks Catherine!
|
||
|
||
- Fixed typo in ParseBaseException.__dir__, reported anonymously on
|
||
the SourceForge bug tracker, thank you Pyparsing User With No Name.
|
||
|
||
- Fixed bug in srange when using '\x###' hex character codes.
|
||
|
||
- Added optional 'intExpr' argument to countedArray, so that you
|
||
can define your own expression that will evaluate to an integer,
|
||
to be used as the count for the following elements. Allows you
|
||
to define a countedArray with the count given in hex, for example,
|
||
by defining intExpr as "Word(hexnums).setParseAction(lambda t: int(t[0],16))".
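
A hedged sketch of a hex-counted array (variable names are illustrative). The count expression is passed positionally; pyparsing 2.x returned the items wrapped in an extra list, which the last lines allow for:

```python
import pyparsing as pp

# The leading count is written in hex, so convert it with base 16.
hex_count = pp.Word(pp.hexnums).setParseAction(lambda t: int(t[0], 16))
hex_counted_numbers = pp.countedArray(pp.Word(pp.nums), hex_count)

# "A" is 10 in hex, so exactly ten numbers are read; the trailing 11 is left unparsed.
items = hex_counted_numbers.parseString("A 1 2 3 4 5 6 7 8 9 10 11").asList()
if items and isinstance(items[0], list):
    items = items[0]  # pyparsing 2.x wraps the items in an extra list
```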
|
||
|
||
|
||
Version 1.5.6 - June, 2011
|
||
----------------------------
|
||
- Cleanup of parse action normalizing code, to be more version-tolerant,
|
||
and robust in the face of future Python versions - much thanks to
|
||
Raymond Hettinger for this rewrite!
|
||
|
||
- Removal of exception caching, addressing a memory leak condition
|
||
in Python 3. Thanks to Michael Droettboom and the Cape Town PUG for
|
||
their analysis and work on this problem!
|
||
|
||
- Fixed bug when using packrat parsing, where a previously parsed
|
||
expression would duplicate subsequent tokens - reported by Frankie
|
||
Ribery on stackoverflow, thanks!
|
||
|
||
- Added 'ungroup' helper method, to address token grouping done
|
||
implicitly by And expressions, even if only one expression in the
|
||
And actually returns any text - also inspired by stackoverflow
|
||
discussion with Frankie Ribery!
|
||
|
||
- Fixed bug in srange, which accepted escaped hex characters of the
|
||
form '\0x##', but should be '\x##'. Both forms will be supported
|
||
for backwards compatibility.
|
||
|
||
- Enhancement to countedArray, accepting an optional expression to be
|
||
used for matching the leading integer count - proposed by Mathias on
|
||
the pyparsing mailing list, good idea!
|
||
|
||
- Added the Verilog parser to the provided set of examples, under the
|
||
MIT license. While this frees up this parser for any use, if you find
|
||
yourself using it for a commercial purpose, please consider making a
|
||
charitable donation as described in the parser's header.
|
||
|
||
- Added the excludeChars argument to the Word class, to simplify defining
|
||
a word composed of all characters in a large range except for one or
|
||
two. Suggested by JesterEE on the pyparsing wiki.
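
For example, a comma-separated field can be sketched as "any printable character except the comma" (the field grammar here is only an illustration):

```python
import pyparsing as pp

# Any run of printable characters except the comma.
field = pp.Word(pp.printables, excludeChars=",")
fields = field + pp.ZeroOrMore(pp.Suppress(",") + field)

tokens = fields.parseString("alpha,beta-2,gamma!").asList()
```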
|
||
|
||
- Added optional overlap parameter to scanString, to return overlapping
|
||
matches found in the source text.
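
A small sketch of the difference, using a deliberately simple expression. Each scanString result is a (tokens, start, end) tuple, and only the start positions are collected here:

```python
import pyparsing as pp

pair = pp.Literal("aa")
text = "aaaa"

# Without overlap, scanning resumes at the end of each match.
starts_default = [start for _tokens, start, _end in pair.scanString(text)]

# With overlap=True, scanning resumes one character after the start of each match.
starts_overlap = [start for _tokens, start, _end in pair.scanString(text, overlap=True)]
```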
|
||
|
||
- Updated oneOf internal regular expression generation, with improved
|
||
parse time performance.
|
||
|
||
- Slight performance improvement in transformString, removing empty
|
||
strings from the list of string fragments built while scanning the
|
||
source text, before calling ''.join. Especially useful when using
|
||
transformString to strip out selected text.
|
||
|
||
- Enhanced form of using the "expr('name')" style of results naming,
|
||
in lieu of calling setResultsName. If name ends with an '*', then
|
||
this is equivalent to expr.setResultsName('name',listAllMatches=True).
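
A brief sketch contrasting the two forms (the name "value" is arbitrary):

```python
import pyparsing as pp

integer = pp.Word(pp.nums)

# Without the trailing '*', the name refers to the last match only.
last_only = pp.OneOrMore(integer("value")).parseString("1 2 3")
last_value = last_only["value"]

# With the trailing '*', every match is kept under the name.
all_matches = pp.OneOrMore(integer("value*")).parseString("1 2 3")
all_values = all_matches["value"].asList()
```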
|
||
|
||
- Fixed up internal list flattener to use iteration instead of recursion,
|
||
to avoid stack overflow when transforming large files.
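
The general technique can be sketched as follows. This is a standalone illustration of iteration with an explicit stack, not pyparsing's internal flattener, and the function name is hypothetical:

```python
def flatten(nested):
    """Flatten arbitrarily nested lists without recursion, preserving order."""
    flat = []
    stack = [iter(nested)]  # an explicit stack of iterators replaces the call stack
    while stack:
        for item in stack[-1]:
            if isinstance(item, list):
                stack.append(iter(item))
                break  # descend into the sublist before continuing
            flat.append(item)
        else:
            stack.pop()  # the current iterator is exhausted
    return flat


# Build a list nested 5000 levels deep, which would overflow a naive recursive version.
deep = current = []
for i in range(5000):
    inner = [i]
    current.append(inner)
    current = inner

deep_flat = flatten(deep)
```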
|
||
|
||
- Added other new examples:
|
||
. protobuf parser - parses Google's protobuf language
|
||
. btpyparse - a BibTex parser contributed by Matthew Brett,
|
||
with test suite test_bibparse.py (thanks, Matthew!)
|
||
. groupUsingListAllMatches.py - demo using trailing '*' for results
|
||
names
|
||
|
||
|
||
Version 1.5.5 - August, 2010
|
||
----------------------------
|
||
|
||
- Typo in Python3 version of pyparsing, "builtin" should be "builtins".
|
||
(sigh)
|
||
|
||
|
||
Version 1.5.4 - August, 2010
|
||
----------------------------
|
||
|
||
- Fixed __builtins__ and file references in Python 3 code, thanks to
|
||
Greg Watson, saulspatz, sminos, and Mark Summerfield for reporting
|
||
their Python 3 experiences.
|
||
|
||
- Added new example, apicheck.py, as a sample of scanning a Tcl-like
|
||
language for functions with incorrect number of arguments (difficult
|
||
to track down in Tcl languages). This example uses some interesting
|
||
methods for capturing exceptions while scanning through source
|
||
code.
|
||
|
||
- Added new example deltaTime.py, that parses everyday time references
|
||
like "an hour from now", "2 days ago", "next Sunday at 2pm".
|
||
|
||
|
||
Version 1.5.3 - June, 2010
|
||
--------------------------
|
||
|
||
- ======= NOTE: API CHANGE!!!!!!! ===============
|
||
With this release, and henceforward, the pyparsing module is
|
||
imported as "pyparsing" on both Python 2.x and Python 3.x versions.
|
||
|
||
- Fixed up setup.py to auto-detect Python version and install the
|
||
correct version of pyparsing - suggested by Alex Martelli,
|
||
thanks, Alex! (and my apologies to all those who struggled with
|
||
those spurious installation errors caused by my earlier
|
||
fumblings!)
|
||
|
||
- Fixed bug on Python3 when using parseFile, getting bytes instead of
|
||
a str from the input file.
|
||
|
||
- Fixed subtle bug in originalTextFor, if followed by
|
||
significant whitespace (like a newline) - discovered by
|
||
Francis Vidal, thanks!
|
||
|
||
- Fixed very sneaky bug in Each, in which Optional elements were
|
||
not completely recognized as optional - found by Tal Weiss, thanks
|
||
for your patience.
|
||
|
||
- Fixed off-by-1 bug in line() method when the first line of the
|
||
input text was an empty line. Thanks to John Krukoff for submitting
|
||
a patch!
|
||
|
||
- Fixed bug in transformString if grammar contains Group expressions,
|
||
thanks to patch submitted by barnabas79, nice work!
|
||
|
||
- Fixed bug in originalTextFor in which trailing comments or otherwise
|
||
ignored text got slurped in with the matched expression. Thanks to
|
||
michael_ramirez44 on the pyparsing wiki for reporting this just in
|
||
time to get into this release!
|
||
|
||
- Added better support for summing ParseResults, see the new example,
|
||
parseResultsSumExample.py.
|
||
|
||
- Added support for composing a Regex using a compiled RE object;
|
||
thanks to my new colleague, Mike Thornton!
|
||
|
||
- In version 1.5.2, I changed the way exceptions are raised in order
|
||
to simplify the stacktraces reported during parsing. An anonymous
|
||
user posted a bug report on SF that this behavior makes it difficult
|
||
to debug some complex parsers, or parsers nested within parsers. In
|
||
this release I've added a class attribute ParserElement.verbose_stacktrace,
|
||
with a default value of False. If you set this to True, pyparsing will
|
||
report stacktraces using the pre-1.5.2 behavior.
|
||
|
||
- New examples:
|
||
|
||
. pymicko.py, a MicroC compiler submitted by Zarko Zivanov.
|
||
(Note: this example is separately licensed under the GPLv3,
|
||
and requires Python 2.6 or higher.) Thank you, Zarko!
|
||
|
||
. oc.py, a subset C parser, using the BNF from the 1996 Obfuscated C
|
||
Contest.
|
||
|
||
. stateMachine2.py, a modified version of stateMachine.py submitted
|
||
by Matt Anderson, that is compatible with Python versions 2.7 and
|
||
above - thanks so much, Matt!
|
||
|
||
. select_parser.py, a parser for reading SQLite SELECT statements,
|
||
as specified at https://www.sqlite.org/lang_select.html - this goes
|
||
into much more detail than the simple SQL parser included in pyparsing's
|
||
source code
|
||
|
||
. excelExpr.py, a *simplistic* first-cut at a parser for Excel
|
||
expressions, which I originally posted on comp.lang.python in January,
|
||
2010; beware, this parser omits many common Excel cases (addition of
|
||
numbers represented as strings, references to named ranges)
|
||
|
||
. cpp_enum_parser.py, a nice little parser posted by Mark Tolonen on
|
||
comp.lang.python in August, 2009 (redistributed here with Mark's
|
||
permission). Thanks a bunch, Mark!
|
||
|
||
. partial_gene_match.py, a sample I posted to Stackoverflow.com,
|
||
implementing a special variation on Literal that does "close" matching,
|
||
up to a given number of allowed mismatches. The application was to
|
||
find matching gene sequences, with allowance for one or two mismatches.
|
||
|
||
. tagCapture.py, a sample showing how to use a Forward placeholder to
|
||
enforce matching of text parsed in a previous expression.
|
||
|
||
. matchPreviousDemo.py, simple demo showing how the matchPreviousLiteral
|
||
helper method is used to match a previously parsed token.
|
||
|
||
|
||
Version 1.5.2 - April, 2009
|
||
------------------------------
|
||
- Added pyparsing_py3.py module, so that Python 3 users can use
|
||
pyparsing by changing their pyparsing import statement to:
|
||
|
||
import pyparsing_py3
|
||
|
||
Thanks for help from Patrick Laban and his friend Geremy
|
||
Condra on the pyparsing wiki.
|
||
|
||
- Removed __slots__ declaration on ParseBaseException, for
|
||
compatibility with IronPython 2.0.1. Raised by David
|
||
Lawler on the pyparsing wiki, thanks David!
|
||
|
||
- Fixed bug in SkipTo/failOn handling - caught by eagle eye
|
||
cpennington on the pyparsing wiki!
|
||
|
||
- Fixed second bug in SkipTo when using the ignore constructor
|
||
argument, reported by Catherine Devlin, thanks!
|
||
|
||
- Fixed obscure bug reported by Eike Welk when using a class
|
||
as a ParseAction with an errant __getitem__ method.
|
||
|
||
- Simplified exception stack traces when reporting parse
|
||
exceptions back to caller of parseString or parseFile - thanks
|
||
to a tip from Peter Otten on comp.lang.python.
|
||
|
||
- Changed behavior of scanString to avoid infinitely looping on
|
||
expressions that match zero-length strings. Prompted by a
|
||
question posted by ellisonbg on the wiki.
|
||
|
||
- Enhanced classes that take a list of expressions (And, Or,
|
||
MatchFirst, and Each) to accept generator expressions also.
|
||
This can be useful when generating lists of alternative
|
||
expressions, as in this case, where the user wanted to match
|
||
any repetitions of '+', '*', '#', or '.', but not mixtures
|
||
of them (that is, match '+++', but not '+-+'):
|
||
|
||
codes = "+*#."
|
||
format = MatchFirst(Word(c) for c in codes)
|
||
|
||
Based on a problem posed by Denis Spir on the Python tutor
|
||
list.
|
||
|
||
- Added new example eval_arith.py, which extends the example
|
||
simpleArith.py to actually evaluate the parsed expressions.
|
||
|
||
|
||
Version 1.5.1 - October, 2008
|
||
-------------------------------
|
||
- Added new helper method originalTextFor, to replace the use of
|
||
the current keepOriginalText parse action. Now instead of
|
||
using the parse action, as in:
|
||
|
||
fullName = Word(alphas) + Word(alphas)
|
||
fullName.setParseAction(keepOriginalText)
|
||
|
||
(in this example, we used keepOriginalText to restore any white
|
||
space that may have been skipped between the first and last
|
||
names)
|
||
You can now write:
|
||
|
||
fullName = originalTextFor(Word(alphas) + Word(alphas))
|
||
|
||
The implementation of originalTextFor is simpler and faster than
|
||
keepOriginalText, and does not depend on using the inspect or
|
||
imp modules.
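
A quick sketch of what this returns, using the fullName grammar from above with a sample input of my own:

```python
import pyparsing as pp

full_name = pp.Word(pp.alphas) + pp.Word(pp.alphas)

# The plain expression returns two tokens, with the skipped whitespace lost.
separate_tokens = full_name.parseString("John    Smith").asList()

# originalTextFor returns the matched slice of the input as a single string.
original_text = pp.originalTextFor(full_name).parseString("John    Smith")[0]
```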
|
||
|
||
- Added optional parseAll argument to parseFile, to be consistent
|
||
with parseAll argument to parseString. Posted by pboucher on the
|
||
pyparsing wiki, thanks!
|
||
|
||
- Added failOn argument to SkipTo, so that grammars can define
|
||
literal strings or pyparsing expressions which, if found in the
|
||
skipped text, will cause SkipTo to fail. Useful to prevent
|
||
SkipTo from reading past terminating expression. Instigated by
|
||
question posed by Aki Niimura on the pyparsing wiki.
|
||
|
||
- Fixed bug in nestedExpr if multi-character expressions are given
|
||
for nesting delimiters. Patch provided by new pyparsing user,
|
||
Hans-Martin Gaudecker - thanks, H-M!
|
||
|
||
- Removed dependency on xml.sax.saxutils.escape, and included
|
||
internal implementation instead - proposed by Mike Droettboom on
|
||
the pyparsing mailing list, thanks Mike! Also fixed erroneous
|
||
mapping in replaceHTMLEntity of &quot; to ', now correctly maps
|
||
to ". (Also added support for mapping &apos; to '.)
|
||
|
||
- Fixed typo in ParseResults.insert, found by Alejandro Dubrovsky,
|
||
good catch!
|
||
|
||
- Added __dir__() methods to ParseBaseException and ParseResults,
|
||
to support new dir() behavior in Py2.6 and Py3.0. If dir() is
|
||
called on a ParseResults object, the returned list will include
|
||
the base set of attribute names, plus any results names that are
|
||
defined.
|
||
|
||
- Fixed bug in ParseResults.asXML(), in which the first named
|
||
item within a ParseResults gets reported with an <ITEM> tag
|
||
instead of with the correct results name.
|
||
|
||
- Fixed bug in '-' error stop, when '-' operator is used inside a
|
||
Combine expression.
|
||
|
||
- Reverted generator expression to use list comprehension, for
|
||
better compatibility with old versions of Python. Reported by
|
||
jester/artixdesign on the SourceForge pyparsing discussion list.
|
||
|
||
- Fixed bug in parseString(parseAll=True), when the input string
|
||
ends with a comment or whitespace.
|
||
|
||
- Fixed bug in LineStart and LineEnd that did not recognize any
|
||
special whitespace chars defined using ParserElement.setDefaultWhitespaceChars,
found while debugging an issue for Marek Kubica,
|
||
thanks for the new test case, Marek!
|
||
|
||
- Made Forward class more tolerant of subclassing.
|
||
|
||
|
||
Version 1.5.0 - June, 2008
|
||
--------------------------
|
||
This version of pyparsing includes work on two long-standing
|
||
FAQs: support for forcing parsing of the complete input string
|
||
(without having to explicitly append StringEnd() to the grammar),
|
||
and a method to improve the mechanism of detecting where syntax
|
||
errors occur in an input string with various optional and
|
||
alternative paths. This release also includes a helper method
|
||
to simplify definition of indentation-based grammars. With
|
||
these changes (and the past few minor updates), I thought it was
|
||
finally time to bump the minor rev number on pyparsing - so
|
||
1.5.0 is now available! Read on...
|
||
|
||
- AT LAST!!! You can now call parseString and have it raise
|
||
an exception if the expression does not parse the entire
|
||
input string. This has been an FAQ for a LONG time.
|
||
|
||
The parseString method now includes an optional parseAll
|
||
argument (default=False). If parseAll is set to True, then
|
||
the given parse expression must parse the entire input
|
||
string. (This is equivalent to adding StringEnd() to the
|
||
end of the expression.) The default value is False to
|
||
retain backward compatibility.
|
||
|
||
Inspired by MANY requests over the years, most recently by
|
||
ecir-hana on the pyparsing wiki!
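
A minimal sketch of the two behaviors (the input text is just an example):

```python
import pyparsing as pp

integer = pp.Word(pp.nums)

# Default (parseAll=False): parsing stops quietly after the leading match.
partial = integer.parseString("123 abc").asList()

# parseAll=True: the unparsed " abc" now causes a ParseException.
try:
    integer.parseString("123 abc", parseAll=True)
    rejected = False
except pp.ParseException:
    rejected = True
```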
|
||
|
||
- Added new operator '-' for composing grammar sequences. '-'
|
||
behaves just like '+' in creating And expressions, but '-'
|
||
is used to mark grammar structures that should stop parsing
|
||
immediately and report a syntax error, rather than just
|
||
backtracking to the last successful parse and trying another
|
||
alternative. For instance, running the following code:
|
||
|
||
port_definition = Keyword("port") + '=' + Word(nums)
|
||
entity_definition = (Keyword("entity") + "{" +
    Optional(port_definition) + "}")
|
||
|
||
entity_definition.parseString("entity { port 100 }")
|
||
|
||
pyparsing fails to detect the missing '=' in the port definition.
|
||
But, since this expression is optional, pyparsing then proceeds
|
||
to try to match the closing '}' of the entity_definition. Not
|
||
finding it, pyparsing reports that there was no '}' after the '{'
|
||
character. Instead, we would like pyparsing to parse the 'port'
|
||
keyword, and if not followed by an equals sign and an integer,
|
||
to signal this as a syntax error.
|
||
|
||
This can now be done simply by changing the port_definition to:
|
||
|
||
port_definition = Keyword("port") - '=' + Word(nums)
|
||
|
||
Now after successfully parsing 'port', pyparsing must also find
|
||
an equals sign and an integer, or it will raise a fatal syntax
|
||
exception.
|
||
|
||
By judicious insertion of '-' operators, a pyparsing developer
|
||
can have their grammar report much more informative syntax error
|
||
messages.
|
||
|
||
Patches and suggestions proposed by several contributors on
|
||
the pyparsing mailing list and wiki - special thanks to
|
||
Eike Welk and Thomas/Poldy on the pyparsing wiki!
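
The difference can be observed in the exception type that is raised. The sketch below reuses the port/entity grammar from this entry; the helper functions `entity_grammar` and `failure_type` are hypothetical names added for the illustration:

```python
import pyparsing as pp


def entity_grammar(port_definition):
    """Build the entity grammar around the given port definition."""
    return pp.Keyword("entity") + "{" + pp.Optional(port_definition) + "}"


backtracking = entity_grammar(pp.Keyword("port") + "=" + pp.Word(pp.nums))
error_stop = entity_grammar(pp.Keyword("port") - "=" + pp.Word(pp.nums))


def failure_type(grammar, text):
    """Return the name of the exception raised while parsing, or None on success."""
    try:
        grammar.parseString(text)
    except pp.ParseBaseException as exc:
        return type(exc).__name__
    return None


bad_text = "entity { port 100 }"
with_plus = failure_type(backtracking, bad_text)
with_minus = failure_type(error_stop, bad_text)
```

With '+', the failure surfaces as an ordinary ParseException about the missing '}'; with '-', a ParseSyntaxException is raised at the point where the '=' was expected.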
|
||
|
||
- Added indentedBlock helper method, to encapsulate the parse
|
||
actions and indentation stack management needed to keep track of
|
||
indentation levels. Use indentedBlock to define
indentation-based grouping grammars, like Python's.
|
||
|
||
indentedBlock takes up to 3 parameters:
|
||
- blockStatementExpr - expression defining syntax of statement
|
||
that is repeated within the indented block
|
||
- indentStack - list created by caller to manage indentation
|
||
stack (multiple indentedBlock expressions
|
||
within a single grammar should share a common indentStack)
|
||
- indent - boolean indicating whether block must be indented
|
||
beyond the current level; set to False for block of
|
||
left-most statements (default=True)
|
||
|
||
A valid block must contain at least one indented statement.
|
||
|
||
- Fixed bug in nestedExpr in which ignored expressions needed
|
||
to be set off with whitespace. Reported by Stefaan Himpe,
|
||
nice catch!
|
||
|
||
- Expanded multiplication of an expression by a tuple, to
|
||
accept tuple values of None:
|
||
. expr*(n,None) or expr*(n,) is equivalent
|
||
to expr*n + ZeroOrMore(expr)
|
||
(read as "at least n instances of expr")
|
||
. expr*(None,n) is equivalent to expr*(0,n)
|
||
(read as "0 to n instances of expr")
|
||
. expr*(None,None) is equivalent to ZeroOrMore(expr)
|
||
. expr*(1,None) is equivalent to OneOrMore(expr)
|
||
|
||
Note that expr*(None,n) does not raise an exception if
|
||
more than n exprs exist in the input stream; that is,
|
||
expr*(None,n) does not enforce a maximum number of expr
|
||
occurrences. If this behavior is desired, then write
|
||
expr*(None,n) + ~expr
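
The tuple forms above can be tried out with a sketch like this one (the `parses` helper is a hypothetical convenience that returns None when parsing fails):

```python
import pyparsing as pp

word = pp.Word(pp.alphas)


def parses(expr, text):
    """Return the parsed tokens as a list, or None if parsing fails."""
    try:
        return expr.parseString(text).asList()
    except pp.ParseException:
        return None


at_least_two = word * (2, None)          # word*2 + ZeroOrMore(word)
up_to_two = word * (None, 2)             # 0 to 2 words, extra words are left unparsed
at_most_two = word * (None, 2) + ~word   # 0 to 2 words, and no word may follow
```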
|
||
|
||
- Added None as a possible operator for operatorPrecedence.
|
||
None signifies "no operator", as in multiplying m times x
|
||
in "y=mx+b".
|
||
|
||
- Fixed bug in Each, reported by Michael Ramirez, in which the
|
||
order of terms in the Each affected the parsing of the results.
|
||
Problem was due to premature grouping of the expressions in
|
||
the overall Each during grammar construction, before the
|
||
complete Each was defined. Thanks, Michael!
|
||
|
||
- Also fixed bug in Each in which Optional's with default values
|
||
were not getting the defaults added to the results of the
|
||
overall Each expression.
|
||
|
||
- Fixed a bug in Optional in which results names were not
|
||
assigned if a default value was supplied.
|
||
|
||
- Cleaned up Py3K compatibility statements, including exception
|
||
construction statements, and better equivalence between _ustr
|
||
and basestring, and __nonzero__ and __bool__.
|
||
|
||
|
||
Version 1.4.11 - February, 2008
|
||
-------------------------------
|
||
- With help from Robert A. Clark, this version of pyparsing
|
||
is compatible with Python 3.0a3. Thanks for the help,
|
||
Robert!
|
||
|
||
- Added WordStart and WordEnd positional classes, to support
|
||
expressions that must occur at the start or end of a word.
|
||
Proposed by piranha on the pyparsing wiki, good idea!
|
||
|
||
- Added matchOnlyAtCol helper parse action, to simplify
|
||
parsing log or data files that have optional fields that are
|
||
column dependent. Inspired by a discussion thread with
|
||
hubritic on comp.lang.python.
|
||
|
||
- Added withAttribute.ANY_VALUE as a match-all value when using
|
||
withAttribute. Used to ensure that an attribute is present,
|
||
without having to match on the actual attribute value.
|
||
|
||
- Added get() method to ParseResults, similar to dict.get().
|
||
Suggested by new pyparsing user, Alejandro Dubrovsky, thanks!
|
||
|
||
- Added '==' short-cut to see if a given string matches a
|
||
pyparsing expression. For instance, you can now write:
|
||
|
||
integer = Word(nums)
|
||
if "123" == integer:
|
||
# do something
|
||
|
||
print [ x for x in "123 234 asld".split() if x==integer ]
|
||
# prints ['123', '234']
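
Under Python 3 the same shortcut can be sketched like this (the comparison succeeds only if the expression matches the whole string):

```python
import pyparsing as pp

integer = pp.Word(pp.nums)

is_match = "123" == integer
matching_words = [x for x in "123 234 asld".split() if x == integer]
```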
|
||
|
||
- Simplified the use of nestedExpr when using an expression for
|
||
the opening or closing delimiters. Now the content expression
|
||
will not have to explicitly negate closing delimiters. Found
|
||
while working with dfinnie on GHOP Task #277, thanks!
|
||
|
||
- Fixed bug when defining ignorable expressions that are
|
||
later enclosed in a wrapper expression (such as ZeroOrMore,
|
||
OneOrMore, etc.) - found while working with Prabhu
|
||
Gurumurthy, thanks Prabhu!
|
||
|
||
- Fixed bug in withAttribute in which keys were automatically
|
||
converted to lowercase, making it impossible to match XML
|
||
attributes with uppercase characters in them. Using
withAttribute requires that you reference attributes in all
|
||
lowercase if parsing HTML, and in correct case when parsing
|
||
XML.
|
||
|
||
- Changed '<<' operator on Forward to return None, since this
|
||
is really used as a pseudo-assignment operator, not as a
|
||
left-shift operator. By returning None, it is easier to
|
||
catch faulty statements such as a << b | c, where precedence
|
||
of operations causes the '|' operation to be performed
|
||
*after* inserting b into a, so no alternation is actually
|
||
implemented. The correct form is a << (b | c). With this
|
||
change, an error will be reported instead of silently
|
||
clipping the alternative term. (Note: this may break some
|
||
existing code, but if it does, the code had a silent bug in
|
||
it anyway.) Proposed by wcbarksdale on the pyparsing wiki,
|
||
thanks!
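
The precedence issue is a property of Python itself and can be checked with the standard ast module, without pyparsing. The `grouping` helper below is a hypothetical illustration that only handles names and the '<<', '|' and '<<=' operators:

```python
import ast


def grouping(source):
    """Show how Python groups the operators in a one-line statement."""

    def show(node):
        if isinstance(node, ast.BinOp):
            symbol = {ast.LShift: "<<", ast.BitOr: "|"}[type(node.op)]
            return "(" + show(node.left) + " " + symbol + " " + show(node.right) + ")"
        return node.id  # a plain variable name

    statement = ast.parse(source).body[0]
    if isinstance(statement, ast.AugAssign):
        # Augmented assignment: the entire right-hand side is evaluated first.
        return show(statement.target) + " <<= " + show(statement.value)
    return show(statement.value)


faulty = grouping("a << b | c")
correct = grouping("a << (b | c)")
augmented = grouping("a <<= b | c")
```

Here `faulty` shows the shift being applied before the '|', which is the silent clipping described in this entry.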
|
||
|
||
- Several unit tests were added to pyparsing's regression
|
||
suite, courtesy of the Google Highly-Open Participation
|
||
Contest. Thanks to all who administered and took part in
|
||
this event!
|
||
|
||
|
||
Version 1.4.10 - December 9, 2007
|
||
---------------------------------
|
||
- Fixed bug introduced in v1.4.8, parse actions were called for
|
||
intermediate operator levels, not just the deepest matching
|
||
operation level. Again, big thanks to Torsten Marek for
|
||
helping isolate this problem!
|
||
|
||
|
||
Version 1.4.9 - December 8, 2007
|
||
--------------------------------
|
||
- Added '*' multiplication operator support when creating
|
||
grammars, accepting either an integer, or a two-integer
|
||
tuple multiplier, as in:
|
||
ipAddress = Word(nums) + ('.'+Word(nums))*3
|
||
usPhoneNumber = Word(nums) + ('-'+Word(nums))*(1,2)
|
||
If multiplying by a tuple, the two integer values represent
|
||
min and max multiples. Suggested by Vincent of eToy.com,
|
||
great idea, Vincent!
|
||
|
||
- Fixed bug in nestedExpr, original version was overly greedy!
|
||
Thanks to Michael Ramirez for raising this issue.
|
||
|
||
- Fixed internal bug in ParseResults - when an item was deleted,
|
||
the key indices were not updated. Thanks to Tim Mitchell for
|
||
posting a bugfix patch to the SF bug tracking system!
|
||
|
||
- Fixed internal bug in operatorPrecedence - when the results of
|
||
a right-associative term were sent to a parse action, the wrong
|
||
tokens were sent. Reported by Torsten Marek, nice job!
|
||
|
||
- Added pop() method to ParseResults. If pop is called with an
|
||
integer or with no arguments, it will use list semantics and
|
||
update the ParseResults' list of tokens. If pop is called with
|
||
a non-integer (a string, for instance), then it will use dict
|
||
semantics and update the ParseResults' internal dict.
|
||
Suggested by Donn Ingle, thanks Donn!
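
A small sketch of both call styles (the grammar and results names are illustrative). Note that popping by name removes the name from the internal dict but leaves the token list alone:

```python
import pyparsing as pp

person = pp.Word(pp.alphas)("name") + pp.Word(pp.nums)("age")
result = person.parseString("bob 42")

# List semantics: no argument pops the last token from the token list.
last_token = result.pop()
remaining = result.asList()

# Dict semantics: a string argument pops the named result.
name = result.pop("name")
name_still_defined = "name" in result
```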
|
||
|
||
- Fixed quoted string built-ins to accept '\xHH' hex characters
|
||
within the string.
|
||
|
||
|
||
Version 1.4.8 - October, 2007
|
||
-----------------------------
|
||
- Added new helper method nestedExpr to easily create expressions
|
||
that parse lists of data in nested parentheses, braces, brackets,
|
||
etc.
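
For instance, with the default parentheses and with explicit braces (the inputs are made-up samples):

```python
import pyparsing as pp

# Default delimiters are "(" and ")".
tree = pp.nestedExpr().parseString("(a (b c) d)").asList()

# Other delimiters can be given as the opener and closer arguments.
braces = pp.nestedExpr("{", "}").parseString("{x {y}}").asList()
```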
|
||
|
||
- Added withAttribute parse action helper, to simplify creating
|
||
filtering parse actions to attach to expressions returned by
|
||
makeHTMLTags and makeXMLTags. Use withAttribute to qualify a
|
||
starting tag with one or more required attribute values, to avoid
|
||
false matches on common tags such as <TD> or <DIV>.
|
||
|
||
- Added new examples nested.py and withAttribute.py to demonstrate
|
||
the new features.
|
||
|
||
- Added performance speedup to grammars using operatorPrecedence,
|
||
instigated by Stefan Reichör - thanks for the feedback, Stefan!
|
||
|
||
- Fixed bug/typo when deleting an element from a ParseResults by
|
||
using the element's results name.
|
||
|
||
- Fixed whitespace-skipping bug in wrapper classes (such as Group,
|
||
Suppress, Combine, etc.) and when using setDebug(), reported by
|
||
new pyparsing user dazzawazza on SourceForge, nice job!
|
||
|
||
- Added restriction to prevent defining Word or CharsNotIn expressions
|
||
with minimum length of 0 (should use Optional if this is desired),
|
||
and enhanced docstrings to reflect this limitation. Issue was
|
||
raised by Joey Tallieu, who submitted a patch with a slightly
|
||
different solution. Thanks for taking the initiative, Joey, and
|
||
please keep submitting your ideas!
|
||
|
||
- Fixed bug in makeHTMLTags that did not detect HTML tag attributes
|
||
with no '= value' portion (such as "<td nowrap>"), reported by
|
||
hamidh on the pyparsing wiki - thanks!
|
||
|
||
- Fixed minor bug in makeHTMLTags and makeXMLTags, which did not
|
||
accept whitespace in closing tags.
|
||
|
||
|
||
Version 1.4.7 - July, 2007
|
||
--------------------------
|
||
- NEW NOTATION SHORTCUT: ParserElement now accepts results names using
|
||
a notational shortcut, following the expression with the results name
|
||
in parentheses. So this:
|
||
|
||
stats = "AVE:" + realNum.setResultsName("average") + \
|
||
"MIN:" + realNum.setResultsName("min") + \
|
||
"MAX:" + realNum.setResultsName("max")
|
||
|
||
can now be written as this:
|
||
|
||
stats = "AVE:" + realNum("average") + \
|
||
"MIN:" + realNum("min") + \
|
||
"MAX:" + realNum("max")
|
||
|
||
The intent behind this change is to make it simpler to define results
|
||
names for significant fields within the expression, while keeping
|
||
the grammar syntax clean and uncluttered.
|
||
|
||
- Fixed bug when packrat parsing is enabled, with cached ParseResults
|
||
being updated by subsequent parsing. Reported on the pyparsing
|
||
wiki by Kambiz, thanks!
|
||
|
||
- Fixed bug in operatorPrecedence for unary operators with left
|
||
associativity, if multiple operators were given for the same term.
|
||
|
||
- Fixed bug in example simpleBool.py, corrected precedence of "and" vs.
|
||
"or" operations.
|
||
|
||
- Fixed bug in Dict class, in which keys were converted to strings
|
||
whether they needed to be or not. Have narrowed this logic to
|
||
convert keys to strings only if the keys are ints (which would
|
||
confuse __getitem__ behavior for list indexing vs. key lookup).
|
||
|
||
- Added ParserElement method setBreak(), which will invoke the pdb
|
||
module's set_trace() function when this expression is about to be
|
||
parsed.
|
||
|
||
- Fixed bug in StringEnd in which reading off the end of the input
|
||
string raises an exception - should match. Resolved while
|
||
answering a question for Shawn on the pyparsing wiki.
|
||
|
||
|
||
Version 1.4.6 - April, 2007
|
||
---------------------------
|
||
- Simplified constructor for ParseFatalException, to support common
|
||
exception construction idiom:
|
||
raise ParseFatalException, "unexpected text: 'Spanish Inquisition'"
|
||
|
||
- Added method getTokensEndLoc(), to be called from within a parse action,
|
||
for those parse actions that need both the starting *and* ending
|
||
location of the parsed tokens within the input text.
|
||
|
||
- Enhanced behavior of keepOriginalText so that named parse fields are
|
||
preserved, even though tokens are replaced with the original input
|
||
text matched by the current expression. Also, cleaned up the stack
|
||
traversal to be more robust. Suggested by Tim Arnold - thanks, Tim!
|
||
|
||
- Fixed subtle bug in which countedArray (and similar dynamic
|
||
expressions configured in parse actions) failed to match within Or,
|
||
Each, FollowedBy, or NotAny. Reported by Ralf Vosseler, thanks for
|
||
your patience, Ralf!
|
||
|
||
- Fixed Unicode bug in upcaseTokens and downcaseTokens parse actions,
|
||
scanString, and default debugging actions; reported (and patch submitted)
|
||
by Nikolai Zamkovoi, spasibo!
|
||
|
||
- Fixed bug when saving a tuple as a named result. The returned
|
||
token list gave the proper tuple value, but accessing the result by
|
||
name only gave the first element of the tuple. Reported by
|
||
Poromenos, nice catch!
|
||
|
||
- Fixed bug in makeHTMLTags/makeXMLTags, which failed to match tag
|
||
attributes with namespaces.
|
||
|
||
- Fixed bug in SkipTo when setting include=True, to have the skipped-to
|
||
tokens correctly included in the returned data. Reported by gunars on
|
||
the pyparsing wiki, thanks!
|
||
|
||
- Fixed typobug in OnceOnly.reset method, omitted self argument.
|
||
Submitted by eike welk, thanks for the lint-picking!
|
||
|
||
- Added performance enhancement to Forward class, suggested by
|
||
akkartik on the pyparsing Wiki discussion, nice work!
|
||
|
||
- Added optional asKeyword to Word constructor, to indicate that the
|
||
given word pattern should be matched only as a keyword, that is, it
|
||
should only match if it is within word boundaries.

- Added S-expression parser to examples directory.

- Added macro substitution example to examples directory.

- Added holaMundo.py example, excerpted from Marco Alfonso's blog -
  muchas gracias, Marco!

- Modified internal cyclic references in ParseResults to use weakrefs;
  this should help reduce the memory footprint of large parsing
  programs, at some cost to performance (3-5%). Suggested by bca48150 on
  the pyparsing wiki, thanks!

- Enhanced the documentation describing the vagaries and idiosyncrasies
  of parsing strings with embedded tabs, and the impact on:
  . parse actions
  . scanString
  . col and line helper functions
  (Suggested by eike welk in response to some unexplained inconsistencies
  between parsed location and offsets in the input string.)

- Cleaned up internal decorators to preserve function names,
  docstrings, etc.


Version 1.4.5 - December, 2006
------------------------------
- Removed debugging print statement from QuotedString class. Sorry
  for not stripping this out before the 1.4.4 release!

- A significant performance improvement, the first one in a while!
  For my Verilog parser, this version of pyparsing is about double the
  speed - YMMV.

- Added support for pickling of ParseResults objects. (Reported by
  Jeff Poole, thanks Jeff!)
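
A small round-trip sketch, assuming pyparsing 3.x method names (parse_string, as_list). The grammar and the input string are hypothetical.

```python
import pickle

import pyparsing as pp

# key = value grammar with results names on both fields
entry = pp.Word(pp.alphas)("key") + pp.Suppress("=") + pp.Word(pp.nums)("value")
parsed = entry.parse_string("width = 42")

# Serialize and restore the ParseResults, names included.
restored = pickle.loads(pickle.dumps(parsed))

print(restored.as_list(), restored["key"], restored["value"])
```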

- Fixed minor bug in makeHTMLTags that did not recognize tag attributes
  with embedded '-' or '_' characters. Also, added support for
  passing expressions to makeHTMLTags and makeXMLTags, and used this
  feature to define the globals anyOpenTag and anyCloseTag.

- Fixed error in alphas8bit, I had omitted the y-with-umlaut character.

- Added punc8bit string to complement alphas8bit - it contains all the
  non-alphabetic, non-blank 8-bit characters.

- Added commonHTMLEntity expression, to match common HTML "ampersand"
  codes, such as "&lt;", "&gt;", "&amp;", "&nbsp;", and "&quot;". This
  expression also defines a results name 'entity', which can be used
  to extract the entity field (that is, "lt", "gt", etc.). Also added
  built-in parse action replaceHTMLEntity, which can be attached to
  commonHTMLEntity to translate "&lt;", "&gt;", "&amp;", "&nbsp;", and
  "&quot;" to "<", ">", "&", " ", and "'".

- Added example, htmlStripper.py, that strips HTML tags and scripts
  from HTML pages. It also translates common HTML entities to their
  respective characters.
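
The entity translation described for commonHTMLEntity and replaceHTMLEntity can be sketched as follows. This assumes pyparsing 3.x, where the helpers are named common_html_entity and replace_html_entity; the input text is made up.

```python
import pyparsing as pp

# copy() so the shared module-level expression is left untouched
entity = pp.common_html_entity.copy().set_parse_action(pp.replace_html_entity)

# transform_string rewrites each matched entity and keeps the rest of the text
decoded = entity.transform_string("a &lt; b &amp;&amp; c &gt; d")

print(decoded)
```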


Version 1.4.4 - October, 2006
-------------------------------
- Fixed traceParseAction decorator to also trap and record exception
  returns from parse actions, and to handle parse actions with 0,
  1, 2, or 3 arguments.

- Enhanced parse action normalization to support using classes as
  parse actions; that is, the class constructor is called at parse
  time and the __init__ function is called with 0, 1, 2, or 3
  arguments. If passing a class as a parse action, the __init__
  method must use one of the valid parse action parameter list
  formats. (This technique is useful when using pyparsing to compile
  parsed text into a series of application objects - see the new
  example simpleBool.py.)
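
A short sketch of a class used as a parse action, assuming pyparsing 3.x names. The Variable class is hypothetical and is not taken from simpleBool.py.

```python
import pyparsing as pp


class Variable:
    """Application object built directly from the matched tokens."""

    def __init__(self, tokens):
        # one-argument form: only the parsed tokens are passed in
        self.name = tokens[0]


# The class itself is the parse action; its instance replaces the tokens.
identifier = pp.Word(pp.alphas).set_parse_action(Variable)
result = identifier.parse_string("total")

print(type(result[0]).__name__, result[0].name)
```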

- Fixed bug in ParseResults when setting an item using an integer
  index. (Reported by Christopher Lambacher, thanks!)

- Fixed whitespace-skipping bug, patch submitted by Paolo Losi -
  grazie, Paolo!

- Fixed bug when a Combine contained an embedded Forward expression,
  reported by cie on the pyparsing wiki - good catch!

- Fixed listAllMatches bug, when a listAllMatches result was
  nested within another result. (Reported by don pasquale on
  comp.lang.python, well done!)

- Fixed bug in ParseResults items() method, when returning an item
  marked as listAllMatches=True.

- Fixed bug in definition of cppStyleComment (and javaStyleComment)
  in which '//' line comments were not continued to the next line
  if the line ends with a '\'. (Reported by eagle-eyed Ralph
  Corderoy!)

- Optimized re's for cppStyleComment and quotedString for better
  re performance - also provided by Ralph Corderoy, thanks!

- Added new example, indentedGrammarExample.py, showing how to
  define a grammar using indentation to show grouping (as Python
  does for defining statement nesting). Instigated by an e-mail
  discussion with Andrew Dalke, thanks Andrew!

- Added new helper operatorPrecedence (based on e-mail list discussion
  with Ralph Corderoy and Paolo Losi), to facilitate definition of
  grammars for expressions with unary and binary operators. For
  instance, this grammar defines a 6-function arithmetic expression
  grammar, with unary plus and minus, proper operator precedence, and
  right- and left-associativity:

    expr = operatorPrecedence( operand,
        [("!", 1, opAssoc.LEFT),
         ("^", 2, opAssoc.RIGHT),
         (oneOf("+ -"), 1, opAssoc.RIGHT),
         (oneOf("* /"), 2, opAssoc.LEFT),
         (oneOf("+ -"), 2, opAssoc.LEFT),]
        )

  Also added examples simpleArith.py and simpleBool.py to provide
  more detailed code samples using this new helper method.
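
A runnable sketch of the same idea, assuming pyparsing 3.x, where this helper is available as infix_notation and the associativity constants as OpAssoc. It uses a reduced operator table and a made-up input.

```python
import pyparsing as pp

operand = pp.Word(pp.nums)

# Operators are listed from highest to lowest precedence.
expr = pp.infix_notation(
    operand,
    [
        (pp.one_of("+ -"), 1, pp.OpAssoc.RIGHT),  # unary sign
        (pp.one_of("* /"), 2, pp.OpAssoc.LEFT),
        (pp.one_of("+ -"), 2, pp.OpAssoc.LEFT),
    ],
)

tree = expr.parse_string("1 + 2 * -3", parse_all=True).as_list()
print(tree)
```

Each precedence level that actually applies shows up as a nested list in the result.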

- Added new helpers matchPreviousLiteral and matchPreviousExpr, for
  creating adaptive parsing expressions that match the same content
  as was parsed in a previous parse expression. For instance:

    first = Word(nums)
    matchExpr = first + ":" + matchPreviousLiteral(first)

  will match "1:1", but not "1:2". Since this matches at the literal
  level, this will also match the leading "1:1" in "1:10".

  In contrast:

    first = Word(nums)
    matchExpr = first + ":" + matchPreviousExpr(first)

  will *not* match the leading "1:1" in "1:10"; the expressions are
  evaluated first, and then compared, so "1" is compared with "10".
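
The contrast between the two helpers can be checked directly. This sketch assumes pyparsing 3.x, where they are named match_previous_literal and match_previous_expr.

```python
import pyparsing as pp

# Separate 'first' expressions so each helper attaches to its own element.
first_a = pp.Word(pp.nums)
by_literal = first_a + ":" + pp.match_previous_literal(first_a)

first_b = pp.Word(pp.nums)
by_expr = first_b + ":" + pp.match_previous_expr(first_b)

# Literal-level repeat: matches the leading "1:1" of "1:10".
literal_tokens = by_literal.parse_string("1:10").as_list()

# Expression-level repeat: "10" is parsed first, then compared with "1".
try:
    by_expr.parse_string("1:10")
    expr_matched = True
except pp.ParseException:
    expr_matched = False

print(literal_tokens, expr_matched)
```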

- Added keepOriginalText parse action. Sometimes pyparsing's
  whitespace-skipping leaves out too much whitespace. Adding this
  parse action will restore any internal whitespace for a parse
  expression. This is especially useful when defining expressions
  for scanString or transformString applications.

- Added __add__ method for ParseResults class, to better support
  using Python sum built-in for summing ParseResults objects returned
  from scanString.
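
A brief sketch of summing scan results, assuming pyparsing 3.x (scan_string, as_list). The input string is made up.

```python
import pyparsing as pp

number = pp.Word(pp.nums)
text = "a 1 b 22 c 333"

# scan_string yields (tokens, start, end); sum() concatenates the tokens
all_numbers = sum(tokens for tokens, start, end in number.scan_string(text))

print(all_numbers.as_list())
```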

- Added reset method for the new OnlyOnce class wrapper for parse
  actions (to allow a grammar to be used multiple times).

- Added optional maxMatches argument to scanString and searchString,
  to short-circuit scanning after 'n' expression matches are found.


Version 1.4.3 - July, 2006
------------------------------
- Fixed implementation of multiple parse actions for an expression
  (added in 1.4.2).
  . setParseAction() reverts to its previous behavior, setting
    one (or more) actions for an expression, overwriting any
    action or actions previously defined
  . new method addParseAction() appends one or more parse actions
    to the list of parse actions attached to an expression
  Now it is harder to accidentally append parse actions to an
  expression, when what you wanted to do was overwrite whatever had
  been defined before. (Thanks, Jean-Paul Calderone!)
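
The difference between overwriting and appending can be sketched as below, assuming pyparsing 3.x names (set_parse_action, add_parse_action). The lambdas are illustrative only.

```python
import pyparsing as pp

integer = pp.Word(pp.nums)
integer.set_parse_action(lambda t: int(t[0]))  # first action: str -> int
integer.add_parse_action(lambda t: t[0] * 2)   # appended: runs after the first
chained = integer.parse_string("21")[0]

# set_parse_action discards both earlier actions
integer.set_parse_action(lambda t: "replaced")
overwritten = integer.parse_string("21")[0]

print(chained, overwritten)
```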

- Simplified interface to parse actions that do not require all 3
  parse action arguments. Very rarely do parse actions require more
  than just the parsed tokens, yet parse actions still require all
  3 arguments including the string being parsed and the location
  within the string where the parse expression was matched. With this
  release, parse actions may now be defined to be called as:
  . fn(string,locn,tokens) (the current form)
  . fn(locn,tokens)
  . fn(tokens)
  . fn()
  The setParseAction and addParseAction methods will internally decorate
  the provided parse actions with compatible wrappers to conform to
  the full (string,locn,tokens) argument sequence.
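
The four call forms can be exercised side by side. This is a sketch assuming pyparsing 3.x; the function names are hypothetical, and none of the actions returns a value, so the tokens pass through unchanged.

```python
import pyparsing as pp

seen = []


def full(s, loc, toks):  # string, location, tokens
    seen.append(("full", loc, toks[0]))


def loc_and_tokens(loc, toks):
    seen.append(("loc", loc, toks[0]))


def tokens_only(toks):
    seen.append(("toks", toks[0]))


def no_args():
    seen.append(("none",))


word = pp.Word(pp.alphas)
word.set_parse_action(full, loc_and_tokens, tokens_only, no_args)
result = word.parse_string("hello")

print(seen)
```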

- REMOVED SUPPORT FOR RETURNING PARSE LOCATION FROM A PARSE ACTION.
  I announced this in March, 2004, and gave a final warning in the last
  release. Now you can return a tuple from a parse action, and it will
  be treated like any other return value (i.e., the tuple will be
  substituted for the incoming tokens passed to the parse action,
  which is useful when trying to parse strings into tuples).

- Added setFailAction method, taking a callable function fn that
  takes the arguments fn(s,loc,expr,err) where:
  . s - string being parsed
  . loc - location where expression match was attempted and failed
  . expr - the parse expression that failed
  . err - the exception thrown
  The function returns no values. It may throw ParseFatalException
  if it is desired to stop parsing immediately.
  (Suggested by peter21081944 on wikispaces.com)
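
A minimal fail-action sketch, assuming pyparsing 3.x, where the method is named set_fail_action. The record_failure callback and the failures list are hypothetical.

```python
import pyparsing as pp

failures = []


def record_failure(s, loc, expr, err):
    # s: input string, loc: where the attempt started,
    # expr: the failing expression, err: the exception raised
    failures.append((loc, str(expr), type(err).__name__))


integer = pp.Word(pp.nums).set_name("integer").set_fail_action(record_failure)

try:
    integer.parse_string("abc")
except pp.ParseException:
    pass  # the failure has already been recorded by the fail action

print(failures)
```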

- Added class OnlyOnce as helper wrapper for parse actions. OnlyOnce
  only permits a parse action to be called one time, after which
  all subsequent calls throw a ParseException.
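
A sketch of the OnlyOnce wrapper, including the reset method added in a later release. It assumes pyparsing 3.x method names; the wrapped lambda is illustrative.

```python
import pyparsing as pp

announce = pp.OnlyOnce(lambda toks: toks[0].upper())
header = pp.Word(pp.alphas).set_parse_action(announce)

first = header.parse_string("begin")[0]

# A second call without reset() raises ParseException.
try:
    header.parse_string("again")
    second_ok = True
except pp.ParseException:
    second_ok = False

announce.reset()  # re-arm so the grammar can be reused
after_reset = header.parse_string("again")[0]

print(first, second_ok, after_reset)
```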

- Added traceParseAction decorator to help debug parse actions.
  Simply insert "@traceParseAction" ahead of the definition of your
  parse action, and each invocation will be displayed, along with
  incoming arguments, and returned value.

- Fixed bug when copying ParserElements using copy() or
  setResultsName(). (Reported by Dan Thill, great catch!)

- Fixed bug in asXML() where token text contains <, >, and &
  characters - generated XML now escapes these as &lt;, &gt; and
  &amp;. (Reported by Jacek Sieka, thanks!)

- Fixed bug in SkipTo() when searching for a StringEnd(). (Reported
  by Pete McEvoy, thanks Pete!)

- Fixed "except Exception" statements, the most critical added as part
  of the packrat parsing enhancement. (Thanks, Erick Tryzelaar!)

- Fixed end-of-string infinite looping on LineEnd and StringEnd
  expressions. (Thanks again to Erick Tryzelaar.)

- Modified setWhitespaceChars to return self, to be consistent with
  other ParserElement modifiers. (Suggested by Erick Tryzelaar.)

- Fixed bug/typo in new ParseResults.dump() method.

- Fixed bug in searchString() method, in which only the first token of
  an expression was returned. searchString() now returns a
  ParseResults collection of all search matches.

- Added example program removeLineBreaks.py, a string transformer that
  converts text files with hard line-breaks into one with line breaks
  only between paragraphs.

- Added example program listAllMatches.py, to illustrate using the
  listAllMatches option when specifying results names (also shows new
  support for passing lists to oneOf).

- Added example program linenoExample.py, to illustrate using the
  helper methods lineno, line, and col, and returning objects from a
  parse action.

- Added example program parseListString.py, which can parse the
  string representation of a Python list back into a true list. Taken
  mostly from my PyCon presentation examples, but now with support
  for tuple elements, too!



Version 1.4.2 - April 1, 2006 (No foolin'!)
-------------------------------------------
- Significant speedup from memoizing nested expressions (a technique
  known as "packrat parsing"), thanks to Chris Lesniewski-Laas! Your
  mileage may vary, but my Verilog parser almost doubled in speed to
  over 600 lines/sec!

  This speedup may break existing programs that use parse actions that
  have side-effects. For this reason, packrat parsing is disabled when
  you first import pyparsing. To activate the packrat feature, your
  program must call the class method ParserElement.enablePackrat(). If
  your program uses psyco to "compile as you go", you must call
  enablePackrat before calling psyco.full(). If you do not do this,
  Python will crash. For best results, call enablePackrat() immediately
  after importing pyparsing.

- Added new helper method countedArray(expr), for defining patterns that
  start with a leading integer to indicate the number of array elements,
  followed by that many elements, matching the given expr parse
  expression. For instance, this two-liner:
    wordArray = countedArray(Word(alphas))
    print wordArray.parseString("3 Practicality beats purity")[0]
  returns the parsed array of words:
    ['Practicality', 'beats', 'purity']
  The leading token '3' is suppressed, although it is easily obtained
  from the length of the returned array.
  (Inspired by e-mail discussion with Ralf Vosseler.)

- Added support for attaching multiple parse actions to a single
  ParserElement. (Suggested by Dan "Dang" Griffith - nice idea, Dan!)

- Added support for asymmetric quoting characters in the recently-added
  QuotedString class. Now you can define your own quoted string syntax
  like "<<This is a string in double angle brackets.>>". To define
  this custom form of QuotedString, your code would define:
    dblAngleQuotedString = QuotedString('<<',endQuoteChar='>>')
  QuotedString also supports escaped quotes, escape character other
  than '\', and multiline.
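
The asymmetric-quote definition can be tried as follows, assuming pyparsing 3.x, where the keyword argument is spelled end_quote_char.

```python
import pyparsing as pp

# Opening quote "<<", closing quote ">>"
dbl_angle = pp.QuotedString("<<", end_quote_char=">>")
body = dbl_angle.parse_string("<<This is a string in double angle brackets.>>")[0]

print(body)
```

By default the quote characters are stripped from the returned token.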

- Changed the default value returned internally by Optional, so that
  None can be used as a default value. (Suggested by Steven Bethard -
  I finally saw the light!)

- Added dump() method to ParseResults, to make it easier to list out
  and diagnose values returned from calling parseString.

- A new example, a search query string parser, submitted by Steven
  Mooij and Rudolph Froger - a very interesting application, thanks!

- Added an example that parses the BNF in Python's Grammar file, in
  support of generating Python grammar documentation. (Suggested by
  J H Stovall.)

- A new example, submitted by Tim Cera, of a flexible parser module,
  using a simple config variable to adjust parsing for input formats
  that have slight variations - thanks, Tim!

- Added an example for parsing Roman numerals, showing the capability
  of parse actions to "compile" Roman numerals into their integer
  values during parsing.

- Added a new docs directory, for additional documentation or help.
  Currently, this includes the text and examples from my recent
  presentation at PyCon.

- Fixed another typo in CaselessKeyword, thanks Stefan Behnel.

- Expanded oneOf to also accept tuples, not just lists. This really
  should be sufficient...

- Added deprecation warnings when tuple is returned from a parse action.
  Looking back, I see that I originally deprecated this feature in March,
  2004, so I'm guessing people really shouldn't have been using this
  feature - I'll drop it altogether in the next release, which will
  allow users to return a tuple from a parse action (which is really
  handy when trying to reconstruct tuples from a tuple string
  representation!).


Version 1.4.1 - February, 2006
------------------------------
- Converted generator expression in QuotedString class to list
  comprehension, to retain compatibility with Python 2.3. (Thanks, Titus
  Brown, for the heads-up!)

- Added searchString() method to ParserElement, as an alternative to
  using "scanString(instring).next()[0][0]" to search through a string
  looking for a substring matching a given parse expression. (Inspired by
  e-mail conversation with Dave Feustel.)

- Modified oneOf to accept lists of strings as well as a single string
  of space-delimited literals. (Suggested by Jacek Sieka - thanks!)

- Removed deprecated use of Upcase in pyparsing test code. (Also caught by
  Titus Brown.)

- Removed lstrip() call from Literal - too aggressive in stripping
  whitespace which may be valid for some grammars. (Point raised by Jacek
  Sieka). Also, made Literal more robust in the event of passing an empty
  string.

- Fixed bug in replaceWith when returning None.

- Added cautionary documentation for Forward class when assigning a
  MatchFirst expression, as in:
    fwdExpr << a | b | c
  Precedence of operators causes this to be evaluated as:
    (fwdExpr << a) | b | c
  thereby leaving b and c out as parseable alternatives. Users must
  explicitly group the values inserted into the Forward:
    fwdExpr << (a | b | c)
  (Suggested by Scot Wilcoxon - thanks, Scot!)
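
In later pyparsing releases the same pitfall can be avoided with the `<<=` operator. This sketch assumes pyparsing 3.x; a, b and c are placeholder literals.

```python
import pyparsing as pp

a, b, c = pp.Literal("a"), pp.Literal("b"), pp.Literal("c")

fwd_expr = pp.Forward()
# Augmented assignment evaluates the whole right-hand side first,
# so all three alternatives end up inside the Forward.
fwd_expr <<= a | b | c

matched = fwd_expr.parse_string("c")[0]
print(matched)
```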


Version 1.4 - January 18, 2006
------------------------------
- Added Regex class, to permit definition of complex embedded expressions
  using regular expressions. (Enhancement provided by John Beisley, great
  job!)
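
A small Regex sketch, assuming pyparsing 3.x, where named groups in the pattern are also exposed as results names. The date pattern is illustrative only.

```python
import pyparsing as pp

iso_date = pp.Regex(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
parsed = iso_date.parse_string("2006-01-18")

# The whole match is the single token; named groups are available by name.
print(parsed[0], parsed["year"], parsed["month"], parsed["day"])
```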

- Converted implementations of Word, oneOf, quoted string, and comment
  helpers to utilize regular expression matching. Performance improvements
  in the 20-40% range.

- Added QuotedString class, to support definition of non-standard quoted
  strings (Suggested by Guillaume Proulx, thanks!)

- Added CaselessKeyword class, to streamline grammars with, well, caseless
  keywords (Proposed by Stefan Behnel, thanks!)

- Fixed bug in SkipTo, when using an ignorable expression. (Patch provided
  by Anonymous, thanks, whoever-you-are!)

- Fixed typo in NoMatch class. (Good catch, Stefan Behnel!)

- Fixed minor bug in _makeTags(), using string.printables instead of
  pyparsing.printables.

- Cleaned up some of the expressions created by makeXXXTags helpers, to
  suppress extraneous <> characters.

- Added some grammar definition-time checking to verify that a grammar is
  being built using proper ParserElements.

- Added examples:
  . LAparser.py - linear algebra C preprocessor (submitted by Mike Ellis,
    thanks Mike!)
  . wordsToNum.py - converts word description of a number back to
    the original number (such as 'one hundred and twenty three' -> 123)
  . updated fourFn.py to support unary minus, added BNF comments


Version 1.3.3 - September 12, 2005
----------------------------------
- Improved support for Unicode strings that would be returned using
  srange. Added greetingInKorean.py example, for a Korean version of
  "Hello, World!" using Unicode. (Thanks, June Kim!)

- Added 'hexnums' string constant (nums+"ABCDEFabcdef") for defining
  hexadecimal value expressions.

- NOTE: ===THIS CHANGE MAY BREAK EXISTING CODE===
  Modified tag and results definitions returned by makeHTMLTags(),
  to better support the looseness of HTML parsing. Tags to be
  parsed are now caseless, and keys generated for tag attributes are
  now converted to lower case.

  Formerly, makeHTMLTags("XYZ") would return a tag with results
  name of "startXYZ", this has been changed to "startXyz". If this
  tag is matched against '<XYZ Abc="1" DEF="2" ghi="3">', the
  matched keys formerly would be "Abc", "DEF", and "ghi"; keys are
  now converted to lower case, giving keys of "abc", "def", and
  "ghi". These changes were made to try to address the lax
  case sensitivity agreement between start and end tags in many
  HTML pages.

  No changes were made to makeXMLTags(), which assumes more rigorous
  parsing rules.

  Also, cleaned up case-sensitivity bugs in closing tags, and
  switched to using Keyword instead of Literal class for tags.
  (Thanks, Steve Young, for getting me to look at these in more
  detail!)
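
The lower-cased attribute keys can be observed with the tag from the note above. This sketch assumes pyparsing 3.x, where the helper is named make_html_tags.

```python
import pyparsing as pp

open_tag, close_tag = pp.make_html_tags("XYZ")
attrs = open_tag.parse_string('<XYZ Abc="1" DEF="2" ghi="3">')

# Attribute names are available as lower-case results names.
print(attrs["abc"], attrs["def"], attrs["ghi"])
```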

- Added two helper parse actions, upcaseTokens and downcaseTokens,
  which will convert matched text to all uppercase or lowercase,
  respectively.

- Deprecated Upcase class, to be replaced by upcaseTokens parse
  action.

- Converted messages sent to stderr to use warnings module, such as
  when constructing a Literal with an empty string, one should use
  the Empty() class or the empty helper instead.

- Added ' ' (space) as an escapable character within a quoted
  string.

- Added helper expressions for common comment types, in addition
  to the existing cStyleComment (/*...*/) and htmlStyleComment
  (<!-- ... -->)
  . dblSlashComment = // ... (to end of line)
  . cppStyleComment = cStyleComment or dblSlashComment
  . javaStyleComment = cppStyleComment
  . pythonStyleComment = # ... (to end of line)



Version 1.3.2 - July 24, 2005
-----------------------------
- Added Each class as an enhanced version of And. 'Each' requires
  that all given expressions be present, but may occur in any order.
  Special handling is provided to group ZeroOrMore and OneOrMore
  elements that occur out-of-order in the input string. You can also
  construct 'Each' objects by joining expressions with the '&'
  operator. When using the Each class, results names are strongly
  recommended for accessing the matched tokens. (Suggested by Pradam
  Amini - thanks, Pradam!)
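
A sketch of order-independent matching with the '&' operator and results names, assuming pyparsing 3.x. The size/color grammar is hypothetical.

```python
import pyparsing as pp

size = pp.Suppress("size:") + pp.Word(pp.nums)("size")
color = pp.Suppress("color:") + pp.Word(pp.alphas)("color")

# '&' builds an Each: both parts are required, in either order.
shape_spec = size & color

one = shape_spec.parse_string("size: 20 color: red")
two = shape_spec.parse_string("color: red size: 20")

print(one["size"], one["color"], two["size"], two["color"])
```

Results names make the access independent of the order in which the parts appeared.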

- Stricter interpretation of 'max' qualifier on Word elements. If the
  'max' attribute is specified, matching will fail if an input field
  contains more than 'max' consecutive body characters. For example,
  previously, Word(nums,max=3) would match the first three characters
  of '0123456', returning '012' and continuing parsing at '3'. Now,
  when constructed using the max attribute, Word will raise an
  exception with this string.

- Cleaner handling of nested dictionaries returned by Dict. No
  longer necessary to dereference sub-dictionaries as element [0] of
  their parents.
  === NOTE: THIS CHANGE MAY BREAK SOME EXISTING CODE, BUT ONLY IF
  PARSING NESTED DICTIONARIES USING THE LITTLE-USED DICT CLASS ===
  (Prompted by discussion thread on the Python Tutor list, with
  contributions from Danny Yoo, Kent Johnson, and original post by
  Liam Clarke - thanks all!)



Version 1.3.1 - June, 2005
----------------------------------
- Added markInputline() method to ParseException, to display the input
  text line location of the parsing exception. (Thanks, Stefan Behnel!)

- Added setDefaultKeywordChars(), so that Keyword definitions using a
  custom keyword character set do not all need to add the keywordChars
  constructor argument (similar to setDefaultWhitespaceChars()).
  (Suggested by rzhanka on the SourceForge pyparsing forum.)

- Simplified passing debug actions to setDebugActions(). You can now
  pass 'None' for a debug action if you want to take the default
  debug behavior. To suppress a particular debug action, you can pass
  the pyparsing method nullDebugAction.

- Refactored parse exception classes, moved all behavior to
  ParseBaseException, and the former ParseException is now a subclass of
  ParseBaseException. Added a second subclass, ParseFatalException, as
  a subclass of ParseBaseException. User-defined parse actions can raise
  ParseFatalException if a data inconsistency is detected (such as a
  begin-tag/end-tag mismatch), and this will stop all parsing immediately.
  (Inspired by e-mail thread with Michele Petrazzo - thanks, Michele!)
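
A sketch of stopping a parse from inside a parse action, assuming pyparsing 3.x names. The range check and the fallback alternative are hypothetical.

```python
import pyparsing as pp


def check_range(s, loc, toks):
    # Data inconsistency: abort the whole parse, not just this alternative.
    if int(toks[0]) > 255:
        raise pp.ParseFatalException(s, loc, "octet out of range")


octet = pp.Word(pp.nums).add_parse_action(check_range)
# The second alternative is never tried once the fatal error is raised.
value = octet | pp.Word(pp.alphanums)

try:
    value.parse_string("999")
    outcome = "parsed"
except pp.ParseFatalException as err:
    outcome = "fatal: " + err.msg

print(outcome)
```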

- Added helper methods makeXMLTags and makeHTMLTags, that simplify the
  definition of XML or HTML tag parse expressions for a given tagname.
  Both functions return a pair of parse expressions, one for the opening
  tag (that is, '<tagname>') and one for the closing tag ('</tagname>').
  The opening tag also recognizes any attribute definitions that have
  been included in the opening tag, as well as an empty tag (one with a
  trailing '/', as in '<BODY/>' which is equivalent to '<BODY></BODY>').
  makeXMLTags uses stricter XML syntax for attributes, requiring that they
  be enclosed in double quote characters - makeHTMLTags is more lenient,
  and accepts single-quoted strings or any contiguous string of characters
  up to the next whitespace character or '>' character. Attributes can
  be retrieved as dictionary or attribute values of the returned results
  from the opening tag.

- Added example minimath2.py, a refinement on fourFn.py that adds
  an interactive session and support for variables. (Thanks, Steven Siew!)

- Added performance improvement, up to 20% reduction! (Found while working
  with Wolfgang Borgert on performance tuning of his TTCN3 parser.)

- And another performance improvement, up to 25%, when using scanString!
  (Found while working with Henrik Westlund on his C header file scanner.)

- Updated UML diagrams to reflect latest class/method changes.


Version 1.3 - March, 2005
----------------------------------
- Added new Keyword class, as a special form of Literal. Keywords
  must be followed by whitespace or other non-keyword characters, to
  distinguish them from variables or other identifiers that just
  happen to start with the same characters as a keyword. For instance,
  the input string containing "ifOnlyIfOnly" will match a Literal("if")
  at the beginning and in the middle, but will fail to match a
  Keyword("if"). Keyword("if") will match only strings such as "if only"
  or "if(only)". (Proposed by Wolfgang Borgert, and Berteun Damman
  separately requested this on comp.lang.python - great idea!)
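
The Keyword/Literal distinction can be checked with a short sketch, assuming pyparsing 3.x (search_string, parse_string, as_list).

```python
import pyparsing as pp

text = "ifOnlyIfOnly"

# Literal matches the leading "if" even though more letters follow.
literal_tokens = pp.Literal("if").parse_string(text).as_list()

# Keyword requires a non-keyword character after "if".
keyword_hits = pp.Keyword("if").search_string(text).as_list()
call_hits = pp.Keyword("if").search_string("if(only)").as_list()

print(literal_tokens, keyword_hits, call_hits)
```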

- Added setWhitespaceChars() method to override the characters to be
  skipped as whitespace before matching a particular ParserElement. Also
  added the class-level method setDefaultWhitespaceChars(), to allow
  users to override the default set of whitespace characters (space,
  tab, newline, and return) for all subsequently defined ParserElements.
  (Inspired by Klaas Hofstra's inquiry on the SourceForge pyparsing
  forum.)

- Added helper parse actions to support some very common parse
  action use cases:
  . replaceWith(replStr) - replaces the matching tokens with the
    provided replStr replacement string; especially useful with
    transformString()
  . removeQuotes - removes first and last character from string enclosed
    in quotes (note - NOT the same as the string strip() method, as only
    a single character is removed at each end)
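
Both helpers can be sketched as below, assuming pyparsing 3.x, where they are named replace_with and remove_quotes. The sample strings are made up.

```python
import pyparsing as pp

# replace_with: swap every match for a fixed value (handy with transform_string)
british = pp.Literal("colour").set_parse_action(pp.replace_with("color"))
rewritten = british.transform_string("colour me surprised by the colour")

# remove_quotes: drop exactly one character from each end of the match
quoted = pp.quoted_string.copy().set_parse_action(pp.remove_quotes)
unquoted = quoted.parse_string('"hello world"')[0]

print(rewritten)
print(unquoted)
```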

- Added copy() method to ParserElement, to make it easier to define
  different parse actions for the same basic parse expression. (Note, copy
  is implicitly called when using setResultsName().)


(The following changes were posted to CVS as Version 1.2.3 -
October-December, 2004)

- Added support for Unicode strings in creating grammar definitions.
  (Big thanks to Gavin Panella!)

- Added constant alphas8bit to include the following 8-bit characters:
  ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ

- Added srange() function to simplify definition of Word elements, using
  regexp-like '[A-Za-z0-9]' syntax. This also simplifies referencing
  common 8-bit characters.
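
A brief srange sketch, assuming pyparsing 3.x method names. The identifier grammar is illustrative only.

```python
import pyparsing as pp

# srange expands a regexp-like character class into a plain string
hex_digits = pp.srange("[0-9a-f]")

# Typical use: leading and body character sets for a Word
identifier = pp.Word(pp.srange("[A-Za-z_]"), pp.srange("[A-Za-z0-9_]"))
name = identifier.parse_string("total_2 = 5")[0]

print(hex_digits, name)
```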

- Fixed bug in Dict when a single element Dict was embedded within another
  Dict. (Thanks Andy Yates for catching this one!)

- Added 'formatted' argument to ParseResults.asXML(). If set to False,
  suppresses insertion of whitespace for pretty-print formatting. Default
  equals True for backward compatibility.

- Added setDebugActions() function to ParserElement, to allow user-defined
  debugging actions.

- Added support for escaped quotes (either in \', \", or doubled quote
  form) to the predefined expressions for quoted strings. (Thanks, Ero
  Carrera!)

- Minor performance improvement (~5%) converting "char in string" tests
  to "char in dict". (Suggested by Gavin Panella, cool idea!)


Version 1.2.2 - September 27, 2004
----------------------------------
- Modified delimitedList to accept an expression as the delimiter, instead
  of only accepting strings.

- Modified ParseResults, to convert integer field keys to strings (to
  avoid confusion with list access).

- Modified Combine, to convert all embedded tokens to strings before
  combining.

- Fixed bug in MatchFirst in which parse actions would be called for
  expressions that only partially match. (Thanks, John Hunter!)

- Fixed bug in fourFn.py example, correcting right-associativity of the ^
  operator. (Thanks, Andrea Griffini!)

- Added class FollowedBy(expression), to look ahead in the input string
  without consuming tokens.

- Added class NoMatch that never matches any input. Can be useful in
  debugging, and in very specialized grammars.

- Added example pgn.py, for parsing chess game files stored in Portable
  Game Notation. (Thanks, Alberto Santini!)


Version 1.2.1 - August 19, 2004
-------------------------------
- Added SkipTo(expression) token type, simplifying grammars that only
  want to specify delimiting expressions, and want to match any characters
  between them.

- Added helper method dictOf(key,value), making it easier to work with
  the Dict class. (Inspired by Pavel Volkovitskiy, thanks!)
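
A minimal sketch of the helper, assuming pyparsing 3.x, where it is named dict_of. The key=value input is hypothetical.

```python
import pyparsing as pp

key = pp.Word(pp.alphas) + pp.Suppress("=")
value = pp.Word(pp.nums)

# dict_of pairs each key with its value and exposes them by name
settings = pp.dict_of(key, value).parse_string("width=80 height=24")

print(settings["width"], settings.as_dict())
```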

- Added optional argument listAllMatches (default=False) to
  setResultsName(). Setting listAllMatches to True overrides the default
  modal setting of tokens to results names; instead, the results name
  acts as an accumulator for all matching tokens within the local
  repetition group. (Suggested by Amaury Le Leyzour - thanks!)
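
The modal and accumulating behaviors can be compared directly. This sketch assumes pyparsing 3.x, where the argument is spelled list_all_matches.

```python
import pyparsing as pp

# Default (modal): the name keeps only the last matching token.
last_wins = pp.OneOrMore(pp.Word(pp.nums)("n"))

# list_all_matches=True: the name accumulates every match.
collect_all = pp.OneOrMore(
    pp.Word(pp.nums).set_results_name("n", list_all_matches=True)
)

modal = last_wins.parse_string("1 2 3")["n"]
accumulated = collect_all.parse_string("1 2 3")["n"].as_list()

print(modal, accumulated)
```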
|
||
|
||
- Fixed bug in ParseResults, throwing exception when trying to extract
|
||
slice, or make a copy using [:]. (Thanks, Wilson Fowlie!)
|
||
|
||
- Fixed bug in transformString() when the input string contains <TAB>'s
|
||
(Thanks, Rick Walia!).
|
||
|
||
- Fixed bug in returning tokens from un-Grouped And's, Or's and
|
||
MatchFirst's, where too many tokens would be included in the results,
|
||
confounding parse actions and returned results.
|
||
|
||
- Fixed bug in naming ParseResults returned by And's, Or's, and Match
|
||
First's.
|
||
|
||
- Fixed bug in LineEnd() - matching this token now correctly consumes
|
||
and returns the end of line "\n".
|
||
|
||
- Added a beautiful example for parsing Mozilla calendar files (Thanks,
|
||
Petri Savolainen!).
|
||
|
||
- Added support for dynamically modifying Forward expressions during
|
||
parsing.
|
||
|
||
|
||
Version 1.2 - 20 June 2004
|
||
--------------------------
|
||
- Added definition for htmlComment to help support HTML scanning and
|
||
parsing.
|
||
|
||
- Fixed bug in generating XML for Dict classes, in which trailing item was
|
||
duplicated in the output XML.
|
||
|
||
- Fixed release bug in which scanExamples.py was omitted from release
|
||
files.
|
||
|
||
- Fixed bug in transformString() when parse actions are not defined on the
|
||
outermost parser element.
|
||
|
||
- Added example urlExtractor.py, as another example of using scanString
|
||
and parse actions.


Version 1.2beta3 - 4 June 2004
------------------------------
- Added White() token type, analogous to Word, to match on whitespace
  characters. Use White in parsers with significant whitespace (such as
  configuration file parsers that use indentation to indicate grouping).
  Construct White with a string containing the whitespace characters to be
  matched. Similar to Word, White also takes optional min, max, and exact
  parameters.

- As part of supporting whitespace-significant parsing, added parseWithTabs()
  method to ParserElement, to override the default behavior in parseString
  of automatically expanding tabs to spaces. To retain tabs during
  parsing, call parseWithTabs() before calling parseString(), parseFile() or
  scanString(). (Thanks, Jean-Guillaume Paradis for catching this, and for
  your suggestions on whitespace-significant parsing.)

- Added transformString() method to ParserElement, as a complement to
  scanString(). To use transformString, define a grammar and attach a parse
  action to the overall grammar that modifies the returned token list.
  Invoking transformString() on a target string will then scan for matches,
  and replace the matched text patterns according to the logic in the parse
  action. transformString() returns the resulting transformed string.
  (Note: transformString() does *not* automatically expand tabs to spaces.)
  Also added scanExamples.py to the examples directory to show sample uses of
  scanString() and transformString().

- Removed group() method that was introduced in beta2. This turns out NOT to
  be equivalent to nesting within a Group() object, and I'd prefer not to sow
  more seeds of confusion.

- Fixed behavior of asXML() where tags for groups were incorrectly duplicated.
  (Thanks, Brad Clements!)

- Changed beta version message to display to stderr instead of stdout, to
  make asXML() easier to use. (Thanks again, Brad.)
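
As a rough illustration of the transformString() and White() entries above, the following minimal sketch doubles every integer in a string and then matches an indentation run explicitly. It assumes a pyparsing install that still accepts the method names used in these entries (later releases also offer PEP8 spellings such as transform_string); the sample strings and variable names are made up for the example.

```python
from pyparsing import White, Word, alphas, nums

# transformString(): attach a parse action that rewrites the matched tokens,
# then let transformString() substitute the result into the surrounding text.
integer = Word(nums)
integer.setParseAction(lambda s, loc, toks: [str(int(toks[0]) * 2)])
doubled = integer.transformString("3 apples and 10 pears")
print(doubled)

# White(): match an explicit run of whitespace instead of silently skipping it.
# Here exactly four spaces of indentation must precede the key.
indented_key = White(" ", exact=4) + Word(alphas)
tokens = indented_key.parseString("    key").asList()
print(tokens)
```

Text that does not match the grammar is copied through unchanged, so only the integers are rewritten.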


Version 1.2beta2 - 19 May 2004
------------------------------
- *** SIMPLIFIED API *** - Parse actions that do not modify the list of tokens
  no longer need to return a value. This simplifies those parse actions that
  use the list of tokens to update a counter or record or display some of the
  token content; these parse actions can simply end without having to specify
  'return toks'.

- *** POSSIBLE API INCOMPATIBILITY *** - Fixed CaselessLiteral bug, where the
  returned token text was not the original string (as stated in the docs),
  but the original string converted to upper case. (Thanks, Dang Griffith!)
  **NOTE: this may break some code that relied on this erroneous behavior.
  Users should scan their code for uses of CaselessLiteral.**

- *** POSSIBLE CODE INCOMPATIBILITY *** - I have renamed the internal
  attributes on ParseResults from 'dict' and 'list' to '__tokdict' and
  '__toklist', to avoid collisions with user-defined data fields named 'dict'
  and 'list'. Any client code that accesses these attributes directly will
  need to be modified. Hopefully the implementation of methods such as keys(),
  items(), len(), etc. on ParseResults will make such direct attribute
  accesses unnecessary.

- Added asXML() method to ParseResults. This greatly simplifies the process
  of parsing an input data file and generating XML-structured data.

- Added getName() method to ParseResults. This method is helpful when
  a grammar specifies ZeroOrMore or OneOrMore of a MatchFirst or Or
  expression, and the parsing code needs to know which expression matched.
  (Thanks, Eric van der Vlist, for this idea!)

- Added items() and values() methods to ParseResults, to better support using
  ParseResults as a Dictionary.

- Added parseFile() as a convenience function to parse the contents of an
  entire text file. Accepts either a file name or a file object. (Thanks
  again, Dang!)

- Added group() method to And, Or, and MatchFirst, as a short-cut alternative
  to enclosing a construct inside a Group object.

- Extended fourFn.py to support exponentiation, and simple built-in functions.

- Added EBNF parser to examples, including a demo where it parses its own
  EBNF! (Thanks to Seo Sanghyeon!)

- Added Delphi Form parser to examples, dfmparse.py, plus a couple of
  sample Delphi forms as tests. (Well done, Dang!)

- Another performance speedup, 5-10%, inspired by Dang! Plus about a 20%
  speedup, by pre-constructing and caching exception objects instead of
  constructing them on the fly.

- Fixed minor bug when specifying oneOf() with 'caseless=True'.

- Cleaned up and added a few more docstrings, to improve the generated docs.
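
The first two entries of this release can be sketched as follows. This is only an illustration under the assumption that the installed pyparsing still accepts the legacy method names used in this change log; the `record` function and the sample inputs are hypothetical.

```python
from pyparsing import CaselessLiteral, OneOrMore, Word, alphas

seen = []

def record(s, loc, toks):
    # Side effect only: no 'return toks' is needed, and the tokens
    # pass through to the results unchanged.
    seen.append(toks[0])

word = Word(alphas).setParseAction(record)
tokens = OneOrMore(word).parseString("alpha beta gamma").asList()
print(tokens, seen)

# CaselessLiteral returns the string it was defined with,
# regardless of the casing found in the input.
keyword = CaselessLiteral("select").parseString("SeLeCt")[0]
print(keyword)
```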


Version 1.1.2 - 21 Mar 2004
---------------------------
- Fixed minor bug in scanString(), so that start location is at the start of
  the matched tokens, not at the start of the whitespace before the matched
  tokens.

- Inclusion of HTML documentation, generated using Epydoc. Reformatted some
  doc strings to better generate readable docs. (Beautiful work, Ed Loper,
  thanks for Epydoc!)

- Minor performance speedup, 5-15%

- And on a process note, I've used the unittest module to define a series of
  unit tests, to help avoid the embarrassment of the version 1.1 snafu.


Version 1.1.1 - 6 Mar 2004
--------------------------
- Fixed critical bug introduced in 1.1, which broke MatchFirst(!) token
  matching.
  **THANK YOU, SEO SANGHYEON!!!**

- Added "from __future__ import generators" to permit running under
  pre-Python 2.3.

- Added example getNTPservers.py, showing how to use pyparsing to extract
  a text pattern from the HTML of a web page.


Version 1.1 - 3 Mar 2004
-------------------------
- ***Changed API*** - While testing out parse actions, I found that the value
  of loc passed in was not the starting location of the matched tokens, but
  the location of the next token in the list. With this version, the location
  passed to the parse action is now the starting location of the tokens that
  matched.

  A second part of this change is that parse actions no longer need to
  return a tuple containing both the location and the parsed tokens (which
  may optionally be modified); parse actions only need to return the list of
  tokens. Parse actions that return a tuple are deprecated; they will still
  work properly for conversion/compatibility, but this behavior will be
  removed in a future version.

- Added validate() method, to help diagnose infinite recursion in a grammar tree.
  validate() is not 100% fool-proof, but it can help track down nasty infinite
  looping due to recursively referencing the same grammar construct without some
  intervening characters.

- Cleaned up default listing of some parse element types, to more closely match
  ordinary BNF. Instead of the form <classname>:[contents-list], some changes
  are:
    . And(token1,token2,token3) is "{ token1 token2 token3 }"
    . Or(token1,token2,token3) is "{ token1 ^ token2 ^ token3 }"
    . MatchFirst(token1,token2,token3) is "{ token1 | token2 | token3 }"
    . Optional(token) is "[ token ]"
    . OneOrMore(token) is "{ token }..."
    . ZeroOrMore(token) is "[ token ]..."

- Fixed an infinite loop in oneOf if the input string contains a duplicated
  option. (Thanks Brad Clements)

- Fixed a bug when specifying a results name on an Optional token. (Thanks
  again, Brad Clements)

- Fixed a bug introduced in 1.0.6 when I converted quotedString to use
  CharsNotIn; I accidentally permitted quoted strings to span newlines. I have
  fixed this in this version to go back to the original behavior, in which
  quoted strings do *not* span newlines.

- Fixed minor bug in HTTP server log parser. (Thanks Jim Richardson)
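
The parse action change described in the first entry of this release might look like this in practice. It is a small sketch, assuming the legacy method names are still available in the installed pyparsing; `to_int` and the input string are invented for the example.

```python
from pyparsing import Word, nums

starts = []

def to_int(s, loc, toks):
    # loc is the position where the matched text begins,
    # after any leading whitespace has been skipped.
    starts.append(loc)
    # Return only the (modified) tokens, not a (loc, tokens) tuple.
    return [int(toks[0])]

number = Word(nums).setParseAction(to_int)
values = number.parseString("   42").asList()
print(starts, values)
```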


Version 1.0.6 - 13 Feb 2004
----------------------------
- Added CharsNotIn class (Thanks, Lee SangYeong). This is the opposite of
  Word, in that it is constructed with a set of characters *not* to be matched.
  (This enhancement also allowed me to clean up and simplify some of the
  definitions for quoted strings, cStyleComment, and restOfLine.)

- **MINOR API CHANGE** - Added joinString argument to the __init__ method of
  Combine (Thanks, Thomas Kalka). joinString defaults to "", but some
  applications might choose some other string to use instead, such as a blank
  or newline. joinString was inserted as the second argument to __init__,
  so if you have code that specifies an adjacent value, without using
  'adjacent=', this code will break.

- Modified LineStart to recognize the start of an empty line.

- Added optional caseless flag to oneOf(), to create a list of CaselessLiteral
  tokens instead of Literal tokens.

- Added some enhancements to the SQL example:
    . Oracle-style comments (Thanks to Harald Armin Massa)
    . simple WHERE clause

- Minor performance speedup - 5-15%
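
A brief sketch of the CharsNotIn and caseless oneOf() entries above, assuming a pyparsing install that still provides the oneOf name (newer releases also spell it one_of). The sample inputs are hypothetical.

```python
from pyparsing import CharsNotIn, oneOf

# CharsNotIn is built from the characters that must NOT appear in the match,
# so this reads everything up to (but not including) the first comma.
field = CharsNotIn(",")
first_field = field.parseString("red apple,green pear")[0]
print(first_field)

# With caseless=True, the alternatives match regardless of input casing.
verb = oneOf("GET POST", caseless=True)
matched = verb.parseString("post")[0]
print(matched)
```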


Version 1.0.5 - 19 Jan 2004
----------------------------
- Added scanString() generator method to ParserElement, to support regex-like
  pattern-searching

- Added items() list to ParseResults, to return named results as a
  list of (key,value) pairs

- Fixed memory overflow in asList() for deeply nested ParseResults (Thanks,
  Sverrir Valgeirsson)

- Minor performance speedup - 10-15%
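
For readers unfamiliar with scanString(), here is a minimal sketch of the regex-like searching mentioned above. It assumes the behavior of current pyparsing releases, where each match is yielded as a (tokens, start, end) triple; the input string is made up.

```python
from pyparsing import Word, nums

# Scan the text for every run of digits, skipping whatever does not match.
matches = [
    (toks[0], start, end)
    for toks, start, end in Word(nums).scanString("a 12 b 345")
]
print(matches)
```

The start and end values are offsets into the input, so `text[start:end]` recovers the matched text.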


Version 1.0.4 - 8 Jan 2004
---------------------------
- Added positional tokens StringStart, StringEnd, LineStart, and LineEnd

- Added commaSeparatedList to pre-defined global token definitions; also added
  commasep.py to the examples directory, to demonstrate the differences between
  parsing comma-separated data and simple line-splitting at commas

- Minor API change: delimitedList does not automatically enclose the
  list elements in a Group, but makes this the responsibility of the caller;
  also, if invoked using 'combine=True', the list delimiters are also included
  in the returned text (good for scoped variables, such as a.b.c or a::b::c, or
  for directory paths such as a/b/c)

- Performance speed-up again, 30-40%

- Added httpServerLogParser.py to examples directory, as this is
  a common parsing task
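
The delimitedList change above can be illustrated with a dotted name. This is a sketch only, assuming the installed pyparsing still exposes the delimitedList helper under that name; the variable names are hypothetical.

```python
from pyparsing import Word, alphas, delimitedList

name = Word(alphas)

# Default: the delimiters are dropped and the elements come back
# as a flat list (wrap in Group yourself if nesting is wanted).
separate = delimitedList(name, delim=".").parseString("a.b.c").asList()

# combine=True: elements and delimiters are joined into a single token.
combined = delimitedList(name, delim=".", combine=True).parseString("a.b.c").asList()

print(separate, combined)
```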


Version 1.0.3 - 23 Dec 2003
---------------------------
- Performance speed-up again, 20-40%

- Added Python distutils installation setup.py, etc. (thanks, Dave Kuhlman)


Version 1.0.2 - 18 Dec 2003
---------------------------
- **NOTE: Changed API again!!!** (for the last time, I hope)

  + Renamed module from parsing to pyparsing, to better reflect Python
    linkage.

- Also added dictExample.py to examples directory, to illustrate
  usage of the Dict class.


Version 1.0.1 - 17 Dec 2003
---------------------------
- **NOTE: Changed API!**

  + Renamed 'len' argument on Word.__init__() to 'exact'

- Performance speed-up, 10-30%


Version 1.0.0 - 15 Dec 2003
---------------------------
- Initial public release

Version 0.1.1 thru 0.1.17 - October-November, 2003
--------------------------------------------------
- initial development iterations:
    - added Dict, Group
    - added helper methods oneOf, delimitedList
    - added helpers quotedString (and double and single), restOfLine, cStyleComment
    - added MatchFirst as an alternative to the slower Or
    - added UML class diagram
    - fixed various logic bugs