=============================
What's New in Pyparsing 3.2.x
=============================

:author: Paul McGuire

:date: September, 2025

:abstract: This document summarizes the changes made
    in the 3.2.x releases of pyparsing.

.. contents::
   :depth: 4


Supported Python versions
=========================

- Added support for Python 3.13 and 3.14.

- Pyparsing now requires Python 3.9 or later. Legacy Py2.x support code and
  other deprecated features have been removed. If you are using an earlier 3.x
  version of Python, use pyparsing 3.1; for Python 2.x, use pyparsing
  2.4.7.


New Features
============

- Added type annotations to the remainder of the ``pyparsing`` package, and added a ``mypy``
  run to ``tox.ini``, so that type annotations are now checked as part of pyparsing's CI.

- Exception message format can now be customized, by overriding
  ``ParseBaseException.formatted_message``::

      from pyparsing import ParseBaseException

      def custom_exception_message(exc) -> str:
          # report "line:column message", plus what was found at that location
          found_phrase = f", found {exc.found}" if exc.found else ""
          return f"{exc.lineno}:{exc.column} {exc.msg}{found_phrase}"

      ParseBaseException.formatted_message = custom_exception_message

- ``run_tests`` now detects if an exception is raised in a parse action, and will
  report it with an enhanced error message, with the exception type, string,
  and parse action name.

- ``QuotedString`` now handles translation of escaped integer, hex, octal, and
  Unicode sequences to their corresponding characters.

- Defined a more performant regular expression used internally by ``common_html_entity``.

- ``Regex`` instances can now be created using a callable that takes no arguments
  and just returns a string or a compiled regular expression, so that creating complex
  regular expression patterns can be deferred until they are actually used for the first
  time in the parser.

- Fixed the displayed output of ``Regex`` terms to deduplicate repeated backslashes,
  for easier reading in debugging, printing, and railroad diagrams.

- Railroad diagramming improvements

  - Updated generated railroad diagrams to make non-terminal elements links to their related
    sub-diagrams. This *greatly* improves navigation of the diagram, especially for
    large, complex parsers.

  - Fixed railroad diagrams that get generated with a parser containing a ``Regex`` element
    defined using a verbose pattern - the pattern gets flattened and comments removed
    before creating the corresponding diagram element.

  - Added optional argument ``show_hidden`` to ``ParserElement.create_diagram()`` to show
    elements that are used internally by pyparsing, but are not part of the actual
    parser grammar. For instance, the ``Tag`` class can insert values into the parsed
    results but it does not actually parse any input, so by default it is not included
    in a railroad diagram. By calling ``create_diagram()`` with ``show_hidden=True``,
    these internal elements will be included. (You can see this in the ``tag_metadata.py``
    script in the examples directory.)

  - Simplified railroad diagrams emitted for parsers using ``infix_notation()``, by hiding
    lookahead terms. Renamed internally generated expressions for clarity, and improved
    diagramming.
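
As a minimal sketch of the deferred ``Regex`` construction described in the list
above: the ``build_date_pattern`` function and its date pattern are illustrative
assumptions, standing in for any pattern that is costly to assemble. The callable
itself is passed to ``Regex``, not its return value::

    import pyparsing as pp

    def build_date_pattern() -> str:
        # hypothetical helper: stands in for an expensive-to-build pattern
        year, month, day = r"\d{4}", r"\d{2}", r"\d{2}"
        return f"{year}-{month}-{day}"

    # pass the function object; pyparsing calls it when the pattern is needed
    iso_date = pp.Regex(build_date_pattern)

    result = iso_date.parse_string("2025-09-01")
    print(result.as_list())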


API Changes
===========

Possible breaking changes
-------------------------
- Fixed code in ``ParseElementEnhance`` subclasses that
  replaced detailed exception messages raised in contained expressions with a
  less-specific and less-informative generic exception message and location.

  If your code has conditional logic based on the message content in raised
  ``ParseExceptions``, this bugfix may require changes in your code.

- Fixed bug in ``transform_string()`` where whitespace
  in the input string was not properly preserved in the output string.

  If your code uses ``transform_string``, this bugfix may require changes in
  your code.

- Fixed bug where an ``IndexError`` raised in a parse action was
  incorrectly handled as an ``IndexError`` raised as part of the ``ParserElement``
  parsing methods, and reraised as a ``ParseException``. Now an ``IndexError``
  raised inside a parse action will properly propagate out as an ``IndexError``.

  If your code raises ``IndexError`` in parse actions, this bugfix may require
  changes in your code.
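
The ``IndexError`` change can be checked with a short sketch such as the following.
The ``first_of_empty`` parse action is a hypothetical example of a buggy action, and
the ``outcome`` labels are only for illustration::

    import pyparsing as pp

    def first_of_empty(tokens):
        # hypothetical buggy parse action: indexing an empty list raises IndexError
        return [][0]

    integer = pp.Word(pp.nums).add_parse_action(first_of_empty)

    try:
        integer.parse_string("123")
    except pp.ParseException:
        # behavior described above for releases before this fix
        outcome = "ParseException"
    except IndexError:
        # behavior described above for pyparsing 3.2.x
        outcome = "IndexError"

    print(outcome)

Code that previously caught ``ParseException`` to handle such failures would need an
additional ``except IndexError`` clause, or a fix to the parse action itself.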


Additional API changes
----------------------
- Added optional ``flatten`` Boolean argument to ``ParseResults.as_list()``, to
  return the parsed values in a flattened list.

- Added ``indent`` and ``base_1`` arguments to ``pyparsing.testing.with_line_numbers``. When
  using ``with_line_numbers`` inside a parse action, set ``base_1=False``, since the
  reported ``loc`` value is 0-based. ``indent`` can be a leading string (typically of
  spaces or tabs) to indent the numbered string passed to ``with_line_numbers``.
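
A small sketch of the ``flatten`` argument to ``as_list()``, using a made-up grammar
of two grouped word pairs (the grammar and input string are assumptions chosen only
to produce nested results)::

    import pyparsing as pp

    word = pp.Word(pp.alphas)
    pair = pp.Group(word + word)
    grammar = pair + pair

    result = grammar.parse_string("a b c d")

    # default: nested lists mirror the Group structure
    nested = result.as_list()
    # flatten=True: the same values in a single flat list
    flat = result.as_list(flatten=True)

    print(nested)
    print(flat)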


New / Enhanced Examples
=======================
- Added query syntax to ``mongodb_query_expression.py`` with:

  - better support for array fields ("contains all",
    "contains any", and "contains none")
  - "like" and "not like" operators to support SQL "%" wildcard matching
    and "=~" operator to support regex matching
  - text search using "search for"
  - dates and datetimes as query values
  - ``a[0]`` style array referencing

- Added ``lox_parser.py`` example, a parser for the Lox language used as a tutorial in
  Robert Nystrom's "Crafting Interpreters" (http://craftinginterpreters.com/).

- Added ``complex_chemical_formulas.py`` example, to add parsing capability for
  formulas such as "Ba(BrO₃)₂·H₂O".

- Updated ``tag_emitter.py`` to use new ``Tag`` class, introduced in pyparsing
  3.1.3.


Acknowledgments
===============
Again, thanks to the many contributors who submitted issues, questions, suggestions,
and PRs.
