Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add small validator utility for PEG grammars #23519

Merged
merged 1 commit into from Dec 26, 2020
Merged

Conversation

@pablogsal
Copy link
Member

@pablogsal pablogsal commented Nov 26, 2020

One common gotcha of PEG grammars is having two alternatives in a rule in which one that appears first is contained in one that appears last. In this scenario, the second one will never match as if there is input text that matches it, it will also match the first and that comes first.

To avoid this problem, this PR create an initial version of a small utility that detects these cases and raises to alert the user.

We don't need to detect all cases, only the ones that is easy enough to implement. The current form is just an initial prototype that does the matching based in the string representation. To make this better we need to improve the matching algorithm into a visitor to allow checking rules that contain options. For example:

some_rule:
    | NAME '+' NAME 'blech'?
    | NAME '+' NAME ';'

One "straightforward enough" way to do this is to generate all possible string representations: with and without optional and doing substring matching, but at this point a visitor that visits both alternatives at the same time and advances accordingly is probably better.

We could also support some rule expansion so we also detect something like this:

some_rule:
    | NAME '+' NAME
    | NAME some_other_rule
some_other_rule:
   '+' NAME ';'
@lysnikolaou
Copy link
Contributor

@lysnikolaou lysnikolaou commented Nov 27, 2020

This is really great, @pablogsal! Thanks for putting in the time, again! 🚀

There are indeed many things we could do here and many different approaches we could follow. I'd certainly be okay with merging this as is and maybe start building on top of it to support things like rule expansion.

@lysnikolaou
Copy link
Contributor

@lysnikolaou lysnikolaou commented Nov 27, 2020

Another thought is to link to some explanation of why the alternative will never be reached in the error message, since most people that will fall into these kind of traps won't be too familiar with the PEG formalism.

@pablogsal
Copy link
Member Author

@pablogsal pablogsal commented Nov 27, 2020

Another thought is to link to some explanation of why the alternative will never be reached in the error message, since most people that will fall into these kind of traps won't be too familiar with the PEG formalism.

I am currently working on a document for the devguide on how to work with PEG grammars 🤫

@pablogsal pablogsal marked this pull request as ready for review Dec 26, 2020
@pablogsal
Copy link
Member Author

@pablogsal pablogsal commented Dec 26, 2020

@lysnikolaou I plan to go ahead with this so we can start building up afterwards if that is ok with you.

@lysnikolaou
Copy link
Contributor

@lysnikolaou lysnikolaou commented Dec 26, 2020

Of course! Go for it.

@pablogsal pablogsal merged commit 3bcc4ea into python:master Dec 26, 2020
10 checks passed
10 checks passed
Check for source changes
Details
Check if generated files are up to date
Details
Windows (x86)
Details
Windows (x64)
Details
macOS
Details
Ubuntu
Details
Azure Pipelines PR #20201226.8 succeeded
Details
Travis CI - Pull Request Build Passed
Details
bedevere/issue-number Issue report skipped
bedevere/news "skip news" label found
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.