Skip to content

Conversation

@aibaars
Copy link
Contributor

@aibaars aibaars commented Mar 18, 2022

This pull request refactors the regular expression libraries for Ruby to match the Python and JavaScript libraries.

I aligned the contents of the regex parser module (ruby/ql/lib/codeql/ruby/regexp/internal/ParseRegExp.qll) and the regex AST module (ruby/ql/lib/codeql/ruby/regexp/internal/RegExpTreeView.qll) as much as possible with the respective Python implementations python/ql/lib/semmle/python/regex.qll and python/ql/lib/semmle/python/RegexTreeView.qll

The additional classes and predicates are moved into ruby/ql/lib/codeql/ruby/Regexp.qll .This module should be used by most users and imports the parser module and re-exports the AST module.

@aibaars aibaars force-pushed the regex-refactor branch 2 times, most recently from 72936fa to ae91423 Compare March 18, 2022 17:43
@aibaars aibaars force-pushed the regex-refactor branch 5 times, most recently from 21c8e6a to 659c009 Compare March 18, 2022 19:36
@github github deleted a comment Mar 18, 2022
@aibaars aibaars force-pushed the regex-refactor branch 2 times, most recently from 438e6c3 to a5e9a7b Compare March 18, 2022 20:28
@aibaars aibaars marked this pull request as ready for review March 21, 2022 10:22
@aibaars aibaars requested review from a team as code owners March 21, 2022 10:22
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only typos.
Why does ReDoS.expected change?

@aibaars
Copy link
Contributor Author

aibaars commented Mar 21, 2022

LGTM, only typos. Why does ReDoS.expected change?

@yoff ReDoS.expected changed because I noticed we were missing handling of \f and \v when I compared the Python and the Ruby versions of the QLL files:
https://github.com/github/codeql/pull/8489/files#diff-19c4ea6adc90edc0f6a1fa956d00494f6a23c1f42eab27f4c80ff5b195cdeccdR495-R497


import regexp.RegExpTreeView // re-export
private import regexp.ParseRegExp
import regexp.internal.RegExpTreeView // re-export
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears dangerous: I would expect I could make all sorts of breaking changes to a file in an internal folder (except perhaps if it contains a Public module that documents it explicitly).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I think everything in RegExpTreeView should actually be public. So perhaps I should not put it in the internal folder.

Copy link
Contributor

@nickrolfe nickrolfe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple of trivial/pedantic comments. Other than that, having two different RegExpTreeView.qll files seems undesirable. Can they be merged?

@@ -0,0 +1,143 @@
/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the naming of this file: although the Ruby class is spelled Regexp, we've been using a capital E (RegExp) for the QL module/class names, for consistency between languages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I considered that, however, for JavaScript the file is named javascript/ql/lib/semmle/javascript/Regexp.qll too.

@aibaars
Copy link
Contributor Author

aibaars commented Mar 22, 2022

Other than that, having two different RegExpTreeView.qll files seems undesirable. Can they be merged?

The one in the performance folder is required for the ReDoS queries. We should probably rename it to ReDoSUtilSpecific.qll or so. We could do the following aibaars#12

@aibaars aibaars requested a review from yoff March 22, 2022 16:26
@aibaars aibaars force-pushed the regex-refactor branch 2 times, most recently from b15a05f to 1d0d166 Compare March 24, 2022 10:33
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hvitved
Copy link
Contributor

hvitved commented Mar 28, 2022

Should we have done a DCA run before merging this PR? I am not intimate with the changes, but the diff is quite large.

@yoff
Copy link
Contributor

yoff commented Mar 28, 2022

Should we have done a DCA run before merging this PR? I am not intimate with the changes, but the diff is quite large.

I do not expect any of the changes to have an impact, but yes..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants