Python: How to make TaintKinds for dict subclasses #3545
Closed
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
For #3267 and #3485 I needed to model Python subclasses of dict -- and that turned out to be a bit more tricky than I had anticipated (see example in 2d8627c). The core of the problem is that currently, you can't make QL subclasses of
DictKindthat are non-overlapping.This PR is recording my progress in trying to solve this problem. It is currently a draft, since I still don't have a good solution.
My initial example was a bit too simple, so I thought I had found a usable work-around solution, but with an example where the TaintKind of the elements is not static (508e6e1), both original work-around solutions fell short 😕
The third solution with using a delegate inside the custom TaintKind also had problems. It can work, but is not as expressive as I would like it to be.
So now I'm off to give it a go with IPA types.
Solutions that didn't work
Didn't work 1: Abstract
FooDictKind, with 2 subclassesHowever, this is not general enough. If we can't determine the value-type ahead-of-time, we will have to make a definition like so:
usually we would restrict the actual TainKind inside the dict in the source, as such:
but we can't, since
FooDictKindis not aDictKind, we can't access the.getValue:( so IT HAS TO BEand that is simply not good enough :|
Didn't work 2: source uses
orWhat is currently used in the code of this PR. Isn't expressive enough, as shown by
AttrBasedDictKind:\This has the problem that if your Python subclass overwrites the behavior of
__iter__, you can overridegetTaintForIterationin the subclass specific part, but there is no way to prevent thegetTaintForIterationfromDictKindto also be used (not as much a problem for DictKind right now, but it would be for a subclass oflistandSequenceKind).Didn't work 3: Using delegation
When I talked with
@aschackmullabout how to handle this, we discussed the idea of using a delegate.I imagined it would be something like
and while this could work, there is a major problem: Since we don't restrict the relation between the
dictfield andthis, the snippetkind.(FooDictKind).getValue() instanceof ExternalStringKinddoesn't do what we really want it to -- se condensed example in https://lgtm.com/query/6077904279195063093/.An obvious solution is to change the char-pred to do
this = "FooDictKind" + dict; however, since a FooDictKind can be the value inside a dict, we end up with infinite recursion :confusedIf we know all the possible kinds we could want our dict to contain beforehand, we could change the char-pred to do
and while that is sort of annoying having to do, it could make this trick work.
Well, actually there will be one problem, since our taint-tracking implementation code only handles dictionary lookups (with subscripts) when the TaintKind is a CollectionKind (and our new
FooDictKindisn't). Seecodeql/python/ql/src/semmle/python/dataflow/Implementation.qll
Lines 333 to 335 in 727cde3
codeql/python/ql/src/semmle/python/dataflow/TaintTracking.qll
Lines 321 to 323 in 2fad5e8
I'm not 100% sure why we have this
flowToMember/flowFromMembersetup instead of just usinggetTaintForFlowStep, but this is the way it is today :|Bad solutions
Bad solution 1
Add class-specific suffix to the string defining the TaintKind. So we could define our classes as
This requires a change to the char-pred of DictKind to only look at the prefix, such as:
Bad solution 2
Duplicate the code for DictKind for FooDictKind, BarDictKind, and any new dict subclass.