Skip to content

Conversation

@RasmusWL
Copy link
Member

For #3267 and #3485 I needed to model Python subclasses of dict -- and that turned out to be a bit more tricky than I had anticipated (see example in 2d8627c). The core of the problem is that currently, you can't make QL subclasses of DictKind that are non-overlapping.

This PR is recording my progress in trying to solve this problem. It is currently a draft, since I still don't have a good solution.


My initial example was a bit too simple, so I thought I had found a usable work-around solution, but with an example where the TaintKind of the elements is not static (508e6e1), both original work-around solutions fell short 😕

The third solution with using a delegate inside the custom TaintKind also had problems. It can work, but is not as expressive as I would like it to be.

So now I'm off to give it a go with IPA types.

Solutions that didn't work

Didn't work 1: Abstract FooDictKind, with 2 subclasses

abstract class FooDictKind extends TaintKind {
    bindingset[this]
    FooDictKind() { any() }
}

private module FooDictKind {
    class SubclassSpecific extends FooDictKind {
        SubclassSpecific() { this = "FooDictKind::SubclassSpecific" }

        override TaintKind getTaintOfAttribute(string name) {
            name = "foo" and result instanceof ExternalFileObject
        }
    }

    class DictPart extends FooDictKind, ExternalStringDictKind { }
}

class FooDictSource extends TaintSource {
    FooDictSource() { this.(NameNode).getId() = "FOO_DICT" }

    override predicate isSourceOf(TaintKind kind) {
        kind instanceof FooDictKind
    }
}

However, this is not general enough. If we can't determine the value-type ahead-of-time, we will have to make a definition like so:

private module FooDictKind {
    class DictPart extends FooDictKind, DictKind { }
}

usually we would restrict the actual TainKind inside the dict in the source, as such:

override predicate isSourceOf(TaintKind kind) {
    kind.(FooDictKind).getValue() instanceof ExternalStringKind
}

but we can't, since FooDictKind is not a DictKind, we can't access the .getValue :( so IT HAS TO BE

override predicate isSourceOf(TaintKind kind) {
    kind instanceof FooDictKind
}

and that is simply not good enough :|

Didn't work 2: source uses or

What is currently used in the code of this PR. Isn't expressive enough, as shown by AttrBasedDictKind :\

/** The subclass specific part of BarDict */
class BarDictKind extends TaintKind {
    BarDictKind() { this = "BarDictKind" }

    override TaintKind getTaintOfAttribute(string name) {
        name = "bar" and result instanceof ExternalFileObject
    }
}

class BarDictSource extends TaintSource {
    BarDictSource() { this.(NameNode).getId() = "BAR_DICT" }

    override predicate isSourceOf(TaintKind kind) {
        kind instanceof BarDictKind
        or
        kind.(DictKind).getValue() instanceof ExternalStringKind
    }
}

This has the problem that if your Python subclass overwrites the behavior of __iter__, you can override getTaintForIteration in the subclass specific part, but there is no way to prevent the getTaintForIteration from DictKind to also be used (not as much a problem for DictKind right now, but it would be for a subclass of list and SequenceKind).

Didn't work 3: Using delegation

When I talked with @aschackmull about how to handle this, we discussed the idea of using a delegate.

I imagined it would be something like

class FooDictKind extends TaintKind {
    DictKind dict;

    TaintKind getValue() { result = dict.getValue() }

    FooDictKind() { this = "FooDictKind" }

    override TaintKind getTaintOfAttribute(string name) {
        result = dict.getTaintOfAttribute(name)
        or
        name = "foo" and result instanceof ExternalFileObject
    }

    override TaintKind getTaintOfMethodResult(string name) {
        result = dict.getTaintOfMethodResult(name)
    }

    override TaintKind getTaintForFlowStep(ControlFlowNode fromnode, ControlFlowNode tonode) {
        result = dict.getTaintForFlowStep(fromnode, tonode)
    }

    // ...
}

class FooDictSource extends TaintSource {
    FooDictSource() { this.(NameNode).getId() = "FOO_DICT" }

    override predicate isSourceOf(TaintKind kind) {
        kind.(FooDictKind).getValue() instanceof ExternalStringKind
    }
}

and while this could work, there is a major problem: Since we don't restrict the relation between the dict field and this, the snippet kind.(FooDictKind).getValue() instanceof ExternalStringKind doesn't do what we really want it to -- se condensed example in https://lgtm.com/query/6077904279195063093/.

An obvious solution is to change the char-pred to do this = "FooDictKind" + dict; however, since a FooDictKind can be the value inside a dict, we end up with infinite recursion :confused

If we know all the possible kinds we could want our dict to contain beforehand, we could change the char-pred to do

FooDictKind() {
    exists(TaintKind valueKind | valueKind in [KindA, KindB]
    |
        dict.getValue() = valueKind and
        this = "FooDictKind" + dict
    )
}

and while that is sort of annoying having to do, it could make this trick work.

Well, actually there will be one problem, since our taint-tracking implementation code only handles dictionary lookups (with subscripts) when the TaintKind is a CollectionKind (and our new FooDictKind isn't). See

I'm not 100% sure why we have this flowToMember/flowFromMember setup instead of just using getTaintForFlowStep, but this is the way it is today :|

Bad solutions

Bad solution 1

Add class-specific suffix to the string defining the TaintKind. So we could define our classes as

class FooDictKind extends DictKind {
    FooDictKind() { this = "{" + this.getValue() + "!" + "FooDictKind" }
}

class BarDictKind extends DictKind {
    BarDictKind() { this = "{" + this.getValue() + "!" + "BarDictKind" }
}

This requires a change to the char-pred of DictKind to only look at the prefix, such as:

DictKind() { this.indexOf("{" + valueKind) = 0 }

Bad solution 2

Duplicate the code for DictKind for FooDictKind, BarDictKind, and any new dict subclass.

RasmusWL added 6 commits May 19, 2020 17:53
we need to discuss which approach to take
The initial approach for FooDictKind will not work in the general case. If we can't determine the value-type ahead-of-time, we will have to make a definition like so:

```
private module FooDictKind {
    class DictPart extends FooDictKind, DictKind { }
}
```

usually that means we will restrict the kinds of taint that can be inside the dict in the source, as such:

```
override predicate isSourceOf(TaintKind kind) {
    kind.(FooDictKind).getValue() instanceof ExternalStringKind
}
```

but actually we can't, since `FooDictKind` is _not_ a `DictKind`, we can't access the `.getValue` :( so IT HAS TO BE

```
override predicate isSourceOf(TaintKind kind) {
    kind instanceof FooDictKind
}
```

and that is simply not good enough :|
I should really do that all over, but I'm taking it one step at a time right
now
@RasmusWL
Copy link
Member Author

Right, so I played around with Algebraic datatypes today, and I ended up concluding I couldn't really use them to solve this case (at least not without changing everything else as well).

Basically, I tried to make CollectionKind also extend a new algebraic datatype, and that just didn't work since something can't both be an instance of an algebraic datatype and a string 😅

private newtype TFoo = <...>

class Foo extends string, TFoo { ... } // does not work

One solution would be to change the super type of TaintKind to be something else than string, but I couldn't fully gauge the consequences of doing so.

Will need to discuss with the rest of the team how to move this forward.

@adityasharad adityasharad changed the base branch from master to main August 14, 2020 18:34
@RasmusWL
Copy link
Member Author

RasmusWL commented Mar 2, 2021

Closing this off since I'm not planning to pick this up again -- and probably never will

@RasmusWL RasmusWL closed this Mar 2, 2021
@RasmusWL RasmusWL deleted the python-allow-modeling-of-dict-subclasses branch March 2, 2021 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant