Skip to content

Conversation

@yoff
Copy link
Contributor

@yoff yoff commented May 18, 2020

Add ESSA definitions via copy as sanitiser.
Currently only tested on the FP snapshot.
I would write tests, but I first need to figure out why the unmodified tests fail...

@yoff
Copy link
Contributor Author

yoff commented May 19, 2020

Jenkins seems to see the same as me. No new results, but many results duplicated.

@yoff yoff force-pushed the CopySanitizes branch from d9a6c2b to 7832ed8 Compare May 20, 2020 10:42
@yoff
Copy link
Contributor Author

yoff commented May 20, 2020

Apparently the configuration created the many duplicated results, and apparently it now works without it. Perhaps I fixed the sanitiser in the meantime.

@yoff
Copy link
Contributor Author

yoff commented May 20, 2020

Ok, so tests pass now. I should look into a more robust detection of "value gets copied" I think, the current one only catches the posted FP.

@yoff yoff force-pushed the CopySanitizes branch from 0956713 to a869f91 Compare May 24, 2020 07:10
@yoff yoff marked this pull request as ready for review May 24, 2020 07:14
@yoff yoff requested a review from a team as a code owner May 24, 2020 07:14
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally this looks good, but there are a bunch of subtleties surrounding how some of Python's mutating list methods work that need to be addressed in the tests. Currently, most of the test code doesn't work, and this is a bit unfortunate.

It's always a good idea to actually run the tests (using cpython) once in a while, as the behaviour may surprise you. 😉

def safe_method(x=[]):
return x.count(42)

# Modification of parameter with default (sanitised)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor nit: we aim to use US English in our documentation, so get your zs ready for writing sanitization etc. 🙂

@yoff
Copy link
Contributor Author

yoff commented May 25, 2020

Cool, I shall figure out how to run tests via cpython :)

@tausbn
Copy link
Contributor

tausbn commented May 25, 2020

Oh, I didn't notice your earlier comments about the duplicated test results. This is to be expected, due to the fact that you're now using a configuration. In the olden days, we didn't use configurations at all, but relied only on whatever classes you had imported for the given query. To make things a bit more similar to the other languages, we added the configurations. This left us with a bunch of queries that would yield no results, for lack of an appropriate configuration. Because of this, we have a LegacyConfiguration class that takes care of these configuration-less queries. Previously, it would only be active when no other configurations were present, but this turned out to have some unfortunate consequences, so eventually we just made it active all the time. This does mean that you see twice the edges and nodes you saw previously -- one comes from the configuration you provided, and the other from the legacy configuration.

@yoff
Copy link
Contributor Author

yoff commented May 25, 2020

Thanks, Larsen explained this to me also, I guess I could have edited the previous posts to reflect my new state of less confusion.

@yoff
Copy link
Contributor Author

yoff commented May 25, 2020

Rewrote the tests, I had also duplicated some previous ones. And now I can test that test code behaves as I expect inside VSCode :)
Also simplified the query as suggested.

@yoff
Copy link
Contributor Author

yoff commented May 27, 2020

And it came to pass that Jenkins in its wisdom removed the false positive, thus building a better and brighter future, but leaving the developer rather confused.

@yoff
Copy link
Contributor Author

yoff commented May 27, 2020

It is the alert

| functions_test.py:226:5:226:8 | Subscript | functions_test.py:224:51:224:51 | empty mutable value | functions_test.py:226:5:226:8 | empty mutable value | $@ flows to here and is mutated. | functions_test.py:224:51:224:51 | y | Default value |

which I get locally but not on Jenkins. It pertains to this function in function_test.py:

def tuple_unsuccessfully_copy_before_modify(x=[], y=[]):
    t = (x.copy(), y) # Whole tuple tainted here
    t[0].append(42) # FP here, since we taint both t[0] and t[1]
    return t[0]

Apparently, this FP does not occur on Jenkins.

@yoff
Copy link
Contributor Author

yoff commented May 27, 2020

Aha, I can see that I have also received an email from Jenkins about a query change for py/modification-of-default-value
Looks like the most likely reason is the autoformatter!

@yoff
Copy link
Contributor Author

yoff commented May 27, 2020

After having pulled from upstream/master, restarted VSCode, and upgraded the CLI, I still see the FP locally :(

@yoff
Copy link
Contributor Author

yoff commented Jun 4, 2020

Ok, I think I should write down our conclusions so far. This PR currently suffers from three separate issues:

  • Too much flow
    In the program
def tuple_unsuccessfully_copy_before_modify(x=[]):
    t = (copy(x), 3) # Tuple wrongly tainted here
    t[0].append(42) # FP here
    return t[0]

taint flows into the tuple t even though it should be sanitised by the call to copy.

  • Too little flow
    In the program
def tuple_copy_before_modify(x=[]):
    t = (x.copy(), 3) # Tuple not tainted here
    t[0].append(42)
    return t[0]

taint does not flow into the tuple t, but not because it is sanitised. Rather because taint flowing into tuples by the standard mechanism from member requires the flow to be between two ControlFlowNodes. However, TaintTrackingNodes are annotated DataFlow::Nodes and a DataFlow::Node can be either a ControlFlowNode or an EssaVariable. As it happens, taint is flowing to x.copy() but only as represented by an EssaVariable, and so it does not flow further into the tuple.

  • Py2/3 discrepancy
    The program in [Too much flow] does not exhibit too much flow when treated as being Python 2.

I hope that [Too much flow] can be handled by implementing further sanitisers. I Propose that [Too little flow] could be moved to a separate issue, perhaps the current behaviour is even desirable/intended. I think we must solve [Py2/3 discrepancy] before the present PR can be merged.

@RasmusWL
Copy link
Member

RasmusWL commented Jun 4, 2020

I'm confused about [Too little flow], it sounds more like a case of "correct result, but for wrong reasons"?

@yoff
Copy link
Contributor Author

yoff commented Jun 5, 2020

It may be a case of me having untrained expectations of our data flow implementation. I would expect that if taint flows into a component of a tuple, then it flows, in sequence form, into the tuple.

@RasmusWL
Copy link
Member

RasmusWL commented Jun 8, 2020

It may be a case of me having untrained expectations of our data flow implementation. I would expect that if taint flows into a component of a tuple, then it flows, in sequence form, into the tuple.

I'm not sure how to interpret this response 😅

So just to be clear: the function tuple_copy_before_modify is safe, since it does not modify the default value. I would expect that x.copy() is not tainted, so t = (x.copy(), 3) should not make t a tainted tuple either. Which is happening right now?

  1. x.copy() is not tainted, t is not tainted (all good ✔️)
  2. x.copy() is tainted, but t is not tainted because of the SSA thing you mention (not good ❌)

@yoff
Copy link
Contributor Author

yoff commented Jun 9, 2020

  1. x.copy() the ControlFlowNode is not tainted, but an EssaVariable with the same location is

@yoff
Copy link
Contributor Author

yoff commented Jun 24, 2020

I commented out the function that behaves differently between py2/3 for now..

@yoff yoff requested a review from tausbn June 24, 2020 19:11
@adityasharad adityasharad changed the base branch from master to main August 14, 2020 18:34
@yoff
Copy link
Contributor Author

yoff commented Sep 9, 2020

Closing for now as this is getting out-dated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants