Python: Copy sanitizes default modification #3502

yoff · 2020-05-18T15:45:00Z

Add ESSA definitions via copy as sanitiser.
Currently only tested on the FP snapshot.
I would write tests, but I first need to figure out why the unmodified tests fail...

yoff · 2020-05-19T07:08:12Z

Jenkins seems to see the same as me. No new results, but many results duplicated.

yoff · 2020-05-20T10:45:07Z

Apparently the configuration created the many duplicated results, and apparently it now works without it. Perhaps I fixed the sanitiser in the meantime.

yoff · 2020-05-20T11:49:33Z

Ok, so tests pass now. I think I should look into a more robust way of detecting that a value gets copied; the current one only catches the posted FP.

python/ql/src/Functions/ModificationOfParameterWithDefault.ql

tausbn

Generally this looks good, but there are a bunch of subtleties surrounding how some of Python's mutating list methods work that need to be addressed in the tests. Currently, most of the test code doesn't work, and this is a bit unfortunate.

It's always a good idea to actually run the tests (using cpython) once in a while, as the behaviour may surprise you. 😉
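As a minimal illustration of the kind of behaviour that running code under CPython makes visible (this is not the test code under review; `unsafe_append` and `safe_append` are hypothetical names), the sketch below shows that `list.append` mutates in place and returns `None`, that a mutated default persists across calls, and that copying first leaves the default untouched:

```python
def unsafe_append(x=[]):
    # list.append mutates the list in place and returns None,
    # so the shared default object grows on every call.
    x.append(42)
    return x


def safe_append(x=[]):
    # Copy first: the mutation happens on a fresh list,
    # and the default object stays empty.
    y = x.copy()
    y.append(42)
    return y


first = unsafe_append()
second = unsafe_append()
same_object = first is second      # both calls returned the default list itself
unsafe_len = len(second)           # the default has been appended to twice

safe_first = safe_append()
safe_second = safe_append()

append_result = [].append(1)       # mutating methods return None
```

Inspecting `safe_append.__defaults__` after the calls is one way to confirm that the default value was never modified.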

tausbn · 2020-05-25T11:00:43Z

python/ql/test/query-tests/Functions/general/functions_test.py

 def safe_method(x=[]):
    return x.count(42)
+
+# Modification of parameter with default (sanitised)


Very minor nit: we aim to use US English in our documentation, so get your zs ready for writing sanitization etc. 🙂

yoff · 2020-05-25T13:50:42Z

Cool, I shall figure out how to run tests via cpython :)

tausbn · 2020-05-25T13:54:21Z

Oh, I didn't notice your earlier comments about the duplicated test results. This is to be expected, due to the fact that you're now using a configuration. In the olden days, we didn't use configurations at all, but relied only on whatever classes you had imported for the given query. To make things a bit more similar to the other languages, we added the configurations. This left us with a bunch of queries that would yield no results, for lack of an appropriate configuration. Because of this, we have a LegacyConfiguration class that takes care of these configuration-less queries. Previously, it would only be active when no other configurations were present, but this turned out to have some unfortunate consequences, so eventually we just made it active all the time. This does mean that you see twice the edges and nodes you saw previously -- one comes from the configuration you provided, and the other from the legacy configuration.
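The doubling described above can be pictured with a toy model. This is only a sketch of the idea and not CodeQL's implementation or API: `run_query` and `CustomConfiguration` are hypothetical names, and the assumption is simply that each active configuration contributes its own copy of every edge.

```python
def run_query(flow_edges, configurations):
    """Toy model: every active configuration reports each edge once."""
    return [(cfg, edge) for cfg in configurations for edge in flow_edges]


edges = [("default value", "mutation")]

# Only the always-on legacy configuration is active.
legacy_only = run_query(edges, ["LegacyConfiguration"])

# A query-specific configuration is added alongside the legacy one.
with_custom = run_query(edges, ["LegacyConfiguration", "CustomConfiguration"])
```

In this model the second run reports each edge twice, once per configuration, which matches the duplicated nodes and edges seen in the test output.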

yoff · 2020-05-25T15:14:31Z

Thanks, Larsen explained this to me as well. I guess I could have edited the previous posts to reflect my new state of less confusion.

yoff · 2020-05-25T16:51:40Z

Rewrote the tests, I had also duplicated some previous ones. And now I can test that test code behaves as I expect inside VSCode :)
Also simplified the query as suggested.

yoff · 2020-05-27T04:59:35Z

And it came to pass that Jenkins in its wisdom removed the false positive, thus building a better and brighter future, but leaving the developer rather confused.

yoff · 2020-05-27T05:19:34Z

It is the alert

| functions_test.py:226:5:226:8 | Subscript | functions_test.py:224:51:224:51 | empty mutable value | functions_test.py:226:5:226:8 | empty mutable value | $@ flows to here and is mutated. | functions_test.py:224:51:224:51 | y | Default value |

which I get locally but not on Jenkins. It pertains to this function in functions_test.py:

def tuple_unsuccessfully_copy_before_modify(x=[], y=[]):
    t = (x.copy(), y) # Whole tuple tainted here
    t[0].append(42) # FP here, since we taint both t[0] and t[1]
    return t[0]

Apparently, this FP does not occur on Jenkins.
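That the alert really is a false positive can be checked at run time. The sketch below copies the function from the comment above and inspects `__defaults__` after two calls; the variable names outside the function are illustrative only.

```python
def tuple_unsuccessfully_copy_before_modify(x=[], y=[]):
    t = (x.copy(), y)   # t[0] is a fresh list, t[1] is the default for y
    t[0].append(42)     # mutates only the copy of x
    return t[0]


result_1 = tuple_unsuccessfully_copy_before_modify()
result_2 = tuple_unsuccessfully_copy_before_modify()

# Both default lists should still be empty after the calls.
defaults_after = tuple_unsuccessfully_copy_before_modify.__defaults__
```

Neither default is modified, so an alert on the append to `t[0]` with `y` as the source does not correspond to an actual mutation of a default value.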

yoff · 2020-05-27T06:03:41Z

Aha, I can see that I have also received an email from Jenkins about a query change for py/modification-of-default-value
Looks like the most likely reason is the autoformatter!

yoff · 2020-05-27T06:22:37Z

After having pulled from upstream/master, restarted VSCode, and upgraded the CLI, I still see the FP locally :(

yoff · 2020-06-04T08:34:10Z

Ok, I think I should write down our conclusions so far. This PR currently suffers from three separate issues:

Too much flow
In the program

from copy import copy  # import assumed; not shown in the original comment

def tuple_unsuccessfully_copy_before_modify(x=[]):
    t = (copy(x), 3) # Tuple wrongly tainted here
    t[0].append(42) # FP here
    return t[0]

taint flows into the tuple t even though it should be sanitised by the call to copy.

Too little flow
In the program

def tuple_copy_before_modify(x=[]):
    t = (x.copy(), 3) # Tuple not tainted here
    t[0].append(42)
    return t[0]

taint does not flow into the tuple t, but not because it is sanitised. Rather, the standard mechanism by which taint flows from a member into a tuple requires the flow to be between two ControlFlowNodes. However, TaintTrackingNodes are annotated DataFlow::Nodes, and a DataFlow::Node can be either a ControlFlowNode or an EssaVariable. As it happens, taint flows to x.copy() only as represented by an EssaVariable, and so it does not flow further into the tuple.

Py2/3 discrepancy
The program in [Too much flow] does not exhibit too much flow when treated as being Python 2.

I hope that [Too much flow] can be handled by implementing further sanitisers. I propose that [Too little flow] be moved to a separate issue; perhaps the current behaviour is even desirable/intended. I think we must solve [Py2/3 discrepancy] before the present PR can be merged.

RasmusWL · 2020-06-04T15:12:32Z

I'm confused about [Too little flow], it sounds more like a case of "correct result, but for wrong reasons"?

yoff · 2020-06-05T20:25:13Z

It may be a case of me having untrained expectations of our data flow implementation. I would expect that if taint flows into a component of a tuple, then it flows, in sequence form, into the tuple.

RasmusWL · 2020-06-08T16:06:32Z

> It may be a case of me having untrained expectations of our data flow implementation. I would expect that if taint flows into a component of a tuple, then it flows, in sequence form, into the tuple.

I'm not sure how to interpret this response 😅

So just to be clear: the function tuple_copy_before_modify is safe, since it does not modify the default value. I would expect that x.copy() is not tainted, so t = (x.copy(), 3) should not make t a tainted tuple either. Which is happening right now?

x.copy() is not tainted, t is not tainted (all good ✔️)
x.copy() is tainted, but t is not tainted because of the SSA thing you mention (not good ❌)

yoff · 2020-06-09T11:11:18Z

x.copy() the ControlFlowNode is not tainted, but an EssaVariable with the same location is

yoff · 2020-06-24T19:11:42Z

I commented out the function that behaves differently between py2/3 for now..

yoff · 2020-09-09T15:33:28Z

Closing for now as this is getting outdated.

yoff added Python false-positive labels May 18, 2020

yoff force-pushed the CopySanitizes branch from d9a6c2b to 7832ed8 Compare May 20, 2020 10:42

yoff added 3 commits May 24, 2020 08:53

Python: Tests for modification of default value

35fe852

Python: Sanitise default value on copy

744abe1

Python: Add taint tracking configuration

a869f91

yoff force-pushed the CopySanitizes branch from 0956713 to a869f91 Compare May 24, 2020 07:10

yoff marked this pull request as ready for review May 24, 2020 07:14

yoff requested a review from a team as a code owner May 24, 2020 07:14

yoff commented May 24, 2020

tausbn requested changes May 25, 2020

yoff added 2 commits May 25, 2020 18:06

Python: more realistic tests

ae1731b

Python: better predicate name, simpler expression

1254f02

yoff added 2 commits May 26, 2020 13:59

Merge branch 'master' of github.com:github/codeql into CopySanitizes

98cb206

Python: Explain why copying sanitizes.

3723c73

Merge branch 'master' of github.com:github/codeql into CopySanitizes

b8b9971

Python: Another FP, interaction of call and tuple

b81f380

yoff added 2 commits June 11, 2020 07:22

Python "sanitised" -> "sanitized"

d5abda3

Python: try sanitizingNode

8282e81

yoff commented Jun 11, 2020

yoff added 3 commits June 24, 2020 11:35

Python: remove superflous case and format

8558809

Merge branch 'master' of github.com:github/codeql into CopySanitizes

587b833

to obtain code scan config

Python: remove function which differs on py2/3

a0aeede

yoff requested a review from tausbn June 24, 2020 19:11

yoff added 2 commits August 5, 2020 15:07

Merge branch 'master' of github.com:github/codeql into CopySanitizes

fd4ee06

Python: format ql

3a1b87f

adityasharad changed the base branch from master to main August 14, 2020 18:34

yoff closed this Sep 9, 2020

tausbn mentioned this pull request Jan 22, 2024

Python: Fix deepcopy mutable default FP #15397

Merged
