New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C++: Handle call-contexts mismatches in cpp/invalid-pointer-deref
#13699
base: main
Are you sure you want to change the base?
C++: Handle call-contexts mismatches in cpp/invalid-pointer-deref
#13699
Conversation
d010102
to
3fe58d9
Compare
|
@jketema I think this PR is good to go. I need to verify that the lost results are all from fixing call contexts, but I'm pretty confident that this is the case. Looking forward to trying this out on MRVA |
…ent conflation of paths.
3302647
to
63c5684
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, this looks incorrect to me. See comments below.
cpp/ql/src/experimental/Security/CWE/CWE-193/InvalidPointerDeref.ql
Outdated
Show resolved
Hide resolved
cpp/ql/src/experimental/Security/CWE/CWE-193/InvalidPointerDeref.ql
Outdated
Show resolved
Hide resolved
cpp/ql/src/experimental/Security/CWE/CWE-193/InvalidPointerDeref.ql
Outdated
Show resolved
Hide resolved
|
|
||
| predicate isBarrier(DataFlow::Node n, FlowState state) { none() } | ||
|
|
||
| predicate isAdditionalFlowStep( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually doing anything since we're using invalidPointerToDerefSource as a source anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're starting the flow at the allocation just like before. Note that the first column of invalidPointerToDerefSource (which is the one we use to mark the source in FinalConfig):
predicate isSource(DataFlow::Node source, FlowState state) {
state = TInitial() and
exists(DataFlow::Node derefSource |
invalidPointerToDerefSource(source, _, derefSource, _) and
InvalidPointerToDerefFlow::flow(derefSource, _)
)
}is the dataflow node that represents the allocation:
predicate invalidPointerToDerefSource(
DataFlow::Node source1, PointerArithmeticInstruction pai, DataFlow::Node derefSource, int delta
) {
exists(
AllocToInvalidPointerFlow::PathNode1 pSource1, AllocToInvalidPointerFlow::PathNode1 pSink1,
AllocToInvalidPointerFlow::PathNode2 pSink2, DataFlow::Node sink1, DataFlow::Node sink2,
int delta0
|
pragma[only_bind_out](pSource1.getNode()) = source1 and
// ...
AllocToInvalidPointerFlow::flowPath(pSource1, _, pragma[only_bind_into](pSink1),
pragma[only_bind_into](pSink2)) and
// ...
)
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now. Why is all this complexity needed, and why can we just use the source from AllocToInvalidPointerConfig?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs to be a source that actually flows to a sink. If we just write AllocToInvalidPointerConfig::isSourcePair(source1, _, _, _, _) we'll mark all allocations as sources irregardless of all the filtering that the rest of the query has done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is why this final flow configuration is basically "for free" (as can be seen in the DCA run). There are not a lot of sources/sinks now since this configuration is really jus there to make sure the call contexts match up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with more and more functionality piling into
invalidPointerToDerefSource.
This PR doesn't do anything to invalidPointerToDerefSource other than renaming a variable and moving another variable from exists to a parameter.
This is a query structuring issue.
In that case, I'll think about how we can change the structure of the query, then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR doesn't do anything to invalidPointerToDerefSource other than renaming a variable and moving another variable from exists to a parameter.
Allowing the predicate to be used in one more non-obvious way. Maybe we should start by actually properly naming the predicate (because it being a source of one of the dataflow configurations is probably the least relevant of all its uses at the moment), and also giving its parameters more descriptive names.
More general, every name that ends in a number should probably get a more descriptive name that does not involve number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are definitely some easy changes to make. I've created an internal issue for these things, and I suggest we delay this PR (as well as #13725) until this has been completed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could actually properly simplify the query based on the work you did here. In particular:
- Could we get rid of
TMergedPathNodeby adding another state and step toFinalConfigthat goes to the final load/store instruction? - Could we get rid of
InvalidPointerToDerefConfigby merging its functionality intoFinalConfig?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to get rid of TMergedPathNode in 5e06043. I'm still trying to think about how we can get rid of InvalidPointerToDerefConfig by merging its functionality into FinalConfig.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main comment seemed to have gone missing. Here you go.
| invalidPointerToDerefSource(source, _, derefSource, _) and | ||
| InvalidPointerToDerefFlow::flow(derefSource, _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look correct to me, as the paths we display now no longer start from an allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is a side-effect of the misunderstanding I've tried to clear up here: source is indeed the node that represents the allocation, and when we write:
invalidPointerToDerefSource(source, _, derefSource, _) and
InvalidPointerToDerefFlow::flow(derefSource, _)we're saying: "the source is the node that represents the allocation, and the derefSource must be a node that flows to a dereference (as identified by InvalidPointerToDerefFlow).
Does that make sense?
…ef.ql Co-authored-by: Jeroen Ketema <93738568+jketema@users.noreply.github.com>
…ef.ql Co-authored-by: Jeroen Ketema <93738568+jketema@users.noreply.github.com>
…w node has a better 'toString'.
The
cpp/invalid-pointer-derefquery finds problematic dereferences in two steps:This is split up into two configurations, and whenever we do such a split in a query we risk FPs caused by losing all call-context information when we transition between the two configurations when faced with code that roughly looks like this:
Normally we'd fix this problem by using a flow state to collapse the two configurations into one, but this isn't very easy because step 1 in this query is done using the product-flow library (and not just a normal dataflow configurations).
So this PR fixes the above problem by constructing a final configuration that represents the problematic flow path.