Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++: Handle call-contexts mismatches in cpp/invalid-pointer-deref #13699

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

MathiasVP
Copy link
Contributor

The cpp/invalid-pointer-deref query finds problematic dereferences in two steps:

  1. We first identify flow form an allocation to the construction of a pointer that's out-of-bounds, and then
  2. we find flow from the out-of-bounds pointer to a dereference.

This is split up into two configurations, and whenever we do such a split in a query we risk FPs caused by losing all call-context information when we transition between the two configurations when faced with code that roughly looks like this:

int id(int x) {
  return x; // assume the transition between the two configurations happens here.
}

void f1() {
  int x = source();
  int y = id(x);
}

void f2() {
  int x = 0;
  int y = id(x);
  sink(x);
}

Normally we'd fix this problem by using a flow state to collapse the two configurations into one, but this isn't very easy because step 1 in this query is done using the product-flow library (and not just a normal dataflow configurations).

So this PR fixes the above problem by constructing a final configuration that represents the problematic flow path.

@github-actions github-actions bot added the C++ label Jul 10, 2023
@MathiasVP MathiasVP force-pushed the final-config-to-invalid-pointer-deref branch from d010102 to 3fe58d9 Compare July 10, 2023 12:53
@MathiasVP MathiasVP marked this pull request as ready for review July 10, 2023 12:53
@MathiasVP MathiasVP requested a review from a team as a code owner July 10, 2023 12:53
@MathiasVP MathiasVP requested a review from jketema July 10, 2023 12:53
@MathiasVP
Copy link
Contributor Author

@jketema I think this PR is good to go. I need to verify that the lost results are all from fixing call contexts, but I'm pretty confident that this is the case. Looking forward to trying this out on MRVA 🤞.

@MathiasVP MathiasVP marked this pull request as draft July 11, 2023 09:17
@MathiasVP MathiasVP force-pushed the final-config-to-invalid-pointer-deref branch from 3302647 to 63c5684 Compare July 11, 2023 09:24
@MathiasVP MathiasVP marked this pull request as ready for review July 11, 2023 12:18
Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this looks incorrect to me. See comments below.


predicate isBarrier(DataFlow::Node n, FlowState state) { none() }

predicate isAdditionalFlowStep(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually doing anything since we're using invalidPointerToDerefSource as a source anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're starting the flow at the allocation just like before. Note that the first column of invalidPointerToDerefSource (which is the one we use to mark the source in FinalConfig):

predicate isSource(DataFlow::Node source, FlowState state) {
  state = TInitial() and
  exists(DataFlow::Node derefSource |
    invalidPointerToDerefSource(source, _, derefSource, _) and
    InvalidPointerToDerefFlow::flow(derefSource, _)
  )
}

is the dataflow node that represents the allocation:

predicate invalidPointerToDerefSource(
  DataFlow::Node source1, PointerArithmeticInstruction pai, DataFlow::Node derefSource, int delta
) {
  exists(
    AllocToInvalidPointerFlow::PathNode1 pSource1, AllocToInvalidPointerFlow::PathNode1 pSink1,
    AllocToInvalidPointerFlow::PathNode2 pSink2, DataFlow::Node sink1, DataFlow::Node sink2,
    int delta0
  |
    pragma[only_bind_out](pSource1.getNode()) = source1 and
    // ...
    AllocToInvalidPointerFlow::flowPath(pSource1, _, pragma[only_bind_into](pSink1),
      pragma[only_bind_into](pSink2)) and
    // ...
  )
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now. Why is all this complexity needed, and why can we just use the source from AllocToInvalidPointerConfig?

Copy link
Contributor Author

@MathiasVP MathiasVP Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be a source that actually flows to a sink. If we just write AllocToInvalidPointerConfig::isSourcePair(source1, _, _, _, _) we'll mark all allocations as sources irregardless of all the filtering that the rest of the query has done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is why this final flow configuration is basically "for free" (as can be seen in the DCA run). There are not a lot of sources/sinks now since this configuration is really jus there to make sure the call contexts match up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with more and more functionality piling into invalidPointerToDerefSource.

This PR doesn't do anything to invalidPointerToDerefSource other than renaming a variable and moving another variable from exists to a parameter.

This is a query structuring issue.

In that case, I'll think about how we can change the structure of the query, then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR doesn't do anything to invalidPointerToDerefSource other than renaming a variable and moving another variable from exists to a parameter.

Allowing the predicate to be used in one more non-obvious way. Maybe we should start by actually properly naming the predicate (because it being a source of one of the dataflow configurations is probably the least relevant of all its uses at the moment), and also giving its parameters more descriptive names.

More general, every name that ends in a number should probably get a more descriptive name that does not involve number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are definitely some easy changes to make. I've created an internal issue for these things, and I suggest we delay this PR (as well as #13725) until this has been completed.

Copy link
Contributor

@jketema jketema Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could actually properly simplify the query based on the work you did here. In particular:

  • Could we get rid of TMergedPathNode by adding another state and step to FinalConfig that goes to the final load/store instruction?
  • Could we get rid of InvalidPointerToDerefConfig by merging its functionality into FinalConfig?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to get rid of TMergedPathNode in 5e06043. I'm still trying to think about how we can get rid of InvalidPointerToDerefConfig by merging its functionality into FinalConfig.

Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main comment seemed to have gone missing. Here you go.

Comment on lines +501 to +502
invalidPointerToDerefSource(source, _, derefSource, _) and
InvalidPointerToDerefFlow::flow(derefSource, _)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look correct to me, as the paths we display now no longer start from an allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is a side-effect of the misunderstanding I've tried to clear up here: source is indeed the node that represents the allocation, and when we write:

invalidPointerToDerefSource(source, _, derefSource, _) and
InvalidPointerToDerefFlow::flow(derefSource, _)

we're saying: "the source is the node that represents the allocation, and the derefSource must be a node that flows to a dereference (as identified by InvalidPointerToDerefFlow).

Does that make sense?

MathiasVP and others added 5 commits July 12, 2023 11:57
…ef.ql

Co-authored-by: Jeroen Ketema <93738568+jketema@users.noreply.github.com>
…ef.ql

Co-authored-by: Jeroen Ketema <93738568+jketema@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants