Skip to content

C#: Sign analysis#4193

Merged
tamasvajk merged 3 commits intogithub:mainfrom
tamasvajk:feature/sign-analysis
Sep 22, 2020
Merged

C#: Sign analysis#4193
tamasvajk merged 3 commits intogithub:mainfrom
tamasvajk:feature/sign-analysis

Conversation

@tamasvajk
Copy link
Contributor

@tamasvajk tamasvajk commented Sep 2, 2020

Sign analysis ported from Java. It is going to be used by the ported range analysis. The public API contains four predicates: positive(Expr e), negative(Expr e), strictlyPositive(Expr e), strictlyNegative(Expr e). Range analysis doesn't need an isZero(Expr e), so it's not added for the time being.

The code is a 1:1 migration from the Java implementation. Java had no direct tests on it, I added a single one for it, because I was anyways debugging through the Java code. I added a lot more test for C#.

@tamasvajk tamasvajk added the C# label Sep 2, 2020
}

private predicate containerSizeAccess(PropertyAccess pa) {
propertyOverrides(pa.getTarget(), "System.Collections.Generic.ICollection<>", "Count") or
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably go to a helper file, because we use it in ContainerSizeCmpZero.ql too. But I couldn't find a good location for it. (Also, the check is using a slightly different variant.

exists(AssignOperation a | a.getLValue() = f.getAnAccess() | result = exprSign(a))
or
f.fromSource() and not exists(f.getInitializer()) and result = TZero()
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java also handles access to the field through reflection.

@tamasvajk tamasvajk requested a review from a team September 2, 2020 15:38
@tamasvajk tamasvajk marked this pull request as ready for review September 2, 2020 15:39
@tamasvajk tamasvajk marked this pull request as draft September 2, 2020 21:00
@tamasvajk tamasvajk force-pushed the feature/sign-analysis branch from 381d906 to 1267db9 Compare September 3, 2020 08:54
@tamasvajk tamasvajk force-pushed the feature/sign-analysis branch 4 times, most recently from 61bf4b5 to 45db4d4 Compare September 3, 2020 14:11
@tamasvajk tamasvajk force-pushed the feature/sign-analysis branch from d3270c8 to eedaab7 Compare September 4, 2020 08:16
@tamasvajk tamasvajk requested a review from a team September 4, 2020 08:40
@tamasvajk tamasvajk marked this pull request as ready for review September 4, 2020 08:48
@tamasvajk tamasvajk requested a review from a team as a code owner September 4, 2020 08:48
@tamasvajk tamasvajk added the Java label Sep 8, 2020
@tamasvajk tamasvajk force-pushed the feature/sign-analysis branch from 6183a69 to d195bb7 Compare September 8, 2020 20:11
Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work, @tamasvajk . Great to have this logic shared with Java.

Comment on lines 257 to 256
not exists(SsaReadPositionBlock bb | getAnExpression(bb) = e) and
result = ssaDefSign(v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure why this disjunct is needed. If it is, could we have a test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aschackmull can you help me out with this question? This is the disjunct before the refactoring: https://github.com/github/codeql/blob/main/java/ql/src/semmle/code/java/dataflow/SignAnalysis.qll#L482.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at the disjuncts surrounding this case then you can see some hints. We split into cases based on Expr e and these first couple of cases all cover the VarAccess cases. The most precise case is if this is a well-behaved read of an SSA variable, then there are also cases for non-SSA variable accesses, which are split into fields and non-fields. However, the SSA case would like to assume the existence of a BasicBlock containing the VarAccess in order to look at the SsaReadPosition, and there are a few cases where the Java SSA library can produce an (SSA-variable, SSA-read) pair outside the CFG, for instance in compile-time evaluated expressions in case constants. So in order for all VarAccess expressions to receive a sign we have to include this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I thought it was weird to have both getARead(v) = e and not exists(SsaReadPositionBlock bb | getAnExpression(bb) = e) at the same time, i.e. an SSA read without a basic block. It sounds like a corner case in Java then.

@tamasvajk tamasvajk force-pushed the feature/sign-analysis branch 3 times, most recently from 4fbca9f to a021769 Compare September 17, 2020 09:49
@tamasvajk
Copy link
Contributor Author

@hvitved I pushed some extra commits to address your review. The unsigned integer improvements are not done yet. I think I covered the rest.

Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the private imports are potentially breaking changes, so should be reverted.

@tamasvajk
Copy link
Contributor Author

@hvitved I reverted the last commit, and only made the imports private in the new code, so that I don't introduce breaking changes.

@hvitved
Copy link
Contributor

hvitved commented Sep 17, 2020

The unsigned integer improvements are not done yet

If you prefer, we could do that in a follow-up PR.

@tamasvajk tamasvajk force-pushed the feature/sign-analysis branch from 67b76a2 to 92f9bdc Compare September 17, 2020 14:46
@tamasvajk
Copy link
Contributor Author

I don't mind handling the unsigned integers in this PR. I'll work on that tomorrow.

Also, I pushed some extra test cases that are not covered. I'll fix them, and try to find more such cases.

exists(string charlit | charlit = e.(CharacterLiteral).getValue() |
if charlit = "\\0" or charlit = "\\u0000" then result = TZero() else result = TPos()
exists(string charlit | charlit = getCharValue(e) |
if charlit = "\\0" or charlit = "\\u0000" or charlit.regexpMatch("\\u0000")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need both disjuncts charlit = "\\u0000" or charlit.regexpMatch("\\u0000"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is shared between Java and C#. And I don't know what gets extracted for Java. @aschackmull Can you tell me what gets extracted for (char)0 in Java? See this ticket and the slack discussion for context: https://github.com/github/codeql-coreql-team/issues/539

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(char) 0 is just a cast of a 32 bit literal to an unsigned 16 bit value as far as I know. In Java you can put NUL characters in the source code. You can write (where ^@ represents a real unescaped NUL symbol):

class X {

String normal = "\0";
String unicode = "\u0000";
String nullChar = "^@";
}

I had expected unicode and nullChar to be extracted similarly because as far as I know unicode escape sequences are replaced before lexical analysis. However, in the extractor output the normal and unicode lines are extracted similarly, while the last is different.

The generated TRAP output is

#10000=@"/home/arthur/Devel/code/ql/java/ql/test/query-tests/Nullness/X.java;sourcefile"
files(#10000,"/home/arthur/Devel/code/ql/java/ql/test/query-tests/Nullness/X.java","X","java",0)
#10001=@"/home/arthur/Devel/code/ql/java/ql/test/query-tests/Nullness;folder"
folders(#10001,"/home/arthur/Devel/code/ql/java/ql/test/query-tests/Nullness","Nullness")
#10002=@"/home/arthur/Devel/code/ql/java/ql/test/query-tests;folder"
folders(#10002,"/home/arthur/Devel/code/ql/java/ql/test/query-tests","query-tests")
#10003=@"/home/arthur/Devel/code/ql/java/ql/test;folder"
folders(#10003,"/home/arthur/Devel/code/ql/java/ql/test","test")
#10004=@"/home/arthur/Devel/code/ql/java/ql;folder"
folders(#10004,"/home/arthur/Devel/code/ql/java/ql","ql")
#10005=@"/home/arthur/Devel/code/ql/java;folder"
folders(#10005,"/home/arthur/Devel/code/ql/java","java")
#10006=@"/home/arthur/Devel/code/ql;folder"
folders(#10006,"/home/arthur/Devel/code/ql","ql")
#10007=@"/home/arthur/Devel/code;folder"
folders(#10007,"/home/arthur/Devel/code","code")
#10008=@"/home/arthur/Devel;folder"
folders(#10008,"/home/arthur/Devel","Devel")
#10009=@"/home/arthur;folder"
folders(#10009,"/home/arthur","arthur")
#10010=@"/home;folder"
folders(#10010,"/home","home")
#10011=@"/;folder"
folders(#10011,"/","")
containerparent(#10011,#10010)
containerparent(#10010,#10009)
containerparent(#10009,#10008)
containerparent(#10008,#10007)
containerparent(#10007,#10006)
containerparent(#10006,#10005)
containerparent(#10005,#10004)
containerparent(#10004,#10003)
containerparent(#10003,#10002)
containerparent(#10002,#10001)
containerparent(#10001,#10000)
#10012=@"loc,{#10000},0,0,0,0"
locations_default(#10012,#10000,0,0,0,0)
hasLocation(#10000,#10012)
numlines(#10000,6,5,0)
#10013=@"package;"
packages(#10013,"")
cupackage(#10000,#10013)
#10014=@"class;X"
#10015=@"field;{#10014};nullChar"
#10016=@"class;java.lang.String"
fields(#10015,"nullChar",#10016,#10014,#10015)
#10017=@"/home/arthur/Devel/code/ql/java/ql/test/query-tests/Nullness/X.class;sourcefile"
files(#10017,"/home/arthur/Devel/code/ql/java/ql/test/query-tests/Nullness/X.class","X","class",0)
containerparent(#10001,#10017)
packages(#10013,"")
cupackage(#10017,#10013)
#10018=@"loc,{#10017},0,0,0,0"
locations_default(#10018,#10017,0,0,0,0)
hasLocation(#10017,#10018)
hasLocation(#10015,#10018)
#10019=@"field;{#10014};unicode"
fields(#10019,"unicode",#10016,#10014,#10019)
hasLocation(#10019,#10018)
#10020=@"field;{#10014};normal"
fields(#10020,"normal",#10016,#10014,#10020)
hasLocation(#10020,#10018)
#10021=@"type;void"
primitives(#10021,"void")
#10022=@"unknown;sourcefile"
files(#10022,"","","",0)
#10023=@"loc,{#10022},0,0,0,0"
locations_default(#10023,#10022,0,0,0,0)
hasLocation(#10021,#10023)
#10024=@"callable;{#10014}.<init>(){#10021}"
constrs(#10024,"X","X()",#10021,#10014,#10024)
hasLocation(#10024,#10018)
isDefConstr(#10024)
#10025=@"loc,{#10000},1,7,1,7"
locations_default(#10025,#10000,1,7,1,7)
hasLocation(#10014,#10025)
numlines(#10014,5,4,0)
#10026=*
methods(#10026,"<obinit>","<obinit>()",#10021,#10014,#10026)
#10027=@"modifier;private"
modifiers(#10027,"private")
hasModifier(#10026,#10027)
locations_default(#10025,#10000,1,7,1,7)
hasLocation(#10026,#10025)
#10028=*
stmts(#10028,0,#10026,0,#10026)
locations_default(#10025,#10000,1,7,1,7)
hasLocation(#10028,#10025)
#10029=*
#10030=*
#10031=*
stmts(#10029,14,#10028,0,#10026)
#10032=*
locations_default(#10032,#10000,3,1,3,21)
hasLocation(#10029,#10032)
exprs(#10030,4,#10016,#10029,0)
callableEnclosingExpr(#10030,#10026)
statementEnclosingExpr(#10030,#10029)
#10033=*
locations_default(#10033,#10000,3,1,3,21)
hasLocation(#10030,#10033)
exprs(#10031,60,#10016,#10030,0)
callableEnclosingExpr(#10031,#10026)
statementEnclosingExpr(#10031,#10029)
#10034=*
locations_default(#10034,#10000,3,1,3,21)
hasLocation(#10031,#10034)
variableBinding(#10031,#10020)
#10035=*
exprs(#10035,22,#10016,#10030,1)
callableEnclosingExpr(#10035,#10026)
statementEnclosingExpr(#10035,#10029)
#10036=*
locations_default(#10036,#10000,3,17,3,20)
hasLocation(#10035,#10036)
numlines(#10035,1,1,0)
namestrings("""\0""","^@",#10035)
#10037=*
#10038=*
#10039=*
stmts(#10037,14,#10028,1,#10026)
#10040=*
locations_default(#10040,#10000,4,1,4,26)
hasLocation(#10037,#10040)
exprs(#10038,4,#10016,#10037,0)
callableEnclosingExpr(#10038,#10026)
statementEnclosingExpr(#10038,#10037)
#10041=*
locations_default(#10041,#10000,4,1,4,26)
hasLocation(#10038,#10041)
exprs(#10039,60,#10016,#10038,0)
callableEnclosingExpr(#10039,#10026)
statementEnclosingExpr(#10039,#10037)
#10042=*
locations_default(#10042,#10000,4,1,4,26)
hasLocation(#10039,#10042)
variableBinding(#10039,#10019)
#10043=*
exprs(#10043,22,#10016,#10038,1)
callableEnclosingExpr(#10043,#10026)
statementEnclosingExpr(#10043,#10037)
#10044=*
locations_default(#10044,#10000,4,18,4,25)
hasLocation(#10043,#10044)
numlines(#10043,1,1,0)
namestrings("""\u0000""","^@",#10043)
#10045=*
#10046=*
#10047=*
stmts(#10045,14,#10028,2,#10026)
#10048=*
locations_default(#10048,#10000,5,1,5,22)
hasLocation(#10045,#10048)
exprs(#10046,4,#10016,#10045,0)
callableEnclosingExpr(#10046,#10026)
statementEnclosingExpr(#10046,#10045)
#10049=*
locations_default(#10049,#10000,5,1,5,22)
hasLocation(#10046,#10049)
exprs(#10047,60,#10016,#10046,0)
callableEnclosingExpr(#10047,#10026)
statementEnclosingExpr(#10047,#10045)
#10050=*
locations_default(#10050,#10000,5,1,5,22)
hasLocation(#10047,#10050)
variableBinding(#10047,#10015)
#10051=*
exprs(#10051,22,#10016,#10046,1)
callableEnclosingExpr(#10051,#10026)
statementEnclosingExpr(#10051,#10045)
#10052=*
locations_default(#10052,#10000,5,19,5,21)
hasLocation(#10051,#10052)
numlines(#10051,1,1,0)
namestrings("""^@""","^@",#10051)
locations_default(#10025,#10000,1,7,1,7)
hasLocation(#10024,#10025)
numlines(#10024,1,1,0)
#10053=*
stmts(#10053,0,#10024,0,#10024)
#10054=*
locations_default(#10054,#10000,1,7,1,7)
hasLocation(#10053,#10054)
numlines(#10053,1,1,0)
#10055=*
stmts(#10055,20,#10053,0,#10024)
#10056=*
locations_default(#10056,#10000,1,7,1,7)
hasLocation(#10055,#10056)
numlines(#10055,1,1,0)
#10057=*
stmts(#10057,14,#10053,1,#10024)
#10058=*
locations_default(#10058,#10000,1,7,1,7)
hasLocation(#10057,#10058)
numlines(#10057,1,1,0)
#10059=*
exprs(#10059,61,#10021,#10057,0)
callableEnclosingExpr(#10059,#10024)
statementEnclosingExpr(#10059,#10057)
#10060=*
locations_default(#10060,#10000,1,7,1,7)
hasLocation(#10059,#10060)
numlines(#10059,1,1,0)
callableBinding(#10059,#10026)
#10061=@"class;java.lang.Object"
#10062=@"callable;{#10061}.<init>(){#10021}"
callableBinding(#10055,#10062)
#10063=*
fielddecls(#10063,#10014)
#10064=*
locations_default(#10064,#10000,3,1,3,21)
hasLocation(#10063,#10064)
#10065=*
exprs(#10065,62,#10016,#10063,0)
#10066=*
locations_default(#10066,#10000,3,1,3,6)
hasLocation(#10065,#10066)
numlines(#10065,1,1,0)
#10067=@"loc,{#10000},3,8,3,13"
locations_default(#10067,#10000,3,8,3,13)
hasLocation(#10020,#10067)
fieldDeclaredIn(#10020,#10063,0)
#10068=*
fielddecls(#10068,#10014)
#10069=*
locations_default(#10069,#10000,4,1,4,26)
hasLocation(#10068,#10069)
#10070=*
exprs(#10070,62,#10016,#10068,0)
#10071=*
locations_default(#10071,#10000,4,1,4,6)
hasLocation(#10070,#10071)
numlines(#10070,1,1,0)
#10072=@"loc,{#10000},4,8,4,14"
locations_default(#10072,#10000,4,8,4,14)
hasLocation(#10019,#10072)
fieldDeclaredIn(#10019,#10068,0)
#10073=*
fielddecls(#10073,#10014)
#10074=*
locations_default(#10074,#10000,5,1,5,22)
hasLocation(#10073,#10074)
#10075=*
exprs(#10075,62,#10016,#10073,0)
#10076=*
locations_default(#10076,#10000,5,1,5,6)
hasLocation(#10075,#10076)
numlines(#10075,1,1,0)
#10077=@"loc,{#10000},5,8,5,15"
locations_default(#10077,#10000,5,8,5,15)
hasLocation(#10015,#10077)
fieldDeclaredIn(#10015,#10073,0)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had expected unicode and nullChar to be extracted similarly because as far as I know unicode escape sequences are replaced before lexical analysis. However, in the extractor output the normal and unicode lines are extracted similarly, while the last is different.

Really? The TRAP you quote shows the three cases with identical values, right?

namestrings("""\0""","^@",#10035)
namestrings("""\u0000""","^@",#10043)
namestrings("""^@""","^@",#10051)

I think perhaps some of the Java QL code here predates an improvement that I made a long time ago to the handling of the value of string literals, which precisely gives this uniform representation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the conclusion in slack was that we'd only need the charlit.regexpMatch("\\u0000") case and that the others currently don't work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aschackmull Yes, you're right. The things get extracted the same way in all three cases. I misread things in my terminal before I replaced the NUL character with something more easy to read ;-)

@tamasvajk
Copy link
Contributor Author

@hvitved
Copy link
Contributor

hvitved commented Sep 20, 2020

Differences job: https://jenkins.internal.semmle.com/job/Changes/job/CSharp-Differences/439/

This confirmed the performance issue caused by the guards logic change, as expected. I have started new jobs based on a rebased branch:

@tamasvajk tamasvajk force-pushed the feature/sign-analysis branch 2 times, most recently from 8a52f61 to 5bc8c46 Compare September 21, 2020 13:01
@tamasvajk tamasvajk force-pushed the feature/sign-analysis branch 2 times, most recently from 688cd92 to fdc9e01 Compare September 21, 2020 13:20
tamasvajk and others added 2 commits September 21, 2020 16:15
@tamasvajk tamasvajk force-pushed the feature/sign-analysis branch from fdc9e01 to 06dbec7 Compare September 21, 2020 14:15
@tamasvajk
Copy link
Contributor Author

Differences job after rebase, squash: https://jenkins.internal.semmle.com/job/Changes/job/CSharp-Differences/446/

@hvitved
Copy link
Contributor

hvitved commented Sep 21, 2020

@tamasvajk
Copy link
Contributor Author

Differences job after rebase, squash: https://jenkins.internal.semmle.com/job/Changes/job/CSharp-Differences/446/

And here is one for Java as well: https://jenkins.internal.semmle.com/job/Changes/job/Java-Differences/929/

I restarted the C# one: https://jenkins.internal.semmle.com/job/Changes/job/CSharp-Differences/448/
The Java one finished okay with no changes.

Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this merged once the CSharp-Differences job completes (without regressions).

Thanks for you patience @tamasvajk ; this is excellent work.

@tamasvajk tamasvajk merged commit 54c3574 into github:main Sep 22, 2020
result = exprSign(e.(CastExpr).getExpr())
}

private Sign binaryOpLhsSign(Expr e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to common

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binaryOpLhsSign and binaryOpRhsSign have different implementation for C# and Java.

}

pragma[noinline]
private predicate binaryOpSigns(Expr e, Sign lhs, Sign rhs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to common

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be moved to the common bit, but the gain would be minimal. I think it's better to keep it together with binaryOpLhsSign and binaryOpRhsSign.

@tamasvajk
Copy link
Contributor Author

Thanks @aschackmull for your findings. I'll go through them during the week...

@tamasvajk tamasvajk mentioned this pull request Sep 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants