-
Notifications
You must be signed in to change notification settings - Fork 1.9k
C++: Summary metrics queries #5271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is a first attempt at implementing, for C++, the set of summary queries that we expect all languages to implement to help diagnose extraction failures and build configuration problems. See the spec in [this document](https://docs.google.com/document/d/1V3zpkj0OGh8GEUVwACRx7fiafE5zklujAftZaYUyf9s/edit?usp=sharing). The five queries are: - Total number of source files (including .c/.cpp and header files) - Total number of lines of text across all text files - Total number of lines of code across all text files - Number of lines of text in each source file - Number of lines of code in each source file I've added some simple unit tests that cover all five of these.
geoffw0
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a difference in meaning between a "text file" and "source file" in the PR description? If there is, does it affect CPP?
Queries LGTM.
|
|
||
| import cpp | ||
|
|
||
| select sum(File f | f.fromSource() | f.getMetrics().getNumberOfLinesOfCode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the benefit of anyone else who is wondering, getNumberOfLinesOfCode() doesn't count blank lines and lines that only contain comments.
cpp/ql/src/Diagnostics/Files.ql
Outdated
| /** | ||
| * @name Total source files | ||
| * @description The total number of source files. | ||
| * @kind metric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is @kind metric all the metadata we need for Code Scanning to present the results of these queries in the right place? I note that @kind metric is already in use by LGTM, and the LGTM documentation specifies that such queries have two additional mandatory properties. We have existing @kind metric queries for Python and JavaScript in this repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh well spotted. I reused the existing @kind metric deliberately, but will look into making those properties optional in the toolchain. If we're not adding these queries to the LGTM suites then we don't need to make LGTM handle them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is
@kind metricall the metadata we need for Code Scanning to present the results of these queries in the right place?
It sounds like we'll need a tag and/or a query suite to identify the metrics to use in Code Scanning. Any decision on what to use here, or should we just add it later when the rest of the toolchain is further along?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see the drawbacks of reusing @kind metric: we have to change the rules for additional properties and possibly change how the suites are produced. What are the advantages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Go with just
@kind metricfor now, I think that's enough for you to proceed :) - The contents of the Code Scanning suites are determined by https://github.com/github/codeql/blob/main/misc/suite-helpers/code-scanning-selectors.yml. Regardless of what name we use here, we'll have to update that config to ship these new types of queries to Code Scanning. The suite format gives us the ability to include by
kind. - The LGTM suite generation script only includes queries that also have
metricType. - The existing meta-metric queries for JS and Python can easily be excluded by filtering out
@tags meta, or just editing their metadata if those teams are happy with it (LGTM and Code Scanning don't include them anyway). - I think metric is the simplest name for this property, and it gives scope to reuse the metric/treemap queries we already have that fit this purpose. That said, the choice shouldn't block you here: if we encounter trouble making the toolchain work, or need to do a final find-replace with a new name, that's on my team to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing meta-metric queries for JS and Python can easily be excluded by filtering out
@tags meta
I'd be uncomfortable with a query suite defined as exclusion-filter, i.e. "everything except queries with tags x, y, z". A simple typo in a tag could cause a query to be picked up by accident, making its way into production.
Would you please consider an inclusion-filter instead, for example,
- include:
kind:
- metric
tags contain:
- diagnostic(sorry for hijacking the thread, it seemed related to this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the technicalities of how big of an effort it would be to invent a new kind vs. the effort of changing the docs. I also don't know our overall philosophy of how much we try to reuse existing kinds vs. create new ones.
If we do reuse the kind, I'm very much in support of Asger's proposal to let the suites be determined by inclusion filters rather than exclusion filters. It seems like too much of a footgun to require every future LGTM-style metric query to add @tags meta. Authors of such queries will be testing the LGTM-style functionality, and they won't notice that Code Scanning suddenly starts including extra diagnostic data in its logs because omitted a tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're already going with "summary" for the path name and @id, should we use summary as the tag, rather than diagnostic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inclusion filter sounds sensible. I think summary is a good tag name for now since we eventually plan to have diagnostic as a separate kind (for extractor diagnostics), so let's go with that. If you want both of those tag names supported later it's easy to add.
|
|
@jbj per discussion with @AlonaHlobina , this PR now contains only the first LoC summary metric we want to measure. Another metric query, that corresponds to a LoC-count that is relatable to the user, is postponed until further clarification from product. Here, LoC means "newline in any C/C++/header file". This is ready for review now. |
|
Please hold back on merging, there's some news about what a LoC is, I might need to change this query again after all. |
jbj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderfully simple. LGTM.
|
Okay, holding back on merging. |
|
@jbj: Ready for re-review. I guess we should also get 👀 from the docs team? |
|
@github/docs-content-codeql, please review the docs (name and description) of this new query. |
|
Hello @github/docs-content-codeql - this PR is ready for docs review. |
Co-authored-by: Jonas Jensen <jbj@github.com>
shati-patel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small comment about the description 😊
Co-authored-by: Shati Patel <42641846+shati-patel@users.noreply.github.com>
|
@jbj can you re-approve? |
|
Why did the PR checks not fail for this PR? In the JS version of this query we get: |
|
Presumably because you run the odasa-based |
`py/summary/lines-of-code` is just a port of the C++/JS queries added in: - github#5271 (C++) - github#5304 (JS) We are the first to implement the `lines-of-user-code` query, so nothing to compare with in other languages -- but it makes a lot of sense to do for Python 👍
Resolves https://github.com/github/codeql-c-analysis-team/issues/239
This is a first attempt at implementing, for C++, the set of summary queries that we expect all languages to implement to help diagnose extraction failures and build configuration problems. See the spec in this document. The five queries are:
I've added some simple unit tests that cover all five of these.