C++: Summary metrics queries #5271

dbartol · 2021-02-25T18:01:37Z

Resolves https://github.com/github/codeql-c-analysis-team/issues/239

This is a first attempt at implementing, for C++, the set of summary queries that we expect all languages to implement to help diagnose extraction failures and build configuration problems. See the spec in this document. The five queries are:

Total number of source files (including .c/.cpp and header files)
Total number of lines of text across all source files
Total number of lines of code across all source files
Number of lines of text in each source file
Number of lines of code in each source file

I've added some simple unit tests that cover all five of these.

This is a first attempt at implementing, for C++, the set of summary queries that we expect all languages to implement to help diagnose extraction failures and build configuration problems. See the spec in [this document](https://docs.google.com/document/d/1V3zpkj0OGh8GEUVwACRx7fiafE5zklujAftZaYUyf9s/edit?usp=sharing). The five queries are: - Total number of source files (including .c/.cpp and header files) - Total number of lines of text across all text files - Total number of lines of code across all text files - Number of lines of text in each source file - Number of lines of code in each source file I've added some simple unit tests that cover all five of these.

geoffw0

Is there a difference in meaning between a "text file" and "source file" in the PR description? If there is, does it affect CPP?

Queries LGTM.

geoffw0 · 2021-02-26T09:40:52Z

cpp/ql/src/Diagnostics/LinesOfCode.ql

+
+import cpp
+
+select sum(File f | f.fromSource() | f.getMetrics().getNumberOfLinesOfCode())


For the benefit of anyone else who is wondering, getNumberOfLinesOfCode() doesn't count blank lines and lines that only contain comments.

cpp/ql/src/Diagnostics/Files.ql

jbj · 2021-02-26T12:50:49Z

cpp/ql/src/Diagnostics/Files.ql

+/**
+ * @name Total source files
+ * @description The total number of source files.
+ * @kind metric


Is @kind metric all the metadata we need for Code Scanning to present the results of these queries in the right place? I note that @kind metric is already in use by LGTM, and the LGTM documentation specifies that such queries have two additional mandatory properties. We have existing @kind metric queries for Python and JavaScript in this repo.

Oh well spotted. I reused the existing @kind metric deliberately, but will look into making those properties optional in the toolchain. If we're not adding these queries to the LGTM suites then we don't need to make LGTM handle them.

Is @kind metric all the metadata we need for Code Scanning to present the results of these queries in the right place?

It sounds like we'll need a tag and/or a query suite to identify the metrics to use in Code Scanning. Any decision on what to use here, or should we just add it later when the rest of the toolchain is further along?

I can see the drawbacks of reusing @kind metric: we have to change the rules for additional properties and possibly change how the suites are produced. What are the advantages?

Go with just @kind metric for now, I think that's enough for you to proceed :)

The contents of the Code Scanning suites are determined by https://github.com/github/codeql/blob/main/misc/suite-helpers/code-scanning-selectors.yml. Regardless of what name we use here, we'll have to update that config to ship these new types of queries to Code Scanning. The suite format gives us the ability to include by kind.

The LGTM suite generation script only includes queries that also have metricType.

The existing meta-metric queries for JS and Python can easily be excluded by filtering out @tags meta, or just editing their metadata if those teams are happy with it (LGTM and Code Scanning don't include them anyway).

I think metric is the simplest name for this property, and it gives scope to reuse the metric/treemap queries we already have that fit this purpose. That said, the choice shouldn't block you here: if we encounter trouble making the toolchain work, or need to do a final find-replace with a new name, that's on my team to fix.

The existing meta-metric queries for JS and Python can easily be excluded by filtering out @tags meta

I'd be uncomfortable with a query suite defined as exclusion-filter, i.e. "everything except queries with tags x, y, z". A simple typo in a tag could cause a query to be picked up by accident, making its way into production.

Would you please consider an inclusion-filter instead, for example,

- include: kind: - metric tags contain: - diagnostic

(sorry for hijacking the thread, it seemed related to this)

I don't know the technicalities of how big of an effort it would be to invent a new kind vs. the effort of changing the docs. I also don't know our overall philosophy of how much we try to reuse existing kinds vs. create new ones.

If we do reuse the kind, I'm very much in support of Asger's proposal to let the suites be determined by inclusion filters rather than exclusion filters. It seems like too much of a footgun to require every future LGTM-style metric query to add @tags meta. Authors of such queries will be testing the LGTM-style functionality, and they won't notice that Code Scanning suddenly starts including extra diagnostic data in its logs because omitted a tag.

Since we're already going with "summary" for the path name and @id, should we use summary as the tag, rather than diagnostic?

Inclusion filter sounds sensible. I think summary is a good tag name for now since we eventually plan to have diagnostic as a separate kind (for extractor diagnostics), so let's go with that. If you want both of those tag names supported later it's easy to add.

dbartol · 2021-02-26T20:04:22Z

Is there a difference in meaning between a "text file" and "source file" in the PR description? If there is, does it affect CPP?
s/text file/source file/
(I've updated the description)

…metrics"

… now.

criemen · 2021-03-15T11:34:40Z

@jbj per discussion with @AlonaHlobina , this PR now contains only the first LoC summary metric we want to measure.
This metric query returns a LoC-count that is indicative of the database size.

Another metric query, that corresponds to a LoC-count that is relatable to the user, is postponed until further clarification from product.

Here, LoC means "newline in any C/C++/header file".

This is ready for review now.

criemen · 2021-03-15T13:16:00Z

Please hold back on merging, there's some news about what a LoC is, I might need to change this query again after all.

jbj

Wonderfully simple. LGTM.

jbj · 2021-03-15T13:17:38Z

Okay, holding back on merging.

criemen · 2021-03-16T09:57:25Z

@jbj: Ready for re-review. I guess we should also get 👀 from the docs team?

jbj · 2021-03-16T11:26:33Z

@github/docs-content-codeql, please review the docs (name and description) of this new query.

github-actions · 2021-03-16T11:27:04Z

Hello @github/docs-content-codeql - this PR is ready for docs review.

cpp/ql/src/Summary/LinesOfCode.ql

Co-authored-by: Jonas Jensen <jbj@github.com>

shati-patel

Just a small comment about the description 😊

cpp/ql/src/Summary/LinesOfCode.ql

Co-authored-by: Shati Patel <42641846+shati-patel@users.noreply.github.com>

criemen · 2021-03-19T09:55:12Z

@jbj can you re-approve?

asgerf · 2021-03-22T11:22:11Z

Why did the PR checks not fail for this PR?

In the JS version of this query we get:

[WARN] ql/javascript/ql/src/Summary/LinesOfCode.ql: Missing required property @metricType.
[ERROR] ql/javascript/ql/src/Summary/LinesOfCode.ql: expected result pattern(s) are not present for query kind "metric" (Expected metric result pattern.)

criemen · 2021-03-22T11:28:45Z

Presumably because you run the odasa-based check_queries target in your CI job, whereas C++ uses a CodeQL CLI based codeql_check_queries target.
Try changing check_queries( to codeql_check_queries( in build approx. line 153.

The query didn't work with the old `odasa` toolchain, which is still used in internal Jenkins jobs. This reverts commit 98c1aa5, reversing changes made to 0732f20.

`py/summary/lines-of-code` is just a port of the C++/JS queries added in: - github#5271 (C++) - github#5304 (JS) We are the first to implement the `lines-of-user-code` query, so nothing to compare with in other languages -- but it makes a lot of sense to do for Python 👍

dbartol added the C++ label Feb 25, 2021

dbartol requested review from adityasharad and jbj February 25, 2021 18:01

dbartol requested a review from a team as a code owner February 25, 2021 18:01

geoffw0 reviewed Feb 26, 2021

View reviewed changes

jbj reviewed Feb 26, 2021

View reviewed changes

asgerf mentioned this pull request Mar 1, 2021

JS: Implement new metric queries for line counting #5304

Merged

MathiasVP added the no-change-note-required This PR does not need a change note label Mar 1, 2021

C++: Update naming of queries and paths to use "summary" instead of "…

863497c

…metrics"

criemen self-assigned this Mar 10, 2021

criemen added 3 commits March 10, 2021 11:15

C++: Update summary metrics query format.

16a3dfa

C++: Update summary queries.

97ab842

C++: Only include sum of LoC in the new non-alert summary queries for…

f75b969

… now.

jbj previously approved these changes Mar 15, 2021

View reviewed changes

C++: Update query to latest spec.

fa3ac30

criemen dismissed jbj’s stale review via fa3ac30 March 16, 2021 09:56

jbj added the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Mar 16, 2021

jbj previously approved these changes Mar 16, 2021

View reviewed changes

cpp/ql/src/Summary/LinesOfCode.ql Outdated Show resolved Hide resolved

cpp/ql/src/Summary/LinesOfCode.ql Outdated Show resolved Hide resolved

Apply suggestions from code review

755fec4

Co-authored-by: Jonas Jensen <jbj@github.com>

criemen dismissed jbj’s stale review via 755fec4 March 16, 2021 12:21

shati-patel reviewed Mar 17, 2021

View reviewed changes

cpp/ql/src/Summary/LinesOfCode.ql Outdated Show resolved Hide resolved

Update cpp/ql/src/Summary/LinesOfCode.ql

783a63a

Co-authored-by: Shati Patel <42641846+shati-patel@users.noreply.github.com>

jbj approved these changes Mar 19, 2021

View reviewed changes

jbj merged commit 98c1aa5 into main Mar 19, 2021

jbj deleted the files-query branch March 19, 2021 11:56

jbj mentioned this pull request Mar 23, 2021

Revert "Merge pull request #5271 from github/files-query" #5495

Closed


		import cpp

		select sum(File f \| f.fromSource() \| f.getMetrics().getNumberOfLinesOfCode())

C++: Summary metrics queries #5271

C++: Summary metrics queries #5271

Uh oh!

Conversation

dbartol commented Feb 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

geoffw0 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dbartol commented Feb 26, 2021

Uh oh!

criemen commented Mar 15, 2021

Uh oh!

criemen commented Mar 15, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jbj left a comment

Choose a reason for hiding this comment

Uh oh!

jbj commented Mar 15, 2021

Uh oh!

criemen commented Mar 16, 2021

Uh oh!

jbj commented Mar 16, 2021

Uh oh!

github-actions bot commented Mar 16, 2021

Uh oh!

Uh oh!

Uh oh!

shati-patel left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

criemen commented Mar 19, 2021

Uh oh!

asgerf commented Mar 22, 2021

Uh oh!

criemen commented Mar 22, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

dbartol commented Feb 25, 2021 •

edited

Loading

criemen commented Mar 15, 2021 •

edited

Loading