Implement Memory1 (RULE-8-7-1)
#967
base: main
…terminator, remove file pointer cases
1. Add headers: Adding missing headers, for obvious reasons.
2. Remove cases without null terminator: Both clang and g++ reject character arrays declared shorter than the string literal that initializes them, because C++ (unlike C) requires room for the terminating null character. Since this is a C++ rule, we rule them out.
3. File pointer manipulation functions (e.g. fgets): Not required by the rule.
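As a brief illustration of the point about removing cases without a null terminator (a sketch, not taken from the PR's test file): C accepts `char s[3] = "abc";` and silently drops the terminator, but C++ counts the terminating `'\0'` as an initializer, so the array must be one element longer than the visible characters. The variable names below are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// In C++, a string literal initializer needs room for its terminating '\0'.
// char too_short[3] = "abc";  // ill-formed in C++ (accepted in C, without '\0')
char just_right[4] = "abc";     // 3 characters + '\0'
char deduced[] = "abc";         // size deduced as 4

// Compile-time checks of the sizes above.
static_assert(sizeof(just_right) == 4, "explicit size includes the terminator");
static_assert(sizeof(deduced) == 4, "deduced size includes the terminator");
```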
…ub/codeql-coding-standards into jeongsoolee09/MISRA-C++-2023-Memory
Too many of the negative-offset alerts were false positives, so we leave negative offsets for future work.
Pull request overview
This PR implements MISRA C++ 2023 Rule 8-7-1 (Memory1 package) which validates pointer arithmetic to ensure pointers don't exceed array bounds. The PR updates the rules.csv to organize memory-related rules into separate packages (Memory1-Memory6) and adds two queries for detecting invalid pointer arithmetic.
Changes:
- Implements two queries for RULE-8-7-1: PointerArithmeticFormsAnInvalidPointer (path-problem) and PointerArgumentToCstringFunctionIsInvalid (problem)
- Adds a comprehensive OutOfBounds.qll module (1358 lines) for buffer overflow analysis
- Updates rules.csv to assign memory-related rules to Memory1-Memory6 packages
- Adds Memory1.json rule package description file and Memory1.qll exclusions file
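To make the checked property concrete, here is a minimal C++ sketch (not the query's implementation) of the condition RULE-8-7-1 is concerned with: for an array of `N` elements, pointer arithmetic may form pointers to elements `0` through `N - 1` and to one past the end (`N`); any other offset forms an invalid pointer, even if it is never dereferenced. The helper `formsValidPointer` is hypothetical and exists only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: does base + offset stay within [base, base + length],
// i.e. point at an element or one past the last element?
constexpr bool formsValidPointer(std::ptrdiff_t offset, std::ptrdiff_t length) {
  return offset >= 0 && offset <= length;
}

void examples() {
  int a[4] = {0, 1, 2, 3};
  int *p = a + 3;    // OK: last element (offset 3 of 4)
  int *end = a + 4;  // OK: one past the end; may be compared, not dereferenced
  // int *bad = a + 5;  // non-compliant: offset 5 exceeds length 4
  // int *neg = a - 1;  // non-compliant: negative offset
  assert(p < end);
  static_assert(formsValidPointer(4, 4), "one past the end is valid");
  static_assert(!formsValidPointer(5, 4), "beyond one past the end is invalid");
}
```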
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| rules.csv | Updates package names from "Memory" to "Memory1" through "Memory6" for memory-related rules |
| rule_packages/cpp/Memory1.json | Adds package description for Memory1 with metadata for both queries |
| cpp/misra/src/rules/RULE-8-7-1/PointerArithmeticFormsAnInvalidPointer.ql | Implements path-problem query for detecting invalid pointer arithmetic |
| cpp/misra/src/rules/RULE-8-7-1/PointerArgumentToCstringFunctionIsInvalid.ql | Implements problem query for invalid cstring function arguments |
| cpp/misra/test/rules/RULE-8-7-1/test.cpp | Comprehensive test file with 519 lines covering various pointer arithmetic scenarios |
| cpp/misra/test/rules/RULE-8-7-1/*.expected | Expected results files for both queries |
| cpp/misra/test/rules/RULE-8-7-1/*.qlref | Query reference files |
| cpp/common/src/codingstandards/cpp/OutOfBounds.qll | New module providing buffer overflow analysis infrastructure |
| cpp/common/src/codingstandards/cpp/exclusions/cpp/Memory1.qll | Auto-generated exclusions file for Memory1 queries |
| cpp/common/src/codingstandards/cpp/exclusions/cpp/RuleMetadata.qll | Updates to include Memory1 query metadata |
| cpp/common/test/includes/standard-library/*.h | Adds missing C standard library function declarations |
| cpp/common/test/includes/standard-library/cstdlib | Adds using declarations for malloc, calloc, realloc |
Comments suppressed due to low confidence (1)
cpp/misra/src/rules/RULE-8-7-1/PointerArithmeticFormsAnInvalidPointer.ql:171
- Similar to the PointerAddExpr case, this predicate uses getAnOperand() which could match either operand. For PointerSubExpr, you need to specifically get the right operand (the value being subtracted). The current implementation could incorrectly negate the pointer operand instead of the offset value.
exists(PointerSubExpr pointerSubtraction | pointerSubtraction = this.asPointerArithmetic() |
result = -pointerSubtraction.getAnOperand().getValue().toInt()
)
import codingstandards.cpp.exclusions.c.RuleMetadata

from
  OOB::BufferAccessLibraryFunctionCall fc, string message, Expr bufferArg, string bufferArgStr,
  Expr sizeOrOtherBufferArg, string otherStr
where
  not isExcluded(fc, OutOfBoundsPackage::libraryFunctionArgumentOutOfBoundsQuery()) and
Copilot (AI), Jan 30, 2026
This query imports from codingstandards.cpp.exclusions.c.RuleMetadata but the OutOfBoundsPackage::libraryFunctionArgumentOutOfBoundsQuery is defined for C, not C++. This is a MISRA C++ rule (RULE-8-7-1), so it should either use the C++ exclusions system or verify that using the C exclusions is intentional. This could cause issues with the exclusion system not properly recognizing this as a C++ query.
I believe this should be not isExcluded(Memory1Package::theQueryName())
Fixed in 8435f3c!
}

/**
 * The
Copilot (AI), Jan 30, 2026
Incomplete documentation: The class documentation starts with "The" but is incomplete. Please complete the documentation explaining what this class represents, such as "A heap allocation function call that is cast to a specific pointer type."
Suggested change:
- * The
+ * A heap allocation function call that is cast to a specific pointer type.
    result = pointerAddition.getAnOperand().getValue().toInt() // TODO: only get the number being added
  )
  or
  exists(PointerSubExpr pointerSubtraction | pointerSubtraction = this.asPointerArithmetic() |
    result = -pointerSubtraction.getAnOperand().getValue().toInt()
Copilot (AI), Jan 30, 2026
The getOffset predicate has a TODO comment indicating incomplete implementation. For PointerAddExpr, the current logic uses getAnOperand() which could match either the pointer or the integer operand. This should specifically get the right operand (the offset value) to avoid incorrect results. Consider using getRightOperand() or a similar specific accessor to ensure the correct operand is used.
Suggested change:
-    result = pointerAddition.getAnOperand().getValue().toInt() // TODO: only get the number being added
-  )
-  or
-  exists(PointerSubExpr pointerSubtraction | pointerSubtraction = this.asPointerArithmetic() |
-    result = -pointerSubtraction.getAnOperand().getValue().toInt()
+    result = pointerAddition.getRightOperand().getValue().toInt()
+  )
+  or
+  exists(PointerSubExpr pointerSubtraction | pointerSubtraction = this.asPointerArithmetic() |
+    result = -pointerSubtraction.getRightOperand().getValue().toInt()
Let's handle this TODO, but not the way Copilot is suggesting, so that we can handle both `p + n` and `n + p`.
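For context on why taking the right operand alone falls short for addition: C++ allows the integer on either side of `+`, so in `n + p` the offset is the left operand. For subtraction of an integer, the integer must be on the right (`n - p` does not compile), so the right operand appears to be a safe choice there. A small C++ illustration, with hypothetical function names:

```cpp
#include <cassert>

// Both operand orders of pointer + integer are valid C++ and yield the same pointer.
int *addPointerFirst(int *p, int n) { return p + n; }
int *addIntegerFirst(int *p, int n) { return n + p; }

// Subtracting an integer from a pointer only works as pointer - integer;
// `n - p` is ill-formed, so the offset is always the right operand here.
int *subtract(int *p, int n) { return p - n; }
```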
MichaelRFairhurst left a comment
Very nice work, Jeongsoo! The work you've put into this really shows; it's elegant and focused. In terms of the taint tracking edge cases, I think we can handle those cleanly on your foundation here if we focus on the array-to-pointer conversion cases, which we can talk more about later! Also, I have to say the tests you've made are awesomely comprehensive, nicely done. That is huge and really shows all your attention to detail!
/**
 * This module provides classes and predicates for analyzing the size of buffers
 * or objects from their base or a byte-offset, and identifying the potential for
 * expressions accessing those buffers to overflow.
Is this a direct copy? If so, we should probably state that, and/or list the modifications that have been made.
import cpp
import codingstandards.cpp.OutOfBounds // for OOB::problems
import codingstandards.cpp.Exclusions // for isExcluded(Element, Query)
import codingstandards.cpp.exclusions.c.RuleMetadata
I think this should be codingstandards.cpp.exclusions.cpp.RuleMetadata, or deleted?
/**
 * A call to a function that dynamically allocates memory on the heap.
 */
class HeapAllocationFunctionCall extends FunctionCall {
Maybe we should put this logic in common somewhere, maybe standardlibrary/memory?
  CallocFunctionCall() { this.isCallocCall() }

  override int getMinNumBytes() {
    result = lowerBound(this.getArgument(0)) * lowerBound(this.getArgument(1))
Do we want to use the minimum, or the maximum?
I'd suggest we run this on MRVA and see how many false positives we get. If it is a lot, I'd suggest using upperBound().
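To illustrate the trade-off, here is a sketch with made-up range values (it models the idea only, not CodeQL's range analysis): if each `calloc` argument is only known to lie in a non-negative range, the product of the lower bounds is the smallest buffer the call might allocate and the product of the upper bounds is the largest. Checking accesses against the minimum reports anything that *might* overflow (more alerts), while checking against the maximum reports only accesses that overflow even the largest possible buffer (fewer alerts). `Range`, `minBytes`, `maxBytes` and the `flagged*` helpers are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical closed range [lo, hi] for a non-negative integer argument.
struct Range {
  std::uint64_t lo;
  std::uint64_t hi;
};

// For calloc(nmemb, size) with non-negative ranges, the allocation size
// nmemb * size lies between lo * lo and hi * hi.
std::uint64_t minBytes(Range nmemb, Range size) { return nmemb.lo * size.lo; }
std::uint64_t maxBytes(Range nmemb, Range size) { return nmemb.hi * size.hi; }

// Accessing byte offset `off` is out of bounds if off >= the allocated size;
// each helper checks against a different bound on that size.
bool flaggedUsingMin(std::uint64_t off, Range n, Range s) { return off >= minBytes(n, s); }
bool flaggedUsingMax(std::uint64_t off, Range n, Range s) { return off >= maxBytes(n, s); }
```

With `nmemb` in [2, 10] and `size` fixed at 4, an access at byte 20 is flagged against the minimum (8 bytes) but not against the maximum (40 bytes).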
   * Gets the offset of this pointer formation as calculated in relation to the base pointer.
   */
  int getOffset() {
    result = this.asArrayExpr().getArrayOffset().getValue().toInt()
Instead of `getArrayOffset().getValue().toInt()`, which only handles constants, we likely want to use range analysis: either `upperBound` (noisiest) or `lowerBound` (quietest).
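A small model may help compare the three options (a sketch under the assumption that an index's value is summarized as a closed range; it does not use CodeQL's range analysis). A constant-only check sees an offset only when the range is a single value, `lowerBound` reports only when even the smallest possible offset is too large, and `upperBound` reports whenever the largest possible offset is too large. All names below are hypothetical.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of an index whose value is known to lie in [lo, hi].
struct IndexRange {
  long lo;
  long hi;
};

// Constant-only strategy: an offset is known only when the range is a single value.
std::optional<long> constantOffset(IndexRange r) {
  if (r.lo == r.hi) return r.lo;
  return std::nullopt;
}

// Each strategy alerts if its chosen offset would form a pointer beyond
// one past the end of an array of `length` elements.
bool alertConstantOnly(IndexRange r, long length) {
  std::optional<long> c = constantOffset(r);
  return c.has_value() && *c > length;
}
bool alertLowerBound(IndexRange r, long length) { return r.lo > length; }  // quietest
bool alertUpperBound(IndexRange r, long length) { return r.hi > length; }  // noisiest
```

For a loop index in [0, 12] on a 10-element array, only the upper-bound strategy alerts; the constant-only strategy misses it entirely because the index is not a constant.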
import semmle.code.cpp.ir.IR
import semmle.code.cpp.ir.dataflow.internal.SsaInternals as Ssa

predicate operandToInstructionTaintStep(Operand opFrom, Instruction instrTo) {
Unfortunately, when I run this on some C++ databases, it seems to add too many edges, and we may need to try a different approach :(
For instance, I'm getting an alert in the following matrix math code in OpenCV:
double A[2][9];
A[k][4] = ...;
The alert is "this pointer has offset 4 when the minimum possible length of the object might be 2."
I think the issue may be related to this:
double _Qy[2][4] /* Node 1 */
= { /* Node 2 */
{c, 0, -s}, /* Node 3 */
{0, 1, 0}, /* Node 4 */
{s, 0, c} /* Node 5 */
};
This taint predicate causes Node 1 to taint Node 2, and Node 2 taints nodes 3, 4, and 5. This is a pretty big problem: it means that any path to `_Qy[x]` will also have a path to `_Qy[0][x]`. Basically, we won't be able to distinguish multidimensional arrays at all.
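The distinction that gets lost can be shown with the declaration from the OpenCV example: in `double A[2][9]`, the outer array has 2 elements of type `double[9]`, so the `4` in `A[k][4]` is an offset into a row of 9 elements, not into the outer array of 2. A minimal C++ check of those sizes (the helper `writeColumn4` and the constants are hypothetical):

```cpp
#include <cassert>
#include <cstddef>

double A[2][9];

// Number of elements in each dimension of A.
constexpr std::size_t kRows = sizeof(A) / sizeof(A[0]);        // 2 rows
constexpr std::size_t kCols = sizeof(A[0]) / sizeof(A[0][0]);  // 9 columns per row

static_assert(kRows == 2, "outer dimension");
static_assert(kCols == 9, "inner dimension");

// A[k][4] indexes a row (length 9) at offset 4, which is valid for any k < 2;
// comparing 4 against the outer length 2 would be a false positive.
void writeColumn4(double value) {
  for (std::size_t k = 0; k < kRows; ++k) {
    A[k][4] = value;
  }
}
```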
There are many other edges being added that also seem concerning. In the results for OpenCV, many of the paths are nearly 100 nodes long. Another example is the following path:
// 1.
uint8_t m[1][6] = { ... }; // &m has size 1
// 2.
cv::Mat images_u = Mat(1, 6, CV_8UC1, m); // &images_u.data has size 1
// 3.
const Mat* images = &images_u; // images->data has size 1
// Note, the following code executes between 3. and 4.
std::vector<uchar*>& _ptrs; // declare vector _ptrs
// some amount of elements get put in _ptrs
// 4.
_ptrs[i] = images[j].data + c*esz1; // _ptrs[i] has unknown size
// A few things happen here:
// - An alert should probably occur, because images[j].data + c*esz1 might produce an invalid pointer
// - The size of _ptrs[i] is now unknown.
// - The size of _ptrs is not known, and not based on step 3
// 5.
T** ptrs = (T**)&_ptrs[0];
// Two things happen in the above line
// - _ptrs[0] is a pointer of unknown size that flows from step 4
// - &_ptrs[0] takes the address of that pointer, producing a new pointer to a buffer of unknown size
// 6.
ptrs[2];
// Here we get an alert: minimum possible length of object might be 1. This alert flows from node 1
// However, the size of `ptrs` is not based on node 1.
// The size of `ptrs` is based on how many elements get put in `_ptrs` between steps 3 and 4
The above logic looks exactly like what we would expect from taint analysis, but isn't what we want for precise buffer size tracing. This makes me think that the edges being added via taint analysis are not the right edges.
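The mismatch in the path above can be reduced to a small C++ sketch (hypothetical names, not OpenCV code): the length of the buffer that `&ptrs[0]` points into is the number of pointers stored in the vector, which is unrelated to the lengths of the buffers those pointers refer to.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Each pointee is a buffer of length 1, like `m` in the example above.
std::uint8_t bufA[1] = {10};
std::uint8_t bufB[1] = {20};
std::uint8_t bufC[1] = {30};

std::uint8_t readThird() {
  std::vector<std::uint8_t *> ptrs = {bufA, bufB, bufC};
  // `base` points into the vector's storage, which holds ptrs.size() == 3 pointers.
  std::uint8_t **base = &ptrs[0];
  // base[2] is in bounds: its limit is the vector's size (3), not the size
  // of the buffers the stored pointers refer to (1).
  return *base[2];
}
```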
Let me know if you want to pair program on this. I know these indirect data flow nodes have been nasty, so if you want to let me bash my head against it for a bit, we can see together what other options there may be.
Description
Implement Memory1 (RULE-8-7-1) and add rule package description files for the rest of the rules (Memory2-Memory6).
Change request type
.ql, .qll, .qls or unit tests)
Rules with added or modified queries
RULE-8-7-1
Release change checklist
A change note (development_handbook.md#change-notes) is required for any pull request which modifies:
If you are only adding new rule queries, a change note is not required.
Author: Is a change note required?
🚨🚨🚨
Reviewer: Confirm that the format of shared queries (not the .qll file, the .ql file that imports it) is valid by running them within VS Code.
Reviewer: Confirm that either a change note is not required or the change note is required and has been added.
Query development review checklist
For PRs that add new queries or modify existing queries, the following checklist should be completed by both the author and reviewer:
Author
As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.
Reviewer
As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.