…180819) ... by extracting the check for the space character and marking it as `LLVM_LIKELY`. This increases performance because the space is by far the most common horizontal whitespace character, so in most cases this change replaces a lookup table check with a simple comparison, reducing latency and helping the cache. This does not reduce instruction count, as a lookup table and a comparison are both a single instruction. However, it _does_ reduce cycles in a consistent manner, by around `0.2` - `0.3`%: [benchmark](https://llvm-compile-time-tracker.com/compare.php?from=3192fe2c7b08912cc72c86471a593165b615dc28&to=faa899a6ce518c1176f2bf59f199eb42e59d840e&stat=cycles). I tested this locally and can confirm this is not noise (at least not entirely; it does feel weird that this impacts `O3` more than `O0`...), as I achieved almost `2`% faster PP speed in my tests.
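A minimal sketch of the fast-path idea described above, not the actual Clang code. The table, the `SKETCH_LIKELY` macro, and the function name are hypothetical stand-ins; the set of horizontal whitespace characters (space, tab, form feed, vertical tab) is an assumption made for the illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for LLVM_LIKELY: a branch hint where the compiler
// supports __builtin_expect, and a plain expression elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define SKETCH_LIKELY(x) (x)
#endif

// Hypothetical character-class table (assumed contents, see lead-in).
struct HorizontalWsTable {
  bool isWs[256] = {};
  HorizontalWsTable() {
    isWs[static_cast<unsigned char>(' ')] = true;
    isWs[static_cast<unsigned char>('\t')] = true;
    isWs[static_cast<unsigned char>('\f')] = true;
    isWs[static_cast<unsigned char>('\v')] = true;
  }
};
static const HorizontalWsTable kHorizontalWs;

inline bool isHorizontalWhitespaceFast(unsigned char c) {
  // Fast path: a plain comparison against the most common character,
  // which avoids touching the table (and its cache line) at all.
  if (SKETCH_LIKELY(c == ' '))
    return true;
  // Slow path: fall back to the table for the remaining characters.
  return kHorizontalWs.isWs[c];
}
```

The result is the same as a pure table lookup; only the order of the checks changes, which is why the instruction count stays flat while cycles can drop.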
When evaluating whether an allocated type contains a pointer to generate the `alloc_token` metadata, `typeContainsPointer` incorrectly stopped recursion upon encountering an `AtomicType`. This resulted in types like `_Atomic(int *)` (or `std::atomic<int *>` under libc++) being incorrectly evaluated as not containing a pointer. Add support for `AtomicType` in `typeContainsPointer` by recursively checking the contained type. Add tests for structs containing `_Atomic(int *)` and `_Atomic(int)`.
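The recursion fix can be illustrated on a toy type model. This is only a sketch: `ToyType`, `makeToy`, and `toyTypeContainsPointer` are hypothetical names and do not correspond to Clang's AST classes or to the real `typeContainsPointer` signature.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy type model (hypothetical): Atomic wraps exactly one type,
// Record holds any number of field types.
struct ToyType {
  enum Kind { Int, Pointer, Atomic, Record } kind;
  std::vector<std::shared_ptr<ToyType>> children;
};

std::shared_ptr<ToyType>
makeToy(ToyType::Kind kind, std::vector<std::shared_ptr<ToyType>> children = {}) {
  return std::make_shared<ToyType>(ToyType{kind, std::move(children)});
}

bool toyTypeContainsPointer(const ToyType &t) {
  switch (t.kind) {
  case ToyType::Pointer:
    return true;
  case ToyType::Int:
    return false;
  case ToyType::Atomic:
    // The point of the fix: look through the atomic wrapper instead of
    // stopping the recursion here.
    assert(t.children.size() == 1 && "atomic wraps exactly one type");
    return toyTypeContainsPointer(*t.children.front());
  case ToyType::Record:
    for (const auto &field : t.children)
      if (toyTypeContainsPointer(*field))
        return true;
    return false;
  }
  return false;
}
```

With this model, an atomic wrapper around a pointer reports a pointer, an atomic wrapper around an integer does not, and a record containing either inherits the answer from its fields.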
When this document was converted from rst to markdown, the contents didn't get updated correctly.
When cc1 runs out-of-process and crashes, sys::ExecuteAndWait returns -2 for signal-killed children. The resignaling block added in 15488a7 only handled CommandRes > 128, so the driver would exit normally with code 1 instead of dying by signal.
Currently, if there are operations between the loops, we get a dominance issue because the delinearized index is added after the operations. This PR fixes that. For testing, we also add a transform pattern that calls coalesceLoops directly, because the existing pattern calls coalescePerfectlyNestedSCFForLoops, which does not consider the loop nest perfectly nested if there are operations between the loops (which is safer for that usage). Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…183572) `BitCastOp::fold` called `Type::getIntOrFloatBitWidth()` on the source element type without first verifying it satisfies `isIntOrFloat()`. When the source vector has `index` element type (e.g. `vector<16xindex>`), the assertion `only integers and floats have a bitwidth` fires. Add an `srcElemType.isIntOrFloat()` guard to the condition so that the constant-folding path is skipped for non-integer/float element types. Fixes #177835
…oison (#183596) When `constFoldBinaryOp<IntegerAttr>` is called with a `ub.poison` operand, it propagates the poison attribute as its result. The fold method for `arith.addui_extended` then attempted to cast this result to `TypedAttr` via `llvm::cast<TypedAttr>(sumAttr)`, which failed with an assertion because `PoisonAttr` does not implement the `TypedAttr` interface. Fix this by checking whether the folded sum is a poison attribute before the cast. When poison is detected, it is propagated to both the sum and overflow results. Fixes #181534
…s of partial specializations (#183348) This fixes a helper so it implements retrieval of the argument replaced for a template parameter for partial specializations. This was left out of the original patch, since it's quite hard to actually test. This helper implements the retrieval for variable templates, but only for completeness' sake: no current users rely on this, and I don't think a similar test case is possible to implement with variable templates. This fixes a regression introduced in #161029 which will be backported to llvm-22, so there are no release notes. Fixes #181062 Fixes #181410
) BF16 source operands use F32 inline constant values, so set OP_SEL to select the high half of the constant, since BF16 encoding matches the high 16 bits of F32 encoding. This behaviour is different from F16 source operands which use F16 constant values in the low 16 bits. Fixes: #183337
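The encoding relationship mentioned above (a BF16 value matches the high 16 bits of the corresponding F32 value) can be checked with a few lines of standalone C++. This sketch only shows the bit-level relationship; it does not model `OP_SEL` or any AMDGPU encoding, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");

// Raw IEEE-754 bits of a 32-bit float.
inline std::uint32_t f32Bits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// BF16 bits obtained by keeping only the high half of the F32 encoding.
// This is exact for values such as 1.0 or -2.0 whose low 16 mantissa bits
// are zero; other values would be truncated, not rounded.
inline std::uint16_t bf16HighHalf(float f) {
  return static_cast<std::uint16_t>(f32Bits(f) >> 16);
}
```

For example, `1.0f` encodes as `0x3F800000` in F32, so its high half `0x3F80` is the BF16 encoding of 1.0.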
This is another instance of the logic from #183159. If we know one source is not-infinity, and the other source is less than or equal to 1, this cannot overflow. Special case llvm.amdgcn.trig.preop, as a substitute for proper range tracking. This almost enables pruning edge case handling in trig function implementations, if not for the recursion depth limit (but that's a problem for another day).
As in title. Only `reassoc` pattern was supplied -- for completeness all should be supplied. Make FastMathFlag ctor public as well.
The Metal Shader converter can output shader reflection information into a JSON file. This connects the -Fre flag (DXC's flag for reflection) to the Metal Shader Converter tool step to produce the JSON file. As a temporary state the -Fre flag will error when used without the -metal flag. This is required to address llvm/offload-test-suite#452
Summary: This is needed on some platforms like Windows when the generated command line becomes too large. This seems to be occurring in practice so we need to support this. Uses the same basic support clang does. No test because there isn't any current infrastructure to support it, will likely be "tested" by ROCBLAS builds not failing anymore on Windows.
We have accumulated four places where variables were only being used in asserts. This change silences the warnings for that.
#180563) Fixes #154713. The crash was due to `Index` sometimes being an unsigned 64-bit integer which was being zero-extended to a signed 64-bit integer, triggering an assertion failure in `APSInt::getExtValue`. This patch zero-extends it to an unsigned 64-bit integer instead, since `HandleLValueVectorElement` takes in a `uint64_t` anyway.
In CGOpenMPRuntimeGPU::translateParameter, reference-type captured variables were translated to pointer parameters with two address-space annotations: 1. LangAS::opencl_global on the pointee (for map'd variables), which correctly produces ptr addrspace(1) in NVPTX IR. 2. getLangASFromTargetAS(NVPTX_local_addr=5) on the pointer itself, annotating the parameter as living in NVPTX local (stack) memory. The second annotation is incorrect at the Clang type-system level: EmitParmDecl only supports parameters to be in LangAS::Default (or the special cases for OpenCL). Temporarily add an assert in EmitParmDecl that catches parameters with non-default address spaces in non-OpenCL compilations, and fix the violation by dropping the NVPTX_local_addr addAddressSpace call. Should fix the issue noticed in #181256 (comment), allowing removing that special case there for OpenMP, though I haven't tested the combination yet. That PR would fix EmitParmDecl to actually support non-default address spaces from Sema, and will remove this assert again. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…s in `JSONFormat` This change fixes the diagnostic infrastructure in the `JSONFormat` implementation to pass model objects (`EntityId`, `EntityLinkage`, `BuildNamespace`, `NestedBuildNamespace`, `SummaryName`) directly to `ErrorBuilder` instead of manually extracting their components. This relies on existing `llvm::format_provider` specializations for these objects. To support consistent string conversion for `BuildNamespaceKind` and `EntityLinkageType` across both serialization and `operator<<`, `toString`/`fromString` functions have been introduced in an internal header `ModelStringConversions.h`. `EntityLinkage::LinkageType` is promoted to a standalone enum class `EntityLinkageType` at namespace scope, following the same pattern as `BuildNamespaceKind`. Tests have been added for `operator<<` and `format_provider` for all affected types, and a new `ModelStringConversionsTest.cpp` directly unit-tests the `toString`/`fromString` functions, including round-trip and unknown-input cases.
This patch fixes a HexagonConstPropagation assert when evaluating sign-bit CONST32/CONST64 immediates (e.g. 0x80000000) after ConstantInt stopped implicitly truncating, by allowing truncation for that signed case.
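To see why a sign-bit immediate such as `0x80000000` is a problem once truncation is no longer implicit, consider this standalone sketch. The helper names are hypothetical and it does not use `ConstantInt` or any LLVM API; it only shows the arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True if v is representable as a signed 32-bit integer without truncation.
inline bool fitsSigned32(std::int64_t v) {
  return v >= INT32_MIN && v <= INT32_MAX;
}

// Explicit truncation: keep the low 32 bits and reinterpret them as signed.
inline std::int32_t truncToSigned32(std::uint64_t v) {
  const std::uint32_t low = static_cast<std::uint32_t>(v);
  std::int32_t result;
  std::memcpy(&result, &low, sizeof result);
  return result;
}
```

Read as a 64-bit value, `0x80000000` is `2147483648`, which does not fit in a signed 32-bit integer; after explicit truncation the same bit pattern is `INT32_MIN`.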
…to wi pass (#181917) This PR adds distribution pattern for xegpu.load & store ops for the new sg-to-wi pass
Fix incremental build failure when using PCH build and clang-cache together.
…es (#182362) Fixes #182122 --- This patch resolves a crash when parsing `#pragma clang attribute` arguments for attributes that forbid arguments. The root cause is that the `#pragma` attribute path doesn't pass `EndLoc` to https://github.com/llvm/llvm-project/blob/413cafa4624eb37e586e266f44abd64896e1c598/clang/lib/Parse/ParsePragma.cpp#L1982-L1985 unlike the normal attribute parsing flow https://github.com/llvm/llvm-project/blob/413cafa4624eb37e586e266f44abd64896e1c598/clang/lib/Parse/ParseDeclCXX.cpp#L4706 Without `EndLoc`, argument parsing cannot update the parsed end token https://github.com/llvm/llvm-project/blob/413cafa4624eb37e586e266f44abd64896e1c598/clang/lib/Parse/ParseDecl.cpp#L621-L622 and `fix-it` gets an invalid end location https://github.com/llvm/llvm-project/blob/0dafeb97a4687c29f5182fa0239c7fa39ee23091/clang/lib/Parse/ParseDeclCXX.cpp#L4537 This change makes the pragma path pass `EndLoc` the same way as the regular flow does, preventing the crash and preserving valid fix-it ranges.
This patch fixes the REPL's output from interleaving with the status line by locking the stream before printing.
VPInstructionWithType directly allows modeling the loaded type.
…rs (#182354) Added reserved_smem_offset_{begin|end|cap|0} intrinsics to expose shared memory special registers and NVPTX TableGen support for these intrinsics.
Move materialization of the symbolic UF directly to unrollByUF. At this point, unrolling materializes the decision and it is natural to also materialize the symbolic UF here.
…83891) `DenseElementsAttr` supports only a hard-coded list of element types: `int`, `index`, `float`, `complex`. This commit generalizes the `DenseElementsAttr` infrastructure: it now supports arbitrary element types, as long as they implement the new `DenseElementTypeInterface`. The `DenseElementTypeInterface` has the following helper functions: - `getDenseElementBitSize`: Query the size of an element in bits. (When storing an element in memory, each element is padded to a full byte. This is an existing limitation of the `DenseElementsAttr`; with an exception for `i1`.) - `convertToAttribute`: Attribute factory / deserializer. Converts bytes into an MLIR attribute. The attribute provides the assembly format / printer for a single element. - `convertFromAttribute`: Serializer. Converts an MLIR attribute into bytes. Note: `convertToAttribute` / `convertFromAttribute` are mainly for writing test cases. For performance reasons, `DenseElementsAttr` users should work with raw bytes / elements and avoid any API that materializes MLIR attributes. However, MLIR attributes typically have human-readable parsers/printers, making them suitable for lit tests and debugging. This PR introduces an additional assembly format for `DenseElementsAttrs`. There are now two formats. (The existing one is kept for compatibility reasons.) - Literal-first (existing): `dense<[1, 2, 3]> : tensor<3xi32>` - Type-first (new): `dense<tensor<3xi32> : [1 : i32, 2 : i32, 3 : i32]>` The new syntax is needed to disambiguate between "literal" (e.g., `1`) and attribute (e.g., `1 : i32`) when parsing the first token. In the literal-first syntax, we only parse literals. In the type-first syntax, we only parse attributes. The existing `int`, `index`, `float`, `complex` types also implement the `DenseElementTypeInterface`. This allows us to implement `DenseElementsAttr::get` and `AttributeElementIterator::operator*` in a generic way. 
RFC: https://discourse.llvm.org/t/rfc-allow-custom-element-types-in-denseelementattr/89656 This is a re-upload of #179122.
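The byte-padding rule mentioned for `getDenseElementBitSize` can be written down as a small calculation. This is a sketch under stated assumptions: the helper names are hypothetical, this is not the MLIR implementation, and the `i1` exception mentioned above is deliberately not modeled.

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by one element when each element is padded to a full byte.
inline std::size_t paddedElementBytes(std::size_t bitSize) {
  return (bitSize + 7) / 8;
}

// Total raw buffer size for numElements elements of the given bit size.
// Assumption: no i1 special case and no splat handling.
inline std::size_t denseBufferBytes(std::size_t numElements,
                                    std::size_t bitSize) {
  return numElements * paddedElementBytes(bitSize);
}
```

Under these assumptions, `tensor<3xi32>` would need 12 bytes, and a hypothetical 12-bit element type would take 2 bytes per element.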
…ic index (#183783) VectorInsertOpConversion crashes with an assertion failure when inserting a sub-vector at a dynamic position into a multi-dimensional vector. The pattern calls getAsIntegers() on the position, which asserts that all fold results are compile-time constant attributes. The existing guard (checking llvm::IsaPred<Attribute>) only covered the case where a scalar is inserted into the innermost dimension (the extractvalue path). The guard was missing for the insertvalue path when inserting a sub-vector at a dynamic position into a nested aggregate. Fix: add the same guard before the llvm.insertvalue creation to return failure() gracefully when any position index is dynamic, matching the behavior of VectorExtractOpConversion. Fixes #177829
Fix-forward for #183541. Two callsites to target_link_libraries were not migrated to the keyword signature. Signed-off-by: Itay Bookstein <itay.bookstein@nextsilicon.com>
…T_VECTOR_ELT nodes (#183918)
Enable shift64 hazard recognition for gfx9 cores. --------- Signed-off-by: John Lu <John.Lu@amd.com>
Implement ImplicitValueInitExpr for ComplexType
This is LLDB version of https://cgit.freebsd.org/ports/tree/devel/gdb/files/kgdb/ppcfbsd-kern.c. This enables selecting ppc64le and reading registers from PCB structure on core dump and live kernel debugging. FPU registers aren't supported yet due to pcb structure issue, but this change still achieves feature parity with KGDB. Trapframe unwinding support will be implemented in future. Test files using core dump from ppc64le will be implemented once other kernel debugging improvements are done. --------- Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
…inator (#183930) WarpExecuteOnLane0Op::verify() called getTerminator() which performed an unconditional cast<gpu::YieldOp> on the block's last operation. When the op body was written with a different terminator (e.g. affine.yield), the cast asserted immediately instead of emitting a verifier diagnostic. Fix by using dyn_cast in verify() before calling getTerminator(), and emitting a proper error message when the terminator is not gpu.yield. Add a regression test to invalid.mlir. Fixes #181450
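The difference between an asserting cast and a checked cast in a verifier can be sketched with plain C++ `dynamic_cast`. The class names and the diagnostic text below are hypothetical; the real fix uses LLVM's `dyn_cast` on MLIR operations.

```cpp
#include <cassert>
#include <string>

// Toy operation hierarchy (hypothetical stand-ins for MLIR ops).
struct Op {
  virtual ~Op() = default;
};
struct GpuYieldOp : Op {};
struct AffineYieldOp : Op {};

// Returns an empty string on success, or a diagnostic message on failure.
// A checked cast lets the verifier report the problem instead of asserting.
std::string verifyTerminator(const Op &lastOp) {
  if (dynamic_cast<const GpuYieldOp *>(&lastOp) == nullptr)
    return "expected body to be terminated with 'gpu.yield'";
  return "";
}
```

An unconditional cast corresponds to skipping the null check, which is exactly the situation where a mismatched terminator aborts instead of producing a verifier error.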
…EXTRACT_VECTOR_ELT nodes (#183934)
Due to special handling of Whitesmiths when parsing, the additional level(s) needed for the block, when used with IndentAccessModifiers, were not being applied. Consequently, when calculating the access modifier indent offset, the modifiers were being placed at the class level. This change ensures that the additional level(s) are not omitted for Whitesmiths.
…p file (#173966) Command qModuleInfo (GDB server protocol) can be used to request metadata of shared libraries stored in a ZIP archive on the target. This is typically used for retrieving SO files bundled in an APK file on Android. Requesting the last entry in the ZIP file often fails because of a bug in the entry search mechanism. This PR fixes this. NOTES: * The bug appears only if the entry in the zip file has no extra field or comment * This is part of an effort to get lldb working for debugging Swift on Android: swiftlang#10831
…hey have copyable node Need to recalculate the deps for all buildvector nodes with copyable deps to prevent a compiler crash during scheduling of instructions
Currently for thin-lto, the imported static global values (functions, variables, etc.) will be promoted/renamed from e.g., foo() to foo.llvm.(). Such renaming causes difficulties in live patching since the function name is changed ([1]). It is possible that some global value names have to be promoted to avoid name collision and linker failure. But in practice, the majority of name promotions can be avoided. In [2], the suggestion is that thin-lto pre-link decides whether a particular global value needs name promotion or not. If yes, later on in thinBackend() the name will be promoted. I compiled a particular linux kernel version (latest bpf-next tree) and found 1216 global values with suffix .llvm.. With this patch, the number of promoted functions is 2, 98% reduction from the original kernel build. If some native objects are not participating in LTO, name promotions have to be done to avoid potential linker issues. So the current implementation cannot be on by default. But in certain cases, e.g., linux kernel build, people can enable lld flag --lto-whole-program-visibility to reduce the number of functions like foo.llvm.(). For ThinLTOCodeGenerator.cpp, which is used by the llvm-lto tool and a few other rare cases, reducing the number of renamings due to promotion is not implemented, as lld flag '-lto-whole-program-visibility' is not supported in ThinLTOCodeGenerator.cpp for now. In summary, this pull request only supports the llvm-lto2 style workflow. The feature is off by default. To enable the feature, lld flag '-lto-whole-program-visibility' and llvm flag '-always-rename-promoted-locals=false' are needed. The link [3] has more context for the pull request discussions. [1] https://lpc.events/event/19/contributions/2212 [2] https://discourse.llvm.org/t/rfc-avoid-functions-like-foo-llvm-for-kernel-live-patch/89400 [3] #178587
This is LLDB version of https://cgit.freebsd.org/ports/tree/devel/gdb/files/kgdb/riscv-fbsd-kern.c. This enables selecting riscv64 and reading registers from PCB structure on core dump and live kernel debugging, while trapframe unwinding support will be implemented in future. Test files using core dump from riscv64 will be implemented once other kernel debugging improvements are done. --------- Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
… type (#183942) When parsing a spirv.struct type, any MLIR type was accepted as a member type without validation. This caused a crash in TypeExtensionVisitor and TypeCapabilityVisitor which unconditionally used cast<SPIRVType> on struct element types, asserting when a non-SPIR-V type (e.g., vector<2x2xi1>) was encountered. Fix the parser to reject non-SPIR-V member types with a proper error message. Fixes #179675
Changes `formatter_bytecode.compile_file` to return a `BytecodeSection` value. The `BytecodeSection` holds the data that needs to be emitted to an `__lldbformatters` section. The `BytecodeSection` currently provides `write_binary`, but will be updated in a follow-up commit to include `write_source`, which will allow the data to be emitted as C source code or Swift source code. This will make it easier to integrate into build systems, as it's easier to get data into a binary via source code than as a raw binary file.
…173852) Trivial change to prevent all warnings from being printed on a single line in the VS Code debug console.
Initialize DemandedElts mask when the index is constant and inbounds, otherwise check all elements.
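The rule stated above (demand a single element when the index is constant and in bounds, otherwise demand all elements) can be sketched with a plain 64-bit mask. The function name is hypothetical, and `std::uint64_t` stands in for the real bit-vector type; the sketch assumes at most 64 elements.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Builds a demanded-elements mask for a vector with numElts elements.
// constIdx is empty when the index is not a compile-time constant.
// Assumption: 1 <= numElts <= 64 so the mask fits in one word.
inline std::uint64_t demandedEltsForIndex(unsigned numElts,
                                          std::optional<std::uint64_t> constIdx) {
  assert(numElts >= 1 && numElts <= 64 && "sketch supports up to 64 elements");
  const std::uint64_t allElts =
      numElts == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << numElts) - 1;
  // Constant and in-bounds index: only that element is demanded.
  if (constIdx && *constIdx < numElts)
    return std::uint64_t{1} << *constIdx;
  // Unknown or out-of-bounds index: conservatively demand every element.
  return allElts;
}
```

For a 4-element vector, a constant index of 2 yields the mask `0x4`, while an unknown or out-of-bounds index yields `0xF`.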
This provides a more helpful message to the user when passing invalid command line options
…ion (#183718) Starting with AMD PAL metadata v3.6, pipeline ELFs cannot have a `.shader_functions` section. However, dynamic VGPR retry helpers use the `AMDGPU_CS_ChainPreserve` calling convention, which LLVM previously treated as a module entrypoint, incorrectly emitting this metadata.
…ration test (#183664) This PR adds the `convert_region_types` API to `ConversionPatternRewriter` and introduces a new integration test, `bf.py`, which demonstrates how to combine a Python-defined dialect, the dialect conversion API, the pass manager, and the execution engine to build a pure-Python JIT compilation pipeline.
… generated docs (#183938) The builtin documentation emitter previously sorted all categories purely alphabetically, which placed the "Undocumented" section before categories like "WMMA" in the generated RST. This made the output confusing since stub entries appeared before real documentation. Push the "Undocumented" category to the end of the output so that all documented categories appear first, regardless of their names.
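The ordering described above can be expressed as a comparator that sorts alphabetically but always places the "Undocumented" category last. This is a minimal sketch with a hypothetical function name, not the documentation emitter's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Sorts category names alphabetically, except that "Undocumented"
// is pushed to the end regardless of its name.
inline void sortCategoriesUndocumentedLast(std::vector<std::string> &categories) {
  std::sort(categories.begin(), categories.end(),
            [](const std::string &a, const std::string &b) {
              const bool aUndoc = (a == "Undocumented");
              const bool bUndoc = (b == "Undocumented");
              // If exactly one side is "Undocumented", the other sorts first.
              if (aUndoc != bUndoc)
                return bUndoc;
              return a < b;
            });
}
```

With a purely alphabetical sort, "Undocumented" would land before "WMMA"; with this comparator it moves to the end.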