[pull] main from llvm:main #5690

Open
pull[bot] wants to merge 1034 commits into Ericsson:main from llvm:main

@pull pull bot commented Feb 20, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

@pull pull bot locked and limited conversation to collaborators Feb 20, 2026
@pull pull bot added the ⤵️ pull label Feb 20, 2026
bulbazord and others added 28 commits February 26, 2026 10:48
…180819)

... by extracting the check for space character and marking it as
`LLVM_LIKELY`. This increases performance because the space is by far
the most common horizontal whitespace character, so in most cases this change
replaces a lookup table check with a simple comparison,
reducing latency and helping the cache.

This does not reduce instruction count, as a lookup table and a
comparison are both a single instruction. However, it _does_ reduce
cycles in a consistent manner, around `0.2` - `0.3`%:
[benchmark](https://llvm-compile-time-tracker.com/compare.php?from=3192fe2c7b08912cc72c86471a593165b615dc28&to=faa899a6ce518c1176f2bf59f199eb42e59d840e&stat=cycles).
I tested this locally and am able to confirm this is not noise (at least
not entirely, it does feel weird that this impacts `O3` more than
`O0`...), as I achieved almost `2`% faster PP speed in my tests.
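
The fast path described above can be sketched as follows. This is a minimal illustration, not the actual Clang lexer code: the table contents, the function name, and the `SKETCH_LIKELY` macro (a stand-in for `LLVM_LIKELY`, built on the GCC/Clang `__builtin_expect` builtin) are all assumptions made for the example.

```cpp
#include <array>

// Hypothetical stand-in for LLVM_LIKELY; falls back to a no-op elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(x) __builtin_expect(static_cast<bool>(x), 1)
#else
#define SKETCH_LIKELY(x) (x)
#endif

// Toy character-class table marking horizontal whitespace characters.
static const std::array<bool, 256> kHorizontalWS = [] {
  std::array<bool, 256> t{};
  t[' '] = t['\t'] = t['\f'] = t['\v'] = true;
  return t;
}();

// Checks the space character first, since it is by far the most common
// case, and only falls back to the table lookup for everything else.
inline bool isHorizontalWhitespaceSketch(unsigned char c) {
  if (SKETCH_LIKELY(c == ' '))
    return true;
  return kHorizontalWS[c];
}
```

The result is the same as a plain table lookup; only the common case avoids the memory access.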
When evaluating whether an allocated type contains a pointer to generate
the `alloc_token` metadata, `typeContainsPointer` incorrectly stopped
recursion upon encountering an `AtomicType`. This resulted in types like
`_Atomic(int *)` (or `std::atomic<int *>` under libc++) being
incorrectly evaluated as not containing a pointer.

Add support for `AtomicType` in `typeContainsPointer` by recursively
checking the contained type.

Add tests for structs containing `_Atomic(int *)` and `_Atomic(int)`.
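
As a rough illustration of the recursion fix, the sketch below uses a toy type model rather than Clang's AST. The `Type` struct, `makeType`, and this `typeContainsPointer` are hypothetical stand-ins written for the example; only the idea (recurse into the value type of an atomic instead of stopping) comes from the description above.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Toy type model (hypothetical; not Clang's AST classes).
struct Type {
  enum Kind { Int, Pointer, Atomic, Record } kind;
  std::shared_ptr<Type> inner;                // value type of an Atomic
  std::vector<std::shared_ptr<Type>> fields;  // members of a Record
};

std::shared_ptr<Type> makeType(Type::Kind k,
                               std::shared_ptr<Type> inner = nullptr,
                               std::vector<std::shared_ptr<Type>> fields = {}) {
  return std::make_shared<Type>(Type{k, std::move(inner), std::move(fields)});
}

bool typeContainsPointer(const Type &t) {
  switch (t.kind) {
  case Type::Pointer:
    return true;
  case Type::Atomic:
    // The fix: look through the atomic wrapper instead of stopping here.
    return t.inner && typeContainsPointer(*t.inner);
  case Type::Record:
    for (const auto &f : t.fields)
      if (f && typeContainsPointer(*f))
        return true;
    return false;
  case Type::Int:
    return false;
  }
  return false;
}
```

With this model, an atomic wrapping a pointer reports `true`, while an atomic wrapping an integer still reports `false`.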
When this document was converted from rst to markdown, the contents
didn't get updated correctly.
When cc1 runs out-of-process and crashes, sys::ExecuteAndWait returns -2
for signal-killed children. The resignaling block added in 15488a7
only handled CommandRes > 128, so the driver would exit normally with
code 1 instead of dying by signal.
Currently if there are operations between the loops we get a dominance
issue as the delinearized index is added after the operations. This PR
fixes that.
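
For readers unfamiliar with loop coalescing, here is a minimal sketch of what "delinearized index" means, written as plain C++ rather than MLIR. The function name and the use of a vector of pairs are assumptions for illustration; the point is only that the original indices are recomputed from the single coalesced induction variable before anything uses them.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Visits an n x m iteration space with one coalesced loop and returns the
// (i, j) pairs in visiting order.
std::vector<std::pair<int64_t, int64_t>> coalescedVisit(int64_t n, int64_t m) {
  std::vector<std::pair<int64_t, int64_t>> visited;
  for (int64_t iv = 0; iv < n * m; ++iv) {
    // Delinearize first: every later use of i or j must come after these
    // definitions, which is the ordering the dominance issue was about.
    const int64_t i = iv / m;
    const int64_t j = iv % m;
    visited.emplace_back(i, j);
  }
  return visited;
}
```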

For testing we also add a transform pattern that makes a direct call to
coalesceLoops. The existing pattern calls
coalescePerfectlyNestedSCFForLoops, which does not consider the loop nest
perfectly nested if there are operations between the loops; that is safer
for that usage.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…183572)

`BitCastOp::fold` called `Type::getIntOrFloatBitWidth()` on the source
element type without first verifying it satisfies `isIntOrFloat()`. When
the source vector has `index` element type (e.g. `vector<16xindex>`),
the assertion `only integers and floats have a bitwidth` fires.

Add an `srcElemType.isIntOrFloat()` guard to the condition so that the
constant-folding path is skipped for non-integer/float element types.

Fixes #177835
…oison (#183596)

When `constFoldBinaryOp<IntegerAttr>` is called with a `ub.poison`
operand, it propagates the poison attribute as its result. The fold
method for `arith.addui_extended` then attempted to cast this result to
`TypedAttr` via `llvm::cast<TypedAttr>(sumAttr)`, which failed with an
assertion because `PoisonAttr` does not implement the `TypedAttr`
interface.

Fix this by checking whether the folded sum is a poison attribute before
the cast. When poison is detected, it is propagated to both the sum and
overflow results.

Fixes #181534
…s of partial specializations (#183348)

This fixes a helper so it implements retrieval of the argument replaced
for a template parameter for partial specializations.

This was left out of the original patch, since it's quite hard to
actually test.

This helper also implements the retrieval for variable templates, but only
for completeness' sake: no current users rely on this, and I don't
think a similar test case is possible to implement with variable
templates.

This fixes a regression introduced in #161029 which will be backported
to llvm-22, so there are no release notes.

Fixes #181062
Fixes #181410
)

BF16 source operands use F32 inline constant values, so set OP_SEL to
select the high half of the constant, since BF16 encoding matches the
high 16 bits of F32 encoding. This behaviour is different from F16
source operands which use F16 constant values in the low 16 bits.

Fixes: #183337
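
The encoding relationship behind this fix can be checked with a few lines of C++. This is a sketch assuming `float` is IEEE-754 binary32 (true on common platforms); the helper names are made up for the example and are unrelated to the AMDGPU backend code.

```cpp
#include <cstdint>
#include <cstring>

// Returns the raw 32-bit encoding of f (assumes IEEE-754 binary32 float).
inline uint32_t f32Bits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// BF16 encoding obtained by truncation: the high 16 bits of the F32 encoding.
inline uint16_t bf16FromF32HighHalf(float f) {
  return static_cast<uint16_t>(f32Bits(f) >> 16);
}

// Low 16 bits of the same F32 constant; selecting this half would not
// yield the BF16 value.
inline uint16_t f32LowHalf(float f) {
  return static_cast<uint16_t>(f32Bits(f) & 0xFFFFu);
}
```

For example, `1.0f` is encoded as `0x3F800000`, so its high half `0x3F80` is BF16 one, while its low half is zero.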
This is another instance of the logic from #183159. If we know
one source is not-infinity, and the other source is less than or
equal to 1, this cannot overflow. Special case llvm.amdgcn.trig.preop,
as a substitute for proper range tracking. This almost enables pruning
edge case handling in trig function implementations, if not for the
recursion depth limit (but that's a problem for another day).
As in title. Only `reassoc` pattern was supplied -- for completeness all
should be supplied. Make FastMathFlag ctor public as well.
The Metal Shader Converter can output shader reflection information into
a JSON file. This connects the -Fre flag (DXC's flag for reflection) to
the Metal Shader Converter tool step to produce the JSON file. As a
temporary state the -Fre flag will error when used without the -metal
flag.

This is required to address
llvm/offload-test-suite#452
Summary:
This is needed on some platforms like Windows when the generated command
line becomes too large. This seems to be occurring in practice so we
need to support this. Uses the same basic support clang does.

No test because there isn't any current infrastructure to support it,
will likely be "tested" by ROCBLAS builds not failing anymore on
Windows.
We have accumulated four places where variables were only being used in
asserts. This change silences the warnings for that.
#180563)

Fixes #154713.

The crash was due to `Index` sometimes being an unsigned 64-bit integer
which was being zero-extended to a signed 64-bit, triggering an
assertion failure in `APSInt::getExtValue`. This patch zero-extends it
to an unsigned 64-bit integer instead, since `HandleLValueVectorElement`
takes in a `uint64_t` anyway.
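
The underlying representability problem can be sketched without any LLVM types. The helpers below are hypothetical and only model the precondition: an unsigned 64-bit value at or above 2^63 has no `int64_t` representation, whereas reading it as `uint64_t` always succeeds.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Returns v as int64_t only when it is representable; models the check that
// a signed extraction has to make (an assertion in the real code).
inline std::optional<int64_t> asSigned64(uint64_t v) {
  if (v > static_cast<uint64_t>(std::numeric_limits<int64_t>::max()))
    return std::nullopt;
  return static_cast<int64_t>(v);
}

// Unsigned extraction: every 64-bit unsigned index is representable.
inline uint64_t asUnsigned64(uint64_t v) { return v; }
```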
In CGOpenMPRuntimeGPU::translateParameter, reference-type captured
variables were translated to pointer parameters with two address-space
annotations:

1. LangAS::opencl_global on the pointee (for map'd variables), which
correctly produces ptr addrspace(1) in NVPTX IR.
2. getLangASFromTargetAS(NVPTX_local_addr=5) on the pointer itself,
annotating the parameter as living in NVPTX local (stack) memory.

The second annotation is incorrect at the Clang type-system level:
EmitParmDecl only supports parameters to be in LangAS::Default (or the
special cases for OpenCL).

Temporarily add an assert in EmitParmDecl that catches parameters with
non-default address spaces in non-OpenCL compilations, and fix the
violation by dropping the NVPTX_local_addr addAddressSpace call.

This should fix the issue noticed in #181256 (comment), which would allow
removing that special case there for OpenMP, though I haven't tested the
combination yet. That PR would fix EmitParmDecl to actually support
non-default address spaces from Sema, and will remove this assert again.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…s in `JSONFormat`

This change fixes the diagnostic infrastructure in the `JSONFormat`
implementation to pass model objects (`EntityId`, `EntityLinkage`,
`BuildNamespace`, `NestedBuildNamespace`, `SummaryName`) directly to
`ErrorBuilder` instead of manually extracting their components. This
relies on existing `llvm::format_provider` specializations for these
objects.

To support consistent string conversion for `BuildNamespaceKind` and
`EntityLinkageType`, across both serialization and `operator<<`,
`toString`/`fromString` functions have been introduced in an internal
header `ModelStringConversions.h`.

`EntityLinkage::LinkageType` is promoted to a standalone enum class
`EntityLinkageType` at namespace scope, following the same pattern as
`BuildNamespaceKind`.

Tests have been added for `operator<<` and `format_provider` for all
affected types, and a new `ModelStringConversionsTest.cpp` directly
unit-tests the `toString/fromString` functions including round-trip and
unknown-input cases.
This patch fixes a HexagonConstPropagation assert when evaluating
sign-bit CONST32/CONST64 immediates (e.g. 0x80000000) after ConstantInt
stopped implicitly truncating, by allowing truncation for that signed
case.
…to wi pass (#181917)

This PR adds distribution patterns for the xegpu.load and store ops for the
new sg-to-wi pass.
Fix incremental build failure when using PCH build and clang-cache
together.
…es (#182362)

Fixes #182122

--- 

This patch resolves a crash when parsing `#pragma clang attribute`
arguments for attributes that forbid arguments. The root cause is that
the `#pragma` attribute path doesn't pass `EndLoc` to


https://github.com/llvm/llvm-project/blob/413cafa4624eb37e586e266f44abd64896e1c598/clang/lib/Parse/ParsePragma.cpp#L1982-L1985

unlike the normal attribute parsing flow


https://github.com/llvm/llvm-project/blob/413cafa4624eb37e586e266f44abd64896e1c598/clang/lib/Parse/ParseDeclCXX.cpp#L4706

Without `EndLoc`, argument parsing cannot update the parsed end token


https://github.com/llvm/llvm-project/blob/413cafa4624eb37e586e266f44abd64896e1c598/clang/lib/Parse/ParseDecl.cpp#L621-L622

and `fix-it` gets an invalid end location


https://github.com/llvm/llvm-project/blob/0dafeb97a4687c29f5182fa0239c7fa39ee23091/clang/lib/Parse/ParseDeclCXX.cpp#L4537

This change makes the pragma path pass `EndLoc` the same way as the
regular flow does, preventing the crash and preserving valid fix-it
ranges.
This patch fixes the REPL's output from interleaving with the status
line by locking the stream before printing.
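
The general technique (take a lock around each write so that output from two producers cannot interleave) can be sketched as below. The `LockedStreamSketch` class is a hypothetical illustration using standard C++ only; it is not LLDB's stream API.

```cpp
#include <mutex>
#include <sstream>
#include <string>

// A stream wrapper whose writes are serialized by a mutex, so a complete
// message is always emitted before another writer can start.
class LockedStreamSketch {
public:
  void print(const std::string &text) {
    std::lock_guard<std::mutex> guard(mutex_);
    buffer_ << text;
  }

  std::string str() {
    std::lock_guard<std::mutex> guard(mutex_);
    return buffer_.str();
  }

private:
  std::mutex mutex_;
  std::ostringstream buffer_;
};
```

Both the REPL output and the status line would go through `print`, so each message lands in the buffer as one unit.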
VPInstructionWithType directly allows modeling the loaded type.
…rs (#182354)

Added reserved_smem_offset_{begin|end|cap|0} intrinsics to expose shared
memory special registers and NVPTX TableGen support for these
intrinsics.
fhahn and others added 30 commits February 28, 2026 12:44
Move materialization of the symbolic UF directly to unrollByUF. At this
point, unrolling materializes the decision and it is natural to also
materialize the symbolic UF here.
…83891)

`DenseElementsAttr` supports only a hard-coded list of element types:
`int`, `index`, `float`, `complex`. This commit generalizes the
`DenseElementsAttr` infrastructure: it now supports arbitrary element
types, as long as they implement the new `DenseElementTypeInterface`.

The `DenseElementTypeInterface` has the following helper functions:
- `getDenseElementBitSize`: Query the size of an element in bits. (When
storing an element in memory, each element is padded to a full byte.
This is an existing limitation of the `DenseElementsAttr`; with an
exception for `i1`.)
- `convertToAttribute`: Attribute factory / deserializer. Converts bytes
into an MLIR attribute. The attribute provides the assembly format /
printer for a single element.
- `convertFromAttribute`: Serializer. Converts an MLIR attribute into
bytes.

Note: `convertToAttribute` / `convertFromAttribute` are mainly for
writing test cases. For performance reasons, `DenseElementsAttr` users
should work with raw bytes / elements and avoid any API that
materializes MLIR attributes. However, MLIR attributes typically have
human-readable parsers/printers, making them suitable for lit tests and
debugging.

This PR introduces an additional assembly format for
`DenseElementsAttrs`. There are now two formats. (The existing one is
kept for compatibility reasons.)
- Literal-first (existing): `dense<[1, 2, 3]> : tensor<3xi32>`
- Type-first (new): `dense<tensor<3xi32> : [1 : i32, 2 : i32, 3 : i32]>`

The new syntax is needed to disambiguate between "literal" (e.g., `1`)
and attribute (e.g., `1 : i32`) when parsing the first token. In the
literal-first syntax, we only parse literals. In the type-first syntax,
we only parse attributes.

The existing `int`, `index`, `float`, `complex` types also implement the
`DenseElementTypeInterface`. This allows us to implement
`DenseElementsAttr::get` and `AttributeElementIterator::operator*` in a
generic way.

RFC:

https://discourse.llvm.org/t/rfc-allow-custom-element-types-in-denseelementattr/89656

This is a re-upload of #179122.
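
To make the shape of the three hooks more concrete, here is a plain C++ analogue. Everything in it is an assumption for illustration: the real `DenseElementTypeInterface` is an MLIR type interface whose exact signatures are not given above, attributes are modeled as strings, and the 12-bit integer type is invented. The `i1` exception to byte padding is not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical analogue of the three hooks named in the description.
struct DenseElementTypeSketch {
  virtual ~DenseElementTypeSketch() = default;
  virtual std::size_t getDenseElementBitSize() const = 0;
  // Deserializer: raw bytes -> printable "attribute" (a string here).
  virtual std::string convertToAttribute(const std::vector<uint8_t> &bytes) const = 0;
  // Serializer: "attribute" -> raw bytes.
  virtual std::vector<uint8_t> convertFromAttribute(const std::string &attr) const = 0;
};

// Each stored element is padded to a whole number of bytes.
inline std::size_t paddedByteSize(std::size_t bits) { return (bits + 7) / 8; }

// Invented 12-bit unsigned integer element type, stored little-endian.
struct Int12Sketch : DenseElementTypeSketch {
  std::size_t getDenseElementBitSize() const override { return 12; }

  std::string convertToAttribute(const std::vector<uint8_t> &bytes) const override {
    const unsigned v = bytes.at(0) | (static_cast<unsigned>(bytes.at(1)) << 8);
    return std::to_string(v & 0xFFFu);
  }

  std::vector<uint8_t> convertFromAttribute(const std::string &attr) const override {
    const unsigned long v = std::stoul(attr) & 0xFFFul;
    return {static_cast<uint8_t>(v & 0xFF), static_cast<uint8_t>(v >> 8)};
  }
};
```

A 12-bit element occupies two bytes after padding, and a value survives a round trip through the serializer and deserializer.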
…ic index (#183783)

VectorInsertOpConversion crashes with an assertion failure when
inserting a sub-vector at a dynamic position into a multi-dimensional
vector. The pattern calls getAsIntegers() on the position, which asserts
that all fold results are compile-time constant attributes.

The existing guard (checking llvm::IsaPred<Attribute>) only covered the
case where a scalar is inserted into the innermost dimension (the
extractvalue path). The guard was missing for the insertvalue path when
inserting a sub-vector at a dynamic position into a nested aggregate.

Fix: add the same guard before the llvm.insertvalue creation to return
failure() gracefully when any position index is dynamic, matching the
behavior of VectorExtractOpConversion.

Fixes #177829
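
The guard pattern can be sketched outside MLIR as follows. The `FoldResultSketch` variant and `DynamicValue` placeholder are hypothetical stand-ins for a position that is either a compile-time constant or a runtime value; returning `std::nullopt` plays the role of returning `failure()` from the pattern.

```cpp
#include <cstdint>
#include <optional>
#include <variant>
#include <vector>

struct DynamicValue { int id; };  // placeholder for a runtime (SSA) value
using FoldResultSketch = std::variant<int64_t, DynamicValue>;

// Returns the constant positions, or nullopt as soon as any position is
// dynamic, instead of asserting that all of them are constants.
std::optional<std::vector<int64_t>>
getStaticPositions(const std::vector<FoldResultSketch> &positions) {
  std::vector<int64_t> result;
  for (const auto &p : positions) {
    if (!std::holds_alternative<int64_t>(p))
      return std::nullopt;
    result.push_back(std::get<int64_t>(p));
  }
  return result;
}
```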
Fix-forward for #183541.
Two callsites to target_link_libraries were not migrated to the
keyword signature.

Signed-off-by: Itay Bookstein <itay.bookstein@nextsilicon.com>
…ypes" (#183917)

Reverts #183891

Reverting a second time. The build bot failure seems to be
non-deterministic.
Enable shift64 hazard recognition for gfx9 cores.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
Implement ImplicitValueInitExpr for ComplexType
…lt mode (#183700)

Previously, the strictfp rounding nodes were lowered by unrolling to
scalar operations, which hurts performance. This issue was partially
fixed in #180480; this change continues that work and implements
optimized lowering for v4f16 and v8f16.
This is the LLDB version of
https://cgit.freebsd.org/ports/tree/devel/gdb/files/kgdb/ppcfbsd-kern.c.
It enables selecting ppc64le and reading registers from the PCB structure
for core dump and live kernel debugging. FPU registers aren't supported
yet due to a PCB structure issue, but this change still achieves feature
parity with KGDB. Trapframe unwinding support will be implemented in the
future. Tests using a ppc64le core dump will be added once other kernel
debugging improvements are done.

---------

Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
…inator (#183930)

WarpExecuteOnLane0Op::verify() called getTerminator() which performed an
unconditional cast<gpu::YieldOp> on the block's last operation. When the
op body was written with a different terminator (e.g. affine.yield), the
cast asserted immediately instead of emitting a verifier diagnostic.

Fix by using dyn_cast in verify() before calling getTerminator(), and
emitting a proper error message when the terminator is not gpu.yield.

Add a regression test to invalid.mlir.

Fixes #181450
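
The difference between an asserting cast and a checked cast in a verifier can be illustrated with standard C++ `dynamic_cast`, which plays the role of `dyn_cast` here. The class names and the error text are invented for the example and do not reproduce the actual diagnostic.

```cpp
#include <optional>
#include <string>

struct OperationSketch {
  virtual ~OperationSketch() = default;
  virtual std::string name() const = 0;
};
struct GpuYieldSketch : OperationSketch {
  std::string name() const override { return "gpu.yield"; }
};
struct AffineYieldSketch : OperationSketch {
  std::string name() const override { return "affine.yield"; }
};

// Returns an error message when the last op is not the expected terminator,
// or nullopt when verification succeeds. No assertion can fire here.
std::optional<std::string> verifyTerminator(const OperationSketch &last) {
  if (dynamic_cast<const GpuYieldSketch *>(&last) == nullptr)
    return "expected body to terminate with gpu.yield, found " + last.name();
  return std::nullopt;
}
```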
Due to special handling of Whitesmiths when parsing, the additional
level(s) needed for the block, when used with IndentAccessModifiers,
were not being applied. Consequently, when calculating the access
modifier indent offset, the modifiers were being placed at the class
level.

This change ensures that the additional level(s) are not omitted for
Whitesmiths.
…p file (#173966)

Command qModuleInfo (GDB server protocol) can be used to request
metadata of shared libraries stored in a ZIP archive on the target. This
is typically used for retrieving SO files bundled in an APK file on
Android.

Requesting the last entry in the ZIP file often fails because of a bug
in the entry search mechanism. This PR fixes this.

NOTES:
* The bug appears only if the entry in the zip file has no extra field
or comment
* This is part of an effort to get lldb working for debugging Swift on
Android: swiftlang#10831
…hey have copyable node

Need to recalculate the deps for all buildvector nodes with copyable
deps to prevent a compiler crash during scheduling of instructions
Currently, for thin-lto, imported static global values (functions,
variables, etc.) are promoted/renamed from, e.g., foo() to
foo.llvm.(). Such renaming causes difficulties for live patching
because the function name changes ([1]).

Some global value names do have to be promoted to avoid
name collisions and linker failures. In practice, though, the majority of
name promotions can be avoided.

In [2], the suggestion is that thin-lto pre-link decides whether
a particular global value needs name promotion or not. If yes, later on
in thinBackend() the name will be promoted.

I compiled a particular Linux kernel version (latest bpf-next tree)
and found 1216 global values with the suffix .llvm.. With this patch,
the number of promoted functions is 2, a reduction of more than 99% from
the original kernel build.

If some native objects are not participating with LTO, name promotions
have to be done to avoid potential linker issues. So the current
implementation cannot be on by default. But in certain cases, e.g., linux kernel
build, people can enable lld flag --lto-whole-program-visibility to reduce the
number of functions like foo.llvm.().

For ThinLTOCodeGenerator.cpp, which is used by the llvm-lto tool and a
few other rare cases, reducing the number of renames due to promotion
is not implemented, because the lld flag '-lto-whole-program-visibility' is
not supported in ThinLTOCodeGenerator.cpp for now. In summary, this pull
request only supports the llvm-lto2 style workflow.

The feature is off by default. To enable the feature, the lld flag
'-lto-whole-program-visibility' and the llvm flag
'-always-rename-promoted-locals=false' are needed.

The link [3] has more context for the pull request discussions.

[1] https://lpc.events/event/19/contributions/2212
[2] https://discourse.llvm.org/t/rfc-avoid-functions-like-foo-llvm-for-kernel-live-patch/89400
[3] #178587
This is the LLDB version of
https://cgit.freebsd.org/ports/tree/devel/gdb/files/kgdb/riscv-fbsd-kern.c.
It enables selecting riscv64 and reading registers from the PCB structure
for core dump and live kernel debugging, while trapframe unwinding support
will be implemented in the future. Tests using a riscv64 core dump
will be added once other kernel debugging improvements are done.

---------

Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
… type (#183942)

When parsing a spirv.struct type, any MLIR type was accepted as a member
type without validation. This caused a crash in TypeExtensionVisitor and
TypeCapabilityVisitor which unconditionally used cast<SPIRVType> on
struct element types, asserting when a non-SPIR-V type (e.g.,
vector<2x2xi1>) was encountered.

Fix the parser to reject non-SPIR-V member types with a proper error
message.

Fixes #179675
Changes `formatter_bytecode.compile_file` to return a `BytecodeSection`
value. The `BytecodeSection` holds the data that needs to be emitted to
an `__lldbformatters` section.

The `BytecodeSection` currently provides `write_binary`, but will be
updated in a follow up commit to include `write_source` which will allow
the data to be emitted as C source code, or Swift source code. This will
make it easier to integrate into build systems, as it's easier to get
data into a binary via source code, than as a raw binary file.
…173852)

Trivial change to prevent all warnings from being printed on a single
line in the VS Code debug console.
Initialize DemandedElts mask when the index is constant and inbounds, otherwise check all elements.
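
A minimal sketch of the mask selection described above, using a plain 64-bit integer as the demanded-elements mask. The function name, the 64-element limit, and the `std::optional` index are assumptions made for the example; the real code uses LLVM's own bit-vector types.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Bit i set means element i is demanded. Precondition: 1 <= numElts <= 64.
inline uint64_t demandedEltsSketch(unsigned numElts,
                                   std::optional<uint64_t> constIdx) {
  assert(numElts >= 1 && numElts <= 64);
  const uint64_t all = numElts == 64 ? ~0ull : ((1ull << numElts) - 1);
  // Constant, in-bounds index: only that single element is demanded.
  if (constIdx && *constIdx < numElts)
    return 1ull << *constIdx;
  // Unknown or out-of-bounds index: conservatively check all elements.
  return all;
}
```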
This provides a more helpful message to the user when passing invalid command
line options
…ion (#183718)

Starting with AMD PAL metadata v3.6, pipeline ELFs cannot have a
`.shader_functions` section. However, dynamic VGPR retry helpers use
the `AMDGPU_CS_ChainPreserve` calling convention, which LLVM previously
treated as a module entrypoint, incorrectly emitting this metadata.
…ration test (#183664)

This PR adds the `convert_region_types` API to
`ConversionPatternRewriter` and introduces a new integration test,
`bf.py`, which demonstrates how to combine a Python-defined dialect, the
dialect conversion API, the pass manager, and the execution engine to
build a pure-Python JIT compilation pipeline.
… generated docs (#183938)

The builtin documentation emitter previously sorted all categories
purely alphabetically, which placed the "Undocumented" section before
categories like "WMMA" in the generated RST. This made the output
confusing since stub entries appeared before real documentation.

Push the "Undocumented" category to the end of the output so that all
documented categories appear first, regardless of their names.
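
The ordering rule can be expressed as a comparator that treats "Undocumented" as greater than every other name and otherwise sorts alphabetically. This is a standalone sketch with a hypothetical function name, not the emitter's actual code.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sorts category names alphabetically, except that "Undocumented" always
// ends up last.
void sortCategoriesSketch(std::vector<std::string> &categories) {
  std::sort(categories.begin(), categories.end(),
            [](const std::string &a, const std::string &b) {
              const bool aLast = (a == "Undocumented");
              const bool bLast = (b == "Undocumented");
              if (aLast != bLast)
                return bLast;  // the other name sorts first
              return a < b;
            });
}
```

Given "WMMA", "Undocumented", and "Atomics", the result is "Atomics", "WMMA", "Undocumented".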