CLI for measuring execute_cuda encoding perf #6381

a10y · 2026-02-09T22:07:36Z

Overview of changes

Ergonomics/API-focused changes

Introduced a new LaunchStrategy on the execution context. By default it launches kernels without tracking any timing information, but it is pluggable. For example, in benchmarks we replace it with a TimedLaunchedStrategy, which executes the kernels in blocking mode and logs their execution time.
Centralized the entrypoint for launching kernels. All kernels must now be dispatched from the execution context via the ctx.launch_kernel() method, which accepts a closure that populates the kernel arguments.

A lot of test and benchmark code needed to be updated to use the new launch methods.
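
The two changes above can be sketched in plain Rust without any CUDA dependency. This is only an illustration of the pattern, not the PR's code: `ExecutionCtx`, `DefaultLaunchStrategy`, `wants_timing`, the `u64` argument list, and the closure standing in for a kernel are all hypothetical. The real trait works with CUDA events (`event_flags()` and `on_complete(&events, len)`, per the `executor.rs` hunk in this PR).

```rust
use std::fmt::Debug;
use std::sync::Arc;
use std::time::{Duration, Instant};

/// Hypothetical, simplified stand-in for the PR's `LaunchStrategy`.
pub trait LaunchStrategy: Debug + Send + Sync + 'static {
    /// Whether the context should measure the launch at all.
    fn wants_timing(&self) -> bool;
    /// Called once per launch with the elapsed time, if it was measured.
    fn on_complete(&self, elapsed: Option<Duration>, len: usize);
}

/// Default behaviour: launch and record nothing.
#[derive(Debug, Default)]
pub struct DefaultLaunchStrategy;

impl LaunchStrategy for DefaultLaunchStrategy {
    fn wants_timing(&self) -> bool {
        false
    }
    fn on_complete(&self, _elapsed: Option<Duration>, _len: usize) {}
}

/// Hypothetical argument list that the caller's closure fills in.
#[derive(Debug, Default)]
pub struct LaunchArgs {
    args: Vec<u64>,
}

impl LaunchArgs {
    pub fn arg(&mut self, value: u64) -> &mut Self {
        self.args.push(value);
        self
    }
}

/// Hypothetical execution context that owns the strategy.
pub struct ExecutionCtx {
    strategy: Arc<dyn LaunchStrategy>,
}

impl ExecutionCtx {
    pub fn new() -> Self {
        Self { strategy: Arc::new(DefaultLaunchStrategy) }
    }

    /// Swap in a different strategy, e.g. a timed one in benchmarks.
    pub fn with_launch_strategy(mut self, strategy: Arc<dyn LaunchStrategy>) -> Self {
        self.strategy = strategy;
        self
    }

    /// Single entrypoint: build the arguments, run the "kernel", notify the strategy.
    pub fn launch_kernel<K, F>(&mut self, kernel: K, len: usize, build_args: F) -> u64
    where
        K: FnOnce(&[u64], usize) -> u64,
        F: FnOnce(&mut LaunchArgs),
    {
        let mut args = LaunchArgs::default();
        build_args(&mut args);
        // Only pay for a timestamp when the strategy asks for one.
        let start = self.strategy.wants_timing().then(Instant::now);
        let out = kernel(&args.args, len);
        self.strategy.on_complete(start.map(|s| s.elapsed()), len);
        out
    }
}
```

Because every launch goes through one method, call sites only describe their arguments; whether a launch is timed is decided by whichever strategy the context was built with.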

~~Fused FOR + BP~~

This has been shelved for a follow-up (FLUP) because it was too big.

* I've updated the BP kernel generator to generate BP as FFOR, i.e. bit-packing fused with FOR. In practice, this just adds a const T reference parameter. By default, execution for BitPackedArray passes zero, but there is a specialization in the ForArray execution tree: if it detects that one of its descendants is BP, it fuses itself with the bit unpacking.
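
For readers unfamiliar with the idea, here is a rough CPU-side sketch of what fusing FOR into bit unpacking means: the reference is added in the same pass that extracts each value, and passing zero gives plain unpacking. The function names and the simple contiguous LSB-first layout are assumptions made for illustration; they are not the generated CUDA kernels and not necessarily the layout those kernels use.

```rust
/// Packs `values` into contiguous `width`-bit fields, LSB first.
/// Assumes every value fits in `width` bits and 1 <= width <= 32.
pub fn pack(values: &[u32], width: u32) -> Vec<u64> {
    assert!((1..=32).contains(&width));
    let total_bits = values.len() * width as usize;
    let mut out = vec![0u64; (total_bits + 63) / 64];
    for (i, &v) in values.iter().enumerate() {
        let bit = i * width as usize;
        let (word, shift) = (bit / 64, (bit % 64) as u32);
        out[word] |= (v as u64) << shift;
        // Spill the high bits into the next word when the field straddles a boundary.
        if shift + width > 64 {
            out[word + 1] |= (v as u64) >> (64 - shift);
        }
    }
    out
}

/// Unpacks `len` values and adds `reference` in the same pass (the "fused" part).
/// With `reference == 0` this is ordinary bit unpacking.
pub fn unpack_ffor(packed: &[u64], width: u32, len: usize, reference: u32) -> Vec<u32> {
    assert!((1..=32).contains(&width));
    let mask = (1u64 << width) - 1;
    (0..len)
        .map(|i| {
            let bit = i * width as usize;
            let (word, shift) = (bit / 64, (bit % 64) as u32);
            let mut v = packed[word] >> shift;
            if shift + width > 64 {
                v |= packed[word + 1] << (64 - shift);
            }
            ((v & mask) as u32).wrapping_add(reference)
        })
        .collect()
}
```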

GPU tracing tool

There's a new binary in vortex-test-e2e-cuda-scan that takes a Vortex file as input.

It recompresses the file using only GPU-supported encodings, scans it back, and collects timings for each column scan. The results are printed to stdout as either pretty text or JSON, which can be piped into duckdb or similar for analysis.

Example usage:

FLAT_LAYOUT_INLINE_ARRAY_NODE=true RUST_LOG=vortex_cuda=trace,info cargo run --release --bin vortex-test-e2e-cuda-scan -- ./vortex-bench/data/tpch/1.0/vortex-file-compressed/lineitem_0.vortex
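
To make the JSON output mode concrete, the sketch below shows one way per-column timings could be written as one JSON object per line, a shape that tools like duckdb read easily. The record type, the field names `column` and `elapsed_us`, and the hand-rolled escaping are all assumptions for illustration; the tool's real schema is not shown in this PR description.

```rust
use std::time::Duration;

/// Hypothetical record shape for one column's scan timing.
pub struct ColumnTiming {
    pub column: String,
    pub elapsed: Duration,
}

/// Minimal JSON string escaping: quotes, backslashes and control characters.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders one timing as a single-line JSON object.
pub fn to_json_line(t: &ColumnTiming) -> String {
    format!(
        "{{\"column\":\"{}\",\"elapsed_us\":{}}}",
        escape_json(&t.column),
        t.elapsed.as_micros()
    )
}
```

Printing one such line per column with `println!` keeps stdout machine-readable, so human-oriented logging would go to stderr instead.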

vortex-cuda/gpu-scan-cli/src/main.rs

a10y · 2026-02-11T16:55:57Z

vortex-cuda/benches/for_cuda.rs

 {
    let reference = <T as From<u8>>::from(REFERENCE_VALUE);
    let data: Vec<T> = (0..len)
-        .map(|i| <T as From<u8>>::from((i % 256) as u8) + reference)


this was overflowing before?
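
A small sketch of why the removed line can overflow, assuming the benchmark is instantiated with `T = u8` (the excerpt does not show which types are used). `(i % 256) as u8` ranges over 0..=255, so adding any non-zero reference exceeds `u8::MAX` for the largest inputs, which panics in debug builds and wraps in release builds. The value `10` for `REFERENCE_VALUE` and the helper names are hypothetical, and `safe_u8_data` is just one possible fix, not necessarily the one applied in this PR.

```rust
/// Hypothetical value; the benchmark's actual `REFERENCE_VALUE` is not shown in the excerpt.
const REFERENCE_VALUE: u8 = 10;

/// Counts how many of the first `len` generated values would overflow when `T = u8`.
pub fn overflowing_u8(len: usize) -> usize {
    (0..len)
        .filter(|&i| ((i % 256) as u8).checked_add(REFERENCE_VALUE).is_none())
        .count()
}

/// The same expression with a wider `T` (here `u32`) never overflows.
pub fn overflowing_u32(len: usize) -> usize {
    (0..len)
        .filter(|&i| {
            u32::from((i % 256) as u8)
                .checked_add(u32::from(REFERENCE_VALUE))
                .is_none()
        })
        .count()
}

/// One possible fix: bound the varying part so value + reference stays within `u8`.
pub fn safe_u8_data(len: usize) -> Vec<u8> {
    let span = 256 - usize::from(REFERENCE_VALUE);
    (0..len).map(|i| (i % span) as u8 + REFERENCE_VALUE).collect()
}
```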

a10y · 2026-02-12T19:38:19Z

vortex-cuda/benches/dict_cuda.rs

-                    let mut total_time = Duration::ZERO;
+                    let mut cuda_ctx = CudaSession::create_execution_ctx(&VortexSession::empty())
+                        .vortex_expect("failed to create execution context")
+                        .with_launch_strategy(Arc::new(timed));


see here: instead of replicating the full launch setup in benchmark code, we can just stub in a launcher that collects timing information across runs
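
The benchmark-side idea can be sketched as follows: the benchmark keeps an `Arc` to a timing strategy, hands a clone to the code that launches kernels, and reads the accumulated samples afterwards. Everything here is a simplified, hypothetical stand-in (the trait's `launch` method, `TimedStrategy`, `run_benchmark`, and wall-clock timing with `Instant`); the PR's strategy times real CUDA launches in blocking mode.

```rust
use std::fmt::Debug;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical, simplified launch hook (not the PR's actual trait signature).
pub trait LaunchStrategy: Debug + Send + Sync + 'static {
    fn launch(&self, kernel: &mut dyn FnMut());
}

/// Records the wall-clock time of every launch it performs.
#[derive(Debug, Default)]
pub struct TimedStrategy {
    samples: Mutex<Vec<Duration>>,
}

impl TimedStrategy {
    /// Total time across all recorded launches.
    pub fn total(&self) -> Duration {
        self.samples.lock().unwrap().iter().sum()
    }

    /// Number of launches recorded so far.
    pub fn launches(&self) -> usize {
        self.samples.lock().unwrap().len()
    }
}

impl LaunchStrategy for TimedStrategy {
    fn launch(&self, kernel: &mut dyn FnMut()) {
        let start = Instant::now();
        kernel();
        self.samples.lock().unwrap().push(start.elapsed());
    }
}

/// Code under benchmark: it only sees `dyn LaunchStrategy`, not the timing details.
pub fn run_benchmark(strategy: Arc<dyn LaunchStrategy>, iterations: usize) -> u64 {
    let mut acc = 0u64;
    for i in 0..iterations {
        strategy.launch(&mut || {
            acc = acc.wrapping_add(i as u64);
        });
    }
    acc
}
```

The benchmark keeps its own `Arc<TimedStrategy>` and calls `total()` after the run, so no launch or timing logic has to be duplicated in benchmark code.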

a10y · 2026-02-12T19:47:59Z

vortex-cuda/src/kernel/mod.rs

-    }};
+/// Implementations can add tracing, async callbacks, or other behavior
+/// around kernel launches.
+pub trait LaunchStrategy: Debug + Send + Sync + 'static {


this is where LaunchStrategy is defined and impled

vortex-cuda/src/kernel/encodings/runend.rs

joseph-isaacs · 2026-02-13T10:02:45Z

vortex-cuda/src/executor.rs

+    pub fn launch_kernel<'a, F>(
+        &'a mut self,
+        function: &'a CudaFunction,
+        len: usize,
+        build_args: F,
+    ) -> VortexResult<()>
+    where
+        F: FnOnce(&mut LaunchArgs<'a>),
+    {
+        let mut launcher = self.launch_builder(function);
+        build_args(&mut launcher);
+
+        let events = launch_cuda_kernel_impl(&mut launcher, self.strategy.event_flags(), len)?;
+        self.strategy.on_complete(&events, len)?;
+
+        drop(events);


joseph-isaacs · 2026-02-13T10:03:06Z

vortex-cuda/gpu-scan-cli/src/main.rs

this is an odd place for a binary?

a10y force-pushed the aduffy/gpu-scan-measure branch from 0249a54 to a4c923c on February 9, 2026 22:51

a10y marked this pull request as ready for review February 9, 2026 22:51

a10y added the changelog/chore (A trivial change) and changelog/skip (Do not list PR in the changelog) labels and removed the changelog/chore (A trivial change) label on Feb 9, 2026

a10y requested a review from joseph-isaacs February 9, 2026 22:51

joseph-isaacs reviewed Feb 10, 2026

vortex-cuda/gpu-scan-cli/src/main.rs (resolved)

a10y force-pushed the aduffy/gpu-scan-measure branch 5 times, most recently from e52bb67 to 21537a6 on February 10, 2026 20:08

a10y commented Feb 11, 2026

a10y force-pushed the aduffy/gpu-scan-measure branch 3 times, most recently from 052da59 to 249c24c on February 12, 2026 14:33

a10y commented Feb 12, 2026

measure scans

7b61bd6

Signed-off-by: Andrew Duffy <andrew@a10y.dev> fixup Signed-off-by: Andrew Duffy <andrew@a10y.dev>

a10y force-pushed the aduffy/gpu-scan-measure branch from 780efdb to 7b61bd6 on February 12, 2026 20:05

joseph-isaacs reviewed Feb 13, 2026

vortex-cuda/src/kernel/encodings/runend.rs (outdated, resolved)

joseph-isaacs reviewed Feb 13, 2026

a10y added 2 commits February 13, 2026 09:59

no pub *Executor

046a526

Signed-off-by: Andrew Duffy <andrew@a10y.dev>

move crate

88b5197

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
