perf: replace np.vectorize with vectorized string ops for label names#617
MaykThewessen wants to merge 3 commits into PyPSA:master
Conversation
Benchmark Results

Tested on Python 3.14, numpy 2.x, macOS ARM64 with realistic LP sizes (593K variable labels + 1.38M constraint labels), the speedup is modest (~1.2x). The bigger performance win comes from PR #616 (caching `flat_vars` / `flat_cons`). This PR is still worthwhile for code clarity.
@MaykThewessen Thanks for your contribution.
Covers the code paths optimised by these PRs:

- PyPSA#616 cached_property on MatrixAccessor (flat_vars / flat_cons)
- PyPSA#617 np.char.add for label string concatenation
- PyPSA#618 sparse matrix slicing in MatrixAccessor.A
- PyPSA#619 numpy solution unpacking

Reproduces benchmark results on PyPSA SciGrid-DE (24–500 snapshots) and a synthetic model. Supports JSON output and --compare mode for cross-branch comparison.

Reproduce with:

```shell
python benchmark/scripts/benchmark_matrix_gen.py -o results.json --label "after"
```

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@FBumann Thanks for the pointer to #564. I've added the benchmark script. Reproduce with:

```shell
# On master (baseline):
python benchmark/scripts/benchmark_matrix_gen.py -o before.json --label "master"
# On PR branch (after):
python benchmark/scripts/benchmark_matrix_gen.py -o after.json --label "with-PRs"
# Compare:
python benchmark/scripts/benchmark_matrix_gen.py --compare before.json after.json
```

Benchmark results on this PR's branch (Python 3.14.3, numpy 2.4.3, macOS ARM64, PyPSA SciGrid-DE):

The dominant cost is
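For reference, a minimal sketch of what the `--compare` step could compute. The JSON schema here (a `"results"` mapping of benchmark names to mean seconds under a top-level `"label"`) is an assumption for illustration; the actual script's schema may differ:

```python
import json

def compare_results(before_path, after_path):
    """Return {benchmark_name: speedup_ratio} for benchmarks present in both files.

    Assumes each file looks like {"label": str, "results": {name: seconds}}.
    A ratio > 1.0 means the "after" run was faster.
    """
    with open(before_path) as f:
        before = json.load(f)
    with open(after_path) as f:
        after = json.load(f)
    return {
        name: t_before / after["results"][name]
        for name, t_before in before["results"].items()
        if after["results"].get(name)
    }
```

A 2.0s baseline against a 1.0s rerun would report a 2.0x speedup for that benchmark.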
Force-pushed from 324f707 to 199f51c
Adds benchmark/scripts/benchmark_matrix_gen.py covering all four performance code paths:

- PyPSA#616 cached_property on MatrixAccessor (flat_vars / flat_cons)
- PyPSA#617 np.char.add label string concatenation
- PyPSA#618 single-step sparse matrix slicing
- PyPSA#619 numpy dense-array solution unpacking

Reproduce with:

```shell
python benchmark/scripts/benchmark_matrix_gen.py -o results.json
python benchmark/scripts/benchmark_matrix_gen.py --include-solve  # PR PyPSA#619
python benchmark/scripts/benchmark_matrix_gen.py --compare before.json after.json
```

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed from bcd1bcb to d15e3f2
Uses np.char.add for the common case (no explicit_coordinate_names), which is ~1.2x faster than np.vectorize on large label arrays. Falls back to np.vectorize when a custom printer is provided.
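A minimal sketch of the fast-path/fallback split described in the commit message. The function name matches the `vectorized_label_names()` helper mentioned in the Summary, but the signature and internals here are illustrative assumptions, not the exact linopy implementation:

```python
import numpy as np

def vectorized_label_names(labels, prefix, printer=None):
    """Build label-name strings for an integer label array.

    Fast path: plain prefix concatenation via np.char.add, done
    array-wide without a Python call per element.
    Fallback: per-element printer via np.vectorize, needed when a
    custom printer (e.g. explicit coordinate names) is provided.
    """
    labels = np.asarray(labels)
    if printer is None:
        # "x" + "123" -> "x123" for the whole array at once
        return np.char.add(prefix, labels.astype(str)).astype(object)
    # Custom printer: per-label lookup, so fall back to np.vectorize
    return np.vectorize(printer, otypes=[object])(labels)
```

The `otypes=[object]` argument keeps the fallback's dtype consistent with the fast path, so callers see `object` arrays either way.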
Force-pushed from 656d283 to 0f323d2
for more information, see https://pre-commit.ci
@MaykThewessen I tested this in Python 3.11:

```python
import numpy as np
import time

labels = np.arange(200_000)

def print_variable(v):
    return f"x{v}"

# Per-element path (current master)
t0 = time.perf_counter()
for _ in range(20):
    np.vectorize(print_variable)(labels).astype(object)
t_vec = (time.perf_counter() - t0) / 20

# Array-level path (this PR)
t0 = time.perf_counter()
for _ in range(20):
    np.char.add("x", labels.astype(str)).astype(object)
t_char = (time.perf_counter() - t0) / 20

print(f"np.vectorize: {t_vec*1000:.1f}ms")
print(f"np.char.add: {t_char*1000:.1f}ms")
print(f"ratio: {t_vec/t_char:.2f}x")
```

This shows a meaningful regression. I would exclude this PR, as even the promised speedup is very modest.
@FBumann Thanks for testing on Python 3.11. You're right: since linopy targets Python 3.10+, a change that regresses on 3.11 isn't viable. Closing this PR. The other three PRs (#616, #618, #619) don't have this version-dependent behavior; they use caching, sparse matrix slicing, and dense numpy operations rather than per-element string formatting.
@FBumann Thanks for testing this. You're right, and I can confirm the regression on Python 3.11. On Python 3.14 + numpy 2.4.3 I see the opposite (np.char.add 1.28x faster), but that's not representative of the majority of users. Since the improvement is Python-version dependent and you measured a meaningful regression on 3.11, I agree this PR should be excluded. I'll close this PR. Thanks for the careful review!
Summary
- Adds a `vectorized_label_names()` helper that uses `np.char.add()` for simple prefix-based label names
- Replaces `np.vectorize(print_variable/print_constraint)` in `to_highspy()`, `to_gurobipy()`, and `to_mosek()`
- Keeps `np.vectorize` for `explicit_coordinate_names=True` mode (which requires per-label lookups)

Motivation

`np.vectorize` is documented as a convenience wrapper, not a performance tool: it calls a Python function per element. For the default (non-explicit-names) path, the printer functions are trivial string concatenation (`f"x{var}"`, `f"c{cons}"`) that can be expressed as `np.char.add(prefix, labels.astype(str))`. The improvement is modest on modern numpy/Python (~1.2x for 2M labels), but the vectorized approach is also cleaner and avoids the `np.vectorize` footgun for future maintainers.

Context
See discussion in #198 (comment) for the broader performance analysis.
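The equivalence between the per-element printer and the array-level concatenation claimed in the Motivation can be sanity-checked with a short snippet (illustrative only; `labels` is a stand-in for the real label arrays):

```python
import numpy as np

labels = np.array([0, 7, 42])

# What np.vectorize(print_variable) computes, element by element
expected = [f"x{v}" for v in labels]

# Array-level equivalent used on the fast path
got = np.char.add("x", labels.astype(str))

assert list(got) == expected
```

Both paths produce the same strings; only the per-element Python-call overhead differs.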
Test plan
- `test_io.py::test_to_highspy` passes (verifies HiGHS direct API export)
- `test_optimization.py` highs-direct tests pass (24/25; one pre-existing failure)

🤖 Generated with Claude Code