perf: replace np.vectorize with vectorized string ops for label names#617
MaykThewessen wants to merge 3 commits into PyPSA:master
Conversation
Benchmark Results

Tested on Python 3.14, numpy 2.x, macOS ARM64 with realistic LP sizes (593K variable labels + 1.38M constraint labels), the speedup is modest (~1.2x). The bigger performance win comes from PR #616 (caching `flat_vars` / `flat_cons`). This PR is still worthwhile for code clarity.
@MaykThewessen Thanks for your contribution.
Covers the code paths optimised by these PRs:

- PyPSA#616 cached_property on MatrixAccessor (flat_vars / flat_cons)
- PyPSA#617 np.char.add for label string concatenation
- PyPSA#618 sparse matrix slicing in MatrixAccessor.A
- PyPSA#619 numpy solution unpacking

Reproduces benchmark results on PyPSA SciGrid-DE (24–500 snapshots) and a synthetic model. Supports JSON output and --compare mode for cross-branch comparison.

Reproduce with:

```shell
python benchmark/scripts/benchmark_matrix_gen.py -o results.json --label "after"
```

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@FBumann Thanks for the pointer to #564. I've added the benchmark script. Reproduce with:

```shell
# On master (baseline):
python benchmark/scripts/benchmark_matrix_gen.py -o before.json --label "master"
# On PR branch (after):
python benchmark/scripts/benchmark_matrix_gen.py -o after.json --label "with-PRs"
# Compare:
python benchmark/scripts/benchmark_matrix_gen.py --compare before.json after.json
```

Benchmark results on this PR's branch (Python 3.14.3, numpy 2.4.3, macOS ARM64, PyPSA SciGrid-DE):

The dominant cost is
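For reference, a minimal sketch of what the `--compare` step could compute. The JSON schema here (a `"results"` mapping of benchmark names to mean seconds under a top-level `"label"`) is an assumption for illustration; the actual script's schema may differ:

```python
import json

def compare_results(before_path, after_path):
    """Return {benchmark_name: speedup_ratio} for benchmarks present in both files.

    Assumes each file looks like {"label": str, "results": {name: seconds}}.
    A ratio > 1.0 means the "after" run was faster.
    """
    with open(before_path) as f:
        before = json.load(f)
    with open(after_path) as f:
        after = json.load(f)
    return {
        name: t_before / after["results"][name]
        for name, t_before in before["results"].items()
        if after["results"].get(name)
    }
```

A 2.0s baseline against a 1.0s rerun would report a 2.0x speedup for that benchmark.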
Force-pushed from 324f707 to 199f51c
Adds benchmark/scripts/benchmark_matrix_gen.py covering all four performance code paths:

- PyPSA#616 cached_property on MatrixAccessor (flat_vars / flat_cons)
- PyPSA#617 np.char.add label string concatenation
- PyPSA#618 single-step sparse matrix slicing
- PyPSA#619 numpy dense-array solution unpacking

Reproduce with:

```shell
python benchmark/scripts/benchmark_matrix_gen.py -o results.json
python benchmark/scripts/benchmark_matrix_gen.py --include-solve  # PR PyPSA#619
python benchmark/scripts/benchmark_matrix_gen.py --compare before.json after.json
```

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed from bcd1bcb to d15e3f2
Uses np.char.add for the common case (no explicit_coordinate_names), which is ~1.2x faster than np.vectorize on large label arrays. Falls back to np.vectorize when a custom printer is provided.
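A minimal sketch of the fast-path/fallback split described in the commit message. The function name matches the `vectorized_label_names()` helper mentioned in the Summary, but the signature and internals here are illustrative assumptions, not the exact linopy implementation:

```python
import numpy as np

def vectorized_label_names(labels, prefix, printer=None):
    """Build label-name strings for an integer label array.

    Fast path: plain prefix concatenation via np.char.add, done
    array-wide without a Python call per element.
    Fallback: per-element printer via np.vectorize, needed when a
    custom printer (e.g. explicit coordinate names) is provided.
    """
    labels = np.asarray(labels)
    if printer is None:
        # "x" + "123" -> "x123" for the whole array at once
        return np.char.add(prefix, labels.astype(str)).astype(object)
    # Custom printer: per-label lookup, so fall back to np.vectorize
    return np.vectorize(printer, otypes=[object])(labels)
```

The `otypes=[object]` argument keeps the fallback's dtype consistent with the fast path, so callers see `object` arrays either way.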
Force-pushed from 656d283 to 0f323d2
for more information, see https://pre-commit.ci
@MaykThewessen I tested this in Python 3.11:

```python
import numpy as np
import time

labels = np.arange(200_000)

def print_variable(v):
    return f"x{v}"

# Per-element path (current master)
t0 = time.perf_counter()
for _ in range(20):
    np.vectorize(print_variable)(labels).astype(object)
t_vec = (time.perf_counter() - t0) / 20

# Array-level path (this PR)
t0 = time.perf_counter()
for _ in range(20):
    np.char.add("x", labels.astype(str)).astype(object)
t_char = (time.perf_counter() - t0) / 20

print(f"np.vectorize: {t_vec*1000:.1f}ms")
print(f"np.char.add: {t_char*1000:.1f}ms")
print(f"ratio: {t_vec/t_char:.2f}x")
```

This shows a meaningful regression. I would exclude this PR, as even the promised speedup is very modest.
@FBumann Thanks for testing on Python 3.11. You're right: since linopy targets Python 3.10+, a change that regresses on 3.11 isn't viable. Closing this PR. The other three PRs (#616, #618, #619) don't have this version-dependent behavior; they use caching, sparse matrix slicing, and dense numpy operations rather than per-element string formatting.
@FBumann Thanks for testing this. You're right, and I can confirm the regression on Python 3.11. On Python 3.14 + numpy 2.4.3 I see the opposite (np.char.add 1.28x faster), but that's not representative of the majority of users. Since the improvement is Python-version dependent and you measured a meaningful regression on 3.11, I agree this PR should be excluded. I'll close this PR. Thanks for the careful review!
Summary
- Adds a `vectorized_label_names()` helper that uses `np.char.add()` for simple prefix-based label names
- Replaces `np.vectorize(print_variable/print_constraint)` in `to_highspy()`, `to_gurobipy()`, and `to_mosek()`
- Keeps `np.vectorize` for `explicit_coordinate_names=True` mode (which requires per-label lookups)

Motivation

`np.vectorize` is documented as a convenience wrapper, not a performance tool: it calls a Python function per element. For the default (non-explicit-names) path, the printer functions are trivial string concatenation (`f"x{var}"`, `f"c{cons}"`) that can be expressed as `np.char.add(prefix, labels.astype(str))`. The improvement is modest on modern numpy/Python (~1.2x for 2M labels), but the vectorized approach is also cleaner and avoids the `np.vectorize` footgun for future maintainers.

Context
See discussion in #198 (comment) for the broader performance analysis.
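The equivalence between the per-element printer and the array-level concatenation claimed in the Motivation can be sanity-checked with a short snippet (illustrative only; `labels` is a stand-in for the real label arrays):

```python
import numpy as np

labels = np.array([0, 7, 42])

# What np.vectorize(print_variable) computes, element by element
expected = [f"x{v}" for v in labels]

# Array-level equivalent used on the fast path
got = np.char.add("x", labels.astype(str))

assert list(got) == expected
```

Both paths produce the same strings; only the per-element Python-call overhead differs.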
Test plan
- `test_io.py::test_to_highspy` passes (verifies HiGHS direct API export)
- `test_optimization.py` highs-direct tests pass (24/25; one pre-existing failure)

🤖 Generated with Claude Code