
perf: use single-step sparse matrix slicing in MatrixAccessor#618

Closed
MaykThewessen wants to merge 3 commits into PyPSA:master from MaykThewessen:perf/single-step-sparse-slicing


@MaykThewessen

Summary

Replace the double-slicing pattern in MatrixAccessor.A and MatrixAccessor.Q with single-step np.ix_() indexing.

Before:

# matrices.py:147 — two separate sparse slice operations
A[self.clabels][:, self.vlabels]

# matrices.py:181 — same pattern for quadratic objective
expr.to_matrix()[self.vlabels][:, self.vlabels]

After:

A[np.ix_(self.clabels, self.vlabels)]
expr.to_matrix()[np.ix_(self.vlabels, self.vlabels)]

Motivation

The double-slice A[rows][:, cols] creates an intermediate sparse matrix after the first slice, then slices again. np.ix_() expresses the row+column selection as a single operation, avoiding the intermediate allocation. For large constraint matrices (~1.38M rows × ~593K cols), this reduces memory churn.
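A minimal sketch, assuming SciPy CSC matrices and small made-up label arrays (`clabels` and `vlabels` here are illustrative, not linopy's actual arrays), that checks the two indexing forms select the same submatrix. It checks equivalence only; it says nothing about speed, which depends on how SciPy implements outer indexing internally.

```python
import numpy as np
import scipy.sparse as sp

# Small deterministic stand-in for the constraint matrix A (illustrative data only).
A = sp.csc_matrix(np.arange(48).reshape(8, 6) % 5)

# Illustrative label selections: rows with gaps, columns permuted.
clabels = np.array([0, 2, 3, 7])
vlabels = np.array([5, 0, 1, 4])

# Before: slice rows first (intermediate sparse matrix), then slice columns.
double = A[clabels][:, vlabels]

# After: one indexing call using outer (np.ix_) indexing.
single = A[np.ix_(clabels, vlabels)]

# Dense reference for the same row/column selection.
expected = A.toarray()[np.ix_(clabels, vlabels)]

assert double.shape == single.shape == (4, 4)
assert np.array_equal(double.toarray(), expected)
assert np.array_equal(single.toarray(), expected)
```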

Context

See #198 (comment) — item 4 in the priority list.

Test plan

  • test_matrices.py — all 4 tests pass (shape validation, masked models, duplicated variables, float coefficients)
  • test_io.py::test_to_highspy — passes
  • test_optimization.py highs-direct — 24/25 pass (one pre-existing failure)

🤖 Generated with Claude Code

Replace double-slicing pattern A[clabels][:, vlabels] with single-step
A[np.ix_(clabels, vlabels)] in MatrixAccessor.A and MatrixAccessor.Q.

The double-slice creates an intermediate sparse matrix (selecting rows
first, then columns), which allocates temporary storage proportional to
the full matrix. np.ix_() performs both row and column selection in a
single operation, avoiding the intermediate allocation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MaykThewessen added a commit to MaykThewessen/linopy that referenced this pull request Mar 17, 2026
Covers the code paths optimised by these PRs:
  - PyPSA#616  cached_property on MatrixAccessor (flat_vars / flat_cons)
  - PyPSA#617  np.char.add for label string concatenation
  - PyPSA#618  sparse matrix slicing in MatrixAccessor.A
  - PyPSA#619  numpy solution unpacking

Reproduces benchmark results on PyPSA SciGrid-DE (24–500 snapshots)
and a synthetic model. Supports JSON output and --compare mode for
cross-branch comparison.

  Reproduce with:
    python benchmark/scripts/benchmark_matrix_gen.py -o results.json --label "after"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@MaykThewessen
Author

Added benchmark/scripts/benchmark_matrix_gen.py to this branch (and #616, #617, #619) as requested by @FBumann.

Reproduce with:

python benchmark/scripts/benchmark_matrix_gen.py -o results.json --label "with-PR-618"
python benchmark/scripts/benchmark_matrix_gen.py --compare before.json after.json

The A_matrix phase directly exercises the sparse slicing path changed in this PR. At 500 snapshots (1.2M variables / 5.4M constraints), A_matrix takes ~8s on the current branch; the comparison script will show the before/after delta between the double slice A[clabels][:, vlabels] and the single-step A[np.ix_(clabels, vlabels)].

Adds benchmark/scripts/benchmark_matrix_gen.py covering all four
performance code paths:
  - PyPSA#616  cached_property on MatrixAccessor (flat_vars / flat_cons)
  - PyPSA#617  np.char.add label string concatenation
  - PyPSA#618  single-step sparse matrix slicing
  - PyPSA#619  numpy dense-array solution unpacking

Reproduce with:
  python benchmark/scripts/benchmark_matrix_gen.py -o results.json
  python benchmark/scripts/benchmark_matrix_gen.py --include-solve   # PR PyPSA#619
  python benchmark/scripts/benchmark_matrix_gen.py --compare before.json after.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@MaykThewessen MaykThewessen force-pushed the perf/single-step-sparse-slicing branch from a3cd22c to 0a79b2a on March 17, 2026 20:57
MaykThewessen added a commit to MaykThewessen/linopy that referenced this pull request Mar 17, 2026
Adds benchmark/scripts/benchmark_matrix_gen.py covering all four
performance code paths:
  - PyPSA#616  cached_property on MatrixAccessor (flat_vars / flat_cons)
  - PyPSA#617  np.char.add label string concatenation
  - PyPSA#618  single-step sparse matrix slicing
  - PyPSA#619  numpy dense-array solution unpacking

Reproduce with:
  python benchmark/scripts/benchmark_matrix_gen.py -o results.json
  python benchmark/scripts/benchmark_matrix_gen.py --include-solve   # PR PyPSA#619 path
  python benchmark/scripts/benchmark_matrix_gen.py --compare before.json after.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@MaykThewessen MaykThewessen force-pushed the perf/single-step-sparse-slicing branch from 874dbe0 to 0a79b2a on March 17, 2026 21:13
@FBumann
Collaborator

FBumann commented Mar 18, 2026

@MaykThewessen Your benchmark doesn't really isolate the influence of your change.
Please try to write focused benchmarks for such small changes.
From my investigation, measuring only the actual slicing call (a single line), I see no improvement at all.

With contiguous [0..N] labels (no gaps, no permutation), A[rows][:, cols] and A[np.ix_(rows, cols)] do the same work. The double-slice creates one intermediate, but for an identity index on a CSC matrix, both paths are equally fast.

If you provide more concise evidence that this actually speeds things up, I can of course reopen the PR.
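A focused micro-benchmark that times only the changed line might look like the sketch below. It assumes SciPy CSC matrices filled with synthetic random data; the sizes (`n_rows`, `n_cols`, `nnz`) and the helper `best_time` are illustrative, not the SciGrid-DE dimensions or anything in the linopy benchmark scripts. It compares contiguous labels (an identity selection, the case described above) against gapped, permuted labels. Timings will vary by machine and SciPy version.

```python
import timeit

import numpy as np
import scipy.sparse as sp

# Synthetic CSC matrix; sizes are illustrative, not the SciGrid-DE dimensions.
n_rows, n_cols, nnz = 20_000, 10_000, 200_000
rng = np.random.default_rng(0)
A = sp.csc_matrix(
    (rng.random(nnz), (rng.integers(0, n_rows, nnz), rng.integers(0, n_cols, nnz))),
    shape=(n_rows, n_cols),
)  # duplicate (row, col) pairs are summed on construction

cases = {
    # Contiguous labels 0..N-1: the selection is the identity.
    "contiguous": (np.arange(n_rows), np.arange(n_cols)),
    # Gapped, permuted labels: a real sub-selection with reordering.
    "permuted": (
        rng.permutation(n_rows)[: n_rows // 2],
        rng.permutation(n_cols)[: n_cols // 2],
    ),
}


def best_time(fn, repeat=5, number=3):
    """Best-of-`repeat` mean seconds per call, timing only `fn` itself."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


results = {}
for name, (rows, cols) in cases.items():
    # Default arguments bind the current rows/cols into each lambda.
    t_double = best_time(lambda r=rows, c=cols: A[r][:, c])
    t_single = best_time(lambda r=rows, c=cols: A[np.ix_(r, c)])
    results[name] = (t_double, t_single)
    print(f"{name:>10}: double {t_double:.4f}s  single {t_single:.4f}s  "
          f"ratio {t_double / t_single:.2f}x")
```

Keeping matrix construction and label generation outside the timed lambdas is the point of the design: only the slicing expression under discussion is measured.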

@FBumann FBumann closed this Mar 18, 2026
@MaykThewessen
Author

Benchmark Results: master vs PR #618

Tested on actual linopy implementation using PyPSA SciGrid-DE. Each phase calls the real model.matrices properties — the code path solvers use. Also includes end-to-end model.solve() with HiGHS.

Setup: Python 3.14.3, numpy 2.4.3, Apple M-series (arm64), macOS, 5 repeats (best-of).

Matrix Generation

| Snapshots | Phase | master (s) | PR-618 (s) | Speedup |
|---:|---|---:|---:|---:|
| 24 | flat_vars | 0.0055 | 0.0048 | 1.15x |
| 24 | flat_cons | 0.1510 | 0.1494 | 1.01x |
| 24 | A_matrix | 0.1649 | 0.1601 | 1.03x |
| 24 | full_matrix_pipeline | 0.3494 | 0.3261 | 1.07x |
| 100 | flat_cons | 0.7305 | 0.5792 | 1.26x |
| 100 | A_matrix | 0.7467 | 0.6113 | 1.22x |
| 100 | full_matrix_pipeline | 2.0155 | 1.3685 | 1.47x |
| 200 | flat_cons | 2.2360 | 1.4564 | 1.54x |
| 200 | A_matrix | 1.7033 | 1.2884 | 1.32x |
| 200 | full_matrix_pipeline | 3.2036 | 2.7606 | 1.16x |
| 500 | flat_cons | 5.9889 | 5.3091 | 1.13x |
| 500 | A_matrix | 5.3982 | 5.5722 | 0.97x |
| 500 | full_matrix_pipeline | 11.6722 | 11.7939 | 0.99x |

End-to-End Solve (HiGHS direct)

| Snapshots | Phase | master (s) | PR-618 (s) | Speedup |
|---:|---|---:|---:|---:|
| 24 | model.solve() end-to-end | 4.0473 | 3.6732 | 1.10x |
| 24 | re-solve (warm model) | 3.3517 | 2.9869 | 1.12x |
| 100 | model.solve() end-to-end | 15.5674 | 14.4089 | 1.08x |
| 100 | re-solve (warm model) | 14.3956 | 13.6016 | 1.06x |
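The Speedup column in both tables is the master time divided by the PR-618 time, so values above 1.0 mean the PR is faster. A short check using only the numbers from the tables recomputes the column and lists the rows where the PR was slower (`ROWS` and `speedup` are names introduced here for illustration):

```python
# (snapshots, phase, master_s, pr618_s, reported_speedup), copied from the two tables above.
ROWS = [
    (24, "flat_vars", 0.0055, 0.0048, 1.15),
    (24, "flat_cons", 0.1510, 0.1494, 1.01),
    (24, "A_matrix", 0.1649, 0.1601, 1.03),
    (24, "full_matrix_pipeline", 0.3494, 0.3261, 1.07),
    (100, "flat_cons", 0.7305, 0.5792, 1.26),
    (100, "A_matrix", 0.7467, 0.6113, 1.22),
    (100, "full_matrix_pipeline", 2.0155, 1.3685, 1.47),
    (200, "flat_cons", 2.2360, 1.4564, 1.54),
    (200, "A_matrix", 1.7033, 1.2884, 1.32),
    (200, "full_matrix_pipeline", 3.2036, 2.7606, 1.16),
    (500, "flat_cons", 5.9889, 5.3091, 1.13),
    (500, "A_matrix", 5.3982, 5.5722, 0.97),
    (500, "full_matrix_pipeline", 11.6722, 11.7939, 0.99),
    (24, "model.solve() end-to-end", 4.0473, 3.6732, 1.10),
    (24, "re-solve (warm model)", 3.3517, 2.9869, 1.12),
    (100, "model.solve() end-to-end", 15.5674, 14.4089, 1.08),
    (100, "re-solve (warm model)", 14.3956, 13.6016, 1.06),
]


def speedup(master_s: float, pr_s: float) -> float:
    """Speedup as used in the tables: master time / PR time (>1 means the PR is faster)."""
    return master_s / pr_s


computed = [round(speedup(m, p), 2) for _, _, m, p, _ in ROWS]
regressions = [(snaps, phase) for (snaps, phase, *_), s in zip(ROWS, computed) if s < 1.0]
print(regressions)  # → [(500, 'A_matrix'), (500, 'full_matrix_pipeline')]
```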

Summary: The single-step sparse slicing shows a 1.1–1.5x improvement on matrix generation at medium sizes (100–200 snapshots), and is the only PR that shows a measurable end-to-end solve improvement (1.06–1.12x). At 500 snapshots the A_matrix benefit disappears (0.97x), as the sparse matrix itself dominates.

Benchmark methodology
  • Each phase calls the actual model.matrices property (e.g., matrices.A, matrices.flat_cons)
  • model.solve() calls the real linopy solve path with HiGHS direct API
  • Cache cleared with matrices.clean_cached_properties() before each measurement
  • 5 repeats per measurement, best-of-5 reported
  • GC disabled during timing, collected between repeats
  • Benchmark script: benchmark/scripts/benchmark_actual.py
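The methodology above can be sketched as a small timing helper. This is not the actual benchmark/scripts/benchmark_actual.py; `best_of` and its parameters are hypothetical names, and the linopy usage in the comment assumes only the `model.matrices` properties and `clean_cached_properties()` named in the list.

```python
import gc
import time
from typing import Callable, Optional


def best_of(
    fn: Callable[[], object],
    reset: Optional[Callable[[], None]] = None,
    repeats: int = 5,
) -> float:
    """Best (minimum) wall-clock seconds for `fn` over `repeats` runs."""
    was_enabled = gc.isenabled()
    times = []
    for _ in range(repeats):
        if reset is not None:
            reset()  # e.g. clear cached properties so each run recomputes
        gc.collect()  # collect garbage between repeats
        gc.disable()  # no GC pauses inside the timed region
        try:
            start = time.perf_counter()
            fn()
            times.append(time.perf_counter() - start)
        finally:
            if was_enabled:
                gc.enable()  # restore the caller's GC state
    return min(times)


# Hypothetical linopy usage, assuming an already built `model`:
# best_of(lambda: model.matrices.A, reset=model.matrices.clean_cached_properties)
print(f"{best_of(lambda: sum(range(100_000))):.6f}s")
```

Reporting the minimum rather than the mean filters out one-off interference from other processes, which suits comparing two code paths on the same machine.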
