
Tune memory for long simulations

Advanced

Prerequisites: GPU acceleration tutorial

Simulations of 100+ seconds, large grids (case2000 and up), and 16-worker batches all put memory under pressure, and that is normal. This page is a graduated checklist of optimisations, from cheap YAML tweaks to structural changes.

Tier 1 — YAML knobs (no code changes)

Reduce stored trajectory points

solver:
  options:
    dense_n: 50          # torch: output points per leg (default 200) → reduce to 50
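
The saving is easy to estimate with back-of-envelope arithmetic (all sizes below are hypothetical; substitute your case's actual leg and state counts):

# rough trajectory footprint per sample, float64
n_legs, dense_n, n_states = 3, 200, 1000    # HYPOTHETICAL sizes
mb = n_legs * dense_n * n_states * 8 / 1024**2
print(f"{mb:.0f} MB per sample")            # dense_n 200 -> 50 cuts this to a quarter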

Trim batch outputs

output:
  keep_failed: false     # don't keep rejected samples (default already false)
  format: hdf5           # 2-3× smaller than npz
  metadata: parquet      # 10× smaller than csv

Tier 2 — torch chunk_seconds

If you've moved to the torch backend:

solver:
  kind: torch_dopri5
  options:
    chunk_seconds: 0.5     # split each leg into 0.5 s windows

Effect (case39 + 10 s sim):

chunk_seconds    Peak VRAM   Speed
null (default)   1.0 GB      100 %
1.0              0.5 GB      95 %
0.5              0.25 GB     90 %
0.1              50 MB       70 %
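
To see why smaller windows cut peak memory, here is a toy chunked integrator in plain torch (forward Euler, purely illustrative, not pylectra's dopri5 solver): each window's dense buffer is subsampled and freed before the next window starts, so peak memory tracks the window length rather than the whole leg.

import torch

def integrate_chunked(f, x0, t_end, chunk_seconds=0.5, dt=1e-3, keep_every=20):
    """Toy chunked integrator: peak memory holds one window, not the full leg."""
    x, t, kept = x0, 0.0, [x0.unsqueeze(0)]
    while t < t_end - 1e-12:
        t_stop = min(t + chunk_seconds, t_end)
        n = max(1, int(round((t_stop - t) / dt)))
        dense = torch.empty(n, *x0.shape)            # dense buffer for ONE window
        for i in range(n):
            x = x + dt * f(t + i * dt, x)            # forward-Euler step
            dense[i] = x
        kept.append(dense[keep_every - 1::keep_every].clone())  # copy the subsample...
        del dense                                    # ...so the window buffer is freed
        t = t_stop
    return torch.cat(kept)

# sanity check on dx/dt = -x
traj = integrate_chunked(lambda t, x: -x, torch.ones(3), t_end=10.0)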

Full coverage: see the out-of-memory section of the GPU acceleration tutorial.

Tier 3 — Cap batch concurrency

Each joblib worker copies the case data + solver state. More workers ⇒ more RAM.

output:
  parallel:
    n_jobs: 4              # cut from 16 to 4-8 on big cases
    batch_size: 1          # less in-flight data
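
To size n_jobs from actual headroom instead of guessing, a rough rule of thumb (sketch only; the 2 GB per-worker figure is an assumption, so measure one sample first and substitute your real footprint):

import psutil

per_worker_gb = 2.0                                  # ASSUMPTION: measure your per-worker footprint
avail_gb = psutil.virtual_memory().available / 1024**3
n_jobs = max(1, int(avail_gb // per_worker_gb))
print(f"{avail_gb:.1f} GB free -> n_jobs = {n_jobs}")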

Tier 4 — Disable nested BLAS threading

Each joblib worker spawns its own OpenBLAS/MKL thread pool, so workers × BLAS threads multiplies both CPU and memory use. Cap the pools with environment variables:

export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1
python -m pylectra run examples/batch_case39.yaml

Each simulation then runs single-threaded, but with many workers the CPU stays fully utilised while total memory use drops sharply.
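
If you prefer to keep this inside Python, the threadpoolctl package (a separate pip install, unrelated to pylectra) can cap the pools at runtime. Whether the limit propagates into joblib's subprocess workers depends on the joblib backend, so the environment variables above remain the more robust option:

from threadpoolctl import threadpool_limits
from pylectra.run import run

with threadpool_limits(limits=1):        # caps OpenBLAS / MKL / OpenMP thread pools
    run("examples/batch_case39.yaml")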

Tier 5 — Chunked persistence

Very long batches (10 000+ samples) are risky to run as a single job; slice them into 100-sample chunks that can be resumed after a crash:

from pylectra.run import run
import os

base_seed = 42
chunk_size = 100
total = 10000

for offset in range(0, total, chunk_size):
    out_dir = f"./batch_chunks/chunk_{offset:06d}"
    if os.path.exists(out_dir):
        continue            # skip completed
    run("examples/batch_case39.yaml",
        scenarios={"count": chunk_size, "seed": base_seed + offset},
        output={"directory": out_dir})

Then merge metadata with pandas:

import pandas as pd, glob

files = sorted(glob.glob("./batch_chunks/*/metadata.parquet"))
metas = pd.concat([pd.read_parquet(f) for f in files], ignore_index=True)  # ignore_index avoids index collisions
metas["sample_id"] = metas.index    # contiguous re-numbering across chunks
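
Optionally persist the merged table and sanity-check the row count (the output path is just an example):

metas.to_parquet("./batch_chunks/metadata_all.parquet")
assert len(metas) == total, f"expected {total} rows, got {len(metas)}"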

Tier 6 — Drop intermediate state

If your Python script holds on to every full SimulationResult, N results pile up in memory:

results = []
for cfg in configs:
    out = run(cfg, plot=False)
    results.append(out)                  # ✗ keeps the full result; grows without bound

Keep only the scalar you need and release the rest:

deviations = []
for cfg in configs:
    out = run(cfg, plot=False)
    deviations.append(out.result.max_angle_deviation_deg)   # ✓ scalar only
    del out                              # explicitly drop the full result

import gc
gc.collect()                             # force a collection if pressure persists
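
To make the release automatic, wrap the sweep in a generator so each full result goes out of scope as soon as its scalar is consumed (a sketch built on the run() call shown above):

from pylectra.run import run

def scalar_sweep(configs, metric="max_angle_deviation_deg"):
    """Yield one scalar per config; the full SimulationResult is dropped each iteration."""
    for cfg in configs:
        out = run(cfg, plot=False)
        yield getattr(out.result, metric)
        del out                          # freed before the next run starts

deviations = list(scalar_sweep(configs))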

Tier 7 — Drop the time-series

On case2000+ the output is dominated by the time-series themselves; each per-sample HDF5 file runs to hundreds of MB. If you only need scalar metrics (max angle deviation, CCT), keep just the metadata:

output:
  format: hdf5
  keep_failed: false
  # Custom "metadata-only" mode requires small BatchRunner changes;
  # planned for a future release.

Current workaround: run the full pipeline, then delete samples/*.h5 afterwards.
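
A cross-platform version of that cleanup (the samples/*.h5 layout is assumed from the default output settings; adjust the glob to your run directory):

from pathlib import Path

freed = 0
for h5 in Path("samples").glob("*.h5"):  # ASSUMED layout; adjust to your output dir
    freed += h5.stat().st_size
    h5.unlink()
print(f"freed {freed / 1024**3:.1f} GB of time-series; metadata is untouched")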

Tier 8 — Skip time-domain integration

For mass small-signal sweeps (skip_integration: true) you only need the case, model parameters, and the equilibrium; no time-series is produced. Per-sample metadata plus eigenvalues comes to just a few KB:

mode: batch
skip_integration: true
small_signal: {kind: finite_difference}
output:
  format: npz             # simpler than hdf5
  metadata: parquet
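
Downstream analysis then fits in pandas. A sketch, assuming the batch metadata exposes the rightmost eigenvalue's real part as a column named max_real_eig (both the path and the column name are hypothetical; check your actual metadata schema):

import pandas as pd

metas = pd.read_parquet("output/metadata.parquet")   # HYPOTHETICAL path
unstable = metas[metas["max_real_eig"] > 0]          # HYPOTHETICAL column name
print(f"{len(unstable)} / {len(metas)} operating points unstable")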

Monitor memory

# Linux / macOS
htop                       # live; F10 to quit

# Windows
tasklist | findstr python

# Inline in your script
import psutil, os
proc = psutil.Process(os.getpid())
print(f"RSS = {proc.memory_info().rss / 1024**3:.2f} GB")

Troubleshooting

"MemoryError" / Killed by OS

Diagnosis order:

  1. Is n_jobs too large? Halve it.
  2. How big is the case? pylectra info reports total memory vs. case size.
  3. Is dense_n too large (torch)? Reduce to 50.
  4. Are you accumulating results into a Python list?
  5. Last resort: chunk the run + del + gc.collect().

Swap usage spikes

The OS is paging to disk and performance plummets. Lower n_jobs immediately; otherwise a 10-minute batch can turn into a 10-hour one.

Next steps