Tune memory for long simulations¶
Advanced
Prerequisites: GPU acceleration tutorial
Simulations of 100+ seconds, large grids (case2000 and up), and 16-worker batches all put memory under pressure; this is normal. This page is a graduated checklist of optimisations.
Tier 1 — YAML knobs (no code changes)¶
Reduce stored trajectory points¶
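Fewer stored points per trajectory means smaller per-sample files and less RAM during post-processing. The exact key depends on the backend; a minimal sketch, assuming the `dense_n` knob mentioned under Troubleshooting below and a `solver` block as its home:

```yaml
solver:
  dense_n: 50   # assumed key and location: fewer stored points per trajectory
```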
Trim batch outputs¶
output:
  keep_failed: false   # don't keep rejected samples (default already false)
  format: hdf5         # 2-3× smaller than npz
  metadata: parquet    # 10× smaller than csv
Tier 2 — torch chunk_seconds¶
If you've moved to the torch backend, cap peak VRAM with chunk_seconds:
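A minimal sketch of the setting, assuming `chunk_seconds` sits under the solver options next to the backend selection (the GPU acceleration tutorial has the authoritative layout):

```yaml
solver:
  backend: torch       # assumed layout; see the GPU acceleration tutorial
  chunk_seconds: 0.5   # integrate in 0.5 s chunks to bound peak VRAM
```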
Effect (case39 + 10 s sim):
| chunk_seconds | Peak VRAM | Speed |
|---|---|---|
| null (default) | 1.0 GB | 100 % |
| 1.0 | 0.5 GB | 95 % |
| 0.5 | 0.25 GB | 90 % |
| 0.1 | 50 MB | 70 % |
Full coverage: GPU acceleration — out-of-memory.
Tier 3 — Cap batch concurrency¶
Each joblib worker copies the case data + solver state. More workers ⇒ more RAM.
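A minimal sketch, assuming `n_jobs` is the worker count and lives in the batch section of the YAML file (adjust the section name to your config):

```yaml
batch:
  n_jobs: 4   # assumed key; halve again if memory pressure persists
```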
Tier 4 — Disable nested BLAS threading¶
Each joblib worker can spawn its own OpenBLAS/MKL threads, so resource use multiplies (workers × threads). Cap threading with env vars:
export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1
python -m pylectra run examples/batch_case39.yaml
Each simulation then runs single-threaded, but with many workers the CPU stays fully utilised while total memory use is much lower.
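If you launch the batch from a Python script instead of the shell, the same caps can be set there, as long as it happens before the numerical libraries are imported (a generic sketch, not a pylectra-specific API):

```python
import os

# Thread pools are sized when numpy/scipy/torch are first imported,
# so these must be set before any of those imports happen.
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"

from pylectra.run import run  # import pylectra only after the caps are set
```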
Tier 5 — Chunked persistence¶
Very long batches (10 000+) are risky to run as one job — slice into 100-sample chunks:
from pylectra.run import run
import os
base_seed = 42
chunk_size = 100
total = 10000
for offset in range(0, total, chunk_size):
    out_dir = f"./batch_chunks/chunk_{offset:06d}"
    if os.path.exists(out_dir):
        continue  # chunk already completed on a previous run
    run("examples/batch_case39.yaml",
        scenarios={"count": chunk_size, "seed": base_seed + offset},
        output={"directory": out_dir})
Then merge metadata with pandas:
import pandas as pd, glob
files = sorted(glob.glob("./batch_chunks/*/metadata.parquet"))
# ignore_index=True re-numbers rows so per-chunk indices don't collide
metas = pd.concat([pd.read_parquet(f) for f in files], ignore_index=True)
metas["sample_id"] = metas.index
Tier 6 — Drop intermediate state¶
If your Python script never releases its SimulationResult objects, all N of them stay in memory at once:
results = []
for cfg in configs:
    out = run(cfg, plot=False)
    # ✗ results.append(out) would keep every full SimulationResult alive
    results.append(out.result.max_angle_deviation_deg)  # ✓ keep only the scalar
    del out  # drop the reference explicitly

import gc
gc.collect()  # force a collection if memory is still climbing
Tier 7 — Keep only the metadata¶
With case2000 and larger, output size is dominated by the time-series themselves: each sample's HDF5 file is hundreds of MB. If you only need scalar metrics (max angle deviation, CCT), keep just the metadata:
output:
  format: hdf5
  keep_failed: false
  # Custom "metadata-only" mode requires small BatchRunner changes;
  # planned for a future release.
Current workaround: run the full pipeline, then remove `samples/*.h5` afterwards.
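A sketch of that workaround, assuming one HDF5 file per sample under a samples/ subdirectory of the output directory (the directory name here is just a placeholder):

```bash
# Scalars live in metadata.parquet; the bulky per-sample time-series can go.
rm ./batch_out/samples/*.h5
```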
Tier 8 — Skip time-domain integration (small-signal only)¶
For mass small-signal sweeps (skip_integration: true), you only need case + model parameters + equilibrium — no time-series. Per-sample metadata + eigenvalues is just a few KB.
mode: batch
skip_integration: true
small_signal: {kind: finite_difference}
output:
  format: npz        # simpler than hdf5
  metadata: parquet
Monitor memory¶
# Linux / macOS
htop # live; F10 to quit
# Windows
tasklist | findstr python
# Inline in your script
import psutil, os
proc = psutil.Process(os.getpid())
print(f"RSS = {proc.memory_info().rss / 1024**3:.2f} GB")
Troubleshooting¶
"MemoryError" / Killed by OS¶
Diagnosis order:
- Is `n_jobs` too large? Halve it.
- How big is the case? `pylectra info` reports total memory vs. case size.
- Is `dense_n` too large (torch)? Reduce to 50.
- Are you accumulating results into a Python list?
- Last resort: chunk the run + `del` + `gc.collect()`.
Swap usage spikes¶
The OS is paging to disk and performance plummets. Lower `n_jobs` immediately; otherwise a 10-minute batch turns into 10 hours.
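To confirm the machine is actually swapping (standard OS tools, nothing pylectra-specific):

```bash
# Linux
free -h                       # check the Swap row
# macOS
vm_stat | grep -i pageout     # growing pageout count means the OS is swapping
```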
Next steps¶
- GPU acceleration tutorial — full `chunk_seconds` reference.
- Parallel batch tuning — `n_jobs` / BLAS threading.
- FAQ — memory-related troubleshooting.