Benchmark a pipeline against ground truth#
The point of a forward simulation is that you know the answer. This recipe runs your analysis pipeline on a simulated movie and scores what it recovered against the ground truth, using the recovery metrics.
1. Simulate, then run your pipeline#
from minisim import simulate
rec = simulate(spec)
movie = rec.observed # (frame, height, width) sensor counts
gt = rec.ground_truth
# Your analysis pipeline (minian, CaImAn, suite2p, ...) returns:
# A_est: (n_units, height, width) spatial footprints
# C_est: (n_units, frame) calcium traces
# S_est: (n_units, frame) deconvolved spikes
A_est, C_est, S_est = run_my_pipeline(movie)
2. Match estimated cells to true cells#
Recovery is only meaningful once you know which estimated cell corresponds to
which true cell. hungarian_match() solves the optimal
one-to-one assignment by spatial overlap (IoU). Match against A_observed (what
is actually recoverable through the optics), not A_planted (the optics-free
ideal).
from minisim import hungarian_match
match = hungarian_match(A_est, gt.A_observed)
match.recall() # fraction of true cells recovered (IoU >= 0.5)
match.precision() # fraction of estimated cells that are real
match.mean_iou # mean spatial overlap over matched pairs
Recall should be scored against the detectable cells, not every planted cell:
a cell too deep or too dim to appear in the movie is not a fair miss. Use
detectable_subset() for the honest denominator:
det = gt.detectable_subset()
match = hungarian_match(A_est, det.A_observed)
print(f"recall over detectable cells: {match.recall():.2f}")
3. Score the recovered traces and spikes#
match.pairing is the list of (est_idx, true_idx) pairs; feed it straight to
the temporal metrics.
import numpy as np
from minisim import trace_pearson, spike_precision_recall
r = trace_pearson(C_est, gt.C, match.pairing) # one Pearson r per matched pair
print(f"median trace correlation: {float(np.nanmedian(r)):.2f}")
spikes = spike_precision_recall(S_est, gt.S, match.pairing, tol_frames=2)
print(f"spike precision={spikes.precision:.2f}, recall={spikes.recall:.2f}")
4. Score motion recovery (optional)#
If your pipeline estimates a rigid-motion trajectory and the spec has a
BrainMotion step, compare against gt.shifts with
shift_rmse():
from minisim import shift_rmse
rmse_px = shift_rmse(shifts_est, gt.shifts) # per-frame (dy, dx), in pixels
Scaling up#
To trace a metric across a physical axis (recall vs depth, vs NA, vs density), drive this same scoring from a parameter sweep and collect the results into a DataFrame.