Seeding & RNG

The uniform `new(args, &seed)` constructor pattern, the `SeedExt` strategies (`Unseeded` / `Deterministic`), in-place reseeding, and dual-stream RNG.

Every stochastic source in stochastic-rs — both processes (ProcessExt) and distributions (DistributionExt) — uses the same seed-handling pattern: a single canonical constructor new(args, &seed) that takes a SeedExt strategy. There is no with_seed(u64) or from_seed_source(&seed) — those collapsed into new in 2.2.0.

The two seed strategies

use stochastic_rs::simd_rng::Deterministic;
use stochastic_rs::simd_rng::Unseeded;

| Strategy | Behaviour | When to use |
| --- | --- | --- |
| Unseeded | Auto-seeded from a globally-unique sequence (see Auto-seeding and contention). | Default production sampling, no need for reproducibility. |
| Deterministic(u64) | Reproducible stream rooted at the seed. | Unit tests, regression baselines, calibration sweeps. |

use stochastic_rs::stochastic::diffusion::gbm::Gbm;

// auto-seeded — each constructed RNG draws a fresh, globally-unique seed
let gbm_a = Gbm::<f64, _>::new(0.05, 0.2, 1_000, Some(100.0), Some(1.0), Unseeded);

// reproducible — same `seed` parameter ⇒ same path
let gbm_b = Gbm::<f64, _>::new(0.05, 0.2, 1_000, Some(100.0), Some(1.0), Deterministic::new(42));

Auto-seeding and contention

Unseeded draws each RNG's seed from one global golden-ratio sequence (base, base+γ, base+2γ, …, each mixed through SplitMix64). To keep that draw off a shared atomic on every construction, each thread reserves a block of 2^18 consecutive steps and walks its block locally:

global atomic ──reserve 2^18·γ──► thread A's block ──► seed, seed+γ, …  (no atomic)
              ──reserve 2^18·γ──► thread B's block ──► disjoint range

The block ranges are disjoint and γ is odd, so every seed still comes from a distinct step base + kγ of the sequence (an odd γ makes k ↦ kγ injective modulo 2^64). This is the same global-uniqueness contract as a per-construction atomic, but with one atomic operation per 2^18 seeds instead of one per seed. Before this scheme the shared counter was hit once per RNG construction; under parallel Monte Carlo that cache line ping-ponged across cores and serialised the workers. Removing it is the main reason short-path parallel sampling got faster (see ProcessExt → performance). The generator itself, the SplitMix64 seed expansion, and the whole Deterministic path are unchanged: Deterministic never touches this counter, so seeded streams are bit-identical to before.
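The reservation scheme described above can be sketched with the standard library alone. This is an illustrative model, not the crate's implementation: the names `next_auto_seed`, `NEXT_STEP` and `LOCAL`, the base of 0, and the `Relaxed` ordering are all assumptions made for the example.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU64, Ordering};

/// Golden-ratio increment used by SplitMix64 (an odd constant).
const GAMMA: u64 = 0x9E37_79B9_7F4A_7C15;
/// Number of sequence steps reserved per atomic operation (2^18).
const BLOCK: u64 = 1 << 18;

/// Shared step counter: touched once per block, not once per seed.
/// (Hypothetical name; the base is assumed to be 0 here.)
static NEXT_STEP: AtomicU64 = AtomicU64::new(0);

thread_local! {
    /// (next step index, steps left in this thread's reserved block)
    static LOCAL: Cell<(u64, u64)> = Cell::new((0, 0));
}

/// SplitMix64 finalizer. Every stage is invertible, so distinct inputs
/// give distinct outputs.
fn mix(mut z: u64) -> u64 {
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Returns a seed that is distinct from every other seed handed out so far,
/// on any thread, until the 64-bit step counter wraps.
pub fn next_auto_seed() -> u64 {
    LOCAL.with(|cell| {
        let (mut step, mut left) = cell.get();
        if left == 0 {
            // Block exhausted: reserve a fresh, disjoint range of step indices.
            step = NEXT_STEP.fetch_add(BLOCK, Ordering::Relaxed);
            left = BLOCK;
        }
        cell.set((step.wrapping_add(1), left - 1));
        // step * GAMMA is injective modulo 2^64 because GAMMA is odd.
        mix(step.wrapping_mul(GAMMA))
    })
}
```

In this model the only cross-thread traffic is the `fetch_add` inside the `if`, which runs once per 2^18 calls on each thread; every other call reads and writes thread-local state only.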

SeedExt

SeedExt is the trait that lets a process or distribution accept any seed strategy uniformly:

pub trait SeedExt: Clone + Send + Sync + 'static {
    fn rng(&self) -> SimdRng;                        // single-stream RNG
    fn derive(&self) -> Self;                        // child seed for sub-components
    fn rng_ext<R: SimdRngExt>(&self) -> R;           // generic backing RNG (see below)
    fn reseed(&self, seed: u64) {}                   // optional in-place reset
}

Unseeded and Deterministic are the two stock implementations. Custom seed sources are possible (e.g. an atomically-shared counter for test-suite reproducibility) but the two built-ins cover almost every use case.

Reseeding in place

SeedExt::reseed(seed) lets you swap a Deterministic source's internal state without rebuilding the process. The typical use case is a calibration loop that sweeps many seeds for the same parameterised process:

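use stochastic_rs::prelude::*;
use stochastic_rs::simd_rng::Deterministic;
use stochastic_rs::stochastic::process::fbm::Fbm;

let fbm = Fbm::<f64, _>::new(0.7, 1_000, Some(1.0), Deterministic::new(0));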

// Replay 100 reproducible paths, one per seed, with zero re-allocation
for s in 1..=100u64 {
    fbm.seed.reseed(s);
    let path = fbm.sample();
    // ... feed path into your calibration objective ...
}

| Source | reseed(s) behaviour |
| --- | --- |
| Unseeded | Silent no-op (auto-seeded streams have nothing to assign). |
| Deterministic | Atomically state.store(s); the next stream advance starts from s. |

After reseed(s), the source produces the same stream, bit for bit, as a fresh Deterministic::new(s). This invariant is what lets calibration replay specific paths.
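The reseed contract can be modelled in a few lines. The sketch below is a toy under stated assumptions, not the crate's code: `ToySeed` and `ToyProcess` are hypothetical stand-ins for Deterministic and a process, the state is assumed to be a single `AtomicU64` that advances by the SplitMix64 increment per sample, and the "path" is just raw 64-bit draws.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const GAMMA: u64 = 0x9E37_79B9_7F4A_7C15;

/// SplitMix64 finalizer (a bijection on u64).
fn mix(mut z: u64) -> u64 {
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Toy deterministic seed source (hypothetical, not the crate's type).
pub struct ToySeed {
    state: AtomicU64,
}

impl ToySeed {
    pub fn new(seed: u64) -> Self {
        ToySeed { state: AtomicU64::new(seed) }
    }

    /// In-place reset. Takes `&self` because the state is atomic.
    pub fn reseed(&self, seed: u64) {
        self.state.store(seed, Ordering::Relaxed);
    }

    /// Hands out the seed for the next stream and advances the state.
    fn next_stream_seed(&self) -> u64 {
        mix(self.state.fetch_add(GAMMA, Ordering::Relaxed))
    }
}

/// Toy process: a "path" is `n` raw SplitMix64 draws.
pub struct ToyProcess {
    n: usize,
    pub seed: ToySeed,
}

impl ToyProcess {
    pub fn new(n: usize, seed: ToySeed) -> Self {
        ToyProcess { n, seed }
    }

    /// Each call consumes one stream seed, so successive paths differ.
    pub fn sample(&self) -> Vec<u64> {
        let mut x = self.seed.next_stream_seed();
        (0..self.n)
            .map(|_| {
                x = x.wrapping_add(GAMMA);
                mix(x)
            })
            .collect()
    }
}
```

With this model, two consecutive `sample()` calls on one instance return different paths, while `p.seed.reseed(7)` followed by `p.sample()` returns exactly what `ToyProcess::new(n, ToySeed::new(7)).sample()` returns.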

Reproducibility invariants

The smoke test examples/seed_smoke_test.rs validates these end-to-end:

  1. Unseeded × 2 — two instances produce divergent paths.
  2. Deterministic(seed) × 2 — bit-for-bit identical paths.
  3. Different seeds — divergent paths.
  4. Same instance, repeated sample() — each call advances internal state, so successive paths differ.
  5. reseed(s) + sample — reproduces what Deterministic::new(s) would produce.
  6. Unseeded.reseed(s) — no-op, no panic, sample still works.

These are serial invariants. The sampler-level behaviour (distinct paths under reuse, no seed collision across the 2^18 block boundary, and parallel independence for sample_par / sample_map) is validated separately in tests/sampler_v3_rng.rs. Parallel sampling is statistically independent but not bit-reproducible across runs (worker scheduling decides which derived seed lands on which path); use a serial sample() loop when you need bit-exact replay. See ProcessExt → parallel sampling and determinism.
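Why parallel runs are independent but not bit-reproducible can be illustrated with plain threads. The following is a toy model only: it assumes derived seeds are drawn from a shared counter at the moment a worker starts a path, which may not match how sample_par is implemented, and `parallel_seed_assignment` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Returns, for each path index, the derived seed that ended up on it.
/// `n_workers` must be at least 1.
pub fn parallel_seed_assignment(root: u64, n_paths: usize, n_workers: usize) -> Vec<u64> {
    let counter = Arc::new(AtomicU64::new(root));
    let handles: Vec<_> = (0..n_workers)
        .map(|w| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                // Worker `w` handles paths w, w + n_workers, w + 2 * n_workers, ...
                // The seed it gets depends on how far the other workers have advanced.
                (w..n_paths)
                    .step_by(n_workers)
                    .map(|i| (i, counter.fetch_add(1, Ordering::Relaxed)))
                    .collect::<Vec<(usize, u64)>>()
            })
        })
        .collect();

    let mut seeds = vec![0u64; n_paths];
    for handle in handles {
        for (i, seed) in handle.join().unwrap() {
            seeds[i] = seed;
        }
    }
    seeds
}
```

The set of seeds is the same on every run (root, root + 1, …), so no two paths share a seed, but which path index receives which seed depends on thread timing. A serial loop fixes that order, which is why it gives bit-exact replay.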

SimdRngExt — generic backing RNG

SimdRngExt is the trait that lets a distribution be generic over its backing SIMD RNG. The default is the single-stream SimdRng; the experimental dual-stream variant, SimdRngDual, is available through the dual-stream-rng cargo feature.

pub trait SimdRngExt: Sized + Send + 'static {
    const HAS_PAIR_ILP: bool = false;

    fn new() -> Self;
    fn from_seed(seed: u64) -> Self;
    fn next_i32x8(&mut self) -> i32x8;
    fn next_i32x8_pair(&mut self) -> (i32x8, i32x8) { /* default = two single calls */ }
    fn next_i32(&mut self) -> i32;
    fn next_f64(&mut self) -> f64;
    fn next_f32(&mut self) -> f32;
    fn fill_uniform_f64(&mut self, out: &mut [f64]);
    fn fill_uniform_f32(&mut self, out: &mut [f32]);
}

SimdNormal<T, N, R> and SimdExp<T, R> carry the R: SimdRngExt parameter explicitly so the same struct serves both backends:

// default (single-stream) — no opt-in needed
let n = SimdNormal::<f64>::new(0.0, 1.0, &Deterministic::new(42));

// dual-stream via the type alias (feature `dual-stream-rng`)
let n_dual = SimdNormalDual::<f64>::new(0.0, 1.0, &Deterministic::new(42));
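The pattern of one struct, a defaulted backend parameter, and a type alias for the alternative backend can be shown in isolation. Everything in this sketch is hypothetical (`ToyRng`, `Single`, `Dual`, `UniformFill`, `UniformFillDual`); it uses scalar SplitMix64 rather than SIMD Xoshiro and only mirrors the shape of the API shown above.

```rust
/// Minimal stand-in for a backing-RNG trait (hypothetical).
pub trait ToyRng {
    fn from_seed(seed: u64) -> Self;
    fn next_u64(&mut self) -> u64;
}

/// One SplitMix64 step: advance the state, then mix it.
fn splitmix_next(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Single-stream backend.
pub struct Single {
    state: u64,
}

impl ToyRng for Single {
    fn from_seed(seed: u64) -> Self {
        Single { state: seed }
    }
    fn next_u64(&mut self) -> u64 {
        splitmix_next(&mut self.state)
    }
}

/// Dual-stream backend: two independent states, used alternately.
pub struct Dual {
    a: u64,
    b: u64,
    use_b: bool,
}

impl ToyRng for Dual {
    fn from_seed(seed: u64) -> Self {
        // Toy choice: the second stream starts from the bitwise complement.
        Dual { a: seed, b: !seed, use_b: false }
    }
    fn next_u64(&mut self) -> u64 {
        let out = if self.use_b {
            splitmix_next(&mut self.b)
        } else {
            splitmix_next(&mut self.a)
        };
        self.use_b = !self.use_b;
        out
    }
}

/// Sampler that is generic over its backend, defaulting to `Single`.
pub struct UniformFill<R: ToyRng = Single> {
    rng: R,
}

impl<R: ToyRng> UniformFill<R> {
    pub fn from_seed(seed: u64) -> Self {
        UniformFill { rng: R::from_seed(seed) }
    }

    /// Fills `out` with uniforms in [0, 1) built from the top 53 bits.
    pub fn fill(&mut self, out: &mut [f64]) {
        for x in out.iter_mut() {
            *x = (self.rng.next_u64() >> 11) as f64 / (1u64 << 53) as f64;
        }
    }
}

/// Alias that selects the dual backend, in the spirit of SimdNormalDual.
pub type UniformFillDual = UniformFill<Dual>;
```

The sampling code is written once; only the type argument changes between backends. As in the trade-offs discussed for the real feature, the two toy backends do not produce the same sequence for the same seed.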

dual-stream-rng feature

Opt-in via cargo feature:

[dependencies]
stochastic-rs = { version = "2.2", features = ["dual-stream-rng"] }

Unlocks SimdRngDual, SimdNormalDual, SimdExpDual, SimdExpZigDual.

Why dual-stream — the idea

Xoshiro256++ (and its 32-bit sibling) is a non-cryptographic, fast, high-quality PRNG whose next-state function is a fixed recurrence s_{k+1} = f(s_k). Generating a SIMD batch is straightforward (four parallel 64-bit lanes from one ymm state), but generating two consecutive batches from the same engine is a hard serial dependency: the second batch's state update can only start after the first one's result is committed. On a modern out-of-order CPU this dependency chain limits how many RNG ops you can issue per cycle, even though the execution units sit idle.

The Ziggurat sampler in SimdNormal / SimdExp makes the bottleneck worse: every 8-lane batch needs 16 scalar table lookups (kn[iz], wn[iz]) before the SIMD multiply / FMA / store can fire. Those loads are also serial against the RNG output that produced iz. So one batch's pipeline looks like:

  xoshiro step ──► iz = hz & 127 ──► 16 scalar loads ──► SIMD math ──► store

       └─ next xoshiro step can NOT begin until this state is written

The dual-stream engine sidesteps this by carrying two independent state pairs (engine_a, engine_b), each with its own xoshiro recurrence. The Ziggurat fast-path then issues one batch from a and another from b per iteration, and the OoO core is free to interleave them:

  engine_a step ──► iz_a ──► 16 loads for a ──► SIMD math a ──► store a
  engine_b step ──► iz_b ──► 16 loads for b ──► SIMD math b ──► store b
       ↑                                                       ↑
       └───── these two streams have no data dependency ───────┘
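A scalar sketch of the interleaving idea follows. It is not the crate's SIMD engine: `Xoshiro256pp` here is a plain 64-bit xoshiro256++ seeded through SplitMix64, and `fill_dual` is a hypothetical helper that alternates two such engines so that neither step reads the other's state.

```rust
/// Scalar xoshiro256++ (the crate's engines are SIMD; this is a toy).
pub struct Xoshiro256pp {
    s: [u64; 4],
}

impl Xoshiro256pp {
    /// Expands a 64-bit seed into 256 bits of state with SplitMix64.
    pub fn from_seed(seed: u64) -> Self {
        let mut sm = seed;
        let mut s = [0u64; 4];
        for word in s.iter_mut() {
            sm = sm.wrapping_add(0x9E37_79B9_7F4A_7C15);
            let mut z = sm;
            z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
            z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
            *word = z ^ (z >> 31);
        }
        Xoshiro256pp { s }
    }

    /// One step of the recurrence: the next call depends on this call's state update.
    pub fn next_u64(&mut self) -> u64 {
        let s = &mut self.s;
        let result = s[0].wrapping_add(s[3]).rotate_left(23).wrapping_add(s[0]);
        let t = s[1] << 17;
        s[2] ^= s[0];
        s[3] ^= s[1];
        s[1] ^= s[2];
        s[0] ^= s[3];
        s[2] ^= t;
        s[3] = s[3].rotate_left(45);
        result
    }
}

/// Fills `out` by alternating two independent engines.
/// Even indices come from `a`, odd indices from `b`.
pub fn fill_dual(a: &mut Xoshiro256pp, b: &mut Xoshiro256pp, out: &mut [u64]) {
    let mut pairs = out.chunks_exact_mut(2);
    for pair in &mut pairs {
        // The two steps share no data, so an out-of-order core may overlap them.
        pair[0] = a.next_u64();
        pair[1] = b.next_u64();
    }
    // Odd length: the final element continues stream `a`.
    if let [last] = pairs.into_remainder() {
        *last = a.next_u64();
    }
}
```

Whether the overlap translates into a measurable speedup depends on the hardware and on how much work hangs off each draw; the sketch only shows where the dependency chain is broken.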

Results on Apple Silicon (M-series), measured with Criterion via cargo bench --bench dual_stream_compare --features dual-stream-rng on 2026-06-11:

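| n | single (SimdNormal) | dual (SimdNormalDual) | Δ |
| --- | --- | --- | --- |
| 64 | 120.6 ns | 117.5 ns | −2.6% |
| 256 | 494.2 ns | 467.9 ns | −5.3% |
| 4 096 | 7.87 µs | 7.42 µs | −5.7% |
| 65 536 | 126.2 µs | 122.4 µs | −3.0% |
| 1 048 576 | 2.02 ms | 1.97 ms | −2.2% |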

The gain is modest on 128-bit NEON because the scalar table-gather chain dominates the batch cost; wider-SIMD hardware is expected to benefit more. Exponential fills measure at parity. Uniform fills are not engine-bound (the per-chunk body is just a direct SIMD store, no table lookups, no rejection sampling), so they show no speedup beyond noise — the OoO core already saturates its store ports without help.
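The Δ figures for the Normal benchmark are relative changes in time per fill, dual against single. A small helper (hypothetical name `relative_change_pct`) makes the arithmetic explicit:

```rust
/// Relative change of `dual` against `single`, in percent.
/// Negative values mean the dual-stream run was faster.
pub fn relative_change_pct(single: f64, dual: f64) -> f64 {
    (dual - single) / single * 100.0
}
```

For n = 64 this gives (117.5 − 120.6) / 120.6 ≈ −2.6%. Recomputing from the rounded timings for n = 1 048 576 gives about −2.5% rather than the reported −2.2%, which is presumably a rounding effect of the displayed values rather than a different formula.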

Trade-offs

  • Deterministic streams diverge. SimdRngDual::from_seed(s) does not reproduce SimdRng::from_seed(s)'s bit-exact output — it seeds two independent engines through SplitMix64 with different offsets, so the lane interleaving is fundamentally different. Statistical properties are identical (KS-test validated for Normal / Exp / LogNormal). A pinned-seed regression test that asserts specific numbers will fail after switching backends.
  • Doubled RNG state. Two Xoshiro256++ engines (256 bits each) + two Xoshiro128++ engines (128 bits each) = roughly 768 bits of state per SimdRngDual, vs ~384 bits for SimdRng. Negligible unless you allocate millions of instances.
  • Architecture-dependent gain. Measured on Apple Silicon. On CPUs with smaller reorder buffers (older x86, embedded ARM) the win is likely smaller or zero.
  • Non-Ziggurat distributions are unaffected. The Cauchy / Pareto / Beta / Gamma / NIG paths bottleneck on the wide SIMD transcendentals (ln, exp, cos, sin, powf), not on the RNG — switching backends does not help them.

The feature is default-off. Production code continues to use the single-stream SimdRng; opt in only if your workload is Ziggurat-dominated (Normal, Exp, LogNormal) and you can tolerate the deterministic-stream change.

End-to-end example

use stochastic_rs::prelude::*;
use stochastic_rs::simd_rng::Deterministic;
use stochastic_rs::simd_rng::Unseeded;
use stochastic_rs::stochastic::process::fbm::Fbm;

// 1. Auto-seeded production sampling
let fbm = Fbm::<f64, _>::new(0.7, 1_000, Some(1.0), Unseeded);
let path = fbm.sample();

// 2. Reproducible calibration sweep: same instance, different seeds
let fbm = Fbm::<f64, _>::new(0.7, 1_000, Some(1.0), Deterministic::new(0));
let paths: Vec<_> = (1..=100u64)
    .map(|s| {
        fbm.seed.reseed(s);
        fbm.sample()
    })
    .collect();

// 3. Reproducible replay — Deterministic with the same seed always
// produces the same path, even on different process instances:
let a = Fbm::<f64, _>::new(0.7, 1_000, Some(1.0), Deterministic::new(42)).sample();
let b = Fbm::<f64, _>::new(0.7, 1_000, Some(1.0), Deterministic::new(42)).sample();
assert_eq!(a.as_slice(), b.as_slice());

See also: ProcessExt for the trait itself, and the smoke test for an executable specification.
