stochastic-rs
Getting started

Installation (Rust)

Add stochastic-rs to your Rust project — umbrella crate or per-sub-crate, with the right Cargo features and CPU / SIMD / GPU options.

Umbrella crate (everything)

[dependencies]
stochastic-rs = "2.0.0"

Then:

use stochastic_rs::prelude::*;
use stochastic_rs::stochastic::diffusion::gbm::Gbm;
use stochastic_rs::quant::pricing::heston::HestonPricer;

The umbrella re-exports everything via pub use from the sub-crates, so existing v1.x import paths keep working.

Per-sub-crate (lean)

For minimal compile time and dependency surface, depend only on the sub-crates you need:

[dependencies]
stochastic-rs-distributions = "2.0.0"   # SIMD distribution sampling
stochastic-rs-stochastic    = "2.0.0"   # 120+ process types
stochastic-rs-copulas       = "2.0.0"   # bivariate / multivariate copulas
stochastic-rs-stats         = "2.0.0"   # estimators
stochastic-rs-quant         = "2.0.0"   # pricing / calibration / vol surface
stochastic-rs-ai            = "2.0.0"   # neural surrogates (candle)
stochastic-rs-viz           = "2.0.0"   # plotly grid plotter

Topology:

stochastic-rs-core (simd_rng)
 └→ stochastic-rs-distributions (FloatExt, SimdFloatExt, distributions)
     ├→ stochastic-rs-stochastic (ProcessExt + 120+ processes)
     ├→ stochastic-rs-copulas
     └→ stochastic-rs-stats
         └→ stochastic-rs-quant (PricerExt, calibration, vol surface)
             ├→ stochastic-rs-ai
             └→ stochastic-rs-viz
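
With the lean setup you import straight from the sub-crates rather than the umbrella. A minimal sketch, assuming the sub-crate module paths mirror the umbrella re-exports (the exact layout is an assumption; check each sub-crate's docs):

// Assumed mapping, not verified: the umbrella path
// stochastic_rs::stochastic::diffusion::gbm::Gbm is taken to correspond to
// stochastic_rs_stochastic::diffusion::gbm::Gbm in the lean setup.
use stochastic_rs_stochastic::diffusion::gbm::Gbm;
use stochastic_rs_quant::pricing::heston::HestonPricer;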

Cargo features

Feature | Owner crate | Pulls in | Use when
ai | umbrella | stochastic-rs-ai, candle-core | NN volatility surrogates
viz | umbrella | stochastic-rs-viz, plotly | Quick HTML plots
openblas | stats, quant, copulas, stochastic | ndarray-linalg/openblas-system | MLE, multivariate copulas, Cholesky-heavy estimators
openblas-static | same as openblas | ndarray-linalg/openblas-static | Vendored OpenBLAS — needed for the Windows wheel CI
cuda-native | stochastic | cudarc, cuFFT, fused Philox | Direct CUDA backend for FGN / fBM (NVIDIA, CUDA 12.x)
gpu | stochastic | cubecl, gpu-fft | Portable GPU kernel framework (CPU + GPU runtime)
gpu-cuda | stochastic | cubecl-cuda | cubecl over CUDA (NVIDIA)
gpu-wgpu | stochastic | cubecl-wgpu | cubecl over WebGPU (NVIDIA / AMD / Apple via wgpu)
metal | stochastic | metal (Apple framework) | Direct Metal backend for FGN / fBM on macOS
accelerate | stochastic | Apple Accelerate (vDSP) | macOS-native FFT acceleration (no toolchain install)
mimalloc / jemalloc | umbrella | mimalloc / tikv-jemallocator | Drop-in allocator for long-running MC workloads
python | umbrella + stochastic-rs-py | pyo3, numpy | Building the Python wheel via maturin

The default build (cargo build) is feature-light and links no GPU, no BLAS, and no Python. Pick features explicitly for the workload at hand.
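
For example, umbrella-level features from the table are enabled the usual Cargo way (swap in whichever features your workload needs):

[dependencies]
stochastic-rs = { version = "2.0.0", features = ["viz", "mimalloc"] }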

SIMD support

Numerical hot paths (FGN Davies-Harte, all Simd* distributions, fill_slice / fill_slice_fast) use the wide crate for portable SIMD. The SIMD types used throughout the codebase are uniformly 8-lane:

  • f32x8 — 8 × f32 = 256 bits (AVX2 / NEON-pair)
  • f64x8 — 8 × f64 = 512 bits (AVX-512, or 2 × AVX2 / 2 × NEON fallback)
  • i32x8 — for the integer Box-Muller / ziggurat tables

wide selects the actual SIMD instructions at build time based on the active target features. The default x86-64 toolchain targets only the x86-64-v1 baseline (SSE2), which means f32x8 / f64x8 compile to scalar loops. To unlock real SIMD, opt into a higher CPU baseline (next subsection).

Target arch | Default ISA | What wide emits without extra flags
x86_64-… (Linux, MSVC) | SSE2 (v1) | Scalar fallback (no AVX)
x86_64-… with +avx2 | AVX2 | Full 256-bit SIMD on f32x8
x86_64-… with +avx512f | AVX-512 | Full 512-bit SIMD on f64x8
aarch64-apple-darwin | NEON | 128-bit NEON, two-pump for 256-bit ops
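
To see which target features a given baseline actually enables (and therefore what wide has to work with), ask rustc directly; this is a plain rustc query, not something the crate provides:

# Target features at the default baseline vs. x86-64-v3
rustc --print cfg | grep target_feature
rustc --print cfg -C target-cpu=x86-64-v3 | grep target_feature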

Native CPU optimization

Default builds target the plain x86-64 / aarch64 baseline so the resulting binary or wheel runs on any CPU of the same architecture. For SIMD-heavy paths (SimdNormal::fill_slice_fast, Fgn Davies-Harte, sample_par, …) the gap between v1 and a tuned target is large enough to be worth raising the floor:

# Local dev / benchmarks: every feature the build host supports.
# The resulting binary only runs on this exact CPU family.
RUSTFLAGS="-C target-cpu=native" cargo build --release
RUSTFLAGS="-C target-cpu=native" cargo bench

# Higher x86-64 baselines (binary runs on any CPU meeting the level):
#   v2 — SSE4.2 + POPCNT     (x86_64 CPUs since ~2009)
#   v3 — AVX2 + BMI2 + FMA   (x86_64 CPUs since ~2013–2015)
#   v4 — AVX-512             (most server CPUs only; absent from all
#                             AMD Zen 1–3 and Intel client 12th-gen+)
RUSTFLAGS="-C target-cpu=x86-64-v3" cargo build --release

Public distribution (PyPI wheels, openly shared Docker images): keep the default x86-64 baseline. pip wheel tags don't dispatch by CPU feature level, so a v3 wheel will SIGILL on any pre-2013 hardware (AMD Bulldozer/Piledriver, Sandy/Ivy Bridge, Atom variants). Use v2/v3/v4 only for deployments where you've verified every target host clears the level — typical examples are an internal Docker fleet, a homogeneous HPC cluster, or a CI runner pinned to a known SKU. Use target-cpu=native only for local dev and benchmarks.
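
Before rolling a v2/v3/v4 binary out to a fleet, each host can report at runtime whether it clears the level. A small std-only check, independent of stochastic-rs:

fn main() {
    // Runtime CPU feature detection from the standard library; each line
    // reports whether this host supports the corresponding baseline component.
    #[cfg(target_arch = "x86_64")]
    {
        println!("sse4.2 (v2):  {}", std::is_x86_feature_detected!("sse4.2"));
        println!("avx2 (v3):    {}", std::is_x86_feature_detected!("avx2"));
        println!("fma (v3):     {}", std::is_x86_feature_detected!("fma"));
        println!("avx512f (v4): {}", std::is_x86_feature_detected!("avx512f"));
    }
}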

RUSTFLAGS busts the build cache. Every distinct value triggers a full workspace rebuild, and the env var fully replaces (does not merge with) [build] rustflags = […] in any .cargo/config.toml. For persistent local optimisation, prefer a [target.<triple>] rustflags entry in ~/.cargo/config.toml so it composes with project configs.
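
A sketch of that persistent setup, using standard Cargo config syntax (adjust the target triple and CPU level to your machines):

# ~/.cargo/config.toml
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "target-cpu=x86-64-v3"]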

GPU support

FGN and fBM ship four independent GPU / accelerator backends — pick the one that matches your hardware and toolchain.

cuda-native — direct binding via cudarc + cuFFT

The FFT runs through cuFFT, paired with a fused Philox RNG kernel. No .cu files, no nvcc required — the kernels ship as Rust strings and JIT through cudarc.

Requires NVIDIA CUDA Toolkit 12.x and a compatible GPU.

cargo build --features cuda-native
cargo bench --features cuda-native --bench fgn_cuda_native

use stochastic_rs::stochastic::noise::fgn::Fgn;

let fgn = Fgn::<f32>::new(/* hurst */ 0.7, 65536, None);
let path  = fgn.sample_cuda_native(1)?;     // single path on GPU
let batch = fgn.sample_cuda_native(1024)?;  // 1024 paths in one launch

gpu / gpu-cuda / gpu-wgpu — cubecl portable kernels

cubecl is a CPU/GPU portable kernel framework. Useful when you want the same kernel to run on CUDA, WebGPU, and a CPU debug runtime.

# CUDA backend (NVIDIA)
cargo build --features gpu-cuda

# WebGPU backend (NVIDIA / AMD / Apple via wgpu — also runs in browsers)
cargo build --features gpu-wgpu

let path = fgn.sample_gpu(1)?;   // routes through whichever cubecl backend is active

metal — direct Metal (macOS)

Direct binding via the metal crate. Targets Apple Silicon (M1/M2/M3/M4) and Intel Macs with discrete / integrated GPUs.

cargo build --features metal
cargo bench --features metal --bench fgn_metal

let path = fgn.sample_metal(1)?;

accelerate — Apple Accelerate (macOS, no GPU)

Routes the FFT through Apple's vDSP (part of the Accelerate framework shipped with macOS — no extra install). Lower latency than the GPU paths for medium-n workloads where launch overhead dominates.

cargo build --features accelerate
cargo bench --features accelerate --bench fgn_accelerate

let path = fgn.sample_accelerate(1)?;

Choosing a backend

Backend | Best for | Latency floor | Throughput ceiling
CPU SIMD | Small n (≤ 4 k), single path | ≈ 8 µs | rayon × cores
accelerate | Medium n (4 k–16 k), single path on macOS | ≈ 30 µs | one core
metal | Large n + batches on macOS | ≈ 80 µs | full GPU
cuda-native / gpu-cuda | Large n (≥ 16 k) + batches | ≈ 80 µs | full GPU
gpu-wgpu | Cross-platform, browser / WASM targets | ≈ 100 µs | full GPU

Concrete numbers and the cross-over point are on the Benchmarks page.
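
As a rough illustration of the cross-over in code, the sketch below picks a backend by feature flag. The constructor call mirrors the cuda-native example above; the assumption that Fgn also exposes the plain sample() from ProcessExt for the CPU path is unverified:

use stochastic_rs::stochastic::noise::fgn::Fgn;

// Hedged sketch: large n (65536 points) and a 1024-path batch sit in GPU
// territory per the table; a single small path would stay on CPU SIMD.
let fgn = Fgn::<f32>::new(0.7, 65536, None);

#[cfg(feature = "cuda-native")]
let batch = fgn.sample_cuda_native(1024)?;   // batched launch on the GPU

#[cfg(not(feature = "cuda-native"))]
let path = fgn.sample();                     // CPU SIMD fallback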

OpenBLAS (required for the openblas feature)

The openblas feature pulls in ndarray-linalg for linear algebra (MLE, multivariate copulas, factor models, cointegration, HMM). It needs a system OpenBLAS with LAPACK.

Linux (Debian / Ubuntu)

sudo apt install libopenblas-dev

Linux (Fedora / RHEL)

sudo dnf install openblas-devel

macOS

brew install openblas
export OPENBLAS_DIR=$(brew --prefix openblas)

Windows

The openblas-src crate does not currently support static linking on the MSVC target. For source builds use vcpkg with a prebuilt LAPACK binary; the Windows wheel CI job uses openblas-static with a vendored binary (the published Windows wheel omits the 15 BLAS-backed classes — see Python bindings for the exact list).

cargo build --features openblas

Verify the install

use stochastic_rs::prelude::*;
use stochastic_rs::stochastic::diffusion::ou::Ou;

fn main() {
    let p = Ou::<f64>::new(2.0, 0.0, 1.0, 1_000, Some(0.0), Some(1.0));
    let path = p.sample();
    println!("OU path of length {}", path.len());
}

Run with:

cargo run --release

If this prints OU path of length 1000, you are good. Continue with the Quickstart.
