API Reference

SMLMBoxer.SMLMBoxer (Module)
SMLMBoxer

High-performance particle/blob detection in SMLM image stacks using difference-of-Gaussians filtering with GPU acceleration and sCMOS variance-weighted filtering support.

API Overview

For a comprehensive overview of the API, use help mode:

?SMLMBoxer.api

Or access the complete API documentation programmatically:

docs = SMLMBoxer.api()

SMLMBoxer.BoxerConfig (Type)
BoxerConfig

Configuration for ROI detection via getboxes().

Use either the PSF-aware interface (psf_sigma + min_photons) or the advanced interface (sigma_small + sigma_large + minval). The PSF-aware interface is recommended.

Fields

PSF-Aware Interface (Recommended)

  • psf_sigma::Union{Float64,Nothing}: PSF sigma in microns (e.g., 0.13 for 130nm PSF). Requires camera for pixel conversion. When set, overrides sigma_small/sigma_large/minval.
  • min_photons::Float64: Minimum photons for detection (default: 500.0)

Advanced Interface (Direct Control)

  • sigma_small::Float64: Small Gaussian sigma in pixels (default: 1.0)
  • sigma_large::Float64: Large Gaussian sigma in pixels (default: 2.0)
  • minval::Float64: DoG intensity threshold (default: 0.0)

Box Parameters

  • boxsize::Int: ROI box size in pixels (default: 7)
  • overlap::Float64: Max overlap between detections in pixels (default: 2.0)

Backend Parameters

  • backend::Symbol: Compute backend :cpu, :gpu, or :auto (default: :auto)
  • auto_timeout::Float64: Max wait for GPU in :auto mode before CPU fallback (default: 300.0)
  • gpu_timeout::Float64: Max wait for GPU in :gpu mode (default: Inf)
  • on_wait::Union{Function,Nothing}: Optional callback (elapsed, available, required) -> nothing for GPU wait progress (default: nothing)

Examples

# PSF-aware (recommended)
config = BoxerConfig(psf_sigma=0.13, min_photons=500.0, boxsize=11)

# Advanced (direct control)
config = BoxerConfig(sigma_small=1.5, sigma_large=3.0, minval=10.0)

# GPU-specific
config = BoxerConfig(psf_sigma=0.13, backend=:gpu, gpu_timeout=60.0)

SMLMBoxer.BoxesInfo (Type)
BoxesInfo

Metadata returned alongside ROIBatch from getboxes().

Fields

  • backend::Symbol: Compute backend used (:gpu or :cpu)
  • elapsed_s::Float64: Wall time in seconds
  • device_id::Int: GPU device ID (0-based), or -1 for CPU
  • n_rois::Int: Number of ROIs detected
  • batch_size::Int: Frames per batch during processing
  • n_batches::Int: Number of batches processed
  • memory_per_batch::Int: Estimated memory per batch in bytes

SMLMBoxer.GetBoxesArgs (Type)
GetBoxesArgs

Internal structure for getboxes parameters. Users should call getboxes() with keyword arguments rather than constructing this directly.

Primary Interface (Recommended)

  • psf_sigma::Real: PSF sigma in microns (physical units, e.g., 0.13 for 130nm PSF). Requires camera to be provided for pixel size conversion.
  • min_photons::Real: Minimum total photons for detection (default: 500.0)

When psf_sigma is provided:

  • Converted to pixels using camera pixel size
  • sigma_small = 1.0 × psf_sigma_pixels (automatically calculated)
  • sigma_large = 2.0 × psf_sigma_pixels (automatically calculated)
  • minval = photons_to_dog_threshold(min_photons, psf_sigma_pixels) (automatically calculated)

Advanced Interface (Direct Control)

  • sigma_small::Real: Small Gaussian sigma in pixels (default: 1.0)
  • sigma_large::Real: Large Gaussian sigma in pixels (default: 2.0)
  • minval::Real: DoG intensity threshold (default: 0.0)

Other Parameters

  • imagestack: Input image stack
  • camera: Camera object (IdealCamera or SCMOSCamera)
  • boxsize::Int: ROI box size in pixels (default: 7)
  • overlap::Real: Maximum overlap between detections in pixels (default: 2.0)
  • backend::Symbol: Compute backend :cpu, :gpu, or :auto (default: :auto)
  • auto_timeout::Real: Max wait seconds for :auto mode before CPU fallback (default: 300.0)
  • gpu_timeout::Real: Max wait seconds for :gpu mode (default: Inf)
  • on_wait: Optional callback(elapsed, available, required) for wait progress

SMLMBoxer._gpu_maxima2coords (Method)
_gpu_maxima2coords(localmaximage::CuArray)

GPU-accelerated coordinate extraction using sparse compaction.

Instead of transferring the entire 4D array to CPU (e.g., 1 GB for 512x512x1x1000), uses GPU findall (prefix-sum compaction) to find nonzero indices on-device, then transfers only the sparse indices and values (~1.2 MB for ~100K maxima).

Arguments

  • localmaximage: 4D CuArray (nrows, ncols, 1, nframes) with nonzero values at maxima

Returns

  • coords: Vector{Matrix{Float32}} — same format as maxima2coords
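
The compaction idea can be illustrated on the CPU with Base Julia's findall. This is only a sketch of the principle, not the package's GPU code: the array sizes and values are made up, and the byte counts are those of the CPU containers used here.

```julia
# Toy local-maxima stack: (nrows, ncols, 1, nframes), nonzero only at maxima.
localmax = zeros(Float32, 64, 64, 1, 10)
localmax[10, 20, 1, 3] = 7.5f0
localmax[40, 5, 1, 8] = 3.0f0

# Sparse compaction: keep only the indices and values of nonzero entries.
idx = findall(!iszero, localmax)      # Vector{CartesianIndex{4}}
vals = localmax[idx]                  # matching Float32 values

dense_bytes = sizeof(localmax)                # what a full transfer would move
sparse_bytes = sizeof(idx) + sizeof(vals)     # what the compacted form moves
println("dense: $dense_bytes bytes, sparse: $sparse_bytes bytes")
```

The sparse representation grows with the number of maxima rather than with the number of pixels, which is why it pays off for large stacks with few detections per frame.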

SMLMBoxer._process_with_batching (Method)
_process_with_batching(imagestack, args, kernelsize, max_free_mem; use_gpu, batch_cleanup=nothing)

Process imagestack with memory-aware batching. Handles both single-batch (fits in memory) and multi-batch (too large) cases.

Returns (coords, batch_size, n_batches, memory_per_batch).

Arguments

  • imagestack: 4D image stack (ny, nx, 1, nframes)
  • args: GetBoxesArgs with filter parameters
  • kernelsize: Kernel size for local max detection
  • max_free_mem: Available memory in bytes
  • use_gpu: Whether to use GPU for processing
  • batch_cleanup: Optional function called after each batch (e.g., for GC)

SMLMBoxer.api (Method)

SMLMBoxer.jl API Reference

Particle/blob detection in SMLM image stacks using difference-of-Gaussians filtering with GPU acceleration and sCMOS variance-weighted filtering support.

Exports

Total exports: 6

  • getboxes - Main detection function
  • BoxerConfig - Configuration struct for detection parameters
  • BoxesInfo - Metadata struct returned alongside ROIBatch
  • recommend_batch_size - Memory-aware batch sizing utility
  • ROIBatch - Re-exported from SMLMData.jl
  • SingleROI - Re-exported from SMLMData.jl

Note: SMLMBoxer.api() is available but not exported to avoid conflicts with other JuliaSMLM packages.

Key Concepts

Difference of Gaussians (DoG) Filtering

Detects blob-like features by subtracting two Gaussian-blurred versions of the image:

  • sigma_small: Matches PSF size for optimal blob detection
  • sigma_large: Suppresses background (typically 2× sigma_small)
  • minval: Intensity threshold after filtering
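
As a rough illustration of the DoG idea, the sketch below builds the kernel from two normalized Gaussians in plain Julia. The function names and the ±3σ kernel extent are assumptions made for this example; the package's own gaussian_2d and dog_kernel may size and normalize the kernel differently.

```julia
# Hypothetical helper: normalized 2D Gaussian kernel on a ksize × ksize grid.
function gaussian_kernel_sketch(sigma::Real, ksize::Integer)
    half = (ksize - 1) / 2
    k = [exp(-((i - 1 - half)^2 + (j - 1 - half)^2) / (2 * sigma^2))
         for i in 1:ksize, j in 1:ksize]
    return k ./ sum(k)   # each Gaussian sums to 1
end

# Hypothetical helper: difference of two Gaussians on a shared grid.
function dog_kernel_sketch(sigma_small::Real, sigma_large::Real)
    ksize = 2 * ceil(Int, 3 * sigma_large) + 1   # assumed extent: ±3σ of the wider Gaussian
    return gaussian_kernel_sketch(sigma_small, ksize) .-
           gaussian_kernel_sketch(sigma_large, ksize)
end

dog = dog_kernel_sketch(1.0, 2.0)
println(size(dog))
```

Because both Gaussians are normalized, the kernel sums to approximately zero, so a flat background produces no response while a blob near the sigma_small scale produces a positive peak.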

PSF-Aware Detection (Recommended)

Specify physical PSF parameters (in microns), automatically converted to optimal filter settings:

  • psf_sigma: PSF sigma in microns (e.g., 0.13 for 130nm PSF)
  • min_photons: Total photon threshold (automatically converted to DoG intensity threshold)

Variance-Weighted Filtering (sCMOS)

When SCMOSCamera is provided, implements SMITE-style inverse variance weighting:

  • Each pixel weighted by gaussian_kernel / variance where variance = readnoise²
  • Low-noise pixels: high weight (strong detection influence)
  • High-noise pixels: low weight (reduced false positives)
  • GPU-accelerated via KernelAbstractions.jl (device-agnostic kernels)

GPU Acceleration and Scheduling

  • Standard DoG: NNlib with cuDNN backend (10-100x speedup)
  • Variance-weighted: KernelAbstractions custom kernels (same code for CPU/GPU)
  • Multi-GPU support: NVML-based polling selects GPU with most free memory across all devices

Unified GPU retry loop handles multi-process contention:

  1. Poll all GPUs via NVML (no CUDA context creation) for sufficient free memory and low utilization
  2. Acquire GPU context and run processing
  3. On any failure (no memory, TOCTOU context race, runtime OOM): release memory via GC.gc() + CUDA.reclaim(), re-poll NVML with remaining timeout
  4. On timeout: :auto falls back to CPU, :gpu errors
  5. On success: reclaim GPU memory pool so finished jobs don't block other processes
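
The control flow of such a retry loop might look like the following sketch. It is not the package's implementation: `attempt` is a hypothetical stand-in for "poll NVML, acquire a GPU, and run processing", returning `nothing` on any failure, and no CUDA calls are made.

```julia
# Hypothetical retry loop with a timeout budget and jittered backoff.
function run_with_retry(attempt::Function, cpu_fallback::Function;
                        backend::Symbol=:auto, timeout::Real=300.0,
                        poll_interval::Real=0.01)
    backend === :cpu && return (cpu_fallback(), :cpu)
    start = time()
    while true
        result = attempt()
        result !== nothing && return (result, :gpu)
        # Real code would release GPU memory here before polling again.
        elapsed = time() - start
        if elapsed >= timeout
            backend === :auto && return (cpu_fallback(), :cpu)   # :auto falls back
            error("GPU unavailable after $(round(elapsed; digits=2)) s")  # :gpu errors
        end
        sleep(poll_interval * (1 + rand()))   # jitter avoids synchronized polling
    end
end

# Usage: a fake GPU path that succeeds on the third attempt.
attempts = Ref(0)
fake_gpu() = (attempts[] += 1; attempts[] >= 3 ? "gpu result" : nothing)
result, used = run_with_retry(fake_gpu, () -> "cpu result"; backend=:auto, timeout=5.0)
println(used)
```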

Backend modes:

  • :cpu - Always CPU, no GPU involvement
  • :gpu - Require GPU, retry until gpu_timeout (default: Inf), error if unavailable
  • :auto - Try GPU, retry until auto_timeout (default: 300s), fall back to CPU

Wait progress callback:

config = BoxerConfig(
    psf_sigma=0.13,
    backend=:auto,
    on_wait=(elapsed, available, required) -> @info "Waiting for GPU" elapsed available required
)

Configuration

BoxerConfig

Configuration struct for ROI detection parameters. Supports @kwdef construction with defaults.

@kwdef struct BoxerConfig
    # PSF-aware interface (recommended)
    psf_sigma::Union{Float64,Nothing} = nothing  # PSF sigma in microns
    min_photons::Float64 = 500.0                 # Minimum photons for detection

    # Advanced interface (direct control)
    sigma_small::Float64 = 1.0    # Small Gaussian sigma in pixels
    sigma_large::Float64 = 2.0    # Large Gaussian sigma in pixels
    minval::Float64 = 0.0         # DoG intensity threshold

    # Box parameters
    boxsize::Int = 7              # ROI size in pixels
    overlap::Float64 = 2.0        # Max overlap between detections

    # Backend parameters
    backend::Symbol = :auto       # :cpu, :gpu, or :auto
    auto_timeout::Float64 = 300.0  # Max wait for GPU in :auto mode
    gpu_timeout::Float64 = Inf    # Max wait in :gpu mode
    on_wait::Union{Function,Nothing} = nothing  # Optional wait progress callback
end

Usage:

# PSF-aware (recommended)
config = BoxerConfig(psf_sigma=0.13, min_photons=500.0, boxsize=11)

# Advanced (direct control)
config = BoxerConfig(sigma_small=1.5, sigma_large=3.0, minval=10.0)

# GPU-specific
config = BoxerConfig(psf_sigma=0.13, backend=:gpu, gpu_timeout=60.0)

Core Function

getboxes - Two Calling Conventions

Config-based (recommended for reusable settings):

getboxes(imagestack, camera, config::BoxerConfig) -> (ROIBatch, BoxesInfo)

Kwargs-based (convenient for one-off calls):

getboxes(imagestack, camera=nothing; kwargs...) -> (ROIBatch, BoxesInfo)

Both conventions are equivalent - kwargs are forwarded to a BoxerConfig internally.

Main detection function. Applies DoG filtering, finds local maxima, extracts ROI patches.

Arguments:

  • imagestack::AbstractArray{<:Real} - Input image stack (2D or 3D)
  • camera::Union{AbstractCamera,Nothing} - Camera object (IdealCamera or SCMOSCamera)
  • config::BoxerConfig - Configuration struct (config-based convention)

Kwargs (kwargs-based convention):

PSF-Aware Interface (Recommended):

  • psf_sigma::Real - PSF sigma in microns (requires camera for pixel size conversion)
  • min_photons::Real - Minimum total photons for detection (default: 500.0)

Advanced Interface (Direct Control):

  • sigma_small::Real - Small Gaussian sigma in pixels (default: 1.0)
  • sigma_large::Real - Large Gaussian sigma in pixels (default: 2.0)
  • minval::Real - DoG intensity threshold (default: 0.0)

Other Parameters:

  • boxsize::Int - ROI size in pixels (default: 7)
  • overlap::Real - Maximum overlap between detections in pixels (default: 2.0)
  • backend::Symbol - Compute backend: :cpu, :gpu, or :auto (default: :auto)
    • :cpu - Always use CPU
    • :gpu - Require GPU, wait for memory if needed (waits forever by default)
    • :auto - Try GPU with timeout, fall back to CPU if memory unavailable
  • auto_timeout::Real - Max seconds to wait for GPU in :auto mode (default: 300.0)
  • gpu_timeout::Real - Max seconds to wait in :gpu mode (default: Inf)
  • on_wait::Function - Optional callback (elapsed, available, required) -> nothing for wait progress

Returns: Tuple of (ROIBatch, BoxesInfo)

ROIBatch with fields:

  • data - ROI stack (boxsize × boxsize × n_rois)
  • x_corners - Vector of x (column) corner positions
  • y_corners - Vector of y (row) corner positions
  • frame_indices - Vector of frame indices for each ROI
  • camera - Camera object (provided or default IdealCamera)
  • roi_size - Size of each ROI (square)

BoxesInfo with fields:

  • backend - Compute backend used (:gpu or :cpu)
  • elapsed_s - Wall time in seconds
  • device_id - GPU device ID (0-based), or -1 for CPU
  • n_rois - Number of ROIs detected
  • batch_size - Frames per batch during processing
  • n_batches - Number of batches processed
  • memory_per_batch - Estimated memory per batch in bytes

recommend_batch_size(height, width; backend=:auto, memory_fraction=0.8) -> Int

Returns the recommended maximum number of frames to load at once given memory constraints.

Use this when processing very large datasets to determine optimal chunk size before calling getboxes().

Arguments:

  • height::Int - Image height in pixels
  • width::Int - Image width in pixels
  • backend::Symbol - Compute backend: :cpu, :gpu, or :auto (default: :auto)
  • memory_fraction::Real - Fraction of free memory to use (default: 0.8)

Returns: Maximum recommended number of frames to load at once

Memory Model: The processing pipeline requires approximately 6× the raw image size:

  • Input imagestack
  • Filtered stack (DoG output)
  • Local maxima detection intermediates
  • Coordinate arrays
  • Box extraction workspace
  • Broadcast temporaries
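
The arithmetic behind this model can be sketched as follows, assuming Float32 frames and the 6× multiplier stated above. The helper name and the explicit `free_bytes` argument are hypothetical; recommend_batch_size queries the available memory itself.

```julia
# Hypothetical helper: frames that fit in a given memory budget.
function batch_size_sketch(height::Integer, width::Integer, free_bytes::Integer;
                           memory_fraction::Real=0.8, multiplier::Integer=6)
    bytes_per_frame = height * width * sizeof(Float32) * multiplier
    return max(1, floor(Int, free_bytes * memory_fraction / bytes_per_frame))
end

n = batch_size_sketch(512, 512, 8 * 2^30)   # 8 GiB free, 512×512 frames
println("Frames per chunk: $n")
```

With 8 GiB free and the default memory_fraction of 0.8, each 512×512 frame costs about 6 MiB under this model, which gives 1092 frames per chunk.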

Example:

using SMLMBoxer

# Check how many 512×512 frames to load at once
max_frames = recommend_batch_size(512, 512)
println("Load up to $max_frames frames at a time")

# Load and process in chunks (total_frames, load_frames, and camera are user-supplied)
for chunk_start in 1:max_frames:total_frames
    chunk_end = min(chunk_start + max_frames - 1, total_frames)
    imagestack = load_frames(chunk_start:chunk_end)
    (roi_batch, info) = getboxes(imagestack, camera; psf_sigma=0.13)
    # ... process results
end

Re-Exported Types

ROIBatch (from SMLMData.jl)

Container for multiple ROIs from particle detection. Supports iteration and indexing.

# Iteration
for roi in roi_batch
    # roi is a SingleROI with .data, .corner, .frame_idx, .camera
    process(roi.data)
end

# Indexing
roi = roi_batch[1]  # Returns SingleROI

SingleROI (from SMLMData.jl)

Individual ROI with image data, position, frame index, and camera calibration.

Common Workflows

PSF-Aware Detection (Recommended)

using SMLMBoxer, SMLMData

# Create camera with physical pixel size (100nm pixels)
camera = IdealCamera(1:256, 1:256, 0.1f0)

# Config-based (recommended for reusable settings)
config = BoxerConfig(psf_sigma=0.13, min_photons=500.0, boxsize=11)
(roi_batch, info) = getboxes(imagestack, camera, config)

# OR kwargs-based (convenient for one-off calls)
(roi_batch, info) = getboxes(imagestack, camera;
    psf_sigma = 0.13,        # 130nm PSF in microns
    min_photons = 500.0,     # Minimum 500 photons
    boxsize = 11)

# Access results
n_detections = length(roi_batch.x_corners)
boxes = roi_batch.data              # 11×11×n ROI patches
positions_x = roi_batch.x_corners   # Column positions
positions_y = roi_batch.y_corners   # Row positions
frames = roi_batch.frame_indices

# Check processing info
println("Backend: ", info.backend)
println("Elapsed: ", info.elapsed_s * 1000, " ms")

sCMOS Variance-Weighted Detection

using SMLMBoxer, SMLMData

# Create sCMOS camera with readnoise map (Float32 for type consistency)
readnoise_map = Float32.(load_readnoise_calibration("camera_calib.mat"))
camera = SCMOSCamera(256, 256, 0.1f0, readnoise_map)

# Variance-weighted detection (automatically enabled)
(roi_batch, info) = getboxes(imagestack, camera;
    psf_sigma = 0.13,
    min_photons = 300.0,  # Lower threshold possible with noise weighting
    backend = :auto)      # GPU with CPU fallback

Advanced: Direct Parameter Control

# Expert mode: bypass PSF-aware interface

# Config-based
config = BoxerConfig(sigma_small=1.5, sigma_large=3.0, minval=10.0, boxsize=9, overlap=1.5)
(roi_batch, info) = getboxes(imagestack, nothing, config)

# OR kwargs-based
(roi_batch, info) = getboxes(imagestack;
    sigma_small = 1.5,   # Custom filter sigma (pixels)
    sigma_large = 3.0,
    minval = 10.0,       # Direct intensity threshold
    boxsize = 9,
    overlap = 1.5)

Processing Individual ROIs

(roi_batch, info) = getboxes(imagestack, camera; psf_sigma=0.13)

# Iterate over ROIs
for roi in roi_batch
    # SingleROI fields:
    # - roi.data: Image patch
    # - roi.corner: (x, y) corner position
    # - roi.frame_idx: Frame index
    # - roi.camera: Camera ROI calibration

    fit_gaussian(roi.data)
end

# Direct indexing
first_roi = roi_batch[1]

Internal Functions (Not Exported)

These functions are documented but not part of the public API. Use at your own risk as they may change.

Filtering:

  • dog_filter(imagestack, args) - Apply DoG filter (routes to standard or variance-weighted)
  • dog_filter_variance_weighted(imagestack, σ_small, σ_large, args) - Variance-weighted DoG
  • convolve(imagestack, kernel; use_gpu) - Standard convolution via NNlib
  • convolve_variance_weighted(imagestack, variance_map, σ, use_gpu) - Variance-weighted via KA
  • gaussian_2d(sigma, kernelsize) - Create 2D Gaussian kernel
  • dog_kernel(sigma_small, sigma_large) - Create DoG kernel

Local Maxima Detection:

  • findlocalmax(imagestack, kernelsize; minval, use_gpu) - Find local maximum coordinates
  • genlocalmaximage(imagestack, kernelsize; minval, use_gpu) - Generate local max image

Coordinate Processing:

  • maxima2coords(imagestack) - Convert non-zero pixels to coordinates
  • removeoverlap(coords, args) - Remove overlapping detections

Box Extraction:

  • getboxstack(imagestack, coords, args) - Extract ROI patches at coordinates
  • fillbox!(box, imagestack, row, col, im, boxsize) - Fill single ROI patch

Helper Functions:

  • get_pixel_size(camera) - Extract pixel size from camera
  • photons_to_dog_threshold(min_photons, psf_sigma) - Convert photon threshold to DoG threshold
  • pixels_to_microns(pixel_coords, camera) - Coordinate conversion
  • extract_camera_roi(camera, row_range, col_range) - Extract camera calibration for ROI
  • get_variance_map(camera, imagesize) - Compute variance map from camera
  • reshape_for_flux(arr) - Reshape array for NNlib convolution

Algorithm Details

Detection Pipeline

  1. Filtering: Apply DoG filter (standard or variance-weighted based on camera type)
  2. Local Maxima: Find peaks above threshold using max pooling
  3. Overlap Removal: Eliminate overlapping detections (keep higher intensity)
  4. Box Extraction: Cut out ROI patches around each detection
  5. ROIBatch Construction: Package results with camera calibration
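
Step 2 can be illustrated without NNlib: a pixel is a local maximum if it exceeds the threshold and equals the maximum of its neighborhood. The sketch below is a plain-Julia, single-frame stand-in for the max-pooling approach; the function name is hypothetical and the border handling is an assumption.

```julia
# Hypothetical single-frame local-maximum finder (CPU, no pooling library).
function local_maxima_sketch(img::AbstractMatrix, kernelsize::Integer; minval::Real=0.0)
    r = kernelsize ÷ 2
    nrows, ncols = size(img)
    coords = Tuple{Int,Int}[]
    for j in 1:ncols, i in 1:nrows
        v = img[i, j]
        v > minval || continue
        # Neighborhood clipped at the image border
        window = @view img[max(1, i - r):min(nrows, i + r), max(1, j - r):min(ncols, j + r)]
        v == maximum(window) && push!(coords, (i, j))
    end
    return coords
end

img = zeros(Float32, 9, 9)
img[3, 3] = 5f0
img[7, 6] = 8f0
img[7, 7] = 2f0   # sits next to a brighter peak, so it is suppressed
peaks = local_maxima_sketch(img, 3; minval=1.0)
println(peaks)
```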

PSF-Aware Parameter Conversion

When psf_sigma (microns) is provided:

# Convert to pixels
psf_sigma_pixels = psf_sigma / pixel_size

# Automatic filter sizing
sigma_small = 1.0 * psf_sigma_pixels  # Match PSF
sigma_large = 2.0 * psf_sigma_pixels  # Background suppression

# Photon threshold → DoG intensity threshold
σ_eff = √(psf_sigma_pixels^2 + sigma_small^2)  # Effective sigma after filtering (pixels)
peak_filtered = min_photons / (2π * σ_eff^2)
minval = 0.65 * peak_filtered  # DoG reduction factor
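
A runnable version of this conversion, as a minimal sketch: the function name `psf_aware_params` is hypothetical, all sigmas inside the threshold formula are taken in pixels, and camera gain is ignored (image assumed to be in photon units).

```julia
# Hypothetical wrapper around the conversion steps listed above.
function psf_aware_params(psf_sigma_um::Real, pixel_size_um::Real, min_photons::Real;
                          dog_factor::Real=0.65)
    psf_sigma_pixels = psf_sigma_um / pixel_size_um      # microns → pixels
    sigma_small = 1.0 * psf_sigma_pixels
    sigma_large = 2.0 * psf_sigma_pixels
    σ_eff = sqrt(psf_sigma_pixels^2 + sigma_small^2)     # PSF blurred by the small Gaussian
    peak_filtered = min_photons / (2π * σ_eff^2)         # peak of a 2D Gaussian with N photons
    minval = dog_factor * peak_filtered
    return (; sigma_small, sigma_large, minval)
end

p = psf_aware_params(0.13, 0.1, 500.0)   # 130 nm PSF, 100 nm pixels, 500 photons
println(p)
```

For these inputs the sketch gives sigma_small ≈ 1.3 px, sigma_large ≈ 2.6 px, and minval ≈ 15.3.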

Variance-Weighted Convolution

For sCMOS cameras with readnoise map:

# At each pixel (i,j):
weightsum = sum(gaussian_weight / variance[ii,jj] * input[ii,jj])
varsum = sum(gaussian_weight / variance[ii,jj])
output[i,j] = weightsum / varsum

Implements optimal inverse variance weighting for spatially-varying noise.
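
The pseudocode above can be turned into a small CPU-only sketch for a single frame. The function name, the ±3σ window, and the border clipping are assumptions for illustration; the package's KernelAbstractions kernel may differ in all three.

```julia
# Hypothetical single-frame variance-weighted Gaussian smoothing.
function variance_weighted_smooth(img::AbstractMatrix, variance::AbstractMatrix, sigma::Real)
    nrows, ncols = size(img)
    r = ceil(Int, 3 * sigma)              # assumed kernel half-width (±3σ)
    out = zeros(Float32, nrows, ncols)
    for j in 1:ncols, i in 1:nrows
        weightsum = 0.0
        varsum = 0.0
        for jj in max(1, j - r):min(ncols, j + r), ii in max(1, i - r):min(nrows, i + r)
            g = exp(-((ii - i)^2 + (jj - j)^2) / (2 * sigma^2))
            w = g / variance[ii, jj]      # inverse-variance weight
            weightsum += w * img[ii, jj]
            varsum += w
        end
        out[i, j] = weightsum / varsum    # normalized weighted average
    end
    return out
end

img = ones(Float32, 8, 8)
img[4, 4] = 50f0                          # outlier reading on a noisy pixel
variance = ones(Float32, 8, 8)
variance[4, 4] = 100f0                    # that pixel has high read-noise variance

weighted = variance_weighted_smooth(img, variance, 1.0)
unweighted = variance_weighted_smooth(img, ones(Float32, 8, 8), 1.0)
println((weighted[4, 4], unweighted[4, 4]))
```

In this toy case the noisy pixel's outlier value is strongly down-weighted, so the weighted output at that location stays close to the background level while the unweighted output is pulled upward.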

Performance Notes

  • GPU Memory Management: Automatically batches frames if image stack exceeds GPU memory
  • Type Stability: All inputs converted to Float32 at entry point
  • Multi-GPU NVML Polling: Scans all GPUs via NVML without creating CUDA contexts. Checks free memory, process contention, and compute utilization. First GPU with sufficient memory and low contention wins.
  • Contention-Safe Retry: Unified retry loop handles all GPU failure modes (insufficient memory, TOCTOU context race, runtime OOM). Releases memory and re-polls with remaining timeout budget.
  • Memory Pool Reclaim: Calls GC.gc() + CUDA.reclaim() after both successful and failed GPU processing to return memory to the system, preventing finished jobs from blocking other processes.
  • Jittered Backoff: NVML polling uses jittered sleep intervals to avoid thundering herd when multiple processes compete for GPUs.
  • Backend Abstraction: KernelAbstractions enables same code for CPU/GPU variance weighting
  • Typical Speedup: 10-100x with GPU depending on image size and number of frames

api() returns this documentation as a plain String.


SMLMBoxer.convolve (Method)
convolve(imagestack, kernel; use_gpu=false)

Convolve imagestack with given kernel using NNlib.

Arguments

  • imagestack: Input array of image data (H, W, 1, F)
  • kernel: Kernel to convolve with (K, K)

Keyword Arguments

  • use_gpu: Whether to use GPU

Returns

  • filtered_stack: Convolved image stack

SMLMBoxer.convolve_variance_weighted (Method)
convolve_variance_weighted(imagestack, variance_map, sigma, use_gpu)

Apply variance-weighted Gaussian convolution using KernelAbstractions. Device-agnostic: works on CPU and GPU with same code.

Follows same pattern as convolve(): stays on GPU if use_gpu=true, letting interface.jl handle memory batching for both paths uniformly.

Arguments

  • imagestack: Input image (rows, cols, 1, frames) - CPU or GPU array
  • variance_map: Variance at each pixel (rows, cols)
  • sigma: Gaussian sigma
  • use_gpu: Use GPU if available

Returns

  • Variance-weighted filtered image (CuArray if use_gpu, Array otherwise)

SMLMBoxer.dog_filter (Method)
dog_filter(imagestack, args)

Apply DoG filter to imagestack based on args. Uses variance-weighted filtering if sCMOS camera is provided.

Arguments

  • imagestack: Input array of image data
  • args: Arguments with sigma values and camera

Returns

  • filtered_stack: Filtered image stack

SMLMBoxer.dog_filter_variance_weighted (Method)
dog_filter_variance_weighted(imagestack, sigma_small, sigma_large, args)

Apply variance-weighted DoG filter using sCMOS variance map. Implements SMITE-style inverse variance weighting during convolution.

Arguments

  • imagestack: Input image data (rows, cols, 1, frames)
  • sigma_small: Sigma for small Gaussian
  • sigma_large: Sigma for large Gaussian
  • args: GetBoxesArgs with camera

Returns

  • filtered_stack: Variance-weighted filtered image

SMLMBoxer.dog_kernel (Method)

dog_kernel(s1, s2)

Compute difference of Gaussian kernels.

Arguments

  • s1: Sigma for the small Gaussian (sigma_small)
  • s2: Sigma for the large Gaussian (sigma_large)

Returns

  • dog: Difference of Gaussians kernel

SMLMBoxer.estimate_gpu_memory (Method)
estimate_gpu_memory(imagestack, camera) -> Int

Estimate GPU memory required for processing imagestack.

Arguments

  • imagestack: Input image array (H, W, ..., F)
  • camera: Camera object (affects memory multiplier)

Returns

  • Estimated bytes needed for GPU processing

Memory Model

Standard DoG path: 6x input size

  • Input, filtered_small, filtered_large, DoG result, localmax temps, GC margin

Variance-weighted (SCMOSCamera): 8x input size

  • Additional workspace for per-pixel variance weighting (in-place DoG saves one copy)

SMLMBoxer.estimate_gpu_memory_per_frame (Method)
estimate_gpu_memory_per_frame(height, width, camera) -> Int

Estimate GPU memory required per frame.

Arguments

  • height: Image height in pixels
  • width: Image width in pixels
  • camera: Camera object (affects memory multiplier)

Returns

  • Estimated bytes needed per frame for GPU processing

SMLMBoxer.extract_camera_roi (Method)
extract_camera_roi(camera::AbstractCamera, row_range, col_range)

Extract a camera ROI with calibration data for the specified pixel region.

Arguments

  • camera: Source camera object
  • row_range: Range of rows to extract
  • col_range: Range of columns to extract

Returns

  • Camera object of the same type with ROI calibration data

SMLMBoxer.fillbox! (Method)

fillbox!(box, imagestack, row, col, im, boxsize)

Fill a box with a crop from the imagestack.

Arguments

  • box: Array to fill with box crop
  • imagestack: Input image stack
  • row, col, im: Coords for crop
  • boxsize: Size of box

Returns

  • boxcoords: Upper Left corners of boxes N x (row, col, im)

SMLMBoxer.find_best_gpu (Method)
find_best_gpu() -> Int

Find the GPU with the most free memory and switch to it.

Returns the device index (0-based). On single-GPU systems, returns 0 immediately.

Uses NVML to query free memory on each device without creating CUDA contexts, avoiding cuDevicePrimaryCtxRetain OOM errors under multi-process contention. Only calls CUDA.device!() once on the selected device.

Example

best = find_best_gpu()  # Switches to best GPU
# Now all CUDA operations use that GPU

SMLMBoxer.findlocalmax (Method)

findlocalmax(imagestack, kernelsize; minval=0.0, use_gpu=false)

Find the coordinates of local maxima in an image.

Arguments

  • imagestack: An array of real numbers representing the image data.
  • kernelsize: The size of the kernel used to identify local maxima.

Keyword Arguments

  • minval: The minimum value a local maximum must have to be considered valid (default: 0.0).
  • use_gpu: Whether or not to use GPU acceleration (default: false).

Returns

  • coords: The coordinates of the local maxima in the image.

SMLMBoxer.gaussian_2d (Method)

gaussian_2d(sigma, ksize)

Create a 2D Gaussian kernel.

Arguments

  • sigma: Standard deviation
  • ksize: Kernel size (kernelsize)

Returns

  • kernel: Normalized 2D Gaussian kernel

SMLMBoxer.genlocalmaximage (Method)

genlocalmaximage(imagestack, kernelsize; minval=0.0, use_gpu=false)

Generate an image highlighting the local maxima using NNlib max pooling.

Arguments

  • imagestack: An array of real numbers representing the image data (H, W, 1, F).
  • kernelsize: The size of the kernel used to identify local maxima.

Keyword Arguments

  • minval: The minimum value a local maximum must have to be considered valid (default: 0.0).
  • use_gpu: Whether or not to use GPU acceleration (default: false).

Returns

  • localmaximage: An image with local maxima highlighted.

SMLMBoxer.get_effective_gain (Method)
get_effective_gain(camera::AbstractCamera)

Get effective gain for converting photons to image ADU units (ADU/photon).

  • For IdealCamera: returns 1.0 (assumes image is in photon units)
  • For SCMOSCamera: returns QE / gain (photons → ADU conversion factor)

  • SMLMData defines gain as e⁻/ADU (electrons per ADU)
  • Physical conversion: photon → QE electrons → electrons/gain ADU
  • So: ADU/photon = QE / gain

Used to convert photon-based thresholds to image-unit thresholds.
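
A short worked example of this conversion, using made-up calibration values (QE = 0.8, gain = 2.0 e⁻/ADU) purely for illustration:

```julia
# Hypothetical calibration values; real ones come from the camera object.
qe = 0.8                 # electrons generated per photon
gain = 2.0               # electrons per ADU (SMLMData convention)

adu_per_photon = qe / gain              # effective gain in ADU/photon
threshold_adu = 500.0 * adu_per_photon  # a 500-photon threshold expressed in ADU
println((adu_per_photon, threshold_adu))
```

With these values one photon corresponds to 0.4 ADU, so a 500-photon threshold corresponds to about 200 ADU.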


SMLMBoxer.get_pixel_size (Method)
get_pixel_size(camera::AbstractCamera)

Extract pixel size from camera pixel edges (in microns). Assumes approximately square pixels - returns x-direction pixel size.

For non-square pixels, pixel_size_x and pixel_size_y may differ slightly. This function returns pixel_size_x for simplicity.

Arguments

  • camera: Camera object (IdealCamera or SCMOSCamera)

Returns

  • Pixel size in microns (x-direction)

SMLMBoxer.get_variance_map (Method)
get_variance_map(camera::AbstractCamera, imagesize)

Compute variance map from camera calibration.

Arguments

  • camera: Camera object with noise calibration
  • imagesize: Tuple of (nrows, ncols) for the image

Returns

  • Variance map (variance = readnoise²) matching image dimensions

SMLMBoxer.getboxes (Method)
getboxes(imagestack, camera=nothing; kwargs...) -> (ROIBatch, BoxesInfo)

Detect particles/blobs in a multidimensional image stack and return ROI batch with location tracking and processing metadata.

Arguments

  • imagestack::AbstractArray{<:Real}: The input image stack. Should be 2D or 3D.
  • camera::Union{AbstractCamera,Nothing}: Optional camera object (IdealCamera or SCMOSCamera) from SMLMData. If not provided, a default IdealCamera is created.

Primary Interface (Recommended - PSF-Aware)

  • psf_sigma::Real: PSF sigma in microns (physical units, e.g., 0.13 for 130nm PSF). Automatically converted to pixels using camera pixel size and sets optimal DoG filter parameters. Requires camera to be provided.
  • min_photons::Real: Minimum total photons for detection (default: 500.0). Automatically converted to appropriate intensity threshold.

Advanced Interface (Direct Control)

For expert users who want direct control over filter parameters:

  • sigma_small::Real: Small Gaussian sigma in pixels (default: 1.0).
  • sigma_large::Real: Large Gaussian sigma in pixels (default: 2.0).
  • minval::Real: DoG filter intensity threshold (default: 0.0).

Note: If psf_sigma is provided, it overrides sigma_small/sigma_large/minval.

Other Parameters

  • boxsize::Int: Size of the box to cut out around each local maximum in pixels (default: 7).
  • overlap::Real: Maximum overlap allowed between boxes in pixels (default: 2.0).
  • backend::Symbol: Compute backend - :cpu, :gpu, or :auto (default: :auto).
    • :cpu - Always use CPU
    • :gpu - Require GPU, wait for memory if needed (waits forever by default)
    • :auto - Try GPU with timeout, fall back to CPU if memory unavailable
  • auto_timeout::Real: Max seconds to wait for GPU memory in :auto mode (default: 300.0).
  • gpu_timeout::Real: Max seconds to wait for GPU memory in :gpu mode (default: Inf).
  • on_wait::Function: Optional callback (elapsed, available, required) -> nothing for wait progress.

Returns

Tuple of (ROIBatch, BoxesInfo):

ROIBatch with the following fields:

  • data: ROI stack (boxsize × boxsize × n_rois) containing image patches
  • x_corners: Vector of x (column) corner positions in camera coordinates
  • y_corners: Vector of y (row) corner positions in camera coordinates
  • frame_indices: Vector of frame indices for each ROI
  • camera: Camera object (provided or default IdealCamera)
  • roi_size: Size of each ROI (square)

BoxesInfo with the following fields:

  • backend: Compute backend used (:gpu or :cpu)
  • elapsed_s: Wall time in seconds
  • device_id: GPU device ID (0-based), or -1 for CPU
  • n_rois: Number of ROIs detected
  • batch_size: Frames per batch during processing
  • n_batches: Number of batches processed
  • memory_per_batch: Estimated memory per batch in bytes

Details on filtering

The image stack is convolved with a difference of Gaussians (DoG) filter to identify blobs and local maxima. The DoG is computed from two Gaussian kernels with standard deviations sigma_small and sigma_large.

When using the PSF-aware interface with psf_sigma (in microns):

  • psf_sigma is converted to pixels using camera pixel size
  • sigma_small = 1.0 × psf_sigma_pixels (matches PSF for optimal blob detection)
  • sigma_large = 2.0 × psf_sigma_pixels (background suppression)
  • minval is automatically calculated from min_photons accounting for PSF spreading and DoG response

Variance-Weighted Filtering (sCMOS)

When an SCMOSCamera is provided, the package uses variance-weighted filtering based on the SMITE algorithm. Each pixel's contribution to the convolution is weighted by:

weight = gaussian_kernel / variance

where variance = readnoise². This implements optimal inverse variance weighting:

  • Low-noise pixels receive high weight (strong influence on detection)
  • High-noise pixels receive low weight (reduced influence, avoiding false positives)

This significantly improves detection sensitivity in sCMOS data with spatially-varying noise.

GPU Acceleration: Variance-weighted filtering uses KernelAbstractions.jl for device-agnostic computation. The same kernel code runs on both CPU and GPU, automatically selected based on backend. This provides GPU acceleration for sCMOS cameras (10-100x speedup on large images).

Standard Filtering (IdealCamera or no camera)

Standard DoG convolution is used when no camera is provided or with IdealCamera. The convolution is performed via NNlib (using cuDNN on GPU) or CPU, depending on backend.

After filtering, local maxima above minval are identified. Boxes are cut out around each maximum, excluding overlaps.
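
The exact overlap criterion is not spelled out in this reference. The sketch below shows one plausible greedy scheme for a single frame, assuming two boxes conflict when their centers are closer than boxsize - overlap pixels along both axes and that the brighter detection wins. The function name and this criterion are assumptions, not the package's removeoverlap.

```julia
# Hypothetical greedy overlap removal; each detection is (row, col, intensity).
function remove_overlap_sketch(coords::Vector{NTuple{3,Float64}};
                               boxsize::Integer=7, overlap::Real=2.0)
    min_sep = boxsize - overlap                          # assumed minimum center separation
    order = sortperm(coords; by = c -> c[3], rev = true) # brightest first
    kept = NTuple{3,Float64}[]
    for k in order
        r, c, _ = coords[k]
        conflict = any(q -> abs(q[1] - r) < min_sep && abs(q[2] - c) < min_sep, kept)
        conflict || push!(kept, coords[k])
    end
    return kept
end

detections = [(10.0, 10.0, 50.0), (12.0, 11.0, 80.0), (30.0, 30.0, 20.0)]
kept = remove_overlap_sketch(detections)
println(kept)
```

Here the first two detections are 2 pixels apart, so only the brighter one (intensity 80) survives, while the isolated third detection is kept.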

Examples

# Recommended: PSF-aware detection with physical units
camera = IdealCamera(1:256, 1:256, 0.1f0)  # 256×256 pixels, 100nm pixel size

(roi_batch, info) = getboxes(imagestack, camera;
    psf_sigma = 0.13,              # PSF sigma in microns (physical units)
    min_photons = 500.0,           # Detect emitters with ≥500 photons
    boxsize = 11)

# Access results
boxes = roi_batch.data             # (11 × 11 × n_rois)
x_corners = roi_batch.x_corners    # x (col) positions
y_corners = roi_batch.y_corners    # y (row) positions
frames = roi_batch.frame_indices

# Check processing info
println("Backend: ", info.backend)
println("Elapsed: ", info.elapsed_s * 1000, " ms")

# Advanced: Direct control over filter parameters
(roi_batch, info) = getboxes(imagestack;
    sigma_small = 1.5,  # Custom small Gaussian sigma
    sigma_large = 3.0,  # Custom large Gaussian sigma
    minval = 10.0)      # Custom intensity threshold

# Iterate over ROIs
for roi in roi_batch
    # roi is a SingleROI with .data, .corner, .frame_idx
    process(roi.data)
end
source
SMLMBoxer.getboxstackMethod

getboxstack(imagestack, coords, args::GetBoxesArgs)

Cut out box regions from imagestack centered on coords.

Arguments

  • imagestack: Input image stack
  • coords: Coords of box centers
  • args: Parameters

Returns

  • boxstack: Array with box crops from imagestack
  • boxcoords: Upper left corners of boxes
  • camera_rois: Camera ROIs for each box (if camera provided)
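
A single crop can be sketched as below. The function name is hypothetical, and shifting boxes inward at the image border is an assumption; this reference does not say how getboxstack treats edges.

```julia
# Hypothetical sketch: cut a boxsize × boxsize region around (row, col).
# Assumption: boxes that would cross the border are shifted inward so the
# crop always has full size. Requires boxsize <= both image dimensions.
function crop_box_sketch(frame::AbstractMatrix, row::Integer, col::Integer,
                         boxsize::Integer)
    nrows, ncols = size(frame)
    half = boxsize ÷ 2
    r0 = clamp(row - half, 1, nrows - boxsize + 1)   # upper-left row
    c0 = clamp(col - half, 1, ncols - boxsize + 1)   # upper-left column
    box = frame[r0:r0 + boxsize - 1, c0:c0 + boxsize - 1]
    return box, (r0, c0)
end
```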
source
SMLMBoxer.maxima2coordsMethod

maxima2coords(imagestack)

Get the coordinates of all non-zero pixels in the input stack.

Arguments

  • imagestack: Input image stack

Returns

  • coords: List of coords for each frame (always Float32)
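
A minimal sketch of this kind of lookup, using Base's findall, is shown below. The function name is hypothetical, and the per-frame N×2 (row, col) layout is an assumption; only the Float32 element type is taken from the description above.

```julia
# Hypothetical sketch: per-frame (row, col) coordinates of non-zero pixels.
# Assumes a standard 1-based 3D array with frames along the third dimension.
function nonzero_coords_sketch(stack::AbstractArray{<:Real,3})
    coords = Vector{Matrix{Float32}}(undef, size(stack, 3))
    for f in axes(stack, 3)
        idx = findall(!iszero, view(stack, :, :, f))   # CartesianIndex{2} per hit
        m = Matrix{Float32}(undef, length(idx), 2)
        for (k, I) in enumerate(idx)
            m[k, 1] = I[1]   # row
            m[k, 2] = I[2]   # column
        end
        coords[f] = m
    end
    return coords
end
```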
source
SMLMBoxer.photons_to_dog_thresholdMethod
photons_to_dog_threshold(min_photons, psf_sigma; effective_gain=1.0)

Convert total photon count threshold to DoG filter intensity threshold in image units (ADU).

Arguments

  • min_photons: Minimum signal photons above background for detection
  • psf_sigma: PSF sigma in pixels
  • effective_gain: Camera gain factor (QE × gain) to convert photons → ADU (default: 1.0)

Returns

  • minval: DoG filter intensity threshold in image units (ADU)

Physics

For a 2D Gaussian PSF with total photons N and sigma σ_psf, the peak intensity is:

I_peak = N / (2π σ_psf²) [photons/pixel]

After convolution with the small Gaussian filter (sigma_small = 1.0 × psf_sigma), the effective sigma becomes:

σ_eff = √(σ_psf² + sigma_small²)

The peak after filtering is:

I_filtered = N / (2π σ_eff²) [photons/pixel]

The DoG response (small - large Gaussian) has a lower peak than the small Gaussian alone. For sigma_large = 2 × sigma_small, the DoG peak is approximately 0.65× the small Gaussian peak.

For raw camera data in ADU, the threshold is scaled by effective_gain = QE × gain.
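
The steps above can be strung together in a short sketch. The function name and the dog_factor keyword are hypothetical, and multiplying the three factors in this order is an assumption about how the pieces combine; the 0.65 default is simply the approximation quoted above.

```julia
# Hypothetical sketch of the threshold estimate described in this section.
# Assumptions: sigma_small = psf_sigma (both in pixels), and the final value
# is dog_factor × filtered peak × effective_gain.
function dog_threshold_sketch(min_photons::Real, psf_sigma::Real;
                              effective_gain::Real = 1.0, dog_factor::Real = 0.65)
    sigma_small = 1.0 * psf_sigma
    sigma_eff_sq = psf_sigma^2 + sigma_small^2        # σ_eff²
    i_filtered = min_photons / (2π * sigma_eff_sq)    # peak after small Gaussian
    return dog_factor * i_filtered * effective_gain   # DoG peak in image units
end

# 500 photons, PSF sigma of 1.3 pixels, unit gain
threshold = dog_threshold_sketch(500.0, 1.3)
```

For an idealized continuous Gaussian model, applying the same algebra to the large filter gives a DoG-to-small peak ratio of 1 - 2/5 = 0.6, which is close to the approximate factor quoted above.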

source
SMLMBoxer.pixels_to_micronsMethod
pixels_to_microns(pixel_coords, camera::AbstractCamera)

Convert pixel coordinates (row, col) to micron coordinates (x, y) using camera geometry.

Arguments

  • pixel_coords: N×2 matrix of (row, col) coordinates
  • camera: Camera object with pixel_edges_x and pixel_edges_y

Returns

  • N×2 matrix of (x, y) coordinates in microns
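
One way to picture the conversion is the sketch below, which works directly on edge vectors rather than a camera object. The function name is hypothetical, and it assumes uniformly spaced edges and that integer coordinate i marks the centre of pixel i; the package's own convention may differ.

```julia
# Hypothetical sketch: map (row, col) pixel coordinates to (x, y) in microns.
# Assumptions: uniform pixel spacing on each axis, and integer coordinate i
# is the centre of pixel i (so pixel 1 spans edges[1] to edges[2]).
function pixels_to_microns_sketch(pixel_coords::AbstractMatrix{<:Real},
                                  edges_x::AbstractVector{<:Real},
                                  edges_y::AbstractVector{<:Real})
    dx = edges_x[2] - edges_x[1]
    dy = edges_y[2] - edges_y[1]
    out = Matrix{Float64}(undef, size(pixel_coords, 1), 2)
    for k in axes(pixel_coords, 1)
        row, col = pixel_coords[k, 1], pixel_coords[k, 2]
        out[k, 1] = edges_x[1] + (col - 0.5) * dx   # x comes from the column
        out[k, 2] = edges_y[1] + (row - 0.5) * dy   # y comes from the row
    end
    return out
end
```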
source
SMLMBoxer.poll_gpu_nvmlMethod
poll_gpu_nvml(required_bytes; timeout=30.0, poll=0.5, on_wait=nothing) -> (Bool, Int)

Poll ALL GPUs via NVML until one has sufficient free memory and low contention. No CUDA context creation needed - safe under multi-process contention.

Arguments

  • required_bytes: Minimum bytes needed (1.5x safety margin applied internally)
  • timeout: Maximum seconds to poll (default 30.0, use Inf for unlimited)
  • poll: Seconds between checks (default 0.5)
  • on_wait: Optional callback(elapsed, best_available, required) called each poll

Returns

  • (true, device_id) if a GPU became available, (false, -1) if timeout reached

Contention Detection

A GPU is considered contended when other processes are present AND either:

  • Free memory is insufficient (< required × 1.5)
  • Compute utilization exceeds 90%

When other processes are present but memory is sufficient and utilization is low, the GPU is still considered available.
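
The contention rule reads naturally as a small predicate. The sketch below is illustrative only: the function name and argument names are hypothetical, and the inputs stand in for values that would come from NVML queries.

```julia
# Hypothetical sketch of the contention rule described above.
# A GPU counts as contended only when other processes are present AND
# either free memory is short or compute utilization is high.
function is_contended_sketch(n_other_processes::Integer, free_bytes::Integer,
                             required_bytes::Integer, utilization_pct::Real;
                             safety::Real = 1.5, max_utilization::Real = 90)
    n_other_processes > 0 || return false
    low_memory = free_bytes < required_bytes * safety
    busy = utilization_pct > max_utilization
    return low_memory || busy
end
```

Note that this predicate covers contention only; a GPU with no other processes still needs enough free memory before it can be used.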

source
SMLMBoxer.recommend_batch_sizeMethod
recommend_batch_size(height, width; backend=:auto, memory_fraction=0.8) -> Int

Return recommended maximum number of frames to load at once given memory constraints.

This helps users decide how much data to load before calling getboxes(). For very large datasets, loading data in chunks of this size ensures efficient processing without running out of memory.

Arguments

  • height::Int: Image height in pixels
  • width::Int: Image width in pixels
  • backend::Symbol: Compute backend :cpu, :gpu, or :auto (default: :auto)
  • memory_fraction::Real: Fraction of free memory to use (default: 0.8)

Returns

  • Maximum recommended number of frames to load at once

Memory Model

The processing pipeline requires approximately 6× the raw image size:

  • Input imagestack
  • Filtered stack (DoG output)
  • Local maxima detection intermediates
  • Coordinate arrays
  • Box extraction workspace
  • Broadcast temporaries
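
The arithmetic behind such a recommendation can be sketched as follows. The function name is hypothetical, free memory is passed in explicitly because how the package queries it is not described here, and 4 bytes per pixel (Float32) is an assumption.

```julia
# Hypothetical sketch of the 6× memory model.
# Assumptions: Float32 pixels (4 bytes) and a caller-supplied free-memory figure.
function batch_size_sketch(height::Integer, width::Integer, free_bytes::Integer;
                           memory_fraction::Real = 0.8,
                           bytes_per_pixel::Integer = 4,
                           overhead::Integer = 6)
    bytes_per_frame = overhead * height * width * bytes_per_pixel
    n = floor(Int, free_bytes * memory_fraction / bytes_per_frame)
    return max(1, n)   # always allow at least one frame
end

# 512×512 frames with 8 GiB free
n_frames = batch_size_sketch(512, 512, 8 * 2^30)
```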

Example

using SMLMBoxer

# Check how many 512×512 frames to load at once
max_frames = recommend_batch_size(512, 512)
println("Load up to $max_frames frames at a time")

# Load and process in chunks
for chunk_start in 1:max_frames:total_frames
    chunk_end = min(chunk_start + max_frames - 1, total_frames)
    imagestack = load_frames(chunk_start:chunk_end)
    (roi_batch, info) = getboxes(imagestack, camera; psf_sigma=0.13)
    # ... process results
end
source
SMLMBoxer.removeoverlapMethod

removeoverlap(coords, args)

Remove overlapping coords based on distance.

Arguments

  • coords: List of coords
  • args: Parameters

Returns

  • coords: Coords with overlaps removed
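
A distance-based filter of this kind can be sketched with a greedy pass. The function name and the min_dist parameter are hypothetical, and keeping the earlier of two close coordinates is an assumption; the package's actual rule and its use of the overlap setting are not spelled out here.

```julia
# Hypothetical greedy sketch: keep a coordinate only if it lies at least
# min_dist away from every coordinate already kept (earlier rows win).
function remove_close_coords_sketch(coords::AbstractMatrix{<:Real}, min_dist::Real)
    kept = Int[]
    for i in axes(coords, 1)
        far_enough = all(kept) do j
            hypot(coords[i, 1] - coords[j, 1], coords[i, 2] - coords[j, 2]) >= min_dist
        end
        far_enough && push!(kept, i)
    end
    return coords[kept, :]
end
```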
source
SMLMBoxer.reshape_for_fluxMethod

reshape_for_flux(arr::AbstractArray)

Reshape array to have singleton dims for NNlib convolution.

Arguments

  • arr: Input array, must be 2D or 3D

Returns

  • Reshaped array with added singleton dimensions
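
A reshape of this kind might look like the sketch below. The function name is hypothetical, and the target layout (rows, cols, 1 channel, frames) is an assumption based on the width-height-channel-batch order that NNlib convolutions expect.

```julia
# Hypothetical sketch: add singleton channel/batch dimensions for convolution.
# Assumed layout: (rows, cols, channel = 1, frames).
function reshape_for_conv_sketch(arr::AbstractArray)
    if ndims(arr) == 2
        return reshape(arr, size(arr, 1), size(arr, 2), 1, 1)
    elseif ndims(arr) == 3
        return reshape(arr, size(arr, 1), size(arr, 2), 1, size(arr, 3))
    else
        throw(ArgumentError("expected a 2D or 3D array, got $(ndims(arr))D"))
    end
end
```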
source
SMLMBoxer.select_backendMethod
select_backend(backend::Symbol, required_bytes;
               auto_timeout=300.0, gpu_timeout=Inf, on_wait=nothing) -> (Symbol, Int)

Select compute backend with two-layer GPU contention handling.

Layer 1 (NVML polling): Scans all GPUs via NVML without creating CUDA contexts. Polls through timeout with jittered backoff. First GPU with sufficient free memory and low contention wins. Safe under multi-process contention.

Layer 2 (runtime try/catch): Applied by caller for :auto mode. If CUDA errors occur during processing despite NVML pre-check, falls back to CPU.

Arguments

  • backend: :cpu, :gpu, or :auto
  • required_bytes: Estimated GPU memory needed for processing
  • auto_timeout: Max wait for :auto mode before CPU fallback (default 300.0)
  • gpu_timeout: Max wait for :gpu mode (default Inf - wait forever)
  • on_wait: Optional callback(elapsed, available, required)

Returns

  • (backend::Symbol, device_id::Int) - selected backend and GPU device (0-based, -1 for CPU)

Behavior

  • :cpu - Returns (:cpu, -1) immediately
  • :gpu - NVML poll for device, then CUDA wait_for_gpu_memory. Errors if unavailable/timeout
  • :auto - NVML poll for device with timeout, falls back to (:cpu, -1) with warning
source
SMLMBoxer.variance_weighted_gaussian_kernel!Method
variance_weighted_gaussian_kernel!(output, input, variance, sigma, winsize)

KernelAbstractions kernel for variance-weighted Gaussian convolution. Implements SMITE-style inverse variance weighting.

This follows the same KernelAbstractions pattern used in GaussMLE.jl (kernel-abstract branch) for seamless CPU/GPU execution and consistent API across JuliaSMLM packages.

Arguments

  • output: Output array (nrows, ncols)
  • input: Input array (nrows, ncols)
  • variance: Variance map (nrows, ncols)
  • sigma: Gaussian sigma
  • winsize: Window size (pixels)

Note

Same kernel code runs on CPU (via CPU() backend) or GPU (via CUDABackend()). Backend is selected automatically based on use_gpu parameter.

source
SMLMBoxer.variance_weighted_gaussian_kernel_batched!Method
variance_weighted_gaussian_kernel_batched!(output, input, variance, sigma, winsize, nrows, ncols)

Batched KernelAbstractions kernel for variance-weighted Gaussian convolution. Processes all frames in a single kernel launch via 3D ndrange=(nrows, ncols, nframes), eliminating per-frame launch overhead.

Arguments

  • output: Output array (nrows, ncols, 1, nframes)
  • input: Input array (nrows, ncols, 1, nframes)
  • variance: Variance map (nrows, ncols)
  • sigma: Gaussian sigma
  • winsize: Window size (pixels)
  • nrows: Number of rows (passed explicitly for bounds checking)
  • ncols: Number of columns
source
SMLMBoxer.wait_for_gpu_memoryMethod
wait_for_gpu_memory(required_bytes; timeout=30.0, poll=0.5, on_wait=nothing) -> Bool

Wait until current GPU device has sufficient available memory. Uses CUDA calls (requires active context). Used by :gpu mode after device selection.

Arguments

  • required_bytes: Minimum bytes needed (with safety margin applied internally)
  • timeout: Maximum seconds to wait (default 30.0, use Inf for unlimited)
  • poll: Seconds between checks (default 0.5)
  • on_wait: Optional callback(elapsed, available, required) called each poll

Returns

  • true if memory became available, false if timeout reached
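
The wait-with-timeout pattern behind this function can be sketched without any GPU calls. In the sketch below the function name is hypothetical, query_free stands in for whatever call reports free device memory, and the internal safety margin is left out for brevity.

```julia
# Hypothetical sketch of a poll-until-available loop with timeout and callback.
# query_free is a stand-in for a real free-memory query; no CUDA calls are made.
function wait_until_sketch(query_free::Function, required_bytes::Integer;
                           timeout::Real = 30.0, poll::Real = 0.5, on_wait = nothing)
    start = time()
    while true
        available = query_free()
        available >= required_bytes && return true     # enough memory: done
        elapsed = time() - start
        elapsed >= timeout && return false             # gave up waiting
        on_wait === nothing || on_wait(elapsed, available, required_bytes)
        sleep(poll)
    end
end
```

The callback receives the same (elapsed, available, required) arguments described above, so a progress message or logger can be plugged in without changing the loop.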
source