API Reference

SMLMBoxer.SMLMBoxer
SMLMBoxer.BoxerConfig
SMLMBoxer.BoxesInfo
SMLMBoxer.GetBoxesArgs
SMLMBoxer._getboxes_impl
SMLMBoxer._gpu_maxima2coords
SMLMBoxer._process_with_batching
SMLMBoxer.api
SMLMBoxer.convolve
SMLMBoxer.convolve_variance_weighted
SMLMBoxer.dog_filter
SMLMBoxer.dog_filter_variance_weighted
SMLMBoxer.dog_kernel
SMLMBoxer.estimate_gpu_memory
SMLMBoxer.estimate_gpu_memory_per_frame
SMLMBoxer.extract_camera_roi
SMLMBoxer.fillbox!
SMLMBoxer.find_best_gpu
SMLMBoxer.findlocalmax
SMLMBoxer.gaussian_2d
SMLMBoxer.genlocalmaximage
SMLMBoxer.get_effective_gain
SMLMBoxer.get_pixel_size
SMLMBoxer.get_variance_map
SMLMBoxer.getboxes
SMLMBoxer.getboxstack
SMLMBoxer.has_cuda
SMLMBoxer.maxima2coords
SMLMBoxer.photons_to_dog_threshold
SMLMBoxer.pixels_to_microns
SMLMBoxer.poll_gpu_nvml
SMLMBoxer.recommend_batch_size
SMLMBoxer.removeoverlap
SMLMBoxer.reshape_for_flux
SMLMBoxer.select_backend
SMLMBoxer.variance_weighted_gaussian_kernel!
SMLMBoxer.variance_weighted_gaussian_kernel_batched!
SMLMBoxer.wait_for_gpu_memory

SMLMBoxer.SMLMBoxer — Module

SMLMBoxer

High-performance particle/blob detection in SMLM image stacks using difference-of-Gaussians filtering with GPU acceleration and sCMOS variance-weighted filtering support.

API Overview

For a comprehensive overview of the API, use help mode:

?SMLMBoxer.api

Or access the complete API documentation programmatically:

docs = SMLMBoxer.api()


SMLMBoxer.BoxerConfig — Type

BoxerConfig

Configuration for ROI detection via getboxes().

Use either the PSF-aware interface (psf_sigma + min_photons) or the advanced interface (sigma_small + sigma_large + minval). The PSF-aware interface is recommended.

Fields

PSF-Aware Interface (Recommended)

psf_sigma::Union{Float64,Nothing}: PSF sigma in microns (e.g., 0.13 for 130nm PSF). Requires camera for pixel conversion. When set, overrides sigma_small/sigma_large/minval.
min_photons::Float64: Minimum photons for detection (default: 500.0)

Advanced Interface (Direct Control)

sigma_small::Float64: Small Gaussian sigma in pixels (default: 1.0)
sigma_large::Float64: Large Gaussian sigma in pixels (default: 2.0)
minval::Float64: DoG intensity threshold (default: 0.0)

Box Parameters

boxsize::Int: ROI box size in pixels (default: 7)
overlap::Float64: Max overlap between detections in pixels (default: 2.0)

Backend Parameters

backend::Symbol: Compute backend :cpu, :gpu, or :auto (default: :auto)
auto_timeout::Float64: Maximum seconds to wait for a GPU in :auto mode before falling back to CPU (default: 300.0)
gpu_timeout::Float64: Maximum seconds to wait for a GPU in :gpu mode (default: Inf)
on_wait::Union{Function,Nothing}: Optional callback (elapsed, available, required) -> nothing for GPU wait progress (default: nothing)

Examples

# PSF-aware (recommended)
config = BoxerConfig(psf_sigma=0.13, min_photons=500.0, boxsize=11)

# Advanced (direct control)
config = BoxerConfig(sigma_small=1.5, sigma_large=3.0, minval=10.0)

# GPU-specific
config = BoxerConfig(psf_sigma=0.13, backend=:gpu, gpu_timeout=60.0)


SMLMBoxer.BoxesInfo — Type

BoxesInfo

Metadata returned alongside ROIBatch from getboxes().

Fields

backend::Symbol: Compute backend used (:gpu or :cpu)
elapsed_s::Float64: Wall time in seconds
device_id::Int: GPU device ID (0-based), or -1 for CPU
n_rois::Int: Number of ROIs detected
batch_size::Int: Frames per batch during processing
n_batches::Int: Number of batches processed
memory_per_batch::Int: Estimated memory per batch in bytes


SMLMBoxer.GetBoxesArgs — Type

GetBoxesArgs

Internal structure for getboxes parameters. Users should call getboxes() with keyword arguments rather than constructing this directly.

Primary Interface (Recommended)

psf_sigma::Real: PSF sigma in microns (physical units, e.g., 0.13 for 130nm PSF). Requires camera to be provided for pixel-size conversion.
min_photons::Real: Minimum total photons for detection (default: 500.0)

When psf_sigma is provided:

psf_sigma is converted to pixels (psf_sigma_pixels) using the camera pixel size
sigma_small = 1.0 × psf_sigma_pixels (automatically calculated)
sigma_large = 2.0 × psf_sigma_pixels (automatically calculated)
minval = photons_to_dog_threshold(min_photons, psf_sigma_pixels) (automatically calculated)

Advanced Interface (Direct Control)

sigma_small::Real: Small Gaussian sigma in pixels (default: 1.0)
sigma_large::Real: Large Gaussian sigma in pixels (default: 2.0)
minval::Real: DoG intensity threshold (default: 0.0)

Other Parameters

imagestack: Input image stack
camera: Camera object (IdealCamera or SCMOSCamera)
boxsize::Int: ROI box size in pixels (default: 7)
overlap::Real: Maximum overlap between detections in pixels (default: 2.0)
backend::Symbol: Compute backend :cpu, :gpu, or :auto (default: :auto)
auto_timeout::Real: Max wait seconds for :auto mode before CPU fallback (default: 300.0)
gpu_timeout::Real: Max wait seconds for :gpu mode (default: Inf)
on_wait: Optional callback(elapsed, available, required) for wait progress


SMLMBoxer._getboxes_impl — Method

_getboxes_impl(args::GetBoxesArgs)

Internal implementation of getboxes that does the actual work.


SMLMBoxer._gpu_maxima2coords — Method

_gpu_maxima2coords(localmaximage::CuArray)

GPU-accelerated coordinate extraction using sparse compaction.

Instead of transferring the entire 4D array to CPU (e.g., 1 GB for 512x512x1x1000), uses GPU findall (prefix-sum compaction) to find nonzero indices on-device, then transfers only the sparse indices and values (~1.2 MB for ~100K maxima).

Arguments

localmaximage: 4D CuArray (nrows, ncols, 1, nframes) with nonzero values at maxima

Returns

coords: Vector{Matrix{Float32}} — same format as maxima2coords
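The compaction idea can be illustrated on the CPU with Base `findall`. One pass collects every nonzero position of the 4D array, and those positions are then grouped by frame. This is a minimal sketch, not the package's GPU code. The helper name `maxima_to_coords_sketch` is hypothetical, and the per-frame layout (one row per maximum with columns row, column, value) is an assumption that may not match what `maxima2coords` returns.

```julia
# Hypothetical CPU analogue of the sparse-compaction step: one findall pass over the
# whole 4D array, then the nonzero positions are grouped by frame.
# Assumed output layout: one Float32 matrix per frame, one row per maximum,
# columns (row, col, value). The real maxima2coords layout may differ.
function maxima_to_coords_sketch(localmax::AbstractArray{<:Real,4})
    idx = findall(!iszero, localmax)        # Vector{CartesianIndex{4}}
    return map(1:size(localmax, 4)) do f
        sel = filter(I -> I[4] == f, idx)   # maxima belonging to frame f
        m = Matrix{Float32}(undef, length(sel), 3)
        for (k, I) in enumerate(sel)
            m[k, 1] = I[1]                  # row
            m[k, 2] = I[2]                  # column
            m[k, 3] = localmax[I]           # value at the maximum
        end
        m
    end
end

# Two 4×4 frames with three maxima in total.
lm = zeros(Float32, 4, 4, 1, 2)
lm[2, 3, 1, 1] = 5f0
lm[4, 1, 1, 2] = 7f0
lm[1, 1, 1, 2] = 2f0
coords = maxima_to_coords_sketch(lm)
```

Within each frame the rows come out in column-major order, because that is the order in which `findall` visits the array.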


SMLMBoxer._process_with_batching — Method

_process_with_batching(imagestack, args, kernelsize, max_free_mem; use_gpu, batch_cleanup=nothing)

Process imagestack with memory-aware batching. Handles both single-batch (fits in memory) and multi-batch (too large) cases.

Returns (coords, batch_size, n_batches, memory_per_batch).

Arguments

imagestack: 4D image stack (ny, nx, 1, nframes)
args: GetBoxesArgs with filter parameters
kernelsize: Kernel size for local max detection
max_free_mem: Available memory in bytes
use_gpu: Whether to use GPU for processing
batch_cleanup: Optional function called after each batch (e.g., for GC)


SMLMBoxer.api — Method

SMLMBoxer.jl API Reference

Particle/blob detection in SMLM image stacks using difference-of-Gaussians filtering with GPU acceleration and sCMOS variance-weighted filtering support.

Exports

Total exports: 6

getboxes - Main detection function
BoxerConfig - Configuration struct for detection parameters
BoxesInfo - Metadata struct returned alongside ROIBatch
recommend_batch_size - Memory-aware batch sizing utility
ROIBatch - Re-exported from SMLMData.jl
SingleROI - Re-exported from SMLMData.jl

Note: SMLMBoxer.api() is available but not exported to avoid conflicts with other JuliaSMLM packages.

Key Concepts

Difference of Gaussians (DoG) Filtering

Detects blob-like features by subtracting two Gaussian-blurred versions of the image:

sigma_small: Matches PSF size for optimal blob detection
sigma_large: Suppresses background (typically 2× sigma_small)
minval: Intensity threshold after filtering
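The sketch below makes the DoG idea concrete. It filters a synthetic frame containing one Gaussian spot on a flat background, then checks that the strongest DoG response sits on the spot. It uses plain loops with zero padding rather than NNlib. The helpers `gauss2d` and `filter2d`, the 3σ kernel radius, and the spot parameters are illustrative assumptions, not SMLMBoxer internals.

```julia
# Normalized 2D Gaussian kernel of odd size ksize (hypothetical helper).
function gauss2d(sigma, ksize)
    r = (ksize - 1) ÷ 2
    k = [exp(-(x^2 + y^2) / (2sigma^2)) for y in -r:r, x in -r:r]
    return k ./ sum(k)
end

# Same-size 2D filtering with zero padding (kernels are symmetric, so correlation = convolution).
function filter2d(img, k)
    r = (size(k, 1) - 1) ÷ 2
    H, W = size(img)
    out = zeros(Float64, H, W)
    for j in 1:W, i in 1:H
        acc = 0.0
        for dj in -r:r, di in -r:r
            ii, jj = i + di, j + dj
            if 1 <= ii <= H && 1 <= jj <= W
                acc += k[di + r + 1, dj + r + 1] * img[ii, jj]
            end
        end
        out[i, j] = acc
    end
    return out
end

# Synthetic frame: flat background of 10 plus one Gaussian spot centered at (12, 20).
sigma_psf = 1.3
img = [10.0 + 200.0 * exp(-((i - 12)^2 + (j - 20)^2) / (2sigma_psf^2)) for i in 1:32, j in 1:32]

sigma_small, sigma_large = sigma_psf, 2sigma_psf      # small matches PSF, large = 2× small
ksize = 2 * ceil(Int, 3sigma_large) + 1               # assumed 3σ radius
dog = filter2d(img, gauss2d(sigma_small, ksize)) .- filter2d(img, gauss2d(sigma_large, ksize))

peak = Tuple(argmax(dog))   # location of the strongest DoG response
```

Both kernels are normalized, so a flat background cancels wherever the kernel fits inside the image. This is why sigma_large suppresses background.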

PSF-Aware Detection (Recommended)

Specify physical PSF parameters (in microns), automatically converted to optimal filter settings:

psf_sigma: PSF sigma in microns (e.g., 0.13 for 130nm PSF)
min_photons: Total photon threshold (automatically converted to DoG intensity threshold)

Variance-Weighted Filtering (sCMOS)

When SCMOSCamera is provided, implements SMITE-style inverse variance weighting:

Each pixel weighted by gaussian_kernel / variance where variance = readnoise²
Low-noise pixels: high weight (strong detection influence)
High-noise pixels: low weight (reduced false positives)
GPU-accelerated via KernelAbstractions.jl (device-agnostic kernels)

GPU Acceleration and Scheduling

Standard DoG: NNlib with cuDNN backend (10-100x speedup)
Variance-weighted: KernelAbstractions custom kernels (same code for CPU/GPU)
Multi-GPU support: NVML-based polling selects GPU with most free memory across all devices

Unified GPU retry loop handles multi-process contention:

Poll all GPUs via NVML (no CUDA context creation) for sufficient free memory and low utilization
Acquire GPU context and run processing
On any failure (no memory, TOCTOU context race, runtime OOM): release memory via GC.gc() + CUDA.reclaim(), re-poll NVML with remaining timeout
On timeout: :auto falls back to CPU, :gpu errors
On success: reclaim GPU memory pool so finished jobs don't block other processes
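The retry logic above can be sketched as a plain Julia loop. This is a hypothetical illustration, not the package's implementation. `acquire_and_run`, `poll`, `run_gpu`, and `run_cpu` are stand-ins, so no NVML or CUDA calls are made, and the base sleep interval is an arbitrary choice.

```julia
# Hypothetical sketch of a poll / run / retry loop with a timeout and jittered backoff.
# `poll()` stands in for an NVML query and returns a device id (0-based) or `nothing`;
# `run_gpu(dev)` stands in for GPU processing and may throw (e.g. an OOM);
# `run_cpu()` is the CPU path used when :auto mode times out.
function acquire_and_run(poll, run_gpu, run_cpu; mode::Symbol, timeout::Real,
                         base_sleep::Real = 0.01)
    t0 = time()
    while true
        dev = poll()
        if dev !== nothing
            try
                return (:gpu, run_gpu(dev))
            catch err
                # The package releases memory here (GC.gc() + CUDA.reclaim()) before
                # re-polling; this sketch just logs and retries.
                @debug "GPU attempt failed; re-polling" err
            end
        end
        elapsed = time() - t0
        if elapsed >= timeout
            mode === :auto && return (:cpu, run_cpu())                      # :auto falls back
            error("no GPU available after $(round(elapsed; digits = 2)) s")  # :gpu errors
        end
        sleep(base_sleep * (1 + rand()))   # jittered backoff avoids lockstep polling
    end
end

# Demo 1: the poll succeeds only on its third call.
calls = Ref(0)
flaky_poll() = (calls[] += 1; calls[] >= 3 ? 0 : nothing)
result = acquire_and_run(flaky_poll, dev -> "ran on device $dev", () -> "ran on CPU";
                         mode = :gpu, timeout = 5.0)

# Demo 2: no GPU ever becomes available, so :auto falls back after its timeout.
fallback = acquire_and_run(() -> nothing, dev -> "ran on GPU", () -> "ran on CPU";
                           mode = :auto, timeout = 0.05)
```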

Backend modes:

:cpu - Always CPU, no GPU involvement
:gpu - Require GPU, retry until gpu_timeout (default: Inf), error if unavailable
:auto - Try GPU, retry until auto_timeout (default: 300s), fall back to CPU

Wait progress callback:

config = BoxerConfig(
    psf_sigma=0.13,
    backend=:auto,
    on_wait=(elapsed, available, required) -> @info "Waiting for GPU" elapsed available required
)

Configuration

BoxerConfig

Configuration struct for ROI detection parameters. Supports @kwdef construction with defaults.

@kwdef struct BoxerConfig
    # PSF-aware interface (recommended)
    psf_sigma::Union{Float64,Nothing} = nothing  # PSF sigma in microns
    min_photons::Float64 = 500.0                 # Minimum photons for detection

    # Advanced interface (direct control)
    sigma_small::Float64 = 1.0    # Small Gaussian sigma in pixels
    sigma_large::Float64 = 2.0    # Large Gaussian sigma in pixels
    minval::Float64 = 0.0         # DoG intensity threshold

    # Box parameters
    boxsize::Int = 7              # ROI size in pixels
    overlap::Float64 = 2.0        # Max overlap between detections

    # Backend parameters
    backend::Symbol = :auto       # :cpu, :gpu, or :auto
    auto_timeout::Float64 = 300.0  # Max wait for GPU in :auto mode
    gpu_timeout::Float64 = Inf    # Max wait in :gpu mode
    on_wait::Union{Function,Nothing} = nothing  # Optional wait progress callback
end

Usage:

# PSF-aware (recommended)
config = BoxerConfig(psf_sigma=0.13, min_photons=500.0, boxsize=11)

# Advanced (direct control)
config = BoxerConfig(sigma_small=1.5, sigma_large=3.0, minval=10.0)

# GPU-specific
config = BoxerConfig(psf_sigma=0.13, backend=:gpu, gpu_timeout=60.0)

Core Function

getboxes - Two Calling Conventions

Config-based (recommended for reusable settings):

getboxes(imagestack, camera, config::BoxerConfig) -> (ROIBatch, BoxesInfo)

Kwargs-based (convenient for one-off calls):

getboxes(imagestack, camera=nothing; kwargs...) -> (ROIBatch, BoxesInfo)

Both conventions are equivalent; kwargs are forwarded to a BoxerConfig internally.
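This forwarding is a common Julia pattern. A `@kwdef` struct supplies the defaults, and the keyword method builds that struct and calls the config method. The sketch below mirrors that shape using hypothetical names (`MiniConfig`, `detect`). It does not call SMLMBoxer.

```julia
# Hypothetical miniature of the two calling conventions. MiniConfig and detect are
# illustrative names, not part of SMLMBoxer.
Base.@kwdef struct MiniConfig
    sigma_small::Float64 = 1.0
    sigma_large::Float64 = 2.0
    boxsize::Int = 7
end

# Config-based method: returns the settings it would use instead of running detection.
detect(imagestack, camera, config::MiniConfig) =
    (size(imagestack), config.sigma_small, config.sigma_large, config.boxsize)

# Kwargs-based method: build a config from the keywords, then forward.
detect(imagestack, camera = nothing; kwargs...) =
    detect(imagestack, camera, MiniConfig(; kwargs...))

img = zeros(Float32, 16, 16, 3)
a = detect(img, nothing, MiniConfig(sigma_small = 1.5, boxsize = 9))
b = detect(img; sigma_small = 1.5, boxsize = 9)
```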

Main detection function. Applies DoG filtering, finds local maxima, extracts ROI patches.

Arguments:

imagestack::AbstractArray{<:Real} - Input image stack (2D or 3D)
camera::Union{AbstractCamera,Nothing} - Camera object (IdealCamera or SCMOSCamera)
config::BoxerConfig - Configuration struct (config-based convention)

Kwargs (kwargs-based convention):

PSF-Aware Interface (Recommended):

psf_sigma::Real - PSF sigma in microns (requires camera for pixel size conversion)
min_photons::Real - Minimum total photons for detection (default: 500.0)

Advanced Interface (Direct Control):

sigma_small::Real - Small Gaussian sigma in pixels (default: 1.0)
sigma_large::Real - Large Gaussian sigma in pixels (default: 2.0)
minval::Real - DoG intensity threshold (default: 0.0)

Other Parameters:

boxsize::Int - ROI size in pixels (default: 7)
overlap::Real - Maximum overlap between detections in pixels (default: 2.0)
backend::Symbol - Compute backend: :cpu, :gpu, or :auto (default: :auto)
- :cpu - Always use CPU
- :gpu - Require GPU, wait for memory if needed (waits forever by default)
- :auto - Try GPU with timeout, fall back to CPU if memory unavailable
auto_timeout::Real - Max seconds to wait for GPU in :auto mode (default: 300.0)
gpu_timeout::Real - Max seconds to wait in :gpu mode (default: Inf)
on_wait::Function - Optional callback (elapsed, available, required) -> nothing for wait progress

Returns: Tuple of (ROIBatch, BoxesInfo)

ROIBatch with fields:

data - ROI stack (boxsize × boxsize × n_rois)
x_corners - Vector of x (column) corner positions
y_corners - Vector of y (row) corner positions
frame_indices - Vector of frame indices for each ROI
camera - Camera object (provided or default IdealCamera)
roi_size - Size of each ROI (square)

BoxesInfo with fields:

backend - Compute backend used (:gpu or :cpu)
elapsed_s - Wall time in seconds
device_id - GPU device ID (0-based), or -1 for CPU
n_rois - Number of ROIs detected
batch_size - Frames per batch during processing
n_batches - Number of batches processed
memory_per_batch - Estimated memory per batch in bytes

recommend_batch_size(height, width; backend=:auto, memory_fraction=0.8) -> Int

Returns the recommended maximum number of frames to load at once given memory constraints.

Use this when processing very large datasets to choose a chunk size before calling getboxes().

Arguments:

height::Int - Image height in pixels
width::Int - Image width in pixels
backend::Symbol - Compute backend: :cpu, :gpu, or :auto (default: :auto)
memory_fraction::Real - Fraction of free memory to use (default: 0.8)

Returns: Maximum recommended number of frames to load at once

Memory Model: The processing pipeline requires approximately 6× the raw image size:

Input imagestack
Filtered stack (DoG output)
Local maxima detection intermediates
Coordinate arrays
Box extraction workspace
Broadcast temporaries
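Under the 6× memory model, the chunk size is roughly the usable memory divided by six Float32 copies of one frame. The sketch below assumes exactly that rule. The function `batch_size_sketch` and its explicit `free_bytes` argument are hypothetical, and the package's actual rounding or safety margins may differ.

```julia
# Hypothetical batch-size rule derived from the 6x memory model: frames that fit in
# memory_fraction of the free memory, assuming 4-byte Float32 pixels. `free_bytes` is
# passed in here, whereas the real recommend_batch_size queries memory itself.
function batch_size_sketch(height::Int, width::Int, free_bytes::Integer;
                           memory_fraction::Real = 0.8, multiplier::Int = 6)
    bytes_per_frame = multiplier * height * width * sizeof(Float32)
    usable = floor(Int, free_bytes * memory_fraction)
    return max(1, usable ÷ bytes_per_frame)     # always allow at least one frame
end

# 512×512 frames need 6 × 512 × 512 × 4 = 6_291_456 bytes each.
n = batch_size_sketch(512, 512, 8 * 1024^3)     # 8 GiB free → 1092 frames
```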

Example:

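using SMLMBoxer, SMLMData

# Check how many 512×512 frames to load at once
max_frames = recommend_batch_size(512, 512)
println("Load up to $max_frames frames at a time")

# Camera with 100nm pixels matching the 512×512 frames
camera = IdealCamera(1:512, 1:512, 0.1f0)

# total_frames and load_frames are placeholders: replace them with your own data source
total_frames = 1_000
load_frames(range) = zeros(Float32, 512, 512, length(range))

# Load and process in chunks
for chunk_start in 1:max_frames:total_frames
    chunk_end = min(chunk_start + max_frames - 1, total_frames)
    imagestack = load_frames(chunk_start:chunk_end)
    (roi_batch, info) = getboxes(imagestack, camera; psf_sigma=0.13)
    # ... process results
end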

Re-Exported Types

ROIBatch (from SMLMData.jl)

Container for multiple ROIs from particle detection. Supports iteration and indexing.

# Iteration
for roi in roi_batch
    # roi is a SingleROI with .data, .corner, .frame_idx, .camera
    process(roi.data)
end

# Indexing
roi = roi_batch[1]  # Returns SingleROI

SingleROI (from SMLMData.jl)

Individual ROI with image data, position, frame index, and camera calibration.

Common Workflows

PSF-Aware Detection (Recommended)

using SMLMBoxer, SMLMData

# Create camera with physical pixel size (100nm pixels)
camera = IdealCamera(1:256, 1:256, 0.1f0)

# imagestack: your 256×256 (×nframes) image data, in photon units for IdealCamera

# Config-based (recommended for reusable settings)
config = BoxerConfig(psf_sigma=0.13, min_photons=500.0, boxsize=11)
(roi_batch, info) = getboxes(imagestack, camera, config)

# OR kwargs-based (convenient for one-off calls)
(roi_batch, info) = getboxes(imagestack, camera;
    psf_sigma = 0.13,        # 130nm PSF in microns
    min_photons = 500.0,     # Minimum 500 photons
    boxsize = 11)

# Access results
n_detections = length(roi_batch.x_corners)
boxes = roi_batch.data              # 11×11×n ROI patches
positions_x = roi_batch.x_corners   # Column positions
positions_y = roi_batch.y_corners   # Row positions
frames = roi_batch.frame_indices

# Check processing info
println("Backend: ", info.backend)
println("Elapsed: ", info.elapsed_s * 1000, " ms")

sCMOS Variance-Weighted Detection

using SMLMBoxer, SMLMData

# Create sCMOS camera with readnoise map (Float32 for type consistency).
# load_readnoise_calibration is a placeholder for your own calibration loader.
readnoise_map = Float32.(load_readnoise_calibration("camera_calib.mat"))
camera = SCMOSCamera(256, 256, 0.1f0, readnoise_map)

# Variance-weighted detection (automatically enabled)
(roi_batch, info) = getboxes(imagestack, camera;
    psf_sigma = 0.13,
    min_photons = 300.0,  # Lower threshold possible with noise weighting
    backend = :auto)      # GPU with CPU fallback

Advanced: Direct Parameter Control

# Expert mode: bypass PSF-aware interface

# Config-based
config = BoxerConfig(sigma_small=1.5, sigma_large=3.0, minval=10.0, boxsize=9, overlap=1.5)
(roi_batch, info) = getboxes(imagestack, nothing, config)

# OR kwargs-based
(roi_batch, info) = getboxes(imagestack;
    sigma_small = 1.5,   # Custom filter sigma (pixels)
    sigma_large = 3.0,
    minval = 10.0,       # Direct intensity threshold
    boxsize = 9,
    overlap = 1.5)

Processing Individual ROIs

(roi_batch, info) = getboxes(imagestack, camera; psf_sigma=0.13)

# Iterate over ROIs
for roi in roi_batch
    # SingleROI fields:
    # - roi.data: Image patch
    # - roi.corner: (x, y) corner position
    # - roi.frame_idx: Frame index
    # - roi.camera: Camera ROI calibration

    fit_gaussian(roi.data)
end

# Direct indexing
first_roi = roi_batch[1]

Internal Functions (Not Exported)

These functions are documented but are not part of the public API. Use them at your own risk; they may change without notice.

Filtering:

dog_filter(imagestack, args) - Apply DoG filter (routes to standard or variance-weighted)
dog_filter_variance_weighted(imagestack, σ_small, σ_large, args) - Variance-weighted DoG
convolve(imagestack, kernel; use_gpu) - Standard convolution via NNlib
convolve_variance_weighted(imagestack, variance_map, σ, use_gpu) - Variance-weighted via KA
gaussian_2d(sigma, kernelsize) - Create 2D Gaussian kernel
dog_kernel(sigma_small, sigma_large) - Create DoG kernel

Local Maxima Detection:

findlocalmax(imagestack, kernelsize; minval, use_gpu) - Find local maximum coordinates
genlocalmaximage(imagestack, kernelsize; minval, use_gpu) - Generate local max image

Coordinate Processing:

maxima2coords(imagestack) - Convert non-zero pixels to coordinates
removeoverlap(coords, args) - Remove overlapping detections

Box Extraction:

getboxstack(imagestack, coords, args) - Extract ROI patches at coordinates
fillbox!(box, imagestack, row, col, im, boxsize) - Fill single ROI patch

Helper Functions:

get_pixel_size(camera) - Extract pixel size from camera
photons_to_dog_threshold(min_photons, psf_sigma) - Convert photon threshold to DoG threshold
pixels_to_microns(pixel_coords, camera) - Coordinate conversion
extract_camera_roi(camera, row_range, col_range) - Extract camera calibration for ROI
get_variance_map(camera, imagesize) - Compute variance map from camera
reshape_for_flux(arr) - Reshape array for NNlib convolution

Algorithm Details

Detection Pipeline

Filtering: Apply DoG filter (standard or variance-weighted based on camera type)
Local Maxima: Find peaks above threshold using max pooling
Overlap Removal: Eliminate overlapping detections (keep higher intensity)
Box Extraction: Cut out ROI patches around each detection
ROIBatch Construction: Package results with camera calibration
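Step 3 can be pictured as a greedy pass. It keeps the brightest detection and discards any later detection whose box overlaps it too much. The overlap rule used here is an assumption about how the `overlap` parameter is interpreted: two boxes conflict when they overlap by more than `overlap` pixels along both axes. `remove_overlap_sketch` is a hypothetical helper, not `removeoverlap` itself.

```julia
# Hypothetical greedy overlap removal for one frame. Each detection is (row, col, value).
# Assumption: two boxes of side `boxsize` conflict when they overlap by more than
# `overlap` pixels along both axes; of a conflicting pair, the brighter one is kept.
function remove_overlap_sketch(dets::Vector{NTuple{3,Float64}}, boxsize, overlap)
    limit = boxsize - overlap                   # minimum allowed center separation per axis
    kept = NTuple{3,Float64}[]
    for d in sort(dets; by = d -> d[3], rev = true)   # brightest first
        clash = any(k -> abs(k[1] - d[1]) < limit && abs(k[2] - d[2]) < limit, kept)
        clash || push!(kept, d)
    end
    return kept
end

dets = [(10.0, 10.0, 50.0), (12.0, 11.0, 80.0), (30.0, 30.0, 20.0)]
kept = remove_overlap_sketch(dets, 7, 2.0)
```

Here the detections at (10, 10) and (12, 11) are closer than 7 - 2 = 5 pixels on both axes, so only the brighter one survives. The distant detection at (30, 30) is kept.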

PSF-Aware Parameter Conversion

When psf_sigma (microns) is provided:

# Example inputs (illustrative values)
psf_sigma = 0.13    # PSF sigma in microns
pixel_size = 0.1    # camera pixel size in microns
min_photons = 500.0

# Convert to pixels
psf_sigma_pixels = psf_sigma / pixel_size

# Automatic filter sizing
sigma_small = 1.0 * psf_sigma_pixels  # Match PSF
sigma_large = 2.0 * psf_sigma_pixels  # Background suppression

# Photon threshold → DoG intensity threshold (all sigmas in pixels)
σ_eff = sqrt(psf_sigma_pixels^2 + sigma_small^2)  # Effective sigma after filtering
peak_filtered = min_photons / (2π * σ_eff^2)
minval = 0.65 * peak_filtered  # DoG reduction factor
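The following worked example applies the conversion with these assumptions: 100 nm pixels (pixel_size = 0.1 µm), psf_sigma = 0.13 µm, min_photons = 500, and unit effective gain (IdealCamera). Both sigmas enter σ_eff in pixels.

```latex
\begin{aligned}
\sigma_{\mathrm{px}} &= \frac{0.13\ \mu\mathrm{m}}{0.1\ \mu\mathrm{m/px}} = 1.3\ \mathrm{px},
  \qquad \sigma_{\mathrm{small}} = 1.3,\quad \sigma_{\mathrm{large}} = 2.6\\
\sigma_{\mathrm{eff}}^2 &= \sigma_{\mathrm{px}}^2 + \sigma_{\mathrm{small}}^2 = 1.69 + 1.69 = 3.38\ \mathrm{px}^2\\
\text{peak\_filtered} &= \frac{500}{2\pi \times 3.38} \approx 23.54\\
\text{minval} &= 0.65 \times 23.54 \approx 15.30
\end{aligned}
```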

Variance-Weighted Convolution

For sCMOS cameras with readnoise map:

# At each pixel (i,j):
weightsum = sum(gaussian_weight / variance[ii,jj] * input[ii,jj])
varsum = sum(gaussian_weight / variance[ii,jj])
output[i,j] = weightsum / varsum

Implements optimal inverse variance weighting for spatially-varying noise.
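The per-pixel formula can be written directly as a loop over one frame. This is a minimal CPU sketch, not the KernelAbstractions kernel the package uses. The name `variance_weighted_smooth` is hypothetical, and the 3σ kernel truncation is an assumption. The demo places a spurious spike on a single very noisy pixel and compares the weighted result with uniform-variance smoothing.

```julia
# Hypothetical CPU sketch of inverse-variance-weighted Gaussian smoothing for one frame:
# output[i,j] = Σ (g/var)·x / Σ (g/var) over the in-bounds neighborhood.
function variance_weighted_smooth(img::AbstractMatrix, variance::AbstractMatrix, sigma)
    r = ceil(Int, 3sigma)                       # kernel radius (assumed 3σ truncation)
    H, W = size(img)
    out = similar(img, Float64)
    for j in 1:W, i in 1:H
        weightsum = 0.0
        varsum = 0.0
        for dj in -r:r, di in -r:r
            ii, jj = i + di, j + dj
            (1 <= ii <= H && 1 <= jj <= W) || continue
            w = exp(-(di^2 + dj^2) / (2sigma^2)) / variance[ii, jj]   # gaussian_weight / variance
            weightsum += w * img[ii, jj]
            varsum += w
        end
        out[i, j] = weightsum / varsum
    end
    return out
end

readnoise = fill(1.5, 9, 9)
readnoise[5, 6] = 30.0                          # one very noisy pixel
variance = readnoise .^ 2                       # variance = readnoise²

img = fill(100.0, 9, 9)
img[5, 6] = 5000.0                              # the noisy pixel reads a spurious spike
weighted = variance_weighted_smooth(img, variance, 1.3)
plain = variance_weighted_smooth(img, fill(1.0, 9, 9), 1.3)   # uniform variance
```

With uniform variance the formula reduces to ordinary normalized Gaussian smoothing, so the spike spreads into its neighbors. With inverse-variance weighting the noisy pixel contributes almost nothing.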

Performance Notes

GPU Memory Management: Automatically batches frames if image stack exceeds GPU memory
Type Stability: All inputs converted to Float32 at entry point
Multi-GPU NVML Polling: Scans all GPUs via NVML without creating CUDA contexts. Checks free memory, process contention, and compute utilization. First GPU with sufficient memory and low contention wins.
Contention-Safe Retry: Unified retry loop handles all GPU failure modes (insufficient memory, TOCTOU context race, runtime OOM). Releases memory and re-polls with remaining timeout budget.
Memory Pool Reclaim: Calls GC.gc() + CUDA.reclaim() after both successful and failed GPU processing to return memory to the system, preventing finished jobs from blocking other processes.
Jittered Backoff: NVML polling uses jittered sleep intervals to avoid thundering herd when multiple processes compete for GPUs.
Backend Abstraction: KernelAbstractions enables same code for CPU/GPU variance weighting
Typical Speedup: 10-100x with GPU depending on image size and number of frames

api() returns this documentation as a plain String.


SMLMBoxer.convolve — Method

convolve(imagestack, kernel; use_gpu=false)

Convolve imagestack with given kernel using NNlib.

Arguments

imagestack: Input array of image data (H, W, 1, F)
kernel: Kernel to convolve with (K, K)

Keyword Arguments

use_gpu: Whether to use GPU

Returns

filtered_stack: Convolved image stack


SMLMBoxer.convolve_variance_weighted — Method

convolve_variance_weighted(imagestack, variance_map, sigma, use_gpu)

Apply variance-weighted Gaussian convolution using KernelAbstractions. Device-agnostic: works on CPU and GPU with same code.

Follows the same pattern as convolve(): the result stays on the GPU when use_gpu=true, letting interface.jl handle memory batching uniformly for both paths.

Arguments

imagestack: Input image (rows, cols, 1, frames) - CPU or GPU array
variance_map: Variance at each pixel (rows, cols)
sigma: Gaussian sigma
use_gpu: Use GPU if available

Returns

Variance-weighted filtered image (CuArray if use_gpu, Array otherwise)


SMLMBoxer.dog_filter — Method

dog_filter(imagestack, args)

Apply DoG filter to imagestack based on args. Uses variance-weighted filtering if sCMOS camera is provided.

Arguments

imagestack: Input array of image data
args: Arguments with sigma values and camera

Returns

filtered_stack: Filtered image stack


SMLMBoxer.dog_filter_variance_weighted — Method

dog_filter_variance_weighted(imagestack, sigma_small, sigma_large, args)

Apply variance-weighted DoG filter using sCMOS variance map. Implements SMITE-style inverse variance weighting during convolution.

Arguments

imagestack: Input image data (rows, cols, 1, frames)
sigma_small: Sigma for small Gaussian
sigma_large: Sigma for large Gaussian
args: GetBoxesArgs with camera

Returns

filtered_stack: Variance-weighted filtered image


SMLMBoxer.dog_kernel — Method

dog_kernel(sigma_small, sigma_large)

Compute a difference-of-Gaussians (DoG) kernel.

Arguments

sigma_small: Sigma for small Gaussian
sigma_large: Sigma for large Gaussian

Returns

dog: Difference of Gaussians kernel
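A DoG kernel of this kind can be sketched as the difference of two normalized Gaussians sampled on the same grid. The grid size (a 3σ radius of the larger sigma) and the helper names are assumptions. The package's dog_kernel may size or normalize its kernel differently.

```julia
# Hypothetical sketch: difference of two normalized Gaussians on one odd-sized grid.
function gaussian_kernel(sigma, ksize)
    r = (ksize - 1) ÷ 2
    k = [exp(-(x^2 + y^2) / (2sigma^2)) for y in -r:r, x in -r:r]
    return k ./ sum(k)                          # normalize to unit sum
end

function dog_kernel_sketch(sigma_small, sigma_large)
    ksize = 2 * ceil(Int, 3sigma_large) + 1     # grid sized from the larger sigma (assumed)
    return gaussian_kernel(sigma_small, ksize) .- gaussian_kernel(sigma_large, ksize)
end

k = dog_kernel_sketch(1.3, 2.6)
```

Each Gaussian sums to one, so the DoG kernel sums to zero up to rounding. It therefore responds to spots but not to a constant background.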


SMLMBoxer.estimate_gpu_memory — Method

estimate_gpu_memory(imagestack, camera) -> Int

Estimate GPU memory required for processing imagestack.

Arguments

imagestack: Input image array (H, W, ..., F)
camera: Camera object (affects memory multiplier)

Returns

Estimated bytes needed for GPU processing

Memory Model

Standard DoG path: 6x input size

Input, filtered_small, filtered_large, DoG result, local-max temporaries, GC margin

Variance-weighted (SCMOSCamera): 8x input size

Additional workspace for per-pixel variance weighting (in-place DoG saves one copy)
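The multipliers above translate into a simple byte count. The sketch assumes 4-byte Float32 pixels, since inputs are converted to Float32 at entry, and that the estimate is exactly multiplier × input bytes. `estimate_bytes_sketch` is hypothetical.

```julia
# Hypothetical restatement of the memory model: 6x input bytes for the standard DoG
# path, 8x for the variance-weighted (sCMOS) path, with Float32 pixels assumed.
function estimate_bytes_sketch(height, width, nframes; variance_weighted::Bool = false)
    multiplier = variance_weighted ? 8 : 6
    return multiplier * height * width * nframes * sizeof(Float32)
end

std_bytes = estimate_bytes_sketch(512, 512, 1000)                          # standard DoG
scmos_bytes = estimate_bytes_sketch(512, 512, 1000; variance_weighted = true)
```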


SMLMBoxer.estimate_gpu_memory_per_frame — Method

estimate_gpu_memory_per_frame(height, width, camera) -> Int

Estimate GPU memory required per frame.

Arguments

height: Image height in pixels
width: Image width in pixels
camera: Camera object (affects memory multiplier)

Returns

Estimated bytes needed per frame for GPU processing


SMLMBoxer.extract_camera_roi — Method

extract_camera_roi(camera::AbstractCamera, row_range, col_range)

Extract a camera ROI with calibration data for the specified pixel region.

Arguments

camera: Source camera object
row_range: Range of rows to extract
col_range: Range of columns to extract

Returns

Camera object of the same type with ROI calibration data


SMLMBoxer.fillbox! — Method

fillbox!(box, imagestack, row, col, im, boxsize)

Fill a box with a crop from the imagestack.

Arguments

box: Array to fill with box crop
imagestack: Input image stack
row, col, im: Row, column, and frame index of the crop
boxsize: Size of box

Returns

boxcoords: Upper-left corners of boxes, N × (row, col, im)
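A single box fill can be sketched as copying a boxsize × boxsize window out of one frame. Two details are assumptions here: that `row, col` denote the box center, and how borders are handled (the window is centered and shifted inward at the image edge). `fillbox_sketch!` is a hypothetical helper, not the package function.

```julia
# Hypothetical sketch: copy a boxsize×boxsize window centered on (row, col) of frame `im`
# into `box`, shifting the window inward at the borders, and return its upper-left corner.
function fillbox_sketch!(box::AbstractMatrix, imagestack::AbstractArray{<:Real,3},
                         row::Int, col::Int, im::Int, boxsize::Int)
    H, W = size(imagestack, 1), size(imagestack, 2)
    half = boxsize ÷ 2
    r0 = clamp(row - half, 1, H - boxsize + 1)   # upper-left row, kept in bounds
    c0 = clamp(col - half, 1, W - boxsize + 1)   # upper-left column, kept in bounds
    box .= @view imagestack[r0:r0+boxsize-1, c0:c0+boxsize-1, im]
    return (r0, c0)
end

stack = reshape(Float32.(1:(10 * 10 * 2)), 10, 10, 2)
box = zeros(Float32, 5, 5)
corner_inner = fillbox_sketch!(box, stack, 5, 5, 1, 5)   # fully inside frame 1
inner_box = copy(box)
corner_edge = fillbox_sketch!(box, stack, 1, 10, 2, 5)   # near the top-right corner of frame 2
```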


SMLMBoxer.find_best_gpu — Method

find_best_gpu() -> Int

Find the GPU with the most free memory and switch to it.

Returns the device index (0-based). On single-GPU systems, returns 0 immediately.

Uses NVML to query free memory on each device without creating CUDA contexts, avoiding cuDevicePrimaryCtxRetain OOM errors under multi-process contention. Only calls CUDA.device!() once on the selected device.

Example

best = find_best_gpu()  # Switches to best GPU
# Now all CUDA operations use that GPU


SMLMBoxer.findlocalmax — Method

findlocalmax(imagestack, kernelsize; minval=0.0, use_gpu=false)

Find the coordinates of local maxima in an image.

Arguments

imagestack: An array of real numbers representing the image data.
kernelsize: The size of the kernel used to identify local maxima.

Keyword Arguments

minval: The minimum value a local maximum must have to be considered valid (default: 0.0).
use_gpu: Whether or not to use GPU acceleration (default: false).

Returns

coords: The coordinates of the local maxima in the image.


SMLMBoxer.gaussian_2d — Method

gaussian_2d(sigma, kernelsize)

Create a 2D Gaussian kernel.

Arguments

sigma: Standard deviation
kernelsize: Kernel size

Returns

kernel: Normalized 2D Gaussian kernel


SMLMBoxer.genlocalmaximage — Method

genlocalmaximage(imagestack, kernelsize; minval=0.0, use_gpu=false)

Generate an image highlighting the local maxima using NNlib max pooling.

Arguments

imagestack: An array of real numbers representing the image data (H, W, 1, F).
kernelsize: The size of the kernel used to identify local maxima.

Keyword Arguments

minval: The minimum value a local maximum must have to be considered valid (default: 0.0).
use_gpu: Whether or not to use GPU acceleration (default: false).

Returns

localmaximage: An image with local maxima highlighted.
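Max pooling with stride 1 amounts to comparing each pixel with the maximum of its neighborhood. The sketch below does that with plain loops for one frame. The name `localmax_image_sketch`, the border clipping, and the strict `> minval` test are assumptions, not confirmed details of genlocalmaximage.

```julia
# Hypothetical CPU sketch: keep a pixel when it equals the maximum of its
# kernelsize×kernelsize window (stride 1, window clipped at borders) and exceeds minval;
# every other pixel becomes zero.
function localmax_image_sketch(img::AbstractMatrix{T}, kernelsize::Int;
                               minval = zero(T)) where {T<:Real}
    r = kernelsize ÷ 2
    H, W = size(img)
    out = zeros(T, H, W)
    for j in 1:W, i in 1:H
        window = @view img[max(1, i - r):min(H, i + r), max(1, j - r):min(W, j + r)]
        v = img[i, j]
        if v == maximum(window) && v > minval
            out[i, j] = v
        end
    end
    return out
end

img = zeros(Float32, 8, 8)
img[3, 3] = 5f0       # a peak
img[3, 4] = 4f0       # shoulder of that peak, suppressed
img[7, 7] = 2f0       # isolated but below the threshold
lm = localmax_image_sketch(img, 3; minval = 3f0)
```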


SMLMBoxer.get_effective_gain — Method

get_effective_gain(camera::AbstractCamera)

Get effective gain for converting photons to image ADU units (ADU/photon).

For IdealCamera: returns 1.0 (assumes the image is in photon units).
For SCMOSCamera: returns QE / gain (the photon → ADU conversion factor).

SMLMData defines gain as e⁻/ADU (electrons per ADU)
Physical conversion: photon → QE electrons → electrons/gain ADU
So: ADU/photon = QE / gain

Used to convert photon-based thresholds to image-unit thresholds.
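The conversion can be checked with simple arithmetic. The QE and gain values below are illustrative only, not taken from any real camera. Gain is in e⁻/ADU, as stated above. The one-line `effective_gain` helper is hypothetical and stands in for get_effective_gain.

```julia
# Hypothetical arithmetic check of the ADU/photon factor, with gain in e⁻/ADU.
effective_gain(qe, gain_e_per_adu) = qe / gain_e_per_adu

qe = 0.8                 # 80% of photons become photoelectrons (illustrative)
gain = 0.5               # 0.5 e⁻ per ADU, i.e. 2 ADU per electron (illustrative)
g = effective_gain(qe, gain)       # ADU per photon

photons = 500.0
adu = photons * g                  # 500 photons expressed in image ADU units
```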


SMLMBoxer.get_pixel_size — Method

get_pixel_size(camera::AbstractCamera)

Extract the pixel size (in microns) from the camera pixel edges. Assumes approximately square pixels and returns the x-direction pixel size.

For non-square pixels the x and y pixel sizes may differ slightly; this function returns the x-direction size for simplicity.

Arguments

camera: Camera object (IdealCamera or SCMOSCamera)

Returns

Pixel size in microns (x-direction)


SMLMBoxer.get_variance_map — Method

get_variance_map(camera::AbstractCamera, imagesize)

Compute variance map from camera calibration.

Arguments

camera: Camera object with noise calibration
imagesize: Tuple of (nrows, ncols) for the image

Returns

Variance map (variance = readnoise²) matching image dimensions


SMLMBoxer.getboxes — Method

getboxes(imagestack, camera=nothing; kwargs...) -> (ROIBatch, BoxesInfo)

Detect particles/blobs in a multidimensional image stack and return ROI batch with location tracking and processing metadata.

Arguments

imagestack::AbstractArray{<:Real}: The input image stack. Should be 2D or 3D.
camera::Union{AbstractCamera,Nothing}: Optional camera object (IdealCamera or SCMOSCamera) from SMLMData. If not provided, a default IdealCamera is created.

Primary Interface (Recommended - PSF-Aware)

psf_sigma::Real: PSF sigma in microns (physical units, e.g., 0.13 for 130nm PSF). Automatically converted to pixels using camera pixel size and sets optimal DoG filter parameters. Requires camera to be provided.
min_photons::Real: Minimum total photons for detection (default: 500.0). Automatically converted to appropriate intensity threshold.

Advanced Interface (Direct Control)

For expert users who want direct control over filter parameters:

sigma_small::Real: Small Gaussian sigma in pixels (default: 1.0).
sigma_large::Real: Large Gaussian sigma in pixels (default: 2.0).
minval::Real: DoG filter intensity threshold (default: 0.0).

Note: If psf_sigma is provided, it overrides sigma_small, sigma_large, and minval.

Other Parameters

boxsize::Int: Size of the box to cut out around each local maximum in pixels (default: 7).
overlap::Real: Maximum overlap allowed between boxes in pixels (default: 2.0).
backend::Symbol: Compute backend - :cpu, :gpu, or :auto (default: :auto).
- :cpu - Always use CPU
- :gpu - Require GPU, wait for memory if needed (waits forever by default)
- :auto - Try GPU with timeout, fall back to CPU if memory unavailable
auto_timeout::Real: Max seconds to wait for GPU memory in :auto mode (default: 300.0).
gpu_timeout::Real: Max seconds to wait for GPU memory in :gpu mode (default: Inf).
on_wait::Function: Optional callback (elapsed, available, required) -> nothing for wait progress.

Returns

Tuple of (ROIBatch, BoxesInfo):

ROIBatch with the following fields:

data: ROI stack (boxsize × boxsize × n_rois) containing image patches
x_corners: Vector of x (column) corner positions in camera coordinates
y_corners: Vector of y (row) corner positions in camera coordinates
frame_indices: Vector of frame indices for each ROI
camera: Camera object (provided or default IdealCamera)
roi_size: Size of each ROI (square)

BoxesInfo with the following fields:

backend: Compute backend used (:gpu or :cpu)
elapsed_s: Wall time in seconds
device_id: GPU device ID (0-based), or -1 for CPU
n_rois: Number of ROIs detected
batch_size: Frames per batch during processing
n_batches: Number of batches processed
memory_per_batch: Estimated memory per batch in bytes

Details on filtering

The image stack is convolved with a difference of Gaussians (DoG) filter to identify blobs and local maxima. The DoG is computed from two Gaussian kernels with standard deviations sigma_small and sigma_large.

When using the PSF-aware interface with psf_sigma (in microns):

psf_sigma is converted to pixels using camera pixel size
sigma_small = 1.0 × psf_sigma_pixels (matches PSF for optimal blob detection)
sigma_large = 2.0 × psf_sigma_pixels (background suppression)
minval is automatically calculated from min_photons accounting for PSF spreading and DoG response

Variance-Weighted Filtering (sCMOS)

When an SCMOSCamera is provided, the package uses variance-weighted filtering based on the SMITE algorithm. Each pixel's contribution to the convolution is weighted by:

weight = gaussian_kernel / variance

where variance = readnoise². This implements optimal inverse variance weighting:

Low-noise pixels receive high weight (strong influence on detection)
High-noise pixels receive low weight (reduced influence, avoiding false positives)

This significantly improves detection sensitivity in sCMOS data with spatially-varying noise.

GPU Acceleration: Variance-weighted filtering uses KernelAbstractions.jl for device-agnostic computation. The same kernel code runs on both CPU and GPU, automatically selected based on backend. This provides GPU acceleration for sCMOS cameras (10-100x speedup on large images).

Standard Filtering (IdealCamera or no camera)

Standard DoG convolution is used when no camera is provided or with an IdealCamera. The convolution is performed with NNlib, on the GPU (via cuDNN) or on the CPU, depending on backend.

After filtering, local maxima above minval are identified. Boxes are cut out around each maximum, excluding overlaps.

Examples

# Recommended: PSF-aware detection with physical units
camera = IdealCamera(1:256, 1:256, 0.1f0)  # 256×256 pixels, 100nm pixel size

(roi_batch, info) = getboxes(imagestack, camera;
    psf_sigma = 0.13,              # PSF sigma in microns (physical units)
    min_photons = 500.0,           # Detect emitters with ≥500 photons
    boxsize = 11)

# Access results
boxes = roi_batch.data             # (11 × 11 × n_rois)
x_corners = roi_batch.x_corners    # x (col) positions
y_corners = roi_batch.y_corners    # y (row) positions
frames = roi_batch.frame_indices

# Check processing info
println("Backend: ", info.backend)
println("Elapsed: ", info.elapsed_s * 1000, " ms")

# Advanced: Direct control over filter parameters
(roi_batch, info) = getboxes(imagestack;
    sigma_small = 1.5,  # Custom small Gaussian sigma
    sigma_large = 3.0,  # Custom large Gaussian sigma
    minval = 10.0)      # Custom intensity threshold

# Iterate over ROIs
for roi in roi_batch
    # roi is a SingleROI with .data, .corner, .frame_idx
    process(roi.data)   # process: placeholder for your own analysis function
end

SMLMBoxer.getboxstack — Method

getboxstack(imagestack, coords, args::GetBoxesArgs)

Cut out box regions from imagestack centered on coords.

Arguments

imagestack: Input image stack
coords: Coordinates of the box centers
args: GetBoxesArgs parameters

Returns

boxstack: Array with box crops from imagestack
boxcoords: Upper left corners of boxes
camera_rois: Camera ROIs for each box (if camera provided)
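Cutting boxes can be sketched as shown below. The sketch assumes coordinates are integer (row, col, frame) rows and that a box near the image edge is shifted inward (clamped) so it stays fully inside the image. The package's actual edge handling is not described here. `cut_boxes` is a hypothetical name, and the sketch omits the camera_rois output.

```julia
# Cut boxsize×boxsize crops centred on integer (row, col, frame) coordinates.
# Assumption: boxes touching the border are clamped to lie inside the image.
function cut_boxes(stack::AbstractArray{T,3}, coords::AbstractMatrix{<:Integer},
                   boxsize::Integer) where {T}
    nr, nc, _ = size(stack)
    n = size(coords, 1)
    boxes = Array{T,3}(undef, boxsize, boxsize, n)
    corners = Matrix{Int}(undef, n, 2)     # upper-left (row, col) of each box
    half = boxsize ÷ 2
    for k in 1:n
        r, c, f = coords[k, 1], coords[k, 2], coords[k, 3]
        r0 = clamp(r - half, 1, nr - boxsize + 1)
        c0 = clamp(c - half, 1, nc - boxsize + 1)
        boxes[:, :, k] = @view stack[r0:r0+boxsize-1, c0:c0+boxsize-1, f]
        corners[k, :] = [r0, c0]
    end
    return boxes, corners
end
```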

SMLMBoxer.has_cuda — Method

has_cuda() -> Bool

Check if CUDA is available. Wrapper for CUDA.functional().

SMLMBoxer.maxima2coords — Method

maxima2coords(imagestack)

Get the coordinates of all nonzero pixels in the input stack.

Arguments

imagestack: Input image stack

Returns

coords: List of coords for each frame (always Float32)
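The following sketch shows one way to collect per-frame coordinates of nonzero pixels as Float32. It assumes a rows × cols × frames stack and returns one N×2 (row, col) matrix per frame. That output layout is an assumption, and `nonzero_coords` is a hypothetical name.

```julia
# For each frame, gather (row, col) of nonzero pixels into an N×2 Float32 matrix.
function nonzero_coords(stack::AbstractArray{<:Real,3})
    coords = Vector{Matrix{Float32}}()
    for f in axes(stack, 3)
        idx = findall(!iszero, view(stack, :, :, f))   # CartesianIndex{2} list, column-major
        m = Matrix{Float32}(undef, length(idx), 2)
        for (k, I) in enumerate(idx)
            m[k, 1] = I[1]   # row
            m[k, 2] = I[2]   # col
        end
        push!(coords, m)
    end
    return coords
end
```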

SMLMBoxer.photons_to_dog_threshold — Method

photons_to_dog_threshold(min_photons, psf_sigma; effective_gain=1.0)

Convert total photon count threshold to DoG filter intensity threshold in image units (ADU).

Arguments

min_photons: Minimum signal photons above background for detection
psf_sigma: PSF sigma in pixels
effective_gain: Camera gain factor (QE × gain) to convert photons → ADU (default: 1.0)

Returns

minval: DoG filter intensity threshold in image units (ADU)

Physics

For a 2D Gaussian PSF with total photon count N and width σ_psf, the peak intensity is:

$$I_{\text{peak}} = \frac{N}{2\pi\sigma_{\text{psf}}^2} \quad \text{[photons/pixel]}$$

After convolution with the small Gaussian filter (sigma_small = 1.0 × psf_sigma), the effective sigma becomes:

$$\sigma_{\text{eff}} = \sqrt{\sigma_{\text{psf}}^2 + \sigma_{\text{small}}^2}$$

The peak after filtering is:

$$I_{\text{filtered}} = \frac{N}{2\pi\sigma_{\text{eff}}^2} \quad \text{[photons/pixel]}$$

The DoG response (small minus large Gaussian) has a lower peak than the small Gaussian alone. For sigma_large = 2 × sigma_small, the DoG peak is approximately 0.65× the small-Gaussian peak.

For raw camera data in ADU, the threshold is scaled by effective_gain = QE × gain.
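The physics steps above can be chained into a single threshold. The sketch assumes the steps simply multiply (filtered peak × DoG factor × effective_gain), and it keeps the documented 0.65 as a parameter rather than deriving it. `dog_threshold_sketch` and `continuous_dog_ratio` are hypothetical names and are not the package's `photons_to_dog_threshold`.

```julia
# Assumed composition: minval ≈ dog_factor × N / (2π σ_eff²) × effective_gain
function dog_threshold_sketch(min_photons, psf_sigma; effective_gain=1.0, dog_factor=0.65)
    sigma_small = 1.0 * psf_sigma
    sigma_eff2 = psf_sigma^2 + sigma_small^2          # σ_eff² = σ_psf² + σ_small²
    i_filtered = min_photons / (2π * sigma_eff2)      # peak after the small Gaussian
    return dog_factor * i_filtered * effective_gain   # DoG peak, scaled to ADU
end

# Continuous, untruncated DoG-to-small-Gaussian peak ratio for a Gaussian PSF.
function continuous_dog_ratio(psf_sigma, sigma_small, sigma_large)
    s2 = psf_sigma^2 + sigma_small^2
    l2 = psf_sigma^2 + sigma_large^2
    return (1 / s2 - 1 / l2) / (1 / s2)               # equals 1 - s2 / l2
end
```

For 500 photons and σ_psf = 1.3 pixels, the sketch gives a threshold of about 15.3 at unit gain. In the continuous idealization, `continuous_dog_ratio` evaluates to 0.6 for sigma_small = σ_psf and sigma_large = 2σ_psf, whatever the value of σ_psf. The documented factor of about 0.65 is therefore not reproduced by this simple model.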

SMLMBoxer.pixels_to_microns — Method

pixels_to_microns(pixel_coords, camera::AbstractCamera)

Convert pixel coordinates (row, col) to micron coordinates (x, y) using camera geometry.

Arguments

pixel_coords: N×2 matrix of (row, col) coordinates
camera: Camera object with pixel_edges_x and pixel_edges_y

Returns

N×2 matrix of (x, y) coordinates in microns
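Here is a sketch of the pixel-to-micron conversion. It assumes uniform pixel edges, 1-based indexing, and that integer coordinate k refers to the centre of pixel k, so that x is measured from the first x edge. The package's exact convention is not stated here. The sketch maps columns to x and rows to y, as described above. `pixels_to_microns_sketch` is a hypothetical name that takes the edge vectors directly rather than a camera object.

```julia
# (row, col) pixel coordinates -> (x, y) microns.
# Assumptions: uniform pixel size; integer k is the centre of pixel k.
function pixels_to_microns_sketch(pixel_coords::AbstractMatrix,
                                  edges_x::AbstractVector, edges_y::AbstractVector)
    px = edges_x[2] - edges_x[1]              # pixel width along x (columns)
    py = edges_y[2] - edges_y[1]              # pixel height along y (rows)
    rows = pixel_coords[:, 1]
    cols = pixel_coords[:, 2]
    x = edges_x[1] .+ (cols .- 0.5) .* px
    y = edges_y[1] .+ (rows .- 0.5) .* py
    return hcat(x, y)
end
```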

SMLMBoxer.poll_gpu_nvml — Method

poll_gpu_nvml(required_bytes; timeout=30.0, poll=0.5, on_wait=nothing) -> (Bool, Int)

Poll all GPUs via NVML until one has sufficient free memory and low contention. No CUDA context is created, so polling is safe under multi-process contention.

Arguments

required_bytes: Minimum bytes needed (1.5x safety margin applied internally)
timeout: Maximum seconds to poll (default 30.0, use Inf for unlimited)
poll: Seconds between checks (default 0.5)
on_wait: Optional callback(elapsed, best_available, required) called each poll

Returns

(true, device_id) if a GPU became available, (false, -1) if timeout reached

Contention Detection

A GPU is considered contended when other processes are present AND either:

Free memory is insufficient (< required × 1.5)
Compute utilization exceeds 90%

When other processes are present but memory is sufficient and utilization is low, the GPU is still considered available.
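The contention rule can be written as a pair of predicates. This sketch assumes the inputs (free bytes, utilization percent, count of other processes) have already been read from NVML. The helper names are hypothetical, and no NVML calls are made.

```julia
# Hypothetical restatement of the documented contention rule.
const SAFETY_MARGIN = 1.5
const UTIL_LIMIT = 90.0   # percent

has_memory(free_bytes, required_bytes) = free_bytes >= required_bytes * SAFETY_MARGIN

# Contended only when other processes are present AND memory or compute is tight.
function is_contended(free_bytes, required_bytes, util_pct, n_other_procs)
    n_other_procs > 0 || return false
    return !has_memory(free_bytes, required_bytes) || util_pct > UTIL_LIMIT
end

# Usable: enough free memory and not contended.
gpu_available(free, req, util, nproc) =
    has_memory(free, req) && !is_contended(free, req, util, nproc)
```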

SMLMBoxer.recommend_batch_size — Method

recommend_batch_size(height, width; backend=:auto, memory_fraction=0.8) -> Int

Return the recommended maximum number of frames to load at once, given memory constraints.

This helps users decide how much data to load before calling getboxes(). For very large datasets, loading data in chunks of this size ensures efficient processing without running out of memory.

Arguments

height::Int: Image height in pixels
width::Int: Image width in pixels
backend::Symbol: Compute backend :cpu, :gpu, or :auto (default: :auto)
memory_fraction::Real: Fraction of free memory to use (default: 0.8)

Returns

Maximum recommended number of frames to load at once

Memory Model

The processing pipeline requires approximately 6× the raw image size:

Input imagestack
Filtered stack (DoG output)
Local maxima detection intermediates
Coordinate arrays
Box extraction workspace
Broadcast temporaries
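The 6× memory model translates into a frame count as shown below. The sketch assumes Float32 pixels (4 bytes) and takes the free-memory figure as an input instead of querying the device. It also always returns at least one frame, which is an assumption. `frames_that_fit` is a hypothetical name, not the package's implementation.

```julia
# frames ≈ floor(memory_fraction × free_bytes / (6 × height × width × bytes_per_pixel))
function frames_that_fit(height::Integer, width::Integer, free_bytes::Real;
                         memory_fraction::Real=0.8,
                         bytes_per_pixel::Integer=sizeof(Float32),
                         overhead_factor::Real=6)
    bytes_per_frame = overhead_factor * height * width * bytes_per_pixel
    return max(1, floor(Int, memory_fraction * free_bytes / bytes_per_frame))  # at least 1 (assumption)
end
```

With 1 GiB free and 512×512 Float32 frames, each frame costs 6 × 512 × 512 × 4 = 6,291,456 bytes, so 0.8 GiB holds 136 frames.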

Example

using SMLMBoxer

# Check how many 512×512 frames to load at once
max_frames = recommend_batch_size(512, 512)
println("Load up to $max_frames frames at a time")

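# Load and process in chunks
# (total_frames, load_frames and camera stand in for your own data source and camera)
for chunk_start in 1:max_frames:total_frames
    chunk_end = min(chunk_start + max_frames - 1, total_frames)
    imagestack = load_frames(chunk_start:chunk_end)
    (roi_batch, info) = getboxes(imagestack, camera; psf_sigma=0.13)  # getboxes returns (roi_batch, info)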
    # ... process results
end

SMLMBoxer.removeoverlap — Method

removeoverlap(coords, args)

Remove overlapping coordinates based on their pairwise distance.

Arguments

coords: List of coords
args: Parameters

Returns

coords: Coords with overlaps removed
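The package's exact overlap rule (which candidate survives, and which distance metric applies) is not documented here, so the sketch below shows one plausible choice. Candidates are visited from brightest to dimmest. A candidate is kept only if it lies at least `min_dist` pixels (Chebyshev distance) from every already-kept candidate in the same frame. `remove_overlap_sketch`, the `intensity` input, and the (row, col, frame) layout are all assumptions.

```julia
# Greedy brightest-first overlap removal (an assumed strategy, not the package's).
function remove_overlap_sketch(coords::AbstractMatrix, intensity::AbstractVector, min_dist::Real)
    order = sortperm(intensity; rev=true)          # brightest first
    kept = Int[]
    for k in order
        clash = any(kept) do j
            coords[j, 3] == coords[k, 3] &&        # same frame
                max(abs(coords[j, 1] - coords[k, 1]),
                    abs(coords[j, 2] - coords[k, 2])) < min_dist
        end
        clash || push!(kept, k)
    end
    return coords[sort(kept), :]                   # keep original row order
end
```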

SMLMBoxer.reshape_for_flux — Method

reshape_for_flux(arr::AbstractArray)

Reshape an array by inserting the singleton dimensions that NNlib convolution expects.

Arguments

arr: Input array, must be 2D or 3D

Returns

Reshaped array with added singleton dimensions
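NNlib convolutions take arrays laid out as (spatial..., channels, batch). A sketch of the reshape follows, assuming a single channel and treating frames as the batch dimension. This matches the (nrows, ncols, 1, nframes) layout used by the batched kernel. `reshape_for_flux_sketch` is a hypothetical name.

```julia
# 2D (rows, cols)         -> (rows, cols, 1, 1)
# 3D (rows, cols, frames) -> (rows, cols, 1, frames)
function reshape_for_flux_sketch(arr::AbstractArray)
    if ndims(arr) == 2
        return reshape(arr, size(arr, 1), size(arr, 2), 1, 1)
    elseif ndims(arr) == 3
        return reshape(arr, size(arr, 1), size(arr, 2), 1, size(arr, 3))
    else
        throw(ArgumentError("expected a 2D or 3D array, got $(ndims(arr))D"))
    end
end
```

`reshape` returns a view that shares memory with the input, so no pixel data is copied.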

SMLMBoxer.select_backend — Method

select_backend(backend::Symbol, required_bytes;
               auto_timeout=300.0, gpu_timeout=Inf, on_wait=nothing) -> (Symbol, Int)

Select compute backend with two-layer GPU contention handling.

Layer 1 (NVML polling): Scans all GPUs via NVML without creating CUDA contexts. Polls until the timeout expires, using jittered backoff between checks. The first GPU with sufficient free memory and low contention wins. Safe under multi-process contention.

Layer 2 (runtime try/catch): Applied by caller for :auto mode. If CUDA errors occur during processing despite NVML pre-check, falls back to CPU.

Arguments

backend: :cpu, :gpu, or :auto
required_bytes: Estimated GPU memory needed for processing
auto_timeout: Max wait for :auto mode before CPU fallback (default 300.0)
gpu_timeout: Max wait for :gpu mode (default Inf - wait forever)
on_wait: Optional callback(elapsed, available, required)

Returns

(backend::Symbol, device_id::Int) - selected backend and GPU device (0-based, -1 for CPU)

Behavior

:cpu - Returns (:cpu, -1) immediately
:gpu - NVML poll for a device, then CUDA wait_for_gpu_memory. Throws an error if no GPU becomes available before the timeout
:auto - NVML poll for device with timeout, falls back to (:cpu, -1) with warning
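The three-way behaviour above can be sketched as a dispatch on the backend symbol. Here `poll` is a hypothetical stand-in for the NVML polling layer: it receives a timeout and returns `(found, device_id)`. The sketch omits the follow-up CUDA `wait_for_gpu_memory` step for `:gpu` and the runtime try/catch fallback of Layer 2.

```julia
# Backend selection logic; `poll(timeout) -> (found::Bool, device_id::Int)` is a stand-in.
function select_backend_sketch(backend::Symbol, poll::Function;
                               auto_timeout=300.0, gpu_timeout=Inf)
    if backend === :cpu
        return (:cpu, -1)                                   # no polling at all
    elseif backend === :gpu
        found, dev = poll(gpu_timeout)
        found || error("no GPU became available within $(gpu_timeout) s")
        return (:gpu, dev)
    elseif backend === :auto
        found, dev = poll(auto_timeout)
        if !found
            @warn "No GPU available within $(auto_timeout) s; falling back to CPU"
            return (:cpu, -1)
        end
        return (:gpu, dev)
    else
        throw(ArgumentError("backend must be :cpu, :gpu, or :auto"))
    end
end
```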

SMLMBoxer.variance_weighted_gaussian_kernel! — Method

variance_weighted_gaussian_kernel!(output, input, variance, sigma, winsize)

KernelAbstractions kernel for variance-weighted Gaussian convolution. Implements SMITE-style inverse variance weighting.

This follows the same KernelAbstractions pattern used in GaussMLE.jl (kernel-abstract branch) for seamless CPU/GPU execution and consistent API across JuliaSMLM packages.

Arguments

output: Output array (nrows, ncols)
input: Input array (nrows, ncols)
variance: Variance map (nrows, ncols)
sigma: Gaussian sigma
winsize: Window size (pixels)

Note

The same kernel code runs on the CPU (via the CPU() backend) or on the GPU (via CUDABackend()). The backend is selected automatically from the use_gpu parameter.

SMLMBoxer.variance_weighted_gaussian_kernel_batched! — Method

variance_weighted_gaussian_kernel_batched!(output, input, variance, sigma, winsize, nrows, ncols)

Batched KernelAbstractions kernel for variance-weighted Gaussian convolution. Processes all frames in a single kernel launch via 3D ndrange=(nrows, ncols, nframes), eliminating per-frame launch overhead.

Arguments

output: Output array (nrows, ncols, 1, nframes)
input: Input array (nrows, ncols, 1, nframes)
variance: Variance map (nrows, ncols)
sigma: Gaussian sigma
winsize: Window size (pixels)
nrows: Number of rows (passed explicitly for bounds checking)
ncols: Number of columns
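The batched index space can be mirrored on the CPU with plain loops. Each (i, j, f) iteration plays the role of one work item in ndrange = (nrows, ncols, nframes), and every frame shares the same 2D variance map. Three things are assumptions here: `winsize` is read as the full window width, weights are renormalized per pixel, and the function name is hypothetical. The sketch does not use KernelAbstractions.

```julia
# CPU loop version of the batched layout: output/input are (nrows, ncols, 1, nframes).
function batched_variance_weighted!(output::AbstractArray{<:Real,4},
                                    input::AbstractArray{<:Real,4},
                                    variance::AbstractMatrix, sigma::Real,
                                    winsize::Integer, nrows::Integer, ncols::Integer)
    nframes = size(input, 4)
    r = winsize ÷ 2                                     # assumption: winsize = full width
    for f in 1:nframes, j in 1:ncols, i in 1:nrows      # one iteration per (i, j, f)
        num = 0.0
        den = 0.0
        for dj in -r:r, di in -r:r
            ii, jj = i + di, j + dj
            (1 <= ii <= nrows && 1 <= jj <= ncols) || continue   # explicit bounds check
            w = exp(-(di^2 + dj^2) / (2 * sigma^2)) / variance[ii, jj]
            num += w * input[ii, jj, 1, f]
            den += w
        end
        output[i, j, 1, f] = num / den
    end
    return output
end
```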

SMLMBoxer.wait_for_gpu_memory — Method

wait_for_gpu_memory(required_bytes; timeout=30.0, poll=0.5, on_wait=nothing) -> Bool

Wait until the current GPU device has sufficient available memory. Uses CUDA calls (requires an active context). Used by :gpu mode after device selection.

Arguments

required_bytes: Minimum bytes needed (with safety margin applied internally)
timeout: Maximum seconds to wait (default 30.0, use Inf for unlimited)
poll: Seconds between checks (default 0.5)
on_wait: Optional callback(elapsed, available, required) called each poll

Returns

true if memory became available, false if timeout reached
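The timeout, poll, and on_wait arguments follow a standard poll-until-ready loop, sketched below. `check` is a hypothetical stand-in for the CUDA memory query and returns `(ok, available)`. The sketch applies no safety margin and makes no CUDA calls.

```julia
# Generic poll-until-ready loop mirroring the (timeout, poll, on_wait) arguments.
function wait_until(check::Function, required; timeout=30.0, poll=0.5, on_wait=nothing)
    t0 = time()
    while true
        ok, available = check()
        ok && return true                           # memory became available
        elapsed = time() - t0
        elapsed >= timeout && return false          # gave up
        on_wait === nothing || on_wait(elapsed, available, required)
        sleep(poll)
    end
end
```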
