API Reference
- SMLMBoxer.SMLMBoxer
- SMLMBoxer.BoxerConfig
- SMLMBoxer.BoxesInfo
- SMLMBoxer.GetBoxesArgs
- SMLMBoxer._getboxes_impl
- SMLMBoxer._gpu_maxima2coords
- SMLMBoxer._process_with_batching
- SMLMBoxer.api
- SMLMBoxer.convolve
- SMLMBoxer.convolve_variance_weighted
- SMLMBoxer.dog_filter
- SMLMBoxer.dog_filter_variance_weighted
- SMLMBoxer.dog_kernel
- SMLMBoxer.estimate_gpu_memory
- SMLMBoxer.estimate_gpu_memory_per_frame
- SMLMBoxer.extract_camera_roi
- SMLMBoxer.fillbox!
- SMLMBoxer.find_best_gpu
- SMLMBoxer.findlocalmax
- SMLMBoxer.gaussian_2d
- SMLMBoxer.genlocalmaximage
- SMLMBoxer.get_effective_gain
- SMLMBoxer.get_pixel_size
- SMLMBoxer.get_variance_map
- SMLMBoxer.getboxes
- SMLMBoxer.getboxstack
- SMLMBoxer.has_cuda
- SMLMBoxer.maxima2coords
- SMLMBoxer.photons_to_dog_threshold
- SMLMBoxer.pixels_to_microns
- SMLMBoxer.poll_gpu_nvml
- SMLMBoxer.recommend_batch_size
- SMLMBoxer.removeoverlap
- SMLMBoxer.reshape_for_flux
- SMLMBoxer.select_backend
- SMLMBoxer.variance_weighted_gaussian_kernel!
- SMLMBoxer.variance_weighted_gaussian_kernel_batched!
- SMLMBoxer.wait_for_gpu_memory
SMLMBoxer.SMLMBoxer — Module
SMLMBoxer
High-performance particle/blob detection in SMLM image stacks using difference-of-Gaussians filtering with GPU acceleration and sCMOS variance-weighted filtering support.
API Overview
For a comprehensive overview of the API, use help mode:
?SMLMBoxer.api
Or access the complete API documentation programmatically:
docs = SMLMBoxer.api()
SMLMBoxer.BoxerConfig — Type
BoxerConfig
Configuration for ROI detection via getboxes().
Use either the PSF-aware interface (psf_sigma + min_photons) or the advanced interface (sigma_small + sigma_large + minval). The PSF-aware interface is recommended.
Fields
PSF-Aware Interface (Recommended)
- psf_sigma::Union{Float64,Nothing}: PSF sigma in microns (e.g., 0.13 for 130nm PSF). Requires camera for pixel conversion. When set, overrides sigma_small/sigma_large/minval.
- min_photons::Float64: Minimum photons for detection (default: 500.0)
Advanced Interface (Direct Control)
- sigma_small::Float64: Small Gaussian sigma in pixels (default: 1.0)
- sigma_large::Float64: Large Gaussian sigma in pixels (default: 2.0)
- minval::Float64: DoG intensity threshold (default: 0.0)
Box Parameters
- boxsize::Int: ROI box size in pixels (default: 7)
- overlap::Float64: Max overlap between detections in pixels (default: 2.0)
Backend Parameters
- backend::Symbol: Compute backend :cpu, :gpu, or :auto (default: :auto)
- auto_timeout::Float64: Max wait in seconds for GPU in :auto mode before CPU fallback (default: 300.0)
- gpu_timeout::Float64: Max wait in seconds for GPU in :gpu mode (default: Inf)
- on_wait::Union{Function,Nothing}: Optional callback(elapsed, available, required) -> nothing for GPU wait progress (default: nothing)
Examples
# PSF-aware (recommended)
config = BoxerConfig(psf_sigma=0.13, min_photons=500.0, boxsize=11)
# Advanced (direct control)
config = BoxerConfig(sigma_small=1.5, sigma_large=3.0, minval=10.0)
# GPU-specific
config = BoxerConfig(psf_sigma=0.13, backend=:gpu, gpu_timeout=60.0)
SMLMBoxer.BoxesInfo — Type
BoxesInfo
Metadata returned alongside ROIBatch from getboxes().
Fields
- backend::Symbol: Compute backend used (:gpu or :cpu)
- elapsed_s::Float64: Wall time in seconds
- device_id::Int: GPU device ID (0-based), or -1 for CPU
- n_rois::Int: Number of ROIs detected
- batch_size::Int: Frames per batch during processing
- n_batches::Int: Number of batches processed
- memory_per_batch::Int: Estimated memory per batch in bytes
SMLMBoxer.GetBoxesArgs — Type
GetBoxesArgs
Internal structure for getboxes parameters. Users should call getboxes() with keyword arguments rather than constructing this directly.
Primary Interface (Recommended)
- psf_sigma::Real: PSF sigma in microns (physical units, e.g., 0.13 for 130nm PSF). Requires camera to be provided for pixel size conversion.
- min_photons::Real: Minimum total photons for detection (default: 500.0)
When psf_sigma is provided:
- Converted to pixels using camera pixel size
- sigma_small = 1.0 × psf_sigma_pixels (automatically calculated)
- sigma_large = 2.0 × psf_sigma_pixels (automatically calculated)
- minval = photons_to_dog_threshold(min_photons, psf_sigma_pixels) (automatically calculated)
Advanced Interface (Direct Control)
- sigma_small::Real: Small Gaussian sigma in pixels (default: 1.0)
- sigma_large::Real: Large Gaussian sigma in pixels (default: 2.0)
- minval::Real: DoG intensity threshold (default: 0.0)
Other Parameters
- imagestack: Input image stack
- camera: Camera object (IdealCamera or SCMOSCamera)
- boxsize::Int: ROI box size in pixels (default: 7)
- overlap::Real: Maximum overlap between detections in pixels (default: 2.0)
- backend::Symbol: Compute backend :cpu, :gpu, or :auto (default: :auto)
- auto_timeout::Real: Max wait seconds for :auto mode before CPU fallback (default: 300.0)
- gpu_timeout::Real: Max wait seconds for :gpu mode (default: Inf)
- on_wait: Optional callback(elapsed, available, required) for wait progress
SMLMBoxer._getboxes_impl — Method
_getboxes_impl(args::GetBoxesArgs)
Internal implementation of getboxes that does the actual work.
SMLMBoxer._gpu_maxima2coords — Method
_gpu_maxima2coords(localmaximage::CuArray)
GPU-accelerated coordinate extraction using sparse compaction.
Instead of transferring the entire 4D array to CPU (e.g., 1 GB for 512x512x1x1000), uses GPU findall (prefix-sum compaction) to find nonzero indices on-device, then transfers only the sparse indices and values (~1.2 MB for ~100K maxima).
Arguments
- localmaximage: 4D CuArray (nrows, ncols, 1, nframes) with nonzero values at maxima
Returns
- coords: Vector{Matrix{Float32}}, same format as maxima2coords
SMLMBoxer._process_with_batching — Method
_process_with_batching(imagestack, args, kernelsize, max_free_mem; use_gpu, batch_cleanup=nothing)
Process imagestack with memory-aware batching. Handles both the single-batch case (stack fits in memory) and the multi-batch case (stack too large).
Returns (coords, batch_size, n_batches, memory_per_batch).
Arguments
- imagestack: 4D image stack (ny, nx, 1, nframes)
- args: GetBoxesArgs with filter parameters
- kernelsize: Kernel size for local max detection
- max_free_mem: Available memory in bytes
- use_gpu: Whether to use GPU for processing
- batch_cleanup: Optional function called after each batch (e.g., for GC)
SMLMBoxer.api — Method
SMLMBoxer.jl API Reference
Particle/blob detection in SMLM image stacks using difference-of-Gaussians filtering with GPU acceleration and sCMOS variance-weighted filtering support.
Exports
Total exports: 6
- getboxes - Main detection function
- BoxerConfig - Configuration struct for detection parameters
- BoxesInfo - Metadata struct returned alongside ROIBatch
- recommend_batch_size - Memory-aware batch sizing utility
- ROIBatch - Re-exported from SMLMData.jl
- SingleROI - Re-exported from SMLMData.jl
Note: SMLMBoxer.api() is available but not exported to avoid conflicts with other JuliaSMLM packages.
Key Concepts
Difference of Gaussians (DoG) Filtering
Detects blob-like features by subtracting two Gaussian-blurred versions of the image:
- sigma_small: Matches PSF size for optimal blob detection
- sigma_large: Suppresses background (typically 2× sigma_small)
- minval: Intensity threshold after filtering
PSF-Aware Detection (Recommended)
Specify physical PSF parameters (in microns), automatically converted to optimal filter settings:
- psf_sigma: PSF sigma in microns (e.g., 0.13 for 130nm PSF)
- min_photons: Total photon threshold (automatically converted to DoG intensity threshold)
Variance-Weighted Filtering (sCMOS)
When SCMOSCamera is provided, implements SMITE-style inverse variance weighting:
- Each pixel weighted by gaussian_kernel / variance, where variance = readnoise²
- Low-noise pixels: high weight (strong detection influence)
- High-noise pixels: low weight (reduced false positives)
- GPU-accelerated via KernelAbstractions.jl (device-agnostic kernels)
GPU Acceleration and Scheduling
- Standard DoG: NNlib with cuDNN backend (10-100x speedup)
- Variance-weighted: KernelAbstractions custom kernels (same code for CPU/GPU)
- Multi-GPU support: NVML-based polling selects GPU with most free memory across all devices
Unified GPU retry loop handles multi-process contention:
- Poll all GPUs via NVML (no CUDA context creation) for sufficient free memory and low utilization
- Acquire GPU context and run processing
- On any failure (no memory, TOCTOU context race, runtime OOM): release memory via GC.gc() + CUDA.reclaim(), re-poll NVML with remaining timeout
- On timeout: :auto falls back to CPU, :gpu errors
- On success: reclaim GPU memory pool so finished jobs don't block other processes
Backend modes:
- :cpu - Always CPU, no GPU involvement
- :gpu - Require GPU, retry until gpu_timeout (default: Inf), error if unavailable
- :auto - Try GPU, retry until auto_timeout (default: 300s), fall back to CPU
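The retry behaviour described above can be illustrated with a generic, timeout-bounded loop. This is only a sketch: `retry_until`, `try_acquire`, and `base_sleep` are hypothetical names, the real package polls NVML and reclaims CUDA memory between attempts, and the jitter range used here is an assumption.

```julia
# Hypothetical sketch of a timeout-bounded retry loop with jittered backoff.
# `try_acquire` stands in for "poll for a usable GPU and run the job": it
# returns a result on success, or `nothing` when no device is usable right now.
function retry_until(try_acquire; timeout::Real=300.0, base_sleep::Real=0.01)
    start = time()
    while true
        result = try_acquire()
        result === nothing || return (result, :success)
        elapsed = time() - start
        # Out of budget: the caller decides what to do (CPU fallback or error)
        elapsed >= timeout && return (nothing, :timeout)
        # Sleep 0.5x to 1.5x the base interval so competing processes desynchronize,
        # but never sleep past the remaining timeout budget
        sleep(min(base_sleep * (0.5 + rand()), timeout - elapsed))
    end
end

# A job that only succeeds on the third attempt
calls = Ref(0)
flaky() = (calls[] += 1; calls[] >= 3 ? "done" : nothing)
result, status = retry_until(flaky; timeout=5.0)

# A job that never succeeds runs into the timeout
timed_out = retry_until(() -> nothing; timeout=0.05)
```

In terms of the backend modes, a `:timeout` status would map to a CPU fallback under `:auto` and to an error under `:gpu`.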
Wait progress callback:
config = BoxerConfig(
    psf_sigma=0.13,
    backend=:auto,
    on_wait=(elapsed, available, required) -> @info "Waiting for GPU" elapsed available required
)
Configuration
BoxerConfig
Configuration struct for ROI detection parameters. Supports @kwdef construction with defaults.
@kwdef struct BoxerConfig
    # PSF-aware interface (recommended)
    psf_sigma::Union{Float64,Nothing} = nothing  # PSF sigma in microns
    min_photons::Float64 = 500.0                 # Minimum photons for detection
    # Advanced interface (direct control)
    sigma_small::Float64 = 1.0                   # Small Gaussian sigma in pixels
    sigma_large::Float64 = 2.0                   # Large Gaussian sigma in pixels
    minval::Float64 = 0.0                        # DoG intensity threshold
    # Box parameters
    boxsize::Int = 7                             # ROI size in pixels
    overlap::Float64 = 2.0                       # Max overlap between detections
    # Backend parameters
    backend::Symbol = :auto                      # :cpu, :gpu, or :auto
    auto_timeout::Float64 = 300.0                # Max wait for GPU in :auto mode
    gpu_timeout::Float64 = Inf                   # Max wait in :gpu mode
    on_wait::Union{Function,Nothing} = nothing   # Optional wait progress callback
end
Usage:
# PSF-aware (recommended)
config = BoxerConfig(psf_sigma=0.13, min_photons=500.0, boxsize=11)
# Advanced (direct control)
config = BoxerConfig(sigma_small=1.5, sigma_large=3.0, minval=10.0)
# GPU-specific
config = BoxerConfig(psf_sigma=0.13, backend=:gpu, gpu_timeout=60.0)
Core Function
getboxes - Two Calling Conventions
Config-based (recommended for reusable settings):
getboxes(imagestack, camera, config::BoxerConfig) -> (ROIBatch, BoxesInfo)
Kwargs-based (convenient for one-off calls):
getboxes(imagestack, camera=nothing; kwargs...) -> (ROIBatch, BoxesInfo)
Both conventions are equivalent: kwargs are forwarded to a BoxerConfig internally.
Main detection function. Applies DoG filtering, finds local maxima, extracts ROI patches.
Arguments:
- imagestack::AbstractArray{<:Real} - Input image stack (2D or 3D)
- camera::Union{AbstractCamera,Nothing} - Camera object (IdealCamera or SCMOSCamera)
- config::BoxerConfig - Configuration struct (config-based convention)
Kwargs (kwargs-based convention):
PSF-Aware Interface (Recommended):
- psf_sigma::Real - PSF sigma in microns (requires camera for pixel size conversion)
- min_photons::Real - Minimum total photons for detection (default: 500.0)
Advanced Interface (Direct Control):
- sigma_small::Real - Small Gaussian sigma in pixels (default: 1.0)
- sigma_large::Real - Large Gaussian sigma in pixels (default: 2.0)
- minval::Real - DoG intensity threshold (default: 0.0)
Other Parameters:
- boxsize::Int - ROI size in pixels (default: 7)
- overlap::Real - Maximum overlap between detections in pixels (default: 2.0)
- backend::Symbol - Compute backend: :cpu, :gpu, or :auto (default: :auto)
  - :cpu - Always use CPU
  - :gpu - Require GPU, wait for memory if needed (waits forever by default)
  - :auto - Try GPU with timeout, fall back to CPU if memory unavailable
- auto_timeout::Real - Max seconds to wait for GPU in :auto mode (default: 300.0)
- gpu_timeout::Real - Max seconds to wait in :gpu mode (default: Inf)
- on_wait::Function - Optional callback(elapsed, available, required) -> nothing for wait progress
Returns: Tuple of (ROIBatch, BoxesInfo)
ROIBatch with fields:
- data - ROI stack (boxsize × boxsize × n_rois)
- x_corners - Vector of x (column) corner positions
- y_corners - Vector of y (row) corner positions
- frame_indices - Vector of frame indices for each ROI
- camera - Camera object (provided or default IdealCamera)
- roi_size - Size of each ROI (square)
BoxesInfo with fields:
- backend - Compute backend used (:gpu or :cpu)
- elapsed_s - Wall time in seconds
- device_id - GPU device ID (0-based), or -1 for CPU
- n_rois - Number of ROIs detected
- batch_size - Frames per batch during processing
- n_batches - Number of batches processed
- memory_per_batch - Estimated memory per batch in bytes
recommend_batch_size(height, width; backend=:auto, memory_fraction=0.8) -> Int
Returns the recommended maximum number of frames to load at once given memory constraints.
Use this when processing very large datasets to determine optimal chunk size before calling getboxes().
Arguments:
- height::Int - Image height in pixels
- width::Int - Image width in pixels
- backend::Symbol - Compute backend: :cpu, :gpu, or :auto (default: :auto)
- memory_fraction::Real - Fraction of free memory to use (default: 0.8)
Returns: Maximum recommended number of frames to load at once
Memory Model: The processing pipeline requires approximately 6× the raw image size:
- Input imagestack
- Filtered stack (DoG output)
- Local maxima detection intermediates
- Coordinate arrays
- Box extraction workspace
- Broadcast temporaries
Example:
using SMLMBoxer
# Check how many 512×512 frames to load at once
max_frames = recommend_batch_size(512, 512)
println("Load up to $max_frames frames at a time")
# Load and process in chunks
# (total_frames, load_frames, and camera are placeholders you supply)
for chunk_start in 1:max_frames:total_frames
    chunk_end = min(chunk_start + max_frames - 1, total_frames)
    imagestack = load_frames(chunk_start:chunk_end)
    (roi_batch, info) = getboxes(imagestack, camera; psf_sigma=0.13)
    # ... process results
end
Re-Exported Types
ROIBatch (from SMLMData.jl)
Container for multiple ROIs from particle detection. Supports iteration and indexing.
# Iteration
for roi in roi_batch
    # roi is a SingleROI with .data, .corner, .frame_idx, .camera
    process(roi.data)
end
# Indexing
roi = roi_batch[1]  # Returns SingleROI
SingleROI (from SMLMData.jl)
Individual ROI with image data, position, frame index, and camera calibration.
Common Workflows
PSF-Aware Detection (Recommended)
using SMLMBoxer, SMLMData
# Create camera with physical pixel size (100nm pixels)
camera = IdealCamera(1:256, 1:256, 0.1f0)
# Config-based (recommended for reusable settings)
config = BoxerConfig(psf_sigma=0.13, min_photons=500.0, boxsize=11)
(roi_batch, info) = getboxes(imagestack, camera, config)
# OR kwargs-based (convenient for one-off calls)
(roi_batch, info) = getboxes(imagestack, camera;
    psf_sigma = 0.13,     # 130nm PSF in microns
    min_photons = 500.0,  # Minimum 500 photons
    boxsize = 11)
# Access results
n_detections = length(roi_batch.x_corners)
boxes = roi_batch.data # 11×11×n ROI patches
positions_x = roi_batch.x_corners # Column positions
positions_y = roi_batch.y_corners # Row positions
frames = roi_batch.frame_indices
# Check processing info
println("Backend: ", info.backend)
println("Elapsed: ", info.elapsed_s * 1000, " ms")
sCMOS Variance-Weighted Detection
using SMLMBoxer, SMLMData
# Create sCMOS camera with readnoise map (Float32 for type consistency)
readnoise_map = Float32.(load_readnoise_calibration("camera_calib.mat"))
camera = SCMOSCamera(256, 256, 0.1f0, readnoise_map)
# Variance-weighted detection (automatically enabled)
(roi_batch, info) = getboxes(imagestack, camera;
    psf_sigma = 0.13,
    min_photons = 300.0,  # Lower threshold possible with noise weighting
    backend = :auto)      # GPU with CPU fallback
Advanced: Direct Parameter Control
# Expert mode: bypass PSF-aware interface
# Config-based
config = BoxerConfig(sigma_small=1.5, sigma_large=3.0, minval=10.0, boxsize=9, overlap=1.5)
(roi_batch, info) = getboxes(imagestack, nothing, config)
# OR kwargs-based
(roi_batch, info) = getboxes(imagestack;
    sigma_small = 1.5,  # Custom filter sigma (pixels)
    sigma_large = 3.0,
    minval = 10.0,      # Direct intensity threshold
    boxsize = 9,
    overlap = 1.5)
Processing Individual ROIs
(roi_batch, info) = getboxes(imagestack, camera; psf_sigma=0.13)
# Iterate over ROIs
for roi in roi_batch
    # SingleROI fields:
    # - roi.data: Image patch
    # - roi.corner: (x, y) corner position
    # - roi.frame_idx: Frame index
    # - roi.camera: Camera ROI calibration
    fit_gaussian(roi.data)
end
# Direct indexing
first_roi = roi_batch[1]
Internal Functions (Not Exported)
These functions are documented but not part of the public API. Use at your own risk as they may change.
Filtering:
- dog_filter(imagestack, args) - Apply DoG filter (routes to standard or variance-weighted)
- dog_filter_variance_weighted(imagestack, σ_small, σ_large, args) - Variance-weighted DoG
- convolve(imagestack, kernel; use_gpu) - Standard convolution via NNlib
- convolve_variance_weighted(imagestack, variance_map, σ, use_gpu) - Variance-weighted convolution via KernelAbstractions
- gaussian_2d(sigma, kernelsize) - Create 2D Gaussian kernel
- dog_kernel(sigma_small, sigma_large) - Create DoG kernel
Local Maxima Detection:
- findlocalmax(imagestack, kernelsize; minval, use_gpu) - Find local maximum coordinates
- genlocalmaximage(imagestack, kernelsize; minval, use_gpu) - Generate local max image
Coordinate Processing:
- maxima2coords(imagestack) - Convert non-zero pixels to coordinates
- removeoverlap(coords, args) - Remove overlapping detections
Box Extraction:
- getboxstack(imagestack, coords, args) - Extract ROI patches at coordinates
- fillbox!(box, imagestack, row, col, im, boxsize) - Fill single ROI patch
Helper Functions:
- get_pixel_size(camera) - Extract pixel size from camera
- photons_to_dog_threshold(min_photons, psf_sigma) - Convert photon threshold to DoG threshold
- pixels_to_microns(pixel_coords, camera) - Coordinate conversion
- extract_camera_roi(camera, row_range, col_range) - Extract camera calibration for ROI
- get_variance_map(camera, imagesize) - Compute variance map from camera
- reshape_for_flux(arr) - Reshape array for NNlib convolution
Algorithm Details
Detection Pipeline
- Filtering: Apply DoG filter (standard or variance-weighted based on camera type)
- Local Maxima: Find peaks above threshold using max pooling
- Overlap Removal: Eliminate overlapping detections (keep higher intensity)
- Box Extraction: Cut out ROI patches around each detection
- ROIBatch Construction: Package results with camera calibration
PSF-Aware Parameter Conversion
When psf_sigma (microns) is provided:
# Convert to pixels
psf_sigma_pixels = psf_sigma / pixel_size
# Automatic filter sizing
sigma_small = 1.0 * psf_sigma_pixels  # Match PSF
sigma_large = 2.0 * psf_sigma_pixels  # Background suppression
# Photon threshold → DoG intensity threshold
σ_eff = √(psf_sigma_pixels^2 + sigma_small^2)  # Effective sigma after filtering (pixels)
peak_filtered = min_photons / (2π * σ_eff^2)
minval = 0.65 * peak_filtered  # DoG reduction factor
Variance-Weighted Convolution
For sCMOS cameras with readnoise map:
# At each pixel (i,j):
weightsum = sum(gaussian_weight / variance[ii,jj] * input[ii,jj])
varsum = sum(gaussian_weight / variance[ii,jj])
output[i,j] = weightsum / varsum
Implements optimal inverse variance weighting for spatially-varying noise.
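The per-pixel sums above can be written out as a plain CPU loop for a single 2D frame. This is a minimal sketch, not the package's KernelAbstractions kernel: `variance_weighted_gaussian` and its `radius` keyword are hypothetical, and clipping the neighbourhood at the image border is an assumption.

```julia
# Minimal CPU sketch of variance-weighted Gaussian smoothing for one 2D frame.
function variance_weighted_gaussian(img::AbstractMatrix{<:Real},
                                    variance::AbstractMatrix{<:Real},
                                    sigma::Real; radius::Int=ceil(Int, 3 * sigma))
    size(img) == size(variance) || throw(DimensionMismatch("img and variance must match"))
    nrows, ncols = size(img)
    out = zeros(Float64, nrows, ncols)
    for j in 1:ncols, i in 1:nrows
        weightsum = 0.0
        varsum = 0.0
        # Neighbourhood clipped at the image border (assumed border handling)
        for jj in max(1, j - radius):min(ncols, j + radius),
            ii in max(1, i - radius):min(nrows, i + radius)
            g = exp(-((ii - i)^2 + (jj - j)^2) / (2 * sigma^2))
            w = g / variance[ii, jj]          # inverse-variance weight
            weightsum += w * img[ii, jj]
            varsum += w
        end
        out[i, j] = weightsum / varsum
    end
    return out
end

# A single bright pixel that is also extremely noisy contributes almost nothing
img = zeros(9, 9); img[5, 5] = 100.0
variance = ones(9, 9); variance[5, 5] = 1e6
smoothed = variance_weighted_gaussian(img, variance, 1.0)

# A flat image stays flat regardless of the variance map
flat = variance_weighted_gaussian(fill(5.0, 8, 8),
                                  [0.5 + 0.1 * (i + j) for i in 1:8, j in 1:8], 1.0)
```

Because the output is normalized by the sum of weights, a constant image is returned unchanged, while a hot pixel with large variance is suppressed rather than detected.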
Performance Notes
- GPU Memory Management: Automatically batches frames if image stack exceeds GPU memory
- Type Stability: All inputs converted to Float32 at entry point
- Multi-GPU NVML Polling: Scans all GPUs via NVML without creating CUDA contexts. Checks free memory, process contention, and compute utilization. First GPU with sufficient memory and low contention wins.
- Contention-Safe Retry: Unified retry loop handles all GPU failure modes (insufficient memory, TOCTOU context race, runtime OOM). Releases memory and re-polls with remaining timeout budget.
- Memory Pool Reclaim: Calls GC.gc() + CUDA.reclaim() after both successful and failed GPU processing to return memory to the system, preventing finished jobs from blocking other processes.
- Jittered Backoff: NVML polling uses jittered sleep intervals to avoid thundering herd when multiple processes compete for GPUs.
- Backend Abstraction: KernelAbstractions enables same code for CPU/GPU variance weighting
- Typical Speedup: 10-100x with GPU depending on image size and number of frames
api() returns this documentation as a plain String.
SMLMBoxer.convolve — Method
convolve(imagestack, kernel; use_gpu=false)
Convolve imagestack with given kernel using NNlib.
Arguments
- imagestack: Input array of image data (H, W, 1, F)
- kernel: Kernel to convolve with (K, K)
Keyword Arguments
- use_gpu: Whether to use GPU
Returns
- filtered_stack: Convolved image stack
SMLMBoxer.convolve_variance_weighted — Method
convolve_variance_weighted(imagestack, variance_map, sigma, use_gpu)
Apply variance-weighted Gaussian convolution using KernelAbstractions. Device-agnostic: works on CPU and GPU with same code.
Follows same pattern as convolve(): stays on GPU if use_gpu=true, letting interface.jl handle memory batching for both paths uniformly.
Arguments
- imagestack: Input image (rows, cols, 1, frames) - CPU or GPU array
- variance_map: Variance at each pixel (rows, cols)
- sigma: Gaussian sigma
- use_gpu: Use GPU if available
Returns
- Variance-weighted filtered image (CuArray if use_gpu, Array otherwise)
SMLMBoxer.dog_filter — Method
dog_filter(imagestack, args)
Apply DoG filter to imagestack based on args. Uses variance-weighted filtering if sCMOS camera is provided.
Arguments
- imagestack: Input array of image data
- args: Arguments with sigma values and camera
Returns
- filtered_stack: Filtered image stack
SMLMBoxer.dog_filter_variance_weighted — Method
dog_filter_variance_weighted(imagestack, sigma_small, sigma_large, args)
Apply variance-weighted DoG filter using sCMOS variance map. Implements SMITE-style inverse variance weighting during convolution.
Arguments
- imagestack: Input image data (rows, cols, 1, frames)
- sigma_small: Sigma for small Gaussian
- sigma_large: Sigma for large Gaussian
- args: GetBoxesArgs with camera
Returns
- filtered_stack: Variance-weighted filtered image
SMLMBoxer.dog_kernel — Method
dog_kernel(s1, s2)
Compute difference of Gaussian kernels.
Arguments
- sigma_small: Sigma for small Gaussian
- sigma_large: Sigma for large Gaussian
Returns
- dog: Difference of Gaussians kernel
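As an illustration of what a difference-of-Gaussians kernel looks like, here is a small standalone sketch. The names `gaussian_2d_sketch` and `dog_kernel_sketch` are hypothetical, and the kernel size rule (covering ±3σ of the larger Gaussian) is an assumption; the package's own sizing and normalization may differ.

```julia
# Hypothetical re-implementation for illustration only.
function gaussian_2d_sketch(sigma::Real, ksize::Int)
    isodd(ksize) || throw(ArgumentError("ksize must be odd"))
    half = ksize ÷ 2
    k = [exp(-(x^2 + y^2) / (2 * sigma^2)) for y in -half:half, x in -half:half]
    return k ./ sum(k)   # normalize to unit sum
end

function dog_kernel_sketch(sigma_small::Real, sigma_large::Real)
    # Assumed sizing rule: cover ±3σ of the larger Gaussian
    ksize = 2 * ceil(Int, 3 * sigma_large) + 1
    return gaussian_2d_sketch(sigma_small, ksize) .- gaussian_2d_sketch(sigma_large, ksize)
end

dog = dog_kernel_sketch(1.0, 2.0)
center = dog[(size(dog, 1) + 1) ÷ 2, (size(dog, 2) + 1) ÷ 2]
```

Since both Gaussians are normalized to unit sum, the kernel sums to zero, so a flat background produces no response while a compact blob produces a positive peak.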
SMLMBoxer.estimate_gpu_memory — Method
estimate_gpu_memory(imagestack, camera) -> Int
Estimate GPU memory required for processing imagestack.
Arguments
- imagestack: Input image array (H, W, ..., F)
- camera: Camera object (affects memory multiplier)
Returns
- Estimated bytes needed for GPU processing
Memory Model
Standard DoG path: 6x input size
- Input, filtered_small, filtered_large, DoG result, localmax temps, GC margin
Variance-weighted (SCMOSCamera): 8x input size
- Additional workspace for per-pixel variance weighting (in-place DoG saves one copy)
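The multipliers above translate into byte counts as follows. This is a back-of-the-envelope sketch with hypothetical helper names (`estimate_bytes`, `frames_that_fit`); it assumes Float32 pixels (4 bytes each) and ignores any fixed overhead the package may account for.

```julia
# Hypothetical estimate: 6x input size for standard DoG, 8x for variance-weighted,
# with every pixel stored as Float32.
function estimate_bytes(height::Int, width::Int, nframes::Int; variance_weighted::Bool=false)
    multiplier = variance_weighted ? 8 : 6
    return multiplier * height * width * nframes * sizeof(Float32)
end

# Largest number of frames that fits in a fraction of the free memory
function frames_that_fit(height::Int, width::Int, free_bytes::Integer;
                         variance_weighted::Bool=false, memory_fraction::Real=0.8)
    per_frame = estimate_bytes(height, width, 1; variance_weighted=variance_weighted)
    return max(1, floor(Int, memory_fraction * free_bytes / per_frame))
end

total = estimate_bytes(512, 512, 1000)                                   # 1000 frames of 512×512
standard = frames_that_fit(512, 512, 8 * 2^30)                           # 8 GiB free, standard DoG
weighted = frames_that_fit(512, 512, 8 * 2^30; variance_weighted=true)   # 8 GiB free, sCMOS path
```

Under these assumptions a 1000-frame 512×512 stack needs roughly 6.3 GB on the standard path, which is why larger stacks are processed in batches.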
SMLMBoxer.estimate_gpu_memory_per_frame — Method
estimate_gpu_memory_per_frame(height, width, camera) -> Int
Estimate GPU memory required per frame.
Arguments
- height: Image height in pixels
- width: Image width in pixels
- camera: Camera object (affects memory multiplier)
Returns
- Estimated bytes needed per frame for GPU processing
SMLMBoxer.extract_camera_roi — Method
extract_camera_roi(camera::AbstractCamera, row_range, col_range)
Extract a camera ROI with calibration data for the specified pixel region.
Arguments
- camera: Source camera object
- row_range: Range of rows to extract
- col_range: Range of columns to extract
Returns
- Camera object of the same type with ROI calibration data
SMLMBoxer.fillbox! — Method
fillbox!(box, imagestack, row, col, im, boxsize)
Fill a box with a crop from the imagestack.
Arguments
- box: Array to fill with box crop
- imagestack: Input image stack
- row, col, im: Coords for crop
- boxsize: Size of box
Returns
- boxcoords: Upper left corners of boxes N x (row, col, im)
SMLMBoxer.find_best_gpu — Method
find_best_gpu() -> Int
Find the GPU with the most free memory and switch to it.
Returns the device index (0-based). On single-GPU systems, returns 0 immediately.
Uses NVML to query free memory on each device without creating CUDA contexts, avoiding cuDevicePrimaryCtxRetain OOM errors under multi-process contention. Only calls CUDA.device!() once on the selected device.
Example
best = find_best_gpu()  # Switches to best GPU
# Now all CUDA operations use that GPU
SMLMBoxer.findlocalmax — Method
findlocalmax(imagestack, kernelsize; minval=0.0, use_gpu=false)
Find the coordinates of local maxima in an image.
Arguments
- imagestack: An array of real numbers representing the image data.
- kernelsize: The size of the kernel used to identify local maxima.
Keyword Arguments
- minval: The minimum value a local maximum must have to be considered valid (default: 0.0).
- use_gpu: Whether or not to use GPU acceleration (default: false).
Returns
- coords: The coordinates of the local maxima in the image.
SMLMBoxer.gaussian_2d — Method
gaussian_2d(sigma, ksize)
Create a 2D Gaussian kernel.
Arguments
- sigma: Standard deviation
- kernelsize: Kernel size
Returns
- kernel: Normalized 2D Gaussian kernel
SMLMBoxer.genlocalmaximage — Method
genlocalmaximage(imagestack, kernelsize; minval=0.0, use_gpu=false)
Generate an image highlighting the local maxima using NNlib max pooling.
Arguments
- imagestack: An array of real numbers representing the image data (H, W, 1, F).
- kernelsize: The size of the kernel used to identify local maxima.
Keyword Arguments
- minval: The minimum value a local maximum must have to be considered valid (default: 0.0).
- use_gpu: Whether or not to use GPU acceleration (default: false).
Returns
- localmaximage: An image with local maxima highlighted.
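The idea behind the local-maximum image can be shown without NNlib on a single 2D frame. This is a sketch under stated assumptions: `localmax_image` is a hypothetical name, it uses an explicit sliding window instead of max pooling, and it keeps a pixel only when it is strictly above `minval`.

```julia
# Plain-Julia sketch: keep a pixel if it equals the maximum of its
# kernelsize × kernelsize neighbourhood and exceeds minval; zero elsewhere.
function localmax_image(img::AbstractMatrix{T}, kernelsize::Int; minval::Real=0.0) where {T<:Real}
    half = kernelsize ÷ 2
    nrows, ncols = size(img)
    out = zeros(T, nrows, ncols)
    for j in 1:ncols, i in 1:nrows
        v = img[i, j]
        v > minval || continue
        # Window clipped at the image border
        window = @view img[max(1, i - half):min(nrows, i + half),
                           max(1, j - half):min(ncols, j + half)]
        v == maximum(window) && (out[i, j] = v)
    end
    return out
end

img = zeros(5, 5)
img[3, 3] = 5.0   # a peak
img[3, 4] = 3.0   # neighbour of the peak, not a maximum
img[1, 1] = 2.0   # a weaker, isolated peak

peaks = localmax_image(img, 3)
strong = localmax_image(img, 3; minval=2.5)
```

The nonzero entries of the result mark the candidate detections; raising `minval` drops the weaker peak.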
SMLMBoxer.get_effective_gain — Method
get_effective_gain(camera::AbstractCamera)
Get effective gain for converting photons to image ADU units (ADU/photon).
For IdealCamera: returns 1.0 (assumes image is in photon units).
For SCMOSCamera: returns QE / gain (photons → ADU conversion factor).
- SMLMData defines gain as e⁻/ADU (electrons per ADU)
- Physical conversion: photon → QE electrons → electrons/gain ADU
- So: ADU/photon = QE / gain
Used to convert photon-based thresholds to image-unit thresholds.
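A short worked example of the conversion chain, using a hypothetical standalone function (`effective_gain_sketch`) and made-up calibration values (QE = 0.8, gain = 2 e⁻/ADU) chosen only for illustration:

```julia
# Hypothetical standalone version of the conversion described above.
# qe: quantum efficiency (electrons per photon); gain_e_per_adu: electrons per ADU.
effective_gain_sketch(qe::Real, gain_e_per_adu::Real) = qe / gain_e_per_adu

photons = 500.0
electrons = photons * 0.8                           # 500 photons → 400 e⁻
adu = photons * effective_gain_sketch(0.8, 2.0)     # 400 e⁻ at 2 e⁻/ADU → 200 ADU
```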
SMLMBoxer.get_pixel_size — Method
get_pixel_size(camera::AbstractCamera)
Extract pixel size from camera pixel edges (in microns). Assumes approximately square pixels and returns the x-direction pixel size.
For non-square pixels, pixel_size_x and pixel_size_y may differ slightly. This function returns pixel_size_x for simplicity.
Arguments
- camera: Camera object (IdealCamera or SCMOSCamera)
Returns
- Pixel size in microns (x-direction)
SMLMBoxer.get_variance_map — Method
get_variance_map(camera::AbstractCamera, imagesize)
Compute variance map from camera calibration.
Arguments
- camera: Camera object with noise calibration
- imagesize: Tuple of (nrows, ncols) for the image
Returns
- Variance map (variance = readnoise²) matching image dimensions
SMLMBoxer.getboxes — Method
getboxes(imagestack, camera=nothing; kwargs...) -> (ROIBatch, BoxesInfo)
Detect particles/blobs in a multidimensional image stack and return ROI batch with location tracking and processing metadata.
Arguments
- imagestack::AbstractArray{<:Real}: The input image stack. Should be 2D or 3D.
- camera::Union{AbstractCamera,Nothing}: Optional camera object (IdealCamera or SCMOSCamera) from SMLMData. If not provided, a default IdealCamera is created.
Primary Interface (Recommended - PSF-Aware)
- psf_sigma::Real: PSF sigma in microns (physical units, e.g., 0.13 for 130nm PSF). Automatically converted to pixels using camera pixel size and sets optimal DoG filter parameters. Requires camera to be provided.
- min_photons::Real: Minimum total photons for detection (default: 500.0). Automatically converted to appropriate intensity threshold.
Advanced Interface (Direct Control)
For expert users who want direct control over filter parameters:
- sigma_small::Real: Small Gaussian sigma in pixels (default: 1.0).
- sigma_large::Real: Large Gaussian sigma in pixels (default: 2.0).
- minval::Real: DoG filter intensity threshold (default: 0.0).
Note: If psf_sigma is provided, it overrides sigma_small/sigma_large/minval.
Other Parameters
- boxsize::Int: Size of the box to cut out around each local maximum in pixels (default: 7).
- overlap::Real: Maximum overlap allowed between boxes in pixels (default: 2.0).
- backend::Symbol: Compute backend - :cpu, :gpu, or :auto (default: :auto).
  - :cpu - Always use CPU
  - :gpu - Require GPU, wait for memory if needed (waits forever by default)
  - :auto - Try GPU with timeout, fall back to CPU if memory unavailable
- auto_timeout::Real: Max seconds to wait for GPU memory in :auto mode (default: 300.0).
- gpu_timeout::Real: Max seconds to wait for GPU memory in :gpu mode (default: Inf).
- on_wait::Function: Optional callback(elapsed, available, required) -> nothing for wait progress.
Returns
Tuple of (ROIBatch, BoxesInfo):
ROIBatch with the following fields:
- data: ROI stack (boxsize × boxsize × n_rois) containing image patches
- x_corners: Vector of x (column) corner positions in camera coordinates
- y_corners: Vector of y (row) corner positions in camera coordinates
- frame_indices: Vector of frame indices for each ROI
- camera: Camera object (provided or default IdealCamera)
- roi_size: Size of each ROI (square)
BoxesInfo with the following fields:
- backend: Compute backend used (:gpu or :cpu)
- elapsed_s: Wall time in seconds
- device_id: GPU device ID (0-based), or -1 for CPU
- n_rois: Number of ROIs detected
- batch_size: Frames per batch during processing
- n_batches: Number of batches processed
- memory_per_batch: Estimated memory per batch in bytes
Details on filtering
The image stack is convolved with a difference of Gaussians (DoG) filter to identify blobs and local maxima. The DoG is computed from two Gaussian kernels with standard deviations sigma_small and sigma_large.
When using the PSF-aware interface with psf_sigma (in microns):
- psf_sigma is converted to pixels using camera pixel size
- sigma_small = 1.0 × psf_sigma_pixels (matches PSF for optimal blob detection)
- sigma_large = 2.0 × psf_sigma_pixels (background suppression)
- minval is automatically calculated from min_photons accounting for PSF spreading and DoG response
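The conversion steps listed above can be traced with a small standalone sketch. `psf_aware_params` is a hypothetical helper, not part of SMLMBoxer; it assumes square pixels, an effective gain of 1 (image already in photons), and the approximate 0.65 DoG reduction factor documented for photons_to_dog_threshold.

```julia
# Hypothetical helper mirroring the documented conversion rules; not the package's own code.
function psf_aware_params(psf_sigma_um::Real, pixel_size_um::Real, min_photons::Real)
    psf_sigma_pixels = psf_sigma_um / pixel_size_um      # microns → pixels
    sigma_small = 1.0 * psf_sigma_pixels                 # match the PSF
    sigma_large = 2.0 * psf_sigma_pixels                 # background suppression
    # Photon threshold → DoG intensity threshold
    sigma_eff = sqrt(psf_sigma_pixels^2 + sigma_small^2) # PSF broadened by the small filter
    peak_filtered = min_photons / (2π * sigma_eff^2)
    minval = 0.65 * peak_filtered                        # documented approximate DoG factor
    return (psf_sigma_pixels=psf_sigma_pixels, sigma_small=sigma_small,
            sigma_large=sigma_large, minval=minval)
end

# 130 nm PSF on 100 nm pixels with a 500-photon threshold
p = psf_aware_params(0.13, 0.1, 500.0)
println(p)
```

With these inputs the PSF sigma is 1.3 pixels, so the filter sigmas are 1.3 and 2.6 pixels and the threshold comes out at roughly 15 intensity units per pixel.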
Variance-Weighted Filtering (sCMOS)
When an SCMOSCamera is provided, the package uses variance-weighted filtering based on the SMITE algorithm. Each pixel's contribution to the convolution is weighted by:
weight = gaussian_kernel / variance
where variance = readnoise². This implements optimal inverse variance weighting:
- Low-noise pixels receive high weight (strong influence on detection)
- High-noise pixels receive low weight (reduced influence, avoiding false positives)
This significantly improves detection sensitivity in sCMOS data with spatially-varying noise.
GPU Acceleration: Variance-weighted filtering uses KernelAbstractions.jl for device-agnostic computation. The same kernel code runs on both CPU and GPU, automatically selected based on backend. This provides GPU acceleration for sCMOS cameras (10-100x speedup on large images).
Standard Filtering (IdealCamera or no camera)
Standard DoG convolution is used when no camera is provided or with IdealCamera. The convolution is performed via NNlib (using cuDNN on GPU) or CPU, depending on backend.
After filtering, local maxima above minval are identified. Boxes are cut out around each maximum, excluding overlaps.
Examples
# Recommended: PSF-aware detection with physical units
camera = IdealCamera(1:256, 1:256, 0.1f0) # 256×256 pixels, 100nm pixel size
(roi_batch, info) = getboxes(imagestack, camera;
    psf_sigma = 0.13,     # PSF sigma in microns (physical units)
    min_photons = 500.0,  # Detect emitters with ≥500 photons
    boxsize = 11)
# Access results
boxes = roi_batch.data # (11 × 11 × n_rois)
x_corners = roi_batch.x_corners # x (col) positions
y_corners = roi_batch.y_corners # y (row) positions
frames = roi_batch.frame_indices
# Check processing info
println("Backend: ", info.backend)
println("Elapsed: ", info.elapsed_s * 1000, " ms")
# Advanced: Direct control over filter parameters
(roi_batch, info) = getboxes(imagestack;
    sigma_small = 1.5,  # Custom small Gaussian sigma
    sigma_large = 3.0,  # Custom large Gaussian sigma
    minval = 10.0)      # Custom intensity threshold
# Iterate over ROIs
for roi in roi_batch
    # roi is a SingleROI with .data, .corner, .frame_idx
    process(roi.data)
end
SMLMBoxer.getboxstack — Method
getboxstack(imagestack, coords, args::GetBoxesArgs)
Cut out box regions from imagestack centered on coords.
Arguments
- imagestack: Input image stack
- coords: Coords of box centers
- args: Parameters
Returns
- boxstack: Array with box crops from imagestack
- boxcoords: Upper left corners of boxes
- camera_rois: Camera ROIs for each box (if camera provided)
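The cropping step can be sketched for a single frame as follows. The function name is hypothetical, and shifting boxes inward at the image edge so that every crop has the full size is an assumption about the edge handling, not something the docstring states.

```julia
# Sketch: cut a boxsize × boxsize crop around (row, col) and return it
# together with its upper-left corner.
# Assumption: near the edges the box is shifted inward, not truncated.
function crop_box_sketch(img::AbstractMatrix, row::Int, col::Int, boxsize::Int)
    nrows, ncols = size(img)
    half = boxsize ÷ 2
    r0 = clamp(row - half, 1, nrows - boxsize + 1)
    c0 = clamp(col - half, 1, ncols - boxsize + 1)
    r1 = r0 + boxsize - 1
    c1 = c0 + boxsize - 1
    return img[r0:r1, c0:c1], (r0, c0)
end
```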
SMLMBoxer.has_cuda — Method
has_cuda() -> Bool
Check if CUDA is available. Wrapper for CUDA.functional().
SMLMBoxer.maxima2coords — Method
maxima2coords(imagestack)
Get the coordinates of all non-zero pixels in the input stack.
Arguments
imagestack: Input image stack
Returns
coords: List of coords for each frame (always Float32)
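A minimal sketch of this conversion is shown below. The function name is hypothetical, and the N×2 (row, col) layout of each per-frame matrix is an assumption; the docstring only states that one Float32 coordinate list is returned per frame.

```julia
# Sketch: collect (row, col) of every non-zero pixel, frame by frame.
# Assumption: each frame yields an N×2 Float32 matrix.
function nonzero_coords_sketch(stack::AbstractArray{<:Real,3})
    coords = Vector{Matrix{Float32}}()
    for f in axes(stack, 3)
        idx = findall(!iszero, view(stack, :, :, f))
        frame_coords = Matrix{Float32}(undef, length(idx), 2)
        for (n, ci) in enumerate(idx)
            frame_coords[n, 1] = ci[1]   # row
            frame_coords[n, 2] = ci[2]   # col
        end
        push!(coords, frame_coords)
    end
    return coords
end
```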
SMLMBoxer.photons_to_dog_threshold — Method
photons_to_dog_threshold(min_photons, psf_sigma; effective_gain=1.0)
Convert total photon count threshold to DoG filter intensity threshold in image units (ADU).
Arguments
- min_photons: Minimum signal photons above background for detection
- psf_sigma: PSF sigma in pixels
- effective_gain: Camera gain factor (QE × gain) to convert photons → ADU (default: 1.0)
Returns
minval: DoG filter intensity threshold in image units (ADU)
Physics
For a 2D Gaussian PSF with total photons N and sigma σ_psf, the peak intensity is: I_peak = N / (2π σ_psf²) [photons/pixel]
After convolution with the small Gaussian filter (sigma_small = 1.0 × psf_sigma), the effective sigma becomes: σ_eff = √(σ_psf² + sigma_small²)
The peak after filtering is: I_filtered = N / (2π σ_eff²) [photons/pixel]
The DoG response (small - large Gaussian) has a lower peak than the small Gaussian alone. For sigma_large = 2 × sigma_small, the DoG peak is approximately 0.65× the small Gaussian peak.
For raw camera data in ADU, the threshold is scaled by effective_gain = QE × gain.
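The chain of steps above can be written out as a short calculation. This is a sketch of the stated physics, not the package's implementation: the function name and the `dog_factor` keyword are hypothetical, and the 0.65 factor is taken directly from the text.

```julia
# Sketch of the photon-to-threshold conversion described above.
# Assumptions: sigma_small equals psf_sigma, and the DoG peak is the
# small-Gaussian peak scaled by a fixed factor (0.65, as quoted in the text).
function dog_threshold_sketch(min_photons::Real, psf_sigma::Real;
                              effective_gain::Real = 1.0, dog_factor::Real = 0.65)
    sigma_small = psf_sigma                           # sigma_small = 1.0 × psf_sigma
    sigma_eff2 = psf_sigma^2 + sigma_small^2          # σ_eff²
    peak_filtered = min_photons / (2π * sigma_eff2)   # photons/pixel after filtering
    return effective_gain * dog_factor * peak_filtered
end

minval = dog_threshold_sketch(500.0, 1.3)   # 500 photons, PSF sigma of 1.3 pixels
```

Because the threshold scales with 1/σ_eff², doubling the PSF sigma lowers the threshold by a factor of four for the same photon count.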
SMLMBoxer.pixels_to_microns — Method
pixels_to_microns(pixel_coords, camera::AbstractCamera)
Convert pixel coordinates (row, col) to micron coordinates (x, y) using camera geometry.
Arguments
- pixel_coords: N×2 matrix of (row, col) coordinates
- camera: Camera object with pixel_edges_x and pixel_edges_y
Returns
- N×2 matrix of (x, y) coordinates in microns
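The (row, col) to (x, y) swap can be sketched with plain edge vectors in place of a camera object. The function name and the `edges_x`/`edges_y` arguments are hypothetical, and the sketch assumes uniformly spaced pixels with an integer pixel coordinate referring to the pixel center; the package's convention may differ.

```julia
# Sketch: convert (row, col) pixel coordinates to (x, y) in microns.
# Assumptions: uniform pixel spacing, and integer coordinate c is the
# center of pixel c, i.e. half a pixel past its left/top edge.
function pixels_to_microns_sketch(pixel_coords::AbstractMatrix,
                                  edges_x::AbstractVector, edges_y::AbstractVector)
    px = edges_x[2] - edges_x[1]   # pixel width in microns
    py = edges_y[2] - edges_y[1]   # pixel height in microns
    out = similar(pixel_coords, Float64)
    for n in axes(pixel_coords, 1)
        row = pixel_coords[n, 1]
        col = pixel_coords[n, 2]
        out[n, 1] = edges_x[1] + (col - 0.5) * px   # x comes from the column
        out[n, 2] = edges_y[1] + (row - 0.5) * py   # y comes from the row
    end
    return out
end
```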
SMLMBoxer.poll_gpu_nvml — Method
poll_gpu_nvml(required_bytes; timeout=30.0, poll=0.5, on_wait=nothing) -> (Bool, Int)
Poll ALL GPUs via NVML until one has sufficient free memory and low contention. No CUDA context creation is needed, so it is safe under multi-process contention.
Arguments
- required_bytes: Minimum bytes needed (1.5x safety margin applied internally)
- timeout: Maximum seconds to poll (default 30.0, use Inf for unlimited)
- poll: Seconds between checks (default 0.5)
- on_wait: Optional callback(elapsed, best_available, required) called each poll
Returns
(true, device_id) if a GPU became available, (false, -1) if the timeout was reached
Contention Detection
A GPU is considered contended when other processes are present AND either:
- Free memory is insufficient (< required × 1.5)
- Compute utilization exceeds 90%
When other processes are present but memory is sufficient and utilization is low, the GPU is still considered available.
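The contention rule reads naturally as a small predicate. The sketch below uses hypothetical names and takes the per-GPU numbers as plain arguments; in the package these values would come from NVML queries, which are not shown here.

```julia
# Sketch of the contention rule: contended only if other processes are
# present AND (free memory < 1.5 × required OR utilization > 90%).
function is_contended_sketch(n_other_procs::Integer, free_bytes::Integer,
                             required_bytes::Integer, utilization_pct::Real)
    n_other_procs > 0 || return false   # no other processes: never contended
    insufficient = free_bytes < 1.5 * required_bytes
    busy = utilization_pct > 90
    return insufficient || busy
end
```

Note that "not contended" is not the same as "usable": a GPU with no other processes still needs enough free memory to be selected.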
SMLMBoxer.recommend_batch_size — Method
recommend_batch_size(height, width; backend=:auto, memory_fraction=0.8) -> Int
Return recommended maximum number of frames to load at once given memory constraints.
This helps users decide how much data to load before calling getboxes(). For very large datasets, loading data in chunks of this size ensures efficient processing without running out of memory.
Arguments
- height::Int: Image height in pixels
- width::Int: Image width in pixels
- backend::Symbol: Compute backend :cpu, :gpu, or :auto (default: :auto)
- memory_fraction::Real: Fraction of free memory to use (default: 0.8)
Returns
- Maximum recommended number of frames to load at once
Memory Model
The processing pipeline requires approximately 6× the raw image size:
- Input imagestack
- Filtered stack (DoG output)
- Local maxima detection intermediates
- Coordinate arrays
- Box extraction workspace
- Broadcast temporaries
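The 6× rule gives a simple back-of-the-envelope estimate. The sketch below is not the package's function: the name, the explicit `free_bytes` argument, and the 4 bytes per pixel (Float32) are assumptions made for illustration.

```julia
# Sketch of the memory model: each frame costs about 6× its raw size.
# Assumptions: 4 bytes per pixel (Float32) and a caller-supplied amount
# of free memory.
function batch_size_sketch(height::Int, width::Int, free_bytes::Integer;
                           memory_fraction::Real = 0.8, bytes_per_pixel::Int = 4)
    per_frame = 6 * height * width * bytes_per_pixel
    return max(1, floor(Int, memory_fraction * free_bytes / per_frame))
end

n = batch_size_sketch(512, 512, 8 * 2^30)   # 512×512 frames, 8 GiB free
```

With these assumptions each 512×512 frame costs about 6 MiB, so 80% of 8 GiB holds 1092 frames.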
Example
using SMLMBoxer
# Check how many 512×512 frames to load at once
max_frames = recommend_batch_size(512, 512)
println("Load up to $max_frames frames at a time")
# Load and process in chunks
for chunk_start in 1:max_frames:total_frames
chunk_end = min(chunk_start + max_frames - 1, total_frames)
imagestack = load_frames(chunk_start:chunk_end)
(roi_batch, info) = getboxes(imagestack, camera; psf_sigma=0.13)
# ... process results
end
SMLMBoxer.removeoverlap — Method
removeoverlap(coords, args)
Remove overlapping coords based on distance.
Arguments
- coords: List of coords
- args: Parameters
Returns
coords: Coords with overlaps removed
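One simple way to remove overlaps by distance is a greedy pass, sketched below. The function name and the `min_dist` parameter are hypothetical, and both the first-come-first-kept order and the Euclidean metric are assumptions; the package may instead prefer brighter maxima or use a different overlap criterion.

```julia
# Sketch: greedy distance-based overlap removal.
# A coordinate is kept only if it is at least min_dist away from every
# coordinate that has already been kept.
function remove_overlap_sketch(coords::AbstractMatrix, min_dist::Real)
    keep = Int[]
    for n in axes(coords, 1)
        ok = all(keep) do m
            hypot(coords[n, 1] - coords[m, 1], coords[n, 2] - coords[m, 2]) >= min_dist
        end
        ok && push!(keep, n)
    end
    return coords[keep, :]
end
```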
SMLMBoxer.reshape_for_flux — Method
reshape_for_flux(arr::AbstractArray)
Reshape array to have singleton dims for NNlib convolution.
Arguments
arr: Input array, must be 2D or 3D
Returns
- Reshaped array with added singleton dimensions
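NNlib's 2D convolution expects arrays laid out as (width, height, channels, batch). A sketch of the reshape, under the assumption that frames become the batch dimension and the channel dimension is a singleton, could look like this (the function name is hypothetical):

```julia
# Sketch: add singleton dimensions so a 2D image or a 3D stack matches
# the 4D layout expected by NNlib convolutions.
# Assumption: frames map to the batch (4th) dimension.
function reshape_sketch(arr::AbstractArray)
    if ndims(arr) == 2
        return reshape(arr, size(arr, 1), size(arr, 2), 1, 1)
    elseif ndims(arr) == 3
        return reshape(arr, size(arr, 1), size(arr, 2), 1, size(arr, 3))
    else
        throw(ArgumentError("expected a 2D or 3D array, got $(ndims(arr))D"))
    end
end
```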
SMLMBoxer.select_backend — Method
select_backend(backend::Symbol, required_bytes;
auto_timeout=300.0, gpu_timeout=Inf, on_wait=nothing) -> (Symbol, Int)
Select compute backend with two-layer GPU contention handling.
Layer 1 (NVML polling): Scans all GPUs via NVML without creating CUDA contexts. Polls through timeout with jittered backoff. First GPU with sufficient free memory and low contention wins. Safe under multi-process contention.
Layer 2 (runtime try/catch): Applied by caller for :auto mode. If CUDA errors occur during processing despite NVML pre-check, falls back to CPU.
Arguments
- backend: :cpu, :gpu, or :auto
- required_bytes: Estimated GPU memory needed for processing
- auto_timeout: Max wait for :auto mode before CPU fallback (default 300.0)
- gpu_timeout: Max wait for :gpu mode (default Inf, i.e. wait forever)
- on_wait: Optional callback(elapsed, available, required)
Returns
(backend::Symbol, device_id::Int) - selected backend and GPU device (0-based, -1 for CPU)
Behavior
- :cpu - Returns (:cpu, -1) immediately
- :gpu - NVML poll for device, then CUDA wait_for_gpu_memory. Errors if unavailable/timeout
- :auto - NVML poll for device with timeout, falls back to (:cpu, -1) with warning
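The three modes can be summarized as a small dispatch function. In this sketch the NVML polling is replaced by a caller-supplied function `poll_gpu` that returns `(found, device_id)`; that argument and the function name are hypothetical, and the CUDA memory wait performed in :gpu mode is omitted.

```julia
# Sketch of the mode dispatch described above.
# poll_gpu is a stand-in for the NVML polling step and must return
# (found::Bool, device_id::Int).
function select_backend_sketch(backend::Symbol, poll_gpu)
    backend in (:cpu, :gpu, :auto) ||
        throw(ArgumentError("unknown backend: $backend"))
    backend === :cpu && return (:cpu, -1)      # no polling at all
    found, device_id = poll_gpu()
    found && return (:gpu, device_id)
    backend === :gpu && error("no GPU became available before the timeout")
    @warn "GPU unavailable, falling back to CPU"
    return (:cpu, -1)                          # :auto fallback
end
```

Passing the poller in as an argument keeps the sketch runnable on machines without a GPU, for example `select_backend_sketch(:auto, () -> (false, -1))`.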
SMLMBoxer.variance_weighted_gaussian_kernel! — Method
variance_weighted_gaussian_kernel!(output, input, variance, sigma, winsize)
KernelAbstractions kernel for variance-weighted Gaussian convolution. Implements SMITE-style inverse variance weighting.
This follows the same KernelAbstractions pattern used in GaussMLE.jl (kernel-abstract branch) for seamless CPU/GPU execution and consistent API across JuliaSMLM packages.
Arguments
- output: Output array (nrows, ncols)
- input: Input array (nrows, ncols)
- variance: Variance map (nrows, ncols)
- sigma: Gaussian sigma
- winsize: Window size (pixels)
Note
Same kernel code runs on CPU (via CPU() backend) or GPU (via CUDABackend()). Backend is selected automatically based on use_gpu parameter.
SMLMBoxer.variance_weighted_gaussian_kernel_batched! — Method
variance_weighted_gaussian_kernel_batched!(output, input, variance, sigma, winsize, nrows, ncols)
Batched KernelAbstractions kernel for variance-weighted Gaussian convolution. Processes all frames in a single kernel launch via 3D ndrange=(nrows, ncols, nframes), eliminating per-frame launch overhead.
Arguments
- output: Output array (nrows, ncols, 1, nframes)
- input: Input array (nrows, ncols, 1, nframes)
- variance: Variance map (nrows, ncols)
- sigma: Gaussian sigma
- winsize: Window size (pixels)
- nrows: Number of rows (passed explicitly for bounds checking)
- ncols: Number of columns
SMLMBoxer.wait_for_gpu_memory — Method
wait_for_gpu_memory(required_bytes; timeout=30.0, poll=0.5, on_wait=nothing) -> Bool
Wait until current GPU device has sufficient available memory. Uses CUDA calls (requires active context). Used by :gpu mode after device selection.
Arguments
- required_bytes: Minimum bytes needed (with safety margin applied internally)
- timeout: Maximum seconds to wait (default 30.0, use Inf for unlimited)
- poll: Seconds between checks (default 0.5)
- on_wait: Optional callback(elapsed, available, required) called each poll
Returns
true if memory became available, false if the timeout was reached
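The wait-with-timeout pattern behind this function can be sketched without any GPU calls. Here the memory query is a caller-supplied function `available_bytes`; that argument and the function name are hypothetical, and the internal safety margin mentioned above is left out.

```julia
# Sketch of a poll-until-available loop with timeout and on_wait callback.
# available_bytes is a stand-in for the real free-memory query.
function wait_until_sketch(available_bytes, required_bytes::Integer;
                           timeout::Real = 30.0, poll::Real = 0.5, on_wait = nothing)
    t0 = time()
    while true
        avail = available_bytes()
        avail >= required_bytes && return true
        elapsed = time() - t0
        elapsed >= timeout && return false
        # Report progress to the caller, if a callback was given
        on_wait === nothing || on_wait(elapsed, avail, required_bytes)
        # Never sleep past the deadline; with timeout = Inf this is just poll
        sleep(min(poll, max(timeout - elapsed, 0.0)))
    end
end
```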