Linear algebra operations form the computational backbone of scientific computing, yet choosing the optimal algorithm for a given problem and hardware configuration remains a persistent challenge. Today, we're excited to introduce LinearSolveAutotune.jl, a new community-driven autotuning system that automatically benchmarks and selects the best linear solver algorithms for your specific hardware configuration.
LinearSolve.jl provides a unified interface to over 20 different linear solving algorithms, from generic Julia implementations to highly optimized vendor libraries like Intel MKL, Apple Accelerate, and GPU-accelerated solvers. Each algorithm excels in different scenarios:
- **Small matrices (< 100×100):** pure Julia implementations like `RFLUFactorization` often outperform BLAS due to lower overhead
- **Medium matrices (100×100 to 1000×1000):** vendor-optimized libraries like Apple Accelerate and MKL shine
- **Large matrices (> 1000×1000):** GPU acceleration through Metal or CUDA becomes dominant
- **Sparse matrices:** specialized algorithms like KLU and UMFPACK are essential
The optimal choice depends on matrix size, sparsity, numerical type, and critically, your specific hardware. An M2 MacBook Pro has very different performance characteristics than an AMD Threadripper workstation with an NVIDIA GPU.
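If you already know which regime you're in, LinearSolve.jl lets you request an algorithm explicitly rather than relying on the defaults. A minimal sketch (note that on recent LinearSolve versions, `RFLUFactorization` requires loading RecursiveFactorization.jl):

```julia
using LinearSolve
using RecursiveFactorization  # provides RFLUFactorization on recent LinearSolve versions

# A small dense system where a low-overhead pure-Julia LU often wins
A = rand(50, 50)
b = rand(50)
prob = LinearProblem(A, b)

# Explicitly request the recursive pure-Julia LU factorization
sol = solve(prob, RFLUFactorization())
```

Autotuning exists precisely so that most users never have to write this by hand.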
LinearSolveAutotune addresses this challenge through a unique approach: collaborative benchmarking with optional telemetry sharing. Here's how it works:
## Local Benchmarking
Run comprehensive benchmarks on your machine with a simple command:
```julia
using LinearSolve, LinearSolveAutotune

# Run benchmarks across different matrix sizes and types
results = autotune_setup()

# View performance summary
display(results)

# Generate performance visualization
plot(results)
```
The system automatically:
- Tests algorithms across matrix sizes from 5×5 to 15,000×15,000
- Benchmarks Float32, Float64, Complex, and BigFloat element types
- Detects available hardware acceleration (GPUs, vendor libraries)
- Measures performance in GFLOPS for easy comparison
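A full sweep up to 15,000×15,000 can take a long time on modest hardware, so the benchmarks can be narrowed. The keyword arguments below are assumptions about the `autotune_setup` API, sketched for illustration; check the LinearSolveAutotune documentation for the exact names:

```julia
using LinearSolve, LinearSolveAutotune

# Narrow the sweep to save time. The keyword names here are assumptions;
# consult the LinearSolveAutotune docs for the actual API.
results = autotune_setup(
    sizes   = [:small, :medium],  # skip the largest (slowest) size classes
    eltypes = (Float64,),         # benchmark a single element type
)
```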
## Smart Recommendations
Based on your benchmarks, LinearSolveAutotune generates tailored recommendations for each scenario:
```
# Example output from an Apple M2 system:
# ┌─────────────┬────────────────────────────────┐
# │ Size Range  │ Best Algorithm                 │
# ├─────────────┼────────────────────────────────┤
# │ tiny (5-20) │ RFLUFactorization              │
# │ small       │ RFLUFactorization              │
# │ medium      │ AppleAccelerateLUFactorization │
# │ large       │ AppleAccelerateLUFactorization │
# │ huge        │ MetalLUFactorization           │
# └─────────────┴────────────────────────────────┘
```
## Community Telemetry (Optional)
The real innovation lies in opt-in community telemetry. By sharing your benchmark results, you contribute to a growing database that helps improve algorithm selection heuristics for everyone:
```julia
# Share your results with the community
share_results(results)
```
This creates an automatic GitHub comment on our results collection issue with:
- Your hardware configuration (CPU, GPU, available libraries)
- Performance measurements across all algorithms
- System-specific recommendations
- Performance visualizations
**Privacy first.** The telemetry system:
- Shares only benchmark performance data
- Never collects personal information
- Requires explicit opt-in via `share_results()`
- Uses GitHub authentication for transparency
- Makes all shared data publicly visible on GitHub
The community has already contributed benchmarks from diverse hardware configurations, revealing fascinating insights:
On Apple M2 processors, we discovered that Apple's Accelerate framework delivers exceptional performance in the medium-to-large range, achieving 750+ GFLOPS on Float32 matrices. However, for tiny matrices (< 20×20), the pure Julia `RFLUFactorization` is 3-5x faster due to lower call overhead.
Metal acceleration on Apple Silicon shows interesting threshold behavior:
- Below 500×500: CPU algorithms dominate
- 500×500 to 5000×5000: CPU and GPU are competitive
- Above 5000×5000: the GPU delivers a 2-3x speedup, reaching over 1 TFLOPS (see the sketch below)
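While shared results feed the default heuristics, you can exploit this crossover directly today. A minimal sketch, assuming an Apple Silicon machine with Metal.jl installed:

```julia
using LinearSolve, Metal  # Metal.jl enables MetalLUFactorization on Apple Silicon

n = 8000  # above the observed ~5000×5000 crossover
A = rand(Float32, n, n)
b = rand(Float32, n)

# Solve on the GPU via the Metal-accelerated LU factorization
sol = solve(LinearProblem(A, b), MetalLUFactorization())
```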
For complex arithmetic, we found that specialized algorithms matter even more:
- `LUFactorization` outperforms vendor libraries by 2x for ComplexF32
- Apple Accelerate struggles with complex numbers, making pure Julia implementations preferable
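These effects are easy to spot-check yourself. The snippet below assumes a macOS machine where Accelerate is available, and warms up both solvers first so the timings exclude compilation:

```julia
using LinearSolve

A = rand(ComplexF32, 500, 500)
b = rand(ComplexF32, 500)
prob = LinearProblem(A, b)

# Warm up both solvers so @elapsed measures runtime, not compilation
solve(prob, LUFactorization()); solve(prob, AppleAccelerateLUFactorization())

t_lu  = @elapsed solve(prob, LUFactorization())
t_acc = @elapsed solve(prob, AppleAccelerateLUFactorization())
@show t_lu t_acc
```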
The beauty of LinearSolve.jl's autotuning system is that you don't need to manually specify algorithms. The benchmark results from the community directly improve the default heuristics, so you simply use:
```julia
using LinearSolve

# Create your linear problem
A = rand(100, 100)
b = rand(100)
prob = LinearProblem(A, b)

# Just solve - LinearSolve automatically picks the best algorithm!
sol = solve(prob)  # Uses optimized heuristics based on community benchmarks
```
The autotuning results you and others share help LinearSolve.jl make intelligent decisions about:
- When to use pure Julia implementations vs. vendor libraries
- Matrix size thresholds for GPU acceleration (sketched below)
- Special handling for complex numbers and sparse matrices
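To make that concrete, here is a toy version of the kind of size-threshold rule such data informs, using the Apple Silicon numbers above. The thresholds and the function itself are illustrative assumptions, not LinearSolve.jl's actual internals:

```julia
using LinearSolve
using RecursiveFactorization  # for RFLUFactorization
using Metal                   # enables MetalLUFactorization on Apple Silicon

# Illustrative only: LinearSolve.jl's real selection logic also weighs
# element type, sparsity, and which vendor libraries are available.
function pick_dense_lu(n::Int)
    n <= 100  && return RFLUFactorization()               # low-overhead pure Julia
    n <= 5000 && return AppleAccelerateLUFactorization()  # vendor BLAS sweet spot
    return MetalLUFactorization()                         # GPU wins at the largest sizes
end

A = rand(Float32, 300, 300)
b = rand(Float32, 300)
sol = solve(LinearProblem(A, b), pick_dense_lu(300))
```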
By contributing your benchmark results with `share_results()`, you're directly improving the default algorithm selection for everyone. The more diverse hardware configurations we collect, the smarter the automatic selection becomes.
LinearSolveAutotune generates comprehensive performance visualizations showing:
- **Algorithm comparison plots:** GFLOPS vs. matrix size for each algorithm
- **Heatmaps:** performance across different size ranges and element types
- **System information:** hardware details and available acceleration
Here's an example from recent community submissions showing the dramatic performance differences across algorithms:
```
Metal GPU vs CPU performance on an Apple M2 (GFLOPS vs matrix size)

 1000 ┤            ▁▁▁▁▁▂▂▃▄▅▆▇█  Metal GPU
      │
  500 ┤        ▅▆▇██████  Apple Accelerate
      │     ▂▄████▅▃▂▁
  100 ┤   ▆████▃▁  Generic LU
      │ ████▁
   10 ┤ ██  RF Factorization
      │
    1 └─────────────────────────────────────
        10      100      1000      10000
                 Matrix Size (n×n)
```
The telemetry system is designed with transparency and user control at its core:
1. **Local execution:** all benchmarks run locally on your machine
2. **Data generation:** results are formatted as markdown tables and plots
3. **Authentication:** uses GitHub OAuth for secure, transparent submission
4. **Public sharing:** results are posted as a comment on a public GitHub issue
5. **Community analysis:** results feed into improved algorithm-selection heuristics
The collected data helps us:
- Identify performance patterns across different hardware
- Improve default algorithm selection
- Discover optimization opportunities
- Guide future development priorities
Ready to optimize your linear algebra performance? Here's how to get started:
```julia
# Install the packages
using Pkg
Pkg.add(["LinearSolve", "LinearSolveAutotune"])

# Run comprehensive benchmarks
using LinearSolve, LinearSolveAutotune
results = autotune_setup()

# Analyze your results
display(results)
plot(results)

# Optional: Share with the community
share_results(results)
```
LinearSolveAutotune represents a new paradigm in scientific computing: community-driven performance optimization. By aggregating performance data across diverse hardware configurations, we can:
- Build better default heuristics that work well for everyone
- Identify performance regressions quickly
- Guide optimization efforts where they matter most
- Create hardware-specific algorithm recommendations
We envision expanding this approach to other SciML packages, creating a comprehensive performance knowledge base that benefits the entire Julia scientific computing ecosystem.
The success of LinearSolveAutotune depends on community participation. Whether you're running on a laptop, workstation, or HPC cluster, your benchmarks provide valuable data that helps improve performance for everyone.
Visit our results collection issue to see community submissions, and consider running the autotuning suite on your hardware. Together, we're building a faster, smarter linear algebra ecosystem for Julia.
LinearSolveAutotune was developed as part of the SciML ecosystem with contributions from the Julia community. Special thanks to all early adopters who have shared their benchmark results and helped refine the system.
For more information, see the LinearSolve.jl documentation and join the discussion on Julia Discourse.