Let’s just jump right in! This time we have a bunch of new GPU tools and sparsity handling.
(Breaking with Deprecations) DiffEqGPU: GPU-based Ensemble Simulations
MonteCarloProblem interface received an overhaul. First of all, the
interface has been renamed to
Ensemble. The changes are:
parallel_type has been deprecated and a deprecation warning is
thrown mentioning this. So don’t worry: your code will work but will give
warnings as to what to change. Additionally, the DiffEqMonteCarlo.jl package
is no longer necessary for any of this functionality.
solve of a
EnsembleProblem works on the same dispatch mechanism as the
rest of DiffEq, which looks like
where the third argument is an ensembling algorithm to specify the
threading-based form. Code with the deprecation warning will work until the
release of DiffEq 7.0, at which time the alternative path will be removed.
The change to dispatch was done for a reason: it allows us to build new libraries
specifically for sophisticated handling of many trajectory ODE solves without
introducing massive new dependencies to the standard DifferentialEquations.jl
user. However, many people might be interested in the first project to make
use of this: DiffEqGPU.jl.
DiffEqGPU.jl lets you define a problem, like an
ODEProblem, and then solve
thousands of trajectories in parallel using your GPU. The syntax looks like:
monteprob = EnsembleProblem(my_ode_prob) solve(monteprob,Tsit5(),EnsembleGPUArray(),num_monte=100_000)
and it will return 100,000 ODE solves. We have seen between a 12x and 90x speedup depending on the GPU of the test systems, meaning that this can be a massive improvement for parameter space exploration on smaller systems of ODEs. Currently there are a few limitations of this method, including that events cannot be used, but those will be solved shortly. Additional methods for GPU-based parameter parallelism are coming soon to the same interface. Also planned are GPU-accelerated multi-level Monte Carlo methods for faster weak convergence of SDEs.
Again, this is utilizing compilation tricks to take the user-defined
and recompile it on the fly to a
.ptx kernel, and generating kernel-optimized
array-based formulations of the existing ODE solvers
Automated Sparsity Detection
Shashi Gowda (@shashigowda) implemented a sparsity detection algorithm which digs through user-defined Julia functions with Cassette.jl to find out what inputs influence the output. The basic version checks at a given trace, but a more sophisticated version, which we are calling Concolic Combinatoric Analysis, looks at all possible branch choices and utilizes this to conclusively build a Jacobian whose sparsity pattern captures the possible variable interactions.
The nice part is that this functionality is very straightforward to use. For example, let’s say we had the following function:
function f(dx,x,p,t) for i in 2:length(x)-1 dx[i] = x[i-1] - 2x[i] + x[i+1] end dx = -2x + x dx[end] = x[end-1] - 2x[end] nothing end
If we want to find out the sparsity pattern of
f, we would simply call:
sparsity_pattern = sparsity!(f,output,input,p,t)
output is an array like
input is an array like
are possible parameters, and
t is a possible
t. The function will then
be analyzed and
sparsity_pattern will return a
Sparsity type of
which denotes the terms in the Jacobian with non-zero elements. By doing
sparse(sparsity_pattern) we can turn this into a
SparseMatrixCSC with the
correct sparsity pattern.
This functionality highlights the power of Julia since there is no way to
conclusively determine the Jacobian of an arbitrary program
f using numerical
techniques, since all sorts of scenarios lead to “fake zeros” (cancelation,
not checking a place in parameter space where a branch is false, etc.). However,
by directly utilizing Julia’s compiler and the SSA provided by a Julia function
definition we can perform a non-standard interpretation that tells all of the
possible numerical ways the program can act, thus conclusively determining
all of the possible variable interactions.
Of course, you can still specify analytical Jacobians and sparsity patterns if you want, but if you’re lazy… :)
GPU Offloading in Implicit DE Solving
We are pleased to announce the
LinSolveGPUFactorize option which allows for
automatic offloading of linear solves to the GPU. For a problem with a large
enough dense Jacobian, using
linsolve=LinSolveGPUFactorize() will now
automatically perform the factorization and back-substitution on the GPU,
allowing for better scaling. For example:
using CuArrays Rodas5(linsolve = LinSolveGPUFactorize())
This simply requires a working installation of CuArrays.jl. See the linear solver documentation for more details.
Experimental: Automated Accelerator (GPU) Offloading
We have been dabbling in allowing automated accelerator (GPU, multithreading, distributed, TPU, etc.) offloading when the right hardware is detected and the problem size is sufficient to success a possible speedup. A working implementation exists as a PR for DiffEqBase which would allow automated acceleration of linear solves in implicit DE solving. However, this somewhat invasive of a default, and very architecture dependent, so it is unlikely we will be releasing this soon. However, we are investigating this concept in more detail in the AutoOffload.jl. If you’re interested in Julia-wide automatic acceleration, please take a look at the repo and help us get something going!
A Complete Set of Iterative Solver Routines for Implicit DEs
Previous releases had only a pre-built GMRES implementation. However, as detailed on the linear solver page, we now have an array of iterative solvers readily available, including:
- LinSolveGMRES – GMRES
- LinSolveCG – CG (Conjugate Gradient)
- LinSolveBiCGStabl – BiCGStabl Stabilized Bi-Conjugate Gradient
- LinSolveChebyshev – Chebyshev
- LinSolveMINRES – MINRES
These are all compatible with matrix-free implementations of a
Exponential integrator improvements
Thanks to Yingbo Ma (@YingboMa), the exprb methods have been greatly improved.
Our current development is very much driven by the ongoing GSoC/JSoC projects, which is a good thing because they are outputting some really amazing results!
Here’s some things to look forward to:
- Automated matrix-free finite difference PDE operators
- Surrogate optimization
- Jacobian reuse efficiency in Rosenbrock-W methods
- Native Julia fully implicit ODE (DAE) solving in OrdinaryDiffEq.jl
- High Strong Order Methods for Non-Commutative Noise SDEs
- GPU-Optimized Sparse (Colored) Automatic Differentiation
- Parallelized Implicit Extrapolation of ODEs