DifferentialEquations.jl v6.7.0: GPU-based Ensembles and Automatic Sparsity

Let's just jump right in! This time we have a bunch of new GPU tools and sparsity handling.

(Breaking with Deprecations) DiffEqGPU: GPU-based Ensemble Simulations

The MonteCarloProblem interface received an overhaul. First of all, the interface has been renamed to Ensemble. The changes are:

Specifying parallel_type has been deprecated, and a deprecation warning is thrown explaining the replacement. So don't worry: your code will still work, but it will emit warnings telling you what to change. Additionally, the DiffEqMonteCarlo.jl package is no longer necessary for any of this functionality.

Now, solve on an EnsembleProblem works through the same dispatch mechanism as the rest of DiffEq, looking like solve(ensembleprob,Tsit5(),EnsembleThreads(),trajectories=n), where the third argument is an ensembling algorithm that specifies, here, the threading-based form. Code hitting the deprecation warning will keep working until the release of DiffEq 7.0, at which time the deprecated path will be removed.
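For example, here is a minimal sketch of the new call style (assuming prob is an existing ODEProblem; the rest is part of the ensemble interface):

using OrdinaryDiffEq
ensembleprob = EnsembleProblem(prob)
# The third positional argument selects the parallelism
sol = solve(ensembleprob,Tsit5(),EnsembleThreads(),trajectories=1000)

Swapping EnsembleThreads() for EnsembleSerial() or EnsembleDistributed() changes the execution mode without touching the rest of the code.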

See the updated ensembles page for more details.

The change to dispatch was done for a reason: it allows us to build new libraries specifically for sophisticated handling of many trajectory ODE solves without introducing massive new dependencies to the standard DifferentialEquations.jl user. However, many people might be interested in the first project to make use of this: DiffEqGPU.jl. DiffEqGPU.jl lets you define a problem, like an ODEProblem, and then solve thousands of trajectories in parallel using your GPU. The syntax looks like:

using DiffEqGPU, OrdinaryDiffEq
monteprob = EnsembleProblem(my_ode_prob)
solve(monteprob,Tsit5(),EnsembleGPUArray(),trajectories=100_000)

and it will return 100,000 ODE solves. We have seen between a 12x and a 90x speedup depending on the GPU of the test system, meaning that this can be a massive improvement for parameter space exploration on smaller systems of ODEs. Currently there are a few limitations of this method, including that events cannot be used, but those will be addressed shortly. Additional methods for GPU-based parameter parallelism are coming soon to the same interface. Also planned are GPU-accelerated multi-level Monte Carlo methods for faster weak convergence of SDEs.
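To give a fuller picture, here is a sketch of how a parameter sweep might look with this interface (the Lorenz system and the random parameter draw below are illustrative choices, not part of the release itself):

using OrdinaryDiffEq, DiffEqGPU

function lorenz(du,u,p,t)
  du[1] = p[1]*(u[2]-u[1])
  du[2] = u[1]*(p[2]-u[3]) - u[2]
  du[3] = u[1]*u[2] - p[3]*u[3]
  nothing
end

# Float32 tends to be much faster on consumer GPUs
u0 = Float32[1.0,0.0,0.0]
p  = (10.0f0,28.0f0,8/3f0)
prob = ODEProblem(lorenz,u0,(0.0f0,100.0f0),p)

# Draw fresh parameters for each trajectory
prob_func = (prob,i,repeat) -> remake(prob,p=rand(Float32,3).*p)
monteprob = EnsembleProblem(prob,prob_func=prob_func)
sol = solve(monteprob,Tsit5(),EnsembleGPUArray(),trajectories=10_000,saveat=1.0f0)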

Again, this works via compilation tricks: the user-defined f is recompiled on the fly into a .ptx kernel, and kernel-optimized, array-based formulations of the existing ODE solvers are generated around it.

Automated Sparsity Detection

Shashi Gowda (@shashigowda) implemented a sparsity detection algorithm which digs through user-defined Julia functions with Cassette.jl to find out which inputs influence which outputs. The basic version analyzes a single execution trace, while a more sophisticated version, which we are calling Concolic Combinatoric Analysis, explores all possible branch choices and utilizes this to conclusively build a Jacobian whose sparsity pattern captures all of the possible variable interactions.

The nice part is that this functionality is very straightforward to use. For example, let's say we had the following function:

# Tridiagonal stencil: dx[i] depends only on x[i-1], x[i], and x[i+1]
function f(dx,x,p,t)
  for i in 2:length(x)-1
    dx[i] = x[i-1] - 2x[i] + x[i+1]
  end
  dx[1] = -2x[1] + x[2]
  dx[end] = x[end-1] - 2x[end]
  nothing
end

If we want to find out the sparsity pattern of f, we would simply call:

sparsity_pattern = sparsity!(f,output,input,p,t)

where output is an array like dx, input is an array like x, p is a possible parameter object, and t is a possible time point. The function is then analyzed, and sparsity! returns a Sparsity object holding the I and J indices of the potentially nonzero entries of the Jacobian. By doing sparse(sparsity_pattern) we can turn this into a SparseMatrixCSC with the correct sparsity pattern.
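Putting that together on the f above, a minimal sketch (the size 10 is arbitrary, and we pass nothing and 0.0 since this f ignores p and t; the exact signature may differ across SparsityDetection.jl versions):

using SparsityDetection, SparseArrays

input  = rand(10)
output = similar(input)
sparsity_pattern = sparsity!(f,output,input,nothing,0.0)
jac = sparse(sparsity_pattern)  # 10×10 tridiagonal SparseMatrixCSC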

This functionality highlights the power of Julia, since there is no way to conclusively determine the Jacobian sparsity of an arbitrary program f using numerical techniques: all sorts of scenarios lead to "fake zeros" (cancellation, probing a point in parameter space where a branch is not taken, etc.). However, by directly utilizing Julia's compiler and the SSA form provided by a Julia function definition, we can perform a non-standard interpretation that reveals all of the possible numerical ways the program can act, thus conclusively determining all of the possible variable interactions.
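To make the "fake zeros" point concrete, consider this toy function (purely illustrative): a numerical probe at any point with x[1] ≤ 0 would report that dx[1] never depends on x[2], while the branch-exploring analysis sees both paths and records the dependence:

function g(dx,x,p,t)
  if x[1] > 0
    dx[1] = x[2]   # the x[2] dependence exists only on this branch
  else
    dx[1] = 0.0    # a numerical probe landing here misses it entirely
  end
  nothing
end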

Of course, you can still specify analytical Jacobians and sparsity patterns if you want, but if you're lazy... :)
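For completeness, here's a hedged sketch of feeding a detected pattern to a stiff solver through the standard jac_prototype field (the problem setup is illustrative):

using OrdinaryDiffEq, SparseArrays

jac_prot = Float64.(sparse(sparsity_pattern))   # pattern as a Float64 sparse matrix
odefun = ODEFunction(f,jac_prototype=jac_prot)
prob = ODEProblem(odefun,rand(10),(0.0,1.0))
sol = solve(prob,TRBDF2())   # the solver can now exploit the sparsity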

See SparsityDetection.jl's README for more details.

GPU Offloading in Implicit DE Solving

We are pleased to announce the LinSolveGPUFactorize option which allows for automatic offloading of linear solves to the GPU. For a problem with a large enough dense Jacobian, using linsolve=LinSolveGPUFactorize() will now automatically perform the factorization and back-substitution on the GPU, allowing for better scaling. For example:

using CuArrays
# Run the stiff solver's factorizations and back-substitutions on the GPU
Rodas5(linsolve = LinSolveGPUFactorize())

This simply requires a working installation of CuArrays.jl. See the linear solver documentation for more details.

Experimental: Automated Accelerator (GPU) Offloading

We have been dabbling in automated accelerator (GPU, multithreading, distributed, TPU, etc.) offloading for when the right hardware is detected and the problem size is large enough to suggest a possible speedup. A working implementation exists as a PR to DiffEqBase which would allow automated acceleration of linear solves in implicit DE solving. However, this is a somewhat invasive default, and a very architecture-dependent one, so it is unlikely we will release it soon. Instead, we are investigating the concept in more detail in AutoOffload.jl. If you're interested in Julia-wide automatic acceleration, please take a look at the repo and help us get something going!

A Complete Set of Iterative Solver Routines for Implicit DEs

Previous releases had only a pre-built GMRES implementation. However, as detailed on the linear solver page, we now have an array of iterative solvers readily available, including GMRES (LinSolveGMRES), conjugate gradients (LinSolveCG), stabilized bi-conjugate gradients (LinSolveBiCGStabl), Chebyshev iteration (LinSolveChebyshev), and MINRES (LinSolveMINRES).

These are all compatible with matrix-free implementations of an AbstractDiffEqOperator.
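As an illustration, here's a hedged sketch of pairing an implicit solver with the pre-built GMRES wrapper (KenCarp4 and prob are stand-ins; any method accepting a linsolve option works the same way):

using OrdinaryDiffEq

alg = KenCarp4(linsolve=LinSolveGMRES())  # Krylov iterations for the implicit solves
sol = solve(prob,alg)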

Exponential Integrator Improvements

Thanks to Yingbo Ma (@YingboMa), the exprb methods have been greatly improved.

Next Directions

Our current development is very much driven by the ongoing GSoC/JSoC projects, which is a good thing because they are producing some really amazing results!

Here are some things to look forward to: