DifferentialEquations.jl v6.4.0: Full GPU ODE, Performance, ModelingToolkit

This is a huge release. We should take the time to thank every contributor to the JuliaDiffEq package ecosystem. Much of this release focuses on performance features: the ability to use stiff ODE solvers on the GPU, automated tooling for matrix-free Newton-Krylov, faster broadcast, better Jacobian reuse algorithms, reduced memory use, and more. Combined, these give some pretty massive performance boosts for medium to large highly stiff ODE systems. In addition, numerous robustness fixes have enhanced the usability of these tools, along with a few new features such as an implementation of extrapolation for ODEs and the release of ModelingToolkit.jl.

Let's start by summing up this release with an example.

Comprehensive Example

Here's a nice showcase of DifferentialEquations.jl: a neural ODE with batching on the GPU (without internal data transfers), solved with high-order adaptive implicit ODE solvers for stiff equations using matrix-free Newton-Krylov via preconditioned GMRES, and trained using checkpointed adjoint equations. Few programs work directly with neural networks and allow for batching; few utilize GPUs; few have methods applicable to highly stiff equations; few allow for large stiff equations via matrix-free Newton-Krylov; and finally, few have checkpointed adjoints. This is all done in a high-level programming language. What does the code for this look like?

using OrdinaryDiffEq, Flux, DiffEqFlux, DiffEqOperators, CuArrays
x = Float32[2.; 0.] |> gpu                         # initial condition, moved to the GPU
tspan = Float32.((0.0f0,25.0f0))
dudt = Chain(Dense(2,50,tanh),Dense(50,2)) |> gpu  # neural network defining du/dt
p = DiffEqFlux.destructure(dudt)                   # flatten the network parameters
# in-place RHS: one method for tracked (training) arrays, one for plain arrays
dudt_(du,u::TrackedArray,p,t) = du .= DiffEqFlux.restructure(dudt,p)(u)
dudt_(du,u::AbstractArray,p,t) = du .= Flux.data(DiffEqFlux.restructure(dudt,p)(u))
ff = ODEFunction(dudt_,jac_prototype = JacVecOperator(dudt_,x)) # matrix-free J*v operator
prob = ODEProblem(ff,x,tspan,p)
# solve with a stiff implicit method using GMRES and train via checkpointed adjoints
diffeq_adjoint(p,prob,KenCarp4(linsolve=LinSolveGMRES());u0=x,
               saveat=0.0:0.1:25.0,backsolve=false)

That is 10 lines of code, and we can continue to make it even more succinct.

Now, onto the release highlights.

Full GPU Support in ODE Solvers

Now not just the non-stiff ODE solvers but also the stiff ODE solvers allow the initial condition to be a GPUArray, with the internal methods performing no indexing so that all computations take place on the GPU without data transfers. This allows expensive right-hand side calculations, like those in neural ODEs or PDE discretizations, to utilize GPU acceleration without worrying about whether the cost of data transfers will overtake the solver speed enhancements.

While the presence of broadcast throughout the solvers might make one worry about performance...

Fast DiffEq-Specific Broadcast

Yingbo Ma (@YingboMa) implemented a fancy broadcast wrapper that allows all sorts of information to be passed to the compiler in the differential equation solver's internals, encoding no-aliasing and sizing assumptions that are normally not possible to make. The internals now all use a special @.. macro, which turns out to be faster than standard loops, and this is the magic that really enabled the GPU support to happen without performance regressions (in fact, we got some speedups from this, close to 2x in some cases!).
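
For a flavor of what this looks like outside of the solver internals, here is a minimal sketch (not actual solver code), assuming the macro is accessible as DiffEqBase.@..:

using DiffEqBase
u, uprev, k1 = rand(1000), rand(1000), rand(1000)
dt = 0.01
# like @., but fuses the whole update into one loop while asserting that the
# arrays do not alias, which lets the compiler optimize more aggressively
DiffEqBase.@.. u = uprev + dt * k1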

Smart linsolve defaults and LinSolveGMRES

One of the biggest performance-based features of this release is smarter linsolve defaults. If you are using dense arrays with a standard Julia build, OpenBLAS does not perform recursive LU factorizations, which we found makes it suboptimal by about 5x in some cases. Our default linear solver now automatically detects the BLAS installation and utilizes RecursiveFactorization.jl to give this speedup in many standard stiff ODE cases. In addition, if you pass a sparse Jacobian as the jac_prototype, the linear solver now automatically switches to a form that works for sparse Jacobians. If you use an AbstractDiffEqOperator, the default linear solver automatically switches to a Krylov subspace method (GMRES) and utilizes the matrix-free operator directly. Banded matrices and Jacobians on the GPU are now automatically handled as well.

Of course, those are just the defaults; most of this was possible before but has now been made more accessible. In addition, the ability to easily switch to GMRES was added via LinSolveGMRES. Just add linsolve = LinSolveGMRES() to any native Julia algorithm with a swappable linear solver and it will switch to using GMRES. Through it you can pass options for preconditioners and tolerances as well. We will continue to integrate this better into our integrators, since doing so will enhance the efficiency of solving large sparse systems.
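
As a sketch of the interface, assuming prob is some stiff ODEProblem (keyword options are forwarded to the underlying iterative solver, so the exact names may differ by version):

using OrdinaryDiffEq
# switch KenCarp4's Newton linear solves to matrix-free GMRES; preconditioners
# and tolerances are passed through the same constructor
sol = solve(prob, KenCarp4(linsolve = LinSolveGMRES(tol = 1e-7)))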

Automated J*v Products via Autodifferentiation

When using GMRES, one does not need to construct the full Jacobian matrix. Instead, one can use directional derivatives in the direction of v to compute J*v directly. This has now been put into operator form via JacVecOperator(dudt_,x), so users can ask for this behavior in a single line. It allows either autodifferentiation or numerical differentiation to be used to calculate the J*v products.
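
Here is a small sketch with a made-up right-hand side; the exact operator interface (e.g. whether p and t must be supplied) may vary, so treat it as illustrative:

using DiffEqOperators, LinearAlgebra
f(du, u, p, t) = (du .= u .^ 2)      # toy in-place RHS
u0 = rand(5)
v  = rand(5)
Jv = JacVecOperator(f, u0)           # lazy operator representing the Jacobian at u0
out = similar(v)
mul!(out, Jv, v)                     # J*v, approximately 2 .* u0 .* v, with no Jacobian formed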

DEStats

One of the nichest but nicest new features is DEStats. If you look at sol.destats you will see a load of information on how many steps were taken, how many f calls were made, etc., giving a broad overview of the performance of the algorithm. Thanks to Kanav Gupta (@kanav99) and Yingbo Ma (@YingboMa) for really driving this feature, since it has allowed a lot of these optimizations to be investigated more thoroughly. You can expect DiffEq development to accelerate with this information!
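
For example, on a toy problem:

using OrdinaryDiffEq
prob = ODEProblem((u, p, t) -> 1.01u, 0.5, (0.0, 1.0))
sol  = solve(prob, Tsit5())
# counts of f evaluations, Jacobians, linear solves, accepted/rejected steps, ...
sol.destats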

Improved Jacobian Reuse

One of the things noticed using DEStats was that the number of Jacobian calculations and factorizations could be severely reduced. Yingbo Ma (@YingboMa) did just that, greatly increasing the performance of all implicit methods like KenCarp4 and showing cases with 1000+ ODEs where OrdinaryDiffEq's native methods outperform Sundials' CVODE_BDF. This still has plenty of room for improvement.

DiffEqBiological performance improvements for large networks (speed and sparsity)

Samuel Isaacson (@isaacson) has been instrumental in improving DiffEqBiological.jl and its ability to handle large reaction networks. It can now parse the networks much faster and can build Jacobians which utilize sparse matrices. It pairs with his ParseRxns(???) library and has been a major source of large stiff test problems!
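
As a reminder of what working with DiffEqBiological looks like, here is a small network in the spirit of its documentation (the rate constants and initial values are made up for illustration):

using DiffEqBiological, OrdinaryDiffEq
rn = @reaction_network begin
  c1, S + E --> SE
  c2, SE --> S + E
  c3, SE --> P + E
end c1 c2 c3
p  = (0.00166, 0.0001, 0.1)            # c1, c2, c3
u0 = [301.0, 100.0, 0.0, 0.0]          # S, E, SE, P
prob = ODEProblem(rn, u0, (0.0, 100.0), p)
sol  = solve(prob, Tsit5())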

Partial Neural ODEs, Batching and GPU Fixes

We now have working examples of partial neural differential equations: equations which have pre-specified portions that are known, while other portions are learnable neural networks. These also allow for batched data and GPU acceleration. Not much else to say except let your neural diffeqs go wild!

Low Memory RK Optimality and Alias_u0

Kanav Gupta (@kanav99) and Hendrik Ranocha (@ranocha) did an amazing job on memory optimizations for low-memory Runge-Kutta methods aimed at hyperbolic or advection-dominated PDEs. Essentially, these methods have a theoretical minimum number of registers required by the method. Kanav added some tricks to the implementation (using a fun trick that turns = assignments into += updates) and Hendrik added the alias_u0 argument to allow using the passed-in initial condition as one of the registers. Unit tests confirm that our implementations achieve the minimum possible number of registers, allowing large PDE discretizations to make use of DifferentialEquations.jl without loss of memory efficiency. We hope to see this in use in some large-scale simulation software!
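
A minimal usage sketch, assuming prob is a large PDE semi-discretization (the specific low-storage method chosen here is just one of the available options):

using OrdinaryDiffEq
# a 2N low-storage RK method; alias_u0 = true lets the integrator reuse the
# memory of the passed-in initial condition as one of its registers
sol = solve(prob, CarpenterKennedy2N54(), alias_u0 = true, save_everystep = false)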

More Robust Callbacks

Our ContinuousCallback implementation now uses a new strategy that makes detection of double events much more robust. Try to break it.
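
If you want something to throw at it, the classic bouncing ball is a standard event-detection stress test:

using OrdinaryDiffEq
f(du, u, p, t) = (du[1] = u[2]; du[2] = -p)                 # height and velocity under gravity
condition(u, t, integrator) = u[1]                          # event when the height crosses zero
affect!(integrator) = (integrator.u[2] = -integrator.u[2])  # bounce: reverse the velocity
cb = ContinuousCallback(condition, affect!)
prob = ODEProblem(f, [50.0, 0.0], (0.0, 15.0), 9.8)
sol = solve(prob, Tsit5(), callback = cb)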

GBS Extrapolation

New contributor Konstantin Althaus (@AlthausKonstantin) implemented midpoint extrapolation methods for ODEs using barycentric formulas and different adaptivity behaviors. We will be investigating these methods for their parallelizability via multithreading in the context of stiff and non-stiff ODEs.
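
As a sketch (assuming the new method is exposed under a name like ExtrapolationMidpointDeuflhard; check the OrdinaryDiffEq documentation for the exact solver names):

using OrdinaryDiffEq
# prob is any ODEProblem; the extrapolation order is adapted automatically
sol = solve(prob, ExtrapolationMidpointDeuflhard())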

ModelingToolkit.jl Release

ModelingToolkit.jl has now gotten some form of a stable release. A lot of credit goes to Harrison Grodin (@HarrisonGrodin). While it has already been out there and found quite a bit of use, it has really picked up steam over the last year as a modeling framework suitable for the flexibility of DifferentialEquations.jl. We hope to continue its development and add features like event handling to its IR.
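
For a taste, here is the Lorenz system in the style of the ModelingToolkit README from around this release; the macro syntax has shifted between versions, so treat it as illustrative:

using ModelingToolkit
@parameters t σ ρ β
@variables x(t) y(t) z(t)
@derivatives D'~t
eqs = [D(x) ~ σ*(y - x),
       D(y) ~ x*(ρ - z) - y,
       D(z) ~ x*y - β*z]
sys = ODESystem(eqs)                         # symbolic IR of the ODE system
f = ODEFunction(sys, [x, y, z], [σ, ρ, β])   # compile to a fast, DiffEq-compatible function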

SUNDIALS J*v interface, stats, and preconditioners

While we are phasing out Sundials from our standard DifferentialEquations.jl practice, Sundials.jl continues to improve as we add more features to benchmark against. Sundials' J*v interface has now been exposed, so adding a DiffEqOperator to the jac_prototype will work with Sundials. DEStats is hooked up to Sundials, and you can now pass preconditioners to its internal Newton-Krylov methods.
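
A sketch of what this enables, assuming prob is a stiff ODEProblem (preconditioner hooks go through CVODE_BDF's keyword arguments, whose names may vary by version):

using Sundials
sol = solve(prob, CVODE_BDF(linear_solver = :GMRES))   # matrix-free Newton-Krylov
sol.destats                                            # DiffEq-style solver statistics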

Next Directions