This is a huge release. We should take the time to thank every contributor to the JuliaDiffEq package ecosystem. A lot of this release focuses on performance features. The ability to use stiff ODE solvers on the GPU, with automated tooling for matrix-free Newton-Krylov, faster broadcast, better Jacobian re-use algorithms, memory use reduction, etc. All of these combined give some pretty massive performance boosts in the area of medium to large sized highly stiff ODE systems. In addition, numerous robustness fixes have enhanced the usability of these tools, along with a few new features like an implementation of extrapolation for ODEs and the release of ModelingToolkit.jl.
Let's start by summing up this release with an example.
Here's a nice showcase of DifferentialEquations.jl: Neural ODE with batching on the GPU (without internal data transfers) with high order adaptive implicit ODE solvers for stiff equations using matrix-free Newton-Krylov via preconditioned GMRES and trained using checkpointed adjoint equations. Few programs work directly with neural networks and allow for batching, few utilize GPUs, few have methods applicable to highly stiff equations, few allow for large stiff equations via matrix-free Newton-Krylov, and finally few have checkpointed adjoints. This is all done in a high level programming language. What does the code for this look like?
using OrdinaryDiffEq, Flux, DiffEqFlux, DiffEqOperators, CuArrays
x = Float32[2.; 0.]|>gpu
tspan = Float32.((0.0f0,25.0f0))
dudt = Chain(Dense(2,50,tanh),Dense(50,2))|>gpu
p = DiffEqFlux.destructure(dudt)
dudt_(du,u::TrackedArray,p,t) = du .= DiffEqFlux.restructure(dudt,p)(u)
dudt_(du,u::AbstractArray,p,t) = du .= Flux.data(DiffEqFlux.restructure(dudt,p)(u))
ff = ODEFunction(dudt_,jac_prototype = JacVecOperator(dudt_,x))
prob = ODEProblem(ff,x,tspan,p)
That is 10 lines of code, and we can continue to make it even more succinct.
Now, onto the release highlights.
Now not just the non-stiff ODE solvers but the stiff ODE solvers allow for the initial condition to be a GPUArray, with the internal methods not performing any indexing in order to allow for all computations to take place on the GPU without data transfers. This allows for expensive right-hand side calculations, like those in neural ODEs or PDE discretizations, to utilize GPU acceleration without worrying about whether the cost of data transfers will overtake the solver speed enhancements.
While the presence of broadcast throughout the solvers might worry one about performance...
Yingbo Ma (@YingboMa) implemented a fancy broadcast wrapper that allows for all sorts of information to be passed to the compiler in the differential equation solver's internals, making a bunch of no-aliasing and sizing assumptions that are normally not possible. These change the internals to all use a special
@.. which turns out to be faster than standard loops, and this is the magic that really enabled the GPU support to happen without performance regressions (and in fact, we got some speedups from this, close to 2x in some cases!)
One of the biggest performance-based features to be released is smarter linsolve defaults. If you are using dense arrays with a standard Julia build, OpenBLAS does not perform recursive LU factorizations which we found to be suboptimal by about 5x in some cases. Thus our default linear solver now automatically detects the BLAS installation and utilizes RecursiveFactorizations.jl to give this speedup for many standard stiff ODE cases. In addition, if you passed a sparse Jacobian for the
jac_prototype, the linear solver now automatically switches to a form that works for sparse Jacobians. If you use an
AbstractDiffEqOperator, the default linear solver automatically switches to a Krylov subspace method (GMRES) and utilizes the matrix-free operator directly. Banded matrices and Jacobians on the GPU are now automatically handled as well.
Of course, that's just the defaults, and most of this was possible before but now has just been made more accessible. In addition to these, the ability to easily switch to GMRES was added via
LinSolveGMRES. Just add
linsolve = LinSolveGMRES() to any native Julia algorithm with a swappable linear solver and it'll switch to using GMRES. In this you can pass options for preconditioners and tolerances as well. We will continue to integrate this better into our integrators as doing so will enhance the efficiency when solving large sparse systems.
GMRES, one does not need to construct the full Jacobian matrix. Instead, one can simply use the directional derivatives in the direction of
v in order to compute
J*v. This has now been put into an operator form via
JacVecOperator(dudt_,x), so now users can directly ask for this to occur using one line. It allows for the use of autodifferentiation or numerical differentiation to calculate the
One of the nichest but nicest new features is DEStats. If you do
sol.destats then you will see a load of information on how many steps were taken, how many
f calls were done, etc. giving a broad overview of the performance of the algorithm. Thanks to Kanav Gupta (@kanav99) and Yingbo Ma (@YingboMa) for really driving this feature since it has allowed for a lot of these optimizations to be more thoroughly investigated. You can expect DiffEq development to accelerate with this information!
One of the things which was noticed using DEStats was that the amount of Jacobians and inversions that were being calculated could be severly reduced. Yingbo Ma (@YingboMa) did just that, greatly increasing the performance of all implicit methods like
KenCarp4 showing cases in the 1000+ range where OrdinaryDiffEq's native methods outperformed Sundials CVODE_BDF. This still has plenty of room for improvement.
Samuel Isaacson (@isaacson) has been instrumental in improving DiffEqBiological.jl and its ability to handle large reaction networks. It can now parse the networks much faster and can build Jacobians which utilize sparse matrices. It pairs with his ParseRxns(???) library and has been a major source of large stiff test problems!
We now have working examples of partial neural differential equations, which are equations which have pre-specified portions that are known while others are learnable neural networks. These also allow for batched data and GPU acceleration. Not much else to say except let your neural diffeqs go wild!
Kanav Gupta (@kanav99) and Hendrik Ranocha (@ranocha) did amazing jobs at doing memory optimizations of low-memory Runge-Kutta methods for hyperbolic or advection-dominated PDEs. Essentially these methods have a minimal number of registers which are theoretically required for the method. Kanav added some tricks to the implementation (using a fun
+= overload idea) and Henrick added the
alias_u0 argument to allow for using the passed in initial condition as one of the registers. Unit tests confirm that our implementations achieve the minimum possible number of registers, allowing for large PDE discretizations to make use of DifferentialEquations.jl without loss of memory efficiency. We hope to see this in use in some large-scale simulation software!
ContinuousCallback implementation now has increased robustness in double event detection, using a new strategy. Try to break it.
New contributor Konstantin Althaus (@AlthausKonstantin) implemented midpoint extrapolation methods for ODEs using Barycentric formulas and different a daptivity behaviors. We will be investigating these methods for their parallelizability via multithreading in the context of stiff and non-stiff ODEs.
ModelingToolkit.jl has now gotten some form of a stable release. A lot of credit goes to Harrison Grodin (@HarrisonGrodin). While it has already been out there and found quite a bit of use, it has really picked up steam over the last year as a modeling framework suitable for the flexibility DifferentialEquations.jl. We hope to continue its development and add features like event handling to its IR.
While we are phasing out Sundials from our standard DifferentialEquations.jl practice, the Sundials.jl continues to improve as we add more features to benchmark against. Sundials' J*v interface has now been exposed, so adding a DiffEqOperator to the
jac_prototype will work with Sundials.
DEStats is hooked up to Sundials, and now you can pass preconditioners to its internal Newton-Krylov methods.
Improved nonlinear solvers for stiff SDE handling
More adaptive methods for SDEs
Better boundary condition handling in DiffEqOperators.jl
More native implicit ODE (DAE) solvers
Adaptivity in the MIRK BVP solvers
LSODA integrator interface