SciML Ecosystem Update: Improved QNDF Outperforms CVODE On SciMLBenchmarks

It has been a few months since the last update, so there's a lot to cover, especially in terms of performance improvements. But let's start with the announcement of how we solved one of the longest-running issues in the SciML universe.

QNDF Improvements Replace CVODE in Recommendations and Defaults

For context, let's look at the previous DifferentialEquations.jl documentation on how to choose a stiff ODE solver:

For stiff problems at high tolerances (>1e-2?) it is recommended that you use Rosenbrock23 or TRBDF2...

Okay, it keeps going and going. All of the methods it recommends are from OrdinaryDiffEq.jl though, all guided by the SciMLBenchmarks results.

But at the very end of the recommendations you see:

For asymptotically large systems of ODEs (N>1000?) where f is very costly and the complex eigenvalues are minimal (low oscillations), in that case CVODE_BDF will be the most efficient but requires Vector{Float64}. CVODE_BDF will also do surprisingly well if the solution is smooth. However, this method can handle less stiffness than other methods and its Newton iterations may fail at low accuracy situations. Another good choice for this regime is lsoda.

In translation, "if you happen to be in the asymptotically big case, you still need to make use of Sundials".

That is no more. The king has fallen. Long live QNDF.

For a while, OrdinaryDiffEq.jl had a few BDF implementations (ABDF2, QBDF, QNDF, etc.), but none of them stacked up to Sundials.jl in this regime. And BDF-type methods are simply better suited for this regime than the Rosenbrock, SDIRK, and FIRK methods which OrdinaryDiffEq.jl had already heavily optimized beyond the previous software. This domain just kept being elusive.

In 2017 Yingbo and I had many conversations on the right approach for this, and those ideas refined and converged over the years. The Nordsieck form seems less numerically stable than divided differences when the latter are done correctly. Shampine's NDF seems like it should be an improvement over BDF, but there were no high-performance implementations. Etc. Putting all of this together, Yingbo had pieces of these ideas implemented across 8 repositories over the span of 4 years while moving towards this goal. In 2018, a GSoC student implemented a "from the textbook" NDF, the previous QNDF, which did admirably but didn't match CVODE.

But the work kept marching on in many ways. OrdinaryDiffEq.jl started to use its own pure Julia BLAS routines, with all sorts of auto-optimizing SIMD, etc. Finally, on May 22, 2021, Yingbo and a new summer student, Junpeng Gao (@JunpengGao233), cracked the code. They pair programmed and ended up with this mega PR. The result? In the entire SciMLBenchmarks, QNDF matches or outperforms CVODE_BDF. It ranges from 20 stiff ODEs with POLLU:

to 1,000 stiff ODEs with BCR:

and on a difficult 8,000 equation HVAC model, we saw it taking larger steps, computing fewer Jacobians, and factorizing less often:

Across the board we saw the dream had finally been realized, with all of the pieces from handfuls of different repositories coming together to give a great new algorithm. Because of this, the automated algorithm selection of DifferentialEquations.jl v6.17 no longer chooses CVODE_BDF in any scenario. As CVODE_Adams is always outperformed in the benchmarks by either Tsit5, Vern7, Vern9, or VCABM, and ARKODE is outperformed by the same or by KenCarp47 or TRBDF2, the benchmarks, recommendations, and automated algorithm choices no longer make use of Sundials.jl.
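For those who want to try it, here is a minimal sketch of what the switch looks like in user code, using the classic ROBER stiff ODE purely as a small stand-in for a larger system (the setup below is illustrative and not taken from the benchmarks):

```julia
using OrdinaryDiffEq

# Robertson's stiff chemical kinetics problem (ROBER), used here only
# as an illustrative stiff system.
function rober!(du, u, p, t)
    y1, y2, y3 = u
    k1, k2, k3 = p
    du[1] = -k1*y1 + k3*y2*y3
    du[2] =  k1*y1 - k2*y2^2 - k3*y2*y3
    du[3] =  k2*y2^2
    nothing
end

prob = ODEProblem(rober!, [1.0, 0.0, 0.0], (0.0, 1e5), [0.04, 3e7, 1e4])

# Pure Julia NDF, now the recommended BDF-type method:
sol = solve(prob, QNDF(), abstol = 1e-8, reltol = 1e-8)

# Or let the automated algorithm selection pick; as of DifferentialEquations.jl
# v6.17 this no longer falls back to CVODE_BDF:
# using DifferentialEquations
# sol = solve(prob)
```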

The importance of this cannot be overstated. Working with a binding to a compiled C library always had limitations: the special linear solvers that could be used, the interactions with the JIT compiler, the ability to work with CUDA and AMD GPUs in an automatic way, and the ability to differentiate through the solver. With the dominant algorithm now in pure Julia, all of these issues are solved because the solver interacts with the compiler in a native way. This will be important as it allows automating the development of multi-GPU MPI codes (coming soon!), further expanding the reach of "asymptotically large systems".

Sundials is absolutely fantastic. It should be applauded that, against an effort of some 300 solver implementations, over 100 developers, over 100 repositories, and entire funded teams and labs improving various aspects over 5 years, CVODE_BDF stood strong as unbeatable in an important category even as we repeatedly attempted to find ways to unseat it. But sooner or later it had to be overcome, and that time has come. We are very thankful to the developers of Sundials, as their decades of code and writings have been immensely helpful in unpacking this subject, and we hope to contribute to this area of ODEs as much as they have. But for now, Julia users can rejoice: if they are solving systems of more than 1,000 ODEs, they will get a free speed boost, a feature boost, and a reduced dependency chain.

Is this the end of Sundials.jl? No

Seeing these advancements, people have asked me this question regarding the maintenance of legacy code. Let it be stated directly:

SciML is committed to maintaining legacy solver libraries

In 2021 you can still use ODE.jl on Julia v1.6, solvers built 3 years before the creation of DifferentialEquations.jl. They are locked in time, giving the same answers at the same speed, which makes updating old code easy.

Sundials.jl will likely evolve similarly. We note that CVODE_BDF is still worth trying in many cases, and there will likely be instances where it outperforms QNDF, as every method is optimized in different ways. To be clear, the reason for the recommendation change is that over a large array of equations we see a very general pattern of preferring QNDF. There will always be exceptions to the rule, so CVODE_BDF will still have a place in the DifferentialEquations.jl arsenal, just a much reduced one from the position it held before.
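Swapping between the two remains a one-line change, so benchmarking both on your own problem stays cheap. A minimal sketch (the toy problem below is just a placeholder for your own system):

```julia
using OrdinaryDiffEq, Sundials

# A tiny stand-in stiff linear system; substitute your own ODEProblem.
f!(du, u, p, t) = (du .= -1000.0 .* u)
prob = ODEProblem(f!, [1.0, 2.0], (0.0, 1.0))

sol_qndf  = solve(prob, QNDF())       # pure Julia NDF
sol_cvode = solve(prob, CVODE_BDF())  # Sundials' BDF, still one line away
```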

That said, the advancement of QNDF means there will likely be no attempt to connect to Sundials' CUDA support, but what is already there will continue to be maintained. It's built with fancy LAPACK options, KLU integration, etc., and it will stay that way. It will be updated and its tests will keep passing. But it will likely not capture as much developer focus.

There is one more thing to make note of. There is still one place in the entire DifferentialEquations.jl universe where a Sundials algorithm reigns supreme as the recommended algorithm over any pure Julia method, and that is not in ODEs but DAEs. IDA is still unmatched in the pure Julia space. While it's routinely outperformed if you have a DAE written in mass matrix form of a certain size, giving rise to the recommendations of OrdinaryDiffEq.jl in many DAE cases, IDA is still unmatched when you have a fully-implicit or asymptotically large DAE definition (see the sketch below for the two formulations).

That will change. Junpeng Gao's summer project, under the mentorship of Yingbo, will be to similarly build a pure Julia BDF/NDF for fully-implicit DAEs into OrdinaryDiffEq.jl, aiming to benchmark it just as extensively and outperform IDA by the end of the summer using the same tricks as QNDF. When that is completed, we will finally flip the last yellow box on DifferentialEquations.jl and end a half-decade, >100 developer project to have essentially 99% of cases most optimally served by a pure Julia solver, optimized using the latest techniques from the literature and purpose-built for the latest hardware. And that will be a major congratulations to everyone who has been involved in this journey.
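To make the mass-matrix vs. fully-implicit distinction concrete, here is a minimal sketch of the Robertson problem written in both DAE forms (the constants are inlined for brevity):

```julia
using OrdinaryDiffEq, Sundials

# Mass-matrix form M*du = f(u,p,t): the singular mass matrix marks the
# algebraic constraint. Pure Julia solvers such as Rodas5 handle this today.
function rober_mm!(du, u, p, t)
    y1, y2, y3 = u
    du[1] = -0.04y1 + 1e4*y2*y3
    du[2] =  0.04y1 - 1e4*y2*y3 - 3e7*y2^2
    du[3] =  y1 + y2 + y3 - 1.0           # algebraic constraint
end
M = [1.0 0.0 0.0; 0.0 1.0 0.0; 0.0 0.0 0.0]
prob_mm = ODEProblem(ODEFunction(rober_mm!, mass_matrix = M),
                     [1.0, 0.0, 0.0], (0.0, 1e5))
sol_mm = solve(prob_mm, Rodas5())

# Fully-implicit form F(du,u,p,t) = 0: this is the regime where IDA is still
# the recommendation until a pure Julia fully-implicit BDF/NDF lands.
function rober_impl!(res, du, u, p, t)
    y1, y2, y3 = u
    res[1] = -0.04y1 + 1e4*y2*y3 - du[1]
    res[2] =  0.04y1 - 1e4*y2*y3 - 3e7*y2^2 - du[2]
    res[3] =  y1 + y2 + y3 - 1.0
end
prob_impl = DAEProblem(rober_impl!, [-0.04, 0.04, 0.0], [1.0, 0.0, 0.0],
                       (0.0, 1e5), differential_vars = [true, true, false])
sol_impl = solve(prob_impl, IDA())
```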

Symbolics.jl E-Graph-Based Floating Point Cycle Minimization

If you missed the release of the new JuliaSymbolics organization, please check it out. It's a "daughter organization" with pieces spun out of SciML, since symbolic computing has become a world of its own. The Symbolics.jl roadmap and release announcement received a ton of fanfare. We thank the community for the warm welcome!

Since that time, Symbolics.jl has been moving rapidly with a lot of help from Shashi Gowda (@shashi) and the new developer team the organization has brought in. To highlight this, check out the recently released publication on Symbolics.jl. It showcases how the e-graph approaches of Metatheory.jl, combined with symbolic simplification, can give rise to a mechanism for automatically rewriting code into the form which takes the fewest floating point cycles. Symbolics.jl will be a fantastic symbolic-numeric tool for the SciML community. We note that ModelingToolkit.jl will stay in SciML, and it is now built on top of the fully documented Symbolics.jl computer algebra system.
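As a taste of the system, here is a minimal sketch of basic Symbolics.jl usage (this shows plain symbolic simplification and code generation, not the Metatheory.jl e-graph optimizer itself):

```julia
using Symbolics

@variables x y

expr = simplify(2x + 3x + x*y)        # like terms combine: 5x + x*y
f_expr = build_function(expr, x, y)   # generate Julia code for the expression
f = eval(f_expr)
f(1.0, 2.0)                           # 5*1.0 + 1.0*2.0 == 7.0
```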

Speaking of ModelingToolkit.jl, SciML user Jonnie Diegelman gave an enlightening talk at SIAM CSE 2021 on how he uses SciML tools to accelerate his simulations by about 15,000x over Simulink. See the talk for yourself!

Major Improvements to Adjoint Sensitivities with Ill-Conditioning (Stiffness)

Work in collaboration with the Deng Lab at MIT has led to advances in stiff neural ordinary differential equations. One of the main things shown is that you cannot just use a stiff ODE solver with the adjoints you had before and expect it to work: you need different adjoint techniques to do it stably and efficiently. We show precisely how to tackle some classically hard stiff ODEs and train neural networks on them using the latest techniques in DiffEqFlux.jl. This is an extremely difficult domain, but we hope to keep sharing new advances as they become available.
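As an illustration of what "different adjoint techniques" means in practice, here is a minimal sketch (assuming the DiffEqSensitivity.jl API at the time of writing) of choosing a stiffness-appropriate adjoint when differentiating through a stiff solve; the stiff van der Pol oscillator is used purely as a toy example:

```julia
using OrdinaryDiffEq, DiffEqSensitivity, Zygote

# Stiff van der Pol oscillator as a toy stiff system.
function vdp!(du, u, p, t)
    x, y = u
    μ = p[1]
    du[1] = y
    du[2] = μ * ((1 - x^2) * y - x)
end

p = [1e3]
prob = ODEProblem(vdp!, [2.0, 0.0], (0.0, 1.0), p)

# Select an adjoint suited to stiff problems instead of the default, e.g.
# QuadratureAdjoint with a ReverseDiff-compiled vector-Jacobian product.
loss(p) = sum(Array(solve(prob, QNDF(), p = p, saveat = 0.1,
                          abstol = 1e-8, reltol = 1e-8,
                          sensealg = QuadratureAdjoint(autojacvec = ReverseDiffVJP(true)))))

dp, = Zygote.gradient(loss, p)
```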

Order of Magnitude Performance Improvement in Training Neural ODEs By Differentiating the Solver

SciML developers recently published a paper at ICML showing how to improve the training performance of fast neural ODEs for machine learning applications (image processing, natural language processing, etc.) by differentiating through the solver and regularizing based on internal solver heuristics. Importantly, this is not an approach that can be done without defining something equivalent to differentiation of the solver, showcasing that the generalized physics-informed approach through AD has applications beyond SciML problems. We thank Avik Pal (@avik-pal) for doing a lot of the demonstrations in this work.

Major Performance Improvements to NeuralPDE

The recent work of Kirill Zubov combined with CUDA.jl 3.0 has led to some major performance improvements in NeuralPDE.jl, particularly with GPUs. A formal benchmarking setup will be built in the near future to track performance in a more concrete way.

SciML Summer Projects to Look Forward To

As summer is starting, it's time for the yearly update of all of the exciting announcements to expect soon. With 7 funded GSoC projects, 1 Open Source Promotion Plan (OSPP) Summer 2021 project, and many funded members across top universities and technical computing companies dedicating their time to SciML, there is a huge list. But here's the quickest summary: