I work in the ICU monitoring field, on the R&D team of a company with live systems at dozens of hospitals and multiple FDA approvals. We use extended Kalman filters (i.e. non-blackbox "ML") to estimate certain lab values of patients that are highly indicative of them crashing, based on live data from whatever set of monitors they're hooked up to - and it's highly robust.
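For readers unfamiliar with the technique, the predict/update loop looks roughly like this. This is a minimal scalar sketch with a made-up sensor model (h(x) = sqrt(x)) and made-up noise parameters, not our production system:

```python
import numpy as np

# Minimal scalar EKF sketch (illustrative only).
# State x: a latent lab value; measurement z = h(x) + noise from a monitor,
# with an invented nonlinear sensor response h(x) = sqrt(x).

def ekf_step(x, P, z, q=0.01, r=0.1):
    # Predict: random-walk state model, so f(x) = x and F = 1.
    x_pred = x
    P_pred = P + q
    # Linearize the measurement model around the prediction.
    H = 0.5 / np.sqrt(x_pred)          # dh/dx evaluated at x_pred
    # Standard Kalman gain and update.
    S = H * P_pred * H + r
    K = P_pred * H / S
    x_new = x_pred + K * (z - np.sqrt(x_pred))
    P_new = (1 - K * H) * P_pred
    return x_new, P_new

x, P = 4.0, 1.0                        # initial estimate and variance
for z in [2.1, 1.9, 2.05, 2.0]:        # noisy sqrt-readings of a true value ~4
    x, P = ekf_step(x, P, z)
# The estimate stays near 4 and the variance P shrinks as readings arrive.
```

The point about robustness: every quantity in that loop (gain, innovation, covariance) is inspectable, which is exactly what you want when the output drives clinical alarms.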
What the authors of this paper are doing is throwing stuff at the wall to see if it works, and publishing results. That's not necessarily a bad thing at all, but I say this to underline that their results are not at all reflective of SOTA capabilities, and they're not doing much exploration of prior art.
Calling EKFs "ML" is certainly a choice.
It is a reasonable choice, and especially with the quotes around it, completely understandable.
The distinction between statistical inference and machine learning is too blurry to police Kalman filters onto one side.
It's machine learning until you understand how it works, then it's just control theory and filters again.
Diffusion models are a happy middle ground. :-)
Is it less ML than linear regression?
If you want to draw the line between ML and not ML, I think you’ll have to put Kalman filters and linear regression on the non-ML side. You can put support vector machines and neural networks on the ML side.
In some sense the exact place you draw the distinction is arbitrary. You could try to characterize where the distinction is by saying that models with fewer parameters and lower complexity tend to be called “not ML”, and models with more parameters and higher complexity tend to be called “ML”.
Linear regression is literally the second lecture of the Stanford ML class. https://cs229.stanford.edu/
If you want to say "not neural networks" or not dnn or not llm, sure. But it's obviously machine learning
AI professor here.
Anything that can separate data points can rightly be seen as a "supervised machine learning classifier".
To demystify the area, I literally open my intro to ML lecture by drawing a line on the blackboard, giving its equation y = 0.5 x, reminding students that they already know this, and then explaining how to use it as a spam filter by interpreting the points on either side of the line as good emails versus spam ones.
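That lecture example fits in a few lines of code. The feature names and points below are invented purely for illustration:

```python
# The lecture example as code: classify a point by which side of the
# line y = 0.5 x it falls on.

def is_spam(x, y):
    # Points above the line y = 0.5 x are labeled spam (an arbitrary choice).
    return y > 0.5 * x

# Hypothetical feature vectors, e.g. (links_per_word, exclamation_rate).
emails = [(2.0, 3.0), (4.0, 1.0)]
labels = [is_spam(x, y) for x, y in emails]
# (2.0, 3.0): 3.0 > 1.0 -> spam; (4.0, 1.0): 1.0 < 2.0 -> not spam
```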
Linear regression is machine learning. At their core neural networks are just repeated linear regression + a non-linearity arranged in interesting ways. The key is that they can be trained to fit data using some optimization protocol (e.g. gradient descent). Just because linear regression has a closed form solution and is conceptually simple doesn't mean anything here.
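A quick sketch of that point: fitting the same linear model by closed-form least squares and by gradient descent on the MSE lands on (approximately) the same weights. The data and hyperparameters here are arbitrary toy choices:

```python
import numpy as np

# Linear regression fit two ways: closed form, and as a one-unit "network"
# trained by gradient descent on squared loss. Toy data: y = 3x + 1 + noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3 * X[:, 0] + 1 + 0.01 * rng.normal(size=100)
Xb = np.hstack([X, np.ones((100, 1))])          # append a bias column

# Closed-form least squares.
w_closed = np.linalg.lstsq(Xb, y, rcond=None)[0]

# Gradient descent on the same MSE objective.
w = np.zeros(2)
for _ in range(2000):
    grad = 2 * Xb.T @ (Xb @ w - y) / len(y)
    w -= 0.1 * grad

# Both converge to approximately the same weights, close to [3, 1].
```

Swap the gradient-descent loop's model for a stack of such layers with nonlinearities in between and you have a neural net; the training protocol is the same idea.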
EKFs work by 'learning' the covariance matrix on the fly, so I don't see why not?
Hence the quotes ;).
As an intuition on why many people see this as different.
PAC Learning is about compression; KF/EKF is more like Taylor expansion.
The specific types of PAC Learning that this paper covers have problems with a simplicity bias and fairly low sensitivity.
While based on UHATs, this paper may provide some insights.
https://arxiv.org/abs/2502.02393
Obviously LLMs and LRMs are the most studied, but even the recent posts on here from Anthropic show that without a few high-probability entries in the top-k results, confabulations are difficult for transformers.
Obviously there are PAC Learning methods that target anomaly detection, but they are very different from even EKF + Mc
You will note in this paper that even highly weighted features exhibited low sensitivity.
While the industry may find some pathological cases that make the approach usable, autograd and the need for parallelism make applying this paper's methods to tiny variations in multivariate problems ambitious.
They also only trained on medical data. Part of the reason the foundation models do so well is that they encode verifiers from a huge corpus, which invalidates the traditional bias-variance tradeoffs from the early-'90s papers.
But they are still selecting from the needles and don't have access to the hay in the haystack.
The following paper is really not related except it shows how compression exacerbates that problem.
https://arxiv.org/abs/2205.06977
Chaitin's constant, which encodes the halting problem and is both normal and uncomputable, sits at the extreme top end of computability, but it relates to the compression idea.
EKFs have access to the computable reals, and while the underlying models are non-linear, KFs and EKFs can be thought of through the lens of linearized approximations.
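To make the linearization point concrete, here is a minimal sketch, with an arbitrary toy function h, of the first-order Taylor expansion an EKF applies to a nonlinear measurement model at each step:

```python
import numpy as np

# The "linearization as a lens" view: at each step an EKF replaces a
# nonlinear model h with its first-order Taylor expansion around the
# current estimate. h here is chosen only for illustration.

def h(x):
    return np.sin(x)

def jacobian(x, eps=1e-6):
    # Central-difference derivative of h at x (scalar case).
    return (h(x + eps) - h(x - eps)) / (2 * eps)

x0 = 0.3                               # current state estimate
H = jacobian(x0)
approx = lambda x: h(x0) + H * (x - x0)  # local linear model

# Near x0 the linear model tracks h closely; the standard KF update
# equations are then applied to this local-linear model.
err = abs(h(0.31) - approx(0.31))
```

This is exactly a truncated Taylor expansion, which is why the compression framing of PAC Learning feels like a different animal.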
If the diagnostic indicators were both ergodic and Markovian, this paper's approach would probably be fairly reliable.
But these efforts are really about finding a many-to-one reduction that works.
I am skeptical about it in this case for PAC ML, but perhaps they will find a pathological case.
But the tradeoffs between statistical learning and expansive methods are quite different.
Obviously hype cycles drive efforts; I encourage you to look at this year's AAAI conference report and see that you are not alone in your frustration with the single-minded approach.
IMHO this paper is a net positive, showing that we are moving from a broad exploration to targeted applications.
But that is just my opinion.
Parameter estimation is ML now?
I think ML is in quotes for a reason: the usage is not typical.
Why not? LLMs, vision models, and Kalman filters all learn parameters based on data.
A linear regression model can be written and trained as a neural net, has a loss function, all of that. Most if not all ML problems can be formulated as modelling a probability distribution.
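As a concrete sketch of the probability-distribution framing (toy data, an assumed fixed noise scale): least-squares regression is maximum likelihood under Gaussian noise, so the MSE minimizer and the negative-log-likelihood minimizer coincide:

```python
import numpy as np

# Least squares as Gaussian maximum likelihood: over a grid of candidate
# slopes, the MSE and the Gaussian negative log-likelihood pick the same
# winner, because the NLL is a monotone transform of the MSE.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 50)
y = 2 * x + 0.1 * rng.normal(size=50)   # toy data, true slope 2

ws = np.linspace(1.5, 2.5, 1001)
mse = [np.mean((y - w * x) ** 2) for w in ws]
# Gaussian NLL with a fixed noise scale sigma = 0.1 (constants dropped).
nll = [np.sum(0.5 * ((y - w * x) / 0.1) ** 2) for w in ws]

w_mse = ws[np.argmin(mse)]
w_nll = ws[np.argmin(nll)]
# w_mse and w_nll are the same grid point, near the true slope 2.
```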
Neural networks are not ML now?
EKF is a neural network!?