I'm not sure about the 1.0/2.0/3.0 classification, but it did lead me to think about LLMs as a programming paradigm: we've had imperative & declarative, procedural & functional languages, maybe we'll come to view deterministic vs. probabilistic (LLMs) similarly.
    def __main__:
        You are a calculator. Given an input expression, you compute the result and print it to stdout, exiting 0.
        Should you be unable to do this, you print an explanation to stderr and exit 1.
(and then, perhaps, a bunch of 'DO NOT express amusement when the result is 5318008', etc.)
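For fun, here's roughly what the "runtime" for that program could look like today: the prompt is the program text, and a thin deterministic harness handles I/O and exit codes. call_llm is hypothetical, a stand-in for whichever model API or local model you'd actually wire up.

    import sys

    CALCULATOR_PROMPT = (
        "You are a calculator. Given an input expression, you compute the result "
        "and print it to stdout, exiting 0. Should you be unable to do this, you "
        "print an explanation to stderr and exit 1. "
        "DO NOT express amusement when the result is 5318008."
    )

    def call_llm(system_prompt: str, user_input: str) -> str:
        # Hypothetical stand-in for whatever chat-completion API you use;
        # wire it up to your provider or local model of choice.
        raise NotImplementedError("no model wired up")

    def main() -> int:
        expression = sys.stdin.read().strip()
        try:
            answer = call_llm(CALCULATOR_PROMPT, expression)
        except Exception as err:          # any failure goes to stderr, exit 1
            print(f"error: {err}", file=sys.stderr)
            return 1
        print(answer)                     # the "computed" result goes to stdout
        return 0

    if __name__ == "__main__":
        sys.exit(main())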
Why bother using human language to communicate with a computer? You interact with a computer using a programming language (code), which is more precise and effective. Specifically:

→ In 1.0, you communicate with computers using compiled code.
→ In 2.0, you communicate with compilers using high-level programming languages.
→ In 3.0, you interact with LLMs using prompts, which arguably should not be in natural human language.

Nonetheless, you should communicate with AGIs using human language, just as you would with other human beings.
Why bother using higher-level programming languages to communicate with a computer? You interact with a computer using assembly - raw bit shifting and memory addresses - which is more precise and effective.
Using assembly is not really more precise in terms of solving the problem. You can definitely argue that using a higher-level language is equally precise, if not more so. Especially since your low-level assembly is limited in which architectures it can run on, you can say that the C++ that generates that assembly "more precisely defines a calculator program".
I agree with your general point, but C++ isn't a great example, as it is so underspecified. Imagine as part of our calculator we wrote the function:
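(The snippet seems to have been lost in formatting; it was presumably something along these lines, with 16-bit ints in mind.)

    int add(int a, int b) {
        return a + b;   // on a platform where int is 16 bits, add(32767, 1) overflows,
                        // and signed integer overflow is undefined behaviour
    }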
What is the result of add(32767, 1)? C++ does not presume to define just one meaning for such an expression, or even any meaning at all. What to do when the program tries to add ints that large is left to the personal conscience of compiler authors.

Precision is not boolean (present or absent, 0 or 1); there may be many numbers between 0 and 1. Compared to human languages, programming languages are much more precise, which makes the results much more predictable in practice.
I can imagine an OS being written in C++ and working most of the time. I don't think you can replace Linux, written in C, with any number of LLM prompts.
An LLM can be a (so far, bad) programmer, but a prompt is not a program.
Using code may not be more precise in terms of solving a problem than English. Take the NHS: with better AI, saying "build a good IT system for the NHS" might have worked better than this: https://www.theguardian.com/society/2013/sep/18/nhs-records-...
You can express dang near anything you wish to express in assembly in a higher-level programming language, because it is designed to allow that level of clarity and specificity. In fact, most have compile-time checks to stop you if you have not properly specified certain behavior.
The English language is not comparable. It is a language designed to capture all the ambiguity of human thought, and as such is not appropriate for computation.
TLDR: There's a reason why programmers still exist after the dawn of 4GL / 'no code' frameworks. Otherwise we'd all be product managers typing specs into JIRA and getting fully formed applications out the other side.
If this is what it comes to, it would explain the many, many software malfunctions in Star Trek. If everything is an LLM/LRM (or whatever super advanced version they have in the 23rd century) then everything can evolve into weird emergent behaviours.
stares at every weird holo-deck episode
LLMs are not inherently non-deterministic. Batching, temperature, and other things make them appear so when run by the big providers, but a locally run LLM at zero temperature will always produce the same output given the same input.
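For example, something like this (a sketch using the Hugging Face transformers API; "gpt2" is just a placeholder for whatever local model you run) prints the same completion on every run, assuming deterministic kernels:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # "gpt2" is only a placeholder; any locally loaded causal LM behaves the same way.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("2 + 2 =", return_tensors="pt")
    with torch.no_grad():
        # do_sample=False is greedy decoding, i.e. the zero-temperature case:
        # the argmax token at every step, so same weights + same input -> same output.
        out = model.generate(**inputs, do_sample=False, max_new_tokens=8)
    print(tokenizer.decode(out[0]))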
That's an improvement, but they are still "chaotic", in that small changes in the input can produce unpredictably large changes in the output.
Yes, this paper says exactly what you talked about: https://arxiv.org/abs/2404.01332
That assumes they were implemented with deterministic operators, which isn't the default when using neural-network libraries on GPUs. Think of random seeds, cuBLAS optimizations: you can configure all these things, but I wouldn't assume it, especially in GPU-optimized open-source software.
Why does this remind me of COBOL?
'cos COBOL was designed to be human-readable (and writable?).
Output "1" if the program halts; "0" if it doesn't.
Funnily enough, you can give the LLM the code and ask it whether the function will halt, and for some inputs it is able to say that the program does or does not halt.
The halting problem is about being able to answer this question in full generality, though. Being able to answer the question for specific cases is already feasible and always was.
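A toy illustration of the difference: for a specific (made-up) function and specific inputs the answer can be easy, which says nothing about a general decider.

    def maybe_halt(n: int) -> None:
        # Halts immediately for even n; loops forever for odd n.
        while n % 2 == 1:
            pass

    # For specific inputs the question is easy: maybe_halt(4) halts,
    # maybe_halt(3) does not. The halting problem is about deciding this
    # for every possible program and input, which no algorithm can do.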
You know, the more I think about it, the more I like this model.
What we have today with ChatGPT and the like (and even IDE integrations and API use) is imperative, right? It's like "answer this question" or "do this thing for me"; it's a function invocation. Whereas the silly calculator program I presented above is (unintentionally) kind of a declarative probabilistic program: "this is the behaviour I want, make it so", or "I have these constraints and these unknowns, fill in the gaps".
What if we had something like Prolog, but with the possibility of facts being kind of on-demand at runtime, powered by the LLM driving it?
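Something like this, very roughly: a fact base where unknown facts are resolved on demand by the model and then cached, so the rest of the query sees a consistent world. ask_llm here is hypothetical.

    # Hand-wavy sketch of the idea: a fact base with an LLM fallback.
    FACTS = {("parent", "alice", "bob"): True}

    def ask_llm(question: str) -> bool:
        # Hypothetical: would pose a yes/no question to a model of your choice.
        raise NotImplementedError("no model wired up")

    def holds(predicate: str, *args: str) -> bool:
        key = (predicate, *args)
        if key in FACTS:                 # known fact: answer deterministically
            return FACTS[key]
        # Unknown fact: fall back to the LLM at query time, then cache the
        # answer so later queries see a consistent world.
        FACTS[key] = ask_llm(f"Is it true that {predicate}({', '.join(args)})?")
        return FACTS[key]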
This (sort of) is already a paradigm: https://en.m.wikipedia.org/wiki/Probabilistic_programming
That's entirely orthogonal.
In probabilistic programming you (deterministically) define variables and formulas. It's just that the variables aren't instances of floats, but represent stochastic variables over floats.
This is similar to libraries for linear algebra where writing A * B * C does not immediately evaluate, but rather builds an expression tree that represent the computation; you need to do say `eval(A * B * C)` to obtain the actual value, and it gives the library room to compute it in the most efficient way.
It's more related to symbolic programming and lazy evaluation than (non-)determinism.
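The expression-tree point in miniature (a made-up toy, not any real library's API):

    # Building A * B * C only records the computation; eval() performs it.
    class Expr:
        def __init__(self, value=None, op=None, left=None, right=None):
            self.value, self.op, self.left, self.right = value, op, left, right

        def __mul__(self, other):
            return Expr(op="*", left=self, right=other)   # build the tree, don't compute

        def eval(self):
            if self.op is None:
                return self.value
            # A real library could inspect the whole tree here and pick the
            # cheapest evaluation order before doing any arithmetic.
            return self.left.eval() * self.right.eval()

    A, B, C = Expr(2), Expr(3), Expr(4)
    tree = A * B * C          # no multiplication has happened yet
    print(tree.eval())        # 24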
I wonder when companies will strip the personality out of LLMs by default, especially for tools.
that would require actually curating the training data and eliminating sources that contain casual conversation
too expensive since those are all licensed sources, much easier to train on Reddit data
Just ask an LLM to remove the personality from the training data. Then train a new LLM on that.
It will work, but at the scale needed for pretraining you are bound to have many quality issues that will destroy your student model, so your data-cleaning process had better be very capable.
One way to think of it: any little bias or undesirable path in your teacher model will be amplified in the resulting data and is likely to become overrepresented in the student model.
> maybe we'll come to view deterministic vs. probabilistic (LLMs) similarly
I can't believe someone would seriously write this and not realize how nonsensical it is.
"indeterministic programming", you seriously cannot come up with a bigger oxymoron.
Why do people keep having this reaction to something we're already used to? When you're developing against an API, you're already doing the same thing, planning for what happens when the request hangs, or fails completely, or gives a different response, and so on. Same for basically any IO.
It's almost not even new, it's just that it generates text instead of JSON or whatever. But we've already been doing "indeterministic programming" for a long time, where you cannot assume a function always returns what it should.
You’re right about the trees but wrong (hear me out) about the forest.
Yes, programming isn’t always deterministic, not just due to the leftpad API endpoint being down, but by design - you can’t deterministically tell which button the user is going to click. So far so good.
But, you program for the things that you expect to happen, and handle the rest as errors. If you look at the branching topology of well-written code, the majority of paths lead to an error. Most strings are not valid json, but are handled perfectly well as errors. The paths you didn’t predict can cause bugs, and those bugs can be fixed.
Within this system, you have effective local determinism. In practice, this gives you the following guarantee: if the program executed correctly up to point X, the local state is known. You build on top of that state and continue the chain of bounded determinism, which is so incredibly reliable on modern CPUs that you can run massive financial transactions and be sure they work, or run a weapons system or a flight control system.
So when people point out that LLMs are non-deterministic (or technically unstable, to avoid bike-shedding), they mean that it’s a fundamentally different type of component in an engineering system. It’s not like retrying an HTTP request, because when things go wrong it doesn’t produce “errors”, it produces garbage that looks like gold.
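To make that last distinction concrete: a conventional parser fails loudly and locally, whereas the failure mode of an LLM-backed step is plausible output that nothing downstream can flag. parse_with_llm here is hypothetical.

    import json

    def parse_config(text: str) -> dict:
        try:
            return json.loads(text)       # most strings are not valid JSON...
        except json.JSONDecodeError as err:
            # ...and that is fine: the failure is an explicit, local error
            raise ValueError(f"bad config: {err}") from err

    def parse_with_llm(text: str) -> dict:
        # Hypothetical: ask a model to "repair and parse" the config. When this
        # goes wrong it does not raise here; it returns a well-formed dict with
        # plausible but wrong values, discovered much later, far from the cause.
        raise NotImplementedError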
Programmers aren't deterministic either. If I ask ten programmers to come up with a solution to the same problem, I'm not likely to get ten identical copies. Different programmers, even competent experienced programmers, might have different priorities that aren't in the requirements. For example, trading off program maintainability or portability over performance.
The same could apply to LLMs, or even different runs from the same LLMs.
> Programmers aren't deterministic either.
No, but programs are. An LLM can be a programmer too, but it's not a program in the way we want and expect programs to behave: deterministically. Even if a programmer could perform a TLS handshake manually very fast, ignoring the immense waste of energy, the program is a much better engineering component, simply because it is deterministic and does the same thing every time. If there's a bug, it can be fixed, and then the bug will not reappear.
> If I ask ten programmers to come up with a solution to the same problem, I'm not likely to get ten identical copies.
Right, but you only want one copy. If you need different clients speaking with each other you need to define a protocol and run conformance tests, which is a lot of work. It’s certainly doable, but you don’t want a different program every time you run it.
I really didn’t expect arguing for reproducibility in engineering to be controversial. The primary way we fix bugs is by literally asking for steps to reproduction. This is not possible when you have a chaos agent in the middle, no matter how good. The only reasonable conclusion is to treat AI systems as entirely different components and isolate them such that you can keep the boring predictability of mechanistic programs. Basically separating engineering from the alchemy.
Not really: we have many implementations of web servers or FTP clients, but they all follow the same protocol, so you can pair any two things that talk the same protocol and have a consistent system. If you give ten programmers a spec, you get ten implementations that follow the spec. With LLMs, you get random things.
Why would we embrace that even more? In software development we try to keep things as deterministic as possible. The more variables we introduce into our software, the more complicated it becomes.
The whole notion of adding LLM prompts as a replacement for code just seems utterly insane to me. It would be a massive waste of resources, since we'd be re-prompting the AI far more often than we need to. It must also be fun to debug, as it may or may not work correctly depending on how the model is feeling at that moment. Compilation should always be deterministic, given the same environment.
Some algorithms are inherently probabilistic (bloom filters are a very common example, HyperLogLog is another). If we accept that probabilistic algorithms are useful, then we can extrapolate that to using LLMs (or other neural networks) for similar useful work.
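A minimal Bloom filter sketch, for instance: membership answers are only probably correct (false positives are possible, false negatives are not), and that trade-off is exactly what makes it useful.

    import hashlib

    class BloomFilter:
        def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
            self.size = size_bits
            self.num_hashes = num_hashes
            self.bits = 0                       # a plain int used as a bit array

        def _positions(self, item: str):
            # Derive num_hashes bit positions from salted SHA-256 digests.
            for i in range(self.num_hashes):
                digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
                yield int(digest, 16) % self.size

        def add(self, item: str) -> None:
            for pos in self._positions(item):
                self.bits |= 1 << pos

        def might_contain(self, item: str) -> bool:
            # May return True for items never added (false positive),
            # but never returns False for an item that was added.
            return all(self.bits & (1 << pos) for pos in self._positions(item))

    bf = BloomFilter()
    bf.add("hello")
    print(bf.might_contain("hello"))   # True
    print(bf.might_contain("world"))   # almost certainly False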
You can make the LLM/NN deterministic. That was never a problem.
> request hangs, or fails completely, or gives a different response
I try to avoid those, not celebrate them.
> It makes no sense at all, it's cuckooland, are you all on crazy pills?
The first step towards understanding something you obviously have strong feelings about is to try to avoid hitting those triggers while you think about the thing; otherwise they cloud your judgement. Not a requirement by any measure, just a tip.
> are you telling me people will do three years university to learn to prompt?
Are people going to university for three years to write "1.0" or "2.0" software? I certainly didn't, and I don't think even the majority of software developers have done so, at least in my personal experience but YMMV.
> I do not understand where there is anything here to be "not sure" on?
They're not sure about the specific naming, not the concept or talk as a whole.
> LLMs making non-deterministic mistakes
Everything they do is non-deterministic when temperature is set to anything above 0.0; that's the entire point. The "correct" answers are as non-deterministic as the "mistakes", although I'm not sure "mistake" is the right word, because it did choose the right/correct tokens; it's just that you didn't like or expect it to choose those particular tokens.
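A sketch of where that randomness enters: at each step the model emits logits over the vocabulary, and anything above zero temperature samples from them instead of taking the argmax. (Toy numpy code, not any particular model's internals.)

    import numpy as np

    def sample_token(logits: np.ndarray, temperature: float,
                     rng: np.random.Generator) -> int:
        if temperature == 0.0:
            return int(np.argmax(logits))      # greedy: same logits -> same token
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())  # numerically stable softmax
        probs /= probs.sum()
        return int(rng.choice(len(logits), p=probs))   # stochastic choice

    logits = np.array([2.0, 1.5, 0.2])
    rng = np.random.default_rng()
    print(sample_token(logits, 0.0, rng))   # always 0
    print(sample_token(logits, 1.0, rng))   # usually 0, sometimes 1 or 2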
> It makes no sense at all, it's cuckooland, are you all on crazy pills?
Frequent LLM usage impairs thinking. The LLM has no connection to reality, and it takes over people's minds.
>Frequent LLM usage impairs thinking
Is there hard evidence on this?
If you are the type who prefers studies:
https://time.com/7295195/ai-chatgpt-google-learning-school/
Otherwise, read pro-LLM blogs, which are mostly rambling nonsense that overpromises while almost no actual LLM-written software exists.
You can also see how the few open source developers who jump on the LLM bandwagon now have worse blogging and programming output than they had pre-LLM.
I have 7 different 100% LLM-written programs in use at my company daily, some going back to GPT-4 and some as recent as Gemini 2.5.
Software engineers are so lost in the weeds of sprawling, feature-packed, endlessly flexible programs that they have completely lost sight of simple, narrow-scope programs. I can tell an LLM exactly how we need the program to work (forgoing endless settings and options menus) and exactly what it needs to do (forgoing endless branching possibilities for every conceivable user workflow), and get a lean, lightweight program that takes the user from A to B in 3k LOC.
Is the program something that could be sold? No. Would it work for other companies/users? Probably not. Does it replace a massive 1M+ LOC $20/mo software package for that user in our bespoke use case? Yes.
Short answer no.
Longer answer: there was a study posted this week that compared it to using search and then, what was it…raw thinking or something similar. I can totally understand that in certain cases you are not activating parts of your brain as much; I just don't think any of it proves much in aggregate.
Some preliminary evidence:
https://news.ycombinator.com/item?id=44286277
yes
https://www.mdpi.com/2075-4698/15/1/6
Sounds like you’re taking crazy pills.
Far to early from any of the studies done so far to come to your conclusion.
The LLM proponents are so desperate now that they have to resort to personal insults. Are investors beginning to realize the scam?
It’s strange how often criticism gets deflected with claims of personal attack. You’re citing a study that doesn’t say anything close to what you’re claiming. You’re fabricating conclusions that simply aren’t there.
I quoted zero studies in the comment you responded to and had no intention of doing so. I quoted a study as well as personal observations under duress, after a citation demand appeared.
I honestly have no idea what point you’re trying to make now. You opened with bold claims and zero evidence, then acted like being asked for a citation was some kind of duress. If you’re going to assert sweeping conclusions, expect to be challenged. That’s not an attack, it’s basic discourse.
You were the one who started with the insults.
Saying to someone "you are more intelligent if you don't use an LLM" is a compliment, not an insult.
You're not fooling anyone.
Do you really need a study to tell you that offloading your thinking to something else impairs your thinking?
But yes, there are studies to prove the most obvious statement in the world.
https://news.ycombinator.com/item?id=44286277
It's not obvious to me, but perhaps you are approaching it from a biased perspective. Sure, if you leave all higher-order thinking to an LLM, not thinking about your homework and simply passing it all through a chatbot, then of course you are losing out. There is a lot of nuance to it, and I am not sure that very first study captures it. Everyone is different and YMMV, but I suspect it will come down to how you use the tools, not a simple blanket statement like yours.
Do you really latch on to a single early study to draw conclusions about the world? Wild. Next time, before going down the path of rudeness, why don't you share a real anecdote or thought? We have all seen that study linked many times already.
You asked for a study, you got a study. Yet you're still doing mental gymnastics to somehow prove that not using your brain doesn't impair thinking. You want an anecdote now? After dismissing a study? Wild. I doubt an anecdote will convince you if a study won't; that's just grasping at straws.
And no, it’s not nuanced at all. If you stop using your brain, you lose cognitive abilities. If you stop working out, you lose muscle. If you stop coding and let someone else do it, you lose coding abilities.
No one is being rude, that’s just what it feels like when someone calls you out with evidence.
Do you think you could condense your point of view without hyperbole and rudeness so the rest of us can understand it?