I'd say with confidence: we're living in the early days. AI has made jaw-dropping progress in two major domains: language and vision. With large language models (LLMs) like GPT-4 and Claude, and vision models like CLIP and DALL·E, we've seen machines that can generate poetry, write code, describe photos, and even hold eerily humanlike conversations.
But as impressive as this is, it’s easy to lose sight of the bigger picture: we’ve only scratched the surface of what artificial intelligence could be — because we’ve only scaled two modalities: text and images.
That’s like saying we’ve modeled human intelligence by mastering reading and eyesight, while ignoring touch, taste, smell, motion, memory, emotion, and everything else that makes our cognition rich, embodied, and contextual.
Human intelligence is multimodal. We make sense of the world through:
Touch (the texture of a surface, the feedback of pressure, the warmth of skin); smell and taste (deeply tied to memory, danger, pleasure, and even creativity); proprioception (the sense of where your body is in space — how you move and balance); and emotional and internal states (hunger, pain, comfort, fear, motivation).
None of these are captured by current LLMs or vision transformers. Not even close. And yet, our cognitive lives depend on them.
Language and vision are just the beginning — the parts we were able to digitize first, not necessarily the most central to intelligence.
The real frontier of AI lies in the messy, rich, sensory world where people live. We’ll need new hardware (sensors), new data representations (beyond tokens), and new ways to train models that grow understanding from experience, not just patterns.
> Language and vision are just the beginning — the parts we were able to digitize first, not necessarily the most central to intelligence.
I respectfully disagree. Touch gives pretty cool skills, but language, video and audio are all that are needed for all online interactions. We use touch for typing and pointing, but that is only because we don't have a more efficient and effective interface.
Now I'm not saying that all other senses are uninteresting. Integrating touch, extensive proprioception, and olfaction is going to unlock a lot of 'real world' behavior, but your comment was specifically about intelligence.
Compare humans to apes and other animals and the thing that sets us apart is definitely not in the 'remaining' senses, but firmly in the realm of audio, video and language.
Language is literally an abstraction of sensory inputs and cognitive processes. One can make similar arguments about image generation. These abstractions might characterize the higher cognitive abilities of humans, but it makes no sense to ignore "lower level" cognition. Embodiment is the foundation of our rich internal world models, in particular spacetime, causality, etc.
Current generative models merely mimic the output, with a fuzzy abstract linguistic mess in place of any physical/causal models. It's unsurprising that their capacity to "reason" is so brittle.
> Language is literally an abstraction of sensory inputs and cognitive processes.
Language can exist entirely independently from senses and cognition. It is an encoding of patterns in the world where the only thing that matters is if anybody or anything wielding it can map the encodings to and from the patterns they encode for (which is more of a sociological/synchronisation challenge).
Does C, or Java, 'make no sense' because it 'ignores lower level cognition'?
There are many parts of non-programming languages that similarly have nothing to do with embodiment. Some of them are even about incredibly abstract things impossible in our universe. One could argue that for many fields genius lies in being able to mentally model what is so foreign to the intuition our embodiment has imbued us with or to be able to find a mapping to facilitate that intuition. Said otherwise: the experience our embodiment has given us might limit how well we can understand the world (Quantum Mechanics anyone?).
Again, embodiment is interesting and worth pursuing, but far from a requirement for far-reaching intelligence.
> Does C, or Java, 'make no sense' because it 'ignores lower level cognition'?
It makes sense in context, but that context includes the machine on which the compiled code runs. Without the underlying machine, there's no real purpose for C or Java. I'm open to the idea that 'lower level cognition' may be as relevant to language as the machine is to C or Java.
> Without the underlying machine, there's no real purpose for C or Java.
They do express algorithms, don't they?
> Language can exist entirely independently from senses and cognition.
Helen Keller begs to disagree. Language and cognition were clearly linked for her.
> It wasn't until April 5, 1887, when Anne took Helen to an old pump house, that Helen finally understood that everything has a name. Sullivan put Helen’s hand under the stream and began spelling “w-a-t-e-r” into her palm, first slowly, then more quickly.
> Keller later wrote in her autobiography, “As the cool stream gushed over one hand she spelled into the other the word water, first slowly, then rapidly. I stood still, my whole attention fixed upon the motions of her fingers. Suddenly I felt a misty consciousness as of something forgotten—a thrill of returning thought; and somehow the mystery of language was revealed to me. I knew then that ‘w-a-t-e-r’ meant the wonderful cool something that was flowing over my hand. That living word awakened my soul, gave it light, hope, joy, set it free! There were barriers still, it is true, but barriers that could in time be swept away.”
I said language can exist independently, not that all language exists independently.
"one plus one equals two" can be understood and worked with without ever feeling water over your hand. It is a priori knowledge (see Hume's fork for an explanation).
You have to understand that the richness of language linked to cognition comes from your experience with that part of language and the resulting romanticization of it. It doesn't mean that it is a core defining feature of language, even though it feels that way (and as touching as that anecdote is).
> "one plus one equals two" can be understood and worked with without ever feeling water over your hand. It is a priori knowledge
"Understood" and "worked with" are completely different.
The complete absence of embodiment is several degrees removed from "feeling water over your hand." LLMs have no sensory apparatus to relate the word "one" to actual, discrete, singular objects. The most rudimentary calculator can represent and compute "1+1=2", but I doubt any philosophical tradition or even an educated layperson would claim calculators "understand what 1+1=2 means." The "understanding" part has nothing to do with the accuracy or truthfulness of the computation; it comes from the relation of the abstract statement to counting of actual objects.
> Language can exist entirely independently from senses and cognition.
Maybe? Are you just outlining the thesis, or saying this should be self-evident?
> It is an encoding of patterns in the world where the only thing that matters is if anybody or anything wielding it can map the encodings to and from the patterns they encode for (which is more of a sociological/synchronisation challenge).
Yes, and my point is, current genAI utterly fails in unpredictable / bizarre ways because it only mimics the abstract encodings, ignorant of patterns in the world. Obviously some people argue "next token prediction is all you need," but that's a claim that is far from self-evident.
> There are many parts of non-programming languages that similarly have nothing to do with embodiment. Some of them are even about incredibly abstract things impossible in our universe. One could argue that for many fields genius lies in being able to mentally model what is so foreign to the intuition our embodiment has imbued us with or to be able to find a mapping to facilitate that intuition.
I would say this misses the point. The meaningfulness of abstractions, even ones that are unintuitive or unphysical or illogical, comes from our embodied experience. Our enjoyment of even the most absurd fiction comes from our ability to simultaneously comprehend what it is "about" and what is "possible." Both relate to experience-of-reality and mean nothing in a vacuum.
> Said otherwise: the experience our embodiment has given us might limit how well we can understand the world (Quantum Mechanics anyone?).
I agree that we are limited in some ways by our particular embodiment. For example, there's a huge spectrum of sensory experiences (colors, sounds, smells, ...) which we know other animals have and we do not.
As I understand, where we disagree is on the why. I would say our capacity for understanding comes from our embodiment, therefore it's only natural that the limits of our embodiment also limit our understanding. After all we could imagine that if we had direct sensory experience of quantum effects, we would understand QM better or at least easier. In some fuzzy way, (no embodiment => poor understanding) and (embodiment => better understanding) is evidence for (embodiment <=> understanding). I suppose your argument is a counterfactual that we might be able to imagine a (no embodiment & some understanding) so (no embodiment =/> poor understanding), but I don't see the evidence that this is not just imaginable but actually possible in reality.
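Spelling out the shape of that argument in symbols (reading the fuzzy arrows as plain material implication, which is admittedly a simplification):

```latex
% E = embodied, U = understands (well).
% The two observations together are exactly the biconditional:
(E \Rightarrow U) \wedge (\lnot E \Rightarrow \lnot U) \;\equiv\; (E \Leftrightarrow U)
% The counterfactual on offer is a claimed possible case of
\lnot E \wedge U
% which, if it can actually occur, falsifies (\lnot E \Rightarrow \lnot U) and with it the biconditional.
```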
> Language and vision are just the beginning — the parts we were able to digitize first, not necessarily the most central to intelligence.
I probably made a mistake when I asserted that -- should have thought it over. Vision is evolutionarily older and more “primitive”, while language is uniquely human [or maybe, more broadly, primate, cetacean, cephalopod, avian...], symbolic, and abstract — arguably a different order of cognition altogether. But I maintain that each and every sense is important as far as human cognition -- and its replication -- are concerned.
People who lack one of those senses, or even two of them, tend to do just fine.
Mostly thanks to other humans helping them.
If all humans lacked vision, the human race would definitely not do just fine.
I think we need to think about vision and world modelling somewhat separately. We could construct an artificial (tech enhanced) society where sight was not available. People would still "model the world in their minds" with the "abstract model" part of the vision/world system.
Vision is interesting in that it leverages the maximum speed with which it is easily possible to gather information about our surroundings in this universe. I believe that is what makes it special and very valuable. I also believe this aspect makes it a strong attractor for convergent evolution.
Language allows encoding and compression of information about the world, which is of course incredibly powerful and increases communication bandwidth enormously (as well as tons of other stuff).
I'd say that for high level cognitive processes, hearing and speaking were an important stepping stone because for some reason evolving organs that can generate relatively high bandwidth signals in audio seems to be easier than evolving something that does that for visuals (very few Teletubby screens on tummies in the natural world).
Interesting games to think about in this sense: Pictionary/drawing games and charades.
Regarding visual communication, I think you undersell posturing, gesturing and facial expressions a little. They may not be as high bandwidth as talking, but they are very low latency and pretty stealthy if necessary.
Well, there is sign language, so I guess you're right. It would be interesting to see how high bandwidth gesturing can be compared to speaking.
I thought about this some more and I think the prevalence of making sounds rather than gesturing etc. is due to sound being a broadcasting mechanism that works over long distances and without line of sight.
Visually indicating that you've claimed some territory is pretty hard.
I was thinking more about "low level" communication which can be gleaned from body language, frowns, smiles, gaze, winks, pointing etc. Perhaps not very information dense, but very fast.
> Touch gives pretty cool skills, but language, video and audio are all that are needed for all online interactions. We use touch for typing and pointing, but that is only because we don't have a more efficient and effective interface.
It may be that we are not using touch for anything important as adults. But babies rely on touch to explore their surroundings. They stick anything into their mouth. Why? Because the tongue is the most touch-sensitive organ. They are exploring things by touching them with their tongues.
I can only guess what people get from that, but my guess is they get an understanding of geometry and of the surface properties of objects, which you'd have trouble getting by processing photos or text.
> your comment was specifically about intelligence.
Talking about intelligence, I do not believe that LLMs can match humans without a deep understanding of 3D space and material science^W intuition. It needs touch and temperature sensitivity at least. Perhaps you can replace it with billions of words of text describing these things, but I doubt it.
It is trivial to train AI on 3D representations. In fact, that already happens in cases where robot algorithms are trained in simulations.
Another thing to remember is that the senses we have aren't the only ones in biology and far from the only ones possible. In fact, anything that gives you another type of information about the world (you're modeling) is a different sense. In that sense (ha), AI has access to an incredibly vast and varied array of senses that is inaccessible to humans. Lidar is a very simple example of that.
I don't think touch and temperature sensitivity are needed to achieve it, but I do agree that training with senses specifically for understanding 3D space is very important. At the very least binocular video.
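To make the binocular point concrete: once two views are rectified, metric depth falls straight out of per-pixel disparity. A toy sketch with made-up focal length and baseline, not tied to any particular system:

```python
# Toy illustration of why binocular input carries 3D information: for a
# rectified stereo pair, depth = focal_length * baseline / disparity.
# The numbers below are illustrative, not calibrated values.

def depth_from_disparity(disparity_px: float,
                         focal_length_px: float = 1000.0,   # assumed intrinsics
                         baseline_m: float = 0.06) -> float:  # roughly eye-spacing
    """Depth in metres of a point seen with the given pixel disparity."""
    return focal_length_px * baseline_m / disparity_px

if __name__ == "__main__":
    for d in (100.0, 20.0, 5.0):  # large disparity = near, small disparity = far
        print(f"disparity {d:5.1f} px -> depth {depth_from_disparity(d):5.2f} m")
```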
> It is trivial to train AI on 3D representations.
So AI developers understand the limitations and are trying to remove them. It will help, but it will not put AI vision on par with a human's.
> In that sense (ha), AI has access to an incredibly vast and varied array of senses that is inaccessible to humans. Lidar is a very simple example of that.
I don't think that current uses of lidars have anything to do with intelligence. Not every neural net is about intelligence.
> I don't think touch and temperature sensitivity are needed to achieve it,
I'm sure they are. To understand forms you need to explore them with touch. The ability to understand forms by just looking at them is an acquired skill. Maybe it is possible to train these abilities without touch, but how? I believe it will take a shitload of training data, and I'm not sure it will be good enough.
Temperature sensitivity is a big thing, because it allows you to guess the thermal conductivity of a thing just by looking at it. It lets you guess how wet a thing is. It lets you guess the temperature of things by looking at them: you see the sun shining, fire burning, people touching things and yanking their hands away from hot ones. Or take a person cautiously trying to learn the temperature of a thing: first sensing the infrared radiation, then a quick touch, then a longer touch, and finally sustained contact. How could you understand all these proceedings without your own experience of grasping the hot thing, crying from the pain and dropping it on your feet?
These are just obvious ideas off the top of my head. What else comes from temperature sensitivity I don't know, and no one does, because no one really knows how people learn to use their senses and to think. There are theories about it, but they are mostly descriptive: they describe what is known without much predictive power. Because of this the optimism of the AI crowd seems overinflated. They don't know what they are trying to do, and still they believe in their eventual success.
Probably you can learn it by thinking, but can LLMs think while training? You can learn it as a pattern of behaviour, without understanding its meaning, but then you'll hallucinate this pattern all the time, just because some of the movements were close enough.
> At the very least binocular video.
I'm not sure that people can learn 3D by looking. At least they do not rely on binocular vision alone to learn it. They touch, they lick. They measure things in different ways (by sticking them in their mouths, by grasping, by climbing on top of them or falling from them, by hugging them), and they measure distances by crawling or walking along them. They find a spot where they can see what happens behind a group of trees, or maybe behind something else. People don't just use more senses, they also act, which lets them learn causal relationships. Watching binocular video is not acting, so you get only correlations, with no hope of learning to distinguish correlation from causation, and at the same time the available data is much more limited.
Science says that 80 or 90% of the information people get comes from vision? I'm skeptical about this, because I don't know how they measure "information", but in any case human vision was trained with support from the other senses. I wouldn't be surprised if at certain stages of a baby's development other senses are more advanced and are used to get labelled data to train vision.
Humans are known for their exceptional sensitivity in their hands and fingers. There are only few animals that come close to our ability to manipulate objects.
Only octopuses, elephants and apes are in a similar league with regards to dexterity and finesse.
You can be born without hands and have zero cognitive deficits, so sensory info and action-feedback from hands, vision, and hearing isn't key to intelligence. Being born without vision and hearing can cause developmental issues, but even if you lose both before age two you can develop normally, like Helen Keller.
Actually this is wrong. There's a connection between bodily sensation and emotion so profound that quadriplegics can develop flat affect, which in turn leads to decision paralysis and cognitive deficits. Emotions are regulated somatically, and they inform decision making and other aspects of motivation and reasoning.
Source: https://pmc.ncbi.nlm.nih.gov/articles/PMC2633768/
Spinal cord injury implies that you were born with those senses and then abruptly lost them.
In that case, all the processes and pathways in your brain are relying on and tied to those senses, so losing them might also disrupt or affect those pathways and how they function.
Organic adaptation and persistence of memory are, I would say, the two major advancements that need to happen.
Human neural networks are dynamic: they change and rearrange, grow and sever connections. An LLM is fixed and relies on context; if you give it the right answer, it won't "learn" that it is the correct answer unless it is fed back into the system and retrained over months. What if it's only the right answer for a limited period of time?
To build an intelligent machine, it must be able to train itself in real time and remember.
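For contrast, here is roughly what "train itself in real time" means at the mechanical level: a single gradient step taken at interaction time instead of in a scheduled offline run. A toy sketch with a stand-in model; it deliberately ignores the genuinely hard part, catastrophic forgetting:

```python
# Minimal sketch of learning from a correction immediately, rather than waiting
# for the next offline training run. The tiny model is a stand-in; a real system
# would also need to avoid overwriting everything it already knows.
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # placeholder for a much larger network
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def learn_from_feedback(x: torch.Tensor, correct_label: int) -> float:
    """Take one gradient step on a single corrected example, at interaction time."""
    opt.zero_grad()
    loss = loss_fn(model(x).unsqueeze(0), torch.tensor([correct_label]))
    loss.backward()
    opt.step()  # the weights change now, not months from now
    return loss.item()

# e.g. the user says "no, the right answer was class 2":
print(learn_from_feedback(torch.randn(16), correct_label=2))
```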
Yes and: and forget.
Why is forgetting important? Things can have an end time after which they are no longer applicable, or things we thought were true turn out to be false, but it's still useful to see where we went wrong.
I imagine humans are limited by the number of synapses we have, so it's useful to forget, but maybe machines can move the useless stuff to deep storage until it's dug out, in the same way certain things can trigger a deep memory in humans.
Which is cheaper, among these two logically equivalent things: reducing one weight, or increasing every other weight?
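One way to see the equivalence concretely: if the weights compete through a normalization step (softmax here, purely as an assumption), suppressing one entry by some amount yields exactly the same distribution as boosting every other entry by that amount, yet the first touches one number and the second touches N-1:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift-invariant, which is the whole point here
    return e / e.sum()

rng = np.random.default_rng(0)
w = rng.normal(size=8)
c = 1.5

suppress_one = w.copy()
suppress_one[3] -= c                    # one write

boost_rest = w.copy()
boost_rest[np.arange(8) != 3] += c      # N-1 writes

print(np.allclose(softmax(suppress_one), softmax(boost_rest)))  # True
```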
Do you remove dead code? Get rid of clutter? Ever try to change a habit?
Yeah, I delete dead code but it's important to remember why I wrote it that way in the first place and why I'm deleting it now. Doomed to repeat past mistakes and all that.
I think one counterpoint to this idea is the compute cost.
To a great extent, it's not AI research that is the primary driver behind the huge advances in AI, either in terms of techniques (transformers) or data sets. Instead, the biggest single factor responsible for this huge boost are advances in compute hardware and compute power in general. Even if we had known about the Transformer architecture 20 years earlier, and we had had the datasets that OpenAI and Google amassed 20 years earlier, we still would not have been able to get anywhere close to training an LLM on hardware from 20 years ago.
And given this, and given that LLMs have already pushed this compute power to the limit, it's very possible that we'll stagnate at more or less the current level unless and until a new 10x or even 100x boost in compute power happens. It's very unlikely that you could train a model on 100x as much data as you get today without that, which is what you would likely require to add multiple modalities and then combine them.
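For a rough sense of the scale involved, a common back-of-the-envelope estimate is training FLOPs ≈ 6 × parameters × tokens. The numbers below are illustrative, not any lab's actual figures:

```python
# Rule-of-thumb training cost: FLOPs ~ 6 * N_params * N_tokens.

def training_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens

base = training_flops(n_params=70e9, n_tokens=2e12)   # a 70B model on 2T tokens
print(f"baseline run:    {base:.2e} FLOPs")
print(f"100x more data:  {training_flops(70e9, 2e14):.2e} FLOPs")

# At a sustained 1e15 FLOP/s per accelerator (a rough ballpark), the baseline
# alone already costs on the order of:
print(f"~{base / 1e15 / 86400:.0f} accelerator-days")
```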
Tick-tock model. Research switches between adding new ideas and pouring more power into good ideas.
> Language and vision are just the beginning..
Based on the architectures we have, they may also be the ending. There’s been a lot of news in the past couple of years about LLMs, but have there been any breakthroughs making headlines anywhere else in AI?
> There’s been a lot of news in the past couple of years about LLMs, but have there been any breakthroughs making headlines anywhere else in AI?
Yeah, lots of stuff tied to robotics, for instance; this overlaps with vision, but the advances go beyond vision.
Audio has seen quite a bit. And I imagine there is stuff happening in niche areas that just aren't as publicly interesting as language, vision/imagery, audio, and robotics.
Two Nobel prizes, in chemistry and physics: https://www.nature.com/articles/s41746-024-01345-9
prizes != breakthroughs
progress != breakthroughs
Sure. In physics, math, chemistry, biology. To name a few.
> The real frontier of AI lies in the messy, rich, sensory world where people live. We’ll need new hardware (sensors), new data representations (beyond tokens), and new ways to train models that grow understanding from experience, not just patterns.
Like Dr. Who said: DALEKs aren't brains in a machine, they are the machine!
Same is true for humans. We really are the whole body, we're not just driving it around.
There are many people who mentally developed while paralyzed that literally drive around their bodies via motorized wheelchair. I don't think there's any evidence that a brain couldn't exist or develop in a jar, given only the inputs modern AI now has (text, video, audio).
> any evidence that a brain couldn't exist or develop in a jar
The brain could. Of course it could. It's just a signals processing machine.
But would it be missing anything we consider core to the way humans think? Would it struggle with parts of cognition?
For example: experiments were done with cats growing up in environments with vertical lines only. They were then put in a normal room and had a hard time understanding flat surfaces.
https://computervisionblog.wordpress.com/2013/06/01/cats-and...
This isn't remotely a hypothetical, so I imagine there are some examples out there, especially from back when polio was a problem. Although, for practical reasons, they might have had limited exposure to novelty, which could have negative consequences.
I agree it’s not hypothetical, and as a layperson I don’t know how much the impact on cognition has been studied. Would be cool if it has!
I do know of studies that showed blind people start using their visual cortex to process sounds. That is pretty cool imo
> modeled human intelligence
That's not what these models do.
Yeah, but are there new ideas or only wishes?
It’s pure magical thinking that would be correctly dismissed if it didn’t have AI attached to it. Imagine talking this way about anything else.
“We’ve barely scratched the surface with Rust, so far we’re only focused on code and haven’t even explored building mansions or ending world hunger”
AI has some real possibilities of building mansions and ending hunger in a way that Rust doesn't.
How is this ending anyones hunger? As long as humans are steering the ship, the commodity will be limited to those that control it and they will make all the money. If anything, it has a big potential to cause more hunger.
In a way, until AI systems can feel the weight of a cup or flinch from heat, we're not close to modeling anything like embodied cognition.
The big horizon isn't just incorporating another sensory modality, it's what Heidegger called being-in-the-world: living among us as a human-like social being. That advancement depends on robotics to provide embodied experience.
> has made jaw-dropping progress
They took 1970s dead tech and deployed it on machines 1 million times more powerful. I'm not sure I'd qualify this as progress. I'd also need an explanation of what systemic improvements in models and computation are planned that would give exponential growth in performance.
I don't see anything.
> They took 1970s dead tech and deployed it on machines 1 million times more powerful. I’m not sure I’d qualify this as progress
If this isn’t meant to be sarcasm or irony, you’ve got some really exciting research and learning ahead of you! At the moment it reads very “computers are just addition and multiplication and we’ve had that for thousands of years!”
> you’ve got some really exciting research and learning ahead of you
I've done the research. Which is why I made the point I did. You're being dismissive and rude instead of putting forth any sort of argument. It's the paper hat of fake intellect. Yawn.
> At the moment it reads very “computers are just addition and multiplication and we’ve had that for thousands of years!”
Let's be specific then. The problem with the models is they require exponential cost growth for model generation while giving only linear increases in output performance. This cost curve is currently a factor or two stronger than the curve of increasing hardware performance. That puts the technology, absent any actual fundamental algorithmic improvements, which do /not/ seem forthcoming despite billions in speculative funding, into a strict coffin corner. In short: AI winter 2.0.
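To illustrate the shape of that claim with made-up numbers (treating capability as growing with the log of training compute, so each extra unit of performance costs a constant multiple more compute):

```python
import math

# If each +1 "performance step" needs 10x more training compute, but the
# affordable compute budget only doubles per hardware generation, the reachable
# level grows by a small constant (log10(2) ~ 0.3 steps) per generation.
# Purely illustrative, to show the claimed dynamic rather than settle it.
cost_multiplier_per_step = 10.0
hardware_gain_per_gen = 2.0

budget = 1.0
for gen in range(1, 9):
    budget *= hardware_gain_per_gen
    steps = math.log(budget, cost_multiplier_per_step)
    print(f"gen {gen}: compute budget x{budget:6.0f} -> +{steps:.2f} performance steps")
```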
Got any plans for that? Any specific research that deals with that? Any thoughts of your own on this matter?
> I've done the research
Great. What's the 1970s equivalent of word2vec or embeddings, that we've simply scaled up? Where are the papers about the transformer architecture or attention from the 1970s? Sure feels like you think LLMs are just big perceptrons.
> The problem with the models is they require exponential cost growth
Let's stick to the assertion I was disputing instead.
A linear increase in technology can easily lead to a greater than linear increase in economic gain. Sometimes even small performance gains can overturn whole industries.
Winning two Nobel prizes wasn't enough progress?
Is progress measured in Nobel prizes? My understanding is those are put to a vote by an institutional committee.
Putting that aside, the shared prize in 2024 was given for work done in the 1970s and 1980s. Was this meant to be a confirmation of my point? You've done so beautifully.
In 2022 they saw fit to award Ben Bernanke. Yep. That one. For, I kid you not, work on the impacts of financial crises. Ironically also work originally done in the 1970s and 80s.
AlphaFold uses transformers. That is definitely not from the 70s and 80s.
Progress for me includes both small iterative refinements and big leaps. It also includes trying old techniques in new domains with new technology. So I think we just have differing definitions for progress.