13 comments
  • Jasper_ (4m)

    Every year they drop a "new GPU!" announcement with absolutely no details. All their numbers are "in simulation only".

    Last year it was their "Thunder" architecture being the world's fastest GPU, now it's Zeus. Neither one actually exists. https://bolt.graphics/bolt-graphics-unveils-thunder-the-worl...

    All their blog posts and marketing material are just generic hype about the concept of raytracing. I don't think these guys have an actual product.

    • zamadatix (4m)

      Is "every year" just "last year too"?

      I'm much more inclined to believe these guys have something to actually deliver but that something will be a lot less exciting to the average HN reader than the vague marketing implies.

  • semessier (4m)

    I don't get why, more than two years after the ChatGPT release moment, there isn't a plethora of high-memory matrix-matrix and matrix-vector hardware available for the high end, both high-bandwidth and commodity DRAM. Both operations are very well understood for the dense and sparse cases. There were FPGAs and ASICs early on, but nothing really caught on relative to the GPU behemoth, which puts tons of silicon on the die that isn't needed for basic matmul. Hence it's unbelievable that Nvidia can keep charging a premium for memory on its way to becoming one of the most valuable companies in the world.
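    The core operation being described is just a dense matvec; a minimal NumPy sketch (shapes and dtypes are my own illustrative choices) shows why it is memory-bandwidth bound rather than compute bound:

```python
import numpy as np

# Illustrative sketch: the "well understood" core op is y = A @ x,
# the dense matrix-vector product at the heart of LLM inference.
rng = np.random.default_rng(0)
A = rng.standard_normal((4096, 4096)).astype(np.float32)  # weight matrix
x = rng.standard_normal(4096).astype(np.float32)          # activation vector

y = A @ x  # one matvec

# Arithmetic intensity: FLOPs per byte of weights read (fp32).
flops = 2 * A.shape[0] * A.shape[1]   # one multiply + one add per weight
intensity = flops / A.nbytes
print(intensity)  # 0.5 FLOP/byte: each weight is read once, used once
```

    At half a FLOP per byte, the bottleneck is moving weights out of memory, which is the commenter's point: most of a GPU die is silicon you don't need for this.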

    • dinfinity (4m)

      I believe this is mainly due to everything ML/AI optimizing for CUDA, with even AMD cards (which are very similar to Nvidia cards) unable to compete due to lack of proper support for CUDA.

      • semessier (4m)

        This was/is the chip opportunity of the century: something even more specialized than Nvidia's still-general-purpose cards. Matrix multiplication has been abstracted away for decades; you don't need CUDA for it. Such a chip would likely be much, much easier to build than a mixed-signal chip, with the Apple C1 at the nightmare end of that comparison.

    • brador (4m)

      GPUs are a managed duopoly through agreement: Nvidia takes the high end, AMD the low end. Notice that they do not compete. This maximises returns for shareholders of both.

  • gymbeaux (4m)

    Certainly neat, especially the SFP port and Ethernet port next to the display ports.

    - How much will it cost?

    - How compatible will it be with existing AI assets (eg models, code for training and inference)?

    - Will there have to be a translation layer between RISC and CISC (eg my CPU)? What’s the performance penalty?

    - Will I actually be able to get one, or is this for “enterprise” customers only, who must buy a minimum of 100 at a time?

  • jl6 (4m)

    Hard to know who this is aimed at until we see the price. I guess they are going for the “slow memory, but lots of it” market that is less sensitive to how fast their very large LLMs run as long as they run at all. Hobbyists will rejoice if they can afford it, but is there a commercial use case that can tolerate low tokens per second?

    • gymbeaux (4m)

      1TB/s memory bandwidth is competitive with NVIDIA GPUs.

      But yes, I’m skeptical that there’s really a market for “runs the LLM, but it’s not usable in real-time”. I guess if you had a list of prompts and you ran them throughout the night while you were sleeping?

      NVIDIA’s Digits or whatever it’s called now has mediocre memory bandwidth, but it only pulls like 200W at full load. So on the one hand, it can’t be used like most people use ChatGPT or Gemini or DeepSeek, but if the alternative is “not running the LLM at all”, I can see some people wanting Digits. It may run LLMs very slowly (technically we don’t know for sure, but it’s likely based on hardware specs), but at least it’s very small and doesn’t use a lot of electricity or put out a lot of heat.

      Although I guess 200W 24/7 is worse than 1kW for an hour or two a day.
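      Both claims are easy to sanity-check with back-of-envelope arithmetic (the model size and quantization here are my own assumptions, not anything Bolt or NVIDIA has published):

```python
# Single-user LLM decoding is roughly memory-bandwidth bound: each token
# touches roughly every weight once, so tok/s <= bandwidth / model size.
bandwidth_gb_s = 1000      # ~1 TB/s, the figure quoted above
model_gb = 70              # assumed: a 70B-parameter model at 8-bit weights

tokens_per_s = bandwidth_gb_s / model_gb
print(round(tokens_per_s, 1))  # ~14.3 tok/s upper bound: slow, but fine overnight

# The energy comparison from the last paragraph:
always_on_kwh = 200 * 24 / 1000   # 200 W around the clock -> 4.8 kWh/day
bursty_kwh = 1000 * 2 / 1000      # 1 kW for two hours     -> 2.0 kWh/day
print(always_on_kwh, bursty_kwh)  # 4.8 2.0: always-on really is worse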

    • Palomides (4m)

      A 400GbE network card alone runs about $1,500; these aren't going to be priced for hobbyists.

    • gessha (4m)

      Large scale unstructured text extraction? Anything you can set up a job for and go to bed?

    • zamadatix (4m)

      Ignoring the DDR5 completely, the 4c-26 is still 256 GB of 1.092 TB/s memory in a single card. Looking at the "starter" 1c26-032, it still has a 400G connection on top of 32 lanes of PCIe Gen 5, so I'm not sure the card overall will really be cost competitive for "local but slow" LLM use. Pricing will tell, though.
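      For scale, standard link-rate math (PCIe Gen 5 at 32 GT/s per lane with 128b/130b encoding; 400GbE at 400 Gb/s) puts those host interfaces well below the quoted memory bandwidth:

```python
# Rough interconnect math; the link rates are the published standards,
# the memory figure is the 1.092 TB/s quoted above for the 4c-26.
pcie5_per_lane = 32 * 128 / 130 / 8   # GT/s * encoding, /8 bits -> ~3.94 GB/s
pcie_x32 = 32 * pcie5_per_lane        # 32 lanes -> ~126 GB/s per direction
eth_400g = 400 / 8                    # 400 Gb/s -> 50 GB/s

mem_bw = 1092                         # GB/s of on-card memory bandwidth
print(round(pcie_x32), eth_400g, mem_bw)  # 126 50.0 1092
```

      Even a fully loaded host link moves well under an eighth of what the card's local memory can, so the value proposition hinges on keeping the model resident on the card.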

      The actual RT performance will be really interesting for rendering use cases too.
