I benchmarked these against a few other RISC-V boards. They're pretty fast, relative to RISC-V (although not relative to x86): https://rwmj.wordpress.com/2024/11/19/benchmarking-risc-v-sp...
Note the benchmark is not very rigorous, but it reflects what we want to do with these boards which is to build Fedora packages.
Compile time vs x86 doesn't look so bad, considering the x86 machine has 4x the cores and uses the Zen 4 microarchitecture.
This P550 seems like the first "fast" RISC-V CPU actually available.
Cross-builds of the GNU toolchain, LLVM, and the Linux kernel are going to be much faster on a low-end (similarly priced) x86.
But most packages don't have cross-build infrastructure as solid as those projects' (or any at all).
I did some benchmarks on RISC-V Linux kernel builds (commit 7503345ac5f5, defconfig) this week. Native on SBCs, cross-build on a couple of x86, docker (qemu-user) on x86.
- 1m45s i9-13900HX cross-compile
- 4m29s Milk-V Pioneer (youtube video from 9 months ago .. close enough)
- 10m10s Ryzen 5 4500U (6 cores, Zen2) laptop cross-compile
- 22m48s RISC-V docker on 24 core i9-13900HX laptop
- 67m35s VisionFive 2 (4x U74)
- 88m4s Lichee Pi 4A (4x C910)
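If anyone wants to reproduce these, the timed part is just a defconfig build. A minimal sketch is below, as a Python wrapper around make; the source path, the `riscv64-linux-gnu-` toolchain prefix and the job count are placeholders, not necessarily what was used for the numbers above. Drop ARCH/CROSS_COMPILE for the native builds; the docker figure is the same build run inside a riscv64 container under qemu-user.

```python
# Sketch of a timed RISC-V defconfig kernel build. The source path, toolchain
# prefix and job count are placeholders; adjust for your setup. For native
# builds on a RISC-V board, drop the ARCH/CROSS_COMPILE settings.
import os
import subprocess
import time

SRC = os.path.expanduser("~/linux")   # kernel tree (e.g. commit 7503345ac5f5)
CROSS = ["ARCH=riscv", "CROSS_COMPILE=riscv64-linux-gnu-"]
JOBS = f"-j{os.cpu_count()}"

def make(*args):
    subprocess.run(["make", "-C", SRC, *CROSS, *args], check=True)

make("mrproper")       # start from a clean tree
make("defconfig")      # generate the default config
start = time.monotonic()
make(JOBS)             # the part being timed
print(f"build time: {time.monotonic() - start:.0f}s")
```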
I need a figure for SpacemiT in BPI-F3 / Jupiter / LPi3A / DC-Roma II / MuseBook. I think it'll be more or less the same as the VF2.
My guess is the Milk-V Megrez (EIC7700X @1.8 GHz) might come in around 25 minutes, and this HiFive Premier (same SoC @1.4 GHz) 30 minutes.
So the P550 machines will probably be a little slower than qemu on the i9. Not a lot. But even the VF2 and LPi4A are going to be much faster than qemu on that 6 core Zen2 -- I haven't measured it, but I'm guessing around 130m.
So if you already have that high core count x86, maybe you don't need a P550 machine.
On the other hand it's good to verify on real hardware.
On the gripping hand, with a 16 GB Megrez costing $199 and my i9 costing $1600, if you want a build farm with 5 or 10 or 100 machines then the P550 is looking pretty good.
VF2 (or Mars) is still looking pretty good for price/performance. The problem there is the 8 GB RAM limit: that's not enough, for example, to build riscv-gnu-toolchain without swapping. That build is fine on my 16 GB LPi4A, but not on my VF2.
16 GB or 32 GB on the P550 boards is much more robust.
One more:
- 70m57s Lichee Pi 3A (8x SpacemiT x60 @1.6 GHz)
5% slower than the VisionFive 2, despite the extra cores.
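The arithmetic, in case anyone wants to check it:

```python
# "5% slower" check from the build times above.
lpi3a = 70 * 60 + 57   # Lichee Pi 3A: 70m57s
vf2 = 67 * 60 + 35     # VisionFive 2: 67m35s
print(f"{lpi3a / vf2 - 1:.1%}")   # -> 5.0%
```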
Thanks for this. I was looking to upgrade my VF2, but I'm not sure it's worth it at this stage: the VF2 is painfully slow, and this board doesn't reach 2x its performance.
Turns out it's better if you use an NVMe adapter in the PCIe slot, in which case it is more than 2x performance. See updated chart.
I get similar results here. The Banana Pi BPI-F3 was a big disappointment. I was expecting some improvement over the VisionFive 2, but no dice. A big Linux build at -j8 on the BPI-F3 takes essentially the same time as a -j4 build on the VF2.
Apparently the small level 2 caches on the X60 are crippling.
The P550 actually feels "snappy".
I'm surprised how much faster the Jupiter is than the BPI-F3: 28%.
That's a lot for the same SoC.
And, yes, ridiculously small caches on the BPI-F3 at 0.5 MB for each 4 core cluster, vs 2 MB on the VisionFive 2 and 4 MB on the P550.
The Pioneer still wins for cache, and I think real-world speed, though: 4 MB of L3 cache per 4-core cluster, plus access to the other 60 MB of L3 from the other clusters during the (near) single-threaded parts of your builds (autoconf, linking, that last stubborn .cpp, ...).
The test is probably somewhat disk bound, so I/O architecture matters. For example, we just retested the HiFive Premier P550, but using an NVMe drive (in an adapter in the PCIe slot) instead of the SATA SSD, and performance improved markedly for the exact same hardware. (See updated chart)
As long as you've got enough RAM for a file cache for the active program binaries and header files, I've never noticed any significant difference between SD card, eMMC, USB3, or NVMe storage for software building on the SBCs I have. It might be different on a Pioneer :-)
I just checked the Linux kernel tree I was testing with. It's 7.2 GB, but 5.6 GB of that is `.git`, which isn't used by the build. So only 1.6 GB of actual source. And much of that isn't used by any given build. Not least the 150 MB of `arch` that isn't in `arch/riscv` (which is 27 MB). Over 1 GB is in `drivers`.
riscv-gnu-toolchain has 2.1 GB that isn't in `.git`. Binutils is 488 MB, gcc 1096 MB.
This is all small enough that on an 8 GB or 16 GB board there is going to be essentially zero disk traffic. Even if the disk cache doesn't start off hot, reading less than 2 GB of stuff into disk cache over the course of a 1 hour build? It's like 0.5 MB/s, about 1% of what even an SD card will do.
It simply doesn't matter.
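If you want to check the same thing on your own tree, something like this gives the size outside `.git` and the average read rate it implies over a roughly one-hour build (the source path here is a placeholder):

```python
# Size of a source tree excluding .git, and the average read rate that
# implies over a one-hour build. The path is a placeholder.
import os

def tree_size(root, skip=(".git",)):
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass   # ignore files that vanish or can't be stat'd
    return total

size = tree_size(os.path.expanduser("~/linux"))
print(f"{size / 2**30:.1f} GiB outside .git")
print(f"{size / 3600 / 1e6:.2f} MB/s if read once over a 1-hour build")
```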
Edit: checking SD card speed on Linux kernel build directory on VisionFive 2 with totally cold disk cache just after a reboot.
Yeah, so 2m37s to cache everything, vs 67m35s for a kernel build. The maximum possible difference between hot and cold disk cache is therefore 3.9% of the build time, provided only that there is enough RAM that once something has been read it won't be evicted to make room for something else. But in reality it will be much less than that, and possibly unmeasurable; I think what will most likely actually show up is the 30s of CPU time. I'm having trouble seeing how NVMe vs SATA can make any difference when the SD card is already 25x faster than needed.
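For reference, the arithmetic behind that 3.9%:

```python
# Upper bound on the hot-vs-cold cache difference from the numbers above.
cache_fill = 2 * 60 + 37    # 2m37s to read the whole build directory
build = 67 * 60 + 35        # 67m35s VisionFive 2 kernel build
print(f"{cache_fill / build:.1%}")   # -> 3.9%
```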
I'm not familiar with the grub build at all. Is it really big?
The build directory is 790 MB (vs 16 GB of RAM), but nevertheless the choice of underlying storage made a consistent difference in our tests. We ran them 3+ times each, so it should be mostly warm cache.
Weird. It really seems like something strange is going on. Assuming you get close to 400 MB/s on the NVMe (which is what people get on the 1-lane M.2 on the VF2 etc.), it should take just a few seconds to read 790 MB.
"Thanks Andrea Bolognani for benchmarking the VF2 and P550 with NVMe"
omg. I didn't notice that before.
The two tests were run on DIFFERENT MACHINES by different people.
NVMe result is 28.9% faster than SATA result.
1.8 GHz EIC7700X (e.g. Milk-V Megrez) is 28.6% faster than 1.4 GHz EIC7700X (HiFive Premier)
Mystery explained?
Try on an SD card ... I bet you don't see a significant difference.
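The arithmetic that made me suspicious:

```python
# Clock-speed ratio of the two EIC7700X boards vs the measured gap.
print(f"{1.8 / 1.4 - 1:.1%}")   # 1.8 GHz Megrez vs 1.4 GHz Premier -> +28.6%
# Measured: the NVMe machine's result was 28.9% faster than the SATA machine's.
```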
Why can't the packages be cross-compiled on a platform with reasonable performance?
Many or most packages don't have good cross-build infrastructure. That's important when you're a Fedora or Ubuntu building 50k+ random packages, not just working on GCC / LLVM / Linux Kernel all the time.
Doing "native" build in an emulator works for just about everything, but with a 10x - 15x slowdown compared to native or cross-build.
While price/performance of RISC-V is currently significantly worse than x86, it's not 10x worse.
A $2500 Milk-V Pioneer (64 cores, 128 GB RAM) builds a RISC-V Linux kernel five times faster than a $1500 x86 laptop (24 cores) using RISC-V docker/qemu.
A $75 VisionFive 2 or BPI-F3 takes 3 times longer than the x86 with qemu but costs 20 times less.
If you're only building one thing at a time and already have the fast x86 ... then, sure, use that. But if you want a build farm then RISC-V native on either the Pioneer or the VF2 is already much better.
These P550 machines are an incremental improvement again, in price/performance.
For more numbers see:
https://news.ycombinator.com/item?id=42395796
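To put rough numbers on the price/performance claim, taking performance as simply 1 / build time and normalising to the x86 laptop running qemu (my own crude metric, nothing official):

```python
# Performance-per-dollar relative to the x86 laptop running RISC-V docker/qemu.
# Performance here is just 1 / build time, normalised to the x86 baseline.
machines = [
    ("x86 laptop + qemu", 1.0, 1500),   # baseline speed, $1500
    ("Milk-V Pioneer", 5.0, 2500),      # 5x the baseline speed, $2500
    ("VisionFive 2", 1 / 3, 75),        # 1/3 the baseline speed, $75
]
for name, perf, price in machines:
    print(f"{name:>18}: {perf / price * 1500:.1f}x baseline perf per dollar")
```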
The goal is to actually run RISC-V binaries on RISC-V hardware, to see what works and what doesn't. You wouldn't spot code generation bugs like this one if you merely cross-compiled and never ran the binaries: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=c65046ff2e...
For quite some time to come, the main user of the Fedora riscv64 port will be the Fedora riscv64 builders. With cross-compilation, we wouldn't even have that limited use of the binaries produced.
There are a lot of cases where you want to build something and then run it afterwards, such as tests or intermediate tooling used in later steps of the build.
In any case, I actually want to use RISC-V machines for my development environment.
The experience of the Debian folks working on cross-compiling is that ~50% of packages are cross-compilable, and that was only achieved with a lot of work and a lot of patches merged. Also, it regresses quite a lot.
https://crossqa.debian.net/
Not everything can be easily cross compiled, unfortunately.
It's also just plain annoying to configure in many cases.