We've [1] been using Hetzner's dedicated servers to provide Kubernetes clusters to our clients for a few years now. The performance is certainly excellent; we typically see request times halve. And because the hardware is cheaper, we can provide dedicated DevOps engineering time to each client. There are some caveats though:
1) A staging cluster for testing updates is really a must. YOLO-ing prod updates on a Sunday is no one's idea of fun.
2) Application-level replication is king, followed by block-level replication (we use OpenEBS/Mayastor). After going through all the Postgres operators, we found StackGres to (currently) be the best.
3) The Ansible playbooks are your assets. Once you have them down and well-commented for a given service then re-deploying that service in other cases (or again in the future) becomes straightforward.
4) If you can, I'd recommend a dedicated 10G network to connect your servers. 1G just isn't quite enough for the combined load of prod traffic, plus image pulls, plus inter-service traffic. This also gives a 10x latency improvement over AWS intra-AZ.
5) If you want network redundancy you can create a 1G vSwitch (VLAN) on the 1G ports for internal use. Give each server a loopback IP, then use BGP to distribute routes (bird).
6) MinIO clusters (via the operator) are not that tricky to operate as long as you follow the well-trodden path. This provides you with local high-bandwidth, low-latency object storage.
7) The initial investment to do this does take time. I'd put it at 2-4 months of undistracted skilled engineering time.
8) You can still push ancillary/annoying tasks off onto cloud providers (personally I'm a fan of Cloudflare for HTTP load balancing).
[1]: https://lithus.eu
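Point 5 could be sketched roughly like this with bird 2.x. All addresses, AS numbers, and names here are hypothetical, and the design choice (eBGP with one private ASN per server) is just one reasonable option, not necessarily what the author runs:

```
# /etc/bird/bird.conf on one server — addresses/ASNs are hypothetical
router id 10.0.0.1;

protocol device { }

# The server's loopback /32, originated as a static route
protocol static loopback_routes {
    ipv4;
    route 10.0.0.1/32 via "lo";
}

# One such block per peer server on the vSwitch VLAN
protocol bgp peer_server2 {
    local 192.168.100.1 as 65001;
    neighbor 192.168.100.2 as 65002;
    ipv4 {
        import all;                       # in production, filter this (see below)
        export where source = RTS_STATIC; # only advertise our own loopback
    };
}
```

Each server then reaches every other server's loopback IP over whichever path BGP has learned, independent of any single physical port.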
> dedicated 10G network to connect your servers
Do you have to ask Hetzner nicely for this? They have a publicly documented 10G uplink option, but that is for external networking and IMHO heavily limited (20TB cap). For internal cluster IO, 20TB could easily become a problem.
It is under their pricing for 'additional hardware'[1]. You need to factor in the switch, an uplink for each server, and a NIC for each server.
[1]: https://docs.hetzner.com/robot/general/pricing/price-list-fo...
Hetzner does not charge for internal bandwidth.
> 5) If you want network redundancy you can create a 1G vSwitch (VLAN) on the 1G ports for internal use. Give each server a loopback IP, then use BGP to distribute routes (bird).
Are you willing to share example config for that part?
I don't have one I can share publicly, but if you send me an email I'll see what I can do :-) Email is in my profile.
You'll need a bit of baseline networking knowledge.
Should note that if you don't have enough networking knowledge, this is an excellent way to build a gun to shoot yourself in the foot with. If you misconfigure BGP or don't take basic precautions such as sanity filters on in- and outbound routes, you can easily do something silly like overwrite each server's default route, taking down all your services.
It's not rocket science, but it is complex, and building something complex you don't fully understand for production services can be a very bad idea.
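To illustrate the sanity filters mentioned above, a hedged bird 2.x sketch (range, ASNs, and addresses hypothetical) that refuses a default route and only accepts the expected /32 loopbacks:

```
# Only accept /32 loopbacks from the internal range; never a default route
filter sane_in {
    if net = 0.0.0.0/0 then reject;               # never learn a default route
    if net ~ [ 10.0.0.0/24{32,32} ] then accept;  # expected /32 loopbacks only
    reject;                                       # drop everything else
}

protocol bgp peer_server2 {
    local 192.168.100.1 as 65001;
    neighbor 192.168.100.2 as 65002;
    ipv4 {
        import filter sane_in;
        export where source = RTS_STATIC;  # only export our own static /32
    };
}
```

With filters like these, a misconfigured peer can at worst withdraw its own loopback, not rewrite your default route.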
> The initial investment to do this does take time. I'd put it at 2-4 months of undistracted skilled engineering time.
Perhaps you could take a look at https://syself.com (Disclaimer: I'm an employee there). We built a platform that gives you production-ready clusters in a few minutes.
> I'd put it at 2-4 months of undistracted skilled engineering time.
How much is that worth to your company/customer vs a higher monthly bill for the next 5 years?
As a consultancy company, you want to sell that. As a customer, I don't see how that's worth it at all, unless I expect a $10k/month AWS bill.
xkcd comes to mind: https://xkcd.com/1319/
> As a consultancy company, you want to sell that. As a customer, I don't see how that's worth it at all.
Well I do rather agree, but as a consultancy I'm biased.
But let's do some math. Say it's 4 months (because who has uninterrupted time) at a senior rate of $1,000/day. 20 days a month, so 80 days, is an $80k outlay. That's assuming you can get the skills at all (because AWS et al. like to hire these kinds of engineers).
Say one wants a 3-year payback; that's about $2,200/month in savings you need. Which seems highly achievable given some of the cloud spends I've seen, and given that I think an 80-90% reduction in cloud spend is a good ballpark.
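The arithmetic above, spelled out (the day rate, timeline, and payback window are the assumptions stated in the thread, not measurements):

```python
# Worked payback math for the figures above (all inputs are assumptions)
day_rate = 1_000          # senior rate, $/day
months = 4                # elapsed time, allowing for interruptions
days = months * 20        # ~20 working days per month
outlay = days * day_rate  # up-front engineering cost

payback_months = 36       # 3-year payback target
required_monthly_saving = outlay / payback_months

print(outlay)                          # 80000
print(round(required_monthly_saving))  # 2222
```

So the break-even question reduces to: does moving off the cloud save you at least ~$2.2k/month?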
The appeal of a consultancy is that we'll remove the up-front investment, provide the skills, de-risk the whole endeavour, even put engineers within your team, but you'll _only_ save 50%.
The latter option is much more appealing in terms of hiring, risk, and cash-flow. But if your company has the skills, the cash, and the risk tolerance then maybe the former approach is best.
EDIT: I actually think the (/our) consultancy option is a really good idea for startups. Their infrastructure ends up being slightly over-built to start with, but very quickly they end up saving a lot of money, and they also get DevOps staffing without having to hire for it. Moreover, the DevOps resource available to them scales with their compute needs. (We also offer 2x the DevOps days for startups in their first year to help them get up and running.)
This assumes there are no DevOps/consulting costs to set up something with AWS. My experience is that "the AWS way of doing XYZ" is almost as complicated as doing it the non-AWS way. On top of that, the non-AWS way is much more portable across hosting providers, so you decrease your business risks considerably.
I wholeheartedly agree, I'm trying to be generous as I know I have a bias here.
I think the AWS way made clear sense in the days before the current generation of tooling existed, when we were SSH-ing into our snowflake servers (for example). But now we have tools like Kubernetes/Nomad/OpenShift/etc/etc, the logic just doesn't seem to add up any more.
The main argument against it is generally of the form, "Yes, but we don't want to hire for non-cloud/bare-metal". Which is why I think a consultancy provides a good middle ground here – trading off cost savings against business factors.
Can you recommend any resources on how to approach the topic for a startup? Most startups have very similar needs, but every single "batteries included" solution that I've encountered so far explicitly excluded infrastructure and DevOps – either because it's out of scope for the creators, or because that's what they monetize (e.g. supabase).
I don't have practical experience with it, but .NET Aspire looks like the thing you might want (and it has some support for non-.NET on both backend and frontend).
Basically the idea is that you define your infrastructure in a rather short .NET script (e.g. postgres + backend + frontend + auth service), and the tooling then lets you either download all the components and launch the whole thing locally, or generate a script of some kind to deploy it to an infrastructure provider (the type of script depends on the provider). It also provides extensive logging, monitoring, tracing etc. out of the box for the majority of the included components, with API endpoints and dashboards.
How about we have a chat? I think it is hard for startups to justify implementing this infrastructure from scratch because that is a lot of time & skills that are really best focussed elsewhere.
Ping me an email (see bio), always happy to chat.
Well, it at least assumes the cost to set up the AWS way is sunk. Which is a given for anybody who might hire them.
But if you're starting from scratch instead of looking for someone to help you migrate, then yeah, the AWS way probably has higher setup costs than making it portable.