Tags: ML, managed services, infrastructure, production
Someone asked today:
I have some questions about managed ML offerings like AWS SageMaker or GCP Vertex AI vs. running the ML workload in ECS, EKS or GKE… Does a managed solution cost significantly more? Is a managed solution easier to maintain?
Remember the 300% production problem.
For everything that you build, you need expertise in:
With managed solutions, you offload some of #2 to your service provider and their support staff. Your expertise is still needed at the interface to the solution, which in theory is simplified.
With DIY solutions, you need many different kinds of expertise. Of course, building DIY on top of managed services still offloads some of that, but in the modern cloud, it’s less than you might hope; the work just shifts around under new labels.
#2 is always where complexity and scope creep explode, so this is where managed services can really shine, if they’re done right for you. (And “for you” is important: the solution that works for my team and the one that works for yours will always be different.)
Lee talks about this more (in a slightly different context, but the same idea):
https://leebriggs.co.uk/blog/2023/09/28/300_percent_problem