
One of the major ML-specific challenges involves storing and shipping large model files. Shipping a 7G or 100G binary around can become a real headache.

This challenge intensifies with containerized workflows – Docker layer caching won’t help when a layer includes a 100G file. (In fact, some container/layer size limits could cause additional issues.)

For those of us coming from the video game world, this isn’t new. Fourteen years ago, I was shipping 400G of 3D assets and 400G of XML meshes around. Devising ways to do this fast (avoiding shipping unchanged parts) was crucial, and delivering those assets from the edge to clients was just as important. I got our release process down from a day or so to minutes for a typical release.

If your models don’t change often and you control them, the solutions you’ll need are straightforward. But if your models change frequently (because of regular retuning, say) or come from a third party, you’ll face issues similar to those we had with video game assets.

How much of the file changes between iterations? 1%? 10%? 100%? Large Java JAR files (or anything using ZIP compression) were always problematic: even when 0% of the content changed, 100% of the file’s bytes did. In contrast, GCC-compiled binaries and tar/gzipped directories were more deterministic.
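One quick way to answer that question is to hash both versions of a model in fixed-size blocks and count how many blocks differ. Here’s a minimal sketch – the block size and file names are placeholders, not recommendations:

```python
import hashlib

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MiB blocks; tune to match your transfer tooling

def block_hashes(path: str) -> list[str]:
    """Return a SHA-256 digest for each fixed-size block of the file."""
    hashes = []
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK_SIZE):
            hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes

def percent_changed(old_path: str, new_path: str) -> float:
    """Rough estimate of how much of the file changed between versions."""
    old, new = block_hashes(old_path), block_hashes(new_path)
    total = max(len(old), len(new))
    if total == 0:
        return 0.0
    same = sum(1 for a, b in zip(old, new) if a == b)
    return 100.0 * (total - same) / total

print(f"{percent_changed('model-v1.bin', 'model-v2.bin'):.1f}% of blocks changed")
```

Note that this block-aligned comparison has the same blind spot as the ZIP example: an insertion early in the file shifts every later block, so it can report nearly 100% change even when most of the bytes are identical.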

If 100% changes, your options are limited – you’re stuck shipping the entire file. To mitigate this, keep the build location close to the distribution target. “Close” is relative, but avoid scenarios like tuning models in AWS us-east-1 and shipping them to us-west-2 unnecessarily. Sometimes distributed builds can be better than distributing builds.

When you’re dealing with partial changes, there’s more flexibility. Tools like rsync --no-whole-file shine when both ends act like filesystems. For non-filesystem object stores (e.g., S3), you can use split to divide files into manageable chunks, checksum them, and ship the chunks in parallel alongside a manifest file. When a file changes, re-split it, compare checksums, and ship only the differences. This sounds like a lot of work, but it’s not that bad. Code already exists to do variations on this if you look around; it’s far from a new pattern.
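To make the pattern concrete, here’s a rough sketch of the split/checksum/manifest approach. The chunk size is arbitrary, and upload_chunk is a hypothetical helper you’d back with your object-store client of choice:

```python
import hashlib
import json
from pathlib import Path

CHUNK_SIZE = 512 * 1024 * 1024  # 512 MiB chunks, like `split -b 512M`

def split_and_manifest(model_path: str, chunk_dir: str) -> dict:
    """Split the model into chunk files on disk and return {chunk_name: sha256}."""
    manifest = {}
    out = Path(chunk_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(model_path, "rb") as f:
        index = 0
        while chunk := f.read(CHUNK_SIZE):
            name = f"{Path(model_path).name}.{index:05d}"
            (out / name).write_bytes(chunk)
            manifest[name] = hashlib.sha256(chunk).hexdigest()
            index += 1
    return manifest

def changed_chunks(new_manifest: dict, old_manifest: dict) -> list[str]:
    """Chunks whose checksum differs from (or is missing in) the previous release."""
    return [name for name, digest in new_manifest.items()
            if old_manifest.get(name) != digest]

# Usage sketch: ship only what changed, then publish the new manifest last.
old = json.loads(Path("manifest.json").read_text()) if Path("manifest.json").exists() else {}
new = split_and_manifest("model.bin", "chunks/")
for name in changed_chunks(new, old):
    upload_chunk(f"chunks/{name}")  # hypothetical helper: e.g. a parallel S3 put
Path("manifest.json").write_text(json.dumps(new, indent=2))
```

Publishing the manifest only after all chunks have landed keeps consumers from picking up a half-shipped release.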

If possible, avoid embedding models directly in containers. This approach requires external state tracking – mapping the model’s location to the container runtime. For example, the container could pull the correct model at startup from a nearby storage location, or use a volume to map the model’s on-machine location into the container.
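As a sketch of the “pull at startup” option, an entrypoint could verify the on-disk model against an expected checksum and fetch it only when missing or stale. MODEL_PATH, MODEL_SHA256, and fetch_model below are all hypothetical names, not part of any existing tooling:

```python
import hashlib
import os
import sys
from pathlib import Path

# Hypothetical layout: the expected digest arrives as config (an env var here),
# and the model lives on a mounted volume or gets pulled from nearby storage.
MODEL_PATH = Path(os.environ.get("MODEL_PATH", "/models/model.bin"))
EXPECTED_SHA256 = os.environ.get("MODEL_SHA256", "")

def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()

def ensure_model() -> None:
    """Pull the model only if it is missing or stale, then verify it."""
    if not MODEL_PATH.exists() or (EXPECTED_SHA256 and sha256_of(MODEL_PATH) != EXPECTED_SHA256):
        fetch_model(MODEL_PATH)  # hypothetical helper: copy from S3/GCS/NFS nearby
    if EXPECTED_SHA256 and sha256_of(MODEL_PATH) != EXPECTED_SHA256:
        sys.exit("model checksum mismatch; refusing to start")

if __name__ == "__main__":
    ensure_model()
    # ...then exec the actual serving process
```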

I expect container tooling to evolve and offer better support for managing large model files. There may even be good places to plug this in today. I know of at least one ML company actively building solutions in this space – I do hope they share them.

This is just speculation, but the challenges of MLOps and large model files are worth exploring. Got questions or want to brainstorm solutions? Reach out at https://www.brasstack.net/#contact.