I have thought about this, but also wondered if it would be as magic without the highly paid team of fantastic SREs and maintainers, and the ridiculous amount of disk and compute available to them.
I imagine it would be as magic as Blaze vs Bazel out in the wild. That is, you still need someone(s) to do a ton of hard work to make it work right, but when it does, you do get the magic.
WDYM? This seems very familiar. At commit deadbeef I don't need to materialize the full tree to build some subcomponent of the monorepo. Did I miss something?
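For the checkout half at least, plain git can already approximate this; the URL and subdirectory below are placeholders:

```sh
# Check out only one subtree of a monorepo at a given commit,
# without materializing the full tree.
git clone --no-checkout https://example.com/monorepo.git
cd monorepo
git sparse-checkout set some/subcomponent
git checkout deadbeef
```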
And as for pricing... are there really that many people working on O(billion) lines of code that can't afford $TalkToUs? I'd reckon that Linux is the biggest source of hobbyist commits and that checks out on my laptop OK (though I'll admit I don't really do much beyond ./configure && make there...)
Oh yeah, this is "srcfs the idea" but not "srcfs the project".
I.e. this isn't something battle-tested by hundreds of thousands of developers 24/7 over the past several years.
But simply a commercial product sold by people who liked what they used.
Well, since Android is their flagship example: anyone who wants to build custom Android releases for some reason.
And the way things are, you don't need billions of lines of your own code to maybe benefit from tools that can handle billions of lines of code.
Well, they also claim to be able to cache build steps in a build-system-independent way, somehow.
> As the build runs, any step that exactly matches a prior record is skipped and the results are automatically reused
> SourceFS delivers the performance gains of modern build systems like Bazel or Buck2 – while also accelerating checkouts – all without requiring any migration.
Yeah, I agree. This part is hand-waved away without any technical description of how they manage to pull it off, since knowing what even constitutes a build step, and what its dependencies and outputs are, is only possible at the process level (to disambiguate multi-threaded builds). And then there are build steps with side effects, which come up a lot with CMake+ninja.
So they could in principle get a full list of dependencies of each build step. Though I'm not sure how they would skip those steps without having an interposer in the build system to shortcut it.
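For what it's worth, on Linux you can already get that per-step file list generically, without build-system cooperation. This is plain tracing, not a claim about how SourceFS does it, and `some_target` is a placeholder:

```sh
# Trace every file the step and its children open; -j1 keeps the
# process tree unambiguous, per the multi-threading caveat above.
strace -f -e trace=openat -o step.trace make -j1 some_target
# Read-only opens are the step's inputs; O_WRONLY/O_CREAT opens are its outputs.
grep O_RDONLY step.trace | grep -o '"[^"]*"' | sort -u
```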
It’s definitely not ccache, as they cover that under "compiler wrapper". This works for Android because a good chunk of the tree is probably dead code for any single build (device drivers and whatnot). It’s unclear how they benchmark; they probably include checkout time of the codebase, which artificially inflates the cost of the build (you only check out once). It’s a virtual filesystem like what Facebook has open sourced, although they also claim to do build caching without needing a dedicated build system that is aware of this, and that part feels very novel.
Re: including checkout, it’s extremely unlikely. Source: I worked on Android for 7 years. A 2 hr build time tracks with the build time after checkout on a 128-core AMD machine; checkout was O(hour), which would leave only an hour for the build if checkout were included.
Meh, content marketing for a commercial biz. There are no interesting technical details here.
I was a build engineer in a previous life. Not for Android apps, but some of the low-effort, high-value tricks I used (rough sketch after this list) involved:
* Do your building in a tmpfs if you have the spare RAM and your build (or parts of it) can fit there.
* Don't copy around large files if you can use symlinks, hardlinks, or reflinks instead.
* If you don't care about crash resiliency during the build phase (and you normally should not; each build should be done in a brand-new pristine reproducible environment that can be thrown away), avoid useless I/O via libeatmydata and similar tools.
* Cross-compilers are much faster than emulation for a native compiler, but there is a greater chance of missing some crucial piece of configuration and silently ending up with a broken artifact. Choose wisely.
The high-value, high-effort parts are ruthlessly optimizing your build system and caching intermediate build artifacts that rarely change.
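For concreteness, the first three tricks look something like this (mount point, image name, and size are made up):

```sh
# Build in RAM when the tree fits (the mount needs root).
sudo mount -t tmpfs -o size=16G tmpfs /mnt/build
# Reflink instead of a full copy where the filesystem supports it
# (btrfs/XFS); --reflink=auto falls back to a normal copy elsewhere.
cp --reflink=auto rootfs.img /mnt/build/rootfs.img
# Make fsync() a no-op for a throwaway build environment.
eatmydata make -j"$(nproc)"
```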
The world desperately needs a good open source VFS that supports Windows, macOS, and Linux. Waaaaay too many companies have independently reinvented this wheel. Someone just needs to do it once, open source it, and then we can all move on.
This. Such a product also solves some AI problems by letting you version very large amounts of training data in a VCS like git, which can then be farmed out for distributed unit testing.
HuggingFace bought XetHub, which is really cool. It’s built for massive blobs of weight data, so it’s not a general-purpose VCS VFS. The world still needs the latter.
I’d be pretty happy if Git died and was replaced with a full Sapling implementation. Git is awful, so that’d be great. Sigh.
Looks like it's similar in some ways. But they also don't say much, and even the self-hosted variant is "Talk to us" pricing :/
We're going to 1 billion LoC codebases and there's nothing stopping us!
Delivering Bazel-level performance with no migration sounds way too good to be true.
At the start of each step, snapshot the filesystem and record all files read and written during the step.
Then when the step runs again with the same inputs, you can apply the diff from last time.
Some magic to hook into processes and do all of this automatically seems possible.
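A toy version of that loop, assuming the reads and writes were already recorded into inputs.txt and outputs.txt (both hypothetical, as is step.sh):

```sh
# Key the cache on the content of everything the step read last time.
key=$(sort inputs.txt | xargs cat | sha256sum | cut -d' ' -f1)
if [ -d "cache/$key" ]; then
  # Same inputs as a previous run: replay that run's recorded outputs.
  cp -a "cache/$key/." build/
else
  ./step.sh                                # run the step for real
  mkdir -p "cache/$key"
  # Store what the step wrote, keyed by what it read (toy: flattens paths).
  xargs -a outputs.txt cp -a -t "cache/$key"
fi
```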
Though from what I gather from the story, part of the speedup comes from how Android composes its build stages.
I.e. speeding up by not downloading everything only helps if you don't need everything you'd otherwise download. And it adds up when you download multiple times.
I'm not sure they can actually provide a speedup in a tight developer loop with a local git checkout and a good build system.
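Though even plain git gets you part of the "don't download everything" half these days (repo URL is a placeholder):

```sh
# Fetch commits and trees up front, but no blobs; file contents
# are downloaded lazily only as they're actually checked out.
git clone --filter=blob:none https://example.com/huge-monorepo.git
```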