I read a great article from from Armin Ronacher:
It focused on the problem he was having with the current experience available with local model hosting where we can do a lot to reach an optimal setup but do much worse on providing a desired experience. Currently, the experience of getting there can be very overwhelming and very easily lead to a heavily inoptimal setup. I believe the best setup is one that is built into a desired experienced and not just optimal.
Just what is the optimal setup? โ
It is easy to say there are a myrdiad of endlessly unique setups out there but it's not true. There is a matrix of setups split across three dimensions being: use case, response experience, and hardware. Everything you consider in the local inference space will be driven from these three axis.
All of these axis have their own form of variability.
Use Case: One moment you may be coding up a new feature in a complex long lived application and the next you may be writing documentation for a completely new project or perhaps trying to map out the functionality of an existing project and find something specific in there.
Response Experience: It is all about the flavor of response behavior you want whether that be high throughput, high quality, or high creativity. While it is easy to see this experience as part of the use case I believe it is helpful to separate them because you can easily have the use case of product development and want completely different things out of it based on whether you're building an mvp to see what sticks to the wall or the actual production application.
Hardware: It's easy to think in a very static way but what about when you bring in cloud spot-instances for on the fly requests or you get some new hardware temporarily thanks to a cloud free trial. Especially if you are not just serving your own inference needs but others as well then hardware may get very variable to stay cost-effective.
The most optimal inference server choice for you is going to be the one that min-maxs for these three axis. Often because of the variability just discussed this optimal fit will be a mix of different inference server options for different scenarios with tooling to conveniantly switch.
That said, the most optimal inference server setup is not the best setup for you as alluded to above.
Then what is the best setup? โ
The above section about the optimal setup introduces how a setup can be technically described with the work to be done on what but it leaves out the experience.
The experience is where Armin had a problem and where I believe many are. Armin talked about a lack of critical mass in the space as the source and I do believe that is a part of the problem but not the whole of it.
The other glaring issue is the lack of organized educative material in the space for both easily drawing from other's optimized setups and drawing from other's learning. Currently, we have a bunch of spread out technical papers that go deeper than the average user is looking for, project documentation spread over GitHub repositories, PRs, issues, AI doc sites, personal setups hidden away in personal blogs and GitHub gists, and business writeups with no technical help just selling products badly.
Without a good centralized place of learning the advantage of choice and diversity of optimal setups supported becomes a disadvantage to the desired experience because most users on entry to the local model space will face choice paralysis or mess their setup up and get ejected out not to mention those who walk away simply because there is nothing to clearly and confidently dive into.
Without the critical mass that Armin described or the central education and sharing platform that I described the current experience is one of running around planting your head into a great of walls until you find your door if you got stayed at it long enough and got lucky.
The best setup is one where the experience is what you want as well. That could be heavily exploratory to learn as much as possible or straight forward where you provide what you have and what you want to a system and get a moderately optimal setup out of that.
What am I doing here. โ
I'm a resource hoarder and have had to scour across the internet to try and get a unified understanding of effective local model hosting. Because I have some kind geek problem I don't really do the shallow side of the pool with research initiatives like this so for a long time I stayed away from AI and of late I have been reading late into the night and got an all nighter recently cause this stuff has insane potential.
I want to try to organize what I have found as best I can in one place and the idea I came up with for that is writing a mini-booklet inspired by the NixOS flake book which is a god send for every up and coming NixOS user. While I don't want to get as specific into the implementation as them, I do want to have the pointers to all of that organized so a similar experience is available.
My goal with the mini-booklet is to do two main things.
- Make decision framework and variables to consider for people more clear.
- Bring together information highly useful for making decisions in this space.
If we can beat down some of the context paralysis by limiting the scope of the setup problem in people's minds and providing them easy access to a lot of the relevant info in one space I think that could help a lot.
I won't be building anything for aggregating and organizing personal setups anytime soon but may one throw some agents at it one day. ๐