Anyone know of tools kind of like #languageServerProtocol but instead of keeping the AST in memory they do a streaming parse / search on the fly? I'm not a huge fan of massive memory use, and it feels like we're leaving performance on the table by parsing entire files/folders instead of just enough to get to what you want.
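Something like this, conceptually: scan the file in fixed-size chunks and bail out the moment you've found what you need, instead of slurping everything into memory first. Toy sketch, all names made up, not from any real tool:

```cpp
// Toy sketch: stream a file in chunks and stop as soon as the needle
// is found, instead of loading the whole file into memory.
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Returns the byte offset of the first match, or -1 if not found.
long long streaming_find(const std::string& path, const std::string& needle) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;

    const size_t chunk_size = 64 * 1024;
    std::vector<char> buf(chunk_size);
    std::string window;   // carry-over so matches spanning chunks aren't missed
    long long base = 0;   // file offset of window[0]

    while (in.read(buf.data(), buf.size()) || in.gcount() > 0) {
        window.append(buf.data(), static_cast<size_t>(in.gcount()));
        size_t pos = window.find(needle);
        if (pos != std::string::npos)
            return base + static_cast<long long>(pos);  // early exit: stop reading here
        // Keep only the last needle-1 bytes for cross-chunk matches.
        if (window.size() > needle.size()) {
            size_t keep = needle.size() - 1;
            base += static_cast<long long>(window.size() - keep);
            window.erase(0, window.size() - keep);
        }
    }
    return -1;
}

int main(int argc, char** argv) {
    if (argc < 3) { std::cerr << "usage: find <file> <needle>\n"; return 2; }
    std::cout << streaming_find(argv[1], argv[2]) << "\n";
}
```

Memory stays bounded at roughly one chunk plus the needle length, no matter how big the file is, and you never read past the first hit.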
@lvk ooo yeah. I wonder how hard it'd be. Was most of the state just the context and KV cache? Sounds like not a lot. For some reason I thought Ollama had some sort of "continue from the last call" feature already, but I might be hallucinating.
@lvk I think llama.cpp does this with this save/load demo actually: https://github.com/ggml-org/llama.cpp/blob/master/examples/save-load-state/save-load-state.cpp
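Skimming it, the core pattern looks roughly like this: serialize the context state (including the KV cache) to a buffer, and later restore it into a fresh context so generation continues without re-processing the prompt. Rough sketch from memory, untested; check llama.h for the exact current signatures:

```cpp
// Rough sketch of the save/restore pattern from that demo.
// Untested assumption: the llama_state_* calls as declared in llama.h.
#include <vector>
#include "llama.h"

std::vector<uint8_t> save_state(llama_context* ctx) {
    std::vector<uint8_t> buf(llama_state_get_size(ctx));
    llama_state_get_data(ctx, buf.data(), buf.size());
    return buf;  // could just as well be written to disk instead of kept in RAM
}

void restore_state(llama_context* ctx, const std::vector<uint8_t>& buf) {
    llama_state_set_data(ctx, buf.data(), buf.size());
    // ctx now continues exactly where the saved context left off
}
```

I believe there are also file-based variants (llama_state_save_file / llama_state_load_file) that would cover the "snapshot to disk" use case directly, but don't quote me on the names.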
@mauve nice! OK, so I'm waiting for the future to happen while it's already happened in some FOSS project.
Story of my life 😅
@mauve probably unrelated, but I'd love something similar for local LLMs too... something that doesn't require keeping it all in memory. Restore an Ollama state snapshot via swapfile, etc.