I’ve just re-discovered ollama, and it’s come a long way: it has reduced the very difficult task of locally hosting your own LLM (and getting it running on a GPU) to simply installing a deb! It also works on Windows and Mac, so it can help everyone.
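If you’re wondering what “locally hosted” actually buys you: once it’s installed, ollama listens on a local REST API (port 11434 by default) that anything on your machine can talk to. A minimal sketch, assuming you have the `requests` package and have already pulled a model (the model name below is just an example):

```python
import requests  # assumes the requests package is installed

# ollama's local API listens on port 11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"

def ask(prompt: str, model: str = "llama3.2") -> str:
    """Send a single prompt to a locally running ollama instance."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    # Assumes you've already run `ollama pull llama3.2` (model name is an example).
    print(ask("Explain what a GGUF file is in one sentence."))
```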
I’d like to see Lemmy become useful for specific technical sub-branches, rather than everyone having to hunt for the best existing community, which is subjective and makes information difficult to find. So I created [email protected] for everyone to discuss, ask questions, and help each other out with ollama!
So please join, subscribe, and feel free to ask questions, post tips and projects, and help out where you can!
Thanks!
My Home Assistant is running on Unraid, but I have an old NVIDIA Quadro P5000. I really want to run a vision model so that it can describe who is at my doorbell.
Oh actually that’s a great card for LLM serving!
Use the llama.cpp server built from source; it has better support for Pascal cards than anything else:
https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md
Gemma 3 is a hair too big (like 17–18 GB), so I’d start with InternVL 14B at Q5_K_XL: https://huggingface.co/unsloth/InternVL3-14B-Instruct-GGUF
Or Mistral Small 3.2 24B at IQ4_XS for more ‘text’ intelligence than vision: https://huggingface.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF
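For the doorbell idea: once llama-server is running with one of those GGUFs plus its mmproj file, it exposes an OpenAI-compatible chat endpoint you can throw a camera snapshot at. A rough sketch, assuming Python with `requests`; the port, file names, prompt, and image path are all placeholders for whatever your setup uses:

```python
import base64
import requests  # assumes the requests package is installed

# Assumes something like:
#   llama-server -m InternVL3-14B-Instruct-Q5_K_XL.gguf --mmproj mmproj-F16.gguf --port 8080
# (file names are placeholders - use whatever you downloaded from the HF repo)
SERVER_URL = "http://localhost:8080/v1/chat/completions"

def describe_snapshot(image_path: str) -> str:
    """Ask the vision model to describe a doorbell camera frame."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    payload = {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe the person at the door in one short sentence."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "max_tokens": 128,
    }
    resp = requests.post(SERVER_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(describe_snapshot("doorbell.jpg"))  # placeholder path
```

Home Assistant could then call something like that (e.g. via a shell_command or one of its REST integrations) whenever the doorbell fires.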
I’m a bit ‘behind’ on the vision model scene, so I can look around more if they don’t feel sufficient.