I am using a code-completion model for a project of mine (it will be open-sourced very soon).
However, Qwen2.5-Coder 1.5B tends to repeat what has already been written, or change it only slightly (see the video).
Is this intentional? I am passing the prefix and suffix correctly to Ollama, so the model knows where the cursor is. I'm also trimming the number of lines it can see, so the time-to-first-token isn't too long.
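For reference, this is roughly how I build the request. Ollama's `/api/generate` endpoint accepts a `suffix` field for fill-in-the-middle; the model name, context limit, and option values here are just my illustrative choices, not a recommended configuration:

```python
def build_fim_request(model: str, prefix: str, suffix: str,
                      max_context_lines: int = 30) -> dict:
    """Build an Ollama /api/generate payload for fill-in-the-middle.

    Trimming the prefix to the last few lines keeps time-to-first-token
    low; 30 lines is an arbitrary cutoff for illustration.
    """
    trimmed_prefix = "\n".join(prefix.splitlines()[-max_context_lines:])
    return {
        "model": model,
        "prompt": trimmed_prefix,   # text before the cursor
        "suffix": suffix,           # text after the cursor
        "stream": False,
        "options": {"num_predict": 64},  # cap the completion length
    }

payload = build_fim_request(
    "qwen2.5-coder:1.5b",
    prefix="def add(a, b):\n    ",
    suffix="\n\nprint(add(1, 2))",
)
```

The payload would then be POSTed as JSON to `http://localhost:11434/api/generate`.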
Do you have a recommendation for a code model better suited to this?
If you want inline completions, you need a model that is trained on "fill in the middle" (FIM) tasks. On their Hugging Face page they even say that this is not supported and needs fine-tuning:
> We do not recommend using base language models for conversations. Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., or fill in the middle tasks on this model.
Models trained on fill-in-the-middle include:
- starcoder2
- codegemma
- codellama
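FIM models expect the prefix and suffix wrapped in special sentinel tokens, and the spelling of those tokens differs per model family. A minimal sketch, using the documented StarCoder and CodeLlama formats (verify the exact tokens against your model's tokenizer config before relying on them):

```python
def fim_prompt(prefix: str, suffix: str, family: str = "starcoder") -> str:
    """Assemble a raw fill-in-the-middle prompt.

    The model generates the "middle" after the final sentinel token.
    Token spellings vary between model families.
    """
    if family == "starcoder":
        return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
    if family == "codellama":
        return f"<PRE> {prefix} <SUF>{suffix} <MID>"
    raise ValueError(f"unknown model family: {family}")

starcoder_style = fim_prompt("def add(a, b):\n    ", "\nprint(add(1, 2))")
```

If you use Ollama's `suffix` field instead, it assembles this prompt for you from the model's template, which is usually the safer option.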
Another option is to keep using the Qwen model, but instead of inserting only a few lines, have it rewrite the entire enclosing function each time.