Getting sentence-transformers to work in my LLM environment has been a real pain, and I’ve been looking for a way to avoid the hassle. So I walked through the code of other projects that offer embeddings and found that ChromaDB had run into the same problem.

They solved it by using a model in ONNX format that is downloaded automatically on first use. See the code for yourself here.
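
To give a feel for the "download on first use" idea, here is a minimal Python sketch using only the standard library. It is not ChromaDB's actual code: `ensure_model`, its parameters, and the `.part` temp-file convention are hypothetical names chosen for illustration, and the real implementation may well differ.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve


def ensure_model(
    url: str,
    cache_dir: Path,
    filename: str,
    fetch: Callable = urlretrieve,  # swappable so the logic can be tested offline
) -> Path:
    """Return the local path to the model file, downloading it only if missing.

    Hypothetical helper: the name and signature are assumptions, not a real API.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename

    if not target.exists():
        # Download to a temporary name first, then rename, so an interrupted
        # download never leaves a half-written file that looks like a valid model.
        tmp = target.with_suffix(target.suffix + ".part")
        fetch(url, tmp)
        tmp.replace(target)

    return target
```

The first call pays the download cost; every later call finds the cached file and returns immediately. The returned path could then be handed to an ONNX runtime to load the model, which is the part that replaces the heavier sentence-transformers install.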