Description
What happened?
Fastembed stores models in a /tmp/fastembed_cache folder by default, which is a pain to work with and a needless waste of resources: the tmp folder is regularly cleaned up, so clients end up redownloading these models for no good reason, and transferring GBs over the network has a cost that should not be encouraged when it is not needed.
I have been using fastembed for some time (the Python and Rust libs, thanks for the great work!) and never had this issue until now. Today I was greeted with the error below, which is strange because I only ran my script twice, far less than usual:
2025-10-31 16:09:31.611 | ERROR | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d12b-6e1138c24b3eb6181d4b481d;91a1bccd-17b5-49b4-8bad-e0b9b3ada589)
We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:31.611 | ERROR | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 27.0 seconds, 0 retries left.
What is the expected behaviour?
Embedding models are large (they can exceed 1 GB); forcing everyone to redownload them every day is a waste of resources and time that could easily be avoided.
By default, fastembed should store models outside of tmp (there are standard places in the home folder for this, or worst case just create a ~/.fastembed folder).
Note also that how fastembed handles caching is not well documented: I cannot find anything about it in the docs.
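For reference, here is the workaround I am using for now. It is only a sketch: I am assuming that the cache_dir constructor argument and the FASTEMBED_CACHE_PATH environment variable control the cache location the way I think they do, so please correct me if that is wrong.

```python
# Workaround sketch: point fastembed at a persistent cache directory instead
# of the default under /tmp. Assumes `cache_dir` and FASTEMBED_CACHE_PATH
# behave as described above (not verified against every fastembed version).
import os
from pathlib import Path

from fastembed import TextEmbedding

cache_dir = Path.home() / ".cache" / "fastembed"  # persistent location in $HOME
cache_dir.mkdir(parents=True, exist_ok=True)

# Option 1: pass the cache directory explicitly to the model.
model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5", cache_dir=str(cache_dir))

# Option 2: set it process-wide before constructing any model.
os.environ["FASTEMBED_CACHE_PATH"] = str(cache_dir)

embeddings = list(model.embed(["hello world"]))
print(len(embeddings[0]))  # embedding dimensionality, just to check it loaded
```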
In my case I usually deploy this as a long-running service in Docker containers, so I simply persist the /tmp/fastembed_cache folder as a volume, e.g. in my compose config:

volumes:
  - ./.fastembed_cache:/tmp/fastembed_cache

But I am still facing redownload issues whenever I run things outside the container.
And imagine that every dev who did not set up this volume mapping is redownloading the models at every container restart... A sketch of the container setup I would prefer is below.
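Something like this would feel much cleaner, assuming fastembed honours a FASTEMBED_CACHE_PATH environment variable (the image name and paths here are just placeholders):

```yaml
# Sketch only: a compose setup that keeps the model cache in a named volume,
# assuming FASTEMBED_CACHE_PATH controls where fastembed stores its models.
services:
  embedder:
    image: my-embedding-service:latest   # placeholder image
    environment:
      FASTEMBED_CACHE_PATH: /models/fastembed_cache
    volumes:
      - fastembed_cache:/models/fastembed_cache   # survives container restarts

volumes:
  fastembed_cache:
```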
Is there a particular reason for all these redownloads? Am I missing something?
A minimal reproducible example
No response
What Python version are you on? e.g. python --version
3.13
FastEmbed version
latest
What os are you seeing the problem on?
No response
Relevant stack traces and/or logs
2025-10-31 16:09:19.379 | ERROR | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d11f-1dfe1f91302937c221f352d0;5d8158c5-0e6a-4a0d-8c3e-667134688f78)
We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:19.380 | ERROR | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 3.0 seconds, 2 retries left.
2025-10-31 16:09:22.494 | ERROR | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d122-0242b0b53748607116edfd62;9b1a42c9-9ad0-449a-90e1-fe6649713886)
We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:22.494 | ERROR | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 9.0 seconds, 1 retries left.
2025-10-31 16:09:31.611 | ERROR | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d12b-6e1138c24b3eb6181d4b481d;91a1bccd-17b5-49b4-8bad-e0b9b3ada589)
We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:31.611 | ERROR | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 27.0 seconds, 0 retries left.