
[Bug]: downloaded embedding models should not go to a temporary folder by default #569

@vemonet

Description


What happened?

Fastembed stores models in a /tmp/fastembed_cache folder by default, which is a pain to work with and a waste of resources: the tmp folder is regularly cleaned up, so clients end up re-downloading these models for no good reason. Transferring GBs over the network has a cost and should not be encouraged when it is not needed.
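As far as I can tell from reading the source (I could not find this in the docs), the default cache folder is chosen roughly like this, falling back to the system temp directory:

    # Rough reconstruction of how fastembed seems to pick its default cache
    # folder (based on my reading of the source, not the docs, so take it
    # with a grain of salt).
    import os
    import tempfile

    default_cache_dir = os.getenv(
        "FASTEMBED_CACHE_PATH",
        os.path.join(tempfile.gettempdir(), "fastembed_cache"),  # /tmp/fastembed_cache on Linux
    )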

I have been using fastembed for some time (both the Python and Rust libs, thanks for the great work!) and never had this issue until now. Today I was greeted with the error below, which is strange because I only ran my script twice today, much less than usual:

2025-10-31 16:09:31.611 | ERROR    | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d12b-6e1138c24b3eb6181d4b481d;91a1bccd-17b5-49b4-8bad-e0b9b3ada589)

We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:31.611 | ERROR    | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 27.0 seconds, 0 retries left.

What is the expected behaviour?

Embedding models are large (they can be more than 1 GB); forcing everyone to re-download them every day is a waste of resources and time that could easily be avoided.

By default fastembed should store models in a non-tmp folder: there are standard places in the home folder for this, or worst case just create a ~/.fastembed folder.
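As a rough sketch of what I mean (the exact path below is just a suggestion, not what fastembed currently does, and it keeps the FASTEMBED_CACHE_PATH override that the code already seems to honour):

    # Hypothetical default resolution: keep the env var override, but fall
    # back to a folder under the user's home instead of the temp directory.
    import os
    from pathlib import Path

    def default_model_cache_dir() -> Path:
        override = os.getenv("FASTEMBED_CACHE_PATH")
        if override:
            return Path(override)
        # ~/.cache/fastembed (or ~/.fastembed) survives reboots and tmp cleanups
        return Path.home() / ".cache" / "fastembed"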

Note also that the way fastembed does caching is not well documented: I cannot find anything about it in the docs.

In my case I usually deploy this as a long-running service in Docker containers, so I just persist the /tmp/fastembed_cache volume inside my container, e.g. in my compose config:

    volumes:
      - ./.fastembed_cache:/tmp/fastembed_cache

But I am still facing re-download issues whenever I run things outside the container.
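When I run on the host I currently have to point fastembed at that same folder by hand, roughly like this (again assuming the FASTEMBED_CACHE_PATH variable / cache_dir argument I found by reading the code):

    # Sketch: reuse the folder that the compose volume above maps into the
    # container, so the host and the container share one copy of the models.
    import os

    os.environ["FASTEMBED_CACHE_PATH"] = os.path.abspath("./.fastembed_cache")

    from fastembed import TextEmbedding

    # cache_dir could also be passed explicitly instead of using the env var
    model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")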

And imagine that every dev who did not set up this volume mapping is re-downloading the models at every container restart...

Is there a good reason for causing all these downloads? Am I missing something?

A minimal reproducible example

No response

What Python version are you on? e.g. python --version

3.13

FastEmbed version

latest

What os are you seeing the problem on?

No response

Relevant stack traces and/or logs

2025-10-31 16:09:19.379 | ERROR    | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d11f-1dfe1f91302937c221f352d0;5d8158c5-0e6a-4a0d-8c3e-667134688f78)

We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:19.380 | ERROR    | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 3.0 seconds, 2 retries left.
2025-10-31 16:09:22.494 | ERROR    | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d122-0242b0b53748607116edfd62;9b1a42c9-9ad0-449a-90e1-fe6649713886)

We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:22.494 | ERROR    | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 9.0 seconds, 1 retries left.
2025-10-31 16:09:31.611 | ERROR    | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d12b-6e1138c24b3eb6181d4b481d;91a1bccd-17b5-49b4-8bad-e0b9b3ada589)

We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:31.611 | ERROR    | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 27.0 seconds, 0 retries left.
