
[Bug]: downloaded embedding models should not go to a temporary folder by default #569

@vemonet

Description


What happened?

Fastembed stores models in a /tmp/fastembed_cache folder by default, which is a pain to work with and a waste of resources: the tmp folder is regularly cleaned up, so clients end up re-downloading these models for no good reason. Transferring GBs over the network has a cost and should not be encouraged when it is not needed.
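As far as I can tell from reading the source (I could not find this in the docs), the default cache folder is chosen roughly like this, falling back to the system temp directory:

    # Rough reconstruction of how fastembed seems to pick its default cache
    # folder (based on my reading of the source, not the docs, so take it
    # with a grain of salt).
    import os
    import tempfile

    default_cache_dir = os.getenv(
        "FASTEMBED_CACHE_PATH",
        os.path.join(tempfile.gettempdir(), "fastembed_cache"),  # /tmp/fastembed_cache on Linux
    )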

I have been using fastembed for some time (both the Python and Rust libs, thanks for the great work!) and never had this issue until now. Today I was greeted with the error below, which is strange because I only ran my script twice today, much less than usual:

2025-10-31 16:09:31.611 | ERROR    | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d12b-6e1138c24b3eb6181d4b481d;91a1bccd-17b5-49b4-8bad-e0b9b3ada589)

We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:31.611 | ERROR    | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 27.0 seconds, 0 retries left.

What is the expected behaviour?

Embedding models are large (they can be more than 1 GB); forcing everyone to re-download them every day is a waste of resources and time that could easily be avoided.

By default fastembed should store models in a non-tmp folder: there are standard places in the home folder for this, or worst case just create a ~/.fastembed folder.
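As a rough sketch of what I mean (the exact path below is just a suggestion, not what fastembed currently does, and it keeps the FASTEMBED_CACHE_PATH override that the code already seems to honour):

    # Hypothetical default resolution: keep the env var override, but fall
    # back to a folder under the user's home instead of the temp directory.
    import os
    from pathlib import Path

    def default_model_cache_dir() -> Path:
        override = os.getenv("FASTEMBED_CACHE_PATH")
        if override:
            return Path(override)
        # ~/.cache/fastembed (or ~/.fastembed) survives reboots and tmp cleanups
        return Path.home() / ".cache" / "fastembed"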

Note also that the way fastembed does caching is not well documented: I cannot find anything about it in the docs.

In my case I usually deploy this as a long-running service in Docker containers, so I just persist the /tmp/fastembed_cache volume inside my container, e.g. in my compose config:

    volumes:
      - ./.fastembed_cache:/tmp/fastembed_cache

But I am still facing re-download issues whenever I run things outside the container.
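When I run on the host I currently have to point fastembed at that same folder by hand, roughly like this (again assuming the FASTEMBED_CACHE_PATH variable / cache_dir argument I found by reading the code):

    # Sketch: reuse the folder that the compose volume above maps into the
    # container, so the host and the container share one copy of the models.
    import os

    os.environ["FASTEMBED_CACHE_PATH"] = os.path.abspath("./.fastembed_cache")

    from fastembed import TextEmbedding

    # cache_dir could also be passed explicitly instead of using the env var
    model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")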

And imagine that every dev who did not set up this volume mapping is re-downloading the models at every container restart...

Is there a good reason for causing all these downloads? Am I missing something?

A minimal reproducible example

No response

What Python version are you on? e.g. python --version

3.13

FastEmbed version

latest

What os are you seeing the problem on?

No response

Relevant stack traces and/or logs

2025-10-31 16:09:19.379 | ERROR    | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d11f-1dfe1f91302937c221f352d0;5d8158c5-0e6a-4a0d-8c3e-667134688f78)

We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:19.380 | ERROR    | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 3.0 seconds, 2 retries left.
2025-10-31 16:09:22.494 | ERROR    | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d122-0242b0b53748607116edfd62;9b1a42c9-9ad0-449a-90e1-fe6649713886)

We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:22.494 | ERROR    | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 9.0 seconds, 1 retries left.
2025-10-31 16:09:31.611 | ERROR    | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d12b-6e1138c24b3eb6181d4b481d;91a1bccd-17b5-49b4-8bad-e0b9b3ada589)

We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:31.611 | ERROR    | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 27.0 seconds, 0 retries left.
