Below are the prerequisites for working with this application. In particular, we assume basic AI inference services are provided by the user.
EPIC requires Python 3.12.
In addition to the locally hosted models, the environment will use the OpenAI API if the environment variable OPENAI_API_KEY is set. The variable is picked up by both pipelines and tests.
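As a hedged illustration of this convention (not EPIC's actual wiring), a pipeline or test can simply check the process environment to decide whether OpenAI-backed models are available:
import os

# Illustrative only: decide whether OpenAI-backed models can be enabled.
if os.getenv("OPENAI_API_KEY"):
    print("OPENAI_API_KEY detected; OpenAI models can be used.")
else:
    print("No OPENAI_API_KEY set; using locally hosted models only.")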
Install the requirements using python3 (>=3.12) and virtualenv. The virtual
environment is created automatically by the Makefile included in the
repository.
# Initialize
$ make init
... install virtual environment & requirements ...
... runs ./setup.py and installs the necessary packages in editable mode ...
# Finalize (if needed to clean)
$ make fini
... deletes the .venv directory ...
EPIC's LLM training datasets and models are hosted in Hugging Face repositories. Create an API token so you can access these repositories.
# Tokens are acquired from https://huggingface.co/settings/tokens
#
# After this, tokens are stored in ~/.cache/huggingface/stored_tokens
#
# Below assumes you are in the environment
$ . ./.venv/bin/activate
# Logging in
$ huggingface-cli login
... (Hugging Face ASCII art banner) ...
A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
Setting a new token will erase the existing one.
To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible):
$
This is a one-time task. The token is stored locally and reused for subsequent runs.
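For scripted or CI setups, the interactive prompt can be avoided. The sketch below uses the huggingface_hub Python API and assumes the token has been exported as HF_TOKEN (the variable name is our choice for this example):
import os
from huggingface_hub import login

# Non-interactive login sketch; reads the token from an environment
# variable we chose to call HF_TOKEN for this example.
login(token=os.environ["HF_TOKEN"])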
The application requires a .env file to learn about the current environment. This is achieved by "logging in" to the environment; the make env / make env-fini pair manages this process.
# Logging in will create a `.env` file based on the default local
# development profile.
$ make env
# Logging out will erase the `.env` file
$ make env-fini
Once "logged in" to the environment, you can edit the .env file as you see fit (e.g., adding an OpenAI API key).
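For example, a minimally edited .env might look like the excerpt below (illustrative; the variables generated for your profile may differ):
# .env (excerpt, illustrative)
...
OPENAI_API_KEY=<your key>
CUDA_VISIBLE_DEVICES=0
...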
On a system with multiple NVIDIA GPUs, you can designate which GPUs EPIC should use.
This is controlled by the CUDA_VISIBLE_DEVICES variable in the generated .env file.
By default this is set to 0, which selects the first GPU device found on the system.
To use multiple GPUs, provide a comma-separated list of device IDs with no spaces in between.
In the example below, EPIC will run GPU workloads only on GPUs 3, 4, and 5.
# .env file
...
CUDA_VISIBLE_DEVICES=3,4,5
...
This limit is enforced by the get_settings() function within src.config. It loads the Settings() object from the .env file and exports key environment
variables for the process. For the unit and integration tests, load the settings either by calling get_settings() or by using the settings pytest fixture.
import os
import pytest

# Recommended way
@pytest.mark.integration
def test_using_gpus(settings):
    print(os.environ['CUDA_VISIBLE_DEVICES'])

# Alternate way
def test_using_gpus_direct_load():
    from src.config import get_settings
    get_settings()
    print(os.environ['CUDA_VISIBLE_DEVICES'])
You can initialize the environment with different system profiles, which are located as ./site/***-envs.sh in the ./site directory. The *** portion of the ***-envs.sh file name is the profile name.
$ SITE_PROFILE=<profile_name> make env
# Logging out does not require you to specify the profile
$ make env-fini
If you are running in localdev, you'll need to place some sample data in .data/private/ornl/frontier/jobsummary/.
If you aren't using real data, you can generate synthetic data using:
$ make generate-data
For local inference, EPIC uses a selected set of Hugging Face models. If you have checked out EPIC for instruction tuning, testing, evaluation, or web application development, you need this step.
The models are downloaded on the fly when they are instantiated in the code, which causes significant delays due to the large amount of data involved. To avoid this, prefetch the models with the command below.
This step relies on the make init phase and the Hugging Face credentials
configured in the previous steps.
# Download the models
$ make fetch-models
Fetching Hugging Face Models (pre-download)
- SQL model
.venv/bin/huggingface-cli download defog/llama-3-sqlcoder-8b
Fetching 15 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 12478.10it/s]
/ccsopen/home/wg8/.cache/huggingface/hub/models--defog--llama-3-sqlcoder-8b/snapshots/0f96d32e16737bda1bbe0d8fb13a932a8a3fa0bb
- RAG Embedding model
.venv/bin/huggingface-cli download mixedbread-ai/mxbai-embed-large-v1
Fetching 21 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 5473.21it/s]
/ccsopen/home/wg8/.cache/huggingface/hub/models--mixedbread-ai--mxbai-embed-large-v1/snapshots/e7857440379da569f68f19e8403b69cd7be26e50
- RAG Reranker model
.venv/bin/huggingface-cli download mixedbread-ai/mxbai-rerank-large-v1
Fetching 13 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 9599.64it/s]
/ccsopen/home/wg8/.cache/huggingface/hub/models--mixedbread-ai--mxbai-rerank-large-v1/snapshots/11ec161b6e8bed3aaf24797717198eb6191e1fb7
- General Purpose Instruct model
.venv/bin/huggingface-cli download meta-llama/Llama-3.2-3B-Instruct
Fetching 16 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 12690.78it/s]
/ccsopen/home/wg8/.cache/huggingface/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95
$
Among these, the Meta-Llama models may fail to download because they are "gated" models. Go to the corresponding model page on the Hugging Face Hub and request access; the request is typically approved within a few hours.
Note that the models are stored in ~/.cache/huggingface/hub. The steps above are
idempotent: if the models already exist in the cache, the downloads finish quickly.
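If you prefer to prefetch from Python rather than through the Makefile target, roughly the same effect can be achieved with huggingface_hub (a sketch mirroring the model list shown above):
# Sketch: prefetch the models listed above into ~/.cache/huggingface/hub.
# Gated models still require the Hugging Face login done earlier.
from huggingface_hub import snapshot_download

for repo_id in [
    "defog/llama-3-sqlcoder-8b",
    "mixedbread-ai/mxbai-embed-large-v1",
    "mixedbread-ai/mxbai-rerank-large-v1",
    "meta-llama/Llama-3.2-3B-Instruct",
]:
    print(snapshot_download(repo_id))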
Use the CLI tool epc to run the chat UI locally.
Make sure you have activated the Python virtual environment created
by make init.
# Type epc chat from the root of the repository
$ epc chat
2024-10-16 14:08:49 - Loaded .env file
INFO: Started server process [59277]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:3000 (Press CTRL+C to quit)
Optionally, the Makefile provides a directive that automatically activates
the virtual environment when starting the web chat UI.
$ make chat
2024-10-16 14:08:49 - Loaded .env file
INFO: Started server process [59277]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:3000 (Press CTRL+C to quit)
Running the command below creates a JupyterLab instance and installs a kernel spec
attached to the local virtual environment. Choose a notebook that uses the
epic kernelspec, which gives you everything in requirements.txt.
$ make jupyter
# Spawns a unix shell
$ make shell
# Spawns an ipython shell
$ make ipython
Starts the necessary background servers for EPIC (WIP - not implemented yet).
$ make up
... services come up ...
$ make down
... services go down ...
TBD: Add more
- ./src: Directory for the Python programs we will be using. Most of the code is implemented within this Python package.
- ./src/models: Model implementations
- TBD...: Many more package implementations
- ./site: Site dependencies
- ./data: Directory for version controlled data (DVC or LFS)
- ./lab: Temporary explorations
- ./pipelines: Metaflow workflows
This project uses a workflow engine to implement its batch pipelines. Currently we use Metaflow to manage DAG pipelines. Example use cases include RAG ingest pipelines, model fine-tuning pipelines, and evaluation pipelines.
Pipelines in the project are meant to be run out of band. Each pipeline is a stand-alone application that achieves a specific goal, mainly by producing artifacts. The main application (i.e., the web UI) then consumes those artifacts.
Workflows are located in the ./pipelines directory, where each subdirectory
represents a project boundary. Each project can implement multiple workflows
(DAGs) and has its own README.md describing the details of its workflows.
The workflows currently support running on a local development laptop, on a node with multiple GPUs, or on Kubernetes in AWS or GCP. On-premise Kubernetes is not supported yet (TBD).
The workflows load their environment configuration either from the .env file
or from the configurations defined in ~/.metaflowconfig/. These configurations
are meant to be managed by the site profile functionality.
Beyond that, we simply follow the conventions provided by Metaflow. We recommend working through the Metaflow tutorials before creating a pipeline.
A model workflow can be found in ./pipelines/example (FIXME: improve example),
which is a good place to start. Simply copy the example directory as a new
project and start implementing a new workflow.
Running the pipeline requires you to activate the repository virtual environment
# Assuming you are at the root of the repository
# Assuming you have already done `make init`
# Activate the environment
$ . ./.venv/bin/activate
# Or, jump into the shell using the provided makefile directive
$ make shell
Note that this virtual environment is intentionally minimal: it provides just enough to execute the pipelines from the repository.
The workflow should mainly rely on the @conda or @pypi Metaflow decorators to
fulfill its dependencies for reproducibility. In this setup, Metaflow can also
provide its own Python version.
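As an illustration only (not the shipped example flow), a minimal workflow using the @pypi decorator could look like the sketch below; the package and version pins are placeholders:
# Hypothetical minimal flow demonstrating the @pypi decorator.
from metaflow import FlowSpec, pypi, step


class ExampleFlow(FlowSpec):

    @pypi(packages={"pandas": "2.2.2"})  # placeholder pin
    @step
    def start(self):
        # Import inside the step so it resolves in the isolated environment.
        import pandas as pd
        self.n_rows = len(pd.DataFrame({"x": [1, 2, 3]}))
        self.next(self.end)

    @step
    def end(self):
        print("rows:", self.n_rows)


if __name__ == "__main__":
    ExampleFlow()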
The recommended way of running the pipeline is from the repository root using the full path to the python file that implements the workflow.
# Example of running an example Tutorial pipeline
$ python3 ./pipelines/example/flow.py --environment=pypi run [options]
FIXME: this restriction will be lifted as we gain more experience running Metaflow pipelines.
The src directory is where almost all common code shared across pipelines is
implemented. Code in src is referenced by simply doing import src from
anywhere in the workflow.
One primary use case of importing src from a workflow is to read the
.env file via the pydantic settings defined in src.config.
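The actual src.config is not reproduced here, but the pattern described above — pydantic settings backed by the .env file, with get_settings() exporting key variables — can be sketched as follows (field names beyond CUDA_VISIBLE_DEVICES are illustrative):
# Sketch of the src.config pattern (illustrative, not the real module).
import os
from functools import lru_cache

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    cuda_visible_devices: str = "0"


@lru_cache
def get_settings() -> Settings:
    settings = Settings()
    # Export key variables so downstream libraries see them in os.environ.
    os.environ["CUDA_VISIBLE_DEVICES"] = settings.cuda_visible_devices
    return settings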
Due to the way Metaflow packages directories, a relative symbolic link to the
./src directory at the repository root should exist alongside the workflow files.
An example can be found in ./pipelines/example/, where the symbolic link src -> ../../src
is committed into git like any other file.
We provide a pytest pipeline marker (see ./pipelines/example/flow_test.py for an
example) and a Makefile directive make pipeline-test that runs all tests, which
should be defined in ***_test.py files alongside the main workflow definition
files.
The main approach is the following (a sketch follows this list).
- Run a DAG check (i.e., python3 pipelines/<project>/flow.py check) to verify that the workflow can load all dependencies and has a sane structure.
- Run the pipeline in test mode, consuming far fewer resources, with potentially synthetic data that is self-contained to the test itself (i.e., python3 pipelines/<project>/flow.py run --test, assuming the workflow itself implements a test run mode).
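A ***_test.py combining both checks might be shaped like the sketch below; the pipeline marker name and the --test flag are assumptions based on the description above (see ./pipelines/example/flow_test.py for the real example):
# Hypothetical pipeline test sketch; see ./pipelines/example/flow_test.py
# for the actual example shipped with the repository.
import subprocess
import sys
from pathlib import Path

import pytest

FLOW = Path(__file__).parent / "flow.py"


@pytest.mark.pipeline
def test_dag_check():
    # Structural check: dependencies load and the DAG is sane.
    subprocess.run([sys.executable, str(FLOW), "check"], check=True)


@pytest.mark.pipeline
def test_run_in_test_mode():
    # Small self-contained run, assuming the flow implements a --test mode.
    subprocess.run(
        [sys.executable, str(FLOW), "--environment=pypi", "run", "--test"],
        check=True,
    )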
This section describes the node-local development setup. This development environment does not require additional infrastructure such as Kubernetes or a Spark cluster and relies only on a single-node configuration.
By default, the node-local development environment is initialized from defaults embedded in the codebase via make localdev (or ./site/local-creds.sh), provided no .env file exists in the root directory of the repository.
- CPU: more than 4 physical cores
- RAM: at least 8 GB of memory to serve models comfortably alongside existing programs
- GPU: local models run faster on better GPUs. NVIDIA GPUs and Apple silicon (M1, M2, M3) are supported. At least one GPU is required for normal operation; additional GPUs help both inference and fine-tuning workloads.
Optionally, an OpenAI API key can be used for larger models during various phases of model development. Simply export OPENAI_API_KEY in your shell or add it to the current .env file.
The localdev environment uses local storage space, defaulting to the locations below within the repository.
- .scratch: location for temporary data files, downloads, etc.
- .shared: location for persistent data potentially shared among multiple individuals
For localdev, object store access is done via local paths, similar to the local storage spaces.
If a path does not start with s3a://, the tools understand that the object store is actually
served by plain POSIX paths local to the machine.
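The convention described above boils down to a scheme check; a sketch of the idea (not the actual implementation) follows:
# Sketch of the path convention: anything not starting with s3a:// is
# treated as a plain POSIX path local to the machine (localdev mode).
from pathlib import Path


def resolve_object_path(uri: str):
    if uri.startswith("s3a://"):
        return uri  # handed to the object store client as-is
    return Path(uri).expanduser().resolve()  # local POSIX path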
These locations are meant to replicate the telemetry archival locations served by a specific deployment site. In a localdev environment, they are populated either manually or by data loading scripts that fetch them.
- .data/shared: location for shared, open objects
- .data/private: location for objects with sensitive data
The private vs. shared split is a minimal structure inherited from the authors to support data that needs special care (e.g., PII).