
Commit 8384409

v0.1.0 ready to push (HumanAIGC-Engineering#2)
The first release of OpenAvatarChat.

---------

Co-authored-by: 程刚 <lovepope@gmail.com>
Co-authored-by: 陈涛 <raidios.tony@gmail.com>
Co-authored-by: 王丰 <wfpkueecs@163.com>
Co-authored-by: 黄斌超 <523834173@qq.com>
Co-authored-by: 徐辉 <csxh47@163.com>
Co-authored-by: 何冠桥 <hegq1123@gmail.com>
1 parent 7eed349 commit 8384409

File tree

87 files changed, 3,985 additions and 0 deletions


.dockerignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
.git/*
models/*
ssl_certs/*
config/*

.gitattributes

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
*.wav filter=lfs diff=lfs merge=lfs -text

.gitignore

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
*.egg-info
*.pyc
*.so
.coverage
.eggs
.idea
.mypy_cache
.vscode
/build
/dist
/docs/_build

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
ssl_certs/localhost.crt
ssl_certs/localhost.key

**/sample_output.mp4
results

/sync*.sh
/build_and_run*.sh
models/MiniCPM-o-2_6*

resource/audio
resource/avatar/*
!resource/avatar/put_avatar_here.txt

dump_talk_audio.pcm

.gitmodules

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
[submodule "src/avatar/algo/tts2face_cpu"]
    path = src/avatar/algo/tts2face_cpu
    url = https://github.com/HumanAIGC/lite-avatar.git
[submodule "src/third_party/silero_vad"]
    path = src/third_party/silero_vad
    url = https://github.com/snakers4/silero-vad.git
[submodule "src/third_party/MiniCPM-o"]
    path = src/third_party/MiniCPM-o
    url = https://github.com/OpenBMB/MiniCPM-o.git
[submodule "src/third_party/gradio_webrtc_videochat"]
    path = src/third_party/gradio_webrtc_videochat
    url = https://github.com/HumanAIGC-Engineering/gradio-webrtc.git

Dockerfile

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
FROM ubuntu:22.04
LABEL authors="HumanAIGC-Engineering"

ENV DEBIAN_FRONTEND=noninteractive

# Switch the APT sources to the Tsinghua University mirror
RUN sed -i 's/archive.ubuntu.com/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list && \
    sed -i 's/security.ubuntu.com/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list

# Update the package lists and install the required dependencies
RUN apt-get update && \
    apt-get install -y software-properties-common && \
    apt-get install -y python3.10 python3.10-dev python3-pip

RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1

# Install the GPU build of PyTorch
RUN pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124

ARG WORK_DIR=/root/open-avatar-chat
WORKDIR $WORK_DIR
ADD ./requirements.txt $WORK_DIR/requirements.txt
ADD ./src $WORK_DIR/src
ADD ./resource $WORK_DIR/resource

RUN pip install -r $WORK_DIR/requirements.txt

ENTRYPOINT ["python3", "src/demo.py"]

README.md

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
<h1 style='text-align: center; margin-bottom: 1rem'> Open Avatar Chat </h1>

<div align="center">
<strong>中文|<a href="./readme_en.md">English</a></strong>
</div>
<h3 style='text-align: center'>
A modular interactive digital-human conversation implementation that can run the full pipeline on a single PC.
</h3>
<div style="display: flex; flex-direction: row; justify-content: center">
<a href="https://github.com/HumanAIGC-Engineering/OpenAvatarChat" target="_blank"><img alt="Static Badge" style="display: block; padding-right: 5px; height: 20px;" src="https://img.shields.io/badge/github-white?logo=github&logoColor=black"></a>
</div>

## System Requirements
* Python 3.10+
* A CUDA-capable GPU
* The unquantized multimodal language model requires more than 20 GB of VRAM.
* The int4-quantized version of the language model can run on GPUs with less than 10 GB of VRAM, though quality may suffer from quantization.
* The avatar component runs inference on the CPU; on the test machine (i9-13980HX) it reaches 30 FPS.

## Performance
We measured the response latency on a test PC equipped with an i9-13900KF and an Nvidia RTX 4090; the average over 10 runs was about 2.2 seconds. Latency is measured from the end of the user's speech to the start of the avatar's speech and includes the round-trip RTC transmission time, the VAD end-of-speech delay, and the compute time of the whole pipeline.

## Dependencies

|Type|Open-source project|GitHub|Model|
|---|---|---|---|
|RTC|HumanAIGC-Engineering/gradio-webrtc|[<img src="https://img.shields.io/badge/github-white?logo=github&logoColor=black"/>](https://github.com/HumanAIGC-Engineering/gradio-webrtc)||
|VAD|snakers4/silero-vad|[<img src="https://img.shields.io/badge/github-white?logo=github&logoColor=black"/>](https://github.com/snakers4/silero-vad)||
|LLM|OpenBMB/MiniCPM-o|[<img src="https://img.shields.io/badge/github-white?logo=github&logoColor=black"/>](https://github.com/OpenBMB/MiniCPM-o)| [🤗](https://huggingface.co/openbmb/MiniCPM-o-2_6)&nbsp;&nbsp;[<img src="./assets/images/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6) |
|LLM-int4|||[🤗](https://huggingface.co/openbmb/MiniCPM-o-2_6-int4)&nbsp;&nbsp;[<img src="./assets/images/modelscope_logo.png" width="20px"></img>](https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6-int4)|
|Avatar|HumanAIGC/lite-avatar|[<img src="https://img.shields.io/badge/github-white?logo=github&logoColor=black"/>](https://github.com/HumanAIGC/lite-avatar)||


## Installation
**Note 1: The submodules and the dependent models of this project rely on git LFS; make sure git-lfs is installed.**
```bash
sudo apt install git-lfs
git lfs install
```
**Note 2: Third-party libraries are referenced as git submodules and must be updated before running.**
```bash
git submodule update --init --recursive
```
#### Download models
Most of the models and resource files used by this project are included in the submodules. The multimodal language model, however, still needs to be downloaded by the user. The project currently uses MiniCPM-o-2.6 as the multimodal language model that gives the avatar its conversational ability; download it as needed from [Huggingface](https://huggingface.co/openbmb/MiniCPM-o-2_6) or [Modelscope](https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6). We recommend downloading the model directly into \<ProjectRoot\>/models/, which is where the default configuration points; if it is placed elsewhere, the configuration file must be adjusted accordingly. The scripts directory contains download scripts for the corresponding models, usable on Linux; run them from the project root:
```bash
scripts/download_MiniCPM-o_2.6.sh
```
```bash
scripts/download_MiniCPM-o_2.6-int4.sh
```
**Note: This project supports both the original MiniCPM-o-2.6 model and its int4-quantized version, but the quantized version requires a dedicated branch of AutoGPTQ; see the official [instructions](https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6-int4) for details.**
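If the download scripts are not convenient, the model can also be fetched manually with git and git LFS. The snippet below is an illustrative sketch only, not part of this commit: it clones the Hugging Face repository into the default models/ location, and it assumes the target directory name matches the S2S_MiniCPM.model_name setting.

```bash
# Sketch: manual model download (assumes git-lfs is installed and initialized, see Note 1 above).
# The folder name under models/ must match S2S_MiniCPM.model_name in the config.
cd <ProjectRoot>
git clone https://huggingface.co/openbmb/MiniCPM-o-2_6 models/MiniCPM-o-2_6
# Or, for the int4-quantized variant:
# git clone https://huggingface.co/openbmb/MiniCPM-o-2_6-int4 models/MiniCPM-o-2_6-int4
```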
#### Prepare SSL certificates
Since this project uses RTC to transport audio and video, HTTPS must be enabled whenever the service is accessed from anywhere other than localhost, which requires an SSL certificate. By default the configuration reads localhost.crt and localhost.key from the ssl_certs directory; adjust the configuration to use your own certificate if needed. A script that generates a self-signed certificate is also provided in the scripts directory; run it from the project root so that the generated certificate lands in the default location.
```bash
scripts/create_ssl_certs.sh
```
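The contents of create_ssl_certs.sh are not shown in this commit; as a rough sketch of what such a script typically does, a self-signed certificate for localhost can be generated with openssl along these lines:

```bash
# Sketch only: create a self-signed certificate valid for one year
# and place it where the default configuration expects it.
mkdir -p ssl_certs
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=localhost" \
    -keyout ssl_certs/localhost.key \
    -out ssl_certs/localhost.crt
```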
#### Run
The project can be started in a Linux container or directly on the host.
* Containerized: the container relies on NVIDIA's container runtime. With a GPU-enabled Docker environment in place, the following command builds the image and starts it:
  ```bash
  build_and_run.sh
  ```
* Direct:
  * Install the dependencies
    ```bash
    pip install -r requirements.txt
    ```
  * Start the program
    ```bash
    python src/demo.py
    ```

#### Configuration
On startup the program reads its configuration from **<project_root>/config/sample.yaml** by default; a different configuration file can be selected with the --config argument.
```bash
python src/demo.py --config <absolute path of the config file>.yaml
```
Available parameters:
|Parameter|Default|Description|
|---|---|---|
|log.log_level|INFO|Log level of the program.|
|service.host|0.0.0.0|Address the Gradio service listens on.|
|service.port|8282|Port the Gradio service listens on.|
|service.cert_file|ssl_certs/localhost.crt|Certificate file of the SSL certificate; if both the cert_file and cert_key files can be read, the service runs over HTTPS.|
|service.cert_key|ssl_certs/localhost.key|Private-key file of the SSL certificate; if both the cert_file and cert_key files can be read, the service runs over HTTPS.|
|chat_engine.model_root|models|Root directory for models.|
|chat_engine.handler_configs|N/A|Configurable options contributed by each handler.|
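One common workflow, shown here only as an illustration (the file name my_config.yaml is hypothetical), is to copy the shipped sample configuration, edit the copy, and pass its absolute path via --config:

```bash
# Illustration only: start from the sample config and override values in the copy.
cp config/sample.yaml config/my_config.yaml
# ... edit config/my_config.yaml (e.g. service.port or S2S_MiniCPM.model_name) ...
python src/demo.py --config "$(pwd)/config/my_config.yaml"
```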
The currently implemented handlers provide the following configurable parameters:
* VAD

|Parameter|Default|Description|
|---|---|---|
|SileroVad.speaking_threshold|0.5|Threshold above which the input audio is judged to be speech.|
|SileroVad.start_delay|2048|Once the model's output probability stays above the threshold for longer than this duration, the moment the threshold was first exceeded is treated as the start of speech. Measured in audio samples.|
|SileroVad.end_delay|2048|Once the model's output probability stays below the threshold for longer than this duration, the utterance is considered finished. Measured in audio samples.|
|SileroVad.buffer_look_back|1024|With a high threshold the beginning of an utterance is often clipped; this setting looks back a short interval before the detected start so that no speech is lost. Measured in audio samples.|
|SileroVad.speech_padding|512|Silence of this length is added to both the beginning and the end of the returned audio. Measured in audio samples.|

(For reference, with 16 kHz audio, 2048 samples correspond to 128 ms.)

* Language model

|Parameter|Default|Description|
|---|---|---|
|S2S_MiniCPM.model_name|MiniCPM-o-2_6|Selects the language model: either "MiniCPM-o-2_6" or "MiniCPM-o-2_6-int4". The directory name of the actual model under the model root must match this value.|
|S2S_MiniCPM.voice_prompt||Voice prompt for MiniCPM-o.|
|S2S_MiniCPM.assistant_prompt||Assistant prompt for MiniCPM-o.|

* Avatar

|Parameter|Default|Description|
|---|---|---|
|Tts2Face.avatar_name|sample_data|Name of the avatar data set; only "sample_data" is available for now, with more to come.|
|Tts2Face.fps|25|Frame rate of the avatar; on a fast enough CPU it can be set to 30 FPS.|
|Tts2Face.enable_fast_mode|True|Low-latency mode. Enabling it reduces response latency, but on under-powered machines the beginning of a response may stutter.|

**Note: Every path in the configuration can be given either as an absolute path or as a path relative to the project root.**

assets/images/modelscope_logo.png

6.04 KB

build_and_run.sh

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
#!/usr/bin/env bash

docker build -t open-avatar-chat:0.0.1 .
docker run --rm --gpus all -it --name open-avatar-chat \
    -v `pwd`/models:/root/open-avatar-chat/models \
    -v `pwd`/ssl_certs:/root/open-avatar-chat/ssl_certs \
    -v `pwd`/config:/root/open-avatar-chat/config \
    -p 8282:8282 \
    open-avatar-chat:0.0.1

config/sample.yaml

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
default:
  log:
    log_level: "INFO"
  service:
    host: "0.0.0.0"
    port: 8282
    cert_file: "ssl_certs/localhost.crt"
    cert_key: "ssl_certs/localhost.key"
  chat_engine:
    model_root: "models"
    handler_configs:
      SileroVad:
        speaking_threshold: 0.5
        start_delay: 2048
        end_delay: 5000
        buffer_look_back: 1024
        speech_padding: 512
      S2S_MiniCPM:
        model_name: "MiniCPM-o-2_6"
        # model_name: "MiniCPM-o-2_6-int4"
        voice_prompt: "你是一个AI助手。你能接受视频,音频和文本输入并输出语音和文本。模仿输入音频中的声音特征。"
        assistant_prompt: "作为助手,你将使用这种声音风格说话。"
      Tts2Face:
        avatar_name: sample_data
        fps: 25
        debug: false
        enable_fast_mode: True
    outputs:
      video:
        handler: "Tts2Face"
        type: "avatar_video"
      audio:
        handler: "Tts2Face"
        type: "avatar_audio"

docs/.gitkeep

Whitespace-only changes.
