Closed
Changes from 1 commit
21 commits
d1a9ef8
feat(di-trainer/di-jobmonitor/di-lcm/di-cli):
renzhe-li Apr 6, 2022
7ef0b56
feat(appconns/dss-mlflow-appconn): Add MLFlow Appconn
James23Wang Apr 6, 2022
5898faf
"feat(appconns/dss-mlss-appconn): Add MLSS Appconn
hexudong111 Apr 6, 2022
cc5aa2a
feat(mf): Add Model Factory Module
bleachzk Apr 6, 2022
4ebbe3d
feat(isntall): Update install file
alexzyWu Apr 6, 2022
3e9746c
feat(isntall): Update install file
alexzyWu Apr 6, 2022
bd8b620
Merge pull request #46 from alexzyWu/master
alexzyWu Apr 6, 2022
c0b7652
docs(docs): Update docs module
alexzyWu Apr 6, 2022
444d2ce
feat(mllabis): Add resource modification and release
alexzyWu Apr 6, 2022
d691742
1.add
uuarttt Apr 6, 2022
e07d105
Merge pull request #48 from hanwutian/master
alexzyWu Apr 6, 2022
33b87f6
Merge pull request #47 from alexzyWu/mllabis-resource-control-feature
alexzyWu Apr 6, 2022
268f286
Merge pull request #45 from bleachzk/master
alexzyWu Apr 6, 2022
3119e8c
Merge pull request #44 from hexudong111/master
alexzyWu Apr 6, 2022
d42fd7e
Merge pull request #42 from James23Wang/dev
alexzyWu Apr 6, 2022
d0f5db6
Merge pull request #43 from renzhe-li/dev
alexzyWu Apr 6, 2022
f8ef67e
Merge branch 'WeBankFinTech:dev-0.3.0' into dev-0.3.0
alexzyWu Apr 7, 2022
a218e8c
fix(mllabis): Remove parameter limit in notebook creation
alexzyWu Apr 7, 2022
191b003
Merge pull request #50 from alexzyWu/mllabis-parameter-limit
alexzyWu Apr 7, 2022
f61b3e8
fix(ui): update ui
alexzyWu Apr 7, 2022
9263079
fix(ui): Remove output in UI
alexzyWu Apr 7, 2022
docs(docs): Update docs module
1. Add appconn deployment and development documents.
2. Update User Manual: add MLFlow and Model Factory manuals.
3. Update QuickStartGuide.
alexzyWu committed Apr 6, 2022
commit c0b76527b53173aef0f3ddc82fd5389e4ec71246
90 changes: 62 additions & 28 deletions docs/zh_CN/Deployment_Documents/DeploymentGuide.md

Large diffs are not rendered by default.

90 changes: 90 additions & 0 deletions docs/zh_CN/Deployment_Documents/Prophecis Appconn安装文档.md

Large diffs are not rendered by default.

136 changes: 136 additions & 0 deletions docs/zh_CN/Deployment_Documents/配置指引.md
@@ -0,0 +1,136 @@
[toc]

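## 1. Configuration structure

* Prophecis Helm Chart
* Notebook Controller Helm Chart
* Seldon Core Helm Chart
## 2. Prophecis Helm Chart

### 2.1 Directory structure

* Configuration file location: **install/Prophecis**
* Contents:
  * values.yaml: environment-specific variables
  * templates: directory holding the k8s resource objects for each Prophecis service
### 2.2 values configuration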

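* namespace: name of the namespace
* envir: environment flag (some shared directories depend on this flag; it can be ignored)
* platformNodeSelectors:
  * mlss-node-role: node label on which the Prophecis services run
* ceph_path: Ceph storage location
* db: database configuration
* image: image registry configuration
* gateway: gateway address; by default, the address of any one node is sufficient
* minio: object storage configuration
* mongo: MongoDB configuration; both MLFlow jobs and Notebooks store data here
* elasticsearch: Elasticsearch configuration, used to store logs
* persistent: storage directories for localstack (the storage pod)
* trainer: configuration of the MLFlow trainer service
* storage: configuration of the MLFlow storage service
  * uploadContainerPath
  * share_storage_dir
* restapi: configuration of the MLFlow restapi service
* lcm: configuration of the MLFlow lcm service
  * fluent_bit_log_path: location where job logs are stored
  * mlssGroupId: GID of the job user when the public directory is used
* jm: configuration of the MLFlow jm service
* trainingdata: configuration of the MLFlow trainingdata service
* cc: configuration of the management console
  * admin: administrator user configuration
    * user: hadoop
    * password: hadoop
  * ldap: LDAP configuration for login authentication
    * address: ldap://10.107.96.180:1389/
    * baseDN: dc=webank,dc=com
* ccGateway: gateway configuration
* aide: configuration of the MLLabis service
  * startPort: first NodePort in the range used by the Notebook SparkClient
  * endPort: last NodePort in the range used by the Notebook SparkClient
  * maxSparkSessionNum: maximum number of SparkClient sessions a Notebook can use
  * hadoop
    * enable: whether to enable Hadoop support; when enabled, the paths configured below are mounted
    * installPath: installation path of hadoop and spark
    * commonlibPath: installation path of commonlib
    * configPath: path of the hadoop and spark configuration files
    * javaPath: path of the Java installation
    * sourceConfigPath: path of the configuration file the notebook needs to load; it is loaded when JupyterLab starts
    * hostFilePath: path of the hosts file the notebook needs to load; it is loaded when JupyterLab starts
* ui: front-end configuration
  * service:
    * bdap:
      * nodePort: port used to access the web page
  * grafana:
    * url: Grafana URL for the base console
  * dashboard:
    * url: k8s dashboard URL for the base console
  * prometheus:
    * url: k8s Prometheus URL for the base console
  * kibana:
    * url: k8s Kibana URL for the base console
* linkis: configuration of the Linkis service
* linkispro: configuration of the Linkis production center
* mlflow: address of Databricks MLflow
* tfjob: image address used by distributed TF jobs
* mf: configuration of the Model Factory service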
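
The keys listed above can be overridden in a separate file instead of editing the chart's `values.yaml` directly. The following is a minimal sketch under stated assumptions: the file name `my-values.yaml`, the label value `platform`, and the nesting of `platformNodeSelectors` and `ui.service.bdap.nodePort` are illustrative, not confirmed Prophecis defaults.

```shell
#!/bin/sh
# Sketch: write a small override file for the Prophecis chart.
# Assumptions (not taken from the Prophecis docs): the file name,
# the label value "platform", and the exact nesting of the keys.
OVERRIDES="${OVERRIDES:-my-values.yaml}"

cat > "$OVERRIDES" <<'EOF'
namespace: prophecis
platformNodeSelectors:
  mlss-node-role: platform
ui:
  service:
    bdap:
      nodePort: 30803
EOF

# The install itself needs a running cluster, so it is left commented out:
#   helm install prophecis install/Prophecis -f "$OVERRIDES"
cat "$OVERRIDES"
```

Keeping overrides in their own file leaves the chart untouched, which can make later chart upgrades easier to review.
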
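### 2.3 templates configuration

* cc: management console
  * gateway: gateway service and container deployment
* di:
  * storage: storage service
  * restapi: handles requests from external microservices
  * lcm: job queue management
  * jm: job monitoring
  * trainer: manages model training jobs
  * trainingdata: retrieves the logs produced during model training
  * fluent-bit: training log collection
* mlflow: Databricks MLflow service configuration
* infrastructure: storage services
  * mongo:
  * etcd:
* LogCollectorDS: logging component
* mllabis: JupyterLab
  * aide:
* persistent: persistence
  * etcd:
  * mongo:
* ui: web interface
  * bdap-ui:
* mf: Model Factory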
## 3. NotebookController Helm Chart

### 3.1 Directory structure

* Configuration file location: **install/notebook-controller**
* Contents:
  * values.yaml
  * templates directory
### 3.2 values configuration

* namespace: name of the namespace
* platformNodeSelectors:
  * mlss-node-role: node label on which the Prophecis services run
* services:
  * expose_node_port: false
* mllabis: JupyterLab
  * image:
    * repository: image repository location
    * tag: image tag
  * service:
    * type: service type
    * port: service port inside the cluster
    * targetPort: container port
    * nodePort: port for external access
* controller:
  * notebook:
    * repository: image repository
    * tag: image tag
  * meta:
    * repository:
    * tag
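
A `nodePort` is exposed on every node, so it has to fall inside the cluster's NodePort range. Below is a minimal sketch of a pre-install check, assuming the Kubernetes default range of 30000 to 32767 (a cluster can change it with the API server's `--service-node-port-range` flag). The function name `is_valid_node_port` and the sample ports are hypothetical.

```shell
#!/bin/sh
# Sketch: check that a nodePort lies inside the NodePort range.
# The defaults below are the Kubernetes default range; adjust them if the
# cluster was started with a different --service-node-port-range.
NODE_PORT_MIN="${NODE_PORT_MIN:-30000}"
NODE_PORT_MAX="${NODE_PORT_MAX:-32767}"

# is_valid_node_port PORT -> exit status 0 if PORT is an integer in range
is_valid_node_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric input
  esac
  [ "$1" -ge "$NODE_PORT_MIN" ] && [ "$1" -le "$NODE_PORT_MAX" ]
}

for port in 30803 8080; do
  if is_valid_node_port "$port"; then
    echo "$port: ok"
  else
    echo "$port: outside $NODE_PORT_MIN-$NODE_PORT_MAX"
  fi
done
```
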
### 3.3 templates configuration

* metacontroller: CRD resource configuration
* notebook-controller: notebook CRD configuration
122 changes: 59 additions & 63 deletions docs/zh_CN/Development_Documents/DevelopmentGuide.md

Large diffs are not rendered by default.

81 changes: 55 additions & 26 deletions docs/zh_CN/QuickStartGuide.md
@@ -1,50 +1,63 @@
[TOC]



## 1. Installation overview

Prophecis uses `helm` for `kubernetes` package management. The main installation files are in the install directory, which contains three components: `notebook-controller`, `MinioDeployment`, and `Prophecis`, with `Prophecis` being the main one. Before use, initialize the MySQL database and mount an NFS directory to store data.

## 2. Environment preparation

### 2.1 Machine requirements

* Deployment needs at least two machines (a single master plus a service node); for a single-node deployment, remove the master label.
### 2.2 Prerequisite software

|**Software**|**Version**|**Location**|
|:----|:----|:----|
|Helm|3.2.1|[https://github.com/helm/helm/releases](https://github.com/helm/helm/releases)|
|Kubernetes|1.18.6|[https://github.com/kubernetes/kubernetes](https://github.com/kubernetes/kubernetes)|
|Docker|19.03.9|[https://www.docker.com/](https://www.docker.com/)|
|Istio|1.8.2|[https://github.com/istio/istio](https://github.com/istio/istio)|
|Seldon Core|1.13.0|[https://github.com/SeldonIO/seldon-core](https://github.com/SeldonIO/seldon-core)|
|nfs-utils|1.3.0| |

* Verify Helm
```shell
$ helm version
version.BuildInfo{Version:"v3.2.1", GitCommit:"fe51cd1e31e6a202cba7dead9552a6d418ded79a", GitTreeState:"clean", GoVersion:"go1.13.10"}
```
* Verify Kubernetes
```shell
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:51:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
```
* Verify Docker
```shell
$ docker version
...
Client: Docker Engine - Community
 Version:           19.03.9
...
Server: Docker Engine - Community
 Engine:
  Version:          19.03.9
```

* Seldon Core:
```shell
# Deploy
kubectl create namespace seldon-system
# Note: as written, `seldon-core-operator .` passes two chart arguments to helm;
# keep only the one that matches your setup (a chart name or a local chart path).
helm install seldon-core seldon-core-operator . \
  --set usageMetrics.enabled=true \
  --namespace seldon-system \
  --set istio.enabled=true
# Verify
kubectl -n seldon-system get pods
```
* Istio:
```shell
# Deploy
istioctl install
# Verify: check that the related Pods are Running
kubectl -n istio-system get pods
```
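
Before continuing, it can help to confirm that every prerequisite command-line tool is on the `PATH`. The following is a minimal sketch; the function name `missing_tools` is hypothetical, and it only checks that the tools are present, not that they match the versions in the table.

```shell
#!/bin/sh
# Sketch: report which prerequisite command-line tools are not installed.

# missing_tools TOOL... -> prints each name that is not found on PATH
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing="$(missing_tools helm kubectl docker istioctl)"
if [ -z "$missing" ]; then
  echo "all prerequisite tools found"
else
  # Unquoted on purpose so the names are joined onto one line.
  echo "missing tools:" $missing
fi
```
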
### 2.3 Directory mounting (can be skipped for now if shared directories are not needed)

Prophecis uses NFS to store container runtime data, so the NFS directories must be mounted.

@@ -72,9 +85,8 @@ mount ${NFS_SERVER_IP}:${NFS_PATH_LOG} ${NFS_PATH_LOG}
```
## 3. Modifying the configuration file

* **Edit** the settings in `./install/prophecis/values.yaml`.
```yaml
## Database access settings
# MySQL IP address: DATABASE_IP='127.0.0.1'
# MySQL port: DATABASE_PORT='3306'
@@ -87,7 +99,7 @@ database:
  name: ${DATABASE_DB}
  user: ${DATABASE_USERNAME}
  pwd: ${DATABASE_PASSWORD}

## URL settings for the UI
# Web access address: SERVER_IP='127.0.0.1'
# Web access port: SERVER_PORT='30803'
@@ -113,7 +125,7 @@ cc:

Run the `SQL` files `prophecis.sql` and `prophecis-data.sql` under `./cc/sql` in the database; these scripts create the table structure and the initial data.

```sql
source prophecis.sql
source prophecis-data.sql
```
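
The two scripts can also be loaded non-interactively with the `mysql` command-line client. This is a minimal sketch, assuming the `DATABASE_*` variables named in `values.yaml`; the default values below are placeholders, and `DRY_RUN` and `import_sql` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: load the schema and the initial data with the mysql client.
# The defaults are placeholders; DRY_RUN is a hypothetical switch that
# prints each command instead of running it.
DATABASE_IP="${DATABASE_IP:-127.0.0.1}"
DATABASE_PORT="${DATABASE_PORT:-3306}"
DATABASE_DB="${DATABASE_DB:-prophecis}"
DATABASE_USERNAME="${DATABASE_USERNAME:-root}"
DATABASE_PASSWORD="${DATABASE_PASSWORD:-}"
DRY_RUN="${DRY_RUN:-1}"

# import_sql FILE -> run FILE against the configured database
import_sql() {
  if [ "$DRY_RUN" = "1" ]; then
    # Print the command with the password masked.
    echo "mysql -h $DATABASE_IP -P $DATABASE_PORT -u $DATABASE_USERNAME --password=*** $DATABASE_DB < $1"
  else
    mysql -h "$DATABASE_IP" -P "$DATABASE_PORT" -u "$DATABASE_USERNAME" \
      --password="$DATABASE_PASSWORD" "$DATABASE_DB" < "$1"
  fi
}

# Create the tables first, then load the initial data.
import_sql ./cc/sql/prophecis.sql
import_sql ./cc/sql/prophecis-data.sql
```

Set `DRY_RUN=0` to execute the commands once the printed output looks right.
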
@@ -123,7 +135,7 @@ Prophecis deployment requires three components: `notebook-controller`,`MinioDeployment`,`Prophe

**Run the deployment from** the `./install` directory.

```shell
## Prophecis uses the kubernetes namespace prophecis by default; it must be created
kubectl create namespace prophecis
## Prophecis uses kubernetes node labels to select the nodes it runs on and to identify their purpose
@@ -138,7 +150,26 @@ helm install minio-prophecis --namespace prophecis ./MinioDeployment
## Install the prophecis component
helm install prophecis ./prophecis
```
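### 4.3 MLFlow experiment workflow (mlflow appconn installation)

* Service configuration: configure the Linkis addresses in values.yaml
* MLFlow Appconn: deploy the appconn libs under the DSS Appconn directory
* Database update
```sql
# sql directory
source mlflow-sql.sql
```
* For building and usage, see the Prophecis Appconn installation document in Deployment_Documents
### 4.4 DataSphereStudio (mlss appconn installation)

* Appconn installation: deploy the appconn libs under the DSS Appconn directory
* Database update
```sql
# sql directory
source mlss-sql.sql
```
* For building and usage, see the Prophecis Appconn installation document in Deployment_Documents
## 5. Environment verification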

### 5.1 Service verification

@@ -147,7 +178,5 @@ helm install prophecis ./prophecis

* Once all Pods are Running, visit `http://${CLUSTER_IP}:30803`; the default account is `admin` and the password is `admin`.
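
The check above can be scripted. Below is a minimal sketch that filters `kubectl get pods` output for pods that are not yet Running; the function name `not_running` and the pod names in the demonstration are made up, and the commented commands assume the `prophecis` namespace and a `CLUSTER_IP` variable set by the reader.

```shell
#!/bin/sh
# Sketch: list pods that are not yet Running, given `kubectl get pods` output.

# not_running -> reads `kubectl get pods` output on stdin and prints the NAME
# of every pod whose STATUS column is neither Running nor Completed
not_running() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Against a real cluster (needs kubectl and cluster access):
#   kubectl -n prophecis get pods | not_running
#   curl -s -o /dev/null -w '%{http_code}\n' "http://${CLUSTER_IP}:30803"

# Self-contained demonstration with made-up pod names:
not_running <<'EOF'
NAME            READY   STATUS             RESTARTS   AGE
example-ui-0    1/1     Running            0          5m
example-db-0    0/1     CrashLoopBackOff   3          5m
EOF
```

An empty result means every listed pod is Running or Completed, at which point the web page is worth trying.
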






294 changes: 294 additions & 0 deletions docs/zh_CN/User_Manual/MLFlow使用手册(MLFlowGuide).md

Large diffs are not rendered by default.

370 changes: 370 additions & 0 deletions docs/zh_CN/User_Manual/MLLabis用户手册(MLLabisGuide).md

Large diffs are not rendered by default.

155 changes: 155 additions & 0 deletions docs/zh_CN/User_Manual/模型工厂用户手册(ModelFactoryGuide).md

Large diffs are not rendered by default.

127 changes: 127 additions & 0 deletions docs/zh_CN/User_Manual/管理台用户手册(CCControllerGuide).md

Large diffs are not rendered by default.