Merged
20 changes: 20 additions & 0 deletions README.md
@@ -69,6 +69,26 @@ python ./demo/run_demo.py
```python
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True).half().cuda()
```
* If you need to load the model across multiple GPUs, replace the following code:
```python
tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True, device='cuda')
model = model.eval()
```
with:

```python
def get_model():
    tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
    from gpus import load_model_on_gpus
    # The "gpus" file is located in the demo folder
    model = load_model_on_gpus("THUDM/codegeex2-6b", num_gpus=2)
    model = model.eval()
    return tokenizer, model

tokenizer, model = get_model()
```


## Code Capability Evaluation

19 changes: 19 additions & 0 deletions README_EN.md
@@ -72,6 +72,25 @@ python ./demo/run_demo.py
```python
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True).half().cuda()
```
* If you need to load the model across multiple GPUs, replace the following code:
```python
tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True, device='cuda')
model = model.eval()
```
with:

```python
def get_model():
    tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
    from gpus import load_model_on_gpus
    # The "gpus" file is located in the demo folder
    model = load_model_on_gpus("THUDM/codegeex2-6b", num_gpus=2)
    model = model.eval()
    return tokenizer, model

tokenizer, model = get_model()
```
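Once loaded this way, inference works the same as in the single-GPU setup, since `dispatch_model` routes activations between GPUs automatically. A minimal usage sketch follows; the prompt and the generation parameters are illustrative assumptions, not part of this PR:

```python
# Illustrative completion with the multi-GPU model; prompt and parameters are assumptions.
prompt = "# language: Python\n# write a bubble sort function\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # lands on GPU 0, where the embeddings sit
outputs = model.generate(**inputs, max_length=256, top_k=1)
print(tokenizer.decode(outputs[0]))
```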

## Evaluation

19 changes: 19 additions & 0 deletions README_FR.md
@@ -68,6 +68,25 @@ python ./demo/run_demo.py
❗️Note:
* This version of CodeGeeX2 can complete / explain / translate code, but has not been fine-tuned for use as a chatbot. To access the chatbot version of CodeGeeX, use the [VS Code](https://marketplace.visualstudio.com/items?itemName=aminer.codegeex) and [Jetbrains](https://plugins.jetbrains.com/plugin/20587-codegeex) extensions.
* To control the language CodeGeeX2 operates in, use tags formatted like `# language: Python`. The full list of programming languages supported by CodeGeeX is available [here](https://github.com/THUDM/CodeGeeX2/blob/main/evaluation/utils.py#L14).
* If you need to load the model across multiple GPUs, replace the following code:
```python
tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True, device='cuda')
model = model.eval()
```
with:

```python
def get_model():
    tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
    from gpus import load_model_on_gpus
    # The "gpus" file is located in the demo folder
    model = load_model_on_gpus("THUDM/codegeex2-6b", num_gpus=2)
    model = model.eval()
    return tokenizer, model

tokenizer, model = get_model()
```

## Evaluation

19 changes: 19 additions & 0 deletions README_JA.md
@@ -68,6 +68,25 @@ python ./demo/run_demo.py
❗️Note:
* CodeGeeX2 is a base model and is not instruction-tuned for chat. It can handle tasks such as code completion / translation / explanation. You can try an instruction-tuned version through the CodeGeeX plugins ([VS Code](https://marketplace.visualstudio.com/items?itemName=aminer.codegeex), [Jetbrains](https://plugins.jetbrains.com/plugin/20587-codegeex)).
* The programming language can be controlled by adding a `language tag` such as `# language: Python`. Follow this format to ensure good performance; the full list is available [here](https://github.com/THUDM/CodeGeeX2/blob/main/evaluation/utils.py#L14). For better results, write comments in the style of your chosen programming language.
* If you need to load the model across multiple GPUs, replace the following code:
```python
tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True, device='cuda')
model = model.eval()
```
with:

```python
def get_model():
    tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
    from gpus import load_model_on_gpus
    # The "gpus" file is located in the demo folder
    model = load_model_on_gpus("THUDM/codegeex2-6b", num_gpus=2)
    model = model.eval()
    return tokenizer, model

tokenizer, model = get_model()
```
## Evaluation

CodeGeeX2 is a base model for multilingual code generation whose coding capability is substantially improved over the previous generation. Evaluation results on the HumanEval, HumanEval-X, and DS1000 benchmarks are shown below (the evaluation metric Pass@k is the same as in the [paper](https://arxiv.org/abs/2303.17568)):
59 changes: 59 additions & 0 deletions demo/gpus.py
@@ -0,0 +1,59 @@
import os
from typing import Dict, Union, Optional

from torch.nn import Module
from transformers import AutoModel


def auto_configure_device_map(num_gpus: int) -> Dict[str, int]:
    # transformer.word_embeddings occupies one layer
    # transformer.final_layernorm and lm_head occupy one layer
    # transformer.layers occupies 28 layers
    # 30 layers in total, distributed across num_gpus GPUs
    num_trans_layers = 28
    per_gpu_layers = 30 / num_gpus

    # bugfix: on Linux, the weight and input passed to torch.embedding could end up on
    # different devices, causing a RuntimeError
    # on Windows, model.device is set to transformer.word_embeddings.device
    # on Linux, model.device is set to lm_head.device
    # when chat or stream_chat is called, input_ids is moved to model.device
    # if transformer.word_embeddings.device differs from model.device, a RuntimeError follows
    # so transformer.word_embeddings, transformer.final_layernorm and lm_head are all
    # placed on the first GPU here
    # this file comes from https://github.com/THUDM/ChatGLM-6B/blob/main/utils.py
    # with only minor changes to support ChatGLM2 and CodeGeeX2
    device_map = {
        'transformer.embedding.word_embeddings': 0,
        'transformer.encoder.final_layernorm': 0,
        'transformer.output_layer': 0,
        'transformer.rotary_pos_emb': 0,
        'lm_head': 0
    }

    used = 2
    gpu_target = 0
    for i in range(num_trans_layers):
        if used >= per_gpu_layers:
            gpu_target += 1
            used = 0
        assert gpu_target < num_gpus
        device_map[f'transformer.encoder.layers.{i}'] = gpu_target
        used += 1

    return device_map


def load_model_on_gpus(checkpoint_path: Union[str, os.PathLike], num_gpus: int = 2,
                       device_map: Optional[Dict[str, int]] = None, **kwargs) -> Module:
    if num_gpus < 2 and device_map is None:
        model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True, **kwargs).half().cuda()
    else:
        from accelerate import dispatch_model

        model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True, **kwargs).half()

        if device_map is None:
            device_map = auto_configure_device_map(num_gpus)

        model = dispatch_model(model, device_map=device_map)

    return model
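
For reference, a quick sanity check of the split that `auto_configure_device_map` produces (a minimal sketch, assuming `gpus.py` is importable, e.g. when run from the `demo` folder): with `num_gpus=2`, the five modules pinned to GPU 0 count as two of the 30 notional layers, so transformer blocks 0-12 stay on GPU 0 and blocks 13-27 go to GPU 1.

```python
# Sketch: inspect the generated device map without loading any weights.
from collections import Counter

from gpus import auto_configure_device_map

device_map = auto_configure_device_map(num_gpus=2)
print(Counter(device_map.values()))  # 18 entries on GPU 0 (5 pinned modules + 13 blocks), 15 on GPU 1
```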
15 changes: 11 additions & 4 deletions demo/run_demo.py
@@ -7,12 +7,19 @@

 from transformers import AutoTokenizer, AutoModel
 
-tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
-model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True).to('cuda:0')
-model = model.eval()
+def get_model():
+    tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
+    model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True).to('cuda:0')
+    # To load the model across multiple GPUs, comment out the line above, uncomment the two lines below, and set "num_gpus" to the number of GPUs you need.
+    # from gpus import load_model_on_gpus
+    # model = load_model_on_gpus("THUDM/codegeex2-6b", num_gpus=2)
+    model = model.eval()
+    return tokenizer, model
+
+tokenizer, model = get_model()
 
 examples = []
-with open(os.path.join(os.path.split(os.path.realpath(__file__))[0], "example_inputs.jsonl"), "r") as f:
+with open(os.path.join(os.path.split(os.path.realpath(__file__))[0], "example_inputs.jsonl"), "r", encoding="utf-8") as f:
     for line in f:
         examples.append(list(json.loads(line).values()))