Text Moderation System

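A multilingual text moderation system built on DistilBERT. It moderates content in Chinese, English, and Japanese and is suitable for low-performance devices.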

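Features

✅ Multilingual support: Chinese, English, Japanese
✅ Multi-label classification: normal, sexual, violence, political, sensitive, abuse, spam
✅ Two-layer detection: a sensitive-word lexicon (rules) combined with a deep learning model (semantics)
✅ Flexible policies: strict, moderate, loose, and other moderation policies
✅ Lightweight model: based on DistilBERT, suitable for low-performance devices
✅ REST API: FastAPI service, easy to integrate
✅ Extensible: supports custom policies and sensitive-word lexicons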
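
The two-layer detection listed above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the project's implementation: the lexicon contents, the `lexicon_hits` and `combine_scores` helpers, and the rule that a lexicon hit raises a label's score to 1.0 are all hypothetical.

```python
from typing import Dict, List

# Hypothetical lexicon: label -> sensitive words (illustrative only).
LEXICON: Dict[str, List[str]] = {
    "spam": ["buy now", "free coupon"],
    "abuse": ["idiot"],
}


def lexicon_hits(text: str, lexicon: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Rule layer: return the lexicon words found in the text, grouped by label."""
    lowered = text.lower()
    hits: Dict[str, List[str]] = {}
    for label, words in lexicon.items():
        found = [word for word in words if word.lower() in lowered]
        if found:
            hits[label] = found
    return hits


def combine_scores(model_probs: Dict[str, float],
                   hits: Dict[str, List[str]]) -> Dict[str, float]:
    """Merge both layers: a lexicon hit lifts that label's score to 1.0 (assumed rule)."""
    combined = dict(model_probs)
    for label in hits:
        combined[label] = 1.0
    return combined


# Stand-in for the semantic layer; a real model would produce these probabilities.
model_probs = {"normal": 0.70, "spam": 0.35, "abuse": 0.05}
hits = lexicon_hits("Buy now and get a FREE coupon!", LEXICON)
scores = combine_scores(model_probs, hits)
print(hits)
print(scores)
```

Under this assumed rule, the lexicon can only raise a score, so a text the model considers borderline is still flagged when it contains a listed word.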

Project structure

text-moderation/
├── configs/                # Configuration files
│   ├── config.yaml        # Main configuration
│   └── policies.json      # Moderation policy configuration
├── data/                  # Data directory
│   ├── raw/              # Raw data
│   ├── processed/        # Processed data
│   └── lexicon/          # Sensitive-word lexicons
├── models/                # Model definitions
│   ├── model.py          # DistilBERT model
│   └── policy.py         # Policy layer
├── training/              # Training scripts
│   └── train.py          # Training entry point
├── inference/             # Inference module
│   ├── predictor.py      # Predictor
│   └── api.py            # FastAPI service
├── utils/                 # Utility functions
│   ├── config_loader.py  # Configuration loading
│   ├── data_downloader.py # Data download
│   ├── data_processor.py # Data processing
│   ├── lexicon_matcher.py # Sensitive-word matching
│   └── metrics.py        # Evaluation metrics
├── scripts/               # Utility scripts
│   ├── download_data.py  # Download data
│   ├── quick_test.py     # Quick test
│   └── demo.py           # Interactive demo
└── requirements.txt       # Dependencies

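Quick start

1. Install dependencies

cd text-moderation
pip install -r requirements.txt

2. Download the data and the sensitive-word lexicon

python scripts/download_data.py

This will:

Download the Jigsaw Toxic Comment dataset (English)
Download the Sensitive-lexicon sensitive-word lexicon (Chinese)
Generate a sample dataset for quick testing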

3. Train the model

Train on the sample data:

python training/train.py \
    --data data/raw/sample_data.csv \
    --use-lexicon \
    --lexicon-dir data/lexicon/Sensitive-lexicon \
    --output checkpoints \
    --epochs 5

Train on real data:

python training/train.py \
    --data data/raw/jigsaw_toxic/train.csv \
    --use-lexicon \
    --lexicon-dir data/lexicon/Sensitive-lexicon \
    --output checkpoints \
    --batch-size 32 \
    --epochs 10

4. Test the model

Quick test:

python scripts/quick_test.py

Interactive demo:

python scripts/demo.py

5. Start the API service

cd inference
python api.py --host 0.0.0.0 --port 8000

API documentation: http://localhost:8000/docs

Usage examples

Calling from Python

from inference.predictor import ModerationPredictor

# Initialize the predictor
predictor = ModerationPredictor(
    model_path="checkpoints/best_model.pt",
    lexicon_dir="data/lexicon/Sensitive-lexicon",
    policy_name="moderate"
)

# Single-text prediction
result = predictor.predict("This is a text that needs moderation", return_details=True)

print(f"Safe: {result['is_safe']}")
print(f"Action: {result['action']}")
print(f"Severity: {result['severity']:.2f}")
print(f"Triggered labels: {result['triggered_labels']}")
print(f"Recommendation: {result['recommendation']}")

# Batch prediction
texts = ["text 1", "text 2", "text 3"]
results = predictor.predict_batch(texts, batch_size=32)

# Switch policy
predictor.set_policy("strict")  # strict mode

Calling the REST API

Single-text moderation

curl -X POST "http://localhost:8000/moderate" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "This is a text that needs moderation",
    "policy": "moderate"
  }'

Response:

{
  "text": "This is a text that needs moderation",
  "is_safe": true,
  "action": "allow",
  "severity": 0.0,
  "policy": "moderate",
  "probabilities": {
    "normal": 0.95,
    "sexual": 0.01,
    "violence": 0.01,
    ...
  },
  "triggered_labels": [],
  "recommendation": "Content is safe for publication."
}
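
The same call can be made from Python with only the standard library. This is a minimal sketch: `build_moderate_request` and `summarize` are hypothetical helper names, and the field names are taken from the sample response above. The request is built but not sent, so the block runs without a live service.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/moderate"  # address used in the quick-start section


def build_moderate_request(text: str, policy: str = "moderate") -> urllib.request.Request:
    """Build the POST request for single-text moderation without sending it."""
    body = json.dumps({"text": text, "policy": policy}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response: dict) -> str:
    """Condense a moderation response into a one-line summary."""
    labels = ", ".join(response.get("triggered_labels", [])) or "none"
    return f"{response['action']} (safe={response['is_safe']}, labels={labels})"


request = build_moderate_request("text to review")
# To send it against a running service:
#   with urllib.request.urlopen(request) as resp:
#       print(summarize(json.load(resp)))

# Offline check using fields from the sample response.
sample = {"is_safe": True, "action": "allow", "triggered_labels": []}
print(summarize(sample))
```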

Batch moderation

curl -X POST "http://localhost:8000/moderate/batch" \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["text 1", "text 2", "text 3"],
    "policy": "moderate",
    "return_details": true
  }'

Quick moderation (simplified response)

curl "http://localhost:8000/moderate/quick?text=测试文本&policy=moderate"
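
Because the quick endpoint takes the text in the query string, non-ASCII input such as Chinese or Japanese generally needs percent-encoding before it goes into a URL. The sketch below builds such a URL with the standard library; `quick_url` is a hypothetical helper, and only the `text` and `policy` parameters shown above are assumed.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://localhost:8000/moderate/quick"


def quick_url(text: str, policy: str = "moderate") -> str:
    """Return the quick-moderation URL with percent-encoded query parameters."""
    query = urlencode({"text": text, "policy": policy})
    return f"{BASE_URL}?{query}"


url = quick_url("测试文本")
print(url)

# Decoding the query string recovers the original parameters.
print(parse_qs(urlsplit(url).query))
```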

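Moderation policies

The system ships with four moderation policies, which can be configured in configs/policies.json:

1. Strict mode (strict)

Use cases: platforms for minors, educational settings
Thresholds: low (0.2-0.4)
Action: block outright (block)
Human review: no

2. Moderate mode (moderate)

Use cases: general communities, forums
Thresholds: medium (0.4-0.6)
Action: warn (warn)
Human review: yes

3. Loose mode (loose)

Use cases: adult communities, open discussion areas
Thresholds: high (0.6-0.75)
Action: log (log)
Human review: yes

4. Custom mode (custom)

Thresholds and behavior can be customized through the API or the configuration file.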

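Labels

Label	Description	Examples
normal	Normal content	Everyday conversation, news
sexual	Sexual content	Pornographic images or text
violence	Violent content	Gore, descriptions of violence
political	Political content	Politically sensitive topics
sensitive	Sensitive information	Personal privacy, sensitive data
abuse	Abuse and hate	Insults, hate speech
spam	Spam and advertising	Promotions, advertising links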

Configuration

Main configuration file (config.yaml)

model:
  name: "distilbert-base-multilingual-cased"
  max_length: 128
  num_labels: 7
  dropout: 0.1

training:
  batch_size: 32
  learning_rate: 2.0e-5
  num_epochs: 5
  warmup_ratio: 0.1

device: "cuda"  # or "cpu", "mps"

Policy configuration file (policies.json)

{
  "moderate": {
    "description": "Moderate review mode",
    "thresholds": {
      "sexual": 0.5,
      "violence": 0.5,
      "political": 0.4,
      "sensitive": 0.5,
      "abuse": 0.5,
      "spam": 0.6
    },
    "action": "warn",
    "require_human_review": true
  }
}
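
To make the role of the thresholds concrete, here is a minimal sketch of how a policy entry like the one above could be applied to model probabilities. It is not the project's policy layer: the `apply_policy` helper, the `>=` comparison, and the fallback action `"allow"` for content with no triggered labels are assumptions.

```python
import json
from typing import Dict

# Same thresholds as the "moderate" entry above, embedded so the block is self-contained.
POLICIES = json.loads("""
{
  "moderate": {
    "thresholds": {
      "sexual": 0.5, "violence": 0.5, "political": 0.4,
      "sensitive": 0.5, "abuse": 0.5, "spam": 0.6
    },
    "action": "warn",
    "require_human_review": true
  }
}
""")


def apply_policy(probabilities: Dict[str, float], policy: dict) -> dict:
    """Compare each label's probability with its threshold and derive a decision."""
    triggered = [
        label
        for label, threshold in policy["thresholds"].items()
        if probabilities.get(label, 0.0) >= threshold  # assumed comparison rule
    ]
    return {
        "is_safe": not triggered,
        "action": policy["action"] if triggered else "allow",  # "allow" is assumed
        "triggered_labels": triggered,
        "require_human_review": bool(triggered) and policy["require_human_review"],
    }


probs = {"normal": 0.20, "political": 0.45, "spam": 0.55}
decision = apply_policy(probs, POLICIES["moderate"])
print(decision)
```

In this example only `political` crosses its threshold (0.45 against 0.4), while `spam` at 0.55 stays below 0.6, which shows why per-label thresholds matter.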

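Performance optimization

1. Model quantization

# Dynamic quantization with PyTorch
import torch
from models.model import create_model

# The checkpoint is a dict; the weights are stored under 'model_state_dict'
model = create_model()
checkpoint = torch.load("checkpoints/best_model.pt", map_location="cpu")
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
torch.save(quantized_model, "checkpoints/quantized_model.pt")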

2. ONNX export

import torch
from models.model import create_model

model = create_model()
model.load_state_dict(torch.load("checkpoints/best_model.pt")['model_state_dict'])
model.eval()

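# Dummy inputs for tracing; both tensors use integer (int64) dtype, as tokenizer output does
dummy_input = {
    'input_ids': torch.randint(0, 1000, (1, 128)),
    'attention_mask': torch.ones(1, 128, dtype=torch.long)
}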

torch.onnx.export(
    model,
    (dummy_input['input_ids'], dummy_input['attention_mask']),
    "checkpoints/model.onnx",
    input_names=['input_ids', 'attention_mask'],
    output_names=['output'],
    dynamic_axes={'input_ids': {0: 'batch'}, 'attention_mask': {0: 'batch'}}
)

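Datasets

Custom data format

Training data should be a CSV file with the following columns:

text,normal,sexual,violence,political,sensitive,abuse,spam
Nice weather today,1,0,0,0,0,0,0
This is sexual content,0,1,0,0,0,0,0
A violent and gory description,0,0,1,0,0,0,0

Alternatively, use a single labels column (list format):

text,labels
Nice weather today,"[1,0,0,0,0,0,0]"
Sexual content,"[0,1,0,0,0,0,0]"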
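
Both layouts carry the same information, which a short reader can make explicit. The sketch below uses only the standard library and is not the project's data processor: `read_rows` is a hypothetical helper, and the label order is assumed to follow the column order shown above.

```python
import csv
import io
import json
from typing import List, Tuple

# Label order assumed to match the one-column-per-label layout.
LABELS = ["normal", "sexual", "violence", "political", "sensitive", "abuse", "spam"]


def read_rows(csv_text: str) -> List[Tuple[str, List[int]]]:
    """Parse either CSV layout into (text, label_vector) pairs."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if "labels" in row:
            # Compact layout: a JSON-style list stored in a quoted cell.
            vector = [int(value) for value in json.loads(row["labels"])]
        else:
            # Wide layout: one 0/1 column per label.
            vector = [int(row[name]) for name in LABELS]
        if len(vector) != len(LABELS):
            raise ValueError(f"expected {len(LABELS)} labels, got {len(vector)}")
        rows.append((row["text"], vector))
    return rows


wide = (
    "text,normal,sexual,violence,political,sensitive,abuse,spam\n"
    "Nice weather today,1,0,0,0,0,0,0\n"
)
compact = 'text,labels\nNice weather today,"[1,0,0,0,0,0,0]"\n'

print(read_rows(wide))
print(read_rows(compact))
```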

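Extending the project

Adding a new label

Edit configs/config.yaml:

labels:
  - "normal"
  - "sexual"
  - "violence"
  - "political"
  - "sensitive"
  - "abuse"
  - "spam"
  - "your_new_label"  # new label

num_labels: 8  # update the count

Update the policy configuration in configs/policies.json
Prepare training data that includes the new label
Retrain the model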

Adding custom sensitive words

from utils.lexicon_matcher import LexiconMatcher

matcher = LexiconMatcher("data/lexicon")
matcher.load_lexicon_files()

# Add custom words
matcher.add_words("sexual", ["new_sensitive_word_1", "new_sensitive_word_2"])
matcher.add_words("spam", ["ad_word_1", "ad_word_2"])

# Save
matcher.save_lexicon("data/lexicon/custom")

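API documentation

After starting the service, visit:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

Main endpoints:

Endpoint	Method	Description
`/`	GET	API information
`/health`	GET	Health check
`/policies`	GET	List all policies
`/moderate`	POST	Single-text moderation
`/moderate/batch`	POST	Batch moderation
`/moderate/quick`	POST	Quick moderation
`/policy/change`	POST	Switch policy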

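FAQ

1. CUDA out of memory

Reduce batch_size:

python training/train.py --batch-size 16  # or smaller

2. Sensitive-word lexicon not loaded

Make sure the lexicon has been downloaded:

python scripts/download_data.py

Or clone it manually:

cd data/lexicon
git clone https://github.com/konsheng/Sensitive-lexicon.git

3. Slow model prediction

Use a GPU (if available)
Reduce max_length
Use model quantization
Predict in batches rather than one text at a time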
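
For large collections, batching can be combined with the batch endpoint by splitting the input into bounded requests. The sketch below only prepares the request bodies and sends nothing: `chunked` and `batch_payloads` are hypothetical helpers, and the chunk size of 32 is an arbitrary choice rather than a documented limit.

```python
import json
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def batch_payloads(texts: Sequence[str], size: int = 32,
                   policy: str = "moderate") -> List[str]:
    """Build one JSON body per chunk, shaped like the /moderate/batch request."""
    return [
        json.dumps({"texts": batch, "policy": policy}, ensure_ascii=False)
        for batch in chunked(texts, size)
    ]


texts = [f"text {i}" for i in range(70)]
payloads = batch_payloads(texts)
sizes = [len(json.loads(payload)["texts"]) for payload in payloads]
print(sizes)
```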

License

MIT License

Contributing

Issues and pull requests are welcome!

Acknowledgements

Contact

For questions or suggestions, please open an issue.
