Reddit_Business_Idea_Validator: Reddit Business Opportunity Analysis Agent

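📋 Project Overview

Collects and analyzes Reddit data to map market demand, user pain points, and the competitive landscape. It covers comment analysis, user personas, and opportunity discovery.

Why look for market opportunities on Reddit? Because opportunities live in specific problems.

Reddit gathers questions and shared experience on nearly every aspect of daily life. Many young people use it as part of their decision-making and trust that they can find answers there.

For businesses that want to understand what consumers are struggling with this year and what they actually need, Reddit is an essential stop.

Consumers do not lack needs; their needs are simply very specific.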

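Core Features

📊 Reddit data scraping: automatically fetches relevant posts and comments, using the user's input as the search keyword
🤖 AI content analysis: uses an LLM to analyze user pain points and market demand
📄 Automated report generation: produces a professional market validation report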

System Flow Diagram

┌────────────────────────────────────────────────────────────────────────────┐
│                             System entry point                             │
│                    python run_agent.py "business idea"                     │
└────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                    Environment setup and initialization                    │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        │
│  │  Config     │  │ Context     │  │ MCP Clients │  │ Storage     │        │
│  │  Manager    │  │  Store      │  │             │  │  Server     │        │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘        │
└────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                         Orchestrator Agent startup                         │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ Task: validate_business_idea                                         │  │
│  │ Business idea: "user-provided business idea"                         │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                   1. Data scraping stage (Scraper Agent)                   │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ Task: scrape_data                                                    │  │
│  │ - Use the business idea as the search keyword                        │  │
│  │ - Fetch Reddit posts and comments via the Reddit MCP Server          │  │
│  │ - Save checkpoint: scraping_complete.json                            │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                  2. Data analysis stage (Analyzer Agent)                   │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ Task: analyze_data                                                   │  │
│  │ ├── analyze_posts: extract user pain points and needs                │  │
│  │ ├── analyze_comments: analyze comment sentiment and feedback         │  │
│  │ ├── comments_tag_analysis: tag-based comment analysis                │  │
│  │ └── combined_analysis: combined market-validation score              │  │
│  │ Save checkpoints: analysis_complete.json,                            │  │
│  │   comments_tag_analysis_complete.json                                │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                3. Report generation stage (Reporter Agent)                 │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ Task: generate_and_save_report                                       │  │
│  │ ├── generate_html_report: generate the HTML report                   │  │
│  │ ├── save_report: save the report to the reports/ directory           │  │
│  │ └── Save checkpoint: report_saved.json                               │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                           4. Output and storage                            │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ Output files:                                                        │  │
│  │ ├── reports/{business_idea}_{timestamp}.html                         │  │
│  │ ├── agent_context/checkpoints/{run_id}/                              │  │
│  │ │   ├── scraping_complete.json                                       │  │
│  │ │   ├── analysis_complete.json                                       │  │
│  │ │   ├── comments_tag_analysis_complete.json                          │  │
│  │ │   ├── combined_analysis_complete.json                              │  │
│  │ │   └── report_saved.json                                            │  │
│  │ └── Tip: see agent_context/checkpoints/{run_id}/ for related files   │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                               Task complete                                │
│            Returns a TaskResult containing the execution result            │
└────────────────────────────────────────────────────────────────────────────┘
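The staged flow in the diagram can be sketched as a small driver that runs each stage in order and writes one JSON checkpoint per stage. This is only an illustration under assumptions: `run_pipeline`, the three stage functions, and the checkpoint payloads are hypothetical and do not reproduce the project's Orchestrator Agent. Only the checkpoint file names are taken from the diagram.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical stage functions: each reads the accumulated state and returns its own output.
def scrape(state: dict) -> dict:
    return {"posts": [{"title": f"post about {state['business_idea']}"}]}

def analyze(state: dict) -> dict:
    return {"post_count": len(state["scraping_complete"]["posts"])}

def report(state: dict) -> dict:
    return {"summary": f"{state['analysis_complete']['post_count']} post(s) analyzed"}

# (checkpoint name, stage function); the names mirror the checkpoint files in the diagram.
STAGES: list[tuple[str, Callable[[dict], dict]]] = [
    ("scraping_complete", scrape),
    ("analysis_complete", analyze),
    ("report_saved", report),
]

def run_pipeline(business_idea: str, checkpoint_dir: Path) -> dict:
    """Run every stage in order and save one JSON checkpoint per stage."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    state: dict = {"business_idea": business_idea}
    for name, stage in STAGES:
        state[name] = stage(state)
        path = checkpoint_dir / f"{name}.json"
        path.write_text(json.dumps(state[name], ensure_ascii=False), encoding="utf-8")
    return state

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = run_pipeline("sell labubu in usa", Path(tmp) / "demo_run")
        print(result["report_saved"]["summary"])
```

Writing a checkpoint after every stage means a failed run can be inspected stage by stage, which is presumably why the project stores them per `{run_id}`.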

Quick Start

1. Install dependencies

# Clone the project
git clone <repository_url>
cd reddit_business_agent

# Create a virtual environment (recommended)
python -m venv venv

# Activate the virtual environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

2. Configure the Reddit API

Step 1: Create a Reddit app

Log in to Reddit
Go to the Reddit Apps page
Click "create another app..." or "create app"
Fill in the app details:
- name: the app name (for example, BusinessResearchAgent)
- type: select "script"
- description: a description of the app
- about url: leave blank or enter your website
- redirect uri: enter http://localhost:8080

Step 2: Get the credentials

After the app is created, you will see:

client_id: the app ID (a 14-character string)
client_secret: the app secret

Step 3: Set the environment variables

Edit the .env file and add the following:

# Reddit API Configuration
REDDIT_CLIENT_ID="your_client_id_here"
REDDIT_CLIENT_SECRET="your_client_secret_here"
REDDIT_USER_AGENT="BusinessResearchAgent/1.0 by your_reddit_username"

# OpenAI API Configuration (used for AI analysis)
OPENAI_API_KEY="your_openai_api_key_here"
OPENAI_BASE_URL="https://api.openai.com/v1"

Notes:

client_id and client_secret come from the Reddit Apps page
user_agent format: <app name>/<version> by <your Reddit username>
OPENAI_API_KEY is used by the AI content analysis feature
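Before running the agent, it can help to confirm that the settings above are actually filled in. The following is a minimal sketch, not part of the project: `missing_settings` and `REQUIRED_VARS` are hypothetical names, and the check only reads variables that are already in the environment (it does not load the .env file itself). Treating any value that contains `your_` as a leftover placeholder is a heuristic based on the template values shown above.

```python
import os
from typing import Mapping

# Variable names taken from the .env template above.
REQUIRED_VARS = (
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USER_AGENT",
    "OPENAI_API_KEY",
)

def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return required variable names that are unset, blank, or still placeholders."""
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip().strip('"')
        # Heuristic (assumption): template values such as "your_client_id_here" contain "your_".
        if not value or "your_" in value:
            missing.append(name)
    return missing

if __name__ == "__main__":
    problems = missing_settings(os.environ)
    if problems:
        print("Missing or placeholder settings:", ", ".join(problems))
    else:
        print("All required settings are present.")
```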

3. Run the tests

# Test the Reddit API connection
python test_reddit_connection.py

# Run the end-to-end test
python test_end_to_end.py

4. Run a business validation

# Validate a business idea
python run_agent.py "AI qwen in usa"

# Or use other keywords
python run_agent.py "AI deepseek r1 in usa"
python run_agent.py "sell labubu in usa"

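Usage

Supported search parameters

The system supports the following search parameters (changed via the configuration file or in code):

Parameter	Description	Default	Allowed values
`sort`	Sort order	`relevance`	`relevance`, `hot`, `top`, `new`, `comments`
`time_filter`	Time range	`all`	`all`, `hour`, `day`, `week`, `month`, `year`
`limit`	Posts returned per search	`100`	1-1000
`max_comments_per_post`	Comments fetched per post	`50`	1-1000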
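The table above can be expressed as a small validated container, which catches a mistyped sort order or an out-of-range limit before any request is made. This is a sketch under assumptions: the `SearchParams` class is hypothetical and is not the project's own model; the defaults and allowed values are copied from the table.

```python
from dataclasses import dataclass

SORT_OPTIONS = ("relevance", "hot", "top", "new", "comments")
TIME_FILTERS = ("all", "hour", "day", "week", "month", "year")

@dataclass(frozen=True)
class SearchParams:
    """Hypothetical container; defaults and ranges follow the parameter table."""
    sort: str = "relevance"
    time_filter: str = "all"
    limit: int = 100
    max_comments_per_post: int = 50

    def __post_init__(self) -> None:
        if self.sort not in SORT_OPTIONS:
            raise ValueError(f"sort must be one of {SORT_OPTIONS}, got {self.sort!r}")
        if self.time_filter not in TIME_FILTERS:
            raise ValueError(f"time_filter must be one of {TIME_FILTERS}, got {self.time_filter!r}")
        for name in ("limit", "max_comments_per_post"):
            value = getattr(self, name)
            if not 1 <= value <= 1000:
                raise ValueError(f"{name} must be between 1 and 1000, got {value}")

if __name__ == "__main__":
    print(SearchParams(sort="top", time_filter="month", limit=25))
```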

Configuration files

agents/config.py - Agent configuration

from dataclasses import dataclass

# Excerpt: AgentConfig is the project's base configuration class.
@dataclass
class ScraperAgentConfig(AgentConfig):
    max_pages_per_keyword: int = 2
    max_posts_to_analyze: int = 20
    max_comments_per_post: int = 50

.env - Environment variables

# Reddit API
REDDIT_CLIENT_ID="..."
REDDIT_CLIENT_SECRET="..."
REDDIT_USER_AGENT="..."

# OpenAI API
OPENAI_API_KEY="..."
OPENAI_BASE_URL="https://api.openai.com/v1"

# Other settings
LOG_LEVEL="INFO"

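Test scripts

test_reddit_connection.py - Reddit API connection test

Tests Reddit API authentication
Tests post search
Tests comment retrieval
Tests batch comment retrieval

test_end_to_end.py - End-to-end test

Tests post search
Tests comment retrieval
Tests batch comment retrieval
Tests batch scraping
Tests batch scraping with merged comments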

Output files

When a run finishes, the system generates the following files:

reddit_business_agent/
├── reports/
│   └── {business_idea}_{timestamp}.html    # Market validation report (HTML)
└── agent_context/
    └── checkpoints/
        └── {run_id}/
            ├── scraping_complete.json           # Scraped data
            ├── analysis_complete.json           # Analysis results
            ├── comments_tag_analysis_complete.json  # Comment tag analysis
            ├── combined_analysis_complete.json  # Combined analysis
            └── report_saved.json                # Report save record
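To see how far a given run got, one can check which of the checkpoint files listed above exist. The helper below is a hypothetical sketch (`checkpoint_status` is not part of the project); it assumes only the directory layout and file names shown in the tree, and `example_run_id` is a placeholder.

```python
from pathlib import Path

# File names copied from the output tree above.
CHECKPOINT_FILES = (
    "scraping_complete.json",
    "analysis_complete.json",
    "comments_tag_analysis_complete.json",
    "combined_analysis_complete.json",
    "report_saved.json",
)

def checkpoint_status(project_root: Path, run_id: str) -> dict[str, bool]:
    """Map each expected checkpoint file name to whether it exists for this run."""
    run_dir = project_root / "agent_context" / "checkpoints" / run_id
    return {name: (run_dir / name).is_file() for name in CHECKPOINT_FILES}

if __name__ == "__main__":
    # "example_run_id" is a placeholder; substitute a real {run_id} directory name.
    for name, present in checkpoint_status(Path("."), "example_run_id").items():
        print(f"{'ok' if present else 'missing':8}{name}")
```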

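FAQ

Q: What should I do if Reddit API requests fail?

A: Check the following:

Confirm that the Reddit credentials in the .env file are correct
Confirm that the Reddit app type is "script"
Confirm that the user_agent format is correct
Check your network connection

Q: How can I speed up scraping?

A: Adjust the following:

Lower max_posts_to_analyze (default 20)
Lower max_comments_per_post (default 50)
Use more specific keywords

Q: Is the OpenAI API required?

A: Yes, the AI analysis features require the OpenAI API. If you do not have an API key, you can:

Sign up with OpenAI to get an API key
Or use another service that is compatible with the OpenAI API (and set OPENAI_BASE_URL accordingly)

Q: How do I handle Reddit API limits?

A: The Reddit API enforces the following limit:

A cap on requests per minute (60 requests/minute by default)
To cope with it, adjust the retry_config parameter in agents/config.py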
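One common way to cope with a per-minute cap is to retry failed calls with exponential backoff. The sketch below illustrates that general technique only: `call_with_retry`, `max_attempts`, and `base_delay` are hypothetical names, and the actual fields of the project's `retry_config` are not shown in this README.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1x, 2x, 4x, ...) with up to 0.1 s of jitter.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")

if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        # Simulated request that fails twice before succeeding.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("simulated rate limit")
        return "ok"

    print(call_with_retry(flaky, base_delay=0.01))
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting; a production version would normally retry only on rate-limit or network errors rather than on every exception.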

Architecture

┌────────────────────────────────────────────────────────────┐
│                         User layer                         │
│                run_agent.py "business idea"                │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                     Orchestrator Agent                     │
│     Orchestrates tasks and coordinates the sub-agents      │
└────────────────────────────────────────────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│  Scraper Agent   │ │  Analyzer Agent  │ │  Reporter Agent  │
│  Data scraping   │ │  Data analysis   │ │ Report generation│
└──────────────────┘ └──────────────────┘ └──────────────────┘
         │                    │                    │
         ▼                    ▼                    ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Reddit MCP       │ │ LLM MCP          │ │ Storage MCP      │
│ Server           │ │ Server           │ │ Server           │
└──────────────────┘ └──────────────────┘ └──────────────────┘
         │                    │                    │
         ▼                    ▼                    ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Reddit API       │ │ OpenAI API       │ │ Local filesystem │
│ (PRAW)           │ │                  │ │                  │
└──────────────────┘ └──────────────────┘ └──────────────────┘

📁 Directory Structure

reddit_business_agent/
├── models/                          # Data models
│   ├── __init__.py
│   ├── agent_models.py              # TaskResult, ProgressUpdate, ExecutionPlan
│   ├── context_models.py            # RunContext, ContextQuery
│   └── business_models.py           # RedditPostModel, RedditCommentModel, etc.
│
├── agents/                          # Agent core
│   ├── __init__.py
│   ├── base_agent.py                # Agent base class
│   ├── context_store.py             # Context store
│   ├── config.py                    # Configuration management (supports .env)
│   ├── orchestrator.py              # Main orchestrator agent
│   ├── subagents/                   # Sub-agents
│   │   ├── __init__.py
│   │   ├── scraper_agent.py         # Data scraping agent
│   │   ├── analyzer_agent.py        # Data analysis agent
│   │   └── reporter_agent.py        # Report generation agent
│   └── skills/                      # Skills
│       ├── __init__.py
│       ├── scraper_skills.py
│       ├── analyzer_skills.py
│       └── reporter_skills.py
│
├── mcp_servers/                     # MCP servers
│   ├── __init__.py
│   ├── reddit_server.py             # Reddit MCP service
│   ├── llm_server.py                # LLM MCP service
│   └── storage_server.py            # Storage service
│
├── tests/                           # Tests
│   ├── __init__.py
│   ├── test_integration.py          # Integration tests
│   └── test_e2e.py                  # End-to-end tests
│
├── test_reddit_connection.py        # Reddit API connection test
├── test_end_to_end.py               # End-to-end test
├── run_agent.py                     # Main entry point
├── requirements.txt                 # Python dependencies
├── .env                             # Environment variables
└── .env.example                     # Example environment variables

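📊 Reddit Metrics

1. score ✅ in use

Purpose: measures a post's popularity (upvotes minus downvotes)
Formula: score = upvotes - downvotes
Location: analyzer_skills.py
Note: Reddit's core metric; it reflects how well a post is received

2. num_comments (comment count) ✅ in use

Purpose: feeds the engagement score and the analysis of user participation
Formula: total_engagement = score + num_comments * 3
Location: analyzer_skills.py
Note: the comment count indicates how actively users are discussing the post

3. upvote_ratio ✅ in use

Purpose: analyzes content quality
Formula: upvote_ratio = upvotes / (upvotes + downvotes)
Location: business_models.py
Note: ranges from 0 to 1; the closer to 1, the higher the content quality

4. created_utc (creation time) ✅ in use

Purpose: analyzes recent activity
Logic: counts the posts published in the last 30 days
Location: analyzer_agent.py
Note: Unix timestamp format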
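The two formulas above, score = upvotes - downvotes and upvote_ratio = upvotes / (upvotes + downvotes), can be combined to estimate the raw vote counts: with total = upvotes + downvotes, they give score = (2 * upvote_ratio - 1) * total. The sketch below works through that algebra. `estimate_votes` is a hypothetical helper that is not part of the project, and because Reddit reports these values only approximately, the result should be read as an estimate.

```python
def estimate_votes(score: int, upvote_ratio: float) -> tuple[int, int]:
    """Estimate (upvotes, downvotes) from score = u - d and upvote_ratio = u / (u + d)."""
    if not 0.0 <= upvote_ratio <= 1.0:
        raise ValueError("upvote_ratio must be between 0 and 1")
    denominator = 2 * upvote_ratio - 1
    if abs(denominator) < 1e-9:
        # At a ratio of exactly 0.5 the score is 0 for any vote total, so nothing can be recovered.
        raise ValueError("votes cannot be recovered when upvote_ratio is 0.5")
    total = score / denominator  # total = u + d
    if total < 0:
        raise ValueError("score and upvote_ratio are inconsistent")
    upvotes = round(upvote_ratio * total)
    downvotes = round(total) - upvotes
    return upvotes, downvotes

if __name__ == "__main__":
    up, down = estimate_votes(score=80, upvote_ratio=0.9)
    print(f"about {up} upvotes and {down} downvotes")
```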

🎯 Core Calculation Logic

Engagement score (engagement_score)

total_engagement = score + num_comments * 3

if total_engagement > 1000:
    engagement_score = 10
elif total_engagement > 500:
    engagement_score = 8
elif total_engagement > 100:
    engagement_score = 6
elif total_engagement > 50:
    engagement_score = 4
else:
    engagement_score = 2

Weighting strategy

score: weight 1×
num_comments: weight 3× (comments signal stronger user engagement)

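📈 How the Metrics Are Used

Top post ranking: sort by total_engagement in descending order and take the top 3
Average engagement score: the mean engagement_score across all relevant posts
Report display: the average engagement score is shown in the HTML report
Activity analysis: the share of posts published in the last 30 days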
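The uses listed above can be sketched together in one function. This is an illustration under assumptions: `summarize` and the dictionary-based post format are hypothetical, not the project's models; the thresholds, the 3× comment weight, the top-3 cut, and the 30-day window come from this README.

```python
import time
from typing import Optional

def total_engagement(post: dict) -> int:
    # Weighted sum from the README: score counts 1x, comments count 3x.
    return post["score"] + post["num_comments"] * 3

def engagement_score(total: int) -> int:
    # Same thresholds as the engagement_score logic in this README.
    if total > 1000:
        return 10
    if total > 500:
        return 8
    if total > 100:
        return 6
    if total > 50:
        return 4
    return 2

def summarize(posts: list[dict], now: Optional[float] = None,
              top_n: int = 3, recent_days: int = 30) -> dict:
    """Rank posts, average their engagement scores, and measure recent activity."""
    now = time.time() if now is None else now
    cutoff = now - recent_days * 86400  # created_utc is a Unix timestamp in seconds
    scores = [engagement_score(total_engagement(p)) for p in posts]
    recent = sum(1 for p in posts if p["created_utc"] >= cutoff)
    return {
        "top_posts": sorted(posts, key=total_engagement, reverse=True)[:top_n],
        "average_engagement_score": sum(scores) / len(scores) if posts else 0.0,
        "recent_ratio": recent / len(posts) if posts else 0.0,
    }

if __name__ == "__main__":
    now = 1_700_000_000
    sample = [
        {"id": "a", "score": 900, "num_comments": 50, "created_utc": now - 86400},
        {"id": "b", "score": 100, "num_comments": 10, "created_utc": now - 40 * 86400},
    ]
    print(summarize(sample, now=now))
```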

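💡 Summary

✅ All of the key metrics are put to use:

score and num_comments both feed the engagement score
upvote_ratio is used to analyze content quality
created_utc is used to analyze how recent the activity is
The results drive ranking, scoring, and the report display

Together these cover every metric described above.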

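🔧 Dependencies

Core dependencies

praw >= 7.7.0: Python Reddit API Wrapper, used to access the Reddit API
openai >= 1.0.0: OpenAI API client, used for AI content analysis
python-dotenv >= 1.0.0: environment variable management
pydantic >= 2.0.0: data validation and serialization
httpx >= 0.24.0: asynchronous HTTP client

Development dependencies

pytest >= 7.0.0: test framework
pytest-asyncio >= 0.21.0: async test support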

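📝 Development Guide

Adding a new search parameter

Add the parameter definition in models/business_models.py
Implement the parameter handling in mcp_servers/reddit_server.py
Pass the parameter through in agents/skills/scraper_skills.py

Adding a new analysis feature

Implement the analysis logic in agents/skills/analyzer_skills.py
Add the task handling in agents/subagents/analyzer_agent.py
Update the report template if needed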

📄 License

MIT License

🤝 Contributing

Issues and pull requests are welcome!

📧 Contact

For questions, open an issue or contact the project maintainers.
