GitHub - santoshdannana/BettaFish: A Multi-Agent Public Opinion Analysis Assistant for Everyone - Break Through Information Echo Chambers, Restore the True Picture of Public Opinion, Predict Future Trends, and Assist Decision-Making! Built from Scratch, No Framework Dependencies.

Important

One-click online deployment experience will be available on Monday (11.3), stay tuned!

⚡ Project Overview

"WeiYu (MicroOpinion)" is an innovative multi-agent public opinion analysis system built from scratch, helping users break through information echo chambers, restore the true picture of public opinion, predict future trends, and assist in decision-making. Users simply need to submit analysis requirements like a chat conversation, and the intelligent agents automatically analyze 30+ mainstream domestic and international social media platforms and millions of public comments.

"WeiYu" (微舆) is a homophone of "WeiYu" (微鱼, meaning "small fish"). BettaFish is a small but very aggressive and beautiful fish, symbolizing "small but powerful, fearless of challenges"

View the system-generated research report using "Wuhan University public opinion" as an example: Wuhan University Brand Reputation In-depth Analysis Report

Not only reflected in report quality, compared to similar products, we have 🚀 six major advantages:

AI-Driven Full-Domain Monitoring: AI crawler clusters operate 24/7 non-stop, comprehensively covering 10+ key domestic and international social media platforms including Weibo, Xiaohongshu, Douyin, Kuaishou, etc. Not only capturing trending content in real-time, but also drilling down to massive user comments, allowing you to hear the most authentic and widespread public voice.
Composite Analysis Engine Beyond LLM: We not only rely on 5 types of professionally designed Agents but also integrate fine-tuned models, statistical models, and other middleware. Through multi-model collaboration, we ensure depth, accuracy, and multi-dimensional perspectives in analysis results.
Powerful Multimodal Capabilities: Breaking through text and image limitations, capable of deep analysis of short video content from Douyin and Kuaishou, and precise extraction of structured multimodal information cards such as weather, calendar, and stock data from modern search engines, giving you comprehensive control of public opinion dynamics.
Agent "Forum" Collaboration Mechanism: Endowing different Agents with unique toolsets and thinking patterns, introducing a debate moderator model, and conducting chain-of-thought collision and debate through a "forum" mechanism. This not only avoids the thinking limitations of a single model and homogenization caused by communication but also generates higher-quality collective intelligence and decision support.
Seamless Integration of Public and Private Domain Data: The platform not only analyzes public opinion but also provides highly secure interfaces supporting seamless integration of your internal business database with public opinion data. Breaking down data silos to provide powerful analytical capabilities of "external trends + internal insights" for vertical businesses.
Lightweight and Highly Extensible Framework: Based on pure Python modular design, achieving lightweight, one-click deployment. Clear code structure allows developers to easily integrate custom models and business logic, enabling rapid expansion and deep customization of the platform.

Starting with public opinion, but not limited to public opinion. The goal of "WeiYu" is to become a simple and universal data analysis engine driving all business scenarios.

For example, you only need to simply modify the API parameters and prompts in the Agent toolset to transform it into a financial market analysis system

Here's a fairly active project discussion thread on Linux.do: https://linux.do/t/topic/1009280

Say goodbye to traditional data dashboards. In "WeiYu", everything starts with a simple question. You only need to submit your analysis requirements like a conversation.

🏗️ System Architecture

Overall Architecture Diagram

Insight Agent Private Database Mining: AI agent for deep analysis of private public opinion databases

Media Agent Multimodal Content Analysis: AI agent with powerful multimodal capabilities

Query Agent Precise Information Search: AI agent with domestic and international web search capabilities

Report Agent Intelligent Report Generation: Multi-round report generation AI agent with built-in templates

Complete Analysis Workflow

Step	Phase Name	Main Operations	Participating Components	Loop Characteristics
1	User Query	Flask main app receives query	Flask main app	-
2	Parallel Startup	Three Agents start simultaneously	Query Agent, Media Agent, Insight Agent	-
3	Preliminary Analysis	Each Agent uses dedicated tools for overview search	Each Agent + dedicated toolset	-
4	Strategy Formulation	Develop segmented research strategy based on preliminary results	Internal decision module of each Agent	-
5-N	Loop Phase	Forum Collaboration + Deep Research	ForumEngine + All Agents	Multiple Rounds
5.1	Deep Research	Each Agent conducts specialized search guided by forum moderator	Each Agent + reflection mechanism + forum guidance	Each round
5.2	Forum Collaboration	ForumEngine monitors Agent speeches and generates moderator summary	ForumEngine + LLM moderator	Each round
5.3	Communication Integration	Each Agent adjusts research direction based on discussion	Each Agent + forum_reader tool	Each round
N+1	Result Integration	Report Agent collects all analysis results and forum content	Report Agent	-
N+2	Report Generation	Dynamically selects templates and styles, generates final report in multiple rounds	Report Agent + template engine	-

Project Code Structure Tree

Weibo_PublicOpinion_AnalysisSystem/
├── QueryEngine/                   # Domestic and international news breadth search Agent
│   ├── agent.py                   # Agent main logic
│   ├── llms/                      # LLM interface wrapper
│   ├── nodes/                     # Processing nodes
│   ├── tools/                     # Search tools
│   ├── utils/                     # Utility functions
│   └── ...                        # Other modules
├── MediaEngine/                   # Powerful multimodal understanding Agent
│   ├── agent.py                   # Agent main logic
│   ├── nodes/                     # Processing nodes
│   ├── llms/                      # LLM interface
│   ├── tools/                     # Search tools
│   ├── utils/                     # Utility functions
│   └── ...                        # Other modules
├── InsightEngine/                 # Private database mining Agent
│   ├── agent.py                   # Agent main logic
│   ├── llms/                      # LLM interface wrapper
│   │   └── base.py                # Unified OpenAI compatible client
│   ├── nodes/                     # Processing nodes
│   │   ├── base_node.py           # Base node class
│   │   ├── formatting_node.py     # Formatting node
│   │   ├── report_structure_node.py # Report structure node
│   │   ├── search_node.py         # Search node
│   │   └── summary_node.py        # Summary node
│   ├── tools/                     # Database query and analysis tools
│   │   ├── keyword_optimizer.py   # Qwen keyword optimization middleware
│   │   ├── search.py              # Database operation toolset
│   │   └── sentiment_analyzer.py  # Sentiment analysis integration tool
│   ├── state/                     # State management
│   │   ├── __init__.py
│   │   └── state.py               # Agent state definition
│   ├── prompts/                   # Prompt templates
│   │   ├── __init__.py
│   │   └── prompts.py             # Various prompts
│   └── utils/                     # Utility functions
│       ├── __init__.py
│       ├── config.py              # Configuration management
│       └── text_processing.py     # Text processing tools
├── ReportEngine/                  # Multi-round report generation Agent
│   ├── agent.py                   # Agent main logic
│   ├── llms/                      # LLM interface
│   ├── nodes/                     # Report generation nodes
│   │   ├── template_selection.py  # Template selection node
│   │   └── html_generation.py     # HTML generation node
│   ├── report_template/           # Report template library
│   │   ├── 社会公共热点事件分析.md
│   │   ├── 商业品牌舆情监测.md
│   │   └── ...                    # More templates
│   └── flask_interface.py         # Flask API interface
├── ForumEngine/                   # Simple Forum Engine implementation
│   ├── monitor.py                 # Log monitoring and forum management
│   └── llm_host.py                # Forum moderator LLM module
├── MindSpider/                    # Weibo crawler system
│   ├── main.py                    # Crawler main program
│   ├── config.py                  # Crawler configuration file
│   ├── BroadTopicExtraction/      # Topic extraction module
│   │   ├── database_manager.py    # Database manager
│   │   ├── get_today_news.py      # Today's news acquisition
│   │   ├── main.py                # Topic extraction main program
│   │   └── topic_extractor.py     # Topic extractor
│   ├── DeepSentimentCrawling/     # Deep sentiment crawling
│   │   ├── keyword_manager.py     # Keyword manager
│   │   ├── main.py                # Deep crawling main program
│   │   ├── MediaCrawler/          # Media crawler core
│   │   └── platform_crawler.py    # Platform crawler management
│   └── schema/                    # Database structure
│       ├── db_manager.py          # Database manager
│       ├── init_database.py       # Database initialization
│       └── mindspider_tables.sql  # Database table structure
├── SentimentAnalysisModel/        # Sentiment analysis model collection
│   ├── WeiboSentiment_Finetuned/  # Fine-tuned BERT/GPT-2 models
│   ├── WeiboMultilingualSentiment/# Multilingual sentiment analysis (recommended)
│   ├── WeiboSentiment_SmallQwen/  # Small parameter Qwen3 fine-tuning
│   └── WeiboSentiment_MachineLearning/ # Traditional machine learning methods
├── SingleEngineApp/               # Streamlit applications for individual Agents
│   ├── query_engine_streamlit_app.py
│   ├── media_engine_streamlit_app.py
│   └── insight_engine_streamlit_app.py
├── templates/                     # Flask templates
│   └── index.html                 # Main interface frontend
├── static/                        # Static resources
├── logs/                          # Runtime logs directory
├── final_reports/                 # Final generated HTML report files
├── utils/                         # Common utility functions
│   ├── forum_reader.py            # Inter-Agent forum communication
│   └── retry_helper.py            # Network request retry mechanism tool
├── app.py                         # Flask main application entry
├── config.py                      # Global configuration file
└── requirements.txt               # Python dependency list

🚀 Quick Start

If you're learning how to build an Agent system for the first time, you can start with a very simple demo: Deep Search Agent Demo

Environment Requirements

Operating System: Windows, Linux, MacOS
Python Version: 3.9+
Conda: Anaconda or Miniconda
Database: MySQL (optional cloud database service available)
Memory: 2GB+ recommended

1. Create Conda Environment

# Create conda environment
conda create -n your_conda_name python=3.11
conda activate your_conda_name

2. Install Dependencies

# Install basic dependencies
pip install -r requirements.txt
# If you don't want to use local sentiment analysis models (very low computing requirements, CPU version installed by default), you can comment out the "Machine Learning" section in this file before executing the command

3. Install Playwright Browser Driver

# Install browser driver (for crawler functionality)
playwright install chromium

4. Configure System

4.1 Configure API Keys

Edit the config.py file and fill in your API keys (you can also choose your own models and search proxies, see details in the config file):

# MySQL database configuration
DB_HOST = "localhost"
DB_PORT = 3306
DB_USER = "your_username"
DB_PASSWORD = "your_password"
DB_NAME = "your_db_name"
DB_CHARSET = "utf8mb4"

# LLM configuration
# You can change the API used by each LLM section, as long as it's compatible with OpenAI request format

# Insight Agent
INSIGHT_ENGINE_API_KEY = "your_api_key"
INSIGHT_ENGINE_BASE_URL = "https://api.moonshot.cn/v1"
INSIGHT_ENGINE_MODEL_NAME = "kimi-k2-0711-preview"
# Media Agent
...

4.2 Database Initialization

Option 1: Use Local Database

The MindSpider crawler system and public opinion system are independent of each other, so you need to configure MindSpider\config.py separately

# Local MySQL database initialization
cd MindSpider
python schema/init_database.py

Option 2: Use Cloud Database Service (Recommended)

We provide convenient cloud database service, including 100,000+ daily real public opinion data, currently free to apply!

Real public opinion data, updated in real-time
Multi-dimensional tag classification
High-availability cloud service
Professional technical support

Contact us to apply for free cloud database access: 📧 670939375@qq.com

For data compliance review and service upgrade, the cloud database will suspend accepting new applications from October 1, 2025

5. Start System

5.1 Complete System Startup (Recommended)

# In the project root directory, activate conda environment
conda activate your_conda_name

# Start the main application
python app.py

Note 1: After a run terminates, the streamlit app may end abnormally and still occupy the port. In this case, search for the process occupying the port and kill it.

Note 2: Data crawling requires separate operations, see section 5.3 for guidance

Note 3: If page display issues occur during remote server deployment, see PR#45

Visit http://localhost:5000 to use the complete system

5.2 Start Individual Agent Separately

# Start QueryEngine
streamlit run SingleEngineApp/query_engine_streamlit_app.py --server.port 8503

# Start MediaEngine  
streamlit run SingleEngineApp/media_engine_streamlit_app.py --server.port 8502

# Start InsightEngine
streamlit run SingleEngineApp/insight_engine_streamlit_app.py --server.port 8501

5.3 Crawler System Standalone Use

This section has detailed configuration documentation: MindSpider Usage Instructions

MindSpider Running Example

# Enter crawler directory
cd MindSpider

# Project initialization
python main.py --setup

# Run complete crawler workflow
python main.py --complete --date 2024-01-20

# Run topic extraction only
python main.py --broad-topic --date 2024-01-20

# Run deep crawling only
python main.py --deep-sentiment --platforms xhs dy wb

⚙️ Advanced Configuration

Modify Key Parameters

Agent Configuration Parameters

Each Agent has a dedicated configuration file that can be adjusted according to needs. Here are some examples:

# QueryEngine/utils/config.py
class Config:
    max_reflections = 2           # Number of reflection rounds
    max_search_results = 15       # Maximum search results
    max_content_length = 8000     # Maximum content length
    
# MediaEngine/utils/config.py  
class Config:
    comprehensive_search_limit = 10  # Comprehensive search limit
    web_search_limit = 15           # Web search limit
    
# InsightEngine/utils/config.py
class Config:
    default_search_topic_globally_limit = 200    # Global search limit
    default_get_comments_limit = 500             # Comment retrieval limit
    max_search_results_for_llm = 50              # Maximum results for LLM

Sentiment Analysis Model Configuration

# InsightEngine/tools/sentiment_analyzer.py
SENTIMENT_CONFIG = {
    'model_type': 'multilingual',     # Options: 'bert', 'multilingual', 'qwen', etc.
    'confidence_threshold': 0.8,      # Confidence threshold
    'batch_size': 32,                 # Batch size
    'max_sequence_length': 512,       # Maximum sequence length
}

Integrate Different LLM Models

Supports any LLM provider with OpenAI call format. Simply fill in the corresponding KEY, BASE_URL, and MODEL_NAME in /config.py.

What is OpenAI call format? Here's a simple example:

from openai import OpenAI

client = OpenAI(api_key="your_api_key", 
               base_url="https://api.siliconflow.cn/v1")

response = client.chat.completions.create(
   model="Qwen/Qwen2.5-72B-Instruct",
   messages=[
       {'role': 'user', 
        'content': "What new opportunities will reasoning models bring to the market"}
   ],
)

complete_response = response.choices[0].message.content
print(complete_response)

Change Sentiment Analysis Model

The system integrates multiple sentiment analysis methods, which can be selected according to needs:

1. Multilingual Sentiment Analysis

cd SentimentAnalysisModel/WeiboMultilingualSentiment
python predict.py --text "This product is amazing!" --lang "en"

2. Small Parameter Qwen3 Fine-tuning

cd SentimentAnalysisModel/WeiboSentiment_SmallQwen
python predict_universal.py --text "这次活动办得很成功"

3. BERT-based Fine-tuned Model

# Use BERT Chinese model
cd SentimentAnalysisModel/WeiboSentiment_Finetuned/BertChinese-Lora
python predict.py --text "这个产品真的很不错"

4. GPT-2 LoRA Fine-tuned Model

cd SentimentAnalysisModel/WeiboSentiment_Finetuned/GPT2-Lora
python predict.py --text "今天心情不太好"

5. Traditional Machine Learning Methods

cd SentimentAnalysisModel/WeiboSentiment_MachineLearning
python predict.py --model_type "svm" --text "服务态度需要改进"

Integrate Custom Business Database

1. Modify Database Connection Configuration

# Add your business database configuration in config.py
BUSINESS_DB_HOST = "your_business_db_host"
BUSINESS_DB_PORT = 3306
BUSINESS_DB_USER = "your_business_user"
BUSINESS_DB_PASSWORD = "your_business_password"
BUSINESS_DB_NAME = "your_business_database"

2. Create Custom Data Access Tool

# InsightEngine/tools/custom_db_tool.py
class CustomBusinessDBTool:
    """Custom business database query tool"""
    
    def __init__(self):
        self.connection_config = {
            'host': config.BUSINESS_DB_HOST,
            'port': config.BUSINESS_DB_PORT,
            'user': config.BUSINESS_DB_USER,
            'password': config.BUSINESS_DB_PASSWORD,
            'database': config.BUSINESS_DB_NAME,
        }
    
    def search_business_data(self, query: str, table: str):
        """Query business data"""
        # Implement your business logic
        pass
    
    def get_customer_feedback(self, product_id: str):
        """Get customer feedback data"""
        # Implement customer feedback query logic
        pass

3. Integrate into InsightEngine

# Integrate custom tool in InsightEngine/agent.py
from .tools.custom_db_tool import CustomBusinessDBTool

class DeepSearchAgent:
    def __init__(self, config=None):
        # ... other initialization code
        self.custom_db_tool = CustomBusinessDBTool()
    
    def execute_custom_search(self, query: str):
        """Execute custom business data search"""
        return self.custom_db_tool.search_business_data(query, "your_table")

Custom Report Templates

1. Upload in Web Interface

The system supports uploading custom template files (.md or .txt format), which can be selected when generating reports.

2. Create Template File

Create a new template in the ReportEngine/report_template/ directory. Our Agent will automatically select the most appropriate template.

🤝 Contribution Guidelines

We welcome all forms of contributions!

How to Contribute

Fork the project to your GitHub account
Create Feature branch: git checkout -b feature/AmazingFeature
Commit changes: git commit -m 'Add some AmazingFeature'
Push to branch: git push origin feature/AmazingFeature
Open Pull Request

Development Standards

Code follows PEP8 standards
Commit messages use clear Chinese and English descriptions
New features need to include corresponding test cases
Update relevant documentation

🦖 Next Development Plan

Currently, the system has only completed the first two steps of the "three axes": input requirements -> detailed analysis. It still lacks the prediction step. Directly handing it to LLM is not convincing.

Currently, after a long period of crawling and collection, we have accumulated a large amount of data on how topic popularity changes over time, explosive points, and other trend data across the web. We already have the conditions to develop prediction models. Our team will apply time series models, graph neural networks, multimodal fusion, and other predictive model technical reserves here to achieve truly data-driven public opinion prediction functionality.

⚠️ Disclaimer

Important Reminder: This project is for learning, academic research, and educational purposes only

Compliance Statement:
- All code, tools, and functions in this project are for learning, academic research, and educational purposes only
- It is strictly prohibited to use this project for any commercial purposes or profit-making activities
- It is strictly prohibited to use this project for any illegal, non-compliant, or rights-infringing behaviors
Crawler Function Disclaimer:
- The crawler function in the project is for technical learning and research purposes only
- Users must comply with the target website's robots.txt protocol and terms of use
- Users must comply with relevant laws and regulations and not conduct malicious crawling or data abuse
- Any legal consequences arising from the use of crawler functions shall be borne by the user
Data Use Disclaimer:
- The data analysis functions involved in the project are for academic research only
- It is strictly prohibited to use analysis results for commercial decisions or profit purposes
- Users should ensure the legality and compliance of the analyzed data
Technical Disclaimer:
- This project is provided "as is" without any express or implied warranties
- The author is not responsible for any direct or indirect losses caused by using this project
- Users should evaluate the applicability and risks of the project themselves
Liability Limitation:
- Users should fully understand relevant laws and regulations before using this project
- Users should ensure that their use behavior complies with local laws and regulations
- Any consequences arising from the use of this project in violation of laws and regulations shall be borne by the user

Please read and understand the above disclaimer carefully before using this project. Using this project indicates that you have agreed to and accepted all the above terms.

📄 License

This project is licensed under the GPL-2.0 License. For detailed information, please refer to the LICENSE file.

🎉 Support and Contact

Get Help

Project Homepage: GitHub Repository
Issue Feedback: Issues Page
Feature Suggestions: Discussions Page

Contact Information

📧 Email: 670939375@qq.com

Business Cooperation

Enterprise Custom Development
Big Data Services
Academic Cooperation
Technical Training

👥 Contributors

Thanks to these excellent contributors:

Name		Name	Last commit message	Last commit date
Latest commit History 475 Commits
.github		.github
ForumEngine		ForumEngine
InsightEngine		InsightEngine
MediaEngine		MediaEngine
MindSpider		MindSpider
QueryEngine		QueryEngine
ReportEngine		ReportEngine
SentimentAnalysisModel		SentimentAnalysisModel
SingleEngineApp		SingleEngineApp
final_reports		final_reports
insight_engine_streamlit_reports		insight_engine_streamlit_reports
logs		logs
media_engine_streamlit_reports		media_engine_streamlit_reports
query_engine_streamlit_reports		query_engine_streamlit_reports
static/image		static/image
templates		templates
utils		utils
.dockerignore		.dockerignore
.gitattributes		.gitattributes
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README-EN.md		README-EN.md
README.md		README.md
app.py		app.py
config.py		config.py
docker-compose.yml		docker-compose.yml
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

⚡ Project Overview

🏗️ System Architecture

Overall Architecture Diagram

Complete Analysis Workflow

Project Code Structure Tree

🚀 Quick Start

Environment Requirements

1. Create Conda Environment

2. Install Dependencies

3. Install Playwright Browser Driver

4. Configure System

4.1 Configure API Keys

4.2 Database Initialization

5. Start System

5.1 Complete System Startup (Recommended)

5.2 Start Individual Agent Separately

5.3 Crawler System Standalone Use

⚙️ Advanced Configuration

Modify Key Parameters

Agent Configuration Parameters

Sentiment Analysis Model Configuration

Integrate Different LLM Models

Change Sentiment Analysis Model

1. Multilingual Sentiment Analysis

2. Small Parameter Qwen3 Fine-tuning

3. BERT-based Fine-tuned Model

4. GPT-2 LoRA Fine-tuned Model

5. Traditional Machine Learning Methods

Integrate Custom Business Database

1. Modify Database Connection Configuration

2. Create Custom Data Access Tool

3. Integrate into InsightEngine

Custom Report Templates

1. Upload in Web Interface

2. Create Template File

🤝 Contribution Guidelines

How to Contribute

Development Standards

🦖 Next Development Plan

⚠️ Disclaimer

📄 License

🎉 Support and Contact

Get Help

Contact Information

Business Cooperation

👥 Contributors

📈 Project Statistics

About

Resources

License

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages