Important
One-click online deployment experience will be available on Monday (11.3), stay tuned!
"WeiYu (MicroOpinion)" is an innovative multi-agent public opinion analysis system built from scratch, helping users break through information echo chambers, restore the true picture of public opinion, predict future trends, and assist in decision-making. Users simply need to submit analysis requirements like a chat conversation, and the intelligent agents automatically analyze 30+ mainstream domestic and international social media platforms and millions of public comments.
"WeiYu" (微舆) is a homophone of "WeiYu" (微鱼, meaning "small fish"). BettaFish is a small but very aggressive and beautiful fish, symbolizing "small but powerful, fearless of challenges"
View the system-generated research report using "Wuhan University public opinion" as an example: Wuhan University Brand Reputation In-depth Analysis Report
Our edge is not only report quality: compared with similar products, we have 🚀 six major advantages:
- AI-Driven Full-Domain Monitoring: AI crawler clusters operate 24/7 non-stop, comprehensively covering 10+ key domestic and international social media platforms including Weibo, Xiaohongshu, Douyin, Kuaishou, etc. Not only capturing trending content in real-time, but also drilling down to massive user comments, allowing you to hear the most authentic and widespread public voice.
- Composite Analysis Engine Beyond LLM: We not only rely on 5 types of professionally designed Agents but also integrate fine-tuned models, statistical models, and other middleware. Through multi-model collaboration, we ensure depth, accuracy, and multi-dimensional perspectives in analysis results.
- Powerful Multimodal Capabilities: Breaking through text and image limitations, capable of deep analysis of short video content from Douyin and Kuaishou, and precise extraction of structured multimodal information cards such as weather, calendar, and stock data from modern search engines, giving you comprehensive control of public opinion dynamics.
- Agent "Forum" Collaboration Mechanism: Endowing different Agents with unique toolsets and thinking patterns, introducing a debate moderator model, and conducting chain-of-thought collision and debate through a "forum" mechanism. This not only avoids the thinking limitations of a single model and homogenization caused by communication but also generates higher-quality collective intelligence and decision support.
- Seamless Integration of Public and Private Domain Data: The platform not only analyzes public opinion but also provides highly secure interfaces supporting seamless integration of your internal business database with public opinion data. Breaking down data silos to provide powerful analytical capabilities of "external trends + internal insights" for vertical businesses.
- Lightweight and Highly Extensible Framework: Based on pure Python modular design, achieving lightweight, one-click deployment. Clear code structure allows developers to easily integrate custom models and business logic, enabling rapid expansion and deep customization of the platform.
Starting with public opinion, but not limited to public opinion. The goal of "WeiYu" is to become a simple and universal data analysis engine driving all business scenarios.
For example, simply modify the API parameters and prompts in the Agent toolset to turn it into a financial market analysis system.
Here's a fairly active project discussion thread on Linux.do: https://linux.do/t/topic/1009280
Say goodbye to traditional data dashboards. In "WeiYu", everything starts with a simple question. You only need to submit your analysis requirements like a conversation.
- Insight Agent (Private Database Mining): AI agent for deep analysis of private public opinion databases
- Media Agent (Multimodal Content Analysis): AI agent with powerful multimodal capabilities
- Query Agent (Precise Information Search): AI agent with domestic and international web search capabilities
- Report Agent (Intelligent Report Generation): multi-round report generation AI agent with built-in templates
| Step | Phase Name | Main Operations | Participating Components | Loop Characteristics |
|---|---|---|---|---|
| 1 | User Query | Flask main app receives query | Flask main app | - |
| 2 | Parallel Startup | Three Agents start simultaneously | Query Agent, Media Agent, Insight Agent | - |
| 3 | Preliminary Analysis | Each Agent uses dedicated tools for overview search | Each Agent + dedicated toolset | - |
| 4 | Strategy Formulation | Develop segmented research strategy based on preliminary results | Internal decision module of each Agent | - |
| 5-N | Loop Phase | Forum Collaboration + Deep Research | ForumEngine + All Agents | Multiple Rounds |
| 5.1 | Deep Research | Each Agent conducts specialized search guided by forum moderator | Each Agent + reflection mechanism + forum guidance | Each round |
| 5.2 | Forum Collaboration | ForumEngine monitors Agent speeches and generates moderator summary | ForumEngine + LLM moderator | Each round |
| 5.3 | Communication Integration | Each Agent adjusts research direction based on discussion | Each Agent + forum_reader tool | Each round |
| N+1 | Result Integration | Report Agent collects all analysis results and forum content | Report Agent | - |
| N+2 | Report Generation | Dynamically selects templates and styles, generates final report in multiple rounds | Report Agent + template engine | - |
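The workflow above can be condensed into a minimal orchestration sketch. This is an illustrative skeleton only, with hypothetical names (`StubAgent`, `run_pipeline`); the real engines live in QueryEngine/, MediaEngine/, InsightEngine/, ForumEngine/, and ReportEngine/:

```python
from dataclasses import dataclass, field

@dataclass
class StubAgent:
    """Hypothetical stand-in for the Query/Media/Insight Agents."""
    name: str
    findings: list = field(default_factory=list)

    def preliminary_analysis(self, query: str) -> None:
        # Steps 3-4: overview search + segmented research strategy
        self.findings.append(f"{self.name}: overview of '{query}'")

    def deep_research(self, guidance: str) -> str:
        # Step 5.1: specialized search guided by the forum moderator
        note = f"{self.name} researched ({guidance})"
        self.findings.append(note)
        return note

def run_pipeline(query: str, rounds: int = 2) -> str:
    # Step 2: three Agents start in parallel
    agents = [StubAgent(n) for n in ("Query", "Media", "Insight")]
    for a in agents:
        a.preliminary_analysis(query)
    forum_log, guidance = [], "initial strategy"
    for _ in range(rounds):
        # Steps 5.1-5.3: research, moderator summary, direction adjustment
        speeches = [a.deep_research(guidance) for a in agents]
        guidance = "moderator summary of " + "; ".join(speeches)
        forum_log.append(guidance)
    # Steps N+1/N+2: Report Agent integrates findings and forum content
    return "\n".join(f for a in agents for f in a.findings) + "\n" + "\n".join(forum_log)

report = run_pipeline("Wuhan University public opinion")
```

In the real system the moderator summary comes from an LLM in ForumEngine/llm_host.py rather than string concatenation, and the Agents run as separate processes coordinated through logs.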
Weibo_PublicOpinion_AnalysisSystem/
├── QueryEngine/ # Domestic and international news breadth search Agent
│ ├── agent.py # Agent main logic
│ ├── llms/ # LLM interface wrapper
│ ├── nodes/ # Processing nodes
│ ├── tools/ # Search tools
│ ├── utils/ # Utility functions
│ └── ... # Other modules
├── MediaEngine/ # Powerful multimodal understanding Agent
│ ├── agent.py # Agent main logic
│ ├── nodes/ # Processing nodes
│ ├── llms/ # LLM interface
│ ├── tools/ # Search tools
│ ├── utils/ # Utility functions
│ └── ... # Other modules
├── InsightEngine/ # Private database mining Agent
│ ├── agent.py # Agent main logic
│ ├── llms/ # LLM interface wrapper
│ │ └── base.py # Unified OpenAI compatible client
│ ├── nodes/ # Processing nodes
│ │ ├── base_node.py # Base node class
│ │ ├── formatting_node.py # Formatting node
│ │ ├── report_structure_node.py # Report structure node
│ │ ├── search_node.py # Search node
│ │ └── summary_node.py # Summary node
│ ├── tools/ # Database query and analysis tools
│ │ ├── keyword_optimizer.py # Qwen keyword optimization middleware
│ │ ├── search.py # Database operation toolset
│ │ └── sentiment_analyzer.py # Sentiment analysis integration tool
│ ├── state/ # State management
│ │ ├── __init__.py
│ │ └── state.py # Agent state definition
│ ├── prompts/ # Prompt templates
│ │ ├── __init__.py
│ │ └── prompts.py # Various prompts
│ └── utils/ # Utility functions
│ ├── __init__.py
│ ├── config.py # Configuration management
│ └── text_processing.py # Text processing tools
├── ReportEngine/ # Multi-round report generation Agent
│ ├── agent.py # Agent main logic
│ ├── llms/ # LLM interface
│ ├── nodes/ # Report generation nodes
│ │ ├── template_selection.py # Template selection node
│ │ └── html_generation.py # HTML generation node
│ ├── report_template/ # Report template library
│ │ ├── 社会公共热点事件分析.md
│ │ ├── 商业品牌舆情监测.md
│ │ └── ... # More templates
│ └── flask_interface.py # Flask API interface
├── ForumEngine/ # Simple Forum Engine implementation
│ ├── monitor.py # Log monitoring and forum management
│ └── llm_host.py # Forum moderator LLM module
├── MindSpider/ # Weibo crawler system
│ ├── main.py # Crawler main program
│ ├── config.py # Crawler configuration file
│ ├── BroadTopicExtraction/ # Topic extraction module
│ │ ├── database_manager.py # Database manager
│ │ ├── get_today_news.py # Today's news acquisition
│ │ ├── main.py # Topic extraction main program
│ │ └── topic_extractor.py # Topic extractor
│ ├── DeepSentimentCrawling/ # Deep sentiment crawling
│ │ ├── keyword_manager.py # Keyword manager
│ │ ├── main.py # Deep crawling main program
│ │ ├── MediaCrawler/ # Media crawler core
│ │ └── platform_crawler.py # Platform crawler management
│ └── schema/ # Database structure
│ ├── db_manager.py # Database manager
│ ├── init_database.py # Database initialization
│ └── mindspider_tables.sql # Database table structure
├── SentimentAnalysisModel/ # Sentiment analysis model collection
│ ├── WeiboSentiment_Finetuned/ # Fine-tuned BERT/GPT-2 models
│ ├── WeiboMultilingualSentiment/# Multilingual sentiment analysis (recommended)
│ ├── WeiboSentiment_SmallQwen/ # Small parameter Qwen3 fine-tuning
│ └── WeiboSentiment_MachineLearning/ # Traditional machine learning methods
├── SingleEngineApp/ # Streamlit applications for individual Agents
│ ├── query_engine_streamlit_app.py
│ ├── media_engine_streamlit_app.py
│ └── insight_engine_streamlit_app.py
├── templates/ # Flask templates
│ └── index.html # Main interface frontend
├── static/ # Static resources
├── logs/ # Runtime logs directory
├── final_reports/ # Final generated HTML report files
├── utils/ # Common utility functions
│ ├── forum_reader.py # Inter-Agent forum communication
│ └── retry_helper.py # Network request retry mechanism tool
├── app.py # Flask main application entry
├── config.py # Global configuration file
└── requirements.txt # Python dependency list
If you're learning how to build an Agent system for the first time, you can start with a very simple demo: Deep Search Agent Demo
- Operating System: Windows, Linux, MacOS
- Python Version: 3.9+
- Conda: Anaconda or Miniconda
- Database: MySQL (optional cloud database service available)
- Memory: 2GB+ recommended
# Create conda environment
conda create -n your_conda_name python=3.11
conda activate your_conda_name

# Install basic dependencies
pip install -r requirements.txt
# Note: if you don't want to use the local sentiment analysis models (very low compute requirements; the CPU version is installed by default), comment out the "Machine Learning" section of requirements.txt before running the command above

# Install the browser driver (for crawler functionality)
playwright install chromium

Edit the config.py file and fill in your API keys (you can also choose your own models and search proxies; see details in the config file):
# MySQL database configuration
DB_HOST = "localhost"
DB_PORT = 3306
DB_USER = "your_username"
DB_PASSWORD = "your_password"
DB_NAME = "your_db_name"
DB_CHARSET = "utf8mb4"
# LLM configuration
# You can change the API used by each LLM section, as long as it's compatible with OpenAI request format
# Insight Agent
INSIGHT_ENGINE_API_KEY = "your_api_key"
INSIGHT_ENGINE_BASE_URL = "https://api.moonshot.cn/v1"
INSIGHT_ENGINE_MODEL_NAME = "kimi-k2-0711-preview"
# Media Agent
...

Option 1: Use Local Database
The MindSpider crawler system and the public opinion system are independent of each other, so you need to configure MindSpider\config.py separately.
# Local MySQL database initialization
cd MindSpider
python schema/init_database.py

Option 2: Use Cloud Database Service (Recommended)
We provide a convenient cloud database service, adding 100,000+ real public opinion records daily, currently free to apply!
- Real public opinion data, updated in real-time
- Multi-dimensional tag classification
- High-availability cloud service
- Professional technical support
Contact us to apply for free cloud database access: 📧 670939375@qq.com
Due to a data compliance review and service upgrade, the cloud database will stop accepting new applications from October 1, 2025.
# In the project root directory, activate conda environment
conda activate your_conda_name
# Start the main application
python app.py

Note 1: After a run terminates, the streamlit app may exit abnormally and continue to occupy its port. In this case, find the process occupying the port and kill it.
Note 2: Data crawling requires separate operations, see section 5.3 for guidance
Note 3: If page display issues occur during remote server deployment, see PR#45
Visit http://localhost:5000 to use the complete system
# Start QueryEngine
streamlit run SingleEngineApp/query_engine_streamlit_app.py --server.port 8503
# Start MediaEngine
streamlit run SingleEngineApp/media_engine_streamlit_app.py --server.port 8502
# Start InsightEngine
streamlit run SingleEngineApp/insight_engine_streamlit_app.py --server.port 8501

This section has detailed configuration documentation: MindSpider Usage Instructions
# Enter crawler directory
cd MindSpider
# Project initialization
python main.py --setup
# Run complete crawler workflow
python main.py --complete --date 2024-01-20
# Run topic extraction only
python main.py --broad-topic --date 2024-01-20
# Run deep crawling only
python main.py --deep-sentiment --platforms xhs dy wb

Each Agent has a dedicated configuration file that can be adjusted according to needs. Here are some examples:
# QueryEngine/utils/config.py
class Config:
max_reflections = 2 # Number of reflection rounds
max_search_results = 15 # Maximum search results
max_content_length = 8000 # Maximum content length
# MediaEngine/utils/config.py
class Config:
comprehensive_search_limit = 10 # Comprehensive search limit
web_search_limit = 15 # Web search limit
# InsightEngine/utils/config.py
class Config:
default_search_topic_globally_limit = 200 # Global search limit
default_get_comments_limit = 500 # Comment retrieval limit
    max_search_results_for_llm = 50         # Maximum results for LLM

# InsightEngine/tools/sentiment_analyzer.py
SENTIMENT_CONFIG = {
'model_type': 'multilingual', # Options: 'bert', 'multilingual', 'qwen', etc.
'confidence_threshold': 0.8, # Confidence threshold
'batch_size': 32, # Batch size
'max_sequence_length': 512, # Maximum sequence length
}

Supports any LLM provider compatible with the OpenAI call format. Simply fill in the corresponding KEY, BASE_URL, and MODEL_NAME in /config.py.
What is OpenAI call format? Here's a simple example:
from openai import OpenAI

client = OpenAI(api_key="your_api_key", base_url="https://api.siliconflow.cn/v1")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=[
        {"role": "user", "content": "What new opportunities will reasoning models bring to the market"}
    ],
)

complete_response = response.choices[0].message.content
print(complete_response)
The system integrates multiple sentiment analysis methods, which can be selected according to needs:
# Multilingual sentiment analysis (recommended)
cd SentimentAnalysisModel/WeiboMultilingualSentiment
python predict.py --text "This product is amazing!" --lang "en"

# Small-parameter Qwen3 fine-tuned model
cd SentimentAnalysisModel/WeiboSentiment_SmallQwen
python predict_universal.py --text "这次活动办得很成功"

# Use BERT Chinese model
cd SentimentAnalysisModel/WeiboSentiment_Finetuned/BertChinese-Lora
python predict.py --text "这个产品真的很不错"

# GPT-2 LoRA model
cd SentimentAnalysisModel/WeiboSentiment_Finetuned/GPT2-Lora
python predict.py --text "今天心情不太好"

# Traditional machine learning methods
cd SentimentAnalysisModel/WeiboSentiment_MachineLearning
python predict.py --model_type "svm" --text "服务态度需要改进"

# Add your business database configuration in config.py
BUSINESS_DB_HOST = "your_business_db_host"
BUSINESS_DB_PORT = 3306
BUSINESS_DB_USER = "your_business_user"
BUSINESS_DB_PASSWORD = "your_business_password"
BUSINESS_DB_NAME = "your_business_database"

# InsightEngine/tools/custom_db_tool.py
class CustomBusinessDBTool:
"""Custom business database query tool"""
def __init__(self):
self.connection_config = {
'host': config.BUSINESS_DB_HOST,
'port': config.BUSINESS_DB_PORT,
'user': config.BUSINESS_DB_USER,
'password': config.BUSINESS_DB_PASSWORD,
'database': config.BUSINESS_DB_NAME,
}
def search_business_data(self, query: str, table: str):
"""Query business data"""
# Implement your business logic
pass
def get_customer_feedback(self, product_id: str):
"""Get customer feedback data"""
# Implement customer feedback query logic
        pass

# Integrate custom tool in InsightEngine/agent.py
from .tools.custom_db_tool import CustomBusinessDBTool
class DeepSearchAgent:
def __init__(self, config=None):
# ... other initialization code
self.custom_db_tool = CustomBusinessDBTool()
def execute_custom_search(self, query: str):
"""Execute custom business data search"""
        return self.custom_db_tool.search_business_data(query, "your_table")

The system supports uploading custom template files (.md or .txt format), which can be selected when generating reports.
Create a new template in the ReportEngine/report_template/ directory. Our Agent will automatically select the most appropriate template.
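As a flavor of what automatic template selection involves, here is a hedged keyword-overlap sketch. The `pick_template` helper and its keyword sets are hypothetical; the real Report Agent delegates this choice to an LLM node (ReportEngine/nodes/template_selection.py):

```python
# Hypothetical keyword-overlap fallback for template selection.
# Maps template filenames from ReportEngine/report_template/ to keyword sets.
TEMPLATES = {
    "社会公共热点事件分析.md": {"event", "public", "hotspot", "incident"},
    "商业品牌舆情监测.md": {"brand", "business", "reputation", "product"},
}

def pick_template(query_terms: set) -> str:
    # Choose the template whose keyword set overlaps the query the most.
    return max(TEMPLATES, key=lambda name: len(TEMPLATES[name] & query_terms))

chosen = pick_template({"brand", "reputation", "analysis"})
```

An LLM-based selector generalizes far better than keyword overlap, which is why the project uses one; this sketch only conveys the shape of the decision.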
We welcome all forms of contributions!
- Fork the project to your GitHub account
- Create a feature branch: `git checkout -b feature/AmazingFeature`
- Commit changes: `git commit -m 'Add some AmazingFeature'`
- Push to the branch: `git push origin feature/AmazingFeature`
- Open a Pull Request
- Code follows PEP8 standards
- Commit messages use clear Chinese and English descriptions
- New features need to include corresponding test cases
- Update relevant documentation
Currently, the system has only completed the first two steps of the "three axes": submit requirements -> detailed analysis. The prediction step is still missing, and simply handing it off to an LLM would not be convincing.
After a long period of crawling and collection, we have accumulated a large amount of cross-platform trend data: how topic popularity changes over time, explosive points, and more. The conditions for building prediction models are now in place, and our team will apply its technical reserves in time series models, graph neural networks, and multimodal fusion to deliver truly data-driven public opinion prediction.
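At its very simplest, a popularity predictor is a one-step-ahead forecast over mention counts. The sketch below uses simple exponential smoothing as an illustration only; the `exp_smooth_forecast` helper and the sample counts are hypothetical, and the roadmap above targets far richer time-series and graph models:

```python
def exp_smooth_forecast(counts, alpha=0.5):
    """One-step-ahead forecast of topic popularity via simple exponential smoothing.

    alpha close to 1 weights recent observations heavily; close to 0 smooths more.
    """
    level = counts[0]
    for c in counts[1:]:
        level = alpha * c + (1 - alpha) * level
    return level

# Hypothetical hourly mention counts for a rising topic; forecast the next hour.
history = [10, 12, 18, 30, 55, 90]
next_hour = exp_smooth_forecast(history)  # lags behind the latest spike by design
```

Even this toy model shows why plain smoothing is insufficient for "explosive point" detection: it systematically underestimates rapid growth, which motivates the dedicated trend models on the roadmap.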
Important Reminder: This project is for learning, academic research, and educational purposes only
- Compliance Statement:
- All code, tools, and functions in this project are for learning, academic research, and educational purposes only
- It is strictly prohibited to use this project for any commercial purposes or profit-making activities
- It is strictly prohibited to use this project for any illegal, non-compliant, or rights-infringing behaviors
- Crawler Function Disclaimer:
- The crawler function in the project is for technical learning and research purposes only
- Users must comply with the target website's robots.txt protocol and terms of use
- Users must comply with relevant laws and regulations and not conduct malicious crawling or data abuse
- Any legal consequences arising from the use of crawler functions shall be borne by the user
- Data Use Disclaimer:
- The data analysis functions involved in the project are for academic research only
- It is strictly prohibited to use analysis results for commercial decisions or profit purposes
- Users should ensure the legality and compliance of the analyzed data
- Technical Disclaimer:
- This project is provided "as is" without any express or implied warranties
- The author is not responsible for any direct or indirect losses caused by using this project
- Users should evaluate the applicability and risks of the project themselves
- Liability Limitation:
- Users should fully understand relevant laws and regulations before using this project
- Users should ensure that their use behavior complies with local laws and regulations
- Any consequences arising from the use of this project in violation of laws and regulations shall be borne by the user
Please read and understand the above disclaimer carefully before using this project. Using this project indicates that you have agreed to and accepted all the above terms.
This project is licensed under the GPL-2.0 License. For detailed information, please refer to the LICENSE file.
- Project Homepage: GitHub Repository
- Issue Feedback: Issues Page
- Feature Suggestions: Discussions Page
- 📧 Email: 670939375@qq.com
- Enterprise Custom Development
- Big Data Services
- Academic Cooperation
- Technical Training
Thanks to these excellent contributors:




