Note: Changes should only be made in the
masterbranch, asgh-pagesis the publishing branch.
personalAggregator.py is a Python script designed to aggregate content from various sources based on user-defined rules, and generate markdown files suitable for static site generators like Jekyll.
- Python 3.x
- pip (Python package installer)
- socialModules
-
Clone the repository:
git clone https://github.com/your-username/personalAggregator.git cd personalAggregator -
Install dependencies:
pip install -r requirements.txt
The script uses a JSON file (e.g., rules.json) to define the aggregation rules. This file should contain a list of rule objects, where each object specifies how to fetch and process content from a particular URL.
Each rule object can have the following properties:
url(string, required): The URL of the page to scrape.selector(string, required): A CSS selector to identify the main content block for each item.title_selector(string, required): A CSS selector to extract the title of each item within its content block.date_selector(string, required): A CSS selector to extract the date of each item within its content block.date_format(string, optional): The format of the date string (e.g.,%Y-%m-%d). If omitted,dateparserwill attempt to parse the date automatically.tags(array of strings, optional): A list of tags to apply to the generated markdown post.output_filename_prefix(string, optional): A prefix to use for the generated markdown filename.
Example rules.json:
[
{
"url": "https://example.com/blog",
"selector": ".post-item",
"title_selector": ".post-title a",
"date_selector": ".post-date",
"date_format": "%Y-%m-%d",
"tags": ["blog", "example"],
"output_filename_prefix": "example-blog"
},
{
"url": "https://anothersite.com/news",
"selector": "article.news-entry",
"title_selector": "h2.entry-title",
"date_selector": ".entry-meta .date",
"tags": ["news"],
"output_filename_prefix": "another-news"
}
]Run the personalAggregator.py script from the root of the repository.
python _bin/personalAggregator.py [OPTIONS]Options:
--config-file <path>: Path to your JSON configuration file (e.g.,rules.json). Required.--output-dir <path>: Directory where the generated markdown post files will be saved (e.g.,_posts/). Required.--num-posts <number>: The maximum number of posts to generate for each rule. Defaults to 1.--log-level <level>: Set the logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL). Defaults to INFO.
Example Command:
To generate 5 posts from your rules.json file and save them to the _posts/ directory with INFO level logging:
python _bin/personalAggregator.py --config-file rules.json --output-dir _posts/ --num-posts 5 --log-level INFO