🏏 Cricket Centuries Analysis - Automated PPT Generation 📊

A project demonstrating the automated generation of PowerPoint presentations from web-scraped cricket data, showcasing data analysis and visualization skills.

Table of Contents
  1. Overview
  2. Key Skills Demonstrated
  3. Project Details
    1. Data Acquisition
    2. Data Processing
    3. PPT Generation
    4. Interactive Visualizations (HTML/Plotly)
    5. Code Structure
      1. How to Add Changes?
    6. How to Run?
  4. PPT Analysis Summary
  5. Dependencies
  6. Technical Considerations
  7. Conclusion
  8. Contact

Overview

This project showcases my ability to automate the generation of PowerPoint presentations from web-scraped data. I developed a Python-based pipeline that extracts cricket statistics, processes them, and creates visually appealing PPTs with data visualizations.

(back to top)

Key Skills Demonstrated

  • Web Scraping
  • Pandas
  • python-pptx
  • Matplotlib
  • HTML5
  • Plotly
  • Python
  • OOP
  • Automation

(back to top)

Project Details

1. Data Acquisition

  • I employed web scraping techniques to gather data on cricket players and their century records from various online sources.
  • The collected data encompasses player details (name, date of birth, place of birth, family information), career statistics (total centuries, centuries per year), and match information.
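
As a minimal sketch of the table-extraction part of this step, the following parses an HTML table into per-row records using only the standard library. The sample HTML, the column names, and the `TableParser` and `parse_table` names are hypothetical; the project's actual sources and scraping code are not shown in this README.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return one dict per body row, keyed by the header row's cell text."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment standing in for a scraped statistics table.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Year</th><th>Centuries</th></tr>
  <tr><td>Player A</td><td>2019</td><td>5</td></tr>
  <tr><td>Player B</td><td>2019</td><td>3</td></tr>
</table>
"""

records = parse_table(SAMPLE_HTML)
```

Every value comes back as a string, so type conversion is left to the data processing stage.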

(back to top)

2. Data Processing

  • The scraped data was organized into Pandas DataFrames for manipulation.
  • Data cleaning and transformation steps were performed to address missing values and ensure data integrity.
  • The processed DataFrames were saved as Excel files (personal_data.xlsx and processed_data.xlsx) for intermediate storage.
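
The cleaning step might look roughly like the sketch below. The column names, the sample rows, and the `clean_centuries` helper are assumptions for illustration, and treating a missing century count as 0 is one possible policy, not necessarily the one this project uses.

```python
import pandas as pd

# Hypothetical scraped rows: everything arrives as text, some values are missing.
raw = pd.DataFrame(
    {
        "player": ["Player A", "Player A", "Player B", None],
        "year": ["2019", "2020", "2019", "2020"],
        "centuries": ["5", None, "3", "2"],
    }
)


def clean_centuries(df):
    """Drop rows without a player, coerce types, and treat missing counts as 0."""
    out = df.dropna(subset=["player"]).copy()
    out["year"] = pd.to_numeric(out["year"], errors="coerce").astype("Int64")
    out["centuries"] = (
        pd.to_numeric(out["centuries"], errors="coerce").fillna(0).astype(int)
    )
    return out.reset_index(drop=True)


processed = clean_centuries(raw)
totals = processed.groupby("player")["centuries"].sum()

# Writing the intermediate file needs an Excel engine such as openpyxl:
# processed.to_excel("processed_data.xlsx", index=False)
```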

(back to top)

3. PPT Generation

  • I developed a Python script leveraging the python-pptx library to automate the creation of visually informative PowerPoint presentations.
  • The script dynamically reads data from the generated Excel files to create individual slides for each player, including:
    • Comprehensive player information
    • Detailed century statistics
    • Well-structured tables summarizing key data points
  • Matplotlib was utilized to generate insightful visualizations of century trends over the years, presented as clear bar and line plots within the PPT.

(back to top)

4. Interactive Visualizations (HTML/Plotly)

  • To provide a more engaging data exploration experience, I created interactive visualizations using Plotly and embedded them in separate HTML pages.
  • These dynamic graphs allow for features like zooming and tooltips, enabling deeper data analysis.
  • (Note: The generated PPTs are static. The interactive graphs reside in separate HTML files. Future integration could involve linking from the PPT or exporting the PPT to an HTML format.)

(back to top)

5. Code Structure

The project's codebase is thoughtfully structured for clarity and maintainability:

  • runner.ipynb: This Jupyter Notebook orchestrates the entire PPT creation process, acting as the main execution point.
  • prepare_data.ipynb: This notebook manages the data gathering and web scraping phases of the project.
  • ppt_generator.py (Class): This Python class is responsible for data transformation, the generation of static graphs (for the PPT), and the creation of interactive HTML versions of these graphs.
  • custom_presentation.py (Class): This class handles the styling of the PowerPoint presentation, the creation of individual slides, and the population of these slides with text, tables, and images.

5.1 How to Add Changes?

  • runner.ipynb: Modify the PPT_DATA variable within this file to adjust the filters applied when generating the PPTs (e.g., specific player groups or data ranges).
  • prepare_data.ipynb: Update this file to modify the data sources or the web scraping logic to work with different or updated cricket statistics.
  • ppt_generator.py (Class): Alter the data filtering and transformation logic within this class. You can also customize the appearance of the static graphs (for the PPT) and the interactive Plotly graphs (in HTML) here.
  • custom_presentation.py (Class): Modify this file to change the overall style of the generated PowerPoint presentations, including the logo, color scheme, slide layouts, and font styles.
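
The actual structure of `PPT_DATA` is not shown in this README. The sketch below assumes one plausible shape, a list with one entry per output deck where each entry carries its own filters, to illustrate how changing the variable would change which players end up in which file. The `select` helper and the sample rows are likewise hypothetical.

```python
# Hypothetical shape for PPT_DATA: one entry per output deck.
PPT_DATA = [
    {"output": "player_Male.pptx", "filters": {"gender": "Male"}},
    {"output": "player_Female.pptx", "filters": {"gender": "Female"}},
]

# Hypothetical rows standing in for the processed player data.
players = [
    {"name": "Player A", "gender": "Male", "centuries": 12},
    {"name": "Player B", "gender": "Female", "centuries": 9},
    {"name": "Player C", "gender": "Male", "centuries": 4},
]


def select(rows, filters):
    """Keep rows whose fields equal every value in `filters`."""
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in filters.items())
    ]


# Map each output file to the players it would contain.
plan = {entry["output"]: select(players, entry["filters"]) for entry in PPT_DATA}
```

Under this assumption, adding a third deck is a matter of appending another entry with its own `output` name and `filters`.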

(back to top)

6. How to Run?

  1. prepare_data.ipynb: Execute this notebook first to fetch and process the latest cricket data. This step will generate or update the personal_data.xlsx and processed_data.xlsx files.
  2. runner.ipynb or main.py: After prepare_data.ipynb has run successfully (or if you already have the personal_data.xlsx and processed_data.xlsx files), execute the notebook or the Python script. Either one triggers the PPT generation process and creates the output PowerPoint files.
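
Whether step 1 can be skipped depends on the two Excel files being present. A small check like the one below, which is not part of the project and assumes the files live in the working directory, makes that decision explicit.

```python
from pathlib import Path

REQUIRED_FILES = ("personal_data.xlsx", "processed_data.xlsx")


def missing_inputs(data_dir="."):
    """Return the required Excel files that are absent from `data_dir`."""
    return [
        name for name in REQUIRED_FILES
        if not (Path(data_dir) / name).is_file()
    ]


missing = missing_inputs()
if missing:
    print("Run prepare_data.ipynb first; missing:", ", ".join(missing))
else:
    print("Inputs found; runner.ipynb or main.py can be executed.")
```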

(back to top)

PPT Analysis Summary

The project currently generates two distinct PowerPoint presentations based on player gender: "player_Male.pptx" containing analysis for male cricket players and "player_Female.pptx" for female players.

(back to top)

Dependencies

This project requires Python and the following libraries. Ensure they are installed in your environment:

  • Matplotlib
  • python-pptx
  • Pandas
  • Plotly
  • SciPy

You can install these dependencies using pip:

pip install matplotlib python-pptx pandas plotly scipy
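
To confirm the environment before running the notebooks, a check along these lines can help. It is a sketch rather than part of the project; note that the `python-pptx` distribution is imported as `pptx`, so the pip name and the module name differ.

```python
from importlib.util import find_spec

# pip distribution name -> module name used in `import`
REQUIRED = {
    "matplotlib": "matplotlib",
    "python-pptx": "pptx",
    "pandas": "pandas",
    "plotly": "plotly",
    "scipy": "scipy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name for pip_name, module in required.items()
        if find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("pip install " + " ".join(missing))
```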

(back to top)

Technical Considerations

  • Robust error handling has been implemented within the web scraping and data processing stages to ensure the pipeline's stability.
  • The codebase is designed with a modular architecture to enhance maintainability and facilitate future extensions.
  • Potential future improvements include:
    • Implementing dynamic data updates to keep the presentations current.
    • Utilizing external configuration files for easier customization.
    • Further breaking down the code into smaller, more specialized modules.
    • Adding comprehensive logging for better monitoring and debugging.
    • Incorporating unit tests to ensure the reliability of individual components.
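
As an illustration of how error handling and the proposed logging could fit together in the scraping stage, the sketch below retries a failing download and logs each failure. The `with_retries` helper, the logger name, and the `flaky_fetch` stand-in are hypothetical and do not describe the project's existing code.

```python
import logging
import time

logger = logging.getLogger("cricket_ppt")


def with_retries(fetch, attempts=3, delay=0.0):
    """Call `fetch()` up to `attempts` times, logging each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:  # narrow this to the errors your scraper raises
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # give up and surface the last error
            time.sleep(delay)


calls = {"count": 0}


def flaky_fetch():
    """Stand-in for a page download that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "<html>ok</html>"


page = with_retries(flaky_fetch)
```

Because the helper takes any callable, it can also be exercised in a unit test without network access, as the `flaky_fetch` stand-in does here.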

(back to top)

Conclusion

This project effectively demonstrates my ability to integrate diverse technical skills to create a fully automated workflow, from extracting raw data to generating insightful and visually appealing presentations. I am enthusiastic about leveraging and further developing these skills in a professional environment.

(back to top)

Contact

Feel free to reach out if you have any questions, suggestions, or would like to collaborate!

  • Name
  • Email
  • LinkedIn
  • GitHub

(back to top)