[DMP 2025]: Standardizing and Enhancing the Feluda Python Packages #516

@aatmanvaidya

Description

Feluda is a configurable engine for analyzing multi-lingual and multi-modal content. It allows researchers, fact-checkers and journalists to explore and analyze large quantities of multimedia content. Feluda has a component called operators, which are built to process data in various modalities (text, audio, video, images, hybrid) and in various languages. Each operator is a Python package, and the monorepo uses a multi-package system. As Feluda continues to grow, ensuring consistency and robustness across these packages is crucial for long-term maintainability and ease of contribution.
The goal of the project is to improve the internal structure of Feluda packages, enhance documentation, and optimize performance, laying the groundwork for a stable v1.0.0 release.

Goals

  • Standardize interfaces and functions across all Feluda Python packages.
  • Write comprehensive documentation for all packages.
  • Achieve close to 100% test coverage with unit and integration tests.
  • Enhance package build robustness and integration capabilities with other applications.
  • Create practical tutorials (Jupyter notebooks) and use cases for fact-checkers and researchers.

Expected Outcome

  • Consistent APIs across all packages with standardized functions
  • Basic documentation hosted on Read the Docs covering package functionality, input/output formats, and detailed use cases.
  • Improved error handling with clear, actionable messages for the user
  • A collection of tutorials and notebooks demonstrating real-world applications, such as extracting text from newspaper images or clustering large volumes of video to detect social thematic labels.
  • Expanded test coverage for better reliability

Acceptance Criteria

  • Write tests to validate compliance of the current Feluda operators
  • Write documentation and Python notebooks to demonstrate use of the current operators
  • Participate in weekly update calls and demonstrate progress
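
As a rough illustration of what a compliance test could look like, the sketch below checks that an operator exposes callable `init` and `run` attributes, the two function names proposed under Implementation Details. The helper `missing_interface_functions` and the two stand-in operators are hypothetical and are not part of Feluda; the final interface may well differ.

```python
from types import SimpleNamespace

# Function names taken from the proposal; the final interface may differ.
REQUIRED_FUNCTIONS = ("init", "run")


def missing_interface_functions(operator):
    """Return the required function names that are absent or not callable."""
    return [
        name
        for name in REQUIRED_FUNCTIONS
        if not callable(getattr(operator, name, None))
    ]


# Hypothetical stand-ins for real operator modules.
compliant_operator = SimpleNamespace(init=lambda: None, run=lambda item: item)
incomplete_operator = SimpleNamespace(run=lambda item: item)

assert missing_interface_functions(compliant_operator) == []
assert missing_interface_functions(incomplete_operator) == ["init"]
```

A check like this could be run once per operator package, so that a newly added operator fails fast if it does not follow the agreed interface.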

Implementation Details

  1. Standardization:
    • Define a common interface (API) for all packages with init() and run() functions.
    • Implement proper error handling and warnings.
    • Ensure Feluda raises exceptions when working with incompatible operators.
  2. Documentation:
    • Create a Read the Docs template covering each package's purpose, function signatures, expected inputs/outputs, and common use cases.
    • Develop example notebooks (Google Colab/Marimo) showcasing practical applications for researchers and fact-checkers.
  3. Robustness:
    • Optimize model loading processes to reduce memory footprint and improve runtime efficiency.
  4. Testing:
    • Expand test coverage to include integration tests across packages.
    • Implement CI GitHub Action pipelines to run tests and other safety checks.
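
The standardization and robustness points above can be sketched together as follows. This is only a minimal illustration under stated assumptions, not Feluda's actual code: the names `Operator`, `OperatorError`, `IncompatibleOperatorError`, `register`, `load_model`, `process` and the toy `UppercaseText` operator are all hypothetical. It shows a shared `init()`/`run()` interface, a model that is loaded once in `init()` rather than at construction time, and explicit exceptions with actionable messages.

```python
from abc import ABC, abstractmethod


class OperatorError(Exception):
    """Base class for operator errors (hypothetical name)."""


class IncompatibleOperatorError(OperatorError):
    """Raised when an object does not follow the common interface."""


class Operator(ABC):
    """Common interface: init() prepares resources, run() processes one item."""

    def __init__(self):
        self._model = None  # nothing heavy is loaded at construction time

    @abstractmethod
    def load_model(self):
        """Return the (possibly large) model object."""

    @abstractmethod
    def process(self, model, item):
        """Apply the loaded model to a single item."""

    def init(self):
        # Load the model once; repeated calls reuse the cached object.
        if self._model is None:
            self._model = self.load_model()

    def run(self, item):
        if self._model is None:
            raise OperatorError(
                f"{type(self).__name__}.run() was called before init(); "
                "call init() first."
            )
        return self.process(self._model, item)


def register(operator):
    """Accept only objects that implement the common interface."""
    if not isinstance(operator, Operator):
        raise IncompatibleOperatorError(
            f"{type(operator).__name__} does not implement the Operator "
            "interface (expected init() and run())."
        )
    return operator


class UppercaseText(Operator):
    """Toy operator for illustration; its 'model' is just str.upper."""

    load_count = 0  # counts how often the model is actually loaded

    def load_model(self):
        UppercaseText.load_count += 1
        return str.upper

    def process(self, model, item):
        return model(item)


op = register(UppercaseText())
op.init()
op.init()  # second call does not reload the model
print(op.run("feluda"))  # prints "FELUDA"
```

Keeping model loading out of the constructor means that importing or instantiating an operator stays cheap, and the memory cost is paid only when `init()` is called.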

Product Name

Feluda

Organisation Name

Tattle

Domain

Open Source Library

Tech Skills Needed

Python, Object-Oriented Programming, Machine Learning, Performance Improvement, Docker, Testing, Technical Writing.

Mentor

Denny George (@dennyabrain), Aatman Vaidya (@aatmanvaidya)

Category

Data Science, Machine Learning
