ML
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
Decoupling Reasoning from Observations for Efficient Augmented Language Models
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
A comprehensive guide to building RAG-based LLM applications for production.
A multi-voice TTS system trained with an emphasis on quality
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
kaldi-asr/kaldi is the official location of the Kaldi project.
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
A fast local neural text to speech engine for Mycroft
Mycroft Core, the Mycroft Artificial Intelligence platform.
Offline private voice assistant for many human languages
Robust Speech Recognition via Large-Scale Weak Supervision
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
We write your reusable computer vision tools. 💜
The code for some apps built with Sieve.
Mora: More like Sora for Generalist Video Generation
TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.
Paper Piano uses Python and OpenCV to detect key presses on a hand-drawn piano, translating them into digital notes and sound.
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Official inference framework for 1-bit LLMs
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: …
A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.






