| 2026 |
FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes |
arXiv 2026 |
|
| 2026 |
TurboTalk: Progressive Distillation for One-Step Audio-Driven Talking Avatar Generation |
arXiv 2026 |
|
| 2026 |
SEDTalker: Emotion-Aware 3D Facial Animation Using Frame-Level Speech Emotion Diarization |
arXiv 2026 |
Code |
| 2026 |
AUHead: Realistic Emotional Talking Head Generation via Action Units Control |
ICLR 2026 |
|
| 2026 |
UniTalking: A Unified Audio-Video Framework for Talking Portrait Generation |
CVPR 2026 |
|
| 2026 |
DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization |
CVPR 2026 |
|
| 2026 |
ActAvatar: Temporally-Aware Precise Action Control for Talking Avatars |
CVPR 2026 |
|
| 2026 |
Cross-Modal Emotion Transfer for Emotion Editing in Talking Face Video |
CVPR 2026 |
|
| 2026 |
MMFace-DiT: A Dual-Stream Diffusion Transformer for High-Fidelity Multimodal Face Generation |
CVPR 2026 |
|
| 2025 |
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models |
arXiv 2025 |
Project |
| 2025 |
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modelling for Natural Talking Head Generation |
ICCV 2025 |
Project |
| 2025 |
OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation |
arXiv 2025 |
Code · Project |
| 2025 |
Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation |
CVPR 2025 |
|
| 2025 |
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion |
CVPR 2025 |
Project |
| 2025 |
INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations |
CVPR 2025 |
Project |
| 2025 |
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation |
ICLR 2025 |
Code |
| 2025 |
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency |
ICLR 2025 |
Project |
| 2025 |
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation |
ICLR 2025 |
Project · Code |
| 2025 |
AnyTalk: Multi-modal Driven Multi-domain Talking Head Generation |
AAAI 2025 |
|
| 2025 |
Occlusion-Insensitive Talking Head Video Generation via Facelet Compensation |
AAAI 2025 |
|
| 2025 |
FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation |
ICCV 2025 |
|
| 2025 |
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait |
ICCV 2025 |
Project |
| 2025 |
MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation |
CVPR 2025 |
|
| 2025 |
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length |
arXiv 2025 |
Code |
| 2025 |
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models |
arXiv 2025 |
|
| 2025 |
GAIA: Zero-shot Talking Avatar Generation |
arXiv 2025 |
|
| 2024 |
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis |
ICLR 2024 |
Project · Code |
| 2024 |
Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions |
arXiv 2024 |
Project · Code |
| 2024 |
Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style |
AAAI 2024 |
|
| 2024 |
Say Anything with Any Style |
AAAI 2024 |
|
| 2024 |
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting |
|
Code |
| 2024 |
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time |
NeurIPS 2024 |
Project |
| 2024 |
THQA: A Perceptual Quality Assessment Database for Talking Heads |
arXiv 2024 |
Code |
| 2024 |
Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior |
arXiv 2024 |
Code · Project |
| 2024 |
EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis |
arXiv 2024 |
Code · Project |
| 2024 |
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations |
arXiv 2024 |
Code |
| 2024 |
FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization |
arXiv 2024 |
|
| 2024 |
FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio |
arXiv 2024 |
Code |
| 2024 |
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation |
arXiv 2024 |
Code |
| 2024 |
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions |
arXiv 2024 |
Code · Project |
| 2024 |
RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network |
arXiv 2024 |
|
| 2024 |
Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation |
arXiv 2024 |
|
| 2024 |
Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement |
arXiv 2024 |
|
| 2024 |
FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model |
arXiv 2024 |
|
| 2024 |
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer |
arXiv 2024 |
|
| 2024 |
Style-Preserving Lip Sync via Audio-Aware Style Reference |
arXiv 2024 |
|
| 2024 |
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation |
arXiv 2024 |
Code · Project |
| 2024 |
Latent Diffusion Transformer for Talking Video Synthesis |
arXiv 2024 |
Code · Project |
| 2024 |
IF-MDM: Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation |
arXiv 2024 |
Project |
| 2024 |
Memory-Guided Diffusion for Expressive Talking Video Generation |
arXiv 2024 |
Project · Code |
| 2024 |
Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks |
arXiv 2024 |
|
| 2024 |
VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization |
arXiv 2024 |
|
| 2024 |
Towards Customizable One-Shot Audio-to-Talking Face Generation |
arXiv 2024 |
|
| 2024 |
LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync |
arXiv 2024 |
Code |
| 2024 |
Media2Face: Co-speech Facial Animation Generation with Multi-Modality Guidance |
SIGGRAPH 2024 |
|
| 2024 |
PersonaTalk: Bring Attention to Your Persona in Visual Dubbing |
SIGGRAPH Asia 2024 |
|
| 2024 |
StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads |
TPAMI 2024 |
|
| 2024 |
JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation |
BMVC 2024 |
|
| 2024 |
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis |
arXiv 2024 |
|
| 2024 |
JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics |
arXiv 2024 |
Code |
| 2024 |
HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level Conditions in Diffusion Models |
arXiv 2024 |
Code |
| 2024 |
LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details |
arXiv 2024 |
|
| 2023 |
Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation |
arXiv 2023 |
Project |
| 2023 |
DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis |
arXiv 2023 |
Project · Code |
| 2023 |
READ Avatars: Realistic Emotion-controllable Audio Driven Avatars |
arXiv 2023 |
|
| 2023 |
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder |
arXiv 2023 |
|
| 2023 |
Emotionally Enhanced Talking Face Generation |
arXiv 2023 |
Code |
| 2023 |
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert |
CVPR 2023 |
Code |
| 2023 |
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator |
CVPR 2023 |
Project · Code |
| 2023 |
GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation |
arXiv 2023 |
Project · Code |
| 2023 |
MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions |
ICCV 2023 |
|
| 2023 |
VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior |
arXiv 2023 |
Project · Code |
| 2023 |
IP_LAP: Identity-Preserving Talking Face Generation with Landmark and Appearance Priors |
CVPR 2023 |
Code |
| 2023 |
HyperLips: Hyper Control Lips with High Resolution Decoder for Talking Face Generation |
CVPR 2023 |
Code |
| 2023 |
Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation |
ICCV 2023 |
Project · Code |
| 2023 |
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Talking Head Animation |
CVPR 2023 |
Project · Code |
| 2023 |
DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video |
AAAI 2023 |
Code |
| 2023 |
EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation |
ICCV 2023 |
|
| 2023 |
ToonTalker: Cross-Domain Face Reenactment |
ICCV 2023 |
|
| 2023 |
High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning |
CVPR 2023 |
|
| 2023 |
DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions |
ICASSP 2023 |
Code |
| 2022 |
Expressive Talking Head Generation with Granular Audio-Visual Control |
CVPR 2022 |
|
| 2022 |
Talking Face Generation with Multilingual TTS |
CVPR 2022 |
Demo |
| 2022 |
EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model |
SIGGRAPH 2022 |
|
| 2022 |
SPACEx 🚀: Speech-driven Portrait Animation with Controllable Expression |
arXiv 2022 |
Project |
| 2022 |
Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers |
SIGGRAPH Asia 2022 |
|
| 2022 |
Memories are One-to-Many Mapping Alleviators in Talking Face Generation |
arXiv 2022 |
|
| 2021 |
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation |
CVPR 2021 |
Code · Project |
| 2021 |
Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis |
ACM Multimedia 2021 |
|
| 2021 |
Audio-Driven Emotional Video Portraits |
CVPR 2021 |
Code |
| 2021 |
Talking Head Generation with Audio and Speech Related Facial Action Units |
arXiv 2021 |
|
| 2021 |
Speech2Talking-Face: Inferring and Driving a Face with Synchronized Audio-Visual Representation |
IJCAI 2021 |
|
| 2021 |
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation |
ACM TOG 2021 |
Code |
| 2021 |
Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion |
arXiv 2021 |
|
| 2020 |
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild |
ACM Multimedia 2020 |
Code · Project |
| 2020 |
Talking-head Generation with Rhythmic Head Motion |
ECCV 2020 |
Code |
| 2020 |
MakeItTalk: Speaker-Aware Talking-Head Animation |
SIGGRAPH Asia 2020 |
Code · Project |
| 2020 |
Neural Voice Puppetry: Audio-driven Facial Reenactment |
ECCV 2020 |
Code · Project |
| 2020 |
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation |
ECCV 2020 |
Code · Project |
| 2020 |
Realistic Speech-Driven Facial Animation with GANs |
IJCV 2020 |
|
| 2019 |
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation |
AAAI 2019 |
Code |
| 2019 |
Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss |
CVPR 2019 |
Code |
| 2018 |
Lip Movements Generation at a Glance |
ECCV 2018 |
Code |
| 2018 |
VisemeNet: Audio-Driven Animator-Centric Speech Animation |
SIGGRAPH 2018 |
|
| 2017 |
Synthesizing Obama: Learning Lip Sync From Audio |
SIGGRAPH 2017 |
Project |
| 2017 |
You Said That?: Synthesising Talking Faces From Audio |
IJCV 2019 |
Code |
| 2017 |
Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion |
SIGGRAPH 2017 |
|
| 2017 |
A Deep Learning Approach for Generalized Speech Animation |
SIGGRAPH 2017 |
|
| 2016 |
Lip Reading in the Wild |
ACCV 2016 |
|