Stars
Code for our (sebis) submission to ArchEHR-QA 2026 Shared Task (CL4Health @ LREC 2026)
[CVPR 2026] Garments2Look: A Multi-Reference Dataset for High-Fidelity Outfit-Level Virtual Try-On with Clothing and Accessories
[arXiv 2026] Official implementation of "MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization"
Mind the Shift: Decoding Monetary Policy Stance from FOMC Statements with Large Language Models
Motivation in LLMs - code and data
Unified KV cache management for multi-task VLA inference.
Sparsity as a Variance Regulator for Improved Depth Utilization in Language Models
A benchmark to measure AI progress on unsolved research problems in mathematics.
Reinforcing Grounded Video Reasoning via Visual-Perception Prompting
Project page for "Training-free Detection of Generated Videos via Spatio-Temporal Likelihoods" [CVPR 2026]
Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning
WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics
SING-analyzing-semantic-invariants-classifiers
[arXiv] Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models
Meissa is a multi-modal medical agent built on a trajectory-based agentic behavior distillation framework.
ATM-Bench: A benchmark for long-term personalized memory QA spanning ~4 years of multimodal data (images, videos, emails). Features referential queries, evidence-grounded answering, and multi-sourc…
Code for the paper: "RbtAct: Rebuttal as Supervision for Actionable Review Generation"
[CVPR 2026] CodePercept: Code-Grounded Visual STEM Perception for MLLM
Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams
RETROAGENT: From Solving to Evolving via Retrospective Dual Intrinsic Feedback
Generate high resolution videos with a custom voice and appearance, based on LTX-2/LTX-2.3 + Identity In Context LoRA
Code for "LLM2VEC-GEN: Generative Embeddings from Large Language Models"
MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents
In-Context Reinforcement Learning for Tool Use in Large Language Models
Satellite-based causal attribution of coastal water clarity degradation to nickel smelting expansion at Indonesia's Morowali Industrial Park using Bayesian structural time series, multi-algorithm c…



