EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control (7 views, 2 months ago)
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design (7 views, 2 months ago)
GOEDEL-PROVER-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction (2 views, 2 months ago)
DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis (1 view, 2 months ago)
LIVEMCP-101: Stress Testing and Diagnosing MCP-Enabled Agents on Challenging Queries (3 views, 2 months ago)