PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System (2 views, 21 days ago)
LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings (3 views, 22 days ago)
The Illusion of Readiness: Stress Testing Large Frontier Models on Multimodal Medical Benchmarks (6 views, a month ago)