Paper Deep Dive Hugging Face 2026-05-25 HF ↑48
While spatial foundation models have demonstrated impressive performance on standard datasets, a critical question remains: are they truly all-round players capable of generalizing robustly across diverse downstream tasks, arbitrary viewpoints, shifting scene domains, varying input densities, and sp...
#alignment#robotics#benchmark
Model Deep Dive OpenAI 2026-05-27
Warp uses GPT-5.5 and OpenAI models to coordinate coding agents across local, cloud, and open-source development workflows....
#agent#coding
Paper Hugging Face 2026-05-25 HF ↑4
Large Language Models (LLMs) are increasingly deployed as autonomous agents that reason, use tools, and act over multiple steps. Yet most hallucination benchmarks still evaluate only the final output, missing failures that originate in intermediate Thought-Action-Observation steps. We present Trajel...
#agent#benchmark#llm
Paper Hugging Face 2026-05-25 HF ↑17
We introduce the MiniMax-M2 series, a family of Mixture-of-Experts language models built around the principle that mini activations can unleash maximum real-world intelligence. The flagship M2 contains 229.9B total parameters with only 9.8B activated per token. Designed end-to-end for agentic deploy...
#agent#coding#rl#benchmark
Paper Hugging Face 2026-05-25 HF ↑13
Generative models have emerged as a powerful paradigm for solving physics systems and modeling complex spatiotemporal dynamics. However, achieving high physical accuracy without incurring high computational cost remains a fundamental challenge, as existing approaches face a critical speed-fidelity t...
#diffusion#benchmark
Paper Hugging Face 2026-05-25 HF ↑8
Agentic reinforcement learning (RL) has proven effective for training LLM-based agents with external tool-use capabilities. However, we identify that agentic RL training induces increasing redundant tool calls and blurs the model's intrinsic knowledge boundary, where the model fails to distinguish w...
#agent#rl#llm#benchmark
Paper Hugging Face 2026-05-25 HF ↑18
Test-Time Scaling (TTS) enhances the reasoning capabilities of large language models by allocating additional inference compute to explore the solution space. However, existing parallel TTS methods typically keep branches isolated during search: intermediate discoveries remain branch-private and can...
#llm#benchmark
Paper Deep Dive Hugging Face 2026-05-25 HF ↑7
Large language models (LLMs) have evolved into interactive agents that collaborate with users in real-world tasks. Effective collaboration in such settings increasingly depends on understanding the user beyond what is explicitly stated, as user intent is often reflected in fragmented daily interacti...
#agent#benchmark#llm
Paper Deep Dive Hugging Face 2026-05-25 HF ↑3
We introduce Gemini Embedding 2, a native multimodal embedding model that allows embedding video, audio, image, and text modalities in a unified representation space. We leverage the multimodal capabilities of Gemini to produce embeddings for arbitrary combinations of interleaved inputs across all t...
#multimodal#benchmark
Paper Hugging Face 2026-05-25 HF ↑65
Vision-language models (VLMs) commonly formulate visual grounding and detection as a coordinate-token generation problem, serializing each 2D box into multiple 1D tokens that are learned and decoded largely independently. This token-by-token decoding mismatches the coupled structure of box geometry ...
#coding#multimodal#benchmark
Industry News OpenAI 2026-05-27
Cisco and OpenAI are redefining enterprise engineering with Codex, helping Cisco scale AI-native development, accelerate AI Defense work, and automate defect remediation....
Industry News OpenAI 2026-05-27
See how OpenAI, Thrive, and Crete built a self-improving tax agent with Codex, automating filings, improving accuracy, and accelerating workflows....
#agent
Paper Hugging Face 2026-05-25 HF ↑6
Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evo...
#agent#benchmark#llm
Industry News Microsoft Research 2026-05-27
Understanding AI as an extension of human intelligence, not a replacement for it, offers a more grounded path for building trustworthy AI systems. The post "Extending Human Intelligence Through AI" appeared first on Microsoft Research....
Industry News OpenAI 2026-05-27
Ahead of global elections, we’re helping people access information, supporting cyber defenders, and increasing AI transparency...
Industry News Hugging Face 2026-05-27
ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM...
#agent#benchmark
Industry News Hugging Face 2026-05-27
Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL...
Paper Deep Dive arXiv 2026-05-26
Deep unfolding neural networks derived from iterative optimization schemes and numerical ordinary/partial differential equations (ODEs/PDEs) have attracted much attention in data science over the last decade. Therein, numerous important network architectures were constructed from the basic forward-b...
Industry News NVIDIA 2026-05-27
AI factories are token factories, converting power into intelligence in real time. And as agentic AI scales and autonomous, always-on special agents are deployed in the enterprise, performance per watt and cost per token become the economics that matter....
#agent
Industry News Google Research 2026-05-27
Security, Privacy and Abuse Prevention...