2026-05-29

20 items

Paper deep dive Hugging Face 2026-05-27 HF ↑76

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Modern open-world agents such as OpenClaw exhibit powerful cross-environment execution capabilities yet introduce broad new safety risk sources. Meanwhile, advanced frontier AI models drastically lower attack barriers, rendering current agent alignment frameworks inadequate for real-world deployment...

#agent #alignment #rl

Paper deep dive Hugging Face 2026-05-27 HF ↑35

minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models

Recent video diffusion foundation models have achieved remarkable progress in high-quality video generation, yet turning them into real-time interactive video world models remains challenging. Interactive world models require controllable, causal, and low-latency rollout, which in practice demands a...

#diffusion #fine-tuning

Paper deep dive Hugging Face 2026-05-27 HF ↑45

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Embodied intelligence is often studied through specialized models for individual tasks such as manipulation or navigation, resulting in fragmented capabilities and limited generalization across tasks, environments, and robot embodiments. In this work, we study whether heterogeneous embodied decision...

#robotics #benchmark

Paper Hugging Face 2026-05-27 HF ↑46

OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources

Real-world information needs require access to structurally diverse knowledge sources, from unstructured text and relational tables to knowledge graphs and property graphs. Existing retrievers, however, operate over one source at a time under a fixed query language, leaving the broader landscape of ...

#benchmark

Paper Hugging Face 2026-05-27 HF ↑14

LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training

Reinforcement learning (RL) post-training has been shown to improve reasoning in large language models (LLMs). However, there has been little exploration of the problem of data contamination in RL post-training, potentially undermining generalization and evaluation reliability of the training process its...

#llm #rl #benchmark

Paper Hugging Face 2026-05-27 HF ↑12

When Should Models Change Their Minds? Contextual Belief Management in Large Language Models

Long-horizon interactions require language models to manage accumulating information: when to update their state, when to preserve their state, and what to ignore. We study this challenge as Contextual Belief Management (CBM): maintaining a predicted belief state aligned with formal evidence while i...

#llm #rl #benchmark

Paper Hugging Face 2026-05-27 HF ↑23

GenClaw: Code-Driven Agentic Image Generation

Image generation models have evolved from text-conditioned pixel synthesis toward multimodal agents endowed with visual comprehension and tool invocation capabilities. Yet, existing agents remain at the mercy of underlying black-box image models. Their workflow is trapped in a repetitive cycle of pr...

#agent #vision #llm #multimodal

Paper Hugging Face 2026-05-27 HF ↑16

How LoRA Remembers? A Parametric Memory Law for LLM Finetuning

Large Language Models (LLMs) must continuously learn and update knowledge to remain effective in dynamic real-world environments. While Low-Rank Adaptation (LoRA) is widely used for such memory updates, existing studies mainly rely on qualitative downstream evaluations, leaving the quantitative capa...

#llm #benchmark #fine-tuning #coding

Paper Hugging Face 2026-05-27 HF ↑16

LoMo: Local Modality Substitution for Deeper Vision-Language Fusion

Vision-Language Models (VLMs) have achieved substantial progress across a wide range of understanding and reasoning tasks, driven by large-scale image-text training aimed at multimodal fusion. Ideally, replacing a textual question with its rendered-image counterpart should leave model performance es...

#multimodal #benchmark

Paper Hugging Face 2026-05-27 HF ↑6

When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

The design space of agentic AI inference spans two extremes: frontier large language models (LLMs), typically hosted in the cloud and offering strong performance across a wide range of tasks at substantially high cost, and more cost-efficient small language models (SLMs), which are amenable to on-de...

#agent #llm #benchmark

Company news OpenAI 2026-05-29

Boston Children’s uses AI to unlock new diagnoses

Boston Children’s Hospital uses OpenAI technology to improve patient care, reduce operational burden, and help diagnose more than 40 rare disease cases....

Model OpenAI 2026-05-29

Strengthening societal resilience with Rosalind Biodefense

OpenAI launches Rosalind Biodefense, expanding trusted access to GPT-Rosalind for vetted developers and U.S. government partners advancing biodefense, public health, and pandemic preparedness through frontier AI....

Company news OpenAI 2026-05-29

A shared playbook for trustworthy third party evaluations

OpenAI shares guidance on third-party AI evaluations, covering how to assess model capabilities, safeguards, and validity for frontier systems....

#benchmark

Model OpenAI 2026-05-29

How Braintrust turns customer requests into code with Codex

How Braintrust engineers use Codex with GPT-5.5 to run experiments and code faster....

Tool OpenAI 2026-05-28

OpenAI’s Frontier Governance Framework

Explore OpenAI’s Frontier Governance Framework and how our AI safety, security, and risk practices align with emerging EU and California regulations....

#alignment

Company news OpenAI 2026-05-28

MUFG aims to become AI-native with OpenAI

MUFG uses ChatGPT Enterprise to build an AI-native organization, improve workflows, and deliver new AI-powered financial services at scale....

Paper deep dive arXiv 2026-05-28

Anti Mode-Collapse in Mean-Field Transformer via Auxiliary Variables

We use a mean-field-based transformer model to theoretically investigate how auxiliary variables, such as positional encoding, prevent mode collapse of self-attention mechanisms. The use of mean-field transformers to analyze the properties of self-attention mechanisms has garnered significant attent...

#coding

Paper deep dive arXiv 2026-05-28

MarginGate: Sparse Margin-Triggered Verification for Batch-Invariant LLM Inference

Temperature-zero BF16 LLM inference is often treated as reproducible, yet the same request can emit different tokens when decoded alone or inside a larger batch. Existing fixes use batch-invariant operators or LLM-42's per-token verification, incurring cost even when most steps are stable. We ask wh...

#llm #coding #benchmark

Company news Microsoft Research 2026-05-28

Data Formulator 0.7: AI-powered data analytics for enterprise data

Data Formulator introduces AI-powered analytics for enterprise data workflows. Data teams can easily bring enterprise data into an AI-ready workspace where users can explore, analyze, and visualize data with AI agents to turn raw data into actionable insights.

#agent

Paper arXiv 2026-05-28

SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations

Printed circuit board (PCB) schematic design defines nearly all electronic hardware, but it remains manual and expertise-intensive. While generative AI has advanced digital and analog IC design, PCB schematic generation from natural-language intent is largely unexplored. This paper presents SchGen, ...

#llm #agent