2026-05-07

17 items

Paper Deep Dive Hugging Face 2026-05-05 HF ↑18

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the a...

#agent#multimodal#rl#benchmark
Paper Deep Dive Hugging Face 2026-05-05 HF ↑18

D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

The landscape of high-performance image generation models is currently shifting from inefficient multi-step models to efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models present significant challenges for direct continuous supervised fine-tuning. For ...

#diffusion#fine-tuning#llm#multimodal#vision
Industry News Deep Dive OpenAI 2026-05-07

Parloa builds service agents customers want to talk to

Parloa leverages OpenAI models to power scalable, voice-driven AI customer service agents, enabling enterprises to design, simulate, and deploy reliable, real-time interactions....

#agent#speech
Paper Hugging Face 2026-05-05 HF ↑28

PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World

Synthesizing physics-grounded 3D assets is a critical bottleneck for interactive virtual worlds and embodied AI. Existing methods predominantly focus on static geometry, overlooking the functional properties essential for interaction. We propose that interactive asset generation must be rooted in fu...

#robotics#diffusion#multimodal#agent
Paper Hugging Face 2026-05-05 HF ↑86

Stream-T1: Test-Time Scaling for Streaming Video Generation

While Test-Time Scaling (TTS) offers a promising direction to enhance video generation without the surging costs of training, current test-time video generation methods based on diffusion models suffer from exorbitant candidate exploration costs and lack temporal guidance. To address these structura...

#speech#diffusion#benchmark
Industry News OpenAI 2026-05-07

Testing ads in ChatGPT

OpenAI begins testing ads in ChatGPT to support free access, with clear labeling, answer independence, strong privacy protections, and user control....

Paper Hugging Face 2026-05-05 HF ↑8

StableI2I: Spotting Unintended Changes in Image-to-Image Transition

In most real-world image-to-image (I2I) scenarios, existing evaluations primarily focus on instruction following and the perceptual quality or aesthetics of the generated images. However, they largely fail to assess whether the output image preserves the semantic correspondence and spatial structure...

#benchmark#llm
Paper Hugging Face 2026-05-05 HF ↑1

When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning

In single-stream autoregressive interfaces, the same tokens both update the model state and constitute an irreversible public commitment. This coupling creates a silence tax: additional deliberation postpones the first task-relevant content, while naive early streaming risks premature commitments th...

#llm#rl#benchmark
Paper Hugging Face 2026-05-05 HF ↑7

Lightning Unified Video Editing via In-Context Sparse Attention

Video editing has evolved toward In-Context Learning (ICL) paradigms, yet the resulting quadratic attention costs create a critical computational bottleneck. In this work, we propose In-context Sparse Attention (ISA), the first near-lossless empirical sparse framework tailored for ICL video editing....

Model OpenAI 2026-05-07

Introducing Trusted Contact in ChatGPT

Introducing Trusted Contact in ChatGPT, an optional safety feature that notifies someone you trust if serious self-harm concerns are detected....

#alignment
Industry News OpenAI 2026-05-06

How frontier firms are pulling ahead

OpenAI’s B2B Signals research shows how frontier enterprises deepen AI adoption, scale Codex-powered agentic workflows, and build durable competitive advantage....

#agent
Model OpenAI 2026-05-06

Introducing ChatGPT Futures: Class of 2026

Meet the ChatGPT Futures Class of 2026—26 student innovators using AI to build, research, and drive real-world impact. Discover how this generation is redefining learning, creativity, and opportunity with ChatGPT....

Paper Deep Dive arXiv 2026-05-06

Transformed Latent Variable Multi-Output Gaussian Processes

Multi-Output Gaussian Processes (MOGPs) provide a principled probabilistic framework for modelling correlated outputs but face scalability bottlenecks when applied to datasets with high-dimensional output spaces. To maintain tractability, existing methods typically resort to restrictive assumptions,...

#benchmark