PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization
Abstract
Many operations on sensory data — comparison, memory, retrieval, and reasoning — are naturally expressed over discrete symbolic structures. In language this interface is given by tokens; in audio, it must be learned. Existing audio tokenizers rely on quantization, clustering, or codec reconstructi…