
You can try to train an adapter from a raw 400-byte MP3 frame to an embedding for a given LLM (4096+ floating-point numbers; exact precision varies).

But you'd need that information to be digestible for a neural network. Otherwise, you'll have a very hard time getting that adapter to work.

As a rule: neural networks love highly redundant data and hate highly compressed data at their inputs. Tokenized text good, GZIP-compressed bytestream bad. But who knows, really. It's a rule of thumb, not a mathematical law. So you could have some success getting that MP3-based adapter to work. I've seen weirder shit work.
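
For what such an adapter could even look like, here's a minimal sketch in PyTorch. The 400-byte frame and the 4096-dim embedding are just placeholder numbers from the comment above, not tied to any real model; embedding each byte value first is one way to hand the network something less compressed-looking than raw bytes cast to floats.

    # Minimal sketch: raw MP3 frame bytes -> LLM-sized embedding (all sizes assumed)
    import torch
    import torch.nn as nn

    FRAME_BYTES = 400   # assumed raw MP3 frame size
    EMBED_DIM = 4096    # assumed LLM embedding width

    class Mp3FrameAdapter(nn.Module):
        def __init__(self):
            super().__init__()
            # Embed each byte value (0-255) rather than feeding raw bytes as floats,
            # so the network sees a learned, redundant representation of each byte.
            self.byte_embed = nn.Embedding(256, 64)
            self.mlp = nn.Sequential(
                nn.Linear(FRAME_BYTES * 64, 2048),
                nn.GELU(),
                nn.Linear(2048, EMBED_DIM),
            )

        def forward(self, frames: torch.Tensor) -> torch.Tensor:
            # frames: (batch, FRAME_BYTES) integer byte values
            x = self.byte_embed(frames)        # (batch, FRAME_BYTES, 64)
            x = x.flatten(start_dim=1)         # (batch, FRAME_BYTES * 64)
            return self.mlp(x)                 # (batch, EMBED_DIM)

    # usage
    adapter = Mp3FrameAdapter()
    dummy = torch.randint(0, 256, (2, FRAME_BYTES))
    print(adapter(dummy).shape)  # torch.Size([2, 4096])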



if you were able to normalize and quantize the distinct DCT values into tokens in a consistent way, it could be an interesting approach. so yeah, undo the bit packing but keep the front-end signal processing and the compressed DCT representation, and voilà: something quite weird that might actually work. :)
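
Roughly what that might look like, as a hedged sketch: a plain DCT from SciPy stands in for MP3's actual hybrid filterbank/MDCT front end, and the frame length, vocabulary size, and normalization scheme are all made up for illustration.

    # Sketch: normalize and quantize per-frame DCT coefficients into discrete "tokens"
    import numpy as np
    from scipy.fft import dct

    FRAME = 576    # samples per frame (MP3 granule length, used illustratively)
    VOCAB = 1024   # number of quantization bins ("token" ids), assumed

    def dct_tokens(audio: np.ndarray) -> np.ndarray:
        """Map mono PCM audio to a grid of quantized DCT-coefficient token ids."""
        n_frames = len(audio) // FRAME
        frames = audio[: n_frames * FRAME].reshape(n_frames, FRAME)
        coeffs = dct(frames, type=2, norm="ortho", axis=1)   # (n_frames, FRAME)
        # Log-compress magnitudes so quiet and loud coefficients share a usable range.
        comp = np.sign(coeffs) * np.log1p(np.abs(coeffs))
        # Normalize to [0, 1] per clip, then quantize to integer token ids.
        lo, hi = comp.min(), comp.max()
        norm = (comp - lo) / (hi - lo + 1e-8)
        return (norm * (VOCAB - 1)).astype(np.int64)          # (n_frames, FRAME)

    # usage: one second of fake 44.1 kHz audio -> token grid
    tokens = dct_tokens(np.random.randn(44100).astype(np.float32))
    print(tokens.shape, tokens.min(), tokens.max())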



