Software Automatic Mouth – Tiny Speech Synthesizer (github.com/s-macke)
75 points by userbinator on June 3, 2022 | 25 comments


I rewrote the main loop of this a few years ago to be re-entrant, so you can render N samples at a time. Synth projects that want to use SAM should probably start from something like that.

https://github.com/boourns/SAM

I then ported it to Mutable Instruments Braids, a eurorack module:

https://burns.ca/eurorack.html
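For anyone wondering what "re-entrant" means in practice here, a minimal sketch follows. It is not the code from either repository: `RenderState`, `render_init`, `render_block`, and the one-amplitude-per-frame input are all hypothetical stand-ins. The idea is just that the loop position lives in a struct, so an audio callback can ask for N samples, return, and pick up where it left off.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resumable renderer state; none of these names come from SAM. */
typedef struct {
    const uint8_t *frames;    /* one output level per frame (stand-in for real frame data) */
    size_t frame_count;       /* number of frames in the utterance */
    size_t samples_per_frame; /* how many samples each frame lasts */
    size_t frame_index;       /* frame currently being rendered */
    size_t sample_in_frame;   /* samples already emitted for that frame */
} RenderState;

void render_init(RenderState *s, const uint8_t *frames,
                 size_t frame_count, size_t samples_per_frame) {
    assert(s != NULL && frames != NULL && samples_per_frame > 0);
    s->frames = frames;
    s->frame_count = frame_count;
    s->samples_per_frame = samples_per_frame;
    s->frame_index = 0;
    s->sample_in_frame = 0;
}

/* Writes up to n samples into out and returns how many were written.
 * A return value of 0 means the utterance is finished. */
size_t render_block(RenderState *s, uint8_t *out, size_t n) {
    assert(s != NULL && out != NULL);
    size_t written = 0;
    while (written < n && s->frame_index < s->frame_count) {
        out[written++] = s->frames[s->frame_index];
        if (++s->sample_in_frame == s->samples_per_frame) {
            s->sample_in_frame = 0;   /* frame done, move to the next one */
            s->frame_index++;
        }
    }
    return written;
}
```

A synth voice would call `render_block` once per audio buffer and stop the voice when it returns 0. The original, non-re-entrant shape is the same loop with the counters as locals, which is why it can only render the whole utterance in one go.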


This is cool! I’m going to try porting this to a SuperCollider plugin at some point.


Author here, nice to see this project on Hacker News. Another project of mine that needs a complete overhaul after years of neglect.

If you want to hear the output, you can go directly to:

https://simulationcorner.net/index.php?page=sam

The synthesizer was even able to sing: https://simulationcorner.net/SAM/sing.wav


Derivative work of a commercial product. Briefly mentions the FAIR USE Act of 2007, but that act never passed. Author is more or less banking on the fact that the company went bankrupt so who knows who owns the rights nowadays. Shouldn't impact anyone since it's basically just a curiosity, but just FYI.


So the legally-unrecognized "abandonware" concept.

(EDIT: author refers to it as abandonware.)


Worse than that, it's abandonware with extra steps. He tries to justify it by pointing to the FAIR USE Act, which, as I noted, never passed. What's worse is that the amendments to the DMCA proposed in that act have nothing to do with what he's doing. Shouldn't really be an issue for someone who just wants a fun toy to play with, but anyone touching this should go in with full knowledge.


AFAIK this is just one (of many) implementations of formant synthesis, which I believe has a lot of prior art and little novelty by now. I wonder if there are active patents on this one?


For this particular implementation there shouldn't be since it is quite literally a disassembly of a product from the 1980s that was then translated to C. However, none of that matters as we're talking about copyright and the literal copying of the protectable expressions that constitute that work, not patents.


I don’t know if it was the original version or a hack. But I had a version that worked on an Apple // series using the 1 bit built in speaker. It actually was decent.

A few of my first assembly language hacks were getting it working in 80-column mode and getting it to work in ProDOS.

The original version worked by redirecting text output to it, and then you had to call another routine to redirect text output back to the monitor. But it would always revert the screen back to 40-column mode.

The second hack involved copying it to a ProDOS disk and changing the output to use the ProDOS routine for text output.


> But I had a version that worked on an Apple // series using the 1 bit built in speaker

Is it true the Apple // version came with a hardware card?


I’m thinking the original version came with a hardware card. But someone hacked it to work with the internal speaker. I definitely did not have a hardware card.

It worked both with my original Apple //e and later my Apple //e card for my LCII.


I had SAM on my Atari 800. It was notable that it turned off the display when talking. My understanding was that this was for code performance, timing consistency, or both.


The C64 version did the same thing, at least by default (I think the command to change it was ]LIGHTS). With the screen off it ran slightly faster, and thus sounded better, since the VIC wasn't causing bad lines.

In some ways I like this resurrected SAM more than things like Rsynth. RECITER seemed like magic at the time, though now with a linguistics degree, I prefer raw SAY with phonemes.


yeah, from memory the display list interrupts messed with it


We definitely need some law allowing the use and distribution of abandonware. Things like Archive.org should not live in legal limbo.


I have a couple of speech synthesis chips from that era I picked up on eBay or somewhere — made an "Arduino shield" for the chip some years ago. It ran warm and needed a surprising amount of current. Such was the technology of the time.

I love this era of speech synthesis. Robots were white and chrome and would one day be our cheerful servants. That they talked/sounded different from us only endeared them to us — reminded us that they were after all machines.

Siri and her sisters, like Pinocchio, want to be real humans. I would be surprised if in 40 years anyone will be nostalgic for them.


You didn’t happen to buy them from somebody through the Seattle Robotics Society, did you?

I had collected a bunch of General Instrument’s SP0256 chips that I sold (along with the rest of my electronics hobby stuff) maybe 15 years ago. I think there was a Basic Stamp or two, a Handyboard, and maybe a PIC chip or two.


Ha ha, maybe? That was a long time ago.

And thanks for reminding me of the awesome Handyboard and its "Interactive C" language.


If you want a more state-of-the-art model (which are much bigger and slower though), you can find plenty of models and code online (under whatever license you like). E.g. look here as a starting point: https://huggingface.co/tasks/text-to-speech


> which are much bigger

As in, orders of magnitude bigger. SAM needs only a few tens of kB and produces recognisable but very robotic speech, whereas the ones you've linked appear to be hundreds of MB and probably produce something closer to an actual human. I wonder if it's possible to have very convincing procedurally-generated human-like speech in a size larger than SAM but smaller than those ML methods, e.g. hundreds of kB or a few MB.

Here's a formant synth on the same order of magnitude as SAM, but size-optimised by the demoscene: https://www.pouet.net/prod.php?which=50530
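To give a feel for why formant synths can be this small, here is a toy sketch of the basic idea: a few oscillators at fixed formant frequencies, mixed together, with their phases reset once per pitch period to imitate a glottal pulse. It is not taken from SAM or from the linked prod. `formant_render` and `triangle` are made-up names, the triangle wave is a cheap stand-in for a sine table, and the 0..15 amplitude range is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Triangle wave in [-127, 127] from a 16-bit phase (stand-in for a sine table). */
static int triangle(uint16_t phase) {
    int p = phase >> 8;               /* top 8 bits: 0..255 */
    int t = (p < 128) ? p : 255 - p;  /* ramps 0..127 then back down to 0 */
    return 2 * t - 127;
}

/* Renders n unsigned 8-bit samples of a steady vowel-like tone.
 * amp[k] is assumed to be in 0..15; formant_hz[k] must be below sample_rate. */
void formant_render(uint8_t *out, size_t n, unsigned sample_rate,
                    unsigned pitch_hz, const unsigned formant_hz[3],
                    const unsigned amp[3]) {
    assert(out != NULL && pitch_hz > 0 && pitch_hz <= sample_rate);
    size_t period = sample_rate / pitch_hz;  /* samples per pitch period */
    uint16_t phase[3] = {0, 0, 0};
    uint16_t step[3];
    for (int k = 0; k < 3; k++) {
        assert(formant_hz[k] < sample_rate && amp[k] <= 15);
        /* Phase increment per sample, as a fraction of 65536. */
        step[k] = (uint16_t)(((uint64_t)formant_hz[k] << 16) / sample_rate);
    }
    for (size_t i = 0; i < n; i++) {
        if (i % period == 0) {
            /* "Glottal pulse": restart every oscillator in phase. */
            phase[0] = phase[1] = phase[2] = 0;
        }
        int mix = 0;
        for (int k = 0; k < 3; k++) {
            mix += (int)amp[k] * triangle(phase[k]);
            phase[k] = (uint16_t)(phase[k] + step[k]);
        }
        /* |mix| <= 3 * 15 * 127, so dividing by 45 keeps it within +/-127. */
        out[i] = (uint8_t)(128 + mix / 45);
    }
}
```

Calling it with, say, a 22050 Hz sample rate, a 105 Hz pitch, and formants around 700, 1200 and 2500 Hz gives a buzzy steady tone. Everything that makes it speech (phoneme tables, transitions between frames, unvoiced sounds) is left out, and that is where most of the remaining bytes would go.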


Interview with the creator of the original SAM:

https://ataripodcast.libsyn.com/antic-interview-385-software...


Vital synth has a similar feature

it converts speech to a waveform which you can modulate as you wish


I doubt the Vital feature is similar. It's more likely based on some AI implementation, as it's limited to 5 generations per day in the free version, and is pretty realistic (not perfect, but quite good).


they use Google Cloud Text-to-Speech


oh wow, that's some interesting code. cool little project though, thanks for sharing



