Knowing I like old pieces of engineering, my family gifted me a fully functioning old rotary phone, identical to the one we had at home for many years when I was growing up. It is a Siemens-Halske model built in Argentina, probably in the early 1970s.

I disassembled it many times when I was young. It was super easy, as everything was secured with screws. I remember marveling at its clean, clock-like mechanics and all the clever switching and electrical setup that allowed all these functions to work on just two twisted copper wires:

  1. Dial
  2. Ring
  3. Hang-up
  4. Receive audio
  5. Send audio

I thought it would be great to marry that last-century technology with the new era of AI, so I started a new project!

Basically, I want to use the same setup I have for e-paper displays, but instead of printing a quote, it would read the quote aloud and play it on the phone’s speaker.

Here are the high-level requirements:

  1. Whenever a quote is available for the phone, it will ring 3 times.
  2. If you pick up the receiver, the quote will be played on the speaker.
  3. If you pick up the receiver and the quote was already read, it will just play a tone, waiting for dialing to happen.
     a. If you dial “1”, it will play the latest quote sent to the device.
     b. If you dial “2”, it will retrieve a new quote from the back end.
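
The requirements above can be sketched as a small decision function. This is only a sketch under assumptions: the event names (`quoteArrived`, `pickup`, `dial`), the action names, and the shape of `state` are all hypothetical and not taken from the actual firmware or back end.

```javascript
// Hypothetical sketch of the MVP behavior; all names are illustrative.
// state: { latestQuote: string|null, unread: boolean }
function handleEvent(state, event) {
  switch (event.type) {
    case 'quoteArrived':
      // Requirement 1: a new quote makes the phone ring 3 times.
      return {
        state: { latestQuote: event.quote, unread: true },
        actions: [{ action: 'ring', times: 3 }]
      };
    case 'pickup':
      if (state.unread) {
        // Requirement 2: picking up plays the pending quote.
        return {
          state: { ...state, unread: false },
          actions: [{ action: 'play', quote: state.latestQuote }]
        };
      }
      // Requirement 3: nothing pending, so play a tone and wait for dialing.
      return { state, actions: [{ action: 'dialTone' }] };
    case 'dial':
      if (event.digit === 1) {
        // 3a: replay the latest quote sent to the device.
        return { state, actions: [{ action: 'play', quote: state.latestQuote }] };
      }
      if (event.digit === 2) {
        // 3b: ask the back end for a new quote.
        return { state, actions: [{ action: 'fetchNewQuote' }] };
      }
      return { state, actions: [] };
    default:
      // Unknown events change nothing.
      return { state, actions: [] };
  }
}
```

Keeping the logic as a pure function of state and event makes it easy to check on a desktop before porting it to the device.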

This will be my MVP, with plenty of stuff to work around and solve.

The Phone

To begin with, I need a way to activate the electromagnetic ringer. This will require a higher-voltage power supply; 3.3V will definitely not do it. Most likely I will need a motor driver of some kind, able to reverse the current.
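
To illustrate why the driver must reverse the current, here is a rough sketch of the polarity flips that make up one ring burst. It assumes a ring frequency of about 20 Hz, which is a typical figure for this kind of bell but has not been measured on this phone; the function name and output shape are hypothetical, and the real timing would run on the microcontroller rather than in JavaScript.

```javascript
// Hypothetical sketch: list the polarity flips a reversing driver
// (for example an H-bridge) would perform during one ring burst.
// Assumption: ~20 Hz ring frequency, i.e. one flip every 25 ms.
function ringBurst(durationMs, frequencyHz = 20) {
  const halfPeriodMs = 1000 / (2 * frequencyHz);
  const flips = [];
  let polarity = 1;
  for (let t = 0; t < durationMs; t += halfPeriodMs) {
    flips.push({ atMs: t, polarity });
    polarity = -polarity; // reverse the current through the ringer coil
  }
  return flips;
}
```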

I will also need a way to decode the dialer signals. Based on my initial research, I need to detect two different sources of signals:

  1. The “start/stop of dialing”
  2. The pulses associated with each number.

This is relatively straightforward, as these pulses are simply “digital” inputs for a microcontroller.
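
As a sketch of the decoding logic, the two signals can be combined by counting pulses between the start and the stop of dialing. It assumes the common convention where dialing digit N produces N pulses and “0” produces ten; the event names are hypothetical, and a real implementation would also need to debounce the contacts.

```javascript
// Hypothetical event stream: 'start' and 'stop' come from the
// "dialing in progress" contact, 'pulse' from the pulse contact.
function decodeDial(events) {
  const digits = [];
  let counting = false;
  let pulses = 0;
  for (const ev of events) {
    if (ev === 'start') {
      counting = true;
      pulses = 0;
    } else if (ev === 'pulse' && counting) {
      pulses += 1;
    } else if (ev === 'stop' && counting) {
      counting = false;
      // Assumption: ten pulses encode "0"; empty or impossible counts are dropped.
      if (pulses >= 1 && pulses <= 10) {
        digits.push(pulses % 10);
      }
    }
  }
  return digits;
}
```

For example, `decodeDial(['start', 'pulse', 'pulse', 'stop'])` yields `[2]`, which in the requirements above would trigger fetching a new quote.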


The other challenge will be to generate “audio” out of text quotes. I’ve done experiments with both OpenAI and Deepgram. Both are super easy to use, with straightforward APIs.

It is actually quite incredible how easy it is to generate high-quality synthetic voices in any language, with tone inflections and accents. The systems even simulate breathing pauses to achieve realism.

My OpenAI implementation looks like this:

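exports.tts = (text, done) => {
  // createOpenAI() is a helper (not shown) that returns a configured OpenAI client.
  const openai = createOpenAI();
  openai.audio.speech.create({
    model: "tts-1",
    voice: process.env.OPENAI_TTS_VOICE || "shimmer",
    input: text
  }).then((data) => data.arrayBuffer())  // read the audio response as binary
    .then((buffer) => {
      done(null, buffer);
    }).catch((error) => {
      done(error);
    });
};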

And Deepgram is equally simple (no module required in this case, just using HTTP):

const url = '';

exports.tts = (text, done) => {
    const options = {
        method: 'POST',
        headers: {
            accept: 'text/plain',
            'Content-Type': 'application/json',
            Authorization: `Token ${process.env.DEEPGRAM_API_KEY}`
        },
        body: JSON.stringify({ text: text })
    };
    fetch(url, options)
        .then(res => res.arrayBuffer())  // the audio comes back as binary data
        .then(buffer => {
            done(null, buffer);
        })
        .catch(error => {
            done(error);
        });
};

I am not much of an async/await guy… I actually like callbacks, so my code mostly uses those, along with promises.
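
For anyone who prefers promises, a callback-style function with this `(text, done)` contract can be wrapped with Node’s `util.promisify`. The sketch below uses a stand-in `fakeTts` (hypothetical, not one of the real modules) so that it runs without an API key.

```javascript
const { promisify } = require('util');

// Stand-in for either tts function above: same (text, done) contract,
// but it returns a dummy buffer with one byte per character.
const fakeTts = (text, done) => {
  setImmediate(() => done(null, new ArrayBuffer(text.length)));
};

// Callback style, as used in this post.
fakeTts('hello', (err, buffer) => {
  if (err) return console.error(err);
  console.log(`callback got ${buffer.byteLength} bytes`);
});

// The same function wrapped for promise-based callers.
const ttsAsync = promisify(fakeTts);
ttsAsync('hello again')
  .then((buffer) => console.log(`promise got ${buffer.byteLength} bytes`))
  .catch((err) => console.error(err));
```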

Now, to play back the generated MP3 file, I am using the equally outstanding VS1053 audio board. Since I am using the Adafruit Feather M0 board, it comes conveniently packaged in a way that lets me simply stack the two.