The voice of WALL-E
Today’s article is about Disney-Pixar’s animated film WALL-E and the memorable voice of its little robot protagonist. According to sound designer Ben Burtt, the robot voices are “like a toddler […] universal language of intonation. ‘Oh’, ‘Hm?’, ‘Huh!’, you know?”
Ben Burtt explains how he did the voice of Wall-E here: “You start with the human voice input and record words or sounds and then it is taken into a computer and I worked out a unique program which allowed me to deconstruct the sound into its component parts.
I could reassemble the Wall-E vocals and perform it with a light pen on a tablet. You could change pitch by moving the pen or the pressure of the pen would sustain or stretch syllables or consonants and you could get an additional level of performance that way, kind of like playing a musical instrument.
Voices are the hardest because the audience listens to them with much more critical ears than sound effects. We are all experts at interpreting the nuances of speech, so anything that might be interpreted as a vocal or expression the audience really listens carefully.”
Here’s a little example of what I’ve been able to recreate in Kyma with my own voice:
- Additive synthesis parameters in a discrete-time implementation can be determined using the Fast Fourier Transform (FFT).
- The analyzed time-domain signal is split into blocks or “frames”, each of which is processed using the FFT (referred to as the Short-Time Fourier Transform, or STFT).
- The STFT provides a means for joint time-frequency analysis.
- A time-domain signal can also be resynthesized using the Inverse Fast Fourier Transform (IFFT). The resulting IFFT frames are “assembled” using overlap-add techniques.
- With improvements in computer processing speed, it is now possible to perform IFFT resynthesis in real time.
- FFT/IFFT synthesis lends itself well to sound transformations, such as time-stretching and pitch scaling.
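To make the points above a bit more concrete, here is a minimal Python/NumPy sketch of STFT analysis followed by IFFT overlap-add resynthesis. It is not Ben Burtt's program and not what Kyma does internally; the function names (`stft`, `istft`), the frame size, the hop size, and the 220 Hz test tone are all assumptions chosen for illustration.

```python
import numpy as np

FRAME_SIZE = 1024  # samples per analysis frame (assumed value)
HOP = 256          # samples between frame starts, i.e. 75% overlap (assumed value)


def hann(frame_size):
    """Periodic Hann window, which overlaps smoothly from frame to frame."""
    return np.hanning(frame_size + 1)[:-1]


def stft(signal, frame_size=FRAME_SIZE, hop=HOP):
    """Cut the signal into overlapping windowed frames and FFT each frame."""
    window = hann(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    # One row per frame, one column per frequency bin.
    return np.fft.rfft(frames, axis=1)


def istft(spectra, frame_size=FRAME_SIZE, hop=HOP):
    """IFFT each frame and "assemble" the result by overlap-add."""
    window = hann(frame_size)
    n_frames = spectra.shape[0]
    length = frame_size + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spectra, n=frame_size, axis=1)
    for i in range(n_frames):
        start = i * hop
        out[start:start + frame_size] += frames[i] * window
        norm[start:start + frame_size] += window ** 2
    # Avoid dividing by zero at the very edges, where the window is zero.
    norm[norm < 1e-8] = 1.0
    return out / norm


# Test tone standing in for a recorded voice: one second of a 220 Hz sine.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220.0 * t)

spectra = stft(x)   # analysis: joint time-frequency representation
y = istft(spectra)  # resynthesis: back to a time-domain signal

# Strongest frequency bin in a frame from the middle of the signal.
middle = spectra[len(spectra) // 2]
peak_hz = np.argmax(np.abs(middle)) * sr / FRAME_SIZE
```

With no changes made between `stft` and `istft`, `y` matches `x` away from the edges. Transformations such as time-stretching or pitch scaling would modify `spectra` (or the hop size used for resynthesis) between those two calls; a usable version also needs phase handling, which this sketch leaves out.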
Here’s a very short and simple explanation of the Kyma patch that you can see at the bottom of this page:
As Ben Burtt put it more simply:
“We all know how pictures are pixels now and you can rearrange pixels to change the picture. You kind of do the same thing with sound.”
FFT Synthesis/Resynthesis patch in Kyma: