Many, many years ago I had a friend who was very smart, and had a predilection for saying all sorts of stuff that seemed both fascinating and ridiculous. One thing I remember him saying is that any time a crowd of extras in a movie has to make background conversation, they are instructed to say the phrase ‘rhubarb cluster’. He said that when everyone in a crowd says the phrase ‘rhubarb cluster’, it simulates a conversation, and keeps their lips moving in a way that looks realistic on film. There are other advantages too. He said it keeps the extras focused on doing something other than staring off into the distance in a way that could distract from the main movie scene, and it ensures that no actual words are heard or understood by the movie watcher (imagine the grief of two extras in a scene overheard saying “dude! I got so STONED this past weekend” on the audio track of a blockbuster). In short, by instructing the extras to say a specific phrase, a director keeps control of the soundscape.
I have wondered for years if this would work in the real world, or if my friend was making it all up, but I never followed up. A few weeks ago I told my 6-year-old daughter about this (almost certainly apocryphal) use of ‘rhubarb cluster’ in film, and she suggested I run an experiment using students in one of my classes to determine if it is at all plausible. So I did. Below, I present the methods and results.
I stood at the front of a class of about 70 students, turned on the voice recorder on my cell phone, and had them say three things. First, I had them say the phrase ‘rhubarb cluster’ repeatedly for about 10 seconds. Second, I had them say the alphabet repeatedly for about 10 seconds. Finally, I had them carry on a conversation with their neighbour for about 10 seconds.
Then I cleaned up each sound sequence in two steps. Step one was to trim off a short ‘burn in’ period at the start of each clip and keep only the audio after it. I had to do this because for the first few seconds, the phrase ‘rhubarb cluster’ is quite audible:
I did this for all three audio clips. Step two was to normalise the volume of each clip to the same level.
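For anyone who wants to repeat the clean-up in code rather than in an audio editor, here is a minimal sketch of the two steps. It is not the tool I used. It assumes each clip is already loaded as a list of floating-point samples between -1 and 1, and it assumes 'same level' means the same RMS (root-mean-square) level; the function names, the 2-second burn in and the target level of 0.1 are all hypothetical choices for illustration.

```python
import math

def trim_burn_in(samples, sample_rate, burn_in_seconds):
    """Drop the first burn_in_seconds of audio and keep the rest."""
    start = int(burn_in_seconds * sample_rate)
    return samples[start:]

def rms(samples):
    """Root-mean-square level of a clip (0.0 for an empty clip)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalise_rms(samples, target_rms=0.1):
    """Scale a clip so that its RMS level equals target_rms (assumed target)."""
    level = rms(samples)
    if level == 0.0:
        return list(samples)  # silent clip: nothing to scale
    gain = target_rms / level
    return [s * gain for s in samples]

# Synthetic stand-in for a recording: 10 seconds of a tone at 100 samples/second.
sample_rate = 100
clip = [0.5 * math.sin(2 * math.pi * 5 * i / sample_rate) for i in range(1000)]

trimmed = trim_burn_in(clip, sample_rate, burn_in_seconds=2)  # assumed 2 s burn in
cleaned = normalise_rms(trimmed)
print(len(trimmed), round(rms(cleaned), 3))
```

Running the same two functions over all three clips puts them on an equal footing before they are joined together.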
I then combined the audio clips into a YouTube video:
Finally, I set up a short online survey for my students, asking them to watch the video, and then identify which of the clips was ‘rhubarb cluster’, and which was real conversation. Because of how the survey was administered, I could not set up a proper choice-set experiment (with random order of audio clips, for example), but I am not sure that would have affected the results much. Speaking of which…
Of the 80 students who answered the survey, around 63% correctly identified the real conversation. By itself, that suggests that ‘rhubarb cluster’ does not perfectly simulate audible conversation, and probably could not be used as background conversation in a film.
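As a rough check on whether 63% is better than guessing, here is a minimal sketch of a one-sided exact binomial test. Two things in it are assumptions rather than facts from the survey: that 'around 63%' of 80 means 50 correct answers, and that a student guessing blindly among the three clips would pick the real conversation one time in three.

```python
from math import comb

def binomial_upper_tail(successes, trials, chance):
    """P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )

respondents = 80
correct = round(0.63 * respondents)  # assumed count: "around 63%" of 80
chance = 1 / 3                       # assumed guessing rate: one clip out of three

p_value = binomial_upper_tail(correct, respondents, chance)
print(f"{correct}/{respondents} correct, one-sided p = {p_value:.2g}")
```

Under those assumptions the result is very unlikely to be guesswork. A different guessing rate (say one in two) can be tried by changing `chance`.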
However, in processing the audio, I did notice something interesting about the sound levels of the three clips:
The first third of the clip is ‘rhubarb cluster’, the second is the alphabet, and the third is natural conversation. The first section has a much more stable noise level over time than the natural conversation (the third clip), even after adjusting for the different average noise levels. In other words, ‘rhubarb cluster’ yields a more predictable sound profile than natural conversation, especially after the burn in period. For audio engineers this could be an advantage, since it would allow them to record audio at a high volume without worrying about ‘peaking’ sound levels. Peaking results in unwanted noise and distortion on recordings, and is generally avoided in audio recording.
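One way to put a number on 'more stable' is to measure the loudness in short windows and compare how much it varies relative to its average (the coefficient of variation), which also takes care of clips having different average levels. The sketch below is an illustration only: the two clips are synthetic noise I made up as stand-ins, not the class recordings, and the window length is an arbitrary choice.

```python
import math
import random

def windowed_rms(samples, window):
    """RMS level of each consecutive, non-overlapping window."""
    levels = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        levels.append(math.sqrt(sum(s * s for s in chunk) / window))
    return levels

def coefficient_of_variation(values):
    """Standard deviation divided by mean: spread relative to the average level."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance) / mean

# Synthetic stand-ins (hypothetical data): constant-level noise versus noise
# whose loudness swells and fades, loosely like natural conversation.
random.seed(0)
steady = [random.gauss(0, 0.1) for _ in range(8000)]
bursty = [
    random.gauss(0, 0.1) * (1.0 + 0.8 * math.sin(2 * math.pi * i / 2000))
    for i in range(8000)
]

cv_steady = coefficient_of_variation(windowed_rms(steady, 400))
cv_bursty = coefficient_of_variation(windowed_rms(bursty, 400))
print(f"steady: {cv_steady:.3f}, bursty: {cv_bursty:.3f}")
```

A lower value means a flatter loudness profile, which is the property that would let an engineer set recording levels high with less risk of peaking.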
The experiment here was not perfect, but I think it’s fair to say that the results do suggest that when spoken by a small crowd (in a university lecture room) ‘rhubarb cluster’ is detectable on a digital audio recording, perhaps even to the majority of people hearing it.
Having written that, the idea of having a crowd of extras in a scene on a movie set saying some predefined phrase doesn’t seem totally ridiculous. It would give the crowd some predictable behaviour to simulate and it could make sound recording easier. In the experiment I conducted, students could tell the difference, but perhaps a longer phrase or a longer burn in period would have made the phrase less detectable, and made the sound more natural. Perhaps I’ll try that in next year’s class!