AUDIO GAMES: FUN FOR ALL? ALL FOR FUN?
by Sue Targett and Mikael Fernström

ABSTRACT

In this paper we investigate if it is possible to create entertaining computer games that use only non-speech aural feedback and if such games could be used for skills acquisition or in therapeutic applications. To answer these questions we developed two computer games, Os & Xs (Tic Tac Toe) and Mastermind, representing all necessary information through auditory display. User testing confirmed that the games were playable and early indications are that the games can be entertaining, particularly for the blind community. Testing also suggested that playing audio games could assist in increasing both memory and ability to concentrate, thus showing potential for both skills acquisition and therapeutic applications.

1. INTRODUCTION

This work was motivated by the desire to explore two lines of enquiry:

  • whether computer games using only non-speech audio could be developed and could entertain players.
  • whether playing these games could develop skills that would be useful to the player beyond the gaming environment.

The number of visual games and puzzles is enormous. By playing these games and solving puzzles, players benefit in many ways. There is entertainment, without which the games and puzzles would lie unused, but there are also other benefits. Some games give the player an increased ability to solve problems by logical reasoning, some to plan ahead and develop tactics. Other games build manual dexterity or develop quick reaction times. Games that are played by more than one person at a time offer safe arenas for learning social skills. Some people relax by playing a quiet game of solitaire.
But do games have to rely so heavily on visual information? Could games rely solely on non-verbal sounds and still be fun? Would games created using only non-verbal audio enable sighted people to develop 'aural dexterity' or other skills? Would audio games contribute anything worthwhile to the partially sighted and blind communities?
We found several games that relied solely on audio representation. Of the games found, two types emerged: those which used spoken descriptions of visual situations and one which used only non-verbal audio cues. A sophisticated example of the speech-based games is the 'Grizzly Gulch Western Extravaganza™' [1]. In this game the player is talked through the game setting, a Wild West town, and presented with options such as "do you want to go to the bank or the bar?" Once this decision is made, by navigating with arrow keys, the game paths the player chooses are accompanied by elaborate sound effects. Characters such as bartenders and bank clerks are established to ask many of the game's decision-inducing questions.
Another example of an audio game based on visual scenarios is described by Drewes et al. [2]. The design, prototyping and evaluation of an audio version of the familiar board game Clue is outlined, and again much of the information was passed from game to player through verbal narrative.
We only found one example of a game relying solely on non-verbal audio information as the basis for the player's decision making. Winberg and Hellström describe the development of a purely sonic, non-verbal, game - The Towers of Hanoi [3][4]. Their studies clearly show that it is possible to create a game using only non-verbal sounds. The study was not, however, designed to investigate whether a purely auditory game can be entertaining or whether such a game could create a tool for skills acquisition. It therefore gives no reliable information on these issues.
Another, very different, application is Phil Ellis's work with Soundbeam, which proves that playing with sound through gesture control can be therapeutic. He records positive outcomes across a range of disabilities: autism, visual impairment, intellectual disability and profound physical and intellectual disability [5][6].

2. DESIGNING THE GAMES

Two games were selected for development: Os & Xs and Mastermind. The programming environment used for developing the two games was Opcode's Max/MSP.
Os & Xs was selected because it is so widely known. Players alternate placing their symbol (Xs or Os) on a 3 x 3 grid until one player gets three of their symbol in a row vertically, horizontally or diagonally, or the game ends in a draw.
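The end-of-game rule just described can be sketched in a few lines. This is only an illustrative sketch in Python, not the Max/MSP patch used in the study; the grid representation (a list of three rows holding 'X', 'O' or None) and the function name game_result are assumptions made for the example.

```python
# All eight winning lines on a 3 x 3 grid, as (row, column) pairs.
WIN_LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]      # three rows
    + [[(r, c) for r in range(3)] for c in range(3)]    # three columns
    + [[(i, i) for i in range(3)],                      # main diagonal
       [(i, 2 - i) for i in range(3)]]                  # anti-diagonal
)


def game_result(grid):
    """Return 'X' or 'O' for a win, 'draw' for a full grid, else None."""
    for line in WIN_LINES:
        symbols = {grid[r][c] for r, c in line}
        # A line wins when all three squares hold the same player's symbol.
        if len(symbols) == 1 and symbols != {None}:
            return symbols.pop()
    if all(cell is not None for row in grid for cell in row):
        return "draw"
    return None  # game still in progress
```

For example, game_result([["X", "X", "X"], [None, "O", None], ["O", None, None]]) returns 'X'.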
Mastermind was chosen because it is a more challenging game to play and because it presented more challenging design issues. In Mastermind one player picks a combination of colours and the other player has to guess the combination correctly within a certain number of turns. Each time the player makes a guess, s/he is given an evaluation of how many 'Right Colours in the Right Place', 'Right Colours in the Wrong Place' and 'Wrong Colours' the guess contains. This process is illustrated in Figure 1.

This picture shows the basics of Mastermind. It shows the three-colour combination that the player has to guess, as well as three guesses from the player (with comments like: "1 right colour in the right place and two wrong colours").
Figure 1 Mastermind - How the Game is Played
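The evaluation of a guess can be sketched as follows. This is a minimal Python illustration, not the implementation used in the study. It assumes the conventional Mastermind scoring for repeated colours (each colour in the winning combination can be matched at most once), which the paper does not spell out; the colour names and the function name evaluate_guess are likewise hypothetical.

```python
from collections import Counter


def evaluate_guess(secret, guess):
    """Return (right place, wrong place, wrong colour) counts for a guess."""
    # Right colour in the right place: same colour at the same position.
    exact = sum(s == g for s, g in zip(secret, guess))
    # Colours shared by both combinations regardless of position
    # (multiset intersection, so repeats are matched at most once each).
    common = sum((Counter(secret) & Counter(guess)).values())
    misplaced = common - exact      # right colour, wrong place
    wrong = len(secret) - common    # colour not matched at all
    return exact, misplaced, wrong
```

With a winning combination of ("red", "green", "blue"), the guess ("red", "red", "red") evaluates to (1, 0, 2): one right colour in the right place and two wrong colours, matching the kind of comment shown in Figure 1.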

2.1. Os & Xs - Design Details
The sounds allocated to each square in order to enable accurate navigation of the grid were earcons [7]. Three earcons were designed for the grid, one for each vertical column. The first column used an ascending major triad. The middle column used one repeated pitch which was the same as the highest pitch of the ascending triad used in the left hand column. The right hand column used a descending major triad whose highest pitch was the pitch used in the middle column. To enable the user to distinguish whether the earcon was on the top, middle or bottom row the earcons were arranged in different registers. The top row used a high register, or treble / soprano pitch, to start the triad on. The middle row used a middle register pitch and the bottom row used low or bass register pitches. Graphically these arrangements are represented in Figure 2 below. These earcon design principles follow the guidelines set out by Brewster [7].

This picture shows a 3x3 grid. The top row is commented with "high frequency", the middle row with "mid range/frequency" and the bottom row with "low frequency". Each cell contains an icon, which is different per column. The cells in the left column show an icon of three ascending dots. The cells in the centre column show an icon that consists of three dots in a horizontal row. The cells in the right column show an icon that consists of three descending dots.
Figure 2 Os & Xs Earcon Design and Arrangement
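The arrangement in Figure 2 can be expressed as a small pitch table. The sketch below is in Python rather than Max/MSP and is illustrative only. The paper does not give the actual pitches, so the MIDI root notes (72, 60, 48) are hypothetical, as is the assumption that the right hand column plays the left hand column's triad in reverse and that the middle pitch is sounded three times, as the three dots in the figure suggest.

```python
# Hypothetical MIDI root notes for the three registers (C5, C4, C3).
ROW_ROOTS = {"top": 72, "middle": 60, "bottom": 48}
# Semitone offsets of a major triad above its root.
MAJOR_TRIAD = (0, 4, 7)


def earcon(row, column):
    """Return the three MIDI pitches of the earcon for one grid square."""
    root = ROW_ROOTS[row]
    ascending = [root + step for step in MAJOR_TRIAD]
    top = ascending[-1]  # highest pitch of the ascending triad
    if column == "left":
        return ascending            # ascending major triad
    if column == "centre":
        return [top] * 3            # one repeated pitch
    if column == "right":
        return ascending[::-1]      # descending triad starting on the same top pitch
    raise ValueError(f"unknown column: {column!r}")
```

Under these assumptions the middle row yields [60, 64, 67], [67, 67, 67] and [67, 64, 60], so the three columns share a pivot pitch while the rows differ only in register.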

The sound that represents the X is an auditory icon [8]. It was designed to be a mapping of an X from the visual domain with two 'swish' sounds representing the drawing of the two lines of an 'X'.
When the computer has played its move it triggers a sound so the player knows that a move has been made. The sound created for the 'O' sound is also an auditory icon. It was modelled with two thoughts in mind: firstly, that the sound was representing an 'O' and should therefore have a round feeling, and secondly, that the sound should be a little bit hostile, as the computer is the foe in this scenario. All of this is entirely subjective and, more importantly, may be relevant only to sighted people. Studies into what constitutes an 'O' sound or an 'X' sound would likely yield different results with sighted and blind populations, if there were any consistent results at all.
The player uses the arrow keys to move around the nine-square grid. When the game starts the player is automatically located on the middle square. The up, down, left and right arrow keys move the player one square in their respective directions.
The player selects a square simply by pressing the enter key when s/he hears the earcon representing the square s/he wishes to 'take'.
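The navigation and selection scheme can be sketched as below. This is an illustrative Python sketch, not the Max/MSP implementation. The paper does not say what happens when the player presses an arrow key at the edge of the grid or tries to take an occupied square; here it is assumed that the player stays put at an edge and that an occupied square is refused. The names move and take_square are hypothetical.

```python
# (row, column) offsets for each arrow key; row 0 is the top row.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
START = (1, 1)  # the player begins on the middle square


def move(position, key):
    """Return the new position after an arrow key press.

    Assumption: at the edge of the grid the player stays where s/he is.
    """
    row, col = position
    d_row, d_col = MOVES[key]
    new_row = min(max(row + d_row, 0), 2)
    new_col = min(max(col + d_col, 0), 2)
    return (new_row, new_col)


def take_square(grid, position, symbol="X"):
    """Place the symbol on the current square (the enter key).

    Assumption: a square that is already taken is refused (returns False).
    """
    row, col = position
    if grid[row][col] is not None:
        return False
    grid[row][col] = symbol
    return True
```

Starting from START, move(START, "up") gives (0, 1), and a second "up" leaves the player on (0, 1).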
To enable the player to distinguish between a square that has not been taken, a square taken by the computer, and a square taken by the player, each scenario is assigned a different timbre, while each earcon's pitch arrangement is maintained.
The timbres used were selected primarily on the basis of their amplitude envelopes, the supposition being that players would find it easier to distinguish between very different amplitude envelopes than to distinguish sounds on the basis of spectral distribution alone.

2.2. Mastermind - Design Details
The earcons used to represent the three colours are the same as those used for the middle row in Os & Xs: an ascending triad, a repeated single pitch and a descending triad.
When the game is started the computer selects the winning combination and waits for the player to make their first guess.
A guess comprises an ordered selection of three earcons. Any of the three earcons can be selected in each of the three positions; first, second or third. The player decides which earcon to put in first position by moving to the desired earcon using the left / right arrow keys and then pressing the enter key. When the enter key is pressed a sound is triggered to verify that the player has made a selection.
After the player has selected the third earcon of the guess, the combination is assessed to determine which of the following outcomes correctly describe the player's guess. There are three evaluation sounds communicating:

  • Right Sound Right Place
  • Right Sound Wrong Place
  • Wrong Sound

The sounds created in order to communicate these different outcomes are short so that the player does not feel that s/he is waiting for them to finish. It was felt that this information would be better delivered via an auditory icon rather than an earcon.
The icon designed to communicate the right sound in the right place was modelled on a high bell being hit twice in rapid succession and was generated using FM synthesis. It was felt that players might associate this with a 'go ahead' message reminiscent of a boat's bell, or the conductor's bell on a bus, or that it might create a sort of sonic tick (✓). The icon designed to communicate the right sound in the wrong place sounds like a plucked string; musical, but not 'sweet'. The Wrong Sound icon was the same as the 'O' icon in Os & Xs.
After the evaluation sounds have been played, a discreet pattern of notes is played to let the player know what number turn s/he is about to use. The system developed to convey numbers was modelled on the Roman numeral system, but was cognisant of the fact that we hear temporally and therefore need to be able to add as we go along. The communication of numbers in the audio domain has of course been done successfully in Morse code, but because the Morse numerals were designed to be used in conjunction with additional code for the 26 letters of the alphabet they are necessarily long: each number comprises a string combining five dots and/or dashes. For this project, three earcons were devised to impart number information. Unlike the Morse code numbers, which use just one frequency, these earcons make use of both frequency and rhythm to facilitate learning and comprehension.
The premise used for the pitch values was that when the earcons were combined to create a bigger number the pitch sequence should rise in frequency, thus any number when converted into audio will use pitches that rise in frequency. The audio numbers are presented to the player at a rate of one note every 1/10th of a second and therefore take very little time to play so they are not frustrating to listen to.
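One way to read this additive, rising-pitch scheme is sketched below in Python. It is a guess at the general idea rather than the system used in the study: the paper does not state what values the three number earcons stand for, so the values 10, 5 and 1 (borrowed from the Roman symbols X, V and I) are hypothetical, as are the base pitch and the simplification of each earcon to a single note. Only the note spacing of 1/10th of a second comes from the text.

```python
# Hypothetical values for the three number earcons (cf. Roman X, V, I).
SYMBOL_VALUES = (10, 5, 1)
BASE_PITCH = 60        # hypothetical MIDI pitch of the first note
NOTE_INTERVAL = 0.1    # seconds between notes, as stated in the text


def audio_number(n):
    """Return a list of (value, MIDI pitch, onset in seconds) for turn n.

    The listener adds the values as they arrive; there is no subtractive
    notation (9 is 5+1+1+1+1, not 'one before ten').
    """
    if n < 1:
        raise ValueError("turn numbers start at 1")
    symbols = []
    remaining = n
    for value in SYMBOL_VALUES:
        count, remaining = divmod(remaining, value)
        symbols.extend([value] * count)
    # Each successive note is one semitone higher, so the pitch always rises.
    return [
        (value, BASE_PITCH + i, round(i * NOTE_INTERVAL, 1))
        for i, value in enumerate(symbols)
    ]
```

Under these assumptions turn 7 is heard as three notes (5, 1, 1) on rising pitches and takes about three tenths of a second to play.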
In order to review previous moves the player must use the up and down arrow keys. Each time the down arrow key is pressed the audio number of the previous move is played. When the player reaches the turn number s/he wishes to review s/he presses the numerical pad 'enter' key. The combination that the player guessed on that turn is played and the appropriate evaluation result sounds played.

3. USER TESTING

The method of evaluation used was 'cooperative evaluation' using the 'thinking aloud' method / protocol [9]. The blind population is clearly a potential user group for this product, but this is not necessarily the only user group. From the sighted population, five postgraduate students were recruited.
Usage data was collected through a combination of direct observation, with notes being made as the user played the games, and indirect observation by recording all users on videotape. One camera angle was sufficient because there was no visual aspect to the interface. In debriefing sessions questionnaires were used with questions aimed at the usability of the software, its entertainment value, and its potential to generate new skills in the user.

4. RESULTS

Os & Xs (Tic Tac Toe)
Everyone found the game easy to learn, presumably assisted by the fact that everyone knew the game previously, and it is an easy game.
The game achieved an overall fun/boring rating between 'Quite Fun' and 'Slightly Fun'. Considering this is a game that almost everyone is familiar with and that presents an extremely limited challenge to adults, this rating is good. Even though the participants only played the game once or twice after becoming familiar with it, they all felt there was potential for using the game to build additional skills:
"Maybe the game could be used for ear training for learning major / minor triads."
"Increase memory. Visualise sounds. Speed up visualisation. Map a sound to a place."
"Memory."
"Aural training."
"Better visualisation and localisation."

Mastermind
Participants found this game much more challenging. But the challenge was in learning the game, and particularly how to interpret the evaluations of each guess; the challenge did not lie in having to rely on sounds to carry information. On the fun/boring scale the game scored 'Quite Fun', with only one possible higher ranking - 'Extremely Fun'. The complexity of the game compared to Os & Xs made for a more entertaining game, once people got over the obstacle of learning how to play it.
Again all participants felt the game afforded skills acquisition:
"Same skills as visual Mastermind would, logic."
"Memory and logic."
"Could learn ear training, sound recognition, and logic patterns."
"Concentration, memory, recall."
"Concentration, memory."

5. DESIGN ISSUES

Temporal Issues
The main problem encountered in the interface was that the earcons were too long, and this caused an overlapping effect when players moved from one to another, particularly as they gained confidence and started to move between them more quickly. This problem could be much reduced by decreasing the length of the earcons. In order to eliminate the problem entirely a different approach to playing the earcons would have to be taken. Instead of simply being triggered and left to play to the end, the earcon could be stopped as soon as the player moved to a new location.
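The proposed fix amounts to a 'stop before start' policy. The Python sketch below illustrates the idea only; it does not produce sound, and the class EarconPlayer and its log are hypothetical stand-ins for whatever audio engine is used. It also does not model an earcon finishing naturally before the next move.

```python
class EarconPlayer:
    """Plays at most one earcon at a time, cutting off the previous one."""

    def __init__(self):
        self.current = None  # name of the earcon currently sounding
        self.log = []        # record of start/stop events, for illustration

    def stop(self):
        """Silence the earcon that is sounding, if any."""
        if self.current is not None:
            self.log.append(("stop", self.current))
            self.current = None

    def play(self, earcon_name):
        """Stop whatever is sounding, then start the new earcon."""
        self.stop()
        self.current = earcon_name
        self.log.append(("start", earcon_name))
```

Moving quickly from one square to the next then produces the event sequence start, stop, start rather than two overlapping earcons.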

Audio numbers
These never got tested adequately because it was felt that the user tests were getting too long when attempting to teach volunteers the review/replay mechanism. If the audio numbers are found to be functional in subsequent testing then thought may have to be given to devising a system that could expand to encompass a larger range of numbers.

Auditory Icon Design
Studies into what constitutes an 'O' sound or an 'X' sound are likely to yield different results with different populations. A study into the most intuitive sonification of various graphical icons such as '✓', 'X', '?' and '!' could be an interesting area of investigation.

6. CONCLUSIONS

This work builds on Winberg & Hellström's sonified Towers of Hanoi [3][4], and provides further evidence that games can be constructed using solely non-speech audio representation. Neither of the games constructed for this study made use of panning or direct manipulation to effect data change as the Towers of Hanoi did. The work completed here does, however, contribute to the field by strongly suggesting that audio games can be fun, and could provide a vehicle for the acquisition of skills, memory and concentration in particular.
It would be extremely interesting to use the games developed here with the Soundbeam [5][6] interface. There are many possible approaches that could be taken to realise the games using Soundbeam and different types of sensors for input devices. An example for Os & Xs would be to locate sounds on a grid presented in space in front of a user, with square selection being controlled by, for example, eye movements. There would be many presentation and input permutations that could be developed to motivate users to engage in therapeutic movement. Alternatively the different presentation and input options could be used to make the games accessible to profoundly disabled users who currently have few or no avenues of entertainment.

6.1. Earcons & Auditory Icons
This project also found both earcons and auditory icons useful for communicating information - and that they can work comfortably together sonifying different kinds of data within a single interface.
Four of the five people who volunteered to test the games had significant musical experience: one held a Bachelor's degree in music and three had done some ear training. This probably reduced the task of learning the earcons significantly. It should also be noted, however, that participant 4, who said he played "a bit of guitar years ago" and had "not really" done any music theory, did not have a strong musical background and yet had no problem learning and differentiating the earcons.

6.2. Audio Games: Fun for all?
From this work it is reasonable to state that audio games can be entertaining for sighted users. Unfortunately there is no data available yet in relation to blind or partially sighted users, but it is not far-fetched to predict that this population will find more entertainment value in these kinds of games than the sighted community has.

6.3. Audio Games: All for fun?
This study strongly suggests that prolonged use of audio games could bring about the acquisition of various skills. Different design approaches could be taken in order to develop particular skills, for example memory, or pitch relationship perception. Much more extensive testing over a longer period would have to be done to prove this hypothesis, but these preliminary indications are encouraging.

7. REFERENCES
[1] Bavisoft. 2002. Software for the Visually Impaired. Available from: http://www.bavisoft.com/index.htm (July 2002).
[2] Drewes, T., E. Mynatt and M. Gandy. 2000. Sleuth: An Audio Experience. ICAD 2000, Atlanta, Georgia.
[3] Winberg, F. and S. O. Hellström. 2000. Investigating Auditory Direct Manipulation: Sonifying the Towers of Hanoi. CHI 2000, 1-6 April 2000.
[4] Winberg, F. and S. O. Hellström. 2001. Qualitative Aspects of Auditory Direct Manipulation: A Case Study of the Towers of Hanoi. Proceedings of ICAD 2001, Espoo, Finland.
[5] Soundbeam News. Issue 7, January 2000. Available at: http://www.soundbeam.co.uk/ on 2nd April 2002.
[6] Soundbeam News. Issue 8, March 2001. Available at: http://www.soundbeam.co.uk/ on 2nd April 2002.
[7] Brewster, S., 1994. Providing a Structured Method for Integrating Non-Speech Audio into Human-Computer Interfaces, Ph.D. Thesis, University of York.
[8] Gaver, W.W., 1997. Auditory Interfaces in: Handbook of Human-Computer Interaction, 2nd, completely revised edition, M. Helander, T.K. Landauer and P. Prabhu, (Eds.) 1003-1041. Amsterdam: Elsevier Science B.V.
[9] Preece, J., Y. Rogers, H. Sharp, D. Benyon, S. Holland, T. Carey. 1994. Human-Computer Interaction. Harlow, England: Addison-Wesley.

8. AUTHOR INFORMATION

Sue Targett
EIRÍ Corca Baiscinn
Community Centre, Circular Road, Kilkee,
Co. Clare, Ireland

Mikael Fernström
Interaction Design Centre
Dept. of Computer Science and Information Systems, University of Limerick, Ireland

(C)2003 S. Targett and M. Fernström