the international journal of computer game research
volume 4, issue 1
Zach Whalen is a PhD student at the University of Florida. His current writing and research include work on video game genre theory, comics studies, House of Leaves, and digital pedagogy.
Play Along - An Approach to Videogame Music
by Zach Whalen
As recent attention in Associated Press stories and a New York Times article attests, videogame studies is beginning to emerge from its murky status as an "academic ghetto." Videogames provide rich opportunity for interdisciplinary study, but at least one aspect of videogames remains to a large extent unrecognized for its impact on the game player. While the game industry invests heavily in the creation of music, and nostalgic themes from early games resonate powerfully with mature gamers, music in videogames has so far remained a tangential footnote to preliminary studies that attempt to account for the medium within the academy. While game studies is becoming increasingly assimilated into current strains of academic discourse, "grand unified theories" of games fail to account for the ways in which the musical soundtrack of a game affects the user's experience and creates a seamless impression of gameplay.
This essay builds on a small body of critical writing dealing with videogame music, including Matthew Belinkie's (1999) useful history of game music, online at the Video Game Music Archive (www.vgmusic.com), David Bessell's (2002) chapter in ScreenPlay: Cinema/Videogames/Interfaces, and Paul Weir's dissertation on sound design and structural approaches to music in games. Robert Bowen has also provided an insightful analysis of Atari 2600 games as musical products themselves, mapping musical structure onto the sound effects and programming capabilities of the console. Belinkie's paper is a rich history of the most influential composers working in videogames. Bessell's chapter provides an interesting analysis of several games, but his approach fails to take game type into consideration; instead, it compares and contrasts three games of wildly different type and structure without the contextualization that would make his argument useful beyond itself.
In this essay, I attempt to develop a workable theory of videogame music that approaches the question of music as a part of the narrative component of games. While I intend to steer clear of the ludology versus narratology debate, certain assumptions and allowances must be made in my approach that will ultimately favour one side; but, as the necessarily limited scope of this inquiry requires a certain focus, I hope to move quickly beyond the meta-critical questions paralyzing certain conversations in the field. Accordingly, my argument is that cognitive theories of perception and questions of immersion versus engagement as a means of understanding "flow" or pleasurability in games allow for a richer understanding of the complex communication involved in videogame music.
I use the terms "immersion" and "engagement" to refer generally to the process of reading, specifically reading for pleasure. In emphasizing a third term, "flow," J. Yellowlees Douglas and Andrew Hargadon present a context for describing the quality of interacting with a hypertext or interactive narrative. In their account, an ideal condition of flow, in which "self-consciousness disappears, perceptions of time become distorted, and concentration becomes so intense that the game . . . completely absorbs us," is achieved as a dialectic between unconscious states of immersion and conscious moments of engagement (Douglas and Hargadon, 2004). Immersion is giving in to the seduction of the text's story, to be blissfully unaware of one's surroundings and the passing of time as one escapes into the pleasure of reading. By contrast, the experience of being engaged with narrative (or any other semantic object or expression) involves an abstracted level of awareness of the object qua object. In schematic terms, immersion is the act of relying on learned behavioural scripts at a level of instinct – being "in the moment" without having to be aware of what it takes to be in the moment – while engagement is the process of learning the scripts and requires an objective awareness of the object supplying the new schema. In practice, immersion and engagement provide a continuum of experience and, to the extent that texts rely on the same cognitive processes as the "real world," successful achievement of a flow state can be likened to being actively immersed in the moment of engagement. Douglas and Hargadon provide the examples of artists, musicians and athletes who, because of their skill in manipulating a schema, exhibit symptoms of flow (for example, distorted sense of time, sense of freedom or abstraction): their interaction with schema relies on a proficient degree of agency.
In videogames, successful play often involves both an understanding of prior scripts and an ability to intuitively engage new scripts by acting within an abstracted schema.
It is important that the videogame medium adopts certain roles for music from prior narrative media. Specifically, early cartoon music and horror films established certain tropes that videogames rely on today. Furthermore, studies of the relationship between audial and visual elements in older media (for example, film) prove useful for understanding game music because certain basic ideas (for example, diegetic versus non-diegetic musical sound) apply to videogames. Experiments that analyze how viewers interpret purely visual media versus combined visual and aural media are helpful for exploring the cognitive processes involved. The interactive element of videogames requires its own analysis, so a combination of theories of "flow" with these earlier studies of visual/aural media leads to a pair of concepts that describe two common functions of music. Specifically, instances of musical sound in videogames generally follow one of two trajectories: to expand the concept of a game's fictional world or to draw the player forward through the sequence of gameplay. One way to think about these functions is by analogy to the metaphoric and metonymic behaviours of language described by Roman Jakobson in the context of a paradigmatic/syntagmatic model of communication.
Obviously, these two functions are interrelated, but some games may have a preponderance of one or the other. Tetris, for example, has very little in the way of fictional space. Though there is an extent to which all games have a rhythm of alternation between "safety" and "danger," it is necessary to limit this initial argument to certain game types. I hope to show that these concepts actually provide a way of distinguishing game types and that they correspond to recognizable generic distinctions in games; but, for the purposes of this argument, I focus on the platform and survival horror genres.
It is important to note as well that this argument does not apply to games' cut-scenes – the short movies between levels that advance the plot or provide back story. Cut-scenes’ widespread adoption of filmic perspectives and techniques renders their analysis more appropriate for straightforward film theory or a more thorough videogame narratology.
Also, by "videogame music" I generally mean the parts of the soundtrack that are pre-composed and recorded for playback to accompany specific locations or events in the game. The differences between game music and game sound can be subtle, especially if the music has an "industrial" style (that is, incorporating mechanical sounds as part of the music) as in American McGee's Alice (2000) or Silent Hill (1999), or if the sound is ambient as in Silent Hill's "school" levels. Therefore, I often conflate the two for purposes of brevity and relevance. There are some important ways in which videogame sound deserves an entire analysis of its own, but the broad strokes of my current argument apply generally to sound as well.
The music/sound problem is further complicated by a distinction between diegetic and non-diegetic music in that the diegetic music functions similarly to the incidental diegetic sounds that populate an environment. Sue Morris writes that sound in first-person shooter (FPS) games is used "to provide an audio complement to action on the screen . . . and to create a sense of a real physical space" (Morris, 2002). A successful player, Morris argues, must perceive the game's space in 360 degrees, most of which are provided as audial information through a player's speakers or headphones; music playing on a radio in the game world, for example, fits well into this purpose of implying space through sound. The character of the music is inconsequential to its function as a location-specific sound. My argument is more appropriate for non-diegetic music, but many of the sounds a player hears are also not generated "from" any visually represented object. In practice, the combined term "musical sound" may be the most appropriately inclusive label. In other words, my argument applies to many instances of game sound as well as game music.
Comparing games and films at all is a controversial approach, but it may be worth pointing out some of the basic points of similarity from which we can derive a useful model of analysis. The fundamental overlap between videogames and films is that both rely on aural and visual cues to convey a sense of a consistent diegesis or gameworld. But a more appropriate and useful comparison comes in the relationship between videogames and animation. Paul Ward (2002) puts forth an interesting argument about games as a form of animation in that both games and animation strive for a form of representation that is more appropriately termed "emulation" rather than "simulation" in that both the game and the animated film rely on similar production techniques. Significantly, both the game's interactive world and the diegesis presented by the animated film respond to the characters in a manner that can only be believed if it is not realistic. Paradoxically, the amazement we feel at the level of detail presented in the environment of the characters may draw us into the alternate reality as a spectacle of technology, but the actual dimensions of the represented world are not dependent on their referent, reality, but on the capabilities and narrative goals of the characters. Gaps between ledges in the worlds of Tomb Raider are spaced exactly according to the abilities of Lara Croft and are not imitating the product of erosion or other natural causes. In both cases, animated film and game animation, timed musical cues and sound effects typically suggest a responsive, narrative-specific environment aimed at either immersing the viewer/user in the spectacle of storytelling or engaging the viewer/user in the kinesthetic emulation of problem solving in a narrativized context.
Cartoons rely on music to reinforce the impact of their visuals, so the relationship of the viewer to the character operates under the "redundant" mode identified in standard film theory (in a "death scene," for example, the music restates the narrative action with a minor chord), but in cartoons the restatement is of a different sense. Percussive musical sounds accompany any violent or rapid motion so the movement itself is reinforced and appears to be more “alive” than if it were silent. Cartoon music can be emotionally expressive, of course, but its primary function is as a kinesthetic vehicle so that live-action film is often deemed "cartoon-like" when musical cues accompany or emphasize violent physical action (think of The Three Stooges, for example). The point may be to provide a humorous counterpoint to the visual of the violence and to characterize the violence as not hurtful so that we laugh at it (Strauss, 2002). But the power of the reinforcement itself can be directed toward more profound impressions.
Studies of cognition and animation suggest that objects are perceived as alive and exhibiting anthropomorphic behaviour when their motions are accompanied by a synchronized soundtrack (Cohen, 2000). This phenomenon relates to Chion's (1994) poetic description of the sublime incorporation of sound into the space of the animation, and the two observations (Chion's theory and Cohen's research) are linked because, for Chion, one of the key functions of sound is to aid in an audience's perception of a spatial diegesis. The cartoon character acting in the space of the animation relies on musical accentuation to make the illusion compelling, and the result is that the musical cues and non-musical sound effects instill objects with even more life than the simple appearance of figures in motion.
This synchronization is termed "mickey mousing" and is used pejoratively by composers who disdain the practice. A familiar example of the practice occurs in the well-known Fantasia (1940) scenario where an army of broomstick soldiers marches to the ominous, martial beat of Paul Dukas' L'apprenti sorcier, but the practice goes back to the earliest days of animation when theatre pianists would accompany silent cartoons with appropriate music. Mickey mousing, or "mickeymousing," occurs in both animated and live-action cinema when the music provides a synchronized, aural imitation of what is happening on the screen (Neumeyer and Buhler, 2001). The goal in simple mickey mousing certainly seems to be more physically or kinesthetically oriented, but in that it represents a character’s relation to its fictive universe, mickey mousing roots the cartoon character in a whimsical world whose space is responsive only to the constraints of the character. This is consistent with Ward's emulation context because the soundtrack exaggerates rather than imitates physical motion. A very common example is a simple ascending sequence of light staccato notes drawing attention to a character tiptoeing. Music which synchronizes musical cues to physical, slapstick violence also places the character within the aural characterization of the story's space. A very similar practice occurs in early videogames as the examples below help demonstrate.
A pair of examples of the earliest uses of the mickey mousing effect demonstrate its ability to inscribe character and setting. Scored by the legendary Carl Stalling, Skeleton Dance (1929) demonstrates the complicated blend of diegetic and non-diegetic music and sound that merges to create an immersive story. The narrative is structured around a group of skeletons performing a dance routine to an orchestrated adaptation of Grieg's March of the Trolls (1994). But the introduction of the story blends music with sound effects to set the spooky atmosphere. The following image (Figure 1a) shows two cats reacting to a skeleton rising from his grave. As the skeleton rises, we hear an ascending D minor scale (Figure 1b) on a stringed instrument – a common figure in cartoon music – and the cats’ fright is mimicked in a similar descending arpeggio (Object 1).
Figure 1. An example of mickey mousing in Disney's The Skeleton Dance. A) The skeleton’s rise from the grave is synchronized with an ascending scale. B) Musical approximation of sound accompaniment for A. © 1929 Disney.
Similarly, as the skeleton begins to skulk about, his footsteps are punctuated with a staccato harmonic-minor scale in D, employing hollow, wooden timbres in the percussion. As the piece transitions from a "foxtrot in a minor key" (Stalling, 2002) to an adaptation of Edvard Grieg's March of the Trolls (1994), the orchestration and instrumental choices mimic a hollow, dry sound one might expect from dancing skeletons by using a marimba or a similar wooden instrument for the melody to accentuate the skeletons' percussive motions.
These harmonic and instrumental choices demonstrate uses of music to convey a specific mood in accompanying the visuals. Diegetic and non-diegetic music blend with the sound to create a specific and compelling mood. The music that acts as a non-diegetic underscore of the skeleton's action soon blends into a choreographed dance number where the skeletons clearly respond to and play the obviously diegetic music that we hear as well. Thus the "source" of the music has become diegetic, whereas the initial mickey mousing gestures are non-diegetic underscores. The fact that this shift is accomplished seamlessly corresponds to the contiguity of the videogame in that the atmospheric or "tone setting" sounds of the videogame's worlds quickly and smoothly give way to didactic or motivational implementations that emphasize the videogame's sequence.
Another piece scored by Stalling illustrates this same type of synchronicity with a different spatial characterization in play. Galloping Gauchos (1928) uses an Argentinean setting, and the score is appropriately tango-oriented. The same blend of mickey mousing with a non-diegetic soundtrack takes place (enacted in this case by Mickey himself) in the opening sequences. This time, Stalling employs similar ascending figures to accompany and characterize ascending objects. Mickey tosses a cigarette into the air and catches it with his teeth to impress Minnie, and the lighter timbre of a slide whistle matches the daylight atmosphere of the event and sets the light-hearted tone intended to associate favourably with Mickey as opposed to the spooky timbres of Skeleton Dance.
These examples illustrate the importance of non-diegetic music and sound to the communication of cartoon stories. In accordance with the simple redundant versus contrapuntal continuum theory of film music (where a score either reinforces or opposes the onscreen action), the character we are supposed to view as loathsome is "mickey moused" with predominantly minor or diminished scales and arpeggios while Mickey's action is predominantly narrated with "happy sounding" major or diatonic scales. This same principle will apply to videogame music in its ability to create a compelling and entertaining emulation.
Perspectives on Animation, Games and Causality
Seeking an explanation for why mickey mousing is so effective, psychologists have experimented with the potential for simple shapes and sounds to evoke a narrative, cognitive meaning. Annabel J. Cohen (2000) has conducted studies which tested subjects' interpretations of certain types of movement as emotional states, as well as the effect musical accompaniment had on the interpretation of the same moving figures. Her first of several studies identified musical features that correspond to emotions along a five-point happy/sad scale. Specifically, major triads played in different octaves at different speeds revealed that higher, faster repetitions yielded a higher ("happier") score than lower, slower repetitions (Cohen, 2000). The fact that such a simple sound system could correlate so strongly to an emotional scale hints at the complex emotional interpretations made possible by different harmonies, chords and key changes. Understanding such complexities would require a more elaborate emotional model, and the results would, no doubt, vary more for each individual listener, specifically across cultures. At any rate, Cohen's studies suggest similar conclusions to Alan Leslie's: that emotional interpolation may be inherently part of interpreting sensory information.
Cohen has also led studies which tested the correlation between visual and aural stimuli by asking subjects to use the same five-point scale to comment on a simple animation of a bouncing ball. The ball's movements matched the triads, moving up and down at slower or faster rates and at higher or lower positions. Accordingly, "low, slow bounces were judged as sad, and high, fast bounces were judged as happy" (Cohen, 2000). When the two stimuli were combined and agreed, the results were consistent with either the motion or the music alone, but when the two diverged, the musical accompaniment was shown to influence the interpretation of the visual.
A slightly more involved study subsequently experimented with the affective meaning of story interpretation. Using shapes that were generally perceived as two lovers escaping a bully, two soundtracks were tested for their effect on viewers' interpretation of the scene. There were differences: one "character" was seen as more active when viewed with a soundtrack which expressed temporal congruence to "his" movements (Cohen, 2000). This apparent association led Cohen to develop the "Congruence-Associationist framework" which holds that "through structural congruence, music directs specific visual attention and conveys meaning or associations" (1998).
In Actual Minds, Possible Worlds Jerome Bruner (1986) mentions several studies of the perception of causality that were performed by cognitive psychologists seeking to determine if perceiving causality is an innate or learned feature of understanding. Michotte demonstrated that "when objects move with respect to one another within highly limited constraints, we see causality" (Bruner, 1986, emphasis in original). Further studies – Alan Leslie's, Fritz Heider and Marianne Simmel's – indicate that we also see "intentionality" and that the ability or desire to interpret information as essentially a story may be fundamental or automatic from birth (Bruner, 1986). One can draw interesting conclusions from this type of study, notably the implied anthropomorphism of simple objects that we see as exhibiting intention, but the implications are clearly that just about anything can be a story. Annabel Cohen carries these studies in a different direction by addressing the kinds of stories we make out of the perceptions we have.
Furthermore, Heider and Simmel tested subjects' interpretation of a series of moving shapes on a blank background. According to Bruner, the test subjects invariably interpreted the scene as "two lovers being pursued by a large bully who, upon being thwarted, breaks up the house in which he has tried to find them" (Bruner, 1986). It may be that the testers intentionally modelled their moving shapes after every episode of "Popeye," or it may be that certain elements in that test film, such as the proximity or similarity of the two "lover" shapes, led to certain, inherent conclusions. With this cognitive framework as a tool, one can begin piecing together the cognitive functions and semiotic relationships that make up the interrelation of visual and aural elements which create meaning in cinema, cartoons, and videogames. The congruence-associationist framework also provides a way of discussing the phenomenological difference between what happens when we watch movies and when we play videogames. Much work in film studies already assumes the kind of correlation that Cohen and her colleagues found to be a cognitive function, and a system of conventions has developed these pre-existing schemas for instrumental musical narration. Applying similar findings to specific videogames will show that the semantic operations of music and sound in videogames employ similar but fundamentally unique strategies.
Super Mario Brothers
Since my argument is essentially that videogame music encourages and enhances the narrative experience of game play (that is, the music in videogames is one of several elements that make game play a compelling visual and aural experience which immerses players in a fictional space), I offer three "readings" of familiar videogames that exemplify the compelling and immersive properties of game music. The following examples, Super Mario Brothers (1985), Legend of Zelda: Ocarina of Time (1998) and Silent Hill (1999), illustrate different uses of these functions. While Super Mario Brothers is one of Nintendo's most influential games and successful franchises, I should note that I am using it as an example because of its familiarity and relative simplicity. I am not suggesting that Super Mario Brothers provides a template of game music archetypes, and it is important to point out that my observation of two primary, complementary functions of music in Super Mario Brothers is not a cookie cutter that I intend to force other games into.
In 1985, Nintendo of America released what would become arguably the most influential console game so far, Super Mario Brothers. The side-scrolling platform game would spawn several spin-offs through four generations of consoles and dozens of rip-offs inspired by Mario's success. While it would be an oversimplification to say that Super Mario Brothers is important because it initiated videogame tropes like power-ups, extra lives and a metaphor of geographic expansion conveyed by progress through progressively difficult levels (Poole, 2000), Super Mario Brothers is, like Disney's Skeleton Dance, an opportunity to examine important aspects of its medium. More importantly, Super Mario Brothers provides a ready example of musical functions borrowed from animation at an early stage in videogames' development. Specifically, Mario's (or Luigi's) movement on the screen is accompanied by a musical mickey mousing gesture. In line with Koji Kondo's peppy theme music, Mario's "jump" (Figure 2a) is accompanied by an ascending chromatic glissando or slide – think of a "boing" sound (Figure 2b). Like Mickey's cigarette toss in Galloping Gauchos, Mario's leap has a pleasant sound (i.e., it does not use minor or diminished intervals), not only because we are supposed to identify favourably with Mario, but also because a typical game player will likely hear the same sound repeated hundreds of times in a dedicated period of gameplay. The mickey mousing effect is also intended to emphasize the physicality of Mario and his kinesthetic involvement with his environment. Accordingly, in Figure 3a, Mario has "powered up" to Super Mario, so, as Figure 3b demonstrates, the sound effect of his jumping is mimicked as the same glissando figure an octave below the original. Other movements and collisions in the game respond to Mario in a way that enhances the impact of the represented on-screen events. 
In this case, the musical mickey mousing is in tune with the creation of a believable gameworld, one which is fully characterized by the non-diegetic theme music.
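The pitch relationship between the two jump sounds can be made concrete with a little arithmetic. The following is a minimal sketch, assuming equal temperament, in which each semitone multiplies frequency by the twelfth root of two and dropping an octave halves it. The starting pitch of 440 Hz, the one-octave span and the function names are illustrative assumptions; they are not taken from the game's actual sound data.

```python
# Equal-temperament ratio between adjacent semitones.
SEMITONE = 2 ** (1 / 12)


def chromatic_glissando(start_hz, semitones):
    """Return the frequencies of an ascending chromatic run from start_hz."""
    return [start_hz * SEMITONE ** step for step in range(semitones + 1)]


def octave_below(freqs):
    """Transpose every frequency down one octave (halve it)."""
    return [f / 2 for f in freqs]


# Hypothetical "small Mario" jump: one octave of rising semitones from 440 Hz.
small_jump = chromatic_glissando(440.0, 12)
# Hypothetical "Super Mario" jump: the same contour, one octave lower.
super_jump = octave_below(small_jump)
```

In this sketch the two figures share an identical melodic shape, which is one way to read the essay's point: the powered-up character sounds heavier while remaining recognizably the same character.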
Similarly, music and sound effects function in encouraging successful game play by providing positive reinforcement as consequences for actions in the game. "Dying" in Super Mario Brothers (Figure 5) produces a staccato pulse followed by a conciliatory musical cadence reminiscent of the music one hears upon a contestant's misestimating the value of a vacuum cleaner or dish set on The Price is Right. The music is a descending figure, mimicking Mario's ejection from the playing field. The music is a coded message of failure reinforcing the consequence of having to replay the level one more time, but similar messages of success reinforce the successful completion of levels in the game. Also, at a smaller level, the satisfying "ching" of collecting gold coins reinforces that behaviour which is strategically advantageous to advancing in the game. Considering an entire level as a musical composition, "death" or "success" musical messages serve as cadences to that world's musical structure. In a similar manner, Robert Bowen's presentation at the 2004 Princeton conference on videogames, which analyzed Atari 2600 games as musical compositions, identifies death music as a cadenza to the incidental rhythmical music produced by the game's sound effects (Bowen, unpublished). In these ways, music works across a game's structure to encourage the user's continued play. The game's sequence is dependent on user input, so music that engages further participation can be said to function toward the continuity of the game play experience.
Object 5. Sound clip of "failure" cadence. The phrase of music a player hears each time Mario dies is in the same key as the "Overworld Theme" and provides a solid musical transition from one trial to the next in the trial-and-error pattern – a central part of the experience of Platform games – while avoiding the finality of the dirge-like Game Over music. WAV file. (obj5.wav; 3 seconds; 58.4kb)
Complementing this motivational function of music, Kondo's "Overworld Theme" (Object 6) reinforces the bright environment of the Mushroom Kingdom. The theme has been described as a funk or jazz tune "but with so much energy pumped into each articulated note, one is not sure whether it invokes cheesy Vegas lounge music or a Dixieland band" (Belinkie, 1999). This sunny-sounding tune is heard only in areas of the gameworld (the Mushroom Kingdom or "Overworld") where the level is above ground (Figures 2a and 3a). Transporting via tunnel to the underworld (Figure 5a), one hears the "Underworld Theme" (excerpt Figure 5b) which modulates to the key of G minor and has a hollow, eerie feel. Also, though the piece is scored in G minor, the melody lacks a tonal centre (i.e., it never comes to rest on the tonic, G) and relies on tense chromatic passages. These chromatic tone clusters contribute to the feeling of the enclosed, claustrophobic space of the underworld, and the lack of tonal centre conveys the disorientation appropriate for underground spaces.
Object 7. Sound clip from underworld. Note how the lack of tonal centre and use of minor tone clusters accentuate the loneliness and "angularity" that somehow seem appropriate for being underground. WAV file (obj7.wav; 12 sec.; 275kb)
Other areas of the gameworld have their own musical signature as well. The musical accompaniment for the underwater stages is a lilting, peaceful waltz. Certain reasonably predictable associations with different types of music allow the game designers to use the music to enhance our belief in the consistency of a particular emulated world. Each world has its own theme which characterizes the environment, and the theme loops to indicate a static consistency. When the theme's tempo increases, however, the music provides a signal that the time limit is approaching and the player must make extra effort to complete the level in time. The music remains in the same key, but doubles its tempo, adding a sense of urgency to the specific mood of the game space. This cue acts as a motivational device, and it breaks the lull of immersion encouraged by the repeating theme. The music is, therefore, shifting into a mode of engaging a player's response by calling her to a faster or more skillful interaction with the game.
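The tempo cue described above amounts to a simple rule, which can be sketched in code. This is an illustrative model only, assuming a per-level countdown timer and a warning threshold; the names `MusicCue` and `select_cue`, the threshold value and the tempo figures are hypothetical and do not describe Nintendo's actual implementation. The "safety" and "danger" labels borrow the essay's own terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MusicCue:
    theme: str    # which looping theme is playing (same key in both states)
    bpm: float    # playback tempo in beats per minute
    state: str    # "safety" or "danger"


def select_cue(theme, base_bpm, time_left, warning_threshold):
    """Choose the playback cue for the current moment of a level.

    The theme itself never changes; only its tempo doubles once the
    remaining time drops to the (assumed) warning threshold.
    """
    if time_left <= warning_threshold:
        return MusicCue(theme, base_bpm * 2, "danger")
    return MusicCue(theme, base_bpm, "safety")


# Hypothetical values: a 100 bpm theme and a 100-tick warning threshold.
relaxed = select_cue("overworld", 100.0, time_left=300, warning_threshold=100)
urgent = select_cue("overworld", 100.0, time_left=99, warning_threshold=100)
```

Because the theme and key are held constant while only the tempo changes, the sketch keeps the world-building function of the music intact even as the motivational function takes over.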
At a broad level, the musical character of each world's levels advances toward, and concludes with, the ominous fourth level, which acts as a motivational cue similar to the tempo increase at the end of each individual level. These "castle" levels build up to an ultimate battle with a boss character, a subordinate manifestation of Bowser. This music is similar to the Underworld theme in its lack of tonal centre and reliance on chromatics. Here, the confined space afforded to the player (Figure 6a) is mirrored in the dense cluster of notes that carry the theme (Figure 6b).
In the case of Super Mario Brothers, there are two types of musical changes that involve the player in the gameworld. First, the composer and game designers match appropriate music to specific game environments. This type of music functions to draw the player into the fictional world of the game by making the environment more believable. Second, musical shifts in tempo or character motivate the player to perform the actions that connect the sequence of the game experience by rewarding successful behaviour and punishing failure. The second, motivational function takes the form of punctuating cadences and audible shifts from a "safety state" to a "danger state."
Legend of Zelda: Ocarina of Time
Another game that elaborates on these basic functional patterns is Legend of Zelda: Ocarina of Time. This game extended the popular and influential Legend of Zelda series into the three-dimensional world made possible by the Nintendo 64 console. Like Super Mario Brothers, Ocarina of Time employs music in these same functional patterns, but the complexity of the musical score and the real-time blending and fading allowed by the game engine create a more lush, cinematic feel. Composer Koji Kondo again uses particular melodic themes to identify specific areas of the gameworld in something like Wagner's leitmotifs acting in reverse. Furthermore, Ocarina of Time employs diegetic music directly as a heuristic device to further game play in that players must successfully memorize short musical themes which unlock special areas and abilities.
The genre of Ocarina of Time is not as straightforward as that of the classic platform title Super Mario Brothers. It is usually classified under the "adventure" game category (or the unhelpful action/adventure uber-category), but it clearly has elements from the platform genre (jumping to solve puzzles, exploring space, defeating "bosses" to complete areas of the game) and the roleplaying genre (keeping track of, and purchasing, items, using a map and "levelling up" one's character). The setting for the game is clearly one of fantasy in that one encounters elves, fairies, wizards and humans tensely coexisting in a world powered by magic and potions. The tone of the story and visuals is also more serious than Mario's. Accordingly, the game's mickey mousing effects blend with realistic sound effects. The player-character, Link, does employ a jumping sound effect, but the "boing" sound is replaced by an aggressive grunt. Collecting coins has a similar "ching" which is, at this point, universal in games that involve collecting coins, and success in the game is similarly reinforced by a musical "reward." Ocarina of Time thus employs the same musical structures as Super Mario Brothers, but the complexity of the fictional space and the use of music as a literal motivational device involved in gameplay allow the identification of some more interesting musical operations through the eponymous ocarina.
The player-character Link's most important item is his ocarina, which a gamer must learn to play with the controller. In "ocarina mode" (Figure 7), a player presses keys that correspond to notes on the potato-shaped wind instrument. Figure 8 shows the basic five-note scale one needs to unlock key melodies, though additional manipulation from other control buttons makes it possible for a skilled player to reproduce a complete scale. Successfully playing a melody fragment unlocks an animation which completes the melody and performs the specified action when appropriate. Not only do these musical themes flavour the experience of play, they are also reproduced in the backgrounds of several of the game's environments.
Figure 9 shows the melody that must be played to perform "Saria's Song" which permits teleportation to the Lost Woods area of Hyrule. In the Lost Woods, the looping theme music (Object 9) extends and elaborates Saria's song in a straightforward "theme and variations" structure. Thus, the musical heuristic merges with the fictional space of the Lost Woods' theme. Similarly, the "Temple of Time Theme" restates the "Song of Time" with a chorale effect mimicking a cathedral's echoing dimensions. This blending of functions is also significant in both of these cases because the player hears a melodic figure repeated in the orchestrated underscore that Link will have to "learn" at a later time when he uses the ocarina to unlock the related power. The powers of the melodic fragments cannot be unlocked until the player has reached the appropriate moment in the game, so the atmosphere music also acts as melodic foreshadowing to an extent that often goes unrecognized, and as a result, players report feelings of déjà vu as the melodies they must learn have an eerie familiarity.
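The ocarina mechanic described above can be sketched as a small pattern matcher: the game remembers the most recent button presses and checks whether they end with a melody the player has already been taught. This is only an illustration of the idea, not the game's actual code. The class name, the `learned` set and the button sequences in the table are assumptions made for the example.

```python
from collections import deque

# Hypothetical melody table. The button sequences are illustrative
# placeholders, not data taken from the game or from this essay.
MELODIES = {
    "Saria's Song": ("C-down", "C-right", "C-left",
                     "C-down", "C-right", "C-left"),
    "Song of Time": ("C-right", "A", "C-down",
                     "C-right", "A", "C-down"),
}


class Ocarina:
    """Tracks recent notes and reports when a learned melody is completed."""

    def __init__(self, melodies, learned=()):
        self.melodies = dict(melodies)
        # Melodies the player has been taught so far; others stay locked
        # even if the right notes are played.
        self.learned = set(learned)
        longest = max(len(notes) for notes in self.melodies.values())
        # Only the most recent notes matter, so cap the history length.
        self.recent = deque(maxlen=longest)

    def press(self, button):
        """Record one note; return the melody name if one was just completed."""
        self.recent.append(button)
        played = tuple(self.recent)
        for name, notes in self.melodies.items():
            if name in self.learned and played[-len(notes):] == notes:
                self.recent.clear()  # start fresh after a successful melody
                return name
        return None
```

Under these assumptions, a player who has learned only "Saria's Song" triggers it on the sixth correct note, while playing the notes of a melody that has not yet been taught does nothing. That is the gap the atmospheric music fills when it lets the player hear a theme before it can be used.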
The significance of Ocarina of Time's musical score goes beyond the subtle interactions of foreshadowing and heuristic, however. The game engine's sophistication is such that, for the first time, musical phrases can blend seamlessly as Link crosses from one musical signature area into another and, more importantly, as Link encounters a dangerous enemy. Object 10 is a clip of what happens musically as Link approaches an enemy. The blending effect is initially subtle, but blossoms into full-blown "attack" music which, much like the Castle Theme from Super Mario Brothers, heightens the drama of the conflict and alerts the player to more focused performance. The sound engine of the Zelda game demonstrates the same principle of maintaining contiguity, but the role it plays is more flexible and dynamic, since the three-dimensional construction of Link's environment often allows a player to choose whether or not to move toward the source of the "danger music." Still, the same broad structure of concluding a level or world with danger music holds true as Link encounters level bosses and the final enemy, Ganondorf. The application of this safety/danger binary in the fluid schematic of the three-dimensional space of Hyrule exhibits the complexity and richness of this fictional space. The character of the soundtrack is both charming and haunting, and the complexity of the blending and overlapping musical themes invites serious immersion in the game's world.
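One way to picture the blending between an area theme and the "attack" music is as a crossfade driven by the player-character's distance from the nearest enemy. The sketch below is a hypothetical model, not a description of the game's sound engine. The function name, the `near` and `far` thresholds and the linear fade are all assumptions chosen for simplicity.

```python
def danger_mix(distance, near=2.0, far=10.0):
    """Return (area_gain, danger_gain) for a distance to the nearest enemy.

    Beyond `far` only the area theme is heard (the "safety state");
    within `near` only the danger theme is heard; in between, the two
    are blended linearly. Thresholds are illustrative assumptions.
    """
    if near >= far:
        raise ValueError("near must be smaller than far")
    # 0.0 when the enemy is far away, 1.0 when it is right next to the player.
    closeness = (far - distance) / (far - near)
    danger_gain = max(0.0, min(1.0, closeness))
    return 1.0 - danger_gain, danger_gain
```

Because the gains depend only on distance, a player who backs away from the enemy fades the danger theme out again, which matches the observation that the three-dimensional space lets the player choose whether to approach the source of the danger music.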
In survival horror games, the game sequence is for the most part the same as platform and adventure games, but the musical choices are not as straightforward. The classic Silent Hill has a rich and varied soundtrack, but it contains no music in a major key. In fact, the "safe state" is not present at all in the same sense, so the music never settles on or even moves toward any kind of resolution. This is, in part, because the actual play of survival horror games (generically derivative of the "adventure" genre) is not punctuated with the same trials and errors one makes to master emulated feats of super-human ability. The dominant problem of adventure and survival horror games is puzzle solving, and the survival horror game is differentiated from standard adventure games in that armies of zombies and other undead creatures often block the path to puzzles' solutions and the total atmosphere is densely creepy. Other game types involve monstrous enemies – Doom or Duke Nukem, for example, require the straightforward annihilation of enemies – but the limited ammunition and often inefficient camera angles that characterize survival horror make avoiding enemies as much of a priority as fighting them. Therefore, musical scores like Akira Yamaoka's for Silent Hill never have the safe moments of exploration allowed by platform games, and they must sustain a consistent and pervasive mood of terror or apprehension in the player. The adventure genre format, however, often calls for exploration of space as essentially a large scale puzzle, so in Silent Hill the music is always in a degree of "danger state" in order to impel the player through the game's spaces. The mood of the game is crucial to the horrific "feel," but it also provides motivation by compelling continual progress through the game. The town of Silent Hill is never a safe place: players maintain the game's contiguity by trying to escape Silent Hill, a geographical embodiment of the musical danger state.
This embodiment creates an interesting parallel with horror films.
In general, survival horror games rely on conventions of horror film sound to create the mood of horror required for the game (echoing effects, screeching violins, dissonant bursts of symphonic noise at "startle" moments, etc.), but a psychoanalytic analysis reveals a shifted trajectory, at least in the case of Silent Hill. Neumeyer and Buhler (2001) write:
Silent Hill does provide a "debilitating loss of centre" for the main character, Harry Mason, and the music is significantly atonal, often eschewing melody altogether and utilizing a percussive, "industrial" sound. But the environment itself is the site of "dystopian projection," more so than any of the actual monsters. As Figures 10 and 11 illustrate, the space of Silent Hill undergoes rapid and grotesque physical changes from a foggy, empty town that is otherwise normal to a blood-soaked, nightmarish parody of the same space. This change is always reflected musically as the quietly unnerving throb of the foggy Silent Hill gives way to a cacophonous ringing of metallic noises and atonal chaos. This musical chaos is the only cue to reflect the player-character's psychological state since he is nearly always facing away from the camera and the pre-rendered cut scenes and voiceovers are delivered in as dull a voice-acting performance as any in recent memory.
Another survival horror title, Resident Evil: Code Veronica (2000), typifies the displacement of the safe state/danger state binary. Figure 12a shows the player-character, Claire, exploring a hallway in the opening sequences of the game. There are no enemies, so non-diegetic music is silent. The next scene initiates an encounter with zombies (Figure 12b), and enacts the standard danger state accompaniment of rhythmically intense music in a diminished or minor key. In other words, the silence has replaced the safe state music, and the danger music is more intense than similar music in, say, Ocarina of Time. As is the case with horror films, the silence of the first scene puts the player on edge rather than reassuring him that there is no danger in the immediate environment, increasing the expectation that danger will soon appear. The appearance of the danger is, therefore, heightened in intensity by way of its sudden intrusion into silence.
These moments from the opening sequences of Resident Evil: Code Veronica are the first chance for the player to encounter and deal with forces of the undead, but Silent Hill's opening sequences reveal a different approach to breaking the threshold of the supernatural that also reveals an allegiance to horror filmic uses of sound. In Kubrick's The Shining, for example, the music will often rise steadily to a cacophonous crescendo to parallel a character's escalating terror or psychosis, and in Silent Hill a similar effect is created by overlapping musical sequences that are cued as "event triggers" when the player enters progressively horrific spaces of the game.
The introductory full motion video (FMV) of Silent Hill provides the set-up for the story, which has to do with Harry Mason taking his daughter, Cheryl, on a vacation to the resort town of Silent Hill. After a mysterious accident en route, Harry awakens to find himself alone in a mysteriously foggy and strangely abandoned Silent Hill with no sign of Cheryl. The music is faint, atmospheric ambience barely above the clarity of white noise that matches the foggy streets with a "swooshing" sound or a low throb. Harry hears footsteps, and, in one of the eeriest sequences in any game, the player must follow a shadowy figure – who may or may not be Cheryl – who always stays just at the edge of vision. The figure eventually leads Harry into an alley, which enacts the sequence of images and sound clips in Figures 13 to 16. The fixed camera perspectives cause the point-of-view to careen wildly as Harry enters different rooms of the alley, and as the course way becomes suddenly darker, Harry's terror (and the player's) is respectively reflected and dictated by the soundtrack growing in volume and atonal chaos.
Object 13. Sound clip corresponding to Figure 14. Note the “air-raid siren” sound effect which has increased its volume significantly from its minimal presence in the basic ambience. WAV file (obj13.wav; 21 seconds; 457kb)
Object 15. Sound clip to accompany Figure 19. The ascending organ sound begins to provide a sense of key or tonality – since all the sounds so far have been industrial or percussive, and this organ sound is the first "real" organ – but the organ melody ultimately avoids any resolution. WAV file (obj15.wav; 20 seconds; 440kb)
At each successive stage of the alley, the visuals become more nightmarish, and at each stage represented above in Figures 13 to 16, a new voice is added to the soundtrack. Finally, after passing by a few ominous hospital implements and discovering what appears to be a flayed and crucified human corpse, Harry is trapped inside a room with a pair of child-like, knife-wielding zombies. The player has control over Harry, but since Harry has no weapons, he is powerless to fight back and can only run away from the creatures in a tight space. In a horrifying moment, the creatures attack and appear to chew on Harry, and the player must watch helplessly. The anxiety of this moment is heightened by the gruesome visuals, the soundtrack and by the standard videogame trope of player-character death. The consequence or punishment in an adventure game for allowing the player-character to die is being forced to repeat material that has already been explored, and since the overarching, eponymous goal of survival horror is to survive, actual character death may only occur a handful of times throughout playing Silent Hill as opposed to the thousands of deaths that Mario or Link must endure to conquer their respective kingdoms.
The music that drives the growing terror of this alley sequence, which leads to an apparent death (Harry does not really die in the game; this scene leads to an FMV scene of Harry waking up in a diner wondering if what just happened was a dream), uses a filmic technique of building suspense, but the musical metaphor of the sequence mimics the visuals of the environment, the embedded internal experience of Harry, and our own emotional response as the player because the music is non-diegetic. That is, the musical underscore seems to happen "outside" of the world of the story as a device to charge the emotional response to the sequence, and the music is, therefore, acting symbolically from Harry's point of view in that he does not "hear" it. But another feature of the game, unique to Silent Hill, suggests a more complicated possibility for the diegetic/non-diegetic question of musical origination.
Harry is eventually equipped with weapons to fight against the various creatures that he will encounter as he proceeds through his quest to locate his daughter, but his most important tool is a "broken" radio that emits sound of a signature frequency whenever a monster is near. The claustrophobic player perspective and ubiquitous fog or darkness make hearing this radio more important to successful game play than seeing. Once a player is used to the system, she can use Harry's targeting ability to automatically aim at the nearest enemy (whether it is visibly on screen or not) upon hearing the specific noise emitted by the radio. Since most of the enemies will approach from above or behind Harry, a player may not ever see certain enemies, and since the sounds appear gradually and swell to a crescendo as the monster gets nearer, the effect works on the same principle as the alley sequence in the opening of the game. Since this is also a strategic device built into the game and because it merges with the soundtrack though its source is visibly present in the game environment, the radio's sounds again blend a motivational cue with atmospheric sounds of the fictional space.
By combining conventions of both videogame and horror film, the designers of Silent Hill create an experience that is driven musically by the grotesque exaggeration of musical functions familiar from earlier videogames. The safety state/danger state binary, which drives the motivational function of the music, is shifted to correspond to the threatening, intrusive atmosphere of the city. Overall, the music in Silent Hill underscores the unique gameplay aspects that secure its status as a classic survival horror title.
In this essay, I have sought to explore various functions of videogame music in specific videogames, but these observations are aimed at only a few of potentially dozens of genres of games. While music may not always employ the narrativised safety state/danger state binary, there is probably an element of motivation in nearly every type of game. At least, the function of positively reinforcing game interaction toward the achievement of a game's goal state is common practice. The use of music to characterize fictional spaces in game environments is obviously more relevant in games revolving around quest narratives and not as applicable for, say, Tetris or even The Sims.
Douglas and Hargadon's description of the processes of immersion, engagement and flow in the reception of hypertext and digital narratives provides one potential context for these initial observations about game music. While Douglas and Hargadon focus on the "Fifth Business" as the medium for assisting the user in his mastery of the game's scripts, it might be possible to assign music the role of the Fifth Business where it acts as a motivating agent, and in its characterization of fictional spaces, music may be encouraging the immersion which must complement engaging with the game's scripts. It may not be necessary to apply this framework, but understanding a game as a narrativised sequence of interactions lends itself to a cognitive model. Identifying discrete functions of game music and sound as complementing two distinct aspects of game play allows for an association of these functions with other approaches to game studies.
By simultaneously enriching the worlds of videogames and assisting the player in navigating the space of videogames, music is essential to the semantic operations of a videogame. Studying the reception of musical cues in relation to animation and tactile input demonstrates that – though videogames borrow from and adapt filmic musical practices – games rely on important cognitive associations between types of music and interpretations of causality, physicality and character. Furthermore, the problems of identifying diegetic and non-diegetic music in videogames demonstrate the complexity of videogame space and its importance to the play experience and the involvement of the avatar and the player. With these tools and the observations of Bowen, Paul Weir and others as a starting point, the study of music in videogames is off to a good start.
 Wadhams, Nick. (2004) Of Ludology and Narratology. [Online article], 14 February.
 Erard, Michael. (2004) The Ivy-Covered Console. [Online Newspaper], 26 February.
 Some examples of this "paralysis" can be seen in the remarkable volume First Person: New Media as Story, Performance, and Game, where much ink is spilled defending certain approaches to studying videogames. The suspicion that these meta-critical questions still relate to political biases within the academy currently impedes progress the field might be making toward establishing an autonomous, disciplinary position in academics.
 The metaphoric behaviour of game music is that which relates to the game as a story or world. It is the function that draws the player into the experience, giving shape and semantic meaning to that experience. When the constant background music in the classic Super Mario Brothers switches from its sunny major theme to a tense minor theme, the visible environment of the player-character has switched from broad daylight to a subterranean cavern. This switch can be seen as paradigmatic in that the game's syntagmatic structures of play are still in place – Mario must still move from left to right and progress toward the final castle. The metonymic function of game music facilitates the player's accomplishing the goals of the game. To remain with the Mario Brothers example, whatever music is currently representing the environment increases in tempo as the end of the level approaches. This teaches the player to move faster toward the level's completion, and thus enforces the syntagmatic properties of the game by pushing it forward in a contiguous progression.
 Wikipedia, an open-source web database of knowledge, contains an entry that defines the platform genre as follows: "Tradtionally [sic], the platform game usually scrolls right to left, with the playable character viewed from a side angle. The character climbs up and down ladders or jumps from platform to platform, fighting enemies, and often has the ability to gain powers or weapons" (2004a). Definitions of genres in games are inherently problematic, but the Wikipedia definition allows me to temporarily avoid definitional debates. Another entry defines survival horror as "a genre of video game in which the player has to survive an onslaught of undead or creepy opponents, usually in claustrophobic environments in a third-person perspective" (Wikipedia, 2004b).
 Andrew Darley argues that the potential computer games offer for "immersion" (he is using the term in a slightly different sense than Douglas and Hargadon) is a question of degrees. The technology of computer games allows for a better or more convincing exploitation of the normal visual codes we have adopted from earlier media to the extent that computer games can offer a more realistic illusion of being in the space of the game (Darley, 2000).
 "The Sorcerer's Apprentice" episode is not technically "mickey mousing" (even though the scene features Mickey himself) because the music was not composed for the visual animation but the other way around. Still, the original orchestral work was meant to tell the same story that the animation tells, so the sonic effect of the crescendoing melodic figure implies marching in both cases. In other words, "The Sorcerer's Apprentice" is a good example both of the effect of mickey mousing and the narrative potential of music to imply and characterize space and action within that space.
 Chion and Claudia Gorbman (1987) have both drawn up complicated formulas of filmic sound that identify degrees of origination between the simple "black and white" analysis of diegetic and non-diegetic sound, but for the purpose of the present argument, the smooth, unproblematic combination of the two accomplishes the metonymic combination I associate with videogame music.
 In her playful but insightful Picture This: Perception and Composition, Molly Bang (1991) attempts to tell the "little red riding hood" story with as few shapes as possible. A small red triangle represents the main character, for example. Through running commentary, Bang explains how the proximity and relative sizes of other shapes, their colours and location on the page affect the sense of the story. Her "ground up" approach nicely demonstrates some of the conclusions of Heider and Simmel's studies of causality.
 Cohen makes no specific reference to Heider and Simmel, and the specific shapes involved are different. It may simply be that the archetypal love storyline is simply one of the most basic, universal stories we all tell and experience.
 Historically (at least since the 19th century) there has been a divide in classical music between "absolute" and "program" music. Program music like Smetana's The Moldau (1990) depicts non-musical pictorial settings or events; The Moldau musically traces the journey of the Moldau river in the Czech Republic, and Berlioz's Symphonie Fantastique (1999) is an autobiography of sorts. By contrast, absolute music is composed as music for music's sake or "music composed with no extra musical implications" (Feldstein, 1985). Jean-Jacques Nattiez argues that the semantic possibilities and temporal frame of music permit narrative approaches to music (Nattiez, 1990), but such approaches must recognize that music alone relies primarily on syntagmatic continuity.
 In music theory, a cadence is a sequence of two or three chords at the conclusion of a musical phrase that "punctuate" the phrase by bringing it to resolution and by connecting it to related phrases by the type of resolution it provides. For example, an "authentic" cadence moves from the dominant chord to the tonic, producing a final, resting position much like a period. A "half-cadence," moving from sub-dominant to dominant, feels "open" or unresolved and would likely appear between two complementary phrases much as a comma or semi-colon joins two independent clauses in a sentence.
 Many of Wagner's operas assign a musical "signature" to characters that the audience hears when that character appears on stage. Leitmotifs can also interact with one another to mimic the tension of the drama and the interaction of characters and objects, but one of their purposes is to help the audience identify characters as they enter the stage. The stationary audience witnesses a very large number of performers pass through the space of the stage in a Wagnerian opera, but in Ocarina of Time, the audience travels and the leitmotifs are attached to the stationary environments of Hyrule.
Bang, Molly. (1991) Picture This: Perception & Composition. Boston, Little, Brown.
Belinkie, Matthew. (1999) Video Game Music: Not Just Kids Stuff. 15 December, viewed 23 June 2004, <http://www.vgmusic.com/vgpaper.shtml>.
Berlioz, Hector. (1999) Symphonie Fantastique, Op. 14, Cond. Muti, Richard, [sound recording], EMI Classics.
Bessell, David. (2002) What's that Funny Noise? An Examination of the Role of Music in Cool Boarders 2, Alien Trilogy, and Medievil 2. In: King, Geoff & Krzywinska, Tanya (Eds.) Screenplay: cinema/videogame/interface. London, Wallflower Press.
Bruner, Jerome. (1986) Actual Minds, Possible Worlds. Cambridge, Massachusetts, Harvard University Press.
Capcom Entertainment. (2000) Resident Evil: Code: Veronica. [PS2 Game], United States: Capcom Entertainment.
Chion, Michel. (1994) Audio-Vision: Sound on Screen. New York, Columbia University Press.
Cohen, Annabel. (1998) The Functions of Music in Multi-Media: A Cognitive Approach. Fifth Annual Conference on Music Perception and Cognition. Seoul National University, Seoul, Western Music Research Institute.
Cohen, Annabel. (2000) Film Music: Perspectives from Cognitive Psychology. In: Buhler, James, Flinn, Caryl & Neumeyer, David (Eds.) Music and Cinema. Hanover, NH, University Press of New England.
Darley, Andrew. (2000) Visual Digital Culture: Surface Play and Spectacle in New Media Genres. New York, Routledge.
Douglas, J. Yellowlees & Hargadon, Andrew. (2004) The Pleasure of Immersion and Interaction: Schemas, Scripts, and the Fifth Business. In: Wardrip-Fruin, Noah & Harrigan, Pat (Eds.) First Person: New Media as Story, Performance, and Game. Cambridge, MIT Press.
Erard, Michael. (2004) The Ivy-Covered Console. [Online Newspaper], 26 February.
Feldstein, Sandy. (1985) Absolute Music. Alfred's Pocket Dictionary of Music. Sherman Oaks, CA, Alfred Publishing Co.
Gorbman, Claudia. (1987) Unheard Melodies: Narrative Film Music. Bloomington, Indiana University Press.
Grieg, Edvard. (1994) March of the Trolls, Lyric Suite, Op. 54: No. 4. Cond. Bernstein, Leonard, Sony.
Hee, T., Ferguson, Norman & Leopold Stokowski, cond. (1960) Fantasia. 60th Anniversary Special Edition, Disney.
Iwerks, Ub. (1928) Galloping Gauchos. In Walt Disney Treasures - Mickey Mouse in Black and White 2002. [DVD], Walt Disney Home Video.
Iwerks, Ub. (1929) Skeleton Dance. In Disney Treasures: Silly Symphonies 2001. Comp. Carl Stalling. [DVD], Disney Home Video.
Konami. (1999) Silent Hill. [PlayStation], Redwood City, CA: Konami.
Morris, Sue. (2002) First-Person Shooters - A Game Apparatus. In: King, Geoff & Krzywinska, Tanya (Eds.) Screenplay: cinema/videogame/interface. London, Wallflower Press.
Nattiez, Jean-Jacques. (1990) Can One Speak of Narrativity in Music? Journal of the Royal Musical Association, 115, 240-257.
Neumeyer, David & Buhler, James. (2001) Analytical and Interpretive Approaches to Film Music (I): Analysing the Music. In: Donnelly, K.J. (Ed.) Film Music: Critical Approaches. New York, The Continuum International Publishing Group.
Nintendo. (1985) Super Mario Brothers. [NES], Redmond, WA: Nintendo.
Nintendo. (1998) The Legend of Zelda: Ocarina of Time. [Nintendo 64], Redmond, WA: Nintendo.
Poole, Steven. (2000) Trigger Happy: the Inner Life of Video Games. London, Fourth Estate.
Rogue Entertainment. (2000) American McGee's Alice. [PC], Redwood City, CA: Electronic Arts.
Smetana, Bedrich. (1990) My Fatherland: II. Die Moldau. (T: 111), Cond. Kubelik, Rafael, Deutsche Grammophon.
Stalling, Carl. (2002) Interview with Mike Barrier. Reprinted as "An Interview with Carl Stalling." In: Goldmark, Daniel & Taylor, Yuval (Eds.) The Cartoon Music Book. Chicago, A Cappella Books.
Strauss, Neil. (2002) Tunes for Toons: A Cartoon Music Primer. In: Goldmark, Daniel & Taylor, Yuval (Eds.) The Cartoon Music Book. Chicago, A Cappella Books.
Wadhams, Nick. (2004) Of Ludology and Narratology. [Online article], 14 February.
Ward, Paul. (2002) Videogames as Remediated Animation. In: King, Geoff & Krzywinska, Tanya (Eds.) Screenplay: cinema/videogame/interface. London, Wallflower Press.
Wikipedia. (2004a) Platform Game. [Online Encyclopedia], 10 March, Wikipedia: The Free Encyclopedia, viewed 2 April 2004, <http://en.wikipedia.org/wiki/Platform_game>.
Wikipedia. (2004b) Survival Horror Game. [Online Encyclopedia], 13 March, Wikipedia: The Free Encyclopedia, viewed 2 April 2004, <http://en.wikipedia.org/wiki/Survival_horror_game>.
© 2001 - 2004 Game Studies