Kristine Jørgensen

Kristine Jørgensen is an associate professor at the Department of Information Science and Media Studies, University of Bergen, Norway. She is author of Gameworld Interfaces (MIT Press 2013) and A Comprehensive Study of Sound in Computer Games (Mellen Press 2009). She has also published research on playfulness, game characters, game narratives, and the Norwegian game industry.
Contact information: kristine.jorgensen at

Audio and Gameplay: An Analysis of PvP Battlegrounds in World of Warcraft

by Kristine Jørgensen


This article addresses how audio works as support for gameplay while remaining true to the perceived reality of the game world in World of Warcraft's PvP Battlegrounds. The argument is that the interpretation of game audio is highly contextual, and that the player must understand the specific situation as a whole in order to understand what a specific auditory signal indicates.

Keywords: World of Warcraft, PvP, game audio, gameplay, situation-oriented approach


As any game audio designer or composer can attest, computer game audio is not merely ornamental and mood-enhancing. It also works as support for gameplay, by providing the player with different kinds of information that need to be comprehended in the light of the specific context. This article addresses how audio works as support for gameplay in a collaborative and competitive context while remaining true to the perceived reality of the gameworld. This context is the player versus player (PvP) Battleground scenarios in the massively multiplayer online roleplaying game (MMORPG) World of Warcraft (WoW) (Blizzard, 2004). The primary argument is that game audio is tightly integrated with gameplay in this game setting, not only as an information system and a support for gameplay, but also by providing an understanding of how the game should be played, and how to behave in a specific in-game context. The analysis will demonstrate that although audio is intimately associated with the rule system of the game, it also supports the virtual environment and the fictional setting.

Context is a keyword in this respect, and I will argue that a specific sound cannot be comprehended in isolation, but that the situation in which it is heard always decides the interpretation of the informative content of the sound signal. The following situation from the Arathi Basin Battleground will serve as an introductory example: A player's avatar is guarding the lumber mill to prevent enemies from claiming the valuable resource. Suddenly a characteristic whispering sound is heard, the product of a rogue entering stealth mode. Stealth makes the rogue invisible to enemy players, and allows that avatar to perform special attacks. When the sound is heard, it may be interpreted as an alert, but as this article demonstrates, the comprehension of any sound signal is highly contextual. Two situations may be used as illustration: If the player suddenly hears the sound when he knows he is alone, the sound will certainly be alarming by signalling a potential, invisible danger. However, if the avatar is guarding the lumber mill together with a friendly rogue, the sound has a different meaning: It is the positive notification that he has backup in case of an attack.

Figure 1: With his back turned against two rogues, the mage (2) cannot know whether it is the allied (3) or the enemy (1) rogue that enters stealth mode.

In this example we see that the player's comprehension of the situation is crucial, even though the sound signal is identical in the two situations. In order to understand the meaning of the sound signal, the player must not only understand what event generates the sound, and what it means in the given context, he must also understand the strategic importance of the lumber mill resource at the specific point in the game; what it means to lose it and how a loss may affect the overall game. Defensive information must also be understood, such as whether he is able to protect the resource against a rogue or not. In short, the player needs to understand the dynamics of the game — the rule system, how it can be manipulated, and how it affects individual actions — and how audio is integrated into this system. When talking about the relationship between game audio and gameplay, I am essentially referring to this relationship.

More precisely, gameplay is not a feature designed into the game alone, but an emergent aspect of interaction between the game system and the player's strategies and problem solving processes. In short, gameplay is how the game is played, delimited by the game rules, and defined by the dynamic relationship that comes into being when the player interacts with these rules. The gameplay of a specific game can therefore not be fully understood without playing the game and becoming familiar with the game system and its dynamics (Jørgensen, 2007). This understanding is further illuminated by what Salen & Zimmermann call meaningful play (2005). For them, the play of a game is simply what "occurs as players experience the rules of the game in motion" (2004, p. 302). In order for meaningful play to occur, however, actions and outcomes need to be discernible in the sense that they must be communicated in a perceivable way, and integrated into the larger context of the game (2005). "Meaning" is therefore connected to how the game system reacts to player actions, and how player actions make sense in the specific setting and the progression of the game. In our context, meaningful play relates to how the player understands game audio as both discernible and integrated into the game dynamics, by working as an informative system that provides intuitive usability information as well as being tightly integrated with the depicted gameworld. This means that game audio is integrated into the overall understanding of what it means to play the game, and that in order to understand what a specific auditory signal indicates, the player must understand the specific game situation as a whole.

The theoretical point of departure is a situation oriented approach to listening and auditory comprehension. This approach colors my own work (Jørgensen 2007a), as well as modern debates on audio as a representational feature in audiovisual media (Langkjær, 2000), auditory display studies (Keller & Stevens, 2004; Walker & Kramer, 2004), and the work of certain researchers in ecological psychoacoustics (Rosenblum, 2004). The situation oriented perspective emphasizes the idea that auditory comprehension is oriented towards interpreting sounds in terms of events instead of in terms of objects. This opens the way for contextual auditory comprehension, since associating sounds with events means identifying the situation in which the sound occurs. This can be contrasted with an object oriented perspective, which holds that our perceptual system associates a certain sound with a specific object, thereby suggesting a static and absolute relationship between the sound and the information it provides. This view is advocated in earlier debates on the relationship between sound and image in film theory (see Altman, 1992a; Altman 1992b), by certain researchers within ecological psychoacoustics (McAdams, 1993; van Valkenburg & Kubovy, 2004), and by certain game researchers (Stockburger, 2003). Some game researchers take a moderate position by arguing that game audio is closely connected to specific game states (Whalen, 2004) or that it adapts to gameplay (Collins, 2007), but fail to emphasize that audio is tightly integrated with how the game should be played and that context decides the interpretation of a specific sound signal.

The article starts with a brief presentation of Battlegrounds and the player versus player systems in WoW, and the methodological and theoretical background for this work. I will then present an analysis of how game audio works in relation to gameplay, based on how the player interprets its informative role, spatial origin in the game environment, and how sounds work with respect to the magic circle (Salen & Zimmermann, 2004).

The Battleground and the PvP Systems

As a massively multiplayer online roleplaying game, WoW has a player base of more than eleven million players worldwide [1]. Players are organized into different servers that take thousands of players each. All individuals on a given server may interact with one another in different ways. Players configure their own avatar according to parameters such as race, class and gender. Avatars increase their level, which is a measure of how powerful they are, by gaining experience points through killing monsters and completing quests. In order to keep the players busy, WoW is built around different kinds of game content that the players may engage in. One of these is to gain reputation, honor and arena points in player versus player battles (PvP). These points may be spent on buying increasingly better equipment. There are three organized systems for players wanting to fight players from the competing faction: Arenas, Outdoor PvP, and the object of this study, Battlegrounds.

In Battlegrounds, the avatar enters a spatially delimited minigame that allows fights between teams from the two factions, Alliance and Horde, each consisting of 10, 15, or 40 avatars depending on the specific minigame. Avatars are separated into similar level groups, so that avatars within level range 10-19, or 20-29 and so on, go together in the same Battleground. All Battlegrounds are competitive and cooperative games, where avatars from one team struggle to reach a common goal against the opposing team. At the time of writing, the player may select between four different Battlegrounds, described briefly below.

Warsong Gulch (WSG) is a game of capture the flag, where ten avatars from each team try to get into their enemy's base to steal their flag and plant it in their home base. The enemy flag can only be planted in the home base when the team is in possession of their own flag, and the team that captures the enemy flag three times is the winner. It is therefore crucial to prevent the enemy from picking up the flag, and to retrieve it whenever it has been picked up by the enemy.

Arathi Basin (AB) is a game over resources collected by controlling a number of bases. The two teams, with fifteen avatars each, compete to secure as many as possible of five bases which will generate resources for the owner. To capture a base, an avatar must activate a banner and hold it for one minute before it starts generating resources. The first team to reach 2000 resources is the winner, and in order to win, it is crucial that the team controls at least three bases most of the time, and reclaims bases whenever they are lost.

Eye of the Storm (EotS) has fifteen avatars on each team, and combines the objectives of WSG and AB. The goal is to reach 2000 points, which are collected by holding a number of bases and by capturing flags. There are four bases in EotS, which are captured by occupying the area around the base for a certain amount of time. The flag may then be collected from neutral middle ground and returned to a home base. Each flag captured generates 100 points.

The last Battleground is Alterac Valley (AV), which takes two teams of 40 avatars. A number of rule changes have affected the game in different ways, but the basic objective is for a team to work its way to the enemy base to kill their non-player character general, destroying enemy towers and capturing graveyards on the way. The rule changes have turned AV from a stalemate where the teams clash in the middle of the map for hours, to a fast run of fifteen minutes where both teams move directly to the enemy's base without having to fight other avatars at all. The fast run version was criticized by players for not supporting actual PvP since the game could be won by combating non-player characters only [2]. Since this research was carried out during the fast run period, AV was left out of the study.

Although it is possible to team up and join a Battleground as a full group (commonly called a premade) in all scenarios except AV, most players join alone or together with a few friends (commonly called a pick-up group, or PUG). Consequently, players have no guarantee that they will group with players who are willing to follow orders, or who are aware of, or interested in following, specific tactics for winning. It is, however, important to know the rules and the dynamics of the scenario in order to win. For instance, leaving a base unguarded in AB is likely to result in the opponent taking control over it, and is not a good choice of action if the teams are controlling two bases each and are struggling to gain control over the last one. On the other hand, if one team is in complete control by controlling four or five bases, leaving one base may not pose a risk to victory. Understanding what behaviour and which actions are preferable in different situations is crucial to playing Battlegrounds, and players need to find a balance between teamwork and following individual whims. Moreover, it is important to understand the rules and game dynamics on both a macro and a micro level. The descriptions of the Battlegrounds above are of the rules on a macro level, or formal descriptions of the win conditions and broad features of the environment that are common to all players. However, the micro level is equally, if not more, important. I am referring to the rules that affect players' choices of actions on an individual level: the individual spells and abilities available for each character class, and the player's comprehension of how these should be utilized for the best outcome in combat situations. Audio is integrated into gameplay on both the macro and the micro level, but it should be emphasized that it has special relevance for choices taken on a micro level, since game audio is particularly influential for the player's individual choices and behaviour.

There are strong reasons for choosing Battlegrounds as the case in this article. They demonstrate a game situation with important emergent qualities due to the fact that all allies and opponents are human players. With the exception of AV, Battlegrounds are also free from any pre-scripted events, and focus on simple rules set around a few specific objectives in a delimited environment. This allows us to study game audio from a purely functional perspective, where the informative roles of audio step into the foreground. It is important to note, however, that this does not mean that the fictional setting and gameworld disappear. Battlegrounds also demonstrate how audio seamlessly integrates functional aspects with the sense of presence in the gameworld.

The empirical point of departure is a study of video recordings of four players' performance in AB, WSG and EotS in July 2007. The video capture software Fraps, run during play, records the same audiovisual information that the player sees and hears on screen. Recordings were done with four level 70 avatars of different races, classes and factions on two different European servers. All distinguishable sounds and corresponding situations were transcribed and categorized according to signal type, their function within the game system, spatial origin and generator. As we will see, the results demonstrate that audio and gameplay are closely integrated, and that the specific function of any sound signal in the game is highly dependent on contextual interpretation.

Integration of Game System and Virtual World

Game audio remains true to the perceived reality of the gameworld at the same time as it supports gameplay. The reason is that most modern computer games implement the game system into a virtual world. When these features are combined, a powerful frame of reference comes into being that combines elements from both sides into an intuitive and functional unity. Game system information is baked into the virtual environment, creating a situation where the usability information of elements such as audio becomes integrated with the sense of presence in the virtual world (Jørgensen, 2007). In terms of audio in Battlegrounds, this happens by the use of different techniques.

One technique is connected to the use of sound signals. In audiovisual environments such as games and films, many sounds correspond to real world sounds. Such sound signals are what auditory display studies call auditory icons (Walker & Kramer, 2004). In fantasy or science fiction settings with magic or futuristic equipment, however, this kind of replication is not always possible. In such cases it is common to use what auditory display studies call earcons (Walker & Kramer, 2004): artificial noises, sound bursts or musical phrases. In addition to using these sound signals, WoW often combines them into hybrid signals that are partly recognizable as a real world sound. An example is a mage casting a fire spell: the sound starts with the crackling sound of fire, and ends in a whoosh that moves from a high-pitched tone to a lower one. Here we also see how the audio designers have drawn on metaphorical relations by using movement in pitch to suggest some kind of transition, or movement in space (Keller & Stevens, 2004; Wilhelmsson, 2001). Using sound signals that have a connection to real world environments while being stylized to fit the game situation creates the unique opportunity to make auditory usability signals seem natural to the gameworld.

Another technique that helps merge the game system and the virtual world is to blur the boundary between the gameworld environment of the avatar and the real world environment of the player. WoW does this by providing system information via channels natural to the game environment. This blurring is the result of the combination of communication on two levels: One communicative situation takes place between the game system and the player. The player is a real world person situated outside the gameworld as a computer user, expecting the computer and game systems to provide meaningful feedback to his commands. The second communicative situation is between the virtual gameworld and the avatar. This relationship exists on the diegetic level, which means that the gameworld should be understood as a continuous, but virtual, reality where events take place; and the avatar should be seen as an individual existing in that world. In computer games, these frames of communication are combined because the player is granted direct access to the gameworld through the avatar. Communication thus moves across the boundary of the diegetic world, questioning the border between the avatar's world and the player's world. I call this emerging frame of communication transdiegetic (Jørgensen, 2007a; 2007b). Transdiegetic communication merges communication from the game system with communication from the gameworld into a frame of reference that has usability value at the same time as it upholds the sense of presence in the gameworld. An example from WoW is verbal responses like "It's not ready yet", produced by the voice actor representing avatars of a certain race and gender. These responses provide the impression of being produced by the avatar as communication to the player operating the computer, thereby suggesting that the avatar as a fictional person in the gameworld is aware of the existence of the player in the real world. Although this does not make any sense when interpreting the gameworld as a continuous reality, it makes perfect sense when we see the verbal message as a response to a player action from a usability perspective. Another example is found when the player opens the game map, an action followed by the dry crackling sound of crumpled paper being stretched out. One can easily imagine that maps in this world would be on a parchment that produces this kind of sound when stretched out; also, the information gained from looking at it is relevant for player behaviour in the gameworld. However, the map is not part of the gameworld, but a feature of the graphical user interface. From the point of view of usability, the sound is a response to the player that the map menu has been opened. In the first example, I call the sound internal transdiegetic because the sound seems to have a diegetic source within the gameworld (the avatar) that communicates to an entity external to the gameworld (the player). In the second example, however, the sound is external transdiegetic because a source that does not exist within the gameworld (the map) provides information that is directly relevant for what is going on internally in the gameworld (Jørgensen, 2007a; 2007b).

In order to understand how the techniques above influence gameplay, the concept of the magic circle is helpful. With reference to Johan Huizinga, Salen & Zimmermann point out that games are defined by a magic circle (2005), a conceptual or physical frame of reference that separates the actions of the game from real world actions. The player may step in and out of this border at will without ever being in doubt which side he is on. The techniques above illustrate how game audio plays with this magic circle, and even utilizes it as a separate frame of communication that binds together what is positioned outside and inside the game. The fact that players know about this border and what belongs on either side is the reason why hybrid sound signals and transdiegetic communication are accepted and work so well. Transdiegetic communication does not only invite players into the gameworld without leaving the real world behind; it also invites players to take actions that affect that gameworld. Hybrid sound signals create the base for the transdiegetic communication by merging real world sounds with stylized artificial sounds, thereby enabling the magic circle to become more transparent.

The Role of Audio in a Gameplay Context

In this section we will take a look at what kind of information audio provides and how it affects the player's choice of actions and understanding of the situation. In WoW, sounds may be connected to usability explicitly, and thus provide information that directly responds to or demands player action. In other cases, sounds have a more general informative function where they support player orientation or help the player identify different situations and states (Jørgensen, 2007a). Note that a sound signal may have more than one function, and the specific situation decides which role is the most prominent at any given time. It is therefore not possible to categorically identify a certain sound signal as related to one specific informative function. An example is the situation when an avatar casts a friendly spell on someone, also known as buffing. The sound of the buff has several functions: It may be a responsive signal that confirms an avatar's casting of the buff; it may identify a change in avatar state; and it may orient that avatar with respect to other avatars. Which of these functions is most important, however, depends on the situation and the player's subjective relationship to what is going on. It is important to understand whether the sound is related to an event that directly influences one's own avatar or not. If the player's avatar is being buffed, the most important role of the sound is to identify a change in avatar state, and the orientation with respect to the presence of an ally is secondary. If the player's avatar is buffing another avatar, however, the primary role of the sound is as a response. This demonstrates the importance of understanding the context of a specific sound before being able to understand which of the different informative functions is most relevant at a specific point in time.

In terms of orientation in Battlegrounds, audio may provide information about temporal and spatial issues related to whether individual abilities are ready for use, as well as the relative distance to objects and the presence of other avatars. Verbal responses such as "I can't cast that yet" and "It's too far away" demonstrate temporal and spatial orientation respectively, while any sound produced by another nearby avatar signals presence. Concerning identification, audio has a central role related to changes in game state and player state. In Battlegrounds, fanfares are important auditory tools in informing the player about changes in game state, for instance when someone picks up or brings home the flag in WSG, or when a base has been assaulted or claimed in AB. These sounds signal changes in game state by notifying the players about balance adjustments between the teams. Audio also signals change in player state every time the avatar's properties are modified. An example is the screams and changes in ambient sound that signal an avatar's death and respawn.

Concerning action related audio, Karen Collins separates interactive from adaptive audio (2007). Interactive audio consists of sound events occurring in response to player action, while adaptive audio reacts to events in the environment. Although she does not emphasize it, adaptive audio may affect the player's choice of action. In that case, the audio is proactive by demanding evaluation or action on the part of the player. Correspondingly, interactive audio is reactive in that it occurs as an immediate response to player action (Jørgensen, 2007). Auditory display studies interpret proactive sounds as signals of urgency. Their role is to attract a person's attention, and they can be separated into different priority levels based on whether they demand immediate attention or evaluation only (McCormick & Sanders, 1986; Sorkin, 1987; Walker & Kramer, 2004). Proactive sounds evaluated as high priority are related to situations with immediate effect on the player, exemplified in Battlegrounds by attacks. Low priority proactive sounds, on the other hand, are related to situations that do not demand immediate attention, but where sound provides the player with information about some event or change of state. We may call low priority sounds notifications, and separate them into negative, positive or neutral depending on whether they signal a setback, a bonus, or neutral information for the player's team. Reactive sounds are different kinds of responsive signals that appear immediately after a player action to assure the player that the action is registered by the system. Responsive sounds may also have different value, and in Battlegrounds they are either confirmations or rejections of an attempted action. The sound of a gunshot immediately after the avatar has fired a weapon is an example of a confirmation, and the verbal message "I can't cast that yet" heard immediately after trying to fire a spell is a rejection that signals failure.
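
As a rough illustration only, the distinctions drawn in this section can be laid out as a small decision procedure. The following Python sketch is not part of the study and does not describe how WoW is implemented; the class, field and function names are hypothetical, and the way each example is coded (for instance, treating an enemy assault on a base as a negative notification) is an assumption made for the sake of the example.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    PROACTIVE = "proactive"  # demands evaluation or action from the player
    REACTIVE = "reactive"    # responds to an action the player just took


@dataclass(frozen=True)
class SoundEvent:
    """Hypothetical description of one heard sound and its situation."""
    description: str
    follows_player_action: bool               # heard right after a player command
    action_succeeded: bool = True              # only relevant for reactive sounds
    demands_immediate_attention: bool = False  # only relevant for proactive sounds
    team_effect: int = 0                       # -1 setback, 0 neutral, +1 bonus


def classify(event: SoundEvent) -> tuple:
    """Return (mode, label) following the reactive/proactive distinction."""
    if event.follows_player_action:
        label = "confirmation" if event.action_succeeded else "rejection"
        return (Mode.REACTIVE, label)
    if event.demands_immediate_attention:
        return (Mode.PROACTIVE, "high priority")
    kind = {-1: "negative", 0: "neutral", 1: "positive"}[event.team_effect]
    return (Mode.PROACTIVE, kind + " notification")


# Examples taken from the text; the field values are illustrative assumptions.
gunshot = SoundEvent("gunshot after firing a weapon", follows_player_action=True)
not_ready = SoundEvent('"I can\'t cast that yet"', follows_player_action=True,
                       action_succeeded=False)
attack = SoundEvent("enemy attack on the avatar", follows_player_action=False,
                    demands_immediate_attention=True)
base_lost = SoundEvent("fanfare: enemy assaults a base", follows_player_action=False,
                       team_effect=-1)
```

The sketch deliberately takes the situation (whether the sound follows a player action, whether it demands attention) as input rather than the sound itself, which mirrors the argument that the same signal can fall into different categories in different contexts.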

The different roles of audio identified above will make the starting point for analysing how audio influences gameplay in Battlegrounds.

Player Interpretation of Audio in Context

Listening is a complex cognitive activity. A listener often needs to make sense of situations where there are a lot of simultaneous sounds (Walker & Kramer, 2004). In such situations, it is crucial to be able to identify and interpret which of the occurring sounds are relevant, and which can be ignored as noise. Such complex cases make understanding the situation as a whole crucial, and the meaning and reference of each individual sound signal becomes secondary. The combat-oriented Battlegrounds are an example of such a complex situation, where not all sounds have equal importance for the player. Players do not need to know what a specific named spell sounds like; it is enough to understand whether one is being attacked, or a friendly avatar is attacking an opponent (Jørgensen, 2007a). An important feature at work that explains why we are able to attend to the relevant sounds in chaotic situations is figure-ground segregation, the ability of the human brain to group and organize perception into background and foreground information (Valkenburg & Kubovy, 2004). When a fanfare signals the assault of a base in AB at the same time as the player is engaged in defending another base, the player automatically groups and filters all present sounds according to relevance. The player is therefore able to ignore the fanfare and attend to the most urgent sounds, which in this situation are responsive sounds from his avatar and the interface, and urgency sounds related to the enemies he is fighting. However, if the player is in a less urgent situation defending a base, the fanfare may move into the foreground compared to responsive interface sounds. This also explains why the functional roles of sounds are judged with different urgency in different situations even though the sound is exactly the same.

When the player makes meaning out of sound in context, the interpretation of what generates the sound is crucial for understanding what a specific sound communicates. Notice that the generator of a sound is not the same as the source of a sound. While the source is the object that physically (or virtually) produces the sound, the generator is what causes the event that produces the sound. Thus, when an avatar is hit by an enemy, the source of the sound is that avatar, but the generator is the enemy. Understanding this difference is essential to the player's interpretation of the sound, since the functional role of a sound is dependent on the understanding of the situation as a whole, including what generates a certain sound (Jørgensen, 2007a). In Battlegrounds, there are five audio generators. These are the player, allies, enemies, game system, and gameworld. Note that this is a player-centred perspective where audio is understood strictly from the point of view of an individual player. A sound that is enemy generated for one player is therefore player generated for that enemy. In this context, we also need to decide whether the sound originates from within the depicted gameworld or not. In Battlegrounds, enemy and ally generated sounds are always diegetic. System and player generated sounds, however, may have both diegetic and transdiegetic origin. This means that diegetic sounds have a more variable meaning attached to them, and that they may be more difficult to identify than transdiegetic sounds. Context therefore tends to be more important in connection with diegetic than with other sounds.
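
The player-centred nature of the generator categories can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not an account of the game's code: the names Avatar, Generator and generator_for are hypothetical, and the sketch simply assumes that the listener already knows who or what caused the sounding event.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Generator(Enum):
    PLAYER = "player"
    ALLY = "ally"
    ENEMY = "enemy"
    GAME_SYSTEM = "game system"
    GAMEWORLD = "gameworld"


@dataclass(frozen=True)
class Avatar:
    name: str
    faction: str  # "Alliance" or "Horde"


def generator_for(listener: Avatar, cause: Union[Avatar, Generator]) -> Generator:
    """Classify the cause of a sound from the listener's point of view.

    `cause` is either the avatar whose action caused the sounding event, or
    GAME_SYSTEM / GAMEWORLD when no player action is involved.
    """
    if isinstance(cause, Generator):
        return cause
    if cause == listener:
        return Generator.PLAYER
    if cause.faction == listener.faction:
        return Generator.ALLY
    return Generator.ENEMY


# The mage is hit by a rogue: the mage's avatar is the source of the sound,
# but the rogue is what caused the event, so only the rogue is passed in.
mage = Avatar("mage", "Alliance")
rogue = Avatar("rogue", "Horde")
priest = Avatar("priest", "Alliance")
```

Evaluated for the mage, the hit is enemy generated; evaluated for the rogue, the very same event is player generated, which is the relativity described above.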

Sounds generated by the player, allies, enemies, and the game system are all examples of dynamic audio (Collins, 2007) produced by events and actions in the gameworld and directly relevant for gameplay by guiding player actions. Gameworld generated sounds, however, are non-dynamic by having no such direct relevance. Interpreting a sound as generated by the gameworld, the player dismisses the sound as having secondary relevance for his choice of actions. An example of gameworld generated sounds in Battlegrounds is the sounds of wood chopping at the lumber mill resource. The sources of these are non-interactable non-player characters, and the inclusion of the sound has no operational function besides identifying the lumber mill and supporting the sense of presence in the gameworld. It is, however, important for the player to be able to identify these as gameworld generated sounds in order to understand that they have no proactive or reactive relevance for his actions. Having no direct relevance for gameplay, gameworld generated sounds are left out of the analysis.

Below I will show how establishing a sound's generator decides how the player interprets the meaning and functionality of a particular sound. We will see that the usability value of a sound changes according to what generates the sound. We will also see that when a sound provides several kinds of information simultaneously, the generator and the situation will decide which kind of information is the most important. I will discuss the different generators with analytical reference to their spatial origin, what kind of information they provide in terms of usability, orientation and identification. The analysis is summed up in one figure for each sound generator, and the most frequent informative role is emphasised in each case.

1. Player Generated Sounds

A sound generated by the player is a sound caused by player action. If the player has not actively been involved in the production of the sound, he is not the generator of the sound. A player may therefore be the source of a sound without at the same time being its generator. Note that the player's avatar in many cases is the source of player generated sounds due to the inevitable link between player and avatar in the game. The most important informative role of player generated sounds is to provide usability information, or more specifically to provide response, since they always seem to appear immediately after a player action. Player generated sounds also provide spatial information, and sometimes also temporal and avatar state information. In most situations, however, usability information stands out as the most crucial.

Figure 2: Overview of player generated sounds in Battlegrounds.

A player generated sound is produced by all player commands. When such sounds are diegetic, that is, naturally occurring in the depicted game environment, all players hear them regardless of whose avatar is the source of the sound. Diegetic sounds are generally also tightly integrated with the gameworld, either by being auditory icons adopted from the real world or hybrid signals, or by conforming to popular culture conventions (e.g. the conventions of what magic sounds like). If the avatar casts a magic spell, differences in the characteristics of the sound's offset provide a response that either confirms or rejects the attempted action. The sound also provides information about the relative presence of and distance to the target of the spell. For instance, if the spell successfully goes off, the sound fades out with a whoosh, but if it is interrupted, it ends abruptly in a hollow whistle, followed by a verbal transdiegetic signal such as "I'm out of range".

Figure 3: The mage selects the spell "scorch" from the menu, accompanied by a short "chug" in response.

Figure 4: A crackling sound is heard while the spell gets ready for casting.

Figure 5: The target jumps off the cliff, the spell fails, followed by the verbal rejection: "I'm out of range".
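The confirming and rejecting responses in the spell-casting example above can be summarised in a small sketch. It is only an illustration of the description given here, not of how the game is implemented: the function name, the `CastOutcome` type and the cue labels are hypothetical, and the cue strings are descriptions rather than actual sound files.

```python
from enum import Enum
from typing import List


class CastOutcome(Enum):
    COMPLETED = "completed"
    INTERRUPTED = "interrupted"


def cast_offset_cues(outcome: CastOutcome, out_of_range: bool = False) -> List[str]:
    """Return the audio cues that end a spell cast, as described in the text."""
    if outcome is CastOutcome.COMPLETED:
        # Confirming response: the sound fades out with a whoosh.
        return ["whoosh fade-out"]
    # Rejecting response: the sound ends abruptly in a hollow whistle.
    cues = ["abrupt hollow whistle"]
    if out_of_range:
        # Verbal transdiegetic signal that also carries spatial information.
        cues.append("voice: I'm out of range")
    return cues


print(cast_offset_cues(CastOutcome.COMPLETED))
print(cast_offset_cues(CastOutcome.INTERRUPTED, out_of_range=True))
```

The second call corresponds to the situation in Figure 5, where the rejection carries both usability information (the action failed) and spatial information (the target is too far away).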

Transdiegetic sounds have an ambiguous relationship to the gamespace, but still provide information that relates to player actions and the game situation. When player generated, they tend to provide orientation information related to space and time. In WoW, player generated sounds are external transdiegetic when their source is not an object in the gameworld, such as interface clicks that appear in response to a player command, or when the player targets another avatar. The targeting response also provides spatial information by telling the player the relative distance to the targeted avatar. Internal transdiegetic sounds, on the other hand, have a naturally occurring source in the gameworld, but cannot be heard by other players. These tend to provide rejection responses, such as the avatar complaining, "I cannot attack that yet", or "it's too far away". These responses also provide temporal and spatial information, which is often as important as, or more important than, the usability information provided.

In WoW and Battlegrounds, extradiegetic sounds can only be player generated. When extradiegetic, a sound lacks anchorage in and has no direct influence on events in the gameworld. Extradiegetic responses to player actions are typically related to game options menus, quest logs, etc. When a player opens his key bindings menu for instance, there is a short click, which is limited to usability responsive information only.

2. Sounds Generated By Enemies and Allies

Sounds generated by allies and enemies are both diegetic by having a perceived natural source in the gameworld (i.e. other avatars), and by being heard by all avatars within listening range. Common to enemy and ally generated sounds is that they are produced externally from the player's perspective, by being detached from the player's own actions and emerging from the gameworld. It is crucial to decide whether a sound is generated by an enemy or an ally, as the two tend to have different consequences for the player. Characteristic of enemy and ally generated sounds is that the sound files used are the same, and they may therefore be hard to distinguish. This is in particular the case with sounds produced offscreen. In order to make an accurate interpretation of sound signals produced offscreen, the player must have an overview of the situation. This goes beyond mere identification of the sound as ally or enemy generated. The player must evaluate whether or not the situation is in the favour of the allied team, and what kind of consequences the event connected to a certain sound will have for his and his team's position.

Both enemy and ally generated sounds carry spatial information related to relative presence, and may also provide information about changes in the state of avatars. However, as the grey boxes in figures 6 and 8 demonstrate, enemy generated sounds distinguish themselves functionally from ally generated sounds in terms of usability, in the sense that enemy generated sounds tend to carry a higher degree of urgency than ally generated sounds. In general, ally generated sounds have a stronger focus on spatial and identifying information than on usability.

Enemy generated sounds, on the other hand, tend to downplay the orientational and identifying roles.

Figure 6: Overview of enemy generated sounds in Battlegrounds.

Figure 7: A shadow priest is targeted by a shaman's Frost Shock spell, which is accompanied by a crackling whoosh.

In terms of usability, enemy generated sounds are either proactive urgency signals or notifications, and they tend to provide negative information to the player. The reason is that all actions that the enemy takes are potential threats towards the player and his team. When the enemy generates proactive urgency information, this information positions itself towards the most critical end of the priority meter. When evaluated as the highest priority, the sound belongs to a situation with immediate relevance to the player, such as an enemy hitting the player. When evaluated as medium priority, the sound signals stress or conflict but without a need for immediate attention. This is the case when an enemy deals damage over time to the avatar. The priority levels should not be understood as absolutes, but as different points on a continuum, not least because context decides the interpretation of a signal's degree of urgency. Negative notifications could therefore also be seen as part of this continuum. However, notifications distinguish themselves from urgency messages by not demanding an evaluation on the part of the player. Instead, notifications relate to player action neither reactively nor proactively, but are mere information about a certain situation or state. All enemy generated notifications in Battlegrounds are negative, by providing information about enemy activities in the game environment. An example is when the player hears nearby combat that he does not partake in.

Figure 8: Overview of ally generated sounds in Battlegrounds.

Figure 9: An allied warrior is charging nearby enemies with a scream and a whooshing sound.

Ally generated sounds have no proactive or reactive relation to player actions, and therefore provide notification information to the player. Even though allies are important in accomplishing the final goal of a Battleground, ally actions rarely have direct influence on an individual player's actions. The exception is when an ally targets the avatar with a friendly spell, but such cases are positive notifications. The sound, accompanied by a flash of light, informs the player that an ally has boosted the avatar's abilities, whether by raising a statistic such as stamina or intelligence or by increasing its health level. Thus, this information is primarily a message about a positive change in player state. Following this reasoning, ally generated negative notifications provide information with a negative relation to the avatar's and the team's position. An example is when an ally breaks an ability that was meant to neutralize an enemy avatar. When an enemy avatar is trapped in a hunter's Freezing Trap, that avatar is immobile for some time. However, if someone from the allied team deals damage to the trapped enemy, the trap breaks with an accompanying negative notification sound. Here, too, the identifying role is most important, since the sound refers to an event where a certain enemy is no longer controlled. Ally generated neutral notifications are information that the player values as neither negative nor positive, but as non-biased information about an event or situation. An example is audio produced when allies mount their horses or move around. The most important informative role of such sounds is to provide spatial information, by informing that allies are present.

3. System Generated Sounds

In Battlegrounds, system generated sounds are the most complex category, and stretch beyond the border of interface-oriented and extradiegetic sounds. System generated sounds are both diegetic and transdiegetic, and avatars may be identified as the source in certain cases. This sometimes makes system generated sounds hard to identify. All messages not generated as the result of a specific command by any player should be understood as system messages, and they exist to provide information about states that no avatar generated sound can provide. These are never reactive since the player does not actively produce them; instead they provide notifications and proactive messages. Although these sounds provide notifications and proactive information, the usability role is commonly not their most important functional role. For the player, these messages primarily provide information about changes in avatar state or game state. This is illustrated by the grey box in figure 10.

Figure 10: Overview of system generated sounds in Battlegrounds.

Most system generated sounds are transdiegetic, by having invisible or unclear sources in the gameworld, and/or by communicating to the actual player situated in real world space instead of characters in the gameworld. While transdiegetic sounds often can be identified as having an internal or external origin with respect to the gameworld, system generated sounds are often situated in a more absolute transdiegetic position with clearly defined boundaries. Such sounds are heard by all avatars in the specific battleground and are therefore hard to dismiss as extradiegetic. They are, however, also hard to dismiss as diegetic since they do not have a clear source within the gameworld. In Battlegrounds, these transdiegetic sounds are fanfares played as notification of a change in game state, such as the fanfares that signal an assault on a base or a flag pickup.

Figure 11: When a base is captured, this is accompanied by a whoosh from the base itself as well as by an omnipresent fanfare.

Transdiegetic system generated sounds in Battlegrounds may also be positioned from an external perspective, but these only appear in a special situation: when the player accepts the avatar's release to the graveyard after being killed, and when the avatar respawns. Immediately after the avatar is dead, a message box opens, asking the player to "release spirit". The box is accompanied by a metallic click. A similar click is played when the avatar respawns. The sounds are external since they do not have any source within the gameworld, and are connected to the interface menu, and they are transdiegetic because they have a specific relevance for gameplay and what happens within the gameworld. Together with the message box, the sound provides information about a change in avatar state by telling the player that the avatar again is available for making a difference in the game.

There are also a few diegetic system generated sounds. The sources of these are typically avatars, but since the sounds are not the result of a player command or action, the player is not the communicative force behind the signal. This means that the player cannot be seen as the generator of the sound, but that the sound is generated by the system for informative purposes. When an avatar targets an enemy with a spell, the player is the generator of that sound; but when the avatar reactively screams from taking critical damage from an enemy's hit, the avatar is the source of the scream, but not, however, the generator of the sound. Neither is the enemy that deals the damage, even though the critical attack is what caused the sound. The scream originating from the damage taken only occurs when someone is critically hit, and the sound is therefore a signal about a specific event only occurring randomly. Since players cannot intentionally decide to do critical damage, they cannot be the generator of the scream. Instead this is a system generated message informing the player that a certain game rule has been exerted. That game audio reflects the game mechanics in this way emphasises the fact that audio provides important information relevant for gameplay and player behaviour in the gameworld.

Conclusions and Summary

Although the purpose of this article is to demonstrate that game audio has an important role in relation to gameplay, this does not mean that visual information is less important. On the contrary, auditory and visual information work together in creating an understanding of every situation. In context, the two sensory processes provide different information that the player must interpret as a whole in order to make sense of what a specific situation means (Maasø, 1994; Jørgensen, 2007a). However, since the auditory and visual systems have different functional properties, they are specialized in processing different kinds of information. Auditory information has therefore certain advantages over visual information in certain situations, and it has been the aim of the article to discuss how auditory information specifically contributes to player interpretation of gameplay situations.

As a means of communication, audio is particularly useful for providing information when the visual system is restricted. Since it does not require the listener to be oriented in a specific direction, audio has the advantage of being able to provide information about events taking place out of line of sight, and when the visual system is busy with other tasks. In complex situations, audio may also help the player pick up more information than the visual system could do alone (Heeter & Gomes, 1992; Jørgensen, 2007a; Jørgensen, 2008; Kramer, 1999). This is important for computer games where the three-dimensional environment allows events to happen offscreen, and where action and combat create chaotic situations where visual information may be hard to grasp. Another important point is that while visual information can be shut out by closing the eyes, audio has no equivalent shut-down mechanism. Audio is therefore an omnipresent feature whose very presence is easy to forget (Jørgensen, 2007a). This makes audio especially suitable for communication that plays with the magic circle and that does not fully conform to ideals of realistic situated information.

Ecological psychoacoustics argues that all auditory cognition is dependent on the specific situation in which listening takes place. Since auditory perception has evolved to deal with sounds in context in natural environments, our auditory system is attuned to interpreting and filtering sounds in relation to specific contexts (Neuhoff, 2004). In this sense, the processing of auditory information never happens in isolation, but in interaction with an interpretation of the situation as a whole. This article has analyzed how players interpret sounds in a gameplay context based on what generates the sound, and on how the player understands the specific situation and his position within it. To conclude, I would like to sum up the different ways in which, and the levels on which, auditory information affects gameplay. We have seen that game audio directly relates to player actions in a reactive and a proactive way, by providing immediate feedback to player commands and by providing urgent information that the player needs to evaluate. Audio is also relevant for player behaviour by orienting the player in the game environment, and by identifying changes in game state and in avatar state. However, the proactive and reactive auditory information is connected to the usability of the system in the same way as similar sounds are in virtual and physical interfaces, and there is no clear indication of how this information relates to understanding the game mechanics and the dynamics of the system. First of all, an important point is that the specific function of usability oriented sounds in relation to gameplay becomes clearer as the player becomes more familiar with the dynamics of the game. Not until then will a player be able to fully understand what a specific sound means in a given situation and what kind of event it refers to. However, there are certain sounds that have a particularly close connection to gameplay, and which need specific attention.

System generated sounds are generated by the system to provide information that no avatar can produce on its own, and carry information directly connected to game rules as well as game and player states. System generated sounds provide information concerning the execution of game rules that cannot be fully represented as existing and real in the virtual gameworld. Attending to and understanding these sounds provides invaluable gameplay information to the player. Also, the mere fact that they are present puts emphasis on what is viewed as important information from the designer's point of view, and should be seen as a hint from the designer of how to play the game. System generated sounds are also connected to understanding the collaborative and cooperative gameplay elements of Battlegrounds. The game balance and the dynamics of the individual Battleground being played are understood through game state auditory information: system messages in the favour of one team only inform about a severe imbalance in the current match, while the constant and frequent occurrence of system messages in the favour of both teams informs of a balanced, but unstable match.

Ally and enemy generated sounds also play a direct role in understanding cooperation and collaboration in Battlegrounds. Distinguishing between the two kinds is learning that they have a special function in relation to gameplay. Once the player has interpreted a sound as ally generated, that player has defined the sound as relevant for cooperation. The opposite goes for enemy related sounds: once interpreted, they are identified as conflict related. Thus, learning what generates a specific sound is learning important gameplay elements, in the same way as learning how to play a game is learning what the different auditory signals mean.


References

Altman, Rick (1992a). Introduction: Four and a Half Film Fallacies. In Altman (ed.), Sound Theory Sound Practice. NY, London: Routledge.

Altman, Rick (1992b). Sound Space. In Altman (ed.), Sound Theory Sound Practice. NY, London: Routledge.

Blizzard Entertainment (2004). World of Warcraft. Vivendi/Blizzard.

Collins, Karen (2007). An Introduction to the Participatory and Non-Linear Aspects of Video Games Audio. In Hawkins, Stan & John Richardson (eds.), Essays on Sound and Vision. Helsinki: Helsinki University Press (forthcoming). Retrieved Oct 30, 2008 from

Heeter, Carrie & Pericles Gomes (1992). It's Time for Hypermedia to Move to Talking Pictures. In Journal of Educational Multimedia and Hypermedia, Winter 1992. Retrieved Oct 30, 2008 from

Jørgensen, Kristine (2007a). 'What are Those Grunts and Growls Over There?' Computer Game Audio and Player Action. PhD dissertation. Dept. of Media, Cognition and Communication, Copenhagen University.

Jørgensen, Kristine (2007b). On Transdiegetic Sounds in Computer Games. In Fetveit, Arild & Gitte Stald (eds.), Northern Lights No.5, Vol.1.: Digital Aesthetics and Communication. Intellect Publications.

Jørgensen, Kristine (2008). Left in the Dark: Playing Computer Games with the Sound Turned Off. In Collins, Karen (ed.), From Pac-Man to Pop Music. Ashgate.

Keller, Peter and Catherine Stevens (2004). Meaning from Environmental Sounds: Types of Signal-Referent Relations and Their Effect on Recognizing Auditory Icons. In Journal of Experimental Psychology: Applied, Vol. 10, No. 1. American Psychological Association Inc.

Kramer, G., B. Walker, T. Bonebright, P. Cook, J. Flowers, N. Miner, J. Neuhoff, R. Bargar, S. Barrass, J. Berger, G. Evreinov, W. Fitch, M. Gröhn, S. Handel, H. Kaper, H. Levkowitz, S. Lodha, B. Shinn-Cunningham, M. Simoni, S. Tipei (1999). The Sonification Report: Status of the Field and Research Agenda. Report prepared for the National Science Foundation by members of the International Community for Auditory Display. ICAD, Santa Fe, NM. Retrieved Oct 30, 2008 from

Langkjær, Birger (2000). Den lyttende tilskuer. Perception af lyd og musik i film. Copenhagen: Museum Tusculanums Forlag.

Maasø, Arnt (1994). Lyden av levende bilder. IMK report no. 14 from Department of Media and Communication, University of Oslo.

McAdams, Stephen (1993). Recognition of Sound Sources and Events. In McAdams, Stephen and Emmanuel Bigand (eds.), Thinking in Sound: The Cognitive Psychology of Human Audition. Oxford, New York: Oxford University Press.

McCormick, Ernest J. and Mark S. Sanders (1986). Auditory, Tactual and Olfactory Displays. In Human Factors in Engineering and Design. Singapore: McGraw-Hill.

Neuhoff, John G. (2004). Ecological Psychoacoustics. Introduction and History. In Neuhoff, John G. (ed.), Ecological Psychoacoustics. London: Elsevier Academic Press.

Rosenblum, Lawrence D. (2004). Perceiving Articulatory Events: Lessons for an Ecological Psychoacoustics. In Neuhoff, John G. (ed.), Ecological Psychoacoustics. London: Elsevier Academic Press.

Salen, Katie & Eric Zimmerman (2004). Rules of Play. Game Design Fundamentals. Cambridge (Mass.): MIT Press.

Salen, Katie & Eric Zimmerman (2005). Game Design and Meaningful Play. In Raessens, Joost & Jeffrey Goldstein (eds.), Handbook of Computer Game Studies. Cambridge (Mass.): MIT Press.

Sorkin, Robert (1987). Design of Auditory and Tactile Displays. In Salvendy, Gavriel (ed.), Handbook of Human Factors. New York, Chichester, Brisbane, Toronto, Singapore: John Wiley & Sons.

Stockburger, Alex (2003). The Game Environment from an Auditive Perspective. In Level Up: Proceedings from DiGRA 2003. Retrieved Oct 30, 2008 from

Van Valkenburg, David and Michael Kubovy (2004). From Gibson's Fire to Gestalts: A Bridge-Building Theory of Perceptual Objecthood. In Neuhoff, John G. (ed.), Ecological Psychoacoustics. London: Elsevier Academic Press.

Walker, Bruce N. and Gregory Kramer (2004). Ecological Psychoacoustics and Auditory Displays: Hearing, Grouping, and Making Meaning. In Neuhoff, John G. (ed.), Ecological Psychoacoustics. London: Elsevier Academic Press.

Whalen, Zack (2004). Play Along: An Approach to Videogame Music. In Gamestudies, Vol. 4, Issue 1. Retrieved Oct 30, 2008 from


[1] Press release, Oct 28, 08. Retrieved Oct 30, 08 from

[2] Attitudes presented in the "HordePvP" channel of Argent Dawn (EU), autumn 2007, and in the official Player vs. Player forum, i.e. the thread "Alterac Valley". Retrieved Nov 15, 07 from

©2001 - 2008 Game Studies Copyright for articles published in this journal is retained by the journal, except for the right to republish in printed paper publications, which belongs to the authors, but with first publication rights granted to the journal. By virtue of their appearance in this open access journal, articles are free to use, with proper attribution, in educational and other non-commercial settings.