Metadata for Digitally Distributed Video Games at the Seattle Interactive Media Museum

Jin Ha Lee, University of Washington, Information School, USA, Rachel Ivy Clarke, University of Washington, USA, Andrew Perti, Seattle Interactive Media Museum, USA


As video games play an increasingly widespread role in society, museums and related institutions are launching various initiatives to catalog, preserve, and exhibit video games and related media. Collecting and preserving physical video games comes with challenges. Furthermore, with the advent of smartphones, tablet computers, and social networking, many games are increasingly released digitally—having no physical component and accessible only through download or streaming from “the cloud.” Digitally distributed games are problematic to collect, catalog, and preserve.

This paper discusses the challenges the Seattle Interactive Media Museum (SIMM) and other museums face when attempting to catalog digitally downloadable, streaming, and “cloud-based” media as part of their collections. Using the metadata schema for physical games at SIMM as a springboard, each metadata element was analyzed and revised to accurately represent digitally distributed games. The schema was then tested through the actual cataloging of samples of digitally distributed games in order to determine where breakdowns occur. In addition to revealing the necessary information elements for digitally distributed games, our work offers insight for other institutions facing the challenge of collecting and cataloging born-digital materials in other domains and media.

Keywords: video games, interactive media, game apps, born-digital, digital distribution, metadata

1. Introduction

Video games play a widespread role in society. Beyond the industry’s economic growth (NPD Group, 2013), games are increasingly used as artifacts of cultural study. Over 350 U.S. colleges and universities offer degrees or coursework in video game design (ESA, 2013b), and academic organizations such as the American Culture Association and the Society for Cinema and Media Studies consider video games worthy of cultural study. Cultural heritage institutions support initiatives to catalog, preserve, and exhibit traditional video games and related media. The Library of Congress (Owens, 2012), National Videogame Archive (Crookes, 2012), and Museum of Play (National Museum of Play, 2013) offer opportunities to conduct research about and experience game play and production of historical video games. The Seattle Interactive Media Museum (SIMM) joins these pursuits through its mission of cataloging, preserving, and exhibiting video games, interactive media, and related artifacts for educational and research purposes.

Collecting and preserving historical physical video games comes with challenges. Intellectual information about older games, source code, and other associated assets are rapidly being lost. With the advent of smartphones, tablet computers, and social networking, many games are increasingly released solely in digital form, with no physical component, and are accessible only via download, streaming, or “the cloud.” Similar to other born-digital artifacts, these “digitally distributed” games are problematic to collect, catalog, and preserve, yet museums like SIMM risk misrepresenting the video game domain and underserving users if they are unable to include and make these materials available in their collections.

This paper discusses the challenges SIMM and other museums face when attempting to catalog digitally downloadable, streaming, and “cloud-based” media as part of their collections. Using the metadata schema for physical games at SIMM (Lee et al., 2013b) as a springboard, each element was analyzed and revised to accurately represent digitally distributed games. The schema was then tested through the actual cataloging of samples of digitally distributed games to determine where breakdowns occur. In addition to revealing necessary information elements for digitally distributed games, this work also offers insight for other institutions collecting and cataloging born-digital materials in other domains.

2. Relevant work

Libraries, museums, and archives increasingly overlap in the artifacts they collect. All three organizations have recently struggled with the onslaught of digital objects: books, manuscripts, art, and other documents expressed as electronic bits rather than traditional physical media. These organizations are responsible for acquiring, appraising, collecting, cataloging, and preserving digital media for current and future access. Yet digital media offers new challenges to these tasks: the degradative nature of storage media; the need for particular hardware and/or software to access the media; the short lifespan of digital materials; and the increasingly immense volume of digital media generated due to its easily creatable nature (Deegan & Tanner, 2006). Additionally, new metadata are needed to describe these digital objects, as attempts to shoehorn new emergent artifacts into existing descriptive standards result in description based on physical form, rather than intellectual content (Leigh, 2002). Schemas and standards designed to describe objects regardless of material form, such as Dublin Core (DC), Resource Description and Access (RDA), and Cataloging Cultural Objects (CCO), are often too broad to satisfy the targeted needs of specific domains, such as video games (Nevile & Lissonnet, 2005; Canadian Library Association et al., 2010; Baca et al., 2006). While some elements from these standards may be universally applicable and therefore useful in creating application profiles, emergent digital objects like video games often require additional descriptive elements not yet present in existing schemas. As application profiles cannot introduce new data elements (Heery & Patel, 2000), new schemas become necessary.

Describing digital objects is no longer a new phenomenon, as numerous organizations offer access to digital content in their collections. However, these materials are often digitized materials, that is, materials that originated in a physical format but have subsequently been converted to digital data (Reitz, 2014a). Descriptive standards for these digitized surrogates usually rely on descriptions of the original physical item. For instance, the digital image of Rembrandt van Rijn’s Night Watch hosted by the Rijksmuseum (Rijksmuseum, 2014) relies on the painting itself for descriptive metadata: the dimensions listed are those of the physical painting (i.e., h 379.5 cm × b 453.5 cm × g 337 kg) rather than the resolution of the digital image file; the media is listed as “olieverf op doek” (oil on canvas) rather than electronic bits. This makes sense, as the object of interest to users is the physical painting itself rather than its digital representation. In these cases, the digital object is not a work itself, but a “documentary image” of a work (Furner, 2007) that almost becomes metadata of the physical original, rather than a stand-alone object. Some digital video games may be similar cases, such as the new digital download versions of Chrono Trigger, available on the PlayStation Network, or The Legend of Zelda: Ocarina of Time, on the Wii Virtual Console.

But what happens when the object of interest is not a digitization of a physical resource, but a truly native digital object, one that never existed in any traditional physical form? Contemporary digital art, architectural plans, video games, and software are created specifically in and for a digital, electronic environment with no connection to physical resources (Liu, 2009). Corporate archives and other organizations collecting these born-digital materials face the same challenges as digitized materials, but without the descriptive connections to extant physical materials. A survey of cultural heritage organizations in the United Kingdom showed most librarians and curators felt unprepared to deal with born-digital materials, regardless of whether they were created in-house or acquired from external sources (Simpson, 2005). Subsequent work in Australia (Pymm & Lloyd, 2007) found similar attitudes and a conscious avoidance of collecting born-digital materials for these very reasons.

Some institutions that do proactively collect born-digital materials advocate that capturing metadata during the creation of material and collaborating with the parties involved are critical, and note that these practices have led to successful outcomes in many fields dealing with born-digital assets (Gibson, 2008). Creating metadata in collaboration with creators of works is especially beneficial in creative fields affected by digital technologies, such as photography and new media art (Keough & Wolfe, 2012; Documentation and Conservation of Media Arts Heritage Research Alliance, n.d.). However, such collaborations are difficult to achieve in the video game domain, where designers and developers are notoriously lax about archiving games and related information and materials (Andersen, 2011; Sinclair, 2012). Like their counterparts designed to record descriptive metadata for traditional physical objects, schemas designed to describe new media art, such as DOCAM’s cataloguing guide, still focus on elements specific to their own domain. Elements like <iconography>, <medium>, <technique>, and <measurements> are all valuable for describing art installations but may not be applicable for representing video games. Even detailed technical specifications such as <programming language>, <ratio format>, and <sound and video compression system used> are not suitable to describe video games for retrieval by users.

Additionally, preservation and description efforts may be successful for born-digital materials consciously created as lasting works, such as photographic art or corporate records. However, many born-digital materials were not designed with long-term existence in mind, yet offer insight into cultural heritage. Born-digital materials like blogs, Facebook posts, Twitter feeds, and even memes are of interest to scholars of society and culture, yet exist only as “digital ephemera”—digital objects not designed to survive (Mussell, 2012). Many new digital video games fit this ephemeral definition: mobile apps with low price points and other free online Flash-based games are designed to be played for a short time and are easy to abandon or discard.

A substantial increase in digital game content sales in the past two years (ESA, 2013a) demonstrates that digital video games are an important part of our cultural record, and have already started replacing a large proportion of physical games. While descriptive standards for video games do exist (e.g., McDonough et al., 2010; Winget, 2011), they tend to focus on older (i.e., physical) games due to an interest in preserving historical artifacts from early gaming years. Moreover, digital video games run the gamut from digitally distributed versions of traditional physical games to ephemeral installations, and therefore, descriptions based on physical form are no longer applicable to these materials. In addition to the focus on preservation, prior studies tend to consider game information from a data- or creator-centric point of view, rather than that of an end user, such as a video game researcher or museum visitor. Additionally, digital media, like games, are defined as much—or even more—by their performativity and interactive nature than by any physical characteristics (Rinehart, 2007), and these descriptive elements are currently absent from existing schemas.

3. Study design and method

SIMM and the University of Washington Information School GAMER (GAme MEtadata Research) Group recently created a user-centered descriptive schema for video games (Lee et al., 2013a). The CORE set prescribing basic metadata elements about video games that should be described in any context was established in 2011, followed by a larger recommended set of elements in 2012. The schema’s preliminary development revealed issues unique to video games, such as undefined genre descriptions, inaccurate release dates, and edition variations (Lee, Tennis, & Clarke, 2012). Additionally, the format issues raised questions for cataloging digital versions of video games. When testing the schema with sample games, especially digitally distributed games, it became clear that the initially proposed schema needed modification to be appropriately used. In this study, we attempted to answer the following research questions: Which information elements in the current schema could successfully describe both physical and digitally distributed games, and what modifications are required to better represent digitally distributed games?

In order to answer these questions, the authors needed to test which schema elements worked for both formats and which only for physical games, and identify additions needed to describe unique characteristics of digital games. The forty-six metadata elements from Version 1.1 of the metadata schema are presented below (asterisks indicate CORE elements) (Lee, Cho, Fox, & Perti, 2013). Please see Lee et al. (2013b) for full definitions of elements.

  • Title*
  • Alternative title
  • Edition*
  • Format*
  • Series*
  • Franchise/Universe
  • Platform*
  • Developer*
  • Publisher*
  • Distributor
  • Special hardware*
  • Online capabilities*
  • System requirements
  • Game credits
  • Official website
  • Price/MSRP
  • Retail release date*
  • Controls
  • Packaging
  • Number of players*
  • Rating*
  • Purpose
  • Customization options
  • Difficulty levels
  • Achievements/Awards/Trophies
  • Region*
  • Language*
  • Identifier*
  • Box art/Cover
  • Screenshots
  • Trailers
  • Gameplay videos
  • Genre*/Gameplay
  • Style
  • Plot/Narrative
  • Theme
  • Setting
  • Mood/Affect
  • Temporal aspect
  • Presentation
  • Point of view
  • Character names
  • Character types
  • Link to historical events
  • Type of ending
  • Visual style

The schema was tested through cataloging actual samples of digitally distributed games to illustrate where breakdowns occur. Thirty-one digitally distributed games (games with no corresponding physical components) were selected for cataloging. Note that some of the listed games may appear to have physical versions, such as Solitaire; however, in the context of this project, “no corresponding physical components” refers to specific instantiations of a game (i.e., this particular solitaire game) that, while perhaps inspired by physical card games and/or PC disc games of the past, were created and released solely in a digital format. Contrast this example with games such as Katamari Damacy and Grand Theft Auto IV, which were both initially available on optical disc but subsequently made available as digital downloads requiring no physical carrier. Selected games represented a variety of game types, styles, and genres, as well as distribution formats, including online games, downloadable games, and games accessed through digital distributors such as Apple iTunes or Google Play store. Table 1 lists the selected games and the particular editions we examined.

Game Title | Edition Examined
Agricola | iOS app (iPad) Playdek, Inc.
Bakery Story | iOS app (iPhone) TeamLava
BANG! [HD] the Official Video Game | iOS app (iPad) SpinVector S.p.A.
Bubble Witch Saga | iOS app (iPhone) Limited
Barry’s Bad Night | Flash-based danthemilk
Candy Crush Saga | iOS app (iPhone) Limited
Catan HD | iOS app (iPad) USM
Caylus | iOS app (iPad) Big Daddy’s Creations
DragonGem | Android ITREEGAMER
Dots: A Game About Connecting | Android Playdots, Inc.
Dungeons & Dragons Online | PC download Turbine, Inc.
Final Fantasy Tactics: The War of the Lions for iPad | iOS app (iPad) SQUARE ENIX INC
Ghost Trick: Phantom Detective | iOS app (iPad) Capcom
Grid Game | Flash-based Mark James
InSpheration | Flash-based Puzzle Lab
Juniper’s Knot | PC download Dischan Media
Kumo Lumo | iOS app (iPhone) Chillingo
LINE Fluffy Diver | Android LINE Corporation
Minesweeper Classic | Android IT Benefit
Neverwinter | PC download Cryptic Studios
Phoenix Wright: Ace Attorney Trilogy HD | iOS app (iPad) Capcom
Planetarian | iOS app (iPad) VisualArts Co., Ltd.
Plants vs. Zombies HD | iOS app (iPad) PopCap
Reiner Knizia’s Tigris & Euphrates | iOS app (iPad) Codito Development Inc.
Robot Unicorn Attack | Flash-based [adult swim] games
Solitaire | Android Ken Magic
The Sims FreePlay | Android EA Swiss Sarl
Tiny Farm | iOS app (iPhone) Com2uS USA, Inc.
VidRhythm | iOS app (iPad) Harmonix
WhizzBall! | Flash-based Discovery Kids
Words With Friends Free | iOS app (iPhone) Zynga Inc.

Table 1: Digitally distributed games cataloged

Each game was cataloged twice according to the metadata schema and its instructions, once by each of two different catalogers. Catalogers were students in the University of Washington Information School’s INFO 498/INFX 598 special topics course on metadata for video games. These participants covered a wide range of experience both in traditional cataloging (from novices to professionals with experience in cataloging instruction) and video gaming (from casual players to dedicated gamers and collectors). The results from comparing the two metadata records created for each game and feedback from the participants about the process were used to identify successes, issues, and areas for improvement.

4. Discussion

First, the definitions of metadata elements and instructions were reviewed to determine applicability to digitally distributed games, including apps, streaming, and cloud-based games. Next, thorough examination of the metadata records produced by the catalogers identified all the elements that had inconsistent values. The catalogers also provided feedback about elements that were particularly difficult to address during the cataloging process.
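The record comparison described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure the authors used: the function name `inconsistent_elements`, the dictionary representation of a record, and the lowercase element keys are all assumptions made for the example. The sample values echo the Agricola records discussed later in this section.

```python
def inconsistent_elements(record_a, record_b):
    """Return the sorted names of elements whose values differ between two records."""
    differing = []
    for element in sorted(set(record_a) | set(record_b)):
        # Exact comparison on purpose: "Playdek" and "Playdek, Inc." count as different,
        # and an element missing from one record counts as a difference.
        if record_a.get(element) != record_b.get(element):
            differing.append(element)
    return differing


# Two hypothetical records for the same game, one per cataloger.
cataloger_1 = {"title": "Agricola", "developer": "Playdek", "distributor": "Apple"}
cataloger_2 = {"title": "Agricola", "developer": "Playdek, Inc.", "distributor": "Apple, Inc."}

print(inconsistent_elements(cataloger_1, cataloger_2))  # ['developer', 'distributor']
```

Running such a comparison over every pair of records would yield a per-element count of disagreements, which is one plausible way to rank elements by how problematic they were.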

Analysis of the schema along with cataloging sample games revealed ongoing challenges in describing video games of all formats. Many problematic elements had little to do with format but applied to physical and digital games equally. Elements such as <mood/affect>, <plot/narrative>, <setting>, <theme>, and <visual style> suffered from the same subjectivity and inconsistent descriptions regardless of whether a game occurred in a physical or digital format. As the analysis of these elements is applicable to video games at large, they will be discussed elsewhere (see Lee et al., in preparation). Instead, this section will focus on elements that broke down when applied to digitally distributed games. Table 2 shows the list of elements deemed particularly problematic for these games.


Element | Original Definition (from Lee et al., 2013b)

Edition* | A word or phrase appearing in the manifestation of the game that normally indicates a difference in either content or form between the manifestation and a related manifestation previously issued by the same publisher/distributor (e.g., second edition, greatest hits), or simultaneously issued by either the same publisher/distributor or another publisher/distributor (e.g., collector’s edition, limited edition). The edition designation pertains to all copies of the manifestation produced from substantially the same master and issued by the same publisher/distributor or group of publishers/distributors. (modified from FRBR, 2009, p. 41)
  • Is a digital format or subsequent iteration of a physical game to be considered an edition, port, version, different work, or something else?
  • In addition to <edition>, another element <version> needs to be created in order to describe different versions of the same game edition (e.g., different patches, bug fixes).
  • For game apps that offer a free but limited version as well as a paid full version (i.e., freemiums), should we consider them as different editions of the same game?
Format* | The distribution medium or method that provides the executable code of a video game (e.g., cartridge; disc; born-digital).
  • The term “born-digital” may not encompass the detail necessary to describe the game; perhaps more detailed descriptors such as “downloadable,” “streaming,” etc. are necessary.
Platform* | The hardware and firmware required to realize the game (e.g., PlayStation 3; Xbox 360; Nintendo 3DS).
  • The controlled vocabulary needs to be revised in order to accommodate game apps. With a significant number of devices capable of running exactly the same software executable, should <platform> be changed into two separate elements <system requirements> and <operating system>?
Special hardware* | A hardware that is required or recommended for playing the game in addition to the main platform/console (e.g., motion controller; gaming headset).
  • Some game apps merely act as extensions of other games on other platforms and vice versa. Does this feature belong in <online capabilities>? Similarly, retail barcode scanner applications used at the Apple Store, for example, require additional hardware on iPhones to work. Game apps, to our knowledge, do not require additional hardware, but some very well may in the future.
Online capabilities* | The capabilities for playing the game online and/or downloading the game or additional features online.
  • The vague definition causes issues: this was originally intended to describe physical games with online components and/or games that needed online components (like a server) to run (e.g., MMORPGs like World of Warcraft).
Game credits | The intellectual contributors of the game and their roles as specified in the manual/end-game credits (e.g., a list of people who contributed to creating the game, such as “Graphics by Akiko Kazumoto”).
  • It is challenging to find this information on many digital games, especially for some game apps.
Price/MSRP | The suggested manufacturer’s retail price (MSRP) at time of initial release in the region where the game was released.
  • Digital games, especially apps, frequently change price, and therefore the metadata value may differ depending on when the information was sourced. Also, the historical record of price change is often not available or accessible.
  • Many games employ “freemium” models for generating revenue, offering multiple versions, free and paid. This is especially problematic when defining the variations between non-paying and paying customers. Is it more accurate to list the “full” price of the game, including all additional contents/features, or the price of the basic version? Are they even the same game or different games?
Retail release date* | The date of the public/commercial release of the manifestation. (modified from FRBR, 2009, p. 42)
  • This information is often not available and difficult to source for game apps. Most distribution platforms provide dates for a game’s current version; for instance, Apple app store and Google Play store both show the last “updated date” for game apps rather than the original release date. Finding out the dates for prior versions, as well as acquisition of prior versions, is difficult.
Packaging | All items included in the original packaging of the game (e.g., a list of what is included in the game package, such as “includes manual and action figures”).
  • This element is not applicable, as digitally distributed games do not have packaging.
Region* | The names used to refer to a place, region or territory where a game is designated as playable (e.g., North America NTSC-U/C; Japan and Asia NTSC-J).
  • It is unclear what the region information is for game apps, since the region lock for most console video games does not apply to apps. Some distribution platforms require a resident IP address or other passive (operating system) or active (registered physical location) credentials in order to access, download, or play some games. Available games may also be different translations or versions, or entirely unavailable depending on these factors, as well. This information is often unclear.
  • If the game is available in particular languages, does that imply that the game is marketed in the regions where the languages are spoken?
Identifier* | An alphanumeric code uniquely assigned to a manifestation of the game. When available, use the Universal Product Code (UPC) which is a 12-digit number representing a barcode. For some games (e.g., downloadable game app), this information may not be available. Note: this element used to be UPC in an earlier version of CORE16.
  • This element is applicable, but it is difficult to source for digitally distributed games. It is recommended to use internal organization/database IDs rather than standard retail industry identifiers.
Box art/cover | The image featured on the front of the box/packaging or the officially released image which virtually represents the downloadable games.
  • Digitally distributed games generally do not have singularly defined physical box art of a “best edition” (i.e., preferred representation).

Table 2: Problematic elements for digitally distributed games

Four main issues continually arose: 1) difficulty finding information about video games; 2) imprecise element instructions and definitions; 3) the need for improved controlled vocabularies and encoding schemes for element values; and 4) varying levels of domain knowledge. Below we discuss these four areas with respect to digitally distributed games.

1. Difficulty Finding Information about Games

While finding information about video games can be problematic for physical games, such as finding an accurate release date (Lee, Tennis & Clarke, 2012), problems increase for digitally distributed games. Many element definitions and instructions direct catalogers to record data from the ‘chief source of information (CSI).’ A traditional library cataloging concept, the CSI is a source that has precedence over all other sources in the preparation of a descriptive catalog record (Reitz, 2014b). Traditional library cataloging rules stipulate CSIs for a variety of media, such as title pages for books and title screens for films. These rules attempt to address non-book materials, including games, stipulating the CSI as “the object itself together with any accompanying textual material and container issued by the publisher or manufacturer of the item,” preferring information found on the object itself over any accompanying material (Joint Steering Committee, 2002, 10.0B1).

But how does one refer to ‘the object itself’ in the case of a digital game that does not physically exist? Perhaps playing the game is experiencing the object itself, since information such as plot, temporal aspect, and type of ending may only be evident by playing. Some participating catalogers could not play the game itself, either because the game was not free to access, or because it did not run on a platform to which they had access (e.g., no access to an Android device when describing Android apps). Playing a game also takes time, making the game cataloging process inefficient.

Yet if the game itself cannot serve as CSI, then what can? As digitally distributed games have no containers or packaging, the next closest accompanying material becomes the informational page about the game provided in the Web or app store. This has its own problems, including reliability and sustainability of data: is the data from an app store or other secondary record authentic? Or is it skewed for marketing purposes? For example, sourcing region from Apple’s app store, which restricts content by country rather than by the specified industry regional standards (NTSC, PAL, etc.) makes the element (as it was originally defined) not applicable. The store also does not offer any easy way to examine a game’s availability by country—the only way to verify that a game is available in a specific country is to change the store’s geographic location settings and search for the game—a laborious and inefficient process. Sustainability of data is also a concern: does the data change as patches are applied and versions updated? For example, if a cataloger records information about Angry Birds from the iTunes store in 2012, and another cataloger describes the same game in 2013, will the data from the store be the same, or will the version, file size, language, release date, price, etc. be different? Such issues raise questions of data validity and authenticity, but also about the very concept of video games—is the version of Angry Birds from 2012 the same game as the version of Angry Birds from 2013? They may have changes in many of the descriptive elements. This raises complicated questions about versions and variations, and whether information about games should be recorded on an instance level, such as the specific 2012 instance of Angry Birds that was released in the United States for $4.99, or whether we are trying to describe the overall concept of the game Angry Birds, such as its plot, gameplay interactions, and visual style.

2. Imprecise Element Instructions and Definitions

Once game information is located, the challenge becomes formatting and recording that data in a consistent way. Analysis of metadata records created during the cataloging process revealed inconsistencies in descriptive values due to a lack of detailed, precise instructions. For instance, when describing Agricola, one cataloger listed “Playdek” as the developer and “Apple” as the distributor, while the other recorded “Playdek, Inc.” and “Apple, Inc.,” respectively, with a note that the game was distributed via Apple iTunes. While this may seem trivial, many systems still collocate and retrieve only identical entries, not near-identical ones. It is a positive sign that the catalogers’ descriptions converged conceptually, since it demonstrates they are looking in the right places and getting the correct information, but further instructions or computerized guidance are clearly necessary.
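One form such computerized guidance might take is a matching key that treats near-identical name entries as the same entity. The Python sketch below is a hedged illustration under stated assumptions: the helper name `match_key` and the short list of corporate suffixes are hypothetical, and a production system would more likely rely on a maintained name authority file than on string rules.

```python
import re

# Hypothetical list of corporate suffixes to ignore when matching names.
_SUFFIXES = ("inc", "ltd", "llc", "corp", "co")


def match_key(name):
    """Build a comparison key: lowercase, punctuation removed, trailing suffixes dropped."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while words and words[-1] in _SUFFIXES:
        words.pop()
    return " ".join(words)


print(match_key("Playdek") == match_key("Playdek, Inc."))  # True
print(match_key("Apple") == match_key("Apple, Inc."))      # True
```

The recorded value itself is left untouched; only the key used for collocation is normalized, so the cataloger's transcription is preserved.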

The <edition> element, as originally defined, is especially problematic. Digitally distributed games offer challenges never before encountered when it comes to versions and editions. What, exactly, makes a new edition of a digital game? Does the change in one line of code make a new edition? Is version 1.2 a new edition of 1.1, even though the only difference is a bug fix not even visible to a user during gameplay? If each and every version of a game is a new edition, do we need metadata records for each edition? How important is it to users to distinguish between Angry Birds 1.1 and Angry Birds 1.2? Digitally distributed games open a sinkhole of conceptual conundrums about definitions of games, versions of games, and their relationships to one another. This is likely applicable to other fields of digital media: is the digital copy of Night Watch on your local computer the same as the one from the Rijksmuseum website? To address these issues, a conceptual data model for video games detailing versions and relationships is necessary.
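To make the question concrete, one possible shape for such a model is sketched below in Python. It is purely an assumption for illustration and not a model proposed in this paper: the class names, the three-level nesting of game, edition, and version, and the "iOS app" edition label are all hypothetical. Only the version numbers 1.1 and 1.2 come from the example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Version:
    number: str     # e.g. "1.2"
    note: str = ""  # e.g. "bug fix"


@dataclass
class Edition:
    label: str  # hypothetical edition label
    versions: List[Version] = field(default_factory=list)


@dataclass
class Game:
    title: str
    editions: List[Edition] = field(default_factory=list)

    def version_count(self):
        """Count every version recorded under every edition of this game."""
        return sum(len(edition.versions) for edition in self.editions)


angry_birds = Game(
    "Angry Birds",
    [Edition("iOS app", [Version("1.1"), Version("1.2", "bug fix")])],
)
print(angry_birds.version_count())  # 2
```

Under this arrangement a bug fix adds a version to an existing edition rather than creating a new record, which is one way of answering the questions above; other answers would imply a different structure.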

Upon application of the <edition> element, further challenges arose. While a physical game’s packaging might indicate that it is a “special” or “collector’s” edition, many digital apps do not clarify an edition statement. Additionally, apps are well known for offering a “free” version of a game, with restricted playtime, limited rewards, customization options, and other content, while concurrently offering a “paid” version with more rewards, different content, fuller features, and additional options. Catalogers noted that the original schema offered no clarity about how to deal with such instances beyond using a free-text note.

The <price> element, while seemingly objective, proved to be difficult. Instructions stipulated recording the manufacturer’s suggested retail price (MSRP) at the time of release, yet there is no reputable way to uncover this information. Since prices are rarely sourced from a digitally distributed game itself, catalogers must turn to secondary information sources, such as Web pages or app store listings. Prices fluctuate, and there is no available tracking history of pricings, discounts, special deals, or sales over time. For example, Bang! had two different prices recorded: $0.99 and $2.99. Later, we discovered that the app was on sale for a limited time when catalogers recorded the data. Since price necessitates external secondary sourcing that may vary over time, it is important to include attributes for qualification, such as the date the price information was retrieved and from where it was sourced. This could accommodate the multiple recorded price problem our two catalogers faced by allowing for both prices to be recorded as such:

<price currency="US" source="iPad iTunes store" retrieved="20130925">0.99</price>

<price currency="US" source="iPad iTunes store" retrieved="20131010">2.99</price>

Such detailed attribute instructions not only assist in recording data consistently, but also offer provenance information for future researchers tracking pricing trends.

Other elements with unclear definitions and instructions included <identifier> and <rating>. Because the instructions for the <identifier> element specifically prefer the use of universal product codes (UPC), there is a clear bias toward physical games with packaging. Digitally distributed games have no UPCs.

The element <rating> also needed revision for digitally distributed games. While physical game packaging clearly presents ESRB ratings, game apps are rated based on different standards. For example, Phoenix Wright: Ace Attorney is rated T (Teen) by the ESRB, but iTunes assigns the rating of “9+” from its own custom rating scale. Like the <price> element, this might be addressed through the use of qualifying attributes.
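
The qualifying-attribute approach used for <price> can be sketched for <rating> in the same way. The fragment below is only an illustration: the attribute name "scheme" and the wrapper elements <game> and <title> are hypothetical and are not taken from the SIMM schema; the two rating values are the ones given above for Phoenix Wright: Ace Attorney.

```xml
<!-- Hypothetical sketch: "scheme" is an assumed attribute name, not one defined by SIMM -->
<game>
  <title>Phoenix Wright: Ace Attorney</title>
  <!-- Rating assigned by the ESRB -->
  <rating scheme="ESRB">T</rating>
  <!-- Rating assigned by iTunes on its own custom scale -->
  <rating scheme="iTunes">9+</rating>
</game>
```

Recording the rating scheme alongside each value would let both ratings coexist in one record without implying that they are equivalent.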

3. The Need for Improved Controlled Vocabularies and Encoding Schemes

Issues with controlled vocabularies were most common for subjective elements like <genre> and <theme>. However, the lack of a controlled list of values for the <format> and <platform> elements was especially troublesome when describing digitally distributed games. The value “born-digital” for <format> did not distinguish downloadable games from streaming or cloud-based games, and was potentially confusing because all video games (or at least their source code) are born-digital. Catalogers suggested including values such as “browser-based” and “smartphone app” to distinguish specific formats.

The definition for the <platform> element (“the hardware and firmware required to realize the game”) proved console-centric, making it difficult to describe digitally distributed games. “Firmware” was an inadequate term to describe the operating system needed to run a game, and the conceptual overlap between <platform> and <system requirements> caused further confusion. A controlled vocabulary for the <platform> element, such as a drop-down list of values from which catalogers could select, would not only help catalogers select and formulate the appropriate descriptive values, but also help illustrate the definition and intent of this element.

4. Varying Levels of Domain Knowledge

Our group of participant catalogers consisted of two types of people: those who mostly played traditional, physical console-based games, and those who were more interested in casual games on mobile devices and tablets. As expected, those who played casual games and apps knew more places to look for information. This divide also surfaced in the difficulty of applying many of the subjective elements to digital games. Because subjective elements such as <plot/narrative>, <theme>, <mood/affect>, and <setting> were included in the schema, one cataloger noted a possible bias toward specific types of games, such as RPGs. Many of these elements are not applicable to game apps such as puzzle games, casual simulation games, and card or board games.

For some less-knowledgeable participants, it was challenging to even determine whether or not a game was, in fact, really a game. One game, VidRhythm, lets you “take videos on your phone then this app lets you mash them all together to make something crazy”—several catalogers did not consider this a game at all, again raising questions about what it means to be a video game.

5. Revisions to the schema

After analysis, the following modifications were made to adapt the schema for use with digitally distributed games in addition to traditional physical games.

New elements

  • In a new data model, new ID elements were added to each entity (e.g., <edition ID> to the Edition entity, <series ID> to the Series entity). Elements <edition title> and <series title> were also created.
  • New metadata elements <file type>, <file size>, and <icon> (the officially released image that represents the downloadable game) were created for the Digital distribution package entity.
  • A <note> element was added to capture any additional relevant information that might be unique to new digital games.
  • A new element, <additional content>, was added to describe add-on content, mods, etc.
  • A new element, <patch history>, was added to record all the patches, bug fixes, etc. released for a particular game edition.

Revised elements

  • Changed the name of <online capabilities> to <connectivity> and rewrote the definition to clarify intent—“The ability of the game’s hardware and software to allow communication with that of other entities, such as game companies, third-party organizations, and other players.” Instructions were revised as follows:

    Please specify information for the following four dimensions: ‘Purpose,’ ‘Method,’ ‘Network Type,’ and ‘Bandwidth.’ Select ‘Play,’ ‘Non-play,’ or ‘Payment’ for the ‘Purpose’ dimension. Select ‘Wired’ or ‘Wireless’ for the ‘Method’ dimension. Select ‘User-to-user,’ ‘Server-based,’ or ‘Cross-platform’ for the ‘Network Type’ dimension. Select ‘Required’ and/or ‘Recommended’ for the ‘Bandwidth’ dimension.

  • <region> was renamed to <regional standard> to clarify the intent of this element to describe region lockout information.
  • The definition for <identifier> was revised to: “An alphanumeric code uniquely assigned to an entity.” This eliminated reliance on physical packaging information and also broadened the use of identifiers to multiple entities in the data model.
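
The four-dimension instructions for <connectivity> above could be encoded along the following lines. This is a minimal sketch under stated assumptions: the child element names are hypothetical, only the dimension names and their permitted values come from the revised instructions, and the particular combination of values is illustrative rather than a description of any cataloged game.

```xml
<!-- Hypothetical encoding: child element names are assumptions, not SIMM definitions -->
<connectivity>
  <!-- Purpose: Play, Non-play, or Payment -->
  <purpose>Play</purpose>
  <!-- Method: Wired or Wireless -->
  <method>Wireless</method>
  <!-- Network Type: User-to-user, Server-based, or Cross-platform -->
  <networkType>Server-based</networkType>
  <!-- Bandwidth: Required and/or Recommended -->
  <bandwidth>Required</bandwidth>
</connectivity>
```

Keeping each dimension in its own element would allow a game with several kinds of connectivity to repeat the <connectivity> block once per purpose.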

Revised controlled vocabularies

  • The controlled vocabulary for <format> was revised and expanded to include “downloadable” and “streaming.”
  • The controlled vocabulary for <platform> was revised and expanded to include “iOS device,” “Android device,” “Kindle,” and “Nook.”
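
One way such a controlled vocabulary could be enforced is with an enumeration in XML Schema. The sketch below is an assumption about implementation, not the SIMM encoding: the type names are hypothetical, and only the values named in the revisions above are listed, so the enumerations are deliberately partial.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Partial list: only the <format> values named in the revisions above -->
  <xs:simpleType name="formatValue">
    <xs:restriction base="xs:string">
      <xs:enumeration value="downloadable"/>
      <xs:enumeration value="streaming"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Partial list: only the <platform> values named in the revisions above -->
  <xs:simpleType name="platformValue">
    <xs:restriction base="xs:string">
      <xs:enumeration value="iOS device"/>
      <xs:enumeration value="Android device"/>
      <xs:enumeration value="Kindle"/>
      <xs:enumeration value="Nook"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="format" type="formatValue"/>
  <xs:element name="platform" type="platformValue"/>

</xs:schema>
```

A validating parser would then reject any <platform> or <format> value outside the list, which is the machine-checkable counterpart of the drop-down list suggested by catalogers.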

6. Conclusions and future work

Born-digital cultural heritage materials continue to challenge traditional descriptive metadata methods. This is particularly evident in the domain of video games. With increasing interest in games as cultural artifacts, as well as the flood of digitally distributed games and apps in the marketplace, it is critical that we find ways of describing and preserving these materials that do not rest on physicality.

As a collector of both physical and digital video games, SIMM’s metadata schema must be able to represent both types of materials. While many descriptive elements of the two formats overlap, it was clear from cataloging thirty-one sample digitally distributed games that many elements could not accurately describe digital games to a satisfactory level. Elements are problematic not because they do not apply to digital items, but because the way in which they apply to digital items is unclear from their current definitions. Confusing definitions, especially those for <platform>, <format>, and related elements, needed rewriting to clearly include instructions for application to digitally distributed games. Our current work involves soliciting feedback from potential SIMM user groups and revising the schema to include clearer element instructions and definitions, especially geared toward digitally distributed games.

Digital media—be it digitized or born-digital—is increasingly prevalent. It is the responsibility of cultural heritage organizations such as libraries, museums, and archives to collect, catalog, preserve, and maintain these digital materials for future access and use. From datasets to art to video games, metadata designed specifically to describe these digital materials is the key to keeping them accessible in the future.


Acknowledgements

The authors would like to extend special thanks to INFO 498/INFX 598 participants at the University of Washington Information School for their valuable contributions. This research is supported by the Bridge Funding Award from the University of Washington Office of Research.


References

Andersen, J. (2011). “Where games go to sleep: The game preservation crisis, part 3.” Gamasutra: The Art of Making Games. Consulted January 14, 2014.

Baca, M., P. Harpring, E. Lanzi, L. McRae, & A. B. Whiteside. (2006). Cataloging cultural objects: A guide to describing cultural works and their images. Chicago: American Library Association.

Canadian Library Association, Chartered Institute of Library and Information Professionals (Great Britain), Joint Steering Committee for Development of RDA, & American Library Association. (2010). RDA toolkit: Resource description & access. Chicago: American Library Association.

Crookes, D. (2012). “British Library starts videogame website archive project.” Last updated February 13, 2013. Consulted December 12, 2013.

Deegan, M., & S. Tanner. (2006). Digital preservation. London: Facet.

Documentation and Conservation of Media Arts Heritage Research Alliance. (n.d.). Cataloguing guide: Introduction. Consulted February 7, 2014.

Entertainment Software Association (ESA). (2013a). “Essential facts about the computer and video game industry: Sales, demographic and usage data.” Consulted December 12, 2013.

Entertainment Software Association (ESA). (2013b). “U.S. colleges, universities, art and trade schools offering video game courses, certificates and degree programs.” Consulted January 6, 2013.

Furner, J. (2007). “Two senses of ‘work.’” Transcript of talk presented at VRA 2007, Kansas City, MO. Consulted January 3, 2014.

Gibson, D. (2008). “Digital Asset Symposium: Museum of Modern Art, New York City.” April 25, 2008. The Moving Image, 8(2), 86–89.

Heery, R. & M. Patel. (2000). “Application profiles: Mixing and matching metadata schemas.” Ariadne, 25. Available at

Joint Steering Committee for Revision of AACR, & American Library Association (Joint Steering Committee). (2002). Anglo-American cataloguing rules. Chicago: Canadian Library Association.

Keough, B., & M. Wolfe. (2012). “Moving the archivist closer to the creator: Implementing integrated archival policies for born digital photography at colleges and universities.” Journal of Archival Organization, 10(1), 69–83.

Lee, J. H., J. T. Tennis, & R. I. Clarke. (2012). “Domain analysis for a video game metadata schema: Issues and challenges.” Theory and Practice of Digital Libraries, Lecture Notes in Computer Science, 7489, 280–285.

Lee, J. H., J. T. Tennis, R. I. Clarke, & M. Carpenter. (2013a). “Developing a video game metadata schema for the Seattle Interactive Media Museum.” International Journal on Digital Libraries, 13(2), 105–117.

Lee, J. H., H. Cho, V. Fox, & A. Perti. (2013b). “User-centered approach in creating a metadata schema for video games and interactive media.” In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, 229–238.

Lee, J. H., R. I. Clarke, & A. Perti. (In preparation). “User evaluation of metadata for video games and interactive media.”

Leigh, A. (2002). “Lucy is ‘Enceinte’: The power of an action in defining a work.” Cataloging & Classification Quarterly, 33, 3–4.

Liu, A. (2009). “Digital humanities and academic change.” English Language Notes, 47, 27.

McDonough, J., M. Kirschenbaum, D. Reside, N. Fraistat, & D. Jerz. (2010). “Twisty little passages almost all alike: Applying the FRBR model to a classic computer game.” Digital Humanities Quarterly, 4(2).

Mussell, J. (2012). “The passing of print.” Media History, 18(1), 77–92.

National Museum of Play. (2013). “Atari by design: from concept to creation opens at the National Museum of Play June 22.” Last updated May 20, 2013. Consulted January 6, 2014.

Nevile, L., & S. Lissonnet. (2005). “Was CIMI too early? Dublin Core and Museum Information: Metadata as cultural heritage data.” DCMI International Conference on Dublin Core and Metadata Applications, 31–38.

NPD Group. (2013). “Research shows $14.80 billion spent on video game content in the U.S. for 2012.” Last updated February 6, 2013. Consulted January 6, 2014.

Owens, T. (2012). “Yes, the Library of Congress has video games: An interview with David Gibson.” Last updated September 26, 2012. Consulted January 15, 2013.

Pymm, B., & A. Lloyd. (2007). “Dealing with digital collections: Interviews with the National Library and selected state libraries of Australia.” Australian Academic & Research Libraries, 38(3), 167–179.

Reitz, J. (2014a). “Digitization.” Online Dictionary of library and information science. ABC-CLIO. Consulted January 6, 2014.

Reitz, J. (2014b). “Chief source of information.” Online Dictionary of library and information science. Consulted January 6, 2014.

Rijksmuseum. (2014). Digital image and metadata for “Militia Company of District II under the Command of Captain Frans Banninck Cocq, Known as de ‘Night Watch,’ Rembrandt Harmensz. van Rijn, 1642.” Consulted January 5, 2014.

Rinehart, R. (2007). “The Media Art Notation System: Documenting and preserving digital/media art.” Leonardo, 40(2), 181–187.

Simpson, D. (2005). Digital preservation in the regions. London: Museums, Libraries and Archives Council.

Sinclair, B. (2012). “Gaming preservation a growing crisis.” Consulted January 14, 2014.

Winget, M. A. (2011). “Videogame preservation and massively multiplayer online role-playing games: A review of the literature.” Journal of the American Society for Information Science and Technology, 62(10), 1869–1883.

Cite as:
Lee, J. H., R. I. Clarke, & A. Perti. "Metadata for Digitally Distributed Video Games at the Seattle Interactive Media Museum." MW2014: Museums and the Web 2014. Published January 15, 2014.
