Historic letters among artists, collectors, curators, and museum directors are some of the most treasured primary source materials in art history. Within a scholarly publication, you may find a letter from Vincent Van Gogh discussing the places he traveled or a letter from Matisse about how he hopes youth will interpret his paintings. Everything from the date of a letter to the personal details included in the text can provide valuable insight into an artist’s work.
But what if those messages from Van Gogh and Matisse were to be sent today, in the midst of a museum director’s deluge of hundreds of emails per day? Would they be deleted because of storage restrictions or lost when the director changed jobs? What if those letters were saved in a Microsoft Outlook folder and, years later, a new museum director came along and handled all correspondence using Gmail? Would those messages ever be retrieved? Even if they continued to exist, on some hard drive in some basement, would those emails be accessible for research and possible publication?
Cultural heritage institutions are expected to maintain detailed records documenting their acquisitions and collections, including correspondence from and to the museum staff, artists, and other outside parties. Prior to the rise of the Internet and email, that correspondence was captured on paper and was likely to be sorted and archived as it was received. Today, that correspondence is more likely to take place over email and is unlikely to be effectively archived. It’s not just a workflow problem: staff who do take the time to archive emails may not be saving them in appropriate file formats, causing important metadata to be lost in the process, and notable emails are rarely saved in a systematic manner designed for future retrieval.
Museums and the Web is organizing a Deep Dive event focused on exploring the range of knowledge management, conservation, and preservation issues that email presents for art museums. This project will support the management of the nation’s collections and help to expand and sustain access for current and future generations. Archived emails are important not only for the smooth running of an institution but also for students and scholars who would benefit from access to these historically significant files for research purposes decades from today.
We will explore and document the issues:
Loss of Valuable Information for Research/Scholarly Use
The rise of email and the quick pace of correspondence in today’s museums may limit the primary source materials that will be available for future generations of students, scholars, and the general public. While there is much more data created every second of every day – such as the millions of Tweets and Facebook updates that are posted after historic political moments – much of this data will be lost if there is not a plan and means of preservation and access. For that reason, the Library of Congress started an initiative to archive all public Tweets from 2006 and beyond. And for similar reasons, tools and best practices need to be developed for museums’ digital communications or we may begin to see a “black hole” of art history from the early 1990s until this problem is solved.
Loss of Valuable Information for Exhibition, Conservation, and Interpretation
Art museums’ mission statements vary in wording and emphasis but almost all refer to a museum’s responsibility to care for, exhibit, and interpret works of art. Artworks often come into a collection via one curator who is in rich correspondence with the dealer or the artist. These conversations help the curator to develop a keen understanding of the piece and the intricacies surrounding its care and meaning. But once it comes into a collection, the piece is accessible to other staff in the museum who have not been privy to these conversations, many of which are now taking place via email. Email expedites the communications but its virtual walls can complicate the work that each member of the staff is doing when caring for and exhibiting the art.
A salient example of this complication can be found in Variable Media. The Variable Media approach encourages creators to define their work independently from medium, allowing the work to be translated once its current medium becomes obsolete. This requires artists to envision acceptable forms that their work might take in new mediums, and to pass on guidelines for recasting work in a new form once the original has expired. This is a particularly important consideration for contemporary art museums that collect site-specific works, interactive installations, and ephemeral pieces such as Felix Gonzalez-Torres’s Untitled (Public Opinion), which is composed of 700 pounds of black rod licorice candy. Questions raised around the installation and reinterpretation of such pieces cannot simply be recorded as a set dimension in a collections management system. The conversations about a work’s essential meaning and acceptable levels of change often occur over email, between artist and museum staff. Retention of these emails is essential for a museum to adequately preserve media-based and performative works.
Storage and Compliance Considerations
While storage and compliance considerations are not at the heart of the museum-specific problem, museums are faced with the same issues that all present-day organizations confront. Email has become the de facto filing system for many organizations, with many attachments and business-critical messages stored locally within email folders instead of on shared servers. And as museums’ reliance on email continues to increase, so do the volume and size of messages and attachments. Nearly every IT department has struggled with the issue of storage management for messaging servers. Some museum professionals continue to report that their IT support staff instruct them to delete emails when they reach their prescribed storage limits, even though there are better options available. Email was designed for rapid communication, not for archival storage. As a result, the ability to easily preserve and search through this ever-growing data has become essential.
Another important consideration for museums is the issue of electronic search and discovery. Motions to discover electronic data are commonplace in today’s world of litigation but most cultural heritage institutions are unprepared to find electronically stored information, especially email and its associated attachments. Institutions need an easy way to search for relevant email to quickly meet legal discovery requests with minimal involvement from an IT department and/or consultants. As entrusted repositories for art treasures, history and provenance, museums are held to the highest standards for recordkeeping. Practices and tools developed to address knowledge and conservation issues would likely also be used in the case of these important, if rare, circumstances.
We will evaluate previous work:
The Deep Dive will explore previous work in the space, including a review of relevant commercial products, open source projects, and other initiatives, such as those noted below.
We will review commercial products on the market from a wide range of companies such as Barracuda Networks, Symantec, and IBM. We expect to find limitations in these products that are often built to match corporate data retention policies that are put in place to ensure compliance with the Sarbanes-Oxley Act (legislation passed in 2002 by the U.S. Congress in response to corporate malfeasance). Corporate managers may be instructed to maintain emails for 5, 7 or 10 years depending on the type of document but are rarely expected to retain emails indefinitely. Corporations are also less likely to see a mission-related need to ever expose their emails for research and scholarly publication. Because of museums’ unique missions, visions, and values, a different kind of solution may need to be evaluated.
From 2007 to 2009, North Carolina State Archives led a project with funding from the National Historical Publications and Records Commission to design and test email preservation software that converts email from its native format to XML. While some of the testing and training results can inform future progress in the area of software development, the project did not go far enough to resolve known software issues or to meet its objective of providing access to the XML files through online catalogs or web interfaces. The team noted administrative changes and difficulties in securing IT support among the challenges in addressing this aspect of the project.
A second known initiative is the Collaborative Electronic Records Project (CERP), a joint project of the Rockefeller Archive Center and the Smithsonian Institution Archives to develop the methodology and technology for preserving born-digital materials in archival collections. By the conclusion of that project in 2008, CERP had worked with the North Carolina State team to parse more than 89,000 emails from the generic format (MBOX) into XML. CERP also tested a software tool that preserves email accounts together with their messages, and began basic searching development. CERP’s final summary outlines lessons learned (e.g., various XML software programs experienced difficulty in opening large XML files for validation) and a wish list of how CERP would like to see its work carried forward (e.g., tools for search and retrieval). Our project team will leverage the work that CERP has already done by testing their applications on data sets from multiple institutions. We will also look at the reasons why the email best practices put forth by CERP and others have not been adopted by most institutions.
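The conversion that both projects describe, from MBOX into XML, can be illustrated with a minimal sketch. It is not the software or the XML schema of either project: the element names (Account, Message, Header, Body) and the function names are hypothetical, chosen only for illustration, and the sketch assumes nothing beyond Python's standard mailbox and xml.etree.ElementTree modules. Attachments and non-text parts are ignored here.

```python
import mailbox
import xml.etree.ElementTree as ET
from email.message import Message

# Headers copied into the XML; an illustrative selection, not a standard.
HEADERS = ("Message-ID", "Date", "From", "To", "Subject", "In-Reply-To")


def body_text(msg: Message) -> str:
    """Return the first text/plain part of a message, decoded to str."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def mbox_to_xml(mbox_path: str) -> ET.Element:
    """Build one XML tree (hypothetical schema) from every message in an MBOX file."""
    root = ET.Element("Account")
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            node = ET.SubElement(root, "Message")
            for header in HEADERS:
                value = msg.get(header)
                if value is not None:
                    # Keep the header name as an attribute so no metadata label is lost.
                    ET.SubElement(node, "Header", name=header).text = str(value)
            ET.SubElement(node, "Body").text = body_text(msg)
    finally:
        box.close()
    return root
```

Calling ET.tostring(mbox_to_xml("director.mbox"), encoding="unicode") would produce a single XML document for the account; the file name is a placeholder. A single large document of this kind is also where the validation difficulties reported by CERP would be expected to surface, so a real tool might write one file per message instead.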
Another relevant project is taking place at Stanford University, where a group is working on a tool called MUSE (Memories Using Email) that could potentially be modified to meet museums’ needs. MUSE is a system that combines data mining techniques and an interactive interface to help users browse a long-term email archive. Researchers have been using the MUSE program to explore email archives containing up to 50,000 messages and are working with test users to browse four different kinds of cues (groups, names, sentiments, and photos) to discover messages from within the archive. While some of the functionality is similar to that of systems used for legal discovery and intelligence analysis, MUSE’s emphases on key terms and data visualization have the potential to make it a better match for museums.
We will form a working group:
At its conclusion, the Deep Dive will endeavor to form a Working Group to engage museum staff involved with the care, exhibition, or interpretation of objects. These include curators, conservators, archivists, registrars, educators, and even web managers who are responsible for how a digitized object is presented online. While museum staff commonly use collections management systems to maintain object files, significant information from emails would not normally get transferred to the collections management system. There are also cases of erroneous or incomplete data entry that cause significant problems for staff, particularly when the staff member who carried out the correspondence is no longer with the institution. In these cases, email archiving tools and best practices help to ensure that important knowledge is passed on from an individual inbox to the broader team of staff working with collections. Finally, while the focus of this working group is on art museum staff, solutions may also be appropriate for staff of other types of museums, libraries, and nonprofit organizations.
The working group has the potential to serve museum directors and administrative staff who engage in important, high-level correspondence with board members, donors, and other outside parties. Email archiving tools that are identified or created could potentially also help directors to identify and preserve significant emails that are vital to an institution’s compliance with local, state, and federal laws, as well as with the specific legal standards governing trust responsibilities.
We will survey the community:
- Who is generating the “important” emails?
- What should be considered important?
- Who should be responsible for this identification?
- What metadata is potentially important?
- How are staff currently archiving significant emails?
- Do museum employees divide business and personal emails?
- What is the typical email life cycle?
- What are the data retention and email policies that museums already have in place?
- What is the range of message stores (e.g., MS Exchange server, Gmail server)?
- How much professional correspondence is taking place outside of email, such as via social networking sites?
- What is the optimal file format for long-term preservation?
- How do you support archiving of email threads?
- How should attachments be linked to email messages throughout processing?
- How can linkages be made to the databases that museums are already using to manage their libraries and archives?
- What tools would need to be built on top of the storage system for search and retrieval of specific emails?
- What algorithms are needed to help people find the right files?
- Who should have permission for what tasks—access, editing, viewing, saving?
- In what ways will depositors’ access rights differ from researchers’ rights?
- What are the specific concerns among museum staff regarding privacy and confidentiality?
- How can we account for the human factor of people not wanting their email archived?
- How do those concerns change when thinking about access in 50 or 100 years?
 “The Variable Media Initiative,” Guggenheim Museum, http://www.guggenheim.org/new-york/collections/conservation/conservation-projects/variable-media (accessed January 24, 2012).
 “Case Studies,” Variable Media Network, http://www.variablemedia.net/e/case_gonza_publi.html (accessed January 24, 2012).
 “Final Report: March 1, 2008-June 30, 2009,” North Carolina State Archives, 2009, http://www.records.ncdcr.gov/emailpreservation/docs/emcap_finalreport_20100109.pdf (accessed January 25, 2012).