Hardware


Introduction

One of the goals of the MJF project is to create a repository to store in a digital form the audiovisual archives of the Montreux Jazz Festival. An archival repository is structured for storage, maintenance and access over the long term.

The holistic approach of [3], in accordance with the OAIS model, is summed up in this sentence: “a functioning preservation system must consider all aspects of a digital repository: Ingest, Access, Administration, Data Management, Preservation Planning and Archival Storage, including storage media and management software” (p. 3). The choice of storage hardware (media) and the choice of management software therefore belong to a single reflection on Archival Storage.

By “storage hardware”, we mean the physical carrier of information, also called the storage medium or physical format. The other aspects of storage are discussed in other chapters: the software (the system architecture, or the way the content and the descriptive information are acquired, stored, indexed, secured, searched, exported, transformed and made accessible) and the content itself (the essence, with its format and resolution, and the metadata). We advocate an independent and neutral format (like JPEG and unlike Betacam), which makes it possible to separate formats from carriers.

Using mass storage instead of discrete storage (off-line tapes, for example, or DVDs) has several advantages [15]:

  • Density: mass storage takes up less space. And as the density of carriers grows each year, costs can decrease.
  • Media cost: IT storage solutions (like data tapes) are far cheaper than specialised audiovisual media, because the market is large and very competitive.
  • Disaster recovery: the low cost of mass storage solutions makes it affordable to duplicate information; and the fact that it is connected to networks allows fast recovery.
  • Stock management: the assets always remain available, even while being copied, processed, migrated, etc.
  • Varying value: mass storage allows Hierarchical Storage Management (HSM): the most used items are transferred to expensive carriers (with fast access), while unused items are stored on cheaper media.
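
The HSM principle in the last point can be illustrated with a minimal sketch. It only shows the idea of placing items on tiers according to how often they are used; the tier names, the thresholds and the file names are hypothetical and are not taken from any of the cited systems.

```python
from dataclasses import dataclass

# Hypothetical tiers, ordered from fastest/most expensive to slowest/cheapest.
TIERS = ("disk", "nearline_tape", "offline_tape")


@dataclass
class Item:
    name: str
    accesses_last_year: int


def choose_tier(item: Item, hot: int = 50, warm: int = 1) -> str:
    """Return the storage tier for an item based on its access count.

    The thresholds `hot` and `warm` are illustrative assumptions.
    """
    if item.accesses_last_year >= hot:
        return TIERS[0]
    if item.accesses_last_year >= warm:
        return TIERS[1]
    return TIERS[2]


# Hypothetical catalogue entries.
catalogue = [
    Item("concert_a.mxf", 120),
    Item("concert_b.mxf", 3),
    Item("concert_c.mxf", 0),
]
placement = {item.name: choose_tier(item) for item in catalogue}
```

In a real HSM product the policy would also consider file size, age and retention rules; the sketch keeps only the access-frequency criterion mentioned above.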

When choosing the adequate physical carrier, general criteria such as those examined in this document are not enough: the specific needs of the project should also be taken into account, namely the estimated total capacity and the required speed. In general, speed is not very important for archival purposes (master archives) [13] [11]. [11] notes that speed of file handling is far less important than guaranteeing that a file will be neither lost nor damaged. So preservation is paramount, providing access is secondary (demand is very low, and some documents will never be accessed), and providing quick access is a luxury (extremely fast disk arrays are expensive and do not fit preservation needs). The secondary archive, however, needs a very efficient connection for processing and display.

This point needs clarification: in this document, we mainly discuss preservation needs, which apply only to the Master Archive of the project. The project’s presentation document [7] advocates the use of data tapes for the Master Archive and of disks for the Secondary Archive. While the choice of disks for the Secondary Archive seems indisputable (because of the need for fast access, which data tape cannot offer), the choice of data tapes for the Master Archive, even though it is common in institutions, should be carefully examined.

Selection criteria

The marketplace decides the longevity of digital media on economic grounds, so technological obsolescence occurs before the end of the physical carrier’s life expectancy. For this reason, no digital storage can be considered the “definitive” archival solution, able to last forever. For instance, an LTO tape can last from 10 to 30 years [9], but beyond 6 years, the drives will no longer be able to read it [6] (p. 11).

The only way to overcome this is to periodically refresh the content onto new media. Change is inescapable. This strategy implies a careful selection of the appropriate media, in order to (1) maximize the periods between refreshment cycles (and minimize the costs), (2) simplify the refreshment procedure, and (3) minimize the risks of data loss [4] (p. 4). So the difficulty is not to find a system that lasts forever, but to find a system that is able to evolve.
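
To make the effect of the refreshment period concrete, here is a small sketch that lists the years at which content would have to be copied to new media over a given preservation horizon. The 30-year horizon and the two cycle lengths are illustrative assumptions, not figures recommended by [4].

```python
def refresh_years(horizon_years: int, cycle_years: int) -> list[int]:
    """Years (counted from ingest) at which content must be copied to new media.

    A refresh falling exactly at the end of the horizon is not counted.
    """
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return list(range(cycle_years, horizon_years, cycle_years))


# Illustrative comparison over a hypothetical 30-year horizon.
short_cycle = refresh_years(30, 6)   # [6, 12, 18, 24]
long_cycle = refresh_years(30, 10)   # [10, 20]
```

With these assumptions, moving from a 6-year to a 10-year cycle halves the number of migrations, which is why the length of the period between refreshments weighs so heavily on the cost.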

Here are the selection criteria proposed by [4], which are a good digest of all selection criteria proposed in the literature.

Longevity

The storage media should have a proven life span of at least 10 years for the long-term preservation.

Capacity

Minimizing the number of physical media to be managed will usually create efficiency and resource savings. For the moment, data tapes have the highest capacity (1.6 TB with 2:1 compression), but are challenged by hard disks (1 TB). In terms of areal density, hard disks have an advantage (520 Gb/in² vs. 2-3 Gb/in² for longitudinal tapes [13]).

Viability

The media and drives should be able to ensure the integrity of data over time. Several features have to be supported: error-detection methods, data recovery techniques in case of data loss, WORM features, no correction and no lossy compression.

In large scale archives, these functions cannot be managed with off-line carriers: the tasks have to be automated. For instance, RAID-5 systems and LTO tapes have error detection and WORM (Write Once Read Many) features.
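
Error detection can also be automated at the file level, independently of what the drives and media do internally. The following sketch is an assumption on our part and is not described in [4]: it computes a SHA-256 checksum when an object is ingested and compares it later to detect silent corruption. The function names are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 digest of a binary stream, reading it in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_intact(stream: BinaryIO, expected_hex: str) -> bool:
    """Return True when the stream still matches the checksum recorded at ingest."""
    return sha256_of(stream) == expected_hex


# Record a checksum at ingest, then verify a good copy and a damaged copy.
recorded = sha256_of(io.BytesIO(b"master archive payload"))
good_copy_ok = is_intact(io.BytesIO(b"master archive payload"), recorded)
damaged_copy_ok = is_intact(io.BytesIO(b"master archive pay1oad"), recorded)
```

Reading in chunks keeps memory use constant, which matters for large audiovisual files; in practice the stream would be a file opened in binary mode.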

Obsolescence

Obsolescence is inescapable, and it will occur before the media physically degrade. To limit the costs, refreshment cycles have to be kept as infrequent as possible, by choosing media with a fairly long technical life expectancy. So one will select a mature rather than leading-edge technology, well established in the marketplace and widely available. From this point of view, optical disks are not a good choice, because there are multiple standards and no stable, “definitive” one (Blu-ray is the new standard, but for how long?). Hard disk technology also evolves at a very rapid pace.

In addition, as migration is the solution to obsolescence, the media should be easy to migrate: open standards rather than proprietary solutions, interoperability, and even backward compatibility. For instance, LTO is an open standard, available from any tape library manufacturer, and backward compatible over two generations. By contrast, a high-end product like the Sun StorageTek T10000 performs very well, but is proprietary and limited to Sun solutions.

Cost

Each solution (hard disk, tape, optical disc) has its own strengths and weaknesses, so cost is a decisive selection criterion. It is also an important issue when planning for long-term preservation: even if the budget is secured for the first steps of the digitization project, the maintenance of the system through continuous migrations also has to be funded.

So it is not only the cost of the media (price per MB or GB) that matters, but also the Total Cost of Ownership, i.e. the costs of purchasing and maintaining the necessary hardware and software. Reliability and durability figures feed into it: Mean Time Between Failures (MTBF) and Annual Failure Rate (AFR), for instance, or the coercivity value, the number of loads/unloads, etc.
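
MTBF and AFR are two views of the same reliability figure. As a rough sketch, assuming a constant failure rate (the usual exponential model, which is a simplification and not a claim made by the sources), the AFR can be derived from a vendor MTBF rating and used to estimate how many drives to expect to replace each year. The MTBF value and the drive count below are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # 24 hours x 365 days, assuming continuous operation


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annual failure rate implied by an MTBF rating (exponential model)."""
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(drive_count: int, mtbf_hours: float) -> float:
    """Expected number of drive failures per year in an array."""
    return drive_count * afr_from_mtbf(mtbf_hours)


# Hypothetical example: 100 drives rated at 1,000,000 hours MTBF.
afr = afr_from_mtbf(1_000_000)
failures = expected_failures_per_year(100, 1_000_000)
```

Under these assumptions the AFR is a little under 1%, so an array of 100 such drives would see slightly less than one failure per year on average. Replacement costs of this kind belong in the Total Cost of Ownership.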

Susceptibility

This is the physical strength of the media: it should have low susceptibility to physical damage, and should be tolerant of a wide range of environmental conditions without data loss. So the thickness of the tape, for example, should be taken into consideration.

In this domain, hard disks have an advantage, because they are less sensitive to temperature and humidity variations (and the costs of controlling the room’s environmental conditions are lower).

Descriptions of the different media

There are three main families of physical formats for storing digital material: optical disks, magnetic disks (hard disks) and magnetic tapes. In this section, we present them in detail.

We won’t speak here about geographically distributed, Internet based archival systems – whose focus is not to rely on “physical” media, but to distribute the archive on P2P networks, for instance [17] – because such solutions are not yet implemented.

Optical discs

The optical disc family is large and includes many standards: CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, Blu-ray, etc. There are also some professional standards, like UDO, AOD, etc. For a complete presentation of optical disks, see [2].

Optical disks could be a preservation medium, since they offer a good cost-to-storage-space ratio, they are a WORM medium, they are reliable over time, and they have fast access. But, according to [2], they are not a reliable long-term storage medium, since it is difficult to predict their life expectancy: some say 50 to 100 years, and even over 1000 years for CD-ROM [14], but with a 2 to 5% probability of failure even under good storage conditions. In fact, the life expectancy depends on the way the information is written (magneto-optical, phase change, polymer dye decomposition, optical path length modulation) and on the quality of fabrication. Moreover, the many existing standards, with more still being developed, are no guarantee against technological obsolescence.

Optical disks have several other weaknesses: slower random access, low areal density in comparison with magnetic tapes, a small capacity (25-30 GB for Blu-ray disks), a complex laser-disc interface (which implies fast technological obsolescence), and no guarantee from the manufacturers about the readability of the data over a well-specified period of time (which could make it difficult or impossible to plan migration) [1]. Finally, optical mass storage is a niche market, which could make it unavailable (because unprofitable) in the future.

However, some professional formats have been developed and are used as mass storage media in jukeboxes:

  • UDO (Ultra Density Optical), by Plasmon, 60 GB (gen 2): life expectancy of 50 years, data authenticity and integrity features (WORM), low cost. IBM manufactures jukeboxes.
  • Professional Disc for DATA (Sony), Blu-ray technology, 23.3 GB.
  • AOD (Advanced Optical Disc), by Toshiba and NEC, 20 GB.

Conclusion: optical discs were not developed for archival and preservation use, but for the mass market. Even financially, they are not a good alternative to HDD and tapes, and they present critical risks [2] [13].

Magnetic discs

Magnetic discs, or hard discs, are a very competitive storage medium: while the storage capacity, security and reliability of this medium improve each year, the cost per GB keeps decreasing. And when combined in disk arrays, with RAID or MAID features, they are an excellent (and reliable) alternative to magnetic tapes for long-term storage.

RAID (Redundant Array of Independent Disks) improves access time and increases MTBF and fault tolerance. Some RAID levels have error checking and correction features. Because the data is stored redundantly, the usable storage capacity is reduced, which also erodes the advantage in cost per GB.

MAID (Massive Array of Idle Disks) switches off the disks that are not in use, which prolongs the life of the disks and lowers power consumption. The cost per terabyte is roughly equivalent to that of tape. It uses SATA drives, which have shorter MTBF ratings (they have a low rotational speed, and so low acoustic noise), and the disks are periodically tested. Because there is no redundancy, the storage density is greater than for RAID.
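
The capacity penalty of redundancy can be quantified with a short sketch. It uses the standard capacity rules for common RAID levels and, following the description above, treats a non-redundant array as the MAID case; the disk count and disk size are hypothetical.

```python
def usable_capacity_tb(disk_count: int, disk_size_tb: float, layout: str) -> float:
    """Usable capacity of an array of identical disks.

    Supported layouts: "none" (no redundancy), "raid1" (two-way mirror),
    "raid5" (one disk's worth of parity), "raid6" (two disks' worth of parity).
    """
    minimum_disks = {"none": 1, "raid1": 2, "raid5": 3, "raid6": 4}
    if layout not in minimum_disks:
        raise ValueError(f"unknown layout: {layout}")
    if disk_count < minimum_disks[layout]:
        raise ValueError(f"{layout} needs at least {minimum_disks[layout]} disks")
    if layout == "raid1":
        if disk_count % 2:
            raise ValueError("raid1 needs an even number of disks")
        return disk_count * disk_size_tb / 2
    parity_disks = {"none": 0, "raid5": 1, "raid6": 2}[layout]
    return (disk_count - parity_disks) * disk_size_tb


# Hypothetical array of eight 1 TB disks under each layout.
capacities = {
    layout: usable_capacity_tb(8, 1.0, layout)
    for layout in ("none", "raid1", "raid5", "raid6")
}
```

For the same eight disks, the usable space ranges from 8 TB without redundancy down to 4 TB with mirroring, so the effective cost per GB of a redundant array is correspondingly higher than the raw price of the disks suggests.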

  • Longevity: according to Moore [9], hard disks have a life expectancy of 3 to 6 years, which can be extended to more than 10 years with the use of MAID systems [6]. Even if hard disks (especially SATA drives, which have a better reliability) can be considered reliable (high MTBF and low AFR), system malfunctions are nevertheless more likely than with magnetic tapes, because the way they work is more complex. RAID systems in particular involve significant computation.
  • Capacity: some hard drives can store 1 TB.
  • Viability: RAID and MAID have error detection and WORM features.
  • Obsolescence: the main principles of the system do not evolve much, but still too fast from the archivist’s point of view, and faster than tapes.
  • Cost: the cost per GB is very low and keeps decreasing, but data tapes are still cheaper for mass storage. MAID systems take up a lot of space, but they do not need a huge cooling system. The best storage capacity for a given price is offered by SATA drives [6].
  • Susceptibility: hard disks do not need as much environmental control as data tapes, which also has an influence on costs.

In addition, access is faster than with tapes, and retrieval is better. But for archival purposes, with no need for fast access 24/7 and with a large amount of data, disk drives exceed the needs, and storing the whole archive on disk arrays is not cost-efficient. [16] points out that RAIDs are costly but of little value in digital preservation: “They provide high availability, but spending heavily to improve availability is hard to justify for systems such as dark archives where the probability of a user access during the recovery time from a disk failure is low”.

Conclusion: the advantages of HDD would make it a good backup system (where short-term retention and fast access are needed), but not an archive, for which data tapes are a more cost-effective solution [13].

We should note, however, that some institutions have chosen HDD for permanent storage. For example, the British Library has chosen magnetic disks for the preservation storage of its digital objects (but without explaining the reason for this choice [11]). It is also interesting to note that EMC, one of the leading specialists in back-up and archive solutions, only provides disk array solutions and advocates against tape libraries. The advantages the manufacturer offers are not suited to the needs of the project, but this could indicate a new trend…

When used as archival storage, hard disks should be used in MAID systems with SATA interfaces.

Magnetic tapes

We won’t discuss digital video tape formats (Digital Betacam, DVC Pro, etc.). Those systems depend on an encoding with its own internal compression, and they are more expensive (almost 10 times more than data tapes [12]). Moreover, [5] and [1] recommend preserving objects and metadata in a media-neutral form on data tapes, of the same kind as the tapes used in IT services for back-up: this kind of solution allows easy migration, low storage costs, integration in tape libraries, and interaction with disk array storage solutions. [3] also says that any digital solution should be based on open standards and automated systems in order to easily overcome technical change; off-line tapes and discrete storage, by contrast, do not facilitate the management and preservation of the data content in the face of change, as explained above.

[18] proposes a typology of magnetic tape data storage according to the performance (transfer rate and capacity) of the different systems.

  • High performance: not used for network storage applications (too slow). Examples are DTF (SONY), with its DTF-based tape library PetaSite, and IBM (3590 Magstar series).
  • Middle range: LTO (IBM, Seagate, HP), S-DLT (Quantum, Tandberg), S-AIT (SONY). S-AIT is the best-performing and most scalable system, but it is not an open standard. Moreover, the technology (helical recording) is developed only by SONY, so although it is very reliable, one could wonder whether this format will still be developed for years to come. S-DLT stays at the leading edge of performance (capacity and transfer rate) and behaves very well over the long term [13]. But the de facto standard for mid-range back-up and archive solutions is now LTO, with 50% of the market [13] [9]. It has the advantage of offering error detection features, of being backward compatible over two generations [1], and of being cross-platform.
  • Fast access: limited storage capacity, but quick access time.
  • General purpose: the only advantage is the cost, so it is used by small companies with a small amount of data.

One can also add “high-end” tape drives (by SUN and IBM), which perform very well and are reliable, but are more expensive and proprietary [13].

As a long-term mass storage solution, data tapes are the most widely used, for the following reasons:

  • Longevity: one of the main strengths of data tapes is their long shelf life. According to [14], the life expectancy of magnetic tapes like S-DLT exceeds 100 years in a controlled environment, while [9] and [6] speak of 10 to 30 years, which, in any case, is longer than the time it takes for the technology to become obsolete.
  • Capacity: LTO tapes now (gen 4) have a capacity of 800 GB; with 2:1 compression, one can store 1.6 TB, which is more than most hard drives; but this compression, although lossless, cannot be relied on for audiovisual collections, because the very generic algorithm is not efficient for this kind of data [8].
  • Viability: LTO tapes have error detection and WORM features. Tapes are monitored and replaced when necessary. It is a very reliable carrier: data tapes are used in IT and records management contexts to preserve critical information, so they are certainly suited to long-term (or medium-term) conservation [1].
  • Obsolescence: according to [6], archival tapes have a total format life (for reading) of about 6 years, thanks to their backward compatibility. Moreover, compared to hard disks, tapes are a mature technology (“The handling procedures have been developed to perfection over half a century, and the knowledge about best storage conditions far exceeds experiences with newer media”, say [1]. Four years later, this assessment should be put into perspective, and as explained above, disk arrays are now a reliable technology). Data tapes are the most used archival storage worldwide: 81% [6], which should ensure that they continue to be developed for a long time.
  • Cost: tapes have the lowest cost per gigabyte, do not consume power when not in use, and have a high areal density. Moreover, the equipment (robotics) can be reused for several generations of tapes.
  • Susceptibility: data tapes need strict environmental control.
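
Since compression cannot be counted on for audiovisual material, capacity planning should use the native 800 GB figure. A minimal sketch of the calculation follows; the collection size and the number of copies are hypothetical, and the function name is ours.

```python
import math

LTO4_NATIVE_GB = 800  # native (uncompressed) capacity of an LTO-4 tape, from the text


def tapes_needed(collection_gb: float, copies: int = 1,
                 tape_gb: float = LTO4_NATIVE_GB) -> int:
    """Number of tapes needed to hold `copies` full copies of a collection.

    Each copy is rounded up to whole tapes; no compression is assumed.
    """
    if collection_gb < 0 or copies < 1 or tape_gb <= 0:
        raise ValueError("invalid arguments")
    return copies * math.ceil(collection_gb / tape_gb)


# Hypothetical 100 TB (100,000 GB) collection.
single_copy = tapes_needed(100_000)            # 125 tapes
three_copies = tapes_needed(100_000, copies=3)  # 375 tapes
```

The sketch ignores the space lost to file boundaries and tape formatting overhead, so real figures would be somewhat higher.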

In addition, data tapes are removable.

The slow access time compared to hard disks is not a handicap, because access is usually not served from the tape system, and because fast access is not the priority in a conservation project.

Conclusion: even if challenged by HDD, tapes remain the most frequently chosen solution for archives, because of their areal density and cost. Among the different tape standards, LTO seems the most appropriate, because it is an open standard and it dominates the market [9].

Comparison

This comparison applies only to the master archive, which has preservation needs rather than access needs. It does not go into the details of each possible manufacturer, and only gives a general, indicative overview of the different media. For example, the costs are evaluated only in $/GB and do not include the Total Cost of Ownership (TCO) (maintenance, room requirements, and the fact that data is duplicated in RAID systems, so that there is much more data to store).

For the comparison, we use LTO-4 tapes (the recommended tape for archival purposes) and an approximation of an “average” hard disk drive. We leave out optical disks, because they are not a good archival medium a priori.

 

Media                              | LTO          | HDD
Longevity (years)                  | 10 to 30     | 3 to 6
Capacity (GB, uncompressed)        | 800          | Up to 1000
Viability                          | OK           | OK
Obsolescence (useful life, years)  | 10 to 15     | 2 to 10
Cost ($/GB) [9]                    | 0.30 to 2.50 | 2 to 10
Susceptibility                     | High         | Low

Note: the useful life given for obsolescence does not take into account the need to change when storage capacity and performance grow.

 

As we can see, the competition between tapes and hard drives for archival purposes is very close. But the advantages of tapes are more relevant in a long-term preservation context: low power consumption, mature technology, cost, market share, etc. Data tape is seen by curators as the only proven long-term storage medium.
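
As an indication of what the $/GB ranges in the table mean at the scale of an archive, the following sketch multiplies them by a target capacity. The cost ranges are those of the table (attributed to [9]); the 10 TB capacity is a hypothetical example, and, like the table, the result excludes the Total Cost of Ownership.

```python
# Cost ranges in $/GB, taken from the comparison table above.
COST_PER_GB = {
    "LTO": (0.30, 2.50),
    "HDD": (2.0, 10.0),
}


def media_cost_range(medium: str, capacity_gb: float) -> tuple[float, float]:
    """Lowest and highest media cost, in dollars, for the given capacity."""
    low, high = COST_PER_GB[medium]
    return (low * capacity_gb, high * capacity_gb)


# Hypothetical 10 TB (10,000 GB) of content, single copy.
lto_low, lto_high = media_cost_range("LTO", 10_000)
hdd_low, hdd_high = media_cost_range("HDD", 10_000)
```

With these figures the media would cost roughly $3,000 to $25,000 on LTO and $20,000 to $100,000 on hard disks, which illustrates why the ranges overlap only at their extremes.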

Conclusion

How to choose between hard disks and magnetic tapes?

It is important to say that both solutions are complementary, because there will be two repositories, the master archive and the secondary archive, with two different sets of needs. Moreover, no system offers longevity without failure. So the usual solution is to mix:

  • HDD for speed, input/output caching, and most used items (secondary archive).
  • Tape for cheapest mass storage, disk back-up and redundancy (master archive).
  • Optical for distribution [13] (p. 24).

And here are some principles about the architecture that emerge from the literature:

  • Triple redundant system: there should be at least three separate copies of each object to ensure recovery in case of data loss, and one of them should be remote.
  • The data should be stored on two different carriers (tapes and hard disks), in order to ensure better reliability. It is also recommended to make backups on tapes coming from different lots, which reduces the risk posed by manufacturing defects.
  • In order to allow easy migration, the hardware and the software should be independent of each other.
  • The benefits of a multi-vendor system outweigh the potential integration risks (when a piece of equipment no longer fits the needs or becomes obsolete, it can be replaced without replacing the whole system), and it avoids vendor lock-in [10] [16]. The different modules (storage, servers, software, infrastructure) have to be integrated by an architect or system integrator (most vendors can act as system integrators).
  • One should use only non-proprietary systems: each vendor has to have a system that is open enough for us to leverage it in various ways within the solution [10].
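
The first two principles lend themselves to an automated check. The sketch below is only an illustration of how such a rule could be expressed: the `Copy` structure, the carrier labels and the site names are hypothetical and do not describe the project's actual architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    carrier: str  # e.g. "tape" or "disk"
    site: str     # where the copy is physically held


def meets_redundancy_policy(copies: list[Copy], home_site: str) -> bool:
    """Check the principles above for one archived object:

    at least three copies, at least one of them away from the home site,
    and at least two different kinds of carrier.
    """
    enough_copies = len(copies) >= 3
    has_remote = any(c.site != home_site for c in copies)
    mixed_carriers = len({c.carrier for c in copies}) >= 2
    return enough_copies and has_remote and mixed_carriers


# Hypothetical layouts for a single object.
compliant = [Copy("tape", "site_a"), Copy("disk", "site_a"), Copy("tape", "site_b")]
all_local = [Copy("tape", "site_a"), Copy("disk", "site_a"), Copy("tape", "site_a")]

compliant_ok = meets_redundancy_policy(compliant, home_site="site_a")
all_local_ok = meets_redundancy_policy(all_local, home_site="site_a")
```

A check of this kind could be run over the whole catalogue after each migration, to confirm that no object has silently dropped below the required level of redundancy.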

But the final choice of the appropriate manufacturer of the different devices will depend on the performance expected (storage capacity, access time), and on the costs of the different alternatives.

Bibliography

All resources accessed on December 16, 2008

 

General references

AHDS. AHDS Digital Preservation Bibliography. In: Arts and Humanities Data Service (AHDS) [online]. Last update March 1st 2007. http://ahds.ac.uk/preservation/bibliography.htm

DCC. Digital Curation Center [online]. Last modified April 25 2008. http://www.dcc.ac.uk/

NATIONAL LIBRARY OF AUSTRALIA. PADI Preserving Access to Digital Information. In: National Library of Australia [online]. S.d. http://www.nla.gov.au/padi/

PRESERVATION DEPARTMENT OF STANFORD UNIVERSITY LIBRARIES AND INFORMATION RESOURCES. Conservation Online: Resources for Conservation Professionals. In: Conservation OnLine (CoOL) [online]. Last updated 24.05.2007. http://palimpsest.stanford.edu/

Bibliography

[1] AHAMER, Julia. PAVUZA, Franz. Linear Uncompressed Video Archiving on High Performance Computer Tapes. In: JTS 2004 [online]. 2004. http://www.jts2004.org/english/proceedings/Ahamer.htm

[2] BRADLEY, Kevin. Risks Associated with the Use of Recordable CDs and DVDs as Reliable Storage Media in Archival Collections – Strategies and Alternatives [online] Paris: Unesco, 2006. 31 p. http://unesdoc.unesco.org/images/0014/001477/147782E.pdf

[3] BRADLEY, Kevin. LEI, Junran. BLACKALL, Chris. Towards an Open Source Repository and Preservation System: Recommendations on the implementation of an Open Source Digital Archival and Preservation System and on Related Software Development [online]. Paris: Unesco, 2007. 34 p. http://portal.unesco.org/ci/en/ev.php-URL_ID=24700&URL_DO=DO_TOPIC&URL_SECTION=201.html

[4] BROWN, Adrian. Selecting Storage for Long-Term Preservation [online]. [London]: The National Archives, 2003 (Digital preservation guidance note 2). 7 p. http://www.nationalarchives.gov.uk/documents/selecting_storage_media.pdf

[5] CEDARS. Cedars guide to: Digital Preservation Strategies. In Cedars [online]. Last version 2nd April 2002. http://www.leeds.ac.uk/cedars/guideto/dpstrategies/dpstrategies.html

[6] COUGHLIN, Thomas M. Archiving in the Entertainment and Professional Media Market [online]. Coughlin Associates: 2008. 17 p. http://www.tomcoughlin.com/Techpapers/Archiving%20in%20the%20Entertainment%20and%20Media%20Market%20%20Report,%20final,%20021808.pdf

[7] EPFL. Montreux Jazz Festival Digital Archive Project: A unique and first of a kind high resolution digital archive of the Montreux Jazz Festival. October 2008

[8] GILMOUR, Ian. DAVILA, Justin R. Lossless Video Compression for Archives: Motion JPEG2k and Other Options [online]. Media Matters, 2006. 8 p. http://www.media-matters.net/docs/WhitePapers/WPMJ2k.pdf

[9] HORISON. Horison Information Strategies [online]. Last updated June 2008. http://www.horison.com/index.shtml

[10] JOHNSTON, Craig. Fox News Goes Tapeless: Network’s News and Business Channels Install End-to-End Digital Workflow. In: TV Technology.com [online]. June 25th 2008. http://www.tvtechnology.com/pages/s.0082/t.14187.html

[11] LINDEN, Jim. MARTIN, Sean. MASTERS, Richard. PARKER, Roderick. Technology Watch Report: The large-scale archival of digital objects [online]. London: British Library, 2005 (DPC Technology Watch Series Report 04-03). 20 p. http://www.dpconline.org/docs/dpctw04-03.pdf

[12] LINDNER, Jim. A Methodology for Determining the Economics of Digital Storage Costs for Archives. 2008

[13] MOREIRA, Fernando. Storage: Ten years Forecast of Storage Evolution [online]. Prestospace, 2006 (Deliverable D12.5 Storage Forecast). 25 p. http://www.prestospace.org/project/deliverables/D12-5.pdf

[14] NAVALE, Vivek. Predicting the Life Expectancy of Modern Tape and Optical Media. In: RLG Diginews [online]. Vol. 9 n°4 (2005). http://digitalarchive.oclc.org/da/ViewObject.jsp?objid=0000068919&reqid=27073#article3

[15] PRESTOSPACE. Preservation towards storage and access: Standardised Practices for Audiovisual Contents in Europe. In: Prestospace [online]. Last updated 19.12.2007 http://www.prestospace.org/index.en.html

[16] ROSENTHAL, David S. H. ROBERTSON, Thomas. LIPKIS, Tom. REICH, Vicky. MORABITO, Seth. Requirements for Digital Preservation Systems: A Bottom-Up Approach. In: D-Lib Magazine [online]. Volume 11 Number 11, November 2005. http://www.dlib.org/dlib/november05/rosenthal/11rosenthal.html

[17] ROSENTHALER, Lukas. GSCHWIND, Rudolf. Distarnet: A Distributed Archival Network. In: Distarnet [online]. 2004. http://www.distarnet.ch/IS_T-distarnet.pdf

[18] SADASHIGE, Koichi. Data Storage Technology Assessment – 2002: Projections through 2010 [online]. National Technology Alliance, 2003. 80 p. http://www.imation.com/government/nml/pdfs/AP_NMLdoc_DSTAssessment.pdf