|
Research
Abstracts - 2007 |
The Gordon Project: High-Performance Relational Databases
on Flash Memory
Daniel Myers, Samuel Madden & Barbara LiskovNAND flash memory (flash memory) is a form of non-volatile, solid-state memory commonly used for data storage in consumer devices such as music players and digital cameras. Advances in manufacturing and economies of scale have increased the capacity and decreased the price of flash memory chips to the point where manufacturers are offering IDE-compatible disk-like storage units capable of storing dozens of gigabytes at reasonable prices. The capacity of these units will only increase, and we are thus approaching a cost-per-gigabyte at which flash disks can become a viable storage medium for large-scale relational database management systems. Flash memory has a number of features that make it particularly attractive as an RDBMS storage device. In particular, while flash disks are block devices like conventional magnetic disks, the "seek time" required to begin accessing a block is considerably smaller than the seek time for a magnetic disk: on the order of microseconds instead of milliseconds. Additionally, flash disks generate less heat and consume less power than magnetic media (which decreases data centers power requirements) and are considerably more resistant to damage from environmental stresses. On the other hand, flash disks present a number of challenges to database designers. First, the devices have restricted support for overwrites. Flash disks have a two-level hierarchical structure in which the device is partitioned into "erase blocks" (typically ~128kB each) which are further divided into pages (typically ~2kB each). Writes to pages can only clear bits, and bits may only be set by erasing the entire erase block containing the page. Additionally, each page may only be written a limited number of times (e.g., 4) before the containing erase block must be erased, and each erase block is only guaranteed to function for a limited number of erase cycles (typically 1,000,000-10,000,000) before it fails and its pages become unusable. Finally, unlike as in conventional disks, writes to flash disks are significantly slower than reads. In short, flash disks provide new opportunities for high-performance DBMS design. However, techniques optimized to work on conventional magnetic disks may fail to take advantage of the shorter seek times provided by flash or to compensate for the devices' write limitations and failure characteristics. The Gordon Project explores this new design space. In particular, we are investigating four areas of query processing in RDBMS and exploring how they can be best optimized for this new storage medium:
|
||||
|