CSAIL Publications and Digital Archive header
bullet Research Abstracts Home bullet CSAIL Digital Archive bullet Research Activities bullet CSAIL Home bullet

link to publications.csail.mit.edu link to www.csail.mit.edu horizontal line

 

Research Abstracts - 2007
horizontal line

horizontal line

vertical line
vertical line

The Gordon Project: High-Performance Relational Databases on Flash Memory

Daniel Myers, Samuel Madden & Barbara Liskov

NAND flash memory (flash memory) is a form of non-volatile, solid-state memory commonly used for data storage in consumer devices such as music players and digital cameras. Advances in manufacturing and economies of scale have increased the capacity and decreased the price of flash memory chips to the point where manufacturers are offering IDE-compatible disk-like storage units capable of storing dozens of gigabytes at reasonable prices. The capacity of these units will only increase, and we are thus approaching a cost-per-gigabyte at which flash disks can become a viable storage medium for large-scale relational database management systems.

Flash memory has a number of features that make it particularly attractive as an RDBMS storage device. In particular, while flash disks are block devices like conventional magnetic disks, the "seek time" required to begin accessing a block is considerably smaller than the seek time for a magnetic disk: on the order of microseconds instead of milliseconds. Additionally, flash disks generate less heat and consume less power than magnetic media (which decreases data centers power requirements) and are considerably more resistant to damage from environmental stresses.

On the other hand, flash disks present a number of challenges to database designers. First, the devices have restricted support for overwrites. Flash disks have a two-level hierarchical structure in which the device is partitioned into "erase blocks" (typically ~128kB each) which are further divided into pages (typically ~2kB each). Writes to pages can only clear bits, and bits may only be set by erasing the entire erase block containing the page. Additionally, each page may only be written a limited number of times (e.g., 4) before the containing erase block must be erased, and each erase block is only guaranteed to function for a limited number of erase cycles (typically 1,000,000-10,000,000) before it fails and its pages become unusable. Finally, unlike as in conventional disks, writes to flash disks are significantly slower than reads.

In short, flash disks provide new opportunities for high-performance DBMS design. However, techniques optimized to work on conventional magnetic disks may fail to take advantage of the shorter seek times provided by flash or to compensate for the devices' write limitations and failure characteristics. The Gordon Project explores this new design space. In particular, we are investigating four areas of query processing in RDBMS and exploring how they can be best optimized for this new storage medium:

  1. External memory joins. When a database system computes the join of two relations, if both relations cannot be held in main memory simultaneously, intermediate results must be written to disk. We are exploring new ways to structure these intermediate results. For example, we have found that writing subsets of tuples as intermediate results (instead of entire tuples as is currently done) and paying a reaccess cost to retrieve complete tuples that pass the join is an effective way to trade the cheap random reads provided by flash memory for expensive writes.
  2. Index structure design. One advantage of the fast random reads provided by flash memory is that secondary database indexes, which contain pointers to data rather than data themselves, become considerably more useful than they are with conventional disks. We intend to characterize the impact of these new capabilities on query planning. In addition, we are investigating new index structure designs that are more compatible with the access characteristics of flash memory: preliminary experiments have shown the standard B-Tree structure to exhibit 40% flash-related overhead on realistic workloads.
  3. Recovery. One of the signature features of database systems is that they provide transactions and guarantee that in the event of a crash, the system can be brought back to a consistent state. We intend to explore new mechanisms for storing recovery information to achieve higher performance on flash memory.
  4. Hybrid flash/magnetic systems. Magnetic disk will continue to have a lower cost per byte than flash memory for the foreseeable future. We plan to investigate methods for automatically migrating portions of database from magnetic disk to flash memory to boost performance in systems where the database can not be economically stored entirely on flash memory.

 

vertical line
vertical line
 
horizontal line

MIT logo Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu