In the past 5 years, the cost/performance gap between secondary and tertiary storage has been widening. The cost per megabyte of disk drives has been falling by a factor of 2 per year, compared to 1.5 per year for tape drives and libraries. Disk areal densities have been increasing at 60% per year, with 8 GB 3.5 inch disk units currently available. Data rates have also been increasing at 40% per year and are expected to exceed 40 MB/s by the end of the decade. These trends change the possibilities in large scale storage systems. If they continue, large storage systems composed of disks will have significant cost/performance advantages over tape libraries of similar capacity.

Applications such as databases, video on demand, medical data, and web archival need storage systems that are both high performance and high capacity. The solution used in most cases is a hierarchy of a disk array and a tape library. However, disk arrays have drawbacks in terms of cost/performance, availability, and scalability. Because of custom hardware, the cost per megabyte of RAID disk arrays increases with system capacity, unlike raw disks and tape systems. Also, a disk array needs to be connected to a host computer, which becomes a bottleneck for both performance and availability. Its scalability is limited by the number of disks that the infrastructure can support. Some storage-consuming applications, such as web archival, have a fixed rate of data growth. When such applications reach the capacity limit of their disk array, another array must be added. Adding independent disk arrays also lowers the reliability of the total system and complicates storage management.

Tertiary Disk is a storage system architecture that exploits the trends mentioned above to create large disk storage systems that avoid the disadvantages of custom-built disk arrays.
The name comes from twin goals: to have the cost per megabyte and capacity of tape libraries and the performance of magnetic disks. We use commodity, off-the-shelf components to develop a scalable, low-cost, terabyte-capacity disk system. Our target is to build a complete storage system for about 30-50% more than the cost of the raw disks. Tertiary Disk uses PCs connected by a switched network to host a large number of disks. Our prototype consists of 20 200 MHz PCs, which host 370 8 GB disks. The PCs, running FreeBSD, are connected through a 100 Mbps Ethernet switch.
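
The prototype figures above imply a few derived numbers that are easy to check. The following is a minimal sketch of that arithmetic, not part of the Tertiary Disk software. The function names are hypothetical, capacities use decimal units (1 TB = 1000 GB), and the raw disk cost is left as a normalized input because the text does not give a price.

```python
# Back-of-the-envelope arithmetic for the prototype described above.
# All function names are illustrative; only the input figures
# (20 PCs, 370 disks, 8 GB per disk, 30-50% overhead) come from the text.

def raw_capacity_gb(num_disks: int, disk_size_gb: int) -> int:
    """Total raw capacity in GB, before any redundancy or file system overhead."""
    return num_disks * disk_size_gb


def disks_per_host(num_disks: int, num_hosts: int) -> float:
    """Average number of disks attached to each PC."""
    return num_disks / num_hosts


def system_cost_range(raw_disk_cost: float,
                      low_overhead: float = 0.30,
                      high_overhead: float = 0.50) -> tuple[float, float]:
    """Target total system cost, given the 30-50% overhead goal over raw disk cost."""
    return (raw_disk_cost * (1 + low_overhead),
            raw_disk_cost * (1 + high_overhead))


if __name__ == "__main__":
    capacity = raw_capacity_gb(num_disks=370, disk_size_gb=8)
    print(f"Raw capacity: {capacity} GB ({capacity / 1000:.2f} TB)")
    print(f"Disks per PC (average): {disks_per_host(370, 20)}")
    # Raw disk cost normalized to 1.0, since no price is stated in the text.
    low, high = system_cost_range(1.0)
    print(f"Target system cost: {low:.2f}x to {high:.2f}x raw disk cost")
```

With the stated figures, the raw capacity comes to just under 3 TB, and each PC hosts between 18 and 19 disks on average.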