Bell Labs scientists have put together a cloud storage system
August 10, 2015
Scientists from Bell Labs and Stony Brook University have built a cloud storage system they
hope can serve as a reference design for future cloud implementations in the IT industry.
Called SEARS (Space Efficient and Reliable Storage), the research has been published on the arXiv preprint server.
Overall, the researchers argue that what's expected from cloud storage is easy to articulate: reliability, interactive user access, global coverage and good response times. Of course, that's easier said than done in the real world.
Added to that is the inevitable tradeoff between space efficiency and computational cost. For example, RAID solutions are space efficient, but computationally very demanding.
And for its part, GFS has lower computational requirements, but its file replication needs more space, and there's not much that can be done to change that, at least for now.
Hence the SEARS concept: a combination of deduplication techniques and erasure coding that can be configured for either fast file access, high storage efficiency, or both, with some compromise, of course.
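To make the erasure-coding half of that combination concrete, here is a deliberately minimal sketch using a single XOR parity block over a stripe of data blocks, which lets any one lost block be rebuilt. This is only an illustration: the paper does not specify the code used, and real deployments would typically use Reed-Solomon codes to tolerate more than one failure.

```python
def xor_blocks(blocks):
    # XOR equal-length blocks byte by byte; with data blocks plus their
    # parity, XORing the survivors reproduces the missing block.
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"aaaa", b"bbbb", b"cccc"]
parity = xor_blocks(data)          # one parity block over the stripe
# Simulate losing data[1] and rebuilding it from the rest plus parity:
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```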
The way it works is this: on the upload side, the client chunks the file and generates metadata, which it sends to the "switching node" (a server node designated to that user).
The switching node first checks the file metadata to identify unique chunks, and only those not
already in storage are uploaded.
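The upload-side exchange can be sketched as follows. The dictionary standing in for the switching node's chunk index and the function names are illustrative assumptions, not from the paper; the point is that metadata travels first, and only chunks the index has never seen cross the wire.

```python
import hashlib

# Hypothetical stand-in for the switching node's chunk index:
# chunk ID -> chunk bytes already held in storage.
stored_chunks = {}

def chunk_id(chunk: bytes) -> str:
    # SEARS uses a 160-bit SHA-1 digest as the fixed-size chunk ID.
    return hashlib.sha1(chunk).hexdigest()

def upload(file_chunks):
    """Client sends metadata (chunk IDs) first; the switching node
    replies with the IDs it does not already hold, and only those
    chunks are then uploaded."""
    metadata = [chunk_id(c) for c in file_chunks]
    missing = [cid for cid in metadata if cid not in stored_chunks]
    for c in file_chunks:
        cid = chunk_id(c)
        if cid in missing:
            stored_chunks[cid] = c   # upload only the new chunks
    return metadata, missing
```

Uploading the same chunk twice therefore costs bandwidth only once: a second upload of an already-stored chunk produces an empty `missing` list.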
On the retrieval side, the user device receives unique chunks from multiple storage nodes in parallel for high retrieval speed.
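A rough sketch of that parallel retrieval, with a thread pool standing in for concurrent network reads; the node layout and function names are assumptions for illustration, not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical storage nodes, each holding a subset of chunks by ID.
nodes = [
    {"c1": b"hello ", "c3": b"!"},
    {"c2": b"world"},
]

def fetch(cid):
    # Stand-in for a network read: pull a chunk from whichever node holds it.
    for node in nodes:
        if cid in node:
            return node[cid]
    raise KeyError(cid)

def retrieve(metadata):
    # Issue the chunk reads concurrently, then reassemble in file order
    # (pool.map preserves the order of its inputs).
    with ThreadPoolExecutor(max_workers=4) as pool:
        return b"".join(pool.map(fetch, metadata))

print(retrieve(["c1", "c2", "c3"]))  # b'hello world!'
```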
The researchers acknowledge that in a content de-duplication scenario, there's a trade-off to be
made in choosing the size of the data chunk itself.
If chunks are too big, there's less chance of a hit during the de-dupe process, while smaller chunks lead to less efficient random access patterns.
With the SEARS solution, chunks are between 1 KB and 8 KB (with an average of 4 KB), and 160-bit
SHA-1 hashing gives a fixed-size value as the chunk ID.
During file storage, file metadata is created containing the chunk IDs in the file, and an ID for the storage cluster holding each chunk.
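A metadata record along those lines might look like the following. The field names and the placement function are hypothetical; the article only says the metadata maps a file's chunk IDs to the clusters holding them.

```python
import hashlib

def build_metadata(filename, chunks, locate):
    """Sketch: record each chunk's SHA-1 ID alongside the ID of the
    storage cluster holding it. `locate` is a hypothetical placement
    function supplied by the system."""
    return {
        "file": filename,
        "chunks": [
            {"id": hashlib.sha1(c).hexdigest(), "cluster": locate(c)}
            for c in chunks
        ],
    }

meta = build_metadata(
    "report.pdf",
    [b"part1", b"part2"],
    locate=lambda c: "cluster-%d" % (c[0] % 4),  # toy placement rule
)
```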
So as to let system admins make their performance/efficiency choices, there are two binding schemes:
Chunk-level binding: for archival storage running in the background. Chunk-level binding is designed to maximise system-wide de-duplication, such that the storage space of all clusters is evenly consumed over time.
User-level binding: for applications focussing on performance, this concept binds users to particular clusters. This sacrifices system-wide de-duplication efficiency for fast file retrieval.
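The difference between the two policies comes down to the placement function, which can be sketched as follows. The cluster count and function names are illustrative assumptions, not SEARS's actual placement logic.

```python
NUM_CLUSTERS = 4

def place_chunk_level(chunk_id_hex: str) -> int:
    # Chunk-level binding: place by chunk ID, so identical chunks from
    # any user land in the same cluster (maximum system-wide dedup) and
    # space consumption spreads evenly across clusters.
    return int(chunk_id_hex, 16) % NUM_CLUSTERS

def place_user_level(user_id: int) -> int:
    # User-level binding: all of a user's chunks sit in one home
    # cluster, so a retrieval touches a single cluster, at the cost of
    # cross-user dedup opportunities.
    return user_id % NUM_CLUSTERS
```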
The new system was tested across ten Amazon EC2 instances. On ten machines in the eastern U.S., the researchers claimed to achieve 2.5-second retrieval of a 3 MB file, compared to 7 seconds from an ordinary Amazon EC2 setup.
Source: Bell Labs.