
Bell Labs scientists have put together a cloud storage system




August 10, 2015

Scientists from Bell Labs and Stony Brook University have built a cloud storage system they hope can serve as a reference design for future cloud implementations in the IT industry.

The system, called SEARS (Space Efficient and Reliable Storage), is described in a paper published on arXiv.

Overall, the researchers argue that what users expect from cloud storage is easy to articulate: reliability, interactive access, global coverage and good response times. Of course, that's easier said than done in the real world.

Added to that is the inevitable trade-off between space efficiency and computational cost. RAID solutions, for example, are space efficient but computationally demanding.

The Google File System (GFS), for its part, has lower computational requirements, but because it replicates files it needs more storage space, and there's not much that can be done to change that, at least for now.
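To put rough numbers on the space side of that trade-off, the sketch below compares the raw storage needed per byte of user data for n-way replication and for an erasure code with k data and m parity fragments. The parameters are illustrative assumptions: three copies is the GFS default, and a 4+2 layout (as in RAID 6) is chosen because, like three-way replication, it survives the loss of any two fragments.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data with n-way replication."""
    return float(copies)


def replication_tolerance(copies: int) -> int:
    """Node losses survivable with n-way replication: all but one copy."""
    return copies - 1


def erasure_overhead(k: int, m: int) -> float:
    """Raw bytes stored per user byte with k data + m parity fragments."""
    return (k + m) / k


def erasure_tolerance(k: int, m: int) -> int:
    """Losses survivable with an MDS code such as Reed-Solomon: any m fragments."""
    return m


if __name__ == "__main__":
    print("3-way replication:", replication_overhead(3), "x, tolerates", replication_tolerance(3))
    print("4+2 erasure code: ", erasure_overhead(4, 2), "x, tolerates", erasure_tolerance(4, 2))
```

Here the erasure code halves the raw storage (1.5x versus 3x), but reading or rebuilding data then means decoding across several fragments, which is the computational cost the article attributes to RAID.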

Hence the SEARS concept: a combination of deduplication techniques and erasure coding that can be configured for fast file access, for high storage efficiency, or for a compromise between the two.
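The article does not say which erasure code SEARS uses. As a minimal illustration of the idea only, the sketch below uses the simplest possible code, a single XOR parity fragment (as in RAID 5), which can rebuild any one lost data fragment. Production systems typically use Reed-Solomon-style codes that tolerate several simultaneous losses.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(fragments: list[bytes]) -> bytes:
    """Compute one parity fragment; all fragments must have the same length."""
    return reduce(xor_bytes, fragments)


def recover(survivors: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data fragment from the survivors and the parity."""
    return reduce(xor_bytes, survivors, parity)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = encode_parity(data)
    # Simulate losing the middle fragment, then rebuild it.
    print(recover([data[0], data[2]], parity))
```

Storing three data fragments plus one parity fragment costs 4/3 of the original size, versus 2x for keeping a second full copy, at the price of an XOR pass over the survivors during recovery.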

The way it works is this: on the upload side, the client splits the file into chunks and generates metadata, which it sends to the “switching node” (a server node assigned to that user).

The switching node first checks the file metadata to identify unique chunks, and only those not already in storage are uploaded.
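The upload path can be sketched as follows. This is a hypothetical, in-memory illustration: `SwitchingNode`, `missing()` and `put()` are invented names, a Python dict stands in for the storage clusters, and fixed 4 KB chunks are used for brevity even though SEARS uses variable-size chunks.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks, for brevity only


class SwitchingNode:
    """Hypothetical stand-in for the server node assigned to a user."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}  # chunk_id -> chunk contents

    def missing(self, chunk_ids: list[str]) -> list[str]:
        """Return the IDs in the client's metadata that are not yet stored."""
        return [cid for cid in dict.fromkeys(chunk_ids) if cid not in self.store]

    def put(self, chunk_id: str, data: bytes) -> None:
        self.store[chunk_id] = data


def upload(node: SwitchingNode, data: bytes) -> int:
    """Chunk and hash on the client, send metadata first, then only new chunks.

    Returns the number of chunks actually transferred.
    """
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    ids = [hashlib.sha1(c).hexdigest() for c in chunks]
    needed = set(node.missing(ids))  # metadata round trip to the switching node
    sent = 0
    for cid, chunk in zip(ids, chunks):
        if cid in needed:
            node.put(cid, chunk)
            needed.discard(cid)  # a chunk repeated within the file is sent once
            sent += 1
    return sent
```

For example, uploading `b"x" * 8192 + b"y" * 4096` to a fresh node transfers two chunks (the two identical `x` chunks collapse into one); a second upload of `b"x" * 4096 + b"z" * 4096` then transfers only the new `z` chunk.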

On the retrieval side, the user device receives unique chunks from multiple storage nodes for high performance.
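Fetching chunks from several storage nodes concurrently is what hides per-node latency on the download side. In this sketch the node names, chunk IDs and in-memory `NODES` table are hypothetical, and `fetch()` stands in for a network request. `ThreadPoolExecutor.map` returns results in input order, so the file can be reassembled directly.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical storage nodes, each mapping chunk_id -> chunk contents.
NODES: dict[str, dict[str, bytes]] = {
    "node-a": {"c1": b"Hello, ", "c3": b"storage"},
    "node-b": {"c2": b"cloud "},
}


def fetch(node_name: str, chunk_id: str) -> bytes:
    # A real client would issue a network request to the storage node here.
    return NODES[node_name][chunk_id]


def retrieve(layout: list[tuple[str, str]], workers: int = 8) -> bytes:
    """Fetch (chunk_id, node_name) pairs in parallel and join them in file order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda entry: fetch(entry[1], entry[0]), layout)
        return b"".join(parts)


if __name__ == "__main__":
    print(retrieve([("c1", "node-a"), ("c2", "node-b"), ("c3", "node-a")]))
```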

The researchers acknowledge that in a content de-duplication scenario, there's a trade-off to be made in choosing the size of the data chunk itself.

If chunks are too big, there's less chance of a match during deduplication, while smaller chunks lead to more random, less efficient access patterns.

In SEARS, chunks range from 1 KB to 8 KB (4 KB on average), and each chunk's 160-bit SHA-1 hash serves as its fixed-size chunk ID.
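The article does not say how SEARS picks chunk boundaries, but variable sizes between a minimum and a maximum, with a target average, are characteristic of content-defined chunking. The sketch below assumes that technique, using a simple gear-style rolling hash (the table seed and the 12-bit boundary test are illustrative choices), and then takes each chunk's SHA-1 digest, which is always 160 bits, as its ID.

```python
import hashlib
import random

MIN_SIZE, MAX_SIZE = 1024, 8192
BOUNDARY_BITS = 12  # past MIN_SIZE, a cut is taken with probability about 1/4096 per byte

# Fixed pseudo-random table mapping each byte value to a 64-bit number.
_rng = random.Random(2015)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries, bounded by MIN_SIZE and MAX_SIZE."""
    chunks: list[bytes] = []
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64  # depends only on the last 64 bytes
        size = i - start + 1
        at_boundary = (h >> (64 - BOUNDARY_BITS)) == 0  # top 12 bits all zero
        if (size >= MIN_SIZE and at_boundary) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # the final chunk may be shorter than MIN_SIZE
    return chunks


def chunk_id(chunk: bytes) -> str:
    """Fixed-size chunk ID: the 160-bit SHA-1 digest as 40 hex characters."""
    return hashlib.sha1(chunk).hexdigest()
```

Because boundaries depend on the bytes themselves rather than on file offsets, inserting data near the start of a file typically changes only the first few chunks, so the rest still deduplicate against earlier uploads.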

When a file is stored, SEARS creates file metadata listing the IDs of the file's chunks, together with the ID of the storage cluster that holds each chunk.
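A minimal model of that per-file metadata might look like this. The class and field names are hypothetical (the article says only that the metadata holds each chunk's ID and the ID of the cluster storing it), and `place` is a stand-in for whichever binding scheme assigns clusters.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class ChunkRef:
    chunk_id: str    # hex SHA-1 of the chunk contents
    cluster_id: int  # storage cluster holding this chunk


@dataclass
class FileMetadata:
    file_name: str
    chunks: list[ChunkRef] = field(default_factory=list)  # in file order

    def clusters(self) -> set[int]:
        """Clusters a client must contact to reassemble the file."""
        return {ref.cluster_id for ref in self.chunks}


def build_metadata(name: str, chunks: list[bytes],
                   place: Callable[[str], int]) -> FileMetadata:
    """Record each chunk's ID and the cluster chosen for it by `place`."""
    meta = FileMetadata(name)
    for chunk in chunks:
        cid = hashlib.sha1(chunk).hexdigest()
        meta.chunks.append(ChunkRef(cid, place(cid)))
    return meta
```

A repeated chunk appears twice in the list, in file order, but both entries point at the same stored copy.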

To let system administrators make their own performance/efficiency trade-off, SEARS offers two binding schemes:

  • Chunk-level binding: intended for archival storage running in the background. Chunk-level binding is designed to maximise system-wide deduplication, so that the storage space of all clusters is consumed evenly over time.

  • User-level binding: intended for performance-focused applications, this scheme binds users to particular clusters. It gives up some system-wide deduplication efficiency in exchange for fast file retrieval.

The researchers tested the system on ten Amazon EC2 instances in the eastern U.S. They report retrieving a 3 MB file in 2.5 seconds, compared with 7 seconds from an ordinary Amazon EC2 instance.
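The difference between the two binding schemes comes down to what the cluster choice is keyed on. In this hypothetical sketch, placing data by hashing onto a fixed number of clusters is an illustrative assumption, not SEARS's published algorithm: chunk-level binding keys on the chunk ID, user-level binding on the user ID.

```python
import hashlib
from typing import Callable

NUM_CLUSTERS = 4  # illustrative


def _bucket(key: str) -> int:
    """Map a key to a cluster with a stable hash (Python's hash() is salted per run)."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_CLUSTERS


def chunk_level(user_id: str, chunk_id: str) -> int:
    # Keyed on content: identical chunks from any user go to the same cluster,
    # so each is stored once system-wide and load spreads across clusters.
    return _bucket(chunk_id)


def user_level(user_id: str, chunk_id: str) -> int:
    # Keyed on the user: all of a user's chunks share one cluster (fast reads),
    # but a chunk that two users both upload may be stored in two clusters.
    return _bucket(user_id)


def stored_copies(scheme: Callable[[str, str], int],
                  uploads: list[tuple[str, str]]) -> int:
    """Count distinct (cluster, chunk) pairs, i.e. physical copies after dedup."""
    return len({(scheme(user, cid), cid) for user, cid in uploads})
```

With the uploads `("alice", "c1")`, `("bob", "c1")` and `("alice", "c2")`, chunk-level binding always stores two copies, while user-level binding stores three whenever Alice and Bob hash to different clusters.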

Source: Bell Labs.






© IT Direction. All rights reserved.