Network Working Group                                           R. Alimi
Internet-Draft                                           Yale University
Intended status: Informational                                     Z. Lu
Expires: June 20, 2010                                  Fudan University
                                                                 H. Song
                                                                  Huawei
                                                                 Y. Yang
                                                         Yale University
                                                       December 17, 2009


                 A Survey of In-network Storage Systems
                      draft-song-decade-survey-02

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on June 20, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Alimi, et al.             Expires June 20, 2010                 [Page 1]

Internet-Draft                DECADE Survey                December 2009

Abstract

   This document describes existing in-network storage systems and their
   applicability for DECADE.

Table of Contents

   1.  Introduction
   2.  Survey Overview
     2.1.  Terminology and Concepts
     2.2.  Historical Context
     2.3.  In-network Storage System Components
       2.3.1.  Data Access Interface
       2.3.2.  Data Management Operations
       2.3.3.  Data Search Capability
       2.3.4.  Access Control Authorization
       2.3.5.  Resource Control Interface
       2.3.6.  Discovery Mechanism
       2.3.7.  Storage Mode
   3.  P2P Cache
     3.1.  Transparent P2P Caches
       3.1.1.  Data Access Interface
       3.1.2.  Data Management Operations
       3.1.3.  Data Search Capability
       3.1.4.  Access Control Authorization
       3.1.5.  Resource Control Interface
       3.1.6.  Discovery Mechanism
       3.1.7.  Storage Mode
     3.2.  Non-transparent P2P Caches
       3.2.1.  Data Access Interface
       3.2.2.  Data Management Operations
       3.2.3.  Data Search Capability
       3.2.4.  Access Control Authorization
       3.2.5.  Resource Control Interface
       3.2.6.  Discovery Mechanism
       3.2.7.  Storage Mode
   4.  Web Cache
     4.1.  Data Access Interface
     4.2.  Data Management Operations
     4.3.  Data Search Capability
     4.4.  Access Control Authorization
     4.5.  Resource Control Interface
     4.6.  Discovery Mechanism
     4.7.  Storage Mode
   5.  CDN
     5.1.  Data Access Interface
     5.2.  Data Management Operations
     5.3.  Data Search Capability
     5.4.  Access Control Authorization
     5.5.  Resource Control Interface
     5.6.  Discovery Mechanism
     5.7.  Storage Mode
     5.8.  Comments
   6.  NFS
     6.1.  Data Access Interface
     6.2.  Data Management Operations
     6.3.  Data Search Capability
     6.4.  Access Control Authorization
     6.5.  Resource Control Interface
     6.6.  Discovery Mechanism
     6.7.  Storage Mode
     6.8.  Comments
   7.  Amazon S3
     7.1.  Data Access Interface
     7.2.  Data Management Operations
     7.3.  Data Search Capability
     7.4.  Access Control Authorization
     7.5.  Resource Control Interface
     7.6.  Discovery Mechanism
     7.7.  Storage Mode
   8.  OceanStore
     8.1.  Data Access Interface
     8.2.  Data Management Operations
     8.3.  Data Search Capability
     8.4.  Access Control Authorization
     8.5.  Resource Control Interface
     8.6.  Discovery Mechanism
     8.7.  Storage Mode
   9.  Cache-and-Forward Architecture
     9.1.  Data Access Interface
     9.2.  Data Management Operations
     9.3.  Data Search Capability
     9.4.  Access Control Authorization
     9.5.  Resource Control Interface
     9.6.  Discovery Mechanism
     9.7.  Storage Mode
   10. Network Traffic Redundancy Elimination
     10.1.  Data Access Interface
     10.2.  Data Management Operations
     10.3.  Data Search Capability
     10.4.  Access Control Authorization
     10.5.  Resource Control Interface
     10.6.  Discovery Mechanism
     10.7.  Storage Mode
   11. BranchCache
     11.1.  Data Access Interface
     11.2.  Data Management Operations
     11.3.  Data Search Capability
     11.4.  Access Control Authorization
     11.5.  Resource Control Interface
     11.6.  Discovery Mechanism
     11.7.  Storage Mode
   12. Conclusions
   13. Security Considerations
   14. IANA Considerations
   15. Acknowledgments
   16. References
     16.1.  Normative References
     16.2.  Informative References
   Authors' Addresses

1.  Introduction

   DECADE (DECoupled Application Data Enroute) is an architecture that
   provides applications with access to in-network storage.  A major
   motivation for DECADE is the substantial increase in capacity and
   reduction in cost offered by storage systems.  In particular, over
   the last decade, capacity of solid-state storage has increased
   100-fold, while cost dropped to $50/GB; capacity of magnetic storage
   devices has increased 100-fold, while cost dropped to $0.50/GB.
   High-capacity and low-cost in-network storage devices introduce
   substantial opportunities.

   One example of in-network storage is content caches supporting Web
   and P2P content.  Unlike existing content caches, whose control
   resides entirely with the owners of the caching devices, DECADE also
   allows applications to control access to their allocated in-network
   storage, as well as the resources consumed while accessing that
   storage (bandwidth, connections, storage space).  While designed in
   the context of P2P applications, DECADE may be useful to other
   applications as well.

   This document provides details on existing in-network storage
   solutions, and evaluates their suitability for DECADE.  We note that
   the techniques presented in this document are only representative of
   the research in this area.  Rather than trying to enumerate an
   exhaustive list, we have chosen some typical techniques that have led
   to derivative works.

2.  Survey Overview

2.1.  Terminology and Concepts

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   This document uses terms defined in
   [I-D.song-decade-problem-statement].

2.2.  Historical Context

   In-network storage has been used previously in numerous scenarios to
   reduce network traffic and enable more efficient content
   distribution.  Systems have been developed with particular use cases
   in mind.  Thus, this survey is not meant to point out shortcomings of
   existing solutions, but rather to indicate where certain capabilities
   required in DECADE are not provided by existing systems.

   In the early stage of Internet development, most Web content was
   stored at a central server and clients requested Web content from the
   central server.  In this architecture, the central server was
   required to provide a large amount of bandwidth.  Web browsing is
   still a primary activity on today's Internet.  As more and more users
   access Web content, a central server can become overloaded.  The use
   of web caches is one technique to reduce load on a central server.
   Web caches store frequently-requested content, and provide bandwidth
   for serving the content to clients.

   The ongoing growth of broadband technology in the worldwide market
   has been driven by the hunger of customers for new multimedia
   services as well as Web content.
   In particular, the use of audio and video streaming formats has
   become common for delivery of rich information to the public, both
   residential and business.  To meet this challenge of massive
   multimedia consumption, installing more Web caches alone will not be
   enough.  Moving content closer to the consumer results in greater
   network efficiency, improved QoS, and lower latency, while
   facilitating personalization of content through broadband content
   applications.  Among these edge technologies, the CDN is a
   representative technique.

   A Content Delivery Network (CDN) is based on a large-scale
   distributed network of servers located closer to the edges of the
   Internet for efficient delivery of digital content, including various
   forms of multimedia content.  Although a CDN is an effective means of
   information access and delivery, there are two barriers to making CDN
   a more common service: cost and replication integrity.  Deploying a
   CDN for publicly available content is expensive.  It requires
   administrative control over nodes with large storage capacity at
   geographically dispersed locations with adequate connectivity.  A CDN
   can be scalable, but due to this administrative and cost overhead, it
   is not rapidly deployable for the common user.

   The emergence and maturity of Peer to Peer (P2P) has allowed
   improvements to many network applications.  P2P allows the use of
   client resources, such as CPU, memory, storage, and bandwidth, for
   serving content.  This can reduce the amount of resources required by
   a content provider.  Multimedia content delivery using various
   peer-to-peer or peer-assisted frameworks has been shown to greatly
   reduce the dependence on CDNs and central content servers.  However,
   the popularity of P2P applications has resulted in increased traffic
   on ISP networks.

   DECADE aims to provide a standard protocol allowing P2P applications
   (including Content Providers) to make use of in-network storage to
   reduce the traffic burden on ISP networks, while enabling P2P
   applications to control access to content they have placed in
   in-network storage.

2.3.  In-network Storage System Components

   Before surveying individual technologies, we describe the basic
   components of in-network storage systems used to evaluate them in the
   context of DECADE.

   Note that the network protocol(s) used by a storage system are also
   an important part of the design.  We omit details of particular
   protocol choices in the current version of this document.

2.3.1.  Data Access Interface

   A set of operations is available to a user for accessing data in the
   in-network storage.  Solutions typically allow both read and write,
   though the mechanisms for doing so can differ drastically.

2.3.2.  Data Management Operations

   Storage systems may provide users the ability to manage stored
   content.  For example, operations such as delete and move can be
   provided to users.  In this survey, we focus on data management
   operations that are provided to client users and omit those provided
   to system administrators.

2.3.3.  Data Search Capability

   Some storage systems may provide the capability to search or
   enumerate content that has been stored.  In this survey, we focus on
   search capabilities that are provided to client users and omit those
   provided to system administrators.

2.3.4.  Access Control Authorization

   A user is able to authorize individual users to retrieve the content
   stored in its in-network storage.  In-network storage can check the
   authorization of a client before it stores or retrieves content.
   In-network storage permits only authorized users to access the
   corresponding content.  Admission could be based on user, content,
   time period, etc.

2.3.5.  Resource Control Interface

   This is the interface through which users manage the resources on
   in-network storage that can be used by other peers, e.g., the
   bandwidth or connections.
   The storage system may also allow users to indicate a time for which
   resources are granted.

2.3.6.  Discovery Mechanism

   Users use the discovery mechanism to find the location of in-network
   storage, as well as its access interface, resource control interface,
   or other interfaces.

2.3.7.  Storage Mode

   The data managed by the in-network storage could be of various types.
   Example storage modes are file-based, object-based, or block-based.

3.  P2P Cache

   Caching of P2P traffic is a useful approach to reduce P2P network
   traffic, because objects in P2P systems are mostly immutable and the
   traffic is highly repetitive.  In addition, P2P caches do not require
   changes to P2P protocols and can be deployed transparently to
   clients.

   P2P caches operate similarly to web caches, in that they temporarily
   store frequently-requested content.  Requests for content already
   stored in the cache can be served from local storage instead of
   requiring the data to be transmitted over expensive network links.

   Two types of P2P caches exist: non-transparent P2P caches and
   transparent P2P caches.  A non-transparent cache appears as a super
   peer; it explicitly peers with other P2P clients.  For a transparent
   cache, once the P2P cache is established, the network transparently
   redirects P2P traffic to the cache, which either serves the file
   directly or passes the request on to a remote P2P user and
   simultaneously caches that data.  Transparency is typically
   implemented using deep packet inspection (DPI).  DPI products
   identify P2P packets and pass them to the P2P caching system so it
   can cache the traffic and accelerate it.

   To enable operation with existing P2P software, P2P caches directly
   support P2P application protocols.  A large number of P2P protocols
   are used by P2P software, and hence are supported by caches, leading
   to higher complexity.
   Additionally, these protocols evolve over time, and new protocols are
   introduced.

3.1.  Transparent P2P Caches

3.1.1.  Data Access Interface

   The Data Access Interface allows P2P content to be cached (stored)
   and supplied (retrieved) locally such that network traffic is
   reduced.  The cache is transparent to P2P users, who implicitly use
   the data access interface (in the form of their native P2P
   application protocol) to store or retrieve content.

3.1.2.  Data Management Operations

   Not provided.

3.1.3.  Data Search Capability

   Not provided.

3.1.4.  Access Control Authorization

   Not provided.

3.1.5.  Resource Control Interface

   Not provided.

3.1.6.  Discovery Mechanism

   Because Deep Packet Inspection is used, no discovery mechanism is
   provided to P2P users; the cache is transparent to them.  Since DPI
   is used to recognize the private protocols of P2P applications, P2P
   caches become more and more complicated as P2P applications keep
   evolving.

3.1.7.  Storage Mode

   Object-based.  Chunks (typically, the unit of transfer amongst P2P
   clients) of content are stored in the cache.

3.2.  Non-transparent P2P Caches

3.2.1.  Data Access Interface

   The Data Access Interface allows P2P content to be cached (stored)
   and supplied (retrieved) locally such that network traffic is
   reduced.  P2P users implicitly store and retrieve from the cache
   using the P2P application's native protocol.

3.2.2.  Data Management Operations

   Not provided.

3.2.3.  Data Search Capability

   Not provided.

3.2.4.  Access Control Authorization

   Not provided.

3.2.5.  Resource Control Interface

   Not provided.

3.2.6.  Discovery Mechanism

   The cache joins the P2P overlay network as if it were a normal peer.
   Other P2P users can find these cache nodes through the overlay
   routing mechanism and treat them as normal neighbor nodes.

3.2.7.  Storage Mode

   Object-based.
   Chunks (typically, the unit of transfer amongst P2P clients) of
   content are stored in the cache.

4.  Web Cache

   Web caching has been a well-established technology since the late
   1990s, and has been widely deployed by many ISPs to reduce bandwidth
   consumption and web access latency.  A web cache can cache web
   documents (e.g., HTML pages, images) between server and client to
   reduce bandwidth usage, server load, and perceived lag.  A web cache
   server is typically shared by many clients, and stores copies of
   documents passing through it; subsequent requests may be satisfied
   from the cache if certain conditions are met.  Another form of cache
   is a client-side cache, typically implemented in web browsers.  A
   client-side cache can keep a local copy of all pages recently
   displayed by the browser, and when the user returns to one of these
   web pages, the locally cached copy is reused.

   A related protocol that allows P2P applications to use web caches is
   HPTP (HTTP based Peer to Peer).  It proposes to share chunks of P2P
   files/streams using the HTTP protocol with cache-control headers.

4.1.  Data Access Interface

   Users explicitly read from a web cache by making requests, but they
   cannot explicitly write data into it.  Data is implicitly stored into
   the web cache by requesting content that is not already cached and
   that meets the policy restrictions of the cache provider.

4.2.  Data Management Operations

   Not provided.

4.3.  Data Search Capability

   Not provided.

4.4.  Access Control Authorization

   Not provided.

4.5.  Resource Control Interface

   Not provided.

4.6.  Discovery Mechanism

   Web caches can be transparently deployed between Web servers and Web
   clients, employing DPI for discovery.  Alternatively, web caches
   could be explicitly discovered by clients using techniques such as
   DNS or manual configuration.

4.7.  Storage Mode

   Object-based.  Web content is keyed within the cache by HTTP Request
   fields, such as Method, URI, and Headers.

5.  CDN

   Pathan et al. introduced the main idea and function of Content
   Delivery Networks (CDN) [PR07].  CDNs provide services that improve
   network performance by maximizing bandwidth, improving accessibility
   and maintaining correctness through content replication.  They offer
   fast and reliable applications and services by distributing content
   to cache or edge servers located close to users.

   A CDN has some combination of content-delivery, request-routing,
   distribution and accounting infrastructure.  The content-delivery
   infrastructure consists of a set of edge servers (also called
   surrogates) that deliver copies of content to end-users.  The
   request-routing infrastructure is responsible for directing client
   requests to appropriate edge servers.  It also interacts with the
   distribution infrastructure to keep an up-to-date view of the content
   stored in the CDN caches.  The distribution infrastructure moves
   content from the origin server to the CDN edge servers and ensures
   consistency of content in the caches.  The accounting infrastructure
   maintains logs of client accesses and records the usage of the CDN
   servers.  This information is used for traffic reporting and
   usage-based billing.  In practice, CDNs typically host static content
   including images, video, media clips, advertisements, and other
   embedded objects for dynamic Web content.

   A focus for CDNs is the ability to publish and deliver content to
   end-users in a reliable and timely manner.  A CDN focuses on building
   its network infrastructure to provide the following services and
   functionalities: storage and management of content; distribution of
   content among surrogates; cache management; delivery of static,
   dynamic and streaming content; backup and disaster recovery
   solutions; and monitoring, performance measurement and reporting.

   Examples of existing CDNs are Akamai, Limelight, and CloudFront.
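
   As an illustration of the request-routing role described above, the
   following minimal Python sketch selects a surrogate for a client
   request.  It is not taken from any CDN implementation: the names
   Surrogate, RequestRouter, and route, the notion of a client
   "region", and the fallback to the origin server on a miss are all
   assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Surrogate:
    """Hypothetical edge server (surrogate) and the content IDs it holds."""
    name: str
    region: str
    content: set = field(default_factory=set)


class RequestRouter:
    """Toy request-routing component with a view of surrogate contents."""

    def __init__(self, origin, surrogates):
        self.origin = origin                # name of the origin server
        self.surrogates = list(surrogates)  # view of the CDN caches

    def route(self, client_region, content_id):
        """Return the name of the server that should serve the request."""
        # Surrogates that already hold a copy of the requested content.
        holders = [s for s in self.surrogates if content_id in s.content]
        # Prefer a holder in the client's own region (assumed to be closest).
        for surrogate in holders:
            if surrogate.region == client_region:
                return surrogate.name
        # Otherwise use any holder; if none exists, fall back to the origin.
        # (Simplification: a real CDN may have an edge server pull the
        # content from the origin instead.)
        return holders[0].name if holders else self.origin
```

   In this sketch the router's list of surrogates stands in for the
   up-to-date view of cached content that the request-routing
   infrastructure obtains from the distribution infrastructure.
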

   The following description uses the term Content Provider to refer to
   the entity purchasing CDN service, and the term Client to refer to
   the subscriber requesting content via the CDN from the Content
   Provider.

5.1.  Data Access Interface

   A CDN is typically an internal closed system: it provides only a read
   (retrieve) access interface to clients and does not provide a write
   (store) access interface to them.  A content provider can access
   network edge servers and store content on them, or edge servers can
   retrieve content from the content provider, but client nodes can only
   retrieve content from edge servers.

5.2.  Data Management Operations

   The Content Provider can manage the data distributed across different
   cache nodes, for example by moving a popular object from one cache
   node to another or deleting a rarely accessed object from a cache
   node, but client nodes have no right to perform these operations.

5.3.  Data Search Capability

   The Content Provider can search or enumerate the data each cache node
   holds, but client nodes have no right to perform these operations.

5.4.  Access Control Authorization

   Content Providers typically cannot control per-client access to
   content accessed via a CDN.

5.5.  Resource Control Interface

   Not provided.

5.6.  Discovery Mechanism

   Content providers can directly find internal CDN cache nodes to store
   content, since they typically have an explicit business relationship.
   Clients can locate CDN nodes through DNS or another redirection
   mechanism.

5.7.  Storage Mode

   Mostly file-based.  In most cases, CDN cache nodes cache the entire
   file from the content provider; sometimes they cache only parts of
   it, such as a file prefix or file suffix.

5.8.  Comments

6.  NFS

   The Network File System is designed to allow users to access files
   over a network in a manner similar to how local storage is accessed.
   NFS is typically used in local area network or enterprise settings,
   though changes made in later versions of NFS make it easier to
   operate over the Internet.

6.1.  Data Access Interface

   Traditional file-system operations such as read, write, and update
   (overwrite) are provided.

6.2.  Data Management Operations

   Traditional file-system operations such as move and delete are
   provided.

6.3.  Data Search Capability

   The user has the ability to list the contents of directories to find
   filenames matching desired criteria.

6.4.  Access Control Authorization

   Files and directories can be protected using read, write, and execute
   permissions for the file's owner, group, and the public (others).
   Extended ACLs can provide additional protections to explicitly allow
   access to a subset of users and groups.  Per-user access control is
   only provided to users with accounts at the storage server.

6.5.  Resource Control Interface

   Disk space quotas can be configured, but they typically limit only
   the total amount of storage allocated to a particular user.  User
   control of bandwidth and connections used by remote peers is not
   provided.

6.6.  Discovery Mechanism

   Manual configuration is typically used.  Clients address NFS servers
   by providing a hostname and a directory that should be mounted.

6.7.  Storage Mode

   File-based storage, allowing files to be organized into directories.

6.8.  Comments

   The efficiency and scalability of the NFS access control method is a
   concern in the context of DECADE.  A user owning storage may be
   required to explicitly reconfigure permissions for files and
   directories often (e.g., for each object transferred to each peer),
   resulting in additional overhead for both the user and the storage
   server.

7.  Amazon S3

   Amazon S3 [AmazonS3] provides an online storage service.  Users
   create buckets, and each bucket can contain stored objects.  Users
   are provided an interface through which they can manage their
   buckets.
Amazon S3 is popular backend storage for other services. Another related storage service is the Blob Service provided by Windows Azure [Azure]. Alimi, et al. Expires June 20, 2010 [Page 14] Internet-Draft DECADE Survey December 2009 7.1. Data Access Interface Users can read, and write objects. 7.2. Data Management Operations Users can delete previously-stored objects. 7.3. Data Search Capability Users can list contents of buckets to find objects matching desired criteria. 7.4. Access Control Authorization Access to stored objects can be restricted by owner, a list of other Amazon Web Service users, all Amazon Web Service Users, or open to all users (anonymous access). Another option is for the owner to generate and sign a query (e.g., a query to read an object) that can be used by any user until an owner-defined expiration time. 7.5. Resource Control Interface Not provided. 7.6. Discovery Mechanism Users are provided a well-known DNS name (either a default provided by Amazon, or one customized by a particular user). Users accessing S3 storage use DNS to discover an IP address where S3 requests can be sent. 7.7. Storage Mode Object-based, with the extension that objects can be organized into user-defined buckets. 8. OceanStore OceanStore is a storage platform developed at UC Berkeley that provides globally-distributed storage. OceanStore implements a model where multiple storage providers can pool resources together. Thus, a major focus is on resiliency and self-organization and self- maintenance. The protocol is resilient to some storage nodes being compromised by utilizing Byzantine agreement and erasure codes to store data at Alimi, et al. Expires June 20, 2010 [Page 15] Internet-Draft DECADE Survey December 2009 primary replicas. 8.1. Data Access Interface Users may read and write objects 8.2. Data Management Operations Objects may be replaced by newer versions, and multiple versions of an object may be maintained. 8.3. Data Search Capability Not provided. 8.4. 
Access Control Authorization

   Provided, but specifics are unclear from the published paper.

8.5.  Resource Control Interface

   Not provided.

8.6.  Discovery Mechanism

   Users require an entry point into the system in the form of one
   storage node that is part of OceanStore.

8.7.  Storage Mode

   Object-based, though interfaces have been provided for NFS and HTTP.

9.  Cache-and-Forward Architecture

   Cache-and-Forward (CNF) [PRDW08] is an architecture for content
   delivery services in the future Internet.  In this architecture,
   storage can be exploited at nodes within the network, either
   directly at routers or deployed near routers.  CNF is based on the
   concept of store-and-forward routers with large storage, providing
   for opportunistic delivery to occasionally disconnected mobile users
   and for in-network caching of content.  The proposed CNF protocol
   uses reliable hop-by-hop transfer of large data files between CNF
   routers in place of an end-to-end transport protocol like TCP.

9.1.  Data Access Interface

   Users implicitly store content at Cache-and-Forward routers by
   requesting files.  End hosts read content from in-network storage by
   submitting queries for content.

9.2.  Data Management Operations

   Not provided.

9.3.  Data Search Capability

   Not provided.

9.4.  Access Control Authorization

   Not provided.

9.5.  Resource Control Interface

   Not provided.

9.6.  Discovery Mechanism

   A query including a location-independent content ID is sent to the
   network, and routed to a Cache-and-Forward router, which handles
   retrieval of the data and forwarding to the end host.

9.7.  Storage Mode

   Object-based (with objects representing individual files).  The
   architecture proposes to cache large files at storage within the
   network, though files could be made to represent smaller chunks of
   larger files.

10.  Network Traffic Redundancy Elimination

   Another form of in-network storage is Redundancy Elimination (RE),
   which identifies and removes repeated content from network
   transfers.  This technique has been proposed to improve network
   performance in many types of networks, such as ISP backbones and
   enterprise access links.  One example redundancy elimination
   proposal is SmartRE [AVA09], proposed by Anand et al., which focuses
   on network-wide redundancy elimination.

   In packet-level redundancy elimination, forwarding elements are
   equipped with additional storage which can be used to cache data
   from forwarded packets.  Upstream routers may replace packet data
   with a fingerprint that tells a downstream router how to decode and
   reconstruct the packet based on cached data.

10.1.  Data Access Interface

   Redundancy elimination is typically transparent to the user.
   Writing into the storage is done by transferring data that has not
   already been cached.  Storage is read when users transmit data
   identical to previously transmitted data.

10.2.  Data Management Operations

   Not provided.

10.3.  Data Search Capability

   Not provided.

10.4.  Access Control Authorization

   Not provided.  However, note that the content provider still retains
   control over which peers receive the requested data.  The returned
   data is simply "compressed" as it is transferred within the network.

10.5.  Resource Control Interface

   Not provided.  The content provider still retains control over the
   rate at which packets are sent to a peer.  The packet size within
   the network may be reduced.

10.6.  Discovery Mechanism

   No discovery mechanism is necessary.  Routers can use redundancy
   elimination without the users' knowledge.

10.7.  Storage Mode

   Object-based, with "objects" being data from packets transmitted
   within the network.

11.  BranchCache

   BranchCache [BranchCache] is a feature integrated into Windows
   (Windows 7 and Windows Server 2008 R2) that aims to optimize
   enterprise branch office file access over WAN links.  The main goals
   are to reduce WAN link utilization and improve application
   responsiveness by caching and sharing content within a branch while
   still maintaining end-to-end security.  BranchCache allows files
   retrieved from web servers and file servers located in headquarters
   or datacenters to be cached in remote branch offices, and shared
   among users in the same branch accessing the same content.

   BranchCache operates transparently by instrumenting the HTTP and SMB
   components of the networking stack.  It provides two modes of
   operation: Distributed Cache and Hosted Cache.  In both modes, a
   client always contacts a BranchCache-enabled content server first to
   get the content identifiers for local search.  If the content is
   cached locally, the client then retrieves the content within the
   branch.  Otherwise, the client will go back to the original content
   server to request the content.  The two modes differ in how the
   content is shared.

   In the Hosted Cache mode, a locally provisioned server acts as a
   cache for files retrieved from the servers.  After getting the
   content identifiers, the client first consults the cache for the
   desired file.  If it is not present in the cache, the client
   retrieves it from the content server and sends it to the cache for
   storage.

   In the Distributed Cache mode, a client first queries other clients
   in the same network using the Web Services Discovery multicast
   protocol.  As in the Hosted Cache mode, the client retrieves the
   file from the content server if it is not available locally.  After
   retrieving the file (either from another client or the content
   server), the client stores the file locally.

   The original content server always authorizes requests from clients.
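
   The lookup order of the two modes described above can be sketched
   roughly as follows.  This is an illustrative model only, not the
   BranchCache implementation: the class and function names, the
   "id:" identifier scheme, and the in-memory dictionaries are all
   assumptions made for the example, and encryption, authorization,
   and the multicast discovery step are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class ContentServer:
    """Origin server (hypothetical model); hands out content identifiers."""
    files: Dict[str, bytes]

    def get_content_id(self, name: str) -> str:
        # Placeholder identifier scheme, assumed for this sketch only.
        return "id:" + name

    def fetch(self, name: str) -> bytes:
        return self.files[name]


@dataclass
class Cache:
    """A Hosted Cache server, or the local cache held by one client."""
    store: Dict[str, bytes] = field(default_factory=dict)

    def lookup(self, content_id: str) -> Optional[bytes]:
        return self.store.get(content_id)


def retrieve(name: str,
             server: ContentServer,
             own_cache: Cache,
             hosted_cache: Optional[Cache] = None,
             peers: Sequence[Cache] = ()) -> Tuple[bytes, str]:
    """Return (data, source), where source is "branch" or "server"."""
    # Both modes: the client contacts the content server first.
    content_id = server.get_content_id(name)

    # Hosted Cache mode consults the one provisioned cache; Distributed
    # Cache mode asks the other clients (a plain list stands in for
    # multicast discovery here).
    candidates = [hosted_cache] if hosted_cache is not None else list(peers)

    for cache in candidates:
        data = cache.lookup(content_id)
        if data is not None:
            source = "branch"
            break
    else:
        # Nothing found within the branch: go back to the content server.
        data = server.fetch(name)
        source = "server"
        if hosted_cache is not None:
            # Hosted Cache mode: the client sends the file to the cache.
            hosted_cache.store[content_id] = data

    if hosted_cache is None:
        # Distributed Cache mode: the client keeps the file locally.
        own_cache.store[content_id] = data
    return data, source
```

   In this sketch, the first client to request a file is served by the
   content server and later clients in the same branch are served from
   the Hosted Cache or from a peer, which matches the behavior the
   text attributes to each mode.
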
   Cached content is encrypted, and clients can only decrypt the data
   using keys derived from metadata returned by the content server.

   In addition to instrumenting the networking stack at clients, content
   servers must also support BranchCache.

11.1.  Data Access Interface

   Clients transparently retrieve (read) data from a cache (other
   clients or a Hosted Cache) since BranchCache operates by
   instrumenting the networking stack.  In Hosted Cache mode, clients
   write data to the Hosted Cache once it is retrieved from the content
   server.

11.2.  Data Management Operations

   Not provided.

11.3.  Data Search Capability

   Not provided.

11.4.  Access Control Authorization

   Transferred content is encrypted, and can only be decrypted by keys
   derived from data received from the original content server.  Though
   data may be transferred to unauthorized clients, end-to-end security
   is maintained by only allowing authorized clients to decrypt the
   data.

11.5.  Resource Control Interface

   The storage capacity of the caches on clients and of Hosted Caches
   is configurable by system administrators.  The Hosted Cache further
   allows configuration of the maximum number of simultaneous client
   accesses.

   In the Distributed Cache mode, exponential back-off and throttling
   mechanisms are utilized to prevent reply storms for popular content
   requests.  The client will also spread data block access among
   multiple serving clients that have the content (complete or partial)
   to improve latency and provide some load balancing.

11.6.  Discovery Mechanism

   The Distributed Cache mode uses multicast for discovery of other
   clients and content within a local network.  Currently, the Hosted
   Cache mode uses policy provisioning or manual configuration of the
   server used as the Hosted Cache.

11.7.  Storage Mode

   Object-based.

12.  Conclusions

   Though there have been many successful in-network storage systems,
   they have been designed for use cases different from those defined
   in DECADE.  As a result, their functionality and feature sets do not
   meet the requirements defined for DECADE.

   DECADE aims to provide a standard protocol for P2P applications and
   content providers to access and control in-network storage,
   resulting in increased network efficiency while retaining control
   over content shared with peers.  Additionally, defining a standard
   protocol can reduce the complexity of in-network storage since
   multiple P2P application protocols no longer need to be implemented
   by in-network storage systems.

13.  Security Considerations

   This draft is a survey of existing in-network storage systems, and
   does not introduce any security considerations beyond those of the
   surveyed systems.  For more information on security considerations
   of DECADE, see [I-D.song-decade-problem-statement].

14.  IANA Considerations

   This document does not have any IANA Considerations.

15.  Acknowledgments

   The authors would like to thank Yu-Shun Wang and Ning Zong for
   comments and contributions to this document.

16.  References

16.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

16.2.  Informative References

   [I-D.song-decade-problem-statement]
              Yongchao, S., Zong, N., Yang, Y., and R. Alimi,
              "DECoupled Application Data Enroute (DECADE) Problem
              Statement", draft-song-decade-problem-statement-00 (work
              in progress), October 2009.

   [I-D.gu-decade-reqs]
              Yingjie, G., Yongchao, S., Yang, Y., and R. Alimi,
              "DECoupled Application Data Enroute (DECADE)
              Requirements", draft-gu-decade-reqs-01 (work in
              progress), October 2009.

   [HYAL08]   H. Xie, Y. R. Yang, A. Krishnamurthy, Y. Liu, and A.
              Silberschatz, "P4P: Provider Portal for Applications",
              In ACM SIGCOMM 2008.
   [MCM08]    M. Hefeeda, C. Hsu, and K. Mokhtarian, "pCache: A Proxy
              Cache for Peer-to-Peer Traffic", In ACM SIGCOMM'08
              Technical Demonstration.

   [JZL08]    Jie Wu, ZhiHui Lu, BiSheng Liu, et al., "PeerCDN: A Novel
              P2P Network Assisted Streaming Content Delivery Network
              Scheme", In 8th IEEE International Conference on Computer
              and Information Technology (CIT2008).

   [GYZ07]    G. Shen, Y. Wang, Y. Xiong, B.Y. Zhao, Z.-L. Zhang,
              "HPTP: Relieving the tension between ISPs and P2P", In
              6th International workshop on Peer-To-Peer Systems
              (IPTPS2007).

   [JCL09]    Jiajun Wang, Cheng Huang, Jin Li, "On ISP-friendly rate
              allocation for peer-assisted VoD", In ACM Multimedia
              2008.

   [GH09]     Geoff Huston, Telstra, "Web Caching", In The Internet
              Protocol Journal Volume 2, No. 3.

   [McGraw02] Scott Hull et al., "Content Delivery Networks: Web
              Switching for Security, Availability, and Speed".

   [PR07]     Pathan, A.K., Buyya, R., "A Taxonomy and Survey of
              Content Delivery Networks", In Grid Computing and
              Distributed Systems Laboratory in University of
              Melbourne, Technology Report, Feb. 2007.

   [AmazonS3] Amazon, "Amazon Simple Storage Service (Amazon S3)",
              http://aws.amazon.com/s3/.

   [Azure]    Microsoft Corporation, "Windows Azure Blob - Programming
              Blob Storage",
              http://go.microsoft.com/fwlink/?LinkId=153400.

   [OceanStore]
              S. Rhea, P. Eaton, D. Geels, H. Weatherspoon, B. Zhao,
              and J. Kubiatowicz, "Pond: the OceanStore Prototype", In
              FAST 2003.

   [AVA09]    A. Anand, V. Sekar, A. Akella, "SmartRE: An Architecture
              for Coordinated Network-wide Redundancy Elimination", In
              SIGCOMM 2009.

   [PRDW08]   S. Paul, R. Yates, D. Raychaudhuri, J. Kurose, "The
              Cache-and-Forward Network Architecture for Efficient
              Mobile Content Delivery Services in the Future Internet",
              In Innovations in NGN: Future Network and Services, 2008.

   [BranchCache]
              Microsoft Corporation, "BranchCache",
              http://technet.microsoft.com/en-us/network/dd425028.aspx.

Authors' Addresses

   Richard Alimi
   Yale University

   Email: richard.alimi@yale.edu


   ZhiHui Lu
   Fudan University

   Email: lzh@fudan.edu.cn


   Song Haibin
   Huawei
   Baixia Road No. 91
   Nanjing, Jiangsu Province  210001
   P.R.China

   Phone: +86-25-84565867
   Email: melodysong@huawei.com


   Yang Richard Yang
   Yale University

   Email: yry@cs.yale.edu