FludProtocol

Overview

The flŭd protocol is a simple RESTful protocol, with 8 primitive message types. The protocol's simplicity features prominently in the design of the system.

HTTP Transport

flŭd uses HTTP as the transport for all communications. There are several benefits to this choice:

  • HTTP is stateless. Stateless is good. Maintaining a lot of state about servers/clients is almost universally considered problematic when designing distributed systems, or at least is something to avoid whenever possible. Stateful protocols do not scale well (server-side state == server-side resources) and are fragile (cleaning up state when connections break, or simply hang unresponsively, is complex and error-prone). HTTP helps enforce statelessness and discourages the use of complex, long-standing open connections, which encourages simpler designs. Every flŭd message consists of a single request/response pair, with no additional messages. HTTP keepalive is available for efficiently exchanging multiple stateless messages between nodes.
  • HTTP has the advantage of being able to operate transparently through firewalls, proxies, and NAT'ed networks (from the client side). With little effort, flŭd can leverage these attributes.
  • Unlike RPC (and its brethren), HTTP doesn't attempt to hide network communications from the programmer. HTTP admits, loudly, that communicating with a remote host is intrinsically different from calling a local procedure. This is truth.
  • Security. HTTP has acquired many mechanisms for access control and authentication. flŭd can simply use them. There is no need to re-invent these by using some other transport that lacks them.
  • Most languages have well-tested libraries for communicating via HTTP.
  • Most messages in flŭd (and in p2p networks in general) are short-lived. Peers communicate with one another all the time, but not always with the same small subset of peers. A system that must maintain millions of concurrent connections cannot scale to millions of nodes. HTTP messages are short-lived, so no such connections are needed.
  • [Others]
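
The request/response pattern and keepalive reuse described above can be sketched with Python's standard library. This is only an illustration under stated assumptions: `StubPeer`, `NODE_ID`, and `ask_id_twice` are hypothetical names, the stub answers only `GET /ID` (the GetID primitive), and returning the ID as plain text is a guess, since the response format is not given here.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

NODE_ID = "a3f1c9"  # hypothetical node ID, for illustration only


class StubPeer(BaseHTTPRequestHandler):
    """Stand-in peer that answers GET /ID and nothing else."""

    protocol_version = "HTTP/1.1"  # allows persistent (keepalive) connections

    def do_GET(self):
        if self.path == "/ID":
            body = NODE_ID.encode("ascii")
            self.send_response(200)
        else:
            body = b"not found"
            self.send_response(404)
        self.send_header("Content-Type", "text/plain")  # assumed format
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def ask_id_twice():
    """Send two independent request/response pairs over one connection."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubPeer)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        replies = []
        for _ in range(2):
            # Each request is self-contained; the peer keeps no state between them.
            conn.request("GET", "/ID")
            resp = conn.getresponse()
            replies.append((resp.status, resp.read().decode("ascii")))
        conn.close()
        return replies
    finally:
        server.shutdown()
        server.server_close()
```

Neither request depends on the other, so the second could just as well go out on a fresh connection; keepalive only saves the cost of reconnecting.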

Messages

There are 8 basic messages in the flŭd protocol. These are divided into two simple suites of messages: one to deal with storage of data blocks, and one to deal with storage of metadata describing those data blocks. The reason for this bifurcation is made clear in the architectural description of data storage.

Data Storage Primitives

The data storage primitives are used to store, retrieve, verify, and delete file data to/from peers.

  • GetID: GET /ID - The peer returns its unique node ID.
  • Verify: GET /hash/[filekey] - The peer hashes length bytes of the data stored under filekey, starting at position offset, and returns the result.
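
As a rough sketch of what Verify computes, the following hashes a byte range of a stored block and lets a requester that still holds the original data compare results. The hash function is not named here, so SHA-256 is an assumption; how offset and length are sent with the request is not shown either, so they appear as plain function arguments. `verify_hash` and `peer_still_holds` are hypothetical names.

```python
import hashlib


def verify_hash(stored: bytes, offset: int, length: int) -> str:
    """Peer side: hash `length` bytes of the stored data starting at `offset`.

    SHA-256 is an assumption; the protocol description does not name the hash.
    """
    if offset < 0 or length < 0 or offset + length > len(stored):
        raise ValueError("requested range falls outside the stored data")
    return hashlib.sha256(stored[offset:offset + length]).hexdigest()


def peer_still_holds(original: bytes, offset: int, length: int, reply: str) -> bool:
    """Requester side: recompute the same range locally and compare with the reply."""
    return verify_hash(original, offset, length) == reply
```

Because the requester chooses the range, a peer cannot answer correctly from a single precomputed digest of the whole block.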

Metadata Storage Primitives

The metadata storage primitives are routed via an overlay network modeled after the Kademlia protocol. The /ID() data storage operation (above) is used as a PING() operator.

  • FindNode: GET /nodes/[val] - Peer returns the node(s) with nodeID closest to val (primitive for Kademlia's findnode op).
  • Get: GET /meta/[key] - Peer returns the value stored under key, if present (primitive for Kademlia's get op).
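
To make "closest" concrete: Kademlia measures the distance between two IDs as their bitwise XOR, interpreted as an integer. Assuming flŭd follows Kademlia on this point, a peer's answer to FindNode could be sketched as below. The small integer IDs, the value of k, and the name `closest_nodes` are illustrative assumptions.

```python
def closest_nodes(known_ids, val, k=3):
    """Return up to k node IDs from this peer's own view, ordered by XOR distance to val.

    Assumes Kademlia's XOR metric; node IDs are plain integers here for brevity.
    """
    return sorted(known_ids, key=lambda node_id: node_id ^ val)[:k]


# A peer that knows four nodes answers a query for val = 0b0101:
known = [0b0001, 0b0100, 0b0111, 0b1000]
nearest = closest_nodes(known, 0b0101, k=2)
```

Here the XOR distances to `0b0101` are 4, 1, 2, and 13 respectively, so `nearest` holds `0b0100` followed by `0b0111`.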

Since Kademlia operations are parallel and iterative, nodes will invoke the above operations via a global interface:

    • kFINDNODE(val) - Returns the node(s) with nodeIDs closest to val. This invokes multiple GET /nodes/[val] to peers. This is the client 'FIND' operation.
    • kPUT(key, val) - Stores val under key. Storage occurs on k nodes with nodeIDs closest to key. This is the client 'PUT' operation.
    • kGET(key) - Returns the value stored under key, if present. This invokes multiple GET /meta/[key] to peers. This is the client 'GET' operation.

The top-level operations (the HTTP primitives) are the only operations the server side must implement. The client side implements the second-level primitives (the k-prefixed operations), each of which may invoke multiple top-level primitives. For example, GET /nodes/[val] returns the k closest nodes to val that a single peer knows of. kFINDNODE(val), on the other hand, calls GET /nodes/[val] multiple times, across different peers, to build a global list of the k closest nodes to val.
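
The difference between the per-peer FindNode request and the client-side kFINDNODE can be sketched with an in-memory stand-in for the network. This is a simplified, sequential version of the iterative lookup (real Kademlia lookups issue requests in parallel), and it assumes the XOR distance metric. The `network` dictionary, the integer IDs, and both function names are hypothetical.

```python
def find_node(network, peer, val, k):
    """Stand-in for one FindNode request: the k closest nodes that `peer` knows of."""
    return sorted(network[peer], key=lambda n: n ^ val)[:k]


def k_find_node(network, start, val, k=2):
    """Client-side lookup: keep querying the closest known nodes until
    every node among the current k closest has already been asked."""
    shortlist = {start}  # every node ID learned so far
    queried = set()      # nodes that have already been sent a request
    while True:
        candidates = sorted(shortlist, key=lambda n: n ^ val)[:k]
        pending = [n for n in candidates if n not in queried]
        if not pending:
            return candidates  # no closer node left to ask
        for peer in pending:
            queried.add(peer)
            shortlist.update(find_node(network, peer, val, k))


# Each key is a node ID; each value lists the nodes that node knows about.
network = {
    8: [12, 1],
    12: [8, 7],
    1: [8, 4],
    7: [12, 4],
    4: [1, 7],
}
result = k_find_node(network, start=8, val=5, k=2)
```

Starting from node 8, which knows only nodes 12 and 1, the lookup learns about nodes 4 and 7 from its neighbours and ends with those two, which no single peer queried at the start could have reported.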
