Architecture

From flud

Contents

[edit] Node Identity

Each node posesses a public/private key pair, which is initially generated locally. The node's public key is hashed and the result is the nodeID. This provides some protection against spoofing/hijacking attacks; nodes can verify the identities of other nodes through a 2-step check: 1) verify that the node's public key hashes to its nodeID and 2) issue a challenge/response by encrypting some random data with the node's public key and verifying that the node can return the data in the clear (by decrypting with its private key).

Each node also posesses a groupID. This is to allow for formation of private Flud networks. The groupID is held as a secret among group members (may want to look into alternate schemes, such as Shamir secret sharing to provide thresholds such as "at least N group members are required to share the secret xxx"). We call this secret groupID the private groupID, or groupIDr. Upon joining a group, a new node must prove that it knows the private groupID, but should not expose it directly (to prevent attacks from snooping such info, or by an attacker phishing for such information directly). For this reason, the private groupID is never sent to other nodes. Instead, each node will hash the concatenation of its own public key and the private groupID. This value, which we will call the public group ID, or groupIDu, is publicly exposed by each node. Other nodes that posess the same private groupID can verify that the node knows the private groupID by performing the same hash operation with a node's public key and the private groupID, but will be unable to do the reverse -- discover the private groupID from the groupIDu. (For practical reasons, out of the box the private groupID will contain a default value, which will denote a distinct Flud network that we will refer to as the public Flud network. The private groupID for the public Flud network is necessarily not a secret. But the user may elect to change their groupID to form a private Flud network).

[edit] Recovering Node Identity

If the user loses their node's public/private key pair, they will be unable to recover any data from the network. To help the user keep a hold of their key pair, a very simple archival and recovery mechanism is provided: The user chooses a memorable password/passphrase during initialization of their flud node. The public/private keypair and private groupID are encrypted with this passphrase, and the result is sent to the user's email account for archival (or could even be posted to some public web server via webdav, ftp'd somewhere, blogged, etc). The user is also encouraged to print out a hardcopy. During a system recovery operation, the user can provide the encrypted data and passphrase, and restore their credentials. Alternate strategies are also possible.

[edit] Security

All data in flud is symmetrically encrypted with a 256-bit AES key. The encryption operation currently uses CBC mode to avert birthday attacks. The key is generated from the sha256 hash of the data being encrypted, so that we obtain convergent encryption.

All communications use a challenge/response mechanism to ensure that nodes are who they say they are. The challenge/response is essentially a nonce with identity verification. When a request comes in, the receiver generates a random nonce before interacting. The nonce is concatenated with the receiver's nodeID. Challenges are encrypted with the other party's public key, and responses must contain the correct decrypt. If they do, operations can proceed. flud nodeIDs are simliar to SFS's self-certifying IDs -- nodeIDs are the sha256 hash of the public key, so nodes must be who they say they are. The inclusion of nodeID in the challenge prevents one node from "Anne-Louise"ing (see the Grandmaster Chess problem) the challenges. In flud, Anne-Louise can play 'between' Spasky and Fisher, but only if she tells Spasky that she /is/ Fisher and Fisher that she /is/ Spasky. She cannot modify the moves, because doing so will lead to her immediate detection. As long as she passes the moves along unmodified, she's just a proxy.

[edit] Data Storage

All backup data in the flud backup network is stored to other cooperating nodes. Metadata is decoupled from backup data. This results in a storage system with two layers: the storage layer and the metadata layer (similar to PeerStore). This division is natural; data and metadata have different storage capacity requirements, different update frequency requirements, and different availability requirements. We use a distributed hash table (DHT) modeled after the Kademlia DHT for the metadata layer. The DHT allows us to utilize the nice qualities of convergent storage, aggresive maintanence of metadata, and efficient retrieval. DHTs are also typicaly well-suited to storage of comparatively small amounts of data, such as metadata. The data storage layer for a decentralized backup system, on the other hand, requires constraints such as fairness and symmetrical resource consumption. By allowing nodes to choose their own storage trading peers, we provide the flexibility to implement symmetrical storage and to fully utilize trust metrics in making such choices.

[edit] Storage Layer

When a node has data that it wants stored, it contacts other nodes in the network with trading offers. The offers list the amount of data the offerrer wishes to store, as well as the amount of data the node is willing to trade. In the default case, these amounts are equal, and nodes are willing to engage in such trades equally. In times of scarcity or low demand on the offered node, nodes may reject symmetrical offers or charge higher prices for resources, as in Grothoff's Excess-based Economic Model. By allowing storage resources to act as a type of currency, flŭd creates an economic incentive for fairness and symmetry.

If an offered node does not currently have data to trade, but wishes to enter into the trade for future resources, it can store a claim on the offering node which can later be replaced by valid data (see Samsara). As in Samsara, these claims are verifiable and unforgeable. Unlike Samsara, claims in flŭd cannot be forwarded or traded to other nodes.

It should be noted that the default strategy outlined above is one of many possible strategies. Each node may implement a strategy that it hopes will optimize its own storage relationships and opportunities, and these strategies may evolve over time. Indeed, it is hoped that others will implement flŭd clients with diverse trading strategies and attempt to more closely approximate the free market.

In order to prevent discarding of data after agreeing to a trade, nodes are expected to periodically audit their own store operations through use of verify operations. Verify operations compute the hash of a random offset and length into a data block that has been stored, then query the storing node for the hash given the offset and length. If the node can correctly respond to the challenge, it still stores the data. The auditing node can always perform these checks, since flŭd nodes keep local copies of all data stored -- this is a backup service, after all, not a generic online storage service. Data which was previously stored but is no longer kept (because it is now out of date) is not verified; storing nodes may discard data which isn't accessed (via a store, verify, or retrieve operation) for some predetermined amount of time.

Data retrieval requests are honored by respondents by returning the stored data.

All storage layer operations are immune to imposter attacks through the assymetrical encryption primitives that make up a node identity. Nodes can verify that requests/response originate at claimed IDs by using signatures, sending challenge/response pairs with public keys, or by setting up secure channels using these primitives.

Incorrect or discarded responses and lack of availability decrease trust between nodes. Nodes prefer trading with partners that are highly available and operate correctly. Detecting low availability and incorrect responses lowers the local trust score at the requestor. If this score becomes low enough, the requestor will cease trading with the requestee, and begin seeking other partners on which to store data. Correctly responding to requests likewise increases trust. All trust is localized -- no reputations or rumors are transferred among nodes. This is feasible in flŭd because the number of trading partners is generally small, on the order of 2^8 nodes or less. Nodes whose trust scores decrease beyond a certain threshold can be moved into a blacklist of greater length capacity.

All stored blocks contain reference lists, with all owners of the stored data accounted for. These reference lists serve to allow nodes to do internal accounting, reconciling the amount of data claimed to be stored for others with the amount of data actually stored. Reference lists also allow nodes to blindly return all data belonging to a particular node when asked (which is useful if the requester has lost metadata but wants to still recover files), as well as safe deletion of data that is stored by more than one owner.

Each file block also contains an erasure coded chunk of the originator's native filesystem metadata, encrypted with the originator's public key. This data is tied to the reference list on blocks of data mentioned above. If multiple nodes store the same file, their opaque filesystem metadata is appended to the record.

[edit] Metadata Layer

The metadata layer stores flud metadata (locations of blocks which make up the erasure coded file). The metadata layer is implemented with a Kademlia-style DHT.

In order to prevent abuse of the DHT storage mechanism, flud metadata is stored transparently. A node may statistically audit this metadata, to ensure that it really does point to valid blocks of stored data (and purge it if some percentage of these operations indicate that it does not).

A master-metadata record is also created, which contains the storage keys of all metadata stored to the metadata layer. This record is created periodically (at the end of a bulk backup operation, for instance) and stored in the DHT as a special record under the ID of the originating node.

One of the great advantages of using the metadata layer in this way is that it allows convergent storage, (often called "single instance storage") to be provided on the granularity of files. If many users back up the same files, these can be stored with great efficiency.

In order to deal with potential problems with a DHT, nodes maintains two caches:

  • unreachable: as per Non-transitivity paper, nodes must maintain a cache of other nodes which are locally unreachable
  • disreputable: to deal with nodes which cheat at the DHT layer, nodes maintain a cache of locally-known disreputable nodes, and will not add these to its routing table or store data to these nodes.

In order to thwart Sybil attacks at the DHT layer, three mechanisms are employed:

  • Kademlia's preference for long-lived (and old) nodes prevents sybil nodes from 'taking over' routing
  • Nodes must compute a hash collision roughly every two weeks. The lower bits of the collision must contain a timestamp. This injects a cost into sybil style attacks.
  • Nodes can use social networks ala SybilGuard

Despite these protections, the DHT layer is mainly a convenience. All metadata and the mastermetadata is also periodically combined (after a bulk backup operation, for instance) and then also stored to a trusted peer in the storage layer as a regular file.

Normally, a restore operation uses the DHT layer to first find the mastermetadata, then recover files by accessing their metadata and storage data. But even if the DHT layer becomes non-functional (due to a massive Sybil attack by well-healed adversary or other currently unknown weakness), nodes may recover data by querying other nodes for data that belongs to them. This could be a time-consuming operation, as each queried node must examine its records and return data blocks, but is guaranteed to return all data. Metadata and master metadata that has been periodically stored in the storage layer of trusted nodes can likewise be recovered to reassemble all stored data. Note, however, that this method is one of last resort; it is not expected that an adversary will be able to compromise the DHT.

[edit] Software Components

Several separate components make up a flud node:

[edit] FludNode Service

The FludNode Service is a long-lived service, running whenever the node is up. It is run as a daemon or service. The FludNode does the bulk of the work: it stores data to the flud network, receives incoming data from the network, purges old data from its local disk caches, responds to requests from the FludClient. Most of the functionality mentioned in this architectural document resides in the FludNode service.

On Unix systems, one FludNode is run per user, with the rights of that user. Regular users can backup and restore their personal files, and can backup other system files for which they have read access (but generally lack sufficient rights to locally restore such system files). Superusers may backup and restore any file on the filesystem.

On Windows systems, it may make more sense to run a single FludNode instance.

[edit] FludClient

The FludClient is an interface for the user to the FludNode. FludNode provides a local network interface for interacting with clients. The interface is only accessible locally, and clients must authenticate to FludNode by correctly responding to a challenge based on knowledge of the node's private key.

The Local Client Protocol provides commands for storing and retrieving files, metadata, and master metadata (as well as lower-level interfaces to DHT and Storage layer primitives). A local client an interact with the FludNode through this interface.

Presently, FludLocalClient.py provides an example of interfacing with the FludNode via a command-line style interface. Work has also begun on a graphical user interface.

The FludClient is also able to give meaningful status information to the user by examining log files and through messages sent back from FludNode.

The FludClient is transient; it is run and terminated at the whim of the user. This is in contrast to FludNode's continuous execution.

FludClient can produce data files describing backup operations instead of executing those operations directly. This is useful when used in conjunction with the FludScheduler.

Upon initial run, FludClient will try to choose a reasonable set of files to backup by default. The description of which files to include/exclude in this set is found in fludfile.init (please update this to add new rules).

[edit] FludScheduler

Since FludClient is not expected to be running all the time, a third mechanism exists for performing backup operations on a scheduled basis. On Unix systems, scheduling can occur through cron. On other platforms, there may exist similar mechanisms. Where feasible, FludScheduler will be implemented using those mechanisms. It is also possible that FludScheduler is written as another custom daemon/service, which is long-lived and takes care of performing backup operations on a scheduled basis.

[edit] Versioning

Traditional backup systems that provide versioning support allow the user to retrieve the current version of a file, or any of N previous versions that were stored during previous backup operations. Since it was rather trivial to simply keep old versions on the central backup server, this wasn't much of an engineering problem (at worst, disk fills up quickly).

With a collaborative backup system, such a scheme is less practical. If fully enabled, it can consume many times the storage space of a simple single-snapshot system. If fairness is to be enforced, the number of resources consumed by a node must be proportional to those that are provided. "Server-side" versioning, even when using a clever delta-compression technique such as that outlined in ABS, disrupts the symmetry of equal resource trades.

There is good news, however. We can use a single-snapshot system to provide versioning by requiring all versioning to occur locally. That is, the node's own hard drive can be used to maintain multiple versions of files, and then all these versions can be backed up as a single-snapshot to the flud network. Think of a local SVN repository (with many versions contained therein, stored as base and delta files) that is set to be backed up; the backup system doesn't have to worry about versioning -- it just backs up the current data. The local SVN repository is in charge of worrying about versions. The user gets her versions, and the system gets to maintain its symmetrical fairness enforcement mechanism.

The advantages of this scheme are several-fold:

  1. simplicity
  2. preservation of resource trade symmetry
  3. storage consumption minimization, and user opt-in
  4. decoupling of versioning layer from backup layer

Of the these, #1 is very appealing from an implementation standpoint. We just back up the current view. This also greatly simplifies the verification mechanism -- the verifier will always have the complete file from which to do challenge/response queries to the verifiee. We don't have to worry about keeping deltas or partial checksums or anything like that in order to do verification; we just pick a block of bytes at random from within the file, and make sure that the storer can return us the hash of those bytes. #1 also means that we don't have to do anything complicated to figure out what the delta of a delta-compressed version should be (i.e., we don't need to request the old version from the storage system, compare it with our version, then send out a delta) -- in fact, with this scheme we wipe out delta compression altogether, at least from the viewpoint of the storage mechanism (some other local mechanism is welcome to use delta compression to store versions locally, but this mechanism won't need to download lots of data in order to do so, because it will all be local).

Preservation of resource trade symmetry (#2) is very important. Without enforcement mechanisms, the storage layer becomes susceptible to many types of leech attacks.

Consumption minimization (#3) is nice. It means that if the user isn't interested in versioning, they don't have to do it. This means that we eliminate a lot of overhead that we would have had if every user was storing versions, even if they didn't need or want them. It also means that there is an automatic cost for enabling versions for the user's local storage resources. This isn't to imply that we want to punish the user for enabling versions, but adding local disk is cheap. Versioning does become necessary quite quickly for things such as email clients that store all mail in a particular folder as one large file, or other applications that use databases in single files -- we don't want the user to have to send the whole file (which can become quite large) every time they get a new email or change the db slightly. The good news is that we can still provide this in its own layer, using some of the techniques described in ABS to create local versions.

Decoupling (#4) is seldom a bad idea, especially when it can be done cleanly. The user (or flud client developer) is free to implement whatever local versioning scheme they want. They can make copies of files and store them in other directories, they could use a version control system such as CVS, they could do their own delta compression and store the deltas in a special directory. They could store as many or as few versions as they want. They can take version snapshots often or seldom, and this is independent of how often they perform backup. And such schemes can be switched out, upgraded, or removed on the fly without anyone really noticing. The storage layer never has to know about any of this, and consequently, never has to be modified to work with changes at the versioning layer.

[edit] Protocol

see FludProtocol

[edit] Trust System

See LocalizedTrust

[edit] Detecting Cheaters

Malicious or misbehaving nodes might want to discard data stored to them from other nodes early for selfish or other reasons. Likewise, they may wish to consume resources on other nodes (bandwidth, disk, cpu) disproportionate to their contributions. And they may wish to masquerade as other nodes in order to accomplish some of the above. Finally, purely malicious nodes might wish to simply disrupt the network with no benefit to themselves. All of these must be guarded against.

[edit] Protections for the Client Side

Nodes rely on the fact that when they store data on other nodes, it will be there when they go to retrieve it. The period of time for which the data will persist is contractual; each layer must store data for a known period of time K (say, 10 days). Upon any access to the data (RETRIEVE or VERIFY), the clock starts over. If all nodes conform to this contract, nodes know when their data might expire and can refresh accordingly. However, malicious or misbehaving nodes might delete data early. To prevent this, we have to provide a disincentive for cheating through the Trust System.

  • The DHT (metadata) layer can be checked by occasionally storing dummy data to a node, and then checking back on it only as the period K is about to expire. If the data is not still there, we decrement trust or reciprocally discard data (or both). This type of check can be done randomly by picking a key value at random, or more directed by targetting the range of a particular nodeID. {Need to reconcile with Server Side verification of DHT metadata}
  • The storage layer likewise can store a dummy block, which is only checked as the period K is about to expire. Normal VERIFY ops prevent the node from discarding real blocks before their time.

[edit] Protections for the Server Side

Each node stores data for other nodes. Those other nodes might wish to take advantage of the willingness to store data, by overusing resources for example. To protect against this, each node implements several protection measures:

  • Identity challenge/response - ensures that nodes can't masquerade as others to consume resources under a stolen identity
  • Storage Layer Symmetry - at the storage layer, consumption must be symmetric with production. If node A wants to store 10MB of data to node B, node A must likewise provide 10MB of space to node B for storage.
  • Metadata Layer Verifiability - the DHT metadata for storage layer blocks is stored transparently, so a node may walk the metadata that is stored to it locally, verifying that some percentage of the blocks do exist in the locations described. If they do not, the node is allowed to delete the metadata. This helps prevent clients from abusing the DHT layer to store data other than valid file metadata. The format of stored data is also checked carefully to make sure that it conforms to metadata format.

[edit] Identity Protections

[edit] Byzantine Faults

Personal tools