Open Issues

From flud

This page serves as a scratch space for managing open issues.

  • Preserving file metadata: file metadata is currently stored in the DHT. This means that data expiration times need to be long, that refreshes must be fairly frequent (to keep data fresh and to deal with changing topology), and that replication must be set high (so that data survives). The DHT is also vulnerable to some types of attacks (see http://www3.ietf.org/proceedings/06mar/slides/plenaryt-2.pdf) that are hard to defend against. If file metadata is lost, the data itself is effectively lost as well. There are several options:
    • use trusted peering relationships to store copies of the metadata with trusted peers (outside of the DHT). The metadata can then be retrieved by querying nodes until it is found, or pointers to several of these trusted nodes can be saved in email.
    • put the file metadata in the STORE layer, and store in the DHT layer only pointers to its location. The DHT layer would then constrain itself to storing only mastermetadata. The mastermetadata could be very widely replicated, since it would be the only piece of data each node stores in the DHT. It would also mean that more of the data gets stored symmetrically with trading partners. On the other hand, this only moves the problem: the file metadata becomes impossible to find if the mastermetadata in the DHT layer is lost.
    • shore up the DHT layer. Use cryptographic expense (e.g., requiring nodes to find SHA-256 semi-collisions) to further deter Sybil attacks (even though Kademlia is very Sybil-resistant by default), etc. This may be the best bet, even though DHT security is still considered an open problem.
    • it might also serve us to use the 'email layer' to send out file metadata, as a backup.
    • Do nothing: rely on the DHT to perform as designed, tolerating a certain percentage of misbehaving nodes. Some things to think about here: Kademlia (like all popular DHTs) treats access to an entry as a refresh. This has the nice effect of preserving popular data perpetually. In the case of backup, only the source refreshes its data (except in the case of popular SIS files), so entries will go stale if a source node fails and doesn't recover before its data expires. This means that data must be relatively long-lived (on the order of 10x days), or that alternative refresh mechanisms must be introduced.
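
The staleness concern in the last option can be illustrated with a small sketch. This is not flud's or Kademlia's actual storage code: the `ExpiringStore` class, its `ttl_seconds` parameter, and the explicit `now` argument are all hypothetical, chosen only to show how access-as-refresh keeps popular entries alive while a backup entry that nobody reads expires once its source stops refreshing it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    value: bytes
    expires_at: float  # absolute time (seconds) at which the entry goes stale


class ExpiringStore:
    """Hypothetical DHT-node store where every successful read refreshes the entry."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._entries: Dict[bytes, Entry] = {}

    def put(self, key: bytes, value: bytes, now: float) -> None:
        # A store (or re-store by the source node) resets the expiration.
        self._entries[key] = Entry(value, now + self.ttl)

    def get(self, key: bytes, now: float) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None or entry.expires_at <= now:
            # Expired entries are dropped and reported as missing.
            self._entries.pop(key, None)
            return None
        # Access counts as a refresh, so popular data never expires.
        entry.expires_at = now + self.ttl
        return entry.value


DAY = 24 * 60 * 60
store = ExpiringStore(ttl_seconds=10 * DAY)  # assumed TTL, for illustration only

# A popular entry is read every few days and survives indefinitely.
store.put(b"popular", b"shared file metadata", now=0)
popular_reads = [store.get(b"popular", now=d * DAY) for d in (5, 12, 20)]

# A backup entry is only touched by its source; if the source is down
# for longer than the TTL, the metadata is gone when it comes back.
store.put(b"backup", b"private file metadata", now=0)
backup_read = store.get(b"backup", now=11 * DAY)
```

Under these assumptions, every read of the popular entry succeeds, while the backup entry is already gone on day 11. This is why the TTL has to be long, or some other party besides the source has to refresh the entry.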

[The above solutions, in combination with Kademlia's ability to control every iteration of the routing algorithm and its preference for older nodes, address pretty much every concern listed in plenaryt-2.pdf above. CDG+02]
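
The "cryptographic expense" option can be sketched as a small proof-of-work puzzle. The exact puzzle is not specified on this page, so the following is only one possible reading of "SHA-256 semi-collision": a node must find a nonce such that SHA-256(public key + nonce) agrees with its node ID in the first `bits` bits. The function names, the 8-byte nonce encoding, and the choice of SHA-256(public key) as the node ID are all assumptions made for illustration.

```python
import hashlib
from itertools import count


def leading_bits_match(a: bytes, b: bytes, bits: int) -> bool:
    """Return True if a and b agree on their first `bits` bits."""
    if bits < 0 or bits > 8 * min(len(a), len(b)):
        raise ValueError("bits out of range for the given inputs")
    full, rem = divmod(bits, 8)
    if a[:full] != b[:full]:
        return False
    if rem == 0:
        return True
    mask = (0xFF << (8 - rem)) & 0xFF  # keep only the top `rem` bits
    return (a[full] & mask) == (b[full] & mask)


def node_id(public_key: bytes) -> bytes:
    # Assumption: the node ID is the SHA-256 digest of the public key.
    return hashlib.sha256(public_key).digest()


def puzzle_digest(public_key: bytes, nonce: int) -> bytes:
    return hashlib.sha256(public_key + nonce.to_bytes(8, "big")).digest()


def solve(public_key: bytes, bits: int) -> int:
    """Search for a nonce; expected cost is about 2**bits hash evaluations."""
    target = node_id(public_key)
    for nonce in count():
        if leading_bits_match(puzzle_digest(public_key, nonce), target, bits):
            return nonce


def verify(public_key: bytes, nonce: int, bits: int) -> bool:
    """Checking a claimed solution costs a single hash evaluation."""
    return leading_bits_match(puzzle_digest(public_key, nonce), node_id(public_key), bits)


key = b"example-public-key"   # placeholder, not a real key
nonce = solve(key, bits=12)   # small difficulty so the demo runs quickly
accepted = verify(key, nonce, bits=12)
```

The point of the asymmetry is that creating an identity is expensive while verifying one is cheap, so an attacker who wants many identities pays the search cost for each. A real deployment would need a much higher difficulty than the 12 bits used here.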