[flud-devel] DHT justifications (was: DHT Performance? Design?)
Alen Peacock
alenlpeacock at gmail.com
Wed Oct 31 23:38:42 PDT 2007
On Oct 31, 2007 12:32 AM, Bill Broadley <bill at broadley.org> wrote:
>
> Hrm, so what makes the metadata in the DHT a good idea?
This is driven mainly by the goal of doing convergent storage in an
internet-scale backup system, where another major goal is complete
decentralization. If these are goals (as they are in flud), I know of
no better mechanism than a DHT for allowing nodes to find data in a
content-addressable manner. I'll expand on this a bit below:
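
To make "content-addressable" concrete, here is a minimal sketch of a convergent store. It is an illustration only, not flud code: the `ContentStore` class and its method names are hypothetical, and it assumes the key is simply the sha256 of the stored bytes.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store (hypothetical, not flud's API)."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself (assumed: sha256).
        key = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same key, so storing the
        # same bytes a second time adds nothing new.
        self._blocks.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]

    def __len__(self) -> int:
        return len(self._blocks)
```

Two nodes that back up the same file compute the same key independently, which is what lets a node ask "is this already stored?" without knowing who stored it.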
> Storing potentially
> fairly large (on average one peer's metadata*12?) amount of data from folks you
> don't have a vested interest in (nor they you).
Ah, but rational nodes do have self-interest, and risk getting
blacklisted if they misbehave. If blacklisting proves insufficient,
we can add more layers of trust/reputation/rumors/epidemics for the
DHT layer.
> Disaster recovery can be done via more direct methods.
I admit that the method you described for allowing a node to
discover its trading peers through those nodes' verification ops is
very reasonable. But there is one other goal that drives flud design
which advises against it: attack resistance. I subscribe to the view
that one of p2p's greatest architectural strengths is that a
decentralized application, if successful, will spawn many
implementations, written by many different parties (a la gnutella,
bittorrent, eD2k, etc), and any one of these implementations, if
acting rationally, could/should seek maximum benefit for its
clients, even to the detriment of other nodes. Such actions are
sometimes called rational attacks
[http://www.flud.org/wiki/RelatedPapers#A_Taxonomy_of_Rational_Attacks]
and the best defense against them is to provide sufficient
punishments/rewards so that a rational player would never attempt
them. The goal is to arrive at something similar to a Nash
equilibrium.
For a node which has experienced a complete and catastrophic loss,
discovering its old trading peers via those peers' verification ops
seems vulnerable to the rational attack where a peer notices that one
of its trading peers has begun failing verify ops, infers that the
target node must have lost data, and then, realizing that the target
no longer has anything to offer it (since it has lost not only its own
data but the data of its partners), decides to purge the data that was
stored for it. Best of all for the rational attacker, the target has
no way to know how much data has been purged; it only knows that the
attacker was storing at least some data (because it tried some verify
ops). If the attacker does something smart like, say, only purging 90%
of that node's contracts, it may well appear that the attacker is
functioning perfectly and has not lost data.
Of course there are a number of countermeasures you can take that
don't involve using a DHT, but can you think of one that is simpler
and less brittle? Or more resilient? In flud, there is no way for a
storing node to discover the sK for the file that a stored block
belongs to, and thus no way for it to figure out which nodes in the
DHT are storing the block metadata. So even if it wanted to collude
to purge that data as in the attack outlined above, it would not be
able to figure out how to find the nodes that it needed to collude
with. If a node tries to purge data unilaterally, it will be
discovered.
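
As a rough sketch of why a unilateral purge is detectable: if a restoring node can recover the block metadata (block names and the nodeIDs holding them), it knows exactly what each peer should still be able to serve. The function and variable names below are hypothetical, and the in-memory `peers` dict stands in for real network requests.

```python
import hashlib


def block_name(data: bytes) -> str:
    # Blocks are assumed to be named by the sha256 of their content.
    return hashlib.sha256(data).hexdigest()


def find_missing(block_metadata, peers):
    """Return sorted (node_id, block_name) pairs that a peer failed to serve.

    block_metadata: dict mapping block name -> node_id expected to hold it.
    peers: dict mapping node_id -> {block name: bytes} the peer still serves.
    """
    missing = []
    for name, node_id in block_metadata.items():
        data = peers.get(node_id, {}).get(name)
        # A block counts as missing if the peer has nothing under that
        # name, or if what it returns does not hash to the expected name.
        if data is None or block_name(data) != name:
            missing.append((node_id, name))
    return sorted(missing)
```

A peer that purged 90% of a node's blocks would show up here with 90% of its expected entries missing, rather than appearing healthy.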
This might sound overly paranoid, but that's the fun; starting out
with the assumption that peers might be malicious or misbehave (by
running software not written by the creator) is for me one of the
hallmark differences between distributed systems and decentralized
ones.
> Not like the metadata does you much good if your
> peers reject your contracts/block stores.
Block storage operations in flud occur before the block metadata is
stored to the DHT.
> I just read that, and I see the reasons why the DHT might not be as vulnerable
> as one might think. But I believe the metadata is also in the storage blocks
> right?
Sorry, I think I've confused things with my references to
'metadata'. In flud, there are three types of metadata:
1) filesystem metadata (filename, perms, ownership, timestamps, EAs,
etc). This is erasure coded and stored along with blocks in the block
storage layer.
2) block metadata, which contains the names of blocks belonging to a
file (sha256) and where they are stored (nodeIDs). This is stored in
the DHT.
3) master metadata. This is the mapping of filekeys (sK) to filenames,
with backup times. It is occasionally stored as a regular file, with
a special DHT entry indicating its location.
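
The three types can be pictured as three small record shapes. This is only a sketch: the class and field names are made up for illustration and do not reflect flud's actual formats.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FilesystemMetadata:
    """Type 1: erasure coded and stored alongside the file's blocks."""
    filename: str
    perms: int
    owner: str
    mtime: float
    extended_attrs: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class BlockMetadata:
    """Type 2: stored in the DHT."""
    # block name (sha256 hex digest) -> nodeID of the node storing it
    locations: Dict[str, str]


@dataclass
class MasterMetadataEntry:
    """Type 3: one row of the master list, itself stored as a regular file."""
    file_key: str      # sK
    filename: str
    backup_time: float
```

Only the second shape lives in the DHT; the first travels with the blocks, and the third is reached through a DHT entry that points at its location.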
> > others here: http://flud.org/pipermail/flud-devel_flud.org/2006-September/000013.html
> > (short summary: if the DHT fails, nodes can still operate correctly
> > including restoring data, and in a source-routed DHT like kademlia,
> > simple blacklisting is sufficient for discarding misbehaving nodes).
> > Note that both these threads also address the issue of DHT abuse by
> > malicious nodes.
>
> So it doesn't sound like storing 12 times the metadata on each node is
> providing much benefit.
It provides a performance benefit, both for discovering whether a file
is already stored in the system and for finding files without doing
global queries to all nodes, or even to all nodes with which a peer
has entered into trading contracts (if such a list is available).
But you are right that since flud can operate correctly without it,
it is redundant.
> > The one flaw with convergent storage (aka SIS:
> > single-instance-store, aka CAS: content-addressable storage) is that
> > it can reveal that others have copies of well-known files. This is
>
> Less so if you ditch metadata in the DHT, only your contractual peers
> would be concerned. So each client would only store on average:
> * somewhere around log(n) records for DHT routing
> * somewhere around DHT_replication * public_key <-> IP mappings (to support
> dht lookups for peer -> IP).
> * contract info and reputation per storage peer (likely in the 20-100 range?)
> * metadata for local files (pathname, timestamp)
> * metadata for remote blocks (pathname, peer, and slice)
>
> In the normal backup case you access local metadata and sha256sum,
> encrypt, erasure code, and upload your churn.
>
> In disaster recovery you load your keys and wait, or maybe
> put_dht(public_key)=help I lost my disk
>
> Speaking of which a peer failing to inquire about your reputation should
> cause them to lose reputation.
I think this is a very reasonable approach, but I wonder what your
thoughts are re: malicious/misbehaving nodes during attempted restore?
> Right, as well if you wanted to watch activity on a single file you could join
> the DHT with the ID of the file you were interested in. Then any DHT
> lookups would go to you. Oh the replication, so you would need the nearest
> 12 IDs.
You can't choose your nodeID in flud: it is generated as the sha256
hash of your public key. And there is a hashcash component that
requires a modest amount of computation every time period (probably
bi-monthly), to discourage pre-compute attacks. So unless you have an
extraordinary amount of computational resources to do a very large
number of key-generation / hash / hash-collision searches, targeting
parts of the DHT ID space will be rather futile.
(the hash-collision search is not yet fully implemented in flud)
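
A minimal sketch of the idea follows. The nodeID derivation (sha256 of the public key) is as described above; the hashcash part is a generic hashcash-style proof of work, not flud's scheme, and the stamp format, `period` argument, and difficulty value are all assumptions made for illustration.

```python
import hashlib


def node_id(public_key: bytes) -> str:
    # nodeID is the sha256 hash of the public key, so it cannot be chosen.
    return hashlib.sha256(public_key).hexdigest()


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(node: str, period: int, nonce: int, difficulty_bits: int) -> bool:
    # Hypothetical stamp format: "nodeID:period:nonce".
    stamp = f"{node}:{period}:{nonce}".encode()
    digest = hashlib.sha256(stamp).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def mint(node: str, period: int, difficulty_bits: int) -> int:
    """Search for a nonce that satisfies the difficulty for this period."""
    nonce = 0
    while not verify(node, period, nonce, difficulty_bits):
        nonce += 1
    return nonce
```

Because the period is part of the stamp, work done for one period does not carry over to the next, which is the property that discourages precomputing a large pool of IDs in advance.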
> Well contracts should have a length, and someone that performs well
> should be rewarded with better terms on a contract. Actions speak much
> louder than words. So new peers don't get a huge chunk of disk, nor very
> long terms. But when it comes to renewing, those who performed well
> get bigger chunks and longer terms. So once things reach a steady state
> folks will have say 20 peers that would each wait say 2 weeks before throwing
> away 10% of their files a day.
>
> Maybe a suggested 256MB/256MB trade 72 hour contract?
>
> If a particular peer was perceived as abusing you before you might only offer
> a 0/256MB trade (I.e. you use their disk and you don't offer any space), and
> let them earn the trust for you to store with them on more even terms.
This all seems very sane.
> > It's kind of a hard parameter to
> > settle on, because it is a system global and can't really be
> > configured on a per-system basis without a lot of complexity. I'm
> > leaning towards something on the order of 10 days myself.
>
> Not sure I see the complexity, all contracts should be monitored during the
> terms of the contract. If I have a contract with 20 folks on internet 2 for
> 32GB/32GB for a month I don't see why folks on dsl with 256MB/256MB for 3 days
> would care.
>
> Hrm, maybe bandwidth should be mentioned in the contract. Kinda silly to
> offer 100GB for a week if your bandwidth can only manage 1GB over that time.
These are good insights.
> I think the only piece missing is the cost/benefit to
> putting the metadata again in the DHT.
The DHT storage layer in flud ends up taking up only a small
fraction of storage resources. Performance doesn't seem to be a
problem, especially given the ability to do DHT ops in parallel as per
kademlia's design, and to overlap them with other operations to the
block storage layer. But you have absolutely convinced me that I need
to do some serious performance evaluation of this before I can change
that to a statement of fact.
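
To illustrate what "in parallel" and "overlap" mean here, a small sketch using Python's asyncio. The `dht_get` and `store_block` coroutines are hypothetical stand-ins that only sleep to simulate network latency; nothing here is flud's networking code, and it says nothing about real-world timings.

```python
import asyncio


async def dht_get(key: str) -> str:
    """Stand-in for one DHT lookup; the sleep simulates network latency."""
    await asyncio.sleep(0.01)
    return f"block-metadata-for-{key}"


async def store_block(name: str) -> str:
    """Stand-in for one store op against the block storage layer."""
    await asyncio.sleep(0.01)
    return f"stored-{name}"


async def backup_step(file_keys, block_names):
    # Start all DHT lookups and block stores together. gather() waits for
    # all of them and returns results in the order they were passed in.
    lookups = [dht_get(k) for k in file_keys]
    stores = [store_block(b) for b in block_names]
    results = await asyncio.gather(*lookups, *stores)
    return results[:len(file_keys)], results[len(file_keys):]
```

For example, `asyncio.run(backup_step(["f1", "f2"], ["b1"]))` issues all three operations concurrently, so the total wait is governed by the slowest operation rather than by the sum of all of them.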
I'm no stranger to being wrong, and won't hesitate to drop the DHT
if it ends up being a problem and I can find another technique that
can replace it. That's the great thing about being able to call flud
"experimental" software :)
Alen