[flud-devel] private flud network and flud goals
Alen Peacock
alenlpeacock at gmail.com
Wed Sep 5 13:49:12 PDT 2007
On 9/5/07, Stuart Langridge <sil at kryogenix.org> wrote:
>
> Yeah...sort of. Put it like this: at least in the early days, before
> flud takes over the world, you're going to have to ensure that (a)
> someone somewhere is running a STUN server (b) there is always at
> least one live node in the "global flud network". So, for all intents
> and purposes, you're going to be putting up a server which everything
> can fall back on: if you're a flud client then the "global flud
> network" might not consist of anything other than "flud.org", but you
> always know it'll be there. So, if that's the case, why not start
> building that sort of support into it from the beginning?
Heh, okay, sure. In the early days, I imagine that there will be a
handful of nodes that are almost all directly controlled by flud.org
or close friends of flud.org. But this is just a bootstrapping
problem. Once the flud network gets to a certain critical mass, there
is no longer a need to rely on the single-point-of-failure that would
be flud.org (even though I'd also imagine that flud.org lives on for
as long as the flud network does).
In other words, I believe that a loosely connected group of nodes
controlled by *independent* entities will always be more reliable than
a group of nodes controlled by a single entity, and I think history
agrees with that (see
http://flud.org/blog/2007/04/26/eradicating-service-outages-once-and-for-all/).
I'm averse to introducing any single point of failure into the
system. I'd like for it to survive and work even if flud.org falls
into the hands of the Russian mafia.
> Is this gwebcache as in the gnutella thing?
Yes, something modeled after gnutella's gwebcache. Similar
node-list caches are used by many other p2p filesharing systems.
> This makes sense. It's not quite as neat as "type this group name and
> password in", and it means that there have to be specific *invites* to
> the group, because the mystic block of text must be generated by a
> flud client rather than being something intelligible, but it's
> basically the same principle. It fails the telephone test, but that's
> not a *huge* issue.
Good point. I'm sure there are some ways to make this pass the
telephone test too, and a centralized server for that is probably the
simplest.
> It would, but is basically presupposes the existence of the global
> flud network. This is why I'm inclined to suggest that there's at
> least one server which the flud project run which can always be
> assumed to be there, so that every flud client *can* assume that the
> global flud network exists (even if it's just one box).
This is also a very good point, and I think it definitely wouldn't
hurt to add a step after trying #2 gwebcache, which would be:
#3 try connecting to a list of well-known super-reliable nodes (such
as those run by flud.org or others)
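To make the ordering concrete, here is a rough sketch of that fallback chain in Python. None of these function names come from the flud code; they are placeholders, and step #1 is stubbed out since it is not spelled out in this message. The only idea shown is "try each source in order, stop at the first one that yields nodes."

```python
from typing import Callable, Iterable, List

# Hypothetical type: a bootstrap source returns a list of "host:port"
# strings, or an empty list / OSError if it has nothing to offer.
BootstrapSource = Callable[[], List[str]]


def find_bootstrap_nodes(sources: Iterable[BootstrapSource]) -> List[str]:
    """Try each source in order and return the first non-empty node list."""
    for source in sources:
        try:
            nodes = source()
        except OSError:
            # Source unreachable (network error, timeout); try the next one.
            continue
        if nodes:
            return nodes
    return []


# Illustrative stand-ins for the real lookups (all names are assumptions).
def earlier_step() -> List[str]:
    return []  # whatever step #1 is; here it finds nothing


def gwebcache_lookup() -> List[str]:
    raise OSError("gwebcache unreachable")  # step #2 fails


def well_known_nodes() -> List[str]:
    return ["node1.example.org:8080"]  # step #3: hard-coded fallback list


nodes = find_bootstrap_nodes([earlier_step, gwebcache_lookup, well_known_nodes])
print(nodes)
```

Because the well-known list is just the last entry in the chain, it only matters when everything before it comes up empty, which keeps flud.org from becoming a dependency once the network is large.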
> What does "symmetrically" mean? To take a more realistic example,
> imagine that Alice, Bob, and Dr Evil all want to back up 2GB worth of
> stuff. Does Alice have to allocate 4GB of space on her machine for
> others to back up into? If Charlie now joins the network, does
> everyone have to allocate another 2GB?
When Alice wants to store 1MB of data to Bob, she must first make
the offer to Bob. Bob can accept or decline. If Bob accepts, he
accepts by countering with another offer, which should be ~1MB of data
to be stored at Alice (all this happens without user intervention).
Bob's data can be from some files that he needs stored, or could be
Samsara-style unforgeable claims that represent future storage space
(http://flud.org/wiki/Architecture#Storage_Layer). Either way, both
Alice and Bob must prove to each other that they are storing each
other's data, and the amount of space traded must be roughly
symmetrical. If they cannot prove this, they are free to discard the
other's data (probabilistically, to account for transient outages).
Space is consumed on-the-fly as it is stored to the network, except
in the case of claims (which are indistinguishable from regular data
to the node where they are stored).
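As a sketch of the two checks described above (Python; the 10% tolerance, the per-failure discard probability, and both function names are assumptions made up for illustration, not values or APIs from flud):

```python
import random


def is_roughly_symmetric(offered: int, countered: int, tolerance: float = 0.1) -> bool:
    """True if the counter-offer size is within `tolerance` of the offer size.

    Sizes are in bytes. The 10% default is an arbitrary illustrative value.
    """
    if offered <= 0:
        return False
    return abs(offered - countered) <= tolerance * offered


def should_discard(failed_proofs: int, per_failure: float = 0.2, rng=random.random) -> bool:
    """Decide probabilistically whether to drop a partner's data.

    The probability grows with the number of consecutive failed storage
    proofs, so a single transient outage rarely costs a partner its data.
    The 0.2 step is an assumption chosen only for illustration.
    """
    probability = min(1.0, failed_proofs * per_failure)
    return rng() < probability


# Alice offers ~1MB; Bob counters with a similar amount (files or claims).
print(is_roughly_symmetric(1_000_000, 950_000))
print(is_roughly_symmetric(1_000_000, 500_000))
```

Since claims look like ordinary data to the node holding them, the same size check applies whether Bob counters with real files or with claims.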
> I pretty much agree with you, except that the global flud network has
> to be relatively big for this, because people will drop off it *all
> the time*. There are also issues like: each flud server must have a
> globally-unique and unchanging nodeID so it can be found again later.
> If I reinstall my machine and reinstall flud, do I get the same nodeID
> back again? If so, how? If not...then reinstalling flud means removing
> yourself as a node, which means that to all intents and purposes
> anyone else's backups which were stored on your node will have
> vanished.
nodeIDs are sha256-derived from the private/public RSA key pair that
is initially generated randomly for the node, and so it is nearly
impossible for collisions to occur when created (unless there are
serious flaws in the RSA key generation algorithm).
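A minimal illustration of why the nodeID comes back after a reinstall (Python; this assumes the ID is simply the SHA-256 of the serialized public key, which is a simplification, and the function name and placeholder key are made up for the example):

```python
import hashlib


def node_id_from_public_key(public_key_bytes: bytes) -> str:
    """Derive a hex nodeID by hashing the serialized public key.

    Assumption: the exact bytes fed to the hash in flud may differ; the
    point is only that the ID is a deterministic function of the key.
    """
    return hashlib.sha256(public_key_bytes).hexdigest()


# Placeholder bytes standing in for a real serialized RSA public key.
example_key = b"-----BEGIN PUBLIC KEY-----\n(placeholder)\n-----END PUBLIC KEY-----\n"

node_id = node_id_from_public_key(example_key)
print(len(node_id))  # 64 hex characters = 256 bits
```

The same key always hashes to the same ID, so restoring the keypair restores the node's identity, and two nodes would have to generate the same key (or find a SHA-256 collision) to clash.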
The RSA keypair is the single credential that a user needs in order
to recover from a catastrophic loss (dead hard drive, stolen computer,
etc). The command line client already has a command 'CRED' that sends
the keypair, encrypted, to a user's email account, with a message
strongly encouraging them to also print it out and save it in a safe
place. On first run, the GUI will also ask for an email account to
send this information, using the same 'CRED' command (GUI piece not
yet implemented). In the case of recovery from catastrophic loss, the
user will simply re-enter their keypair, hopefully via cut-n-paste
from an email account, but manually if necessary.
(http://flud.org/wiki/Architecture#Node_Identity)
And, yes, when the flud network is small, symmetry is harder to
find. But I still view this as a bootstrapping problem that will be
solved as more nodes enter the network (e.g., I can't back up all my
videos today but by tomorrow there will be enough new nodes to get
going on them). You are right that we can't wait for that, however
(otherwise flud will never grow from small to large). I haven't
decided exactly how to deal with it in small groups, but the answer
may be simply to relax the symmetry constraints when networks are
small, and tighten them as the network grows.
Regarding transience, studies of existing p2p systems show an
incredible tendency for nodes to be short-lived. But I think this has
a lot to do with incentives provided by those networks, i.e., as soon
as I've downloaded my files, I'm out of there. flud, and remote
backup in general, provide different incentives. In the
set-it-and-forget-it background backup scheme, it's nice to have
continuous connectivity to catch recent changes. And in flud's
monitor-all-my-own-backup-partners scheme, continuous connectivity is
also a benefit, not to mention that staying available increases trust
in other flud nodes. So nodes have a reason to stay online as long as
possible in flud.
That doesn't mean we won't see a lot of transience from nodes coming
online "just to check it out," but that should cause little
disturbance because new nodes start out with a low trust threshold.
> Yeah, I know. It's a problem, though, because if non-technical people
> are to use this then they'll have to understand the ramifications of
> ticking "back up this folder" next to their 25GB of photos. This is a
> generic issue with all distributed backup systems, not just flud, but
> it is an issue.
Right. Some method of letting the user know how long a backup will
take and why would be useful.
> Really? What's the fix? That could be a problem, especially for
> non-Linux software, which tends to do this sort of thing more -- look
> at, say, all the Mac apps which store a black hole DB like iPhoto.
http://www.flud.org/wiki/Architecture#Versioning
Alen