Muon1 forum (General): ideas for scaling the project
Stephen Brooks
2002-11-02 10:15:03
Some time soon I will contact the "e-science department" of the lab where I worked this summer about this DC system, because so far I have only told the people there vague details of what's going on, and they don't really know how popular it is.  They might offer to host the project, in which case we will end up with a rather large bandwidth+storage allowance, because distributed computing is one of the fields which they specialise in (and RAL has a 2GBit/sec internet pipe).  Alternatively they might not be interested, or say the setup is amateurish or the client badly-coded, in which case it'll be down to me again.

In this second situation I've been thinking about how we might cope with more traffic.  It's important that the new users have useful work to do, but fortunately the design of the new version 5 is geared to allow the project's goals to be scaled up in step with the computing power (i.e. let the AI system do more of the designing, let it range over more possibilities).

I've been wondering what would happen if I let people volunteer to be either a Muon FTP Server or a Muon Datacenter.  The idea is that the client program would be able to download a list of possible FTP servers from this site every 10 days or so, and then when it tries to send results, it tries one of these at random.  If it can't make contact, it tries another, and so on for about 10 attempts (if 10 attempts fail, the client computer is probably offline!).
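A minimal sketch of that client-side selection, in Python. The function name `send_results` and the callback `try_upload` (assumed to return True when a server accepts the results) are hypothetical, not Muon1's actual code. This version shuffles the list once and tries each server at most once, so with fewer than 10 servers it gives up after trying them all.

```python
import random
from typing import Callable, Optional, Sequence

MAX_ATTEMPTS = 10  # "about 10 attempts"; after that, assume this machine is offline

def send_results(servers: Sequence[str],
                 try_upload: Callable[[str], bool],
                 max_attempts: int = MAX_ATTEMPTS,
                 rng: Optional[random.Random] = None) -> Optional[str]:
    """Try servers in random order until one accepts the upload.

    Returns the server that succeeded, or None if every attempt failed.
    try_upload is a hypothetical callback: True means the upload worked.
    """
    if rng is None:
        rng = random.Random()
    order = list(servers)
    rng.shuffle(order)  # random choice without retrying a server that already failed
    for server in order[:max_attempts]:
        if try_upload(server):
            return server
    return None
```

Shuffling once, rather than drawing a fresh random server on every attempt, avoids wasting attempts on a server that has just failed.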
Setting up a Muon FTP Server would be rather easy for those of you who have already set up other sorts of server.  The only thing you need to do is synchronise the "signal.dat" file with my site (in fact I could write it to _your_ servers, then you would literally just have to tell me the FTP server existed and give a username and password).  Muon1's FTP script creates the appropriate directory structure (one dir for each version) as it uploads anyway.
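Assuming a client built on something like Python's `ftplib`, "create the directory as you upload" could look like the sketch below. `upload_result` and the version string are hypothetical. The one assumption about the server is that it answers `MKD` for an existing directory with a permanent (5xx) error, which is typical; a genuine permission problem then surfaces when the `STOR` fails.

```python
import ftplib
import io

def upload_result(ftp: ftplib.FTP, version: str, filename: str, data: bytes) -> None:
    """Upload one results file into a per-version directory, creating it if needed."""
    try:
        ftp.mkd(version)
    except ftplib.error_perm:
        pass  # usually "550 already exists"; a real permission error will show up on STOR
    ftp.storbinary(f"STOR {version}/{filename}", io.BytesIO(data))
```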

That on its own will take the load off my server, and we could leave it there if my ISP doesn't notice I'm downloading about 1GB/week of results.  Come to think of it people with broadband often D/L movies, so I might be OK there for a while.

The idea for setting up a Muon Datacenter would be I'd publish a pack on my site that you could extract somewhere suitable, which basically has all the scripts (duplicate remover, stats generator, results fetcher) I use here.  They are nothing special - you just have to use Task Scheduler (or Cron if someone ports it to Linux) to run the "genstats" script once per hour.  The amount of results that you fetch would be up to you - in fact you could do once per 6 hours if you didn't want so much, or alternatively only clear 1 or 2 of the available FTP servers.  Synchronising stats here would be a problem but I reckon I can set up a PHP system here so that each datacenter uploads its own "rawstats.txt" somewhere, and then my script re-adds everything up into a global rawstats file once per hour.  The question here becomes "how do I get hold of the database if it's on other people's computers?", but from the research point of view, I only _really_ need the best 100 results or so from each datacenter, which can be sent to me on a weekly or twice-weekly basis.  Of course I'd build this stuff into the genstats script if I decided to deploy this system.  Again, you can stop being a datacenter at any point, so long as you send your best 100 results off to me when finished (or even burn the lot onto a DVD or something and mail it to me).
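A sketch of the hourly "re-add everything" step, assuming each datacenter's rawstats.txt holds one `user<TAB>points` line per user. That file format and the function names are assumptions for illustration; the real rawstats layout isn't given here.

```python
from collections import Counter
from typing import Iterable

def parse_rawstats(text: str) -> Counter:
    """Parse one datacenter's stats (assumed format: 'user<TAB>points' per line)."""
    totals = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, points = line.rsplit("\t", 1)
        totals[user] += float(points)
    return totals

def merge_rawstats(files: Iterable[str]) -> Counter:
    """Add every datacenter's per-user totals into one global table."""
    merged = Counter()
    for text in files:
        merged.update(parse_rawstats(text))  # Counter.update adds values, it doesn't replace them
    return merged

def format_rawstats(totals: Counter) -> str:
    """Write the global table, highest total first."""
    return "\n".join(f"{user}\t{points:.2f}" for user, points in totals.most_common())
```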

OK, so what's in it for the people who volunteer to do these things?  Well I would of course publish a stats list of how many points are stored on each datacenter and how many went through each FTP server... Other than that I only suggested the idea because there appear to be a few people around who would be willing to help with bandwidth/storage if it made the project more successful.  Another cool thing is that we'd be (as far as I know) the first people to demonstrate a DISTRIBUTED distributed computing system.  A further step would be to make it P2P but I'd see that as a bad idea because not everyone wants their bandwidth used in that way.

All this would be a very flexible system... Of course the FTP idea would mean making the password public, which is always a risk.  It'd have to be public so that the list of FTP servers that the client programs occasionally download can actually be accessed.  Unless of course they all had the same password: the one in the non-disclosed security.c file.  Even so, that'd be a fairly easy system to break as it only takes 1 person to disclose the password, and to change the password, all the FTP server admins would have to change it and all the users would have to download a new client (the datacenters would have to change too).

So anyway, this isn't something I'm thinking of doing in the _near_ future, but later on in 2003 it might be worth trying.


"As every 11-year-old kid knows, if you concentrate enough Van-der-Graff generators and expensive special effects in one place, you create a spiral space-time whirly thing, AND an interesting plotline"


[This message was edited by Stephen Brooks on 2002-Nov-02 at 17:26.]
mad-ness
2002-11-02 19:00:36
Very interesting concept.

I have nothing to contribute, but I will keep an eye on this forum to see how things progress.  smile

Ars Technica
Team Atomic Milkshake
http://www.teamatomicmilkshake.com/index.php
prokaryote
2002-11-03 13:38:02
Good luck Stephen,

Would be a good vehicle for the Neutrino group to not only simulate the Accelerator, but also to reach out to the general public.  Getting people involved in science is a sure way to foster political support for these projects and can be a very powerful tool for the progression of science in general.


www.ninjamicros.com mathematical projects
[ARS]odessit
2002-11-04 06:14:05
This is one hell of a concept, I am all for it.  I've heard that DSL will come to my part of town in December, so it would be a bit more acceptable for a server (6 blocks from the CO big grin) than my RR cable, which is a bit overloaded with users on my branch.  Of course, whether I could switch will depend on the pricing roll eyes

ARS Team Atomic Milkshake
Unofficial Muon1 FAQ
John Campbell
2002-11-05 15:42:14
One thing that might simplify some issues... instead of having the client download a list of servers from the FTP server, you could set up round-robin DNS, have "muonproxy.stephenbrooks.org" or the like resolve to a list of FTP proxies.  It should behave just as described, with the client machine picking one of the servers at random when it goes to upload, and progressing through them with retries, and it doesn't require any additional software or infrastructure support beyond the existing DNS system.  This assumes, of course, that your {ISP,registrar,whoever's handling your DNS hosting} allows you a sufficient level of control over your domain.
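With round-robin DNS the client needs no server list of its own: resolving the one name returns several A records. A minimal sketch using the standard `socket.getaddrinfo` call; `muonproxy.stephenbrooks.org` is only the example name from the post and need not exist. Shuffling on the client side keeps the choice random regardless of whether a given resolver rotates the record order.

```python
import random
import socket

def candidate_addresses(host: str, port: int = 21) -> list:
    """Resolve a round-robin name and return its distinct IPv4 addresses in random order."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addrs = list(dict.fromkeys(sockaddr[0] for _, _, _, _, sockaddr in infos))
    random.shuffle(addrs)
    return addrs

# e.g. candidate_addresses("muonproxy.stephenbrooks.org") -> ["a.b.c.d", "e.f.g.h", ...]
```

The returned addresses can then go through the same try-each-in-turn loop as a downloaded server list.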

Also, depending on bandwidth and CPU requirements, I may be able to offer a server or two.
Brian Sogard
2002-11-06 15:18:29
If the bandwidth demand and file space demand are reasonable, let's say about 100MB peak storage and 200MB/day traffic, I could provide FTP space.  The server in fact already exists, and I would give the project a file space separate from the rest of my served files; this would reduce the risk to the rest of my site from intruders.

What I would not be able to do is execute a signal.dat synch, Stephen you would have to push the signal.dat to my server.
Stephen Brooks
2002-11-06 15:50:44
I was thinking of doing a signal.dat push anyway.

Currently I've limited the storage for muon1 to 50MB and this seems more than enough for 800 users if flushed regularly (i.e. not more than about 48 hours between flushes, ideally more like 1 hour).  I have accumulated only about 750MB of results so far, so it'll be under 200MB/day for ages, especially if the load is spread.
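For scale, the per-user buffer that the 50MB limit implies (taking 1MB = 1024KB):

```latex
\frac{50\ \text{MB}}{800\ \text{users}} = \frac{50 \times 1024\ \text{KB}}{800} = 64\ \text{KB per user}
```

So the quota holds as long as the average user sends well under 64KB of results between flushes, and flushing hourly rather than every 48 hours shrinks how much each user can accumulate in that window.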

I'll ask NameScout about round-robin DNS, although it was a cheap-ish domain so I dunno if they'll let me do that.  It might be easier if I do it "manually" for now, by having a servers list here.

As I said in another thread, I'm considering this more seriously now because my FTP server appears to be borked in some way.


Brian Sogard
2002-11-06 18:29:57
I had stopped following the other thread, but see now why you may accelerate distributing the ftp.  With the bandwidths and data volume you mentioned, I would easily be able to allow the project to use some of what I have available.  I would merely be allowing to you to use bandwidth and file space that I already have contracted but which I am not coming close to using up.  The company whose servers my domain is contracted on has had a pretty good history of bandwidth and server uptime.  Nobody's perfect, but they've done good by me so far.
Stephen Brooks
2002-11-06 19:58:34
quote:
John Campbell wrote:
One thing that might simplify some issues... instead of having the client download a list of servers from the FTP server, you could set up round-robin DNS, have "muonproxy.stephenbrooks.org" or the like resolve to a list of FTP proxies.

I've just registered muon.serveftp.org at DynDNS, which I could use for this purpose in the future.  I've also set up the Sambar webserver here on my local machine, which also comes with FTP.

Reason why I'm posting is - my web IP address appears to be http://163.1.162.5/ but I can't see my own webserver on that (a small page with a picture on it should appear).  I have got my webserver switched on because I can see this page (unsurprisingly) at http://127.0.0.1/ and (rather more oddly) at http://10.0.0.94/ . I have to assume the second one is some sort of intranet address I've been given.  Also you might want to check http://muon.serveftp.org/ because that's DNSed to my local machine right now, although probably it'll take a few hours to propagate.  Tell me if you can see the page at either of the two global web addresses because currently I can't.
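The second address is indeed an intranet one: 10.0.0.94 lies in 10.0.0.0/8, one of the ranges RFC 1918 reserves for private networks. That pattern (a private address on the machine, a different public address seen from outside) is what NAT looks like, and inbound requests to the public address only reach the machine if the router forwards them. Python's standard `ipaddress` module can classify the three addresses; `describe` is just an illustrative helper.

```python
import ipaddress

def describe(addr: str) -> str:
    """Classify an IP address as loopback, private (RFC 1918 etc.) or public."""
    ip = ipaddress.ip_address(addr)
    if ip.is_loopback:
        return "loopback (this machine only)"
    if ip.is_private:
        return "private (only reachable inside the local network, e.g. behind NAT)"
    return "public (routable on the internet)"

for a in ("127.0.0.1", "10.0.0.94", "163.1.162.5"):
    print(a, "->", describe(a))
```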


[DPC]Scorpion
2002-11-07 00:33:22
Dude, go here and see what your IP is:

Shields Up

ReDoubt [Picard]
2002-11-07 03:37:00
Stephen the IP you list as yours resolves to an address at oxford, england

nat-kludge-coz-alastair-cant-face-subnet-swap.trinity.ox.ac.uk [163.1.162.5]

ReDoubt
Stephen Brooks
2002-11-07 05:52:19
Yes, Alastair is the network administrator here.  Apparently I can run servers if they're "required for my academic studies", and I have to get an e-mail from my tutor allowing it.  This project is only vaguely related to my academic studies, but it is certainly related to my academic research.


DrHanser
2002-11-08 10:05:00
I could offer space and bandwidth on the round-robin setup.


Now that I'm not serving on Direct Connect full time (which was totalling ~3.5GB/day of upstream bandwidth >_< ), and my line is unsaturated, I have plenty of space and speed at home.  My connection is 1.5Mbps down / 300Kbps up; the upstream isn't really fast, but the downstream would be more than ample for people hitting it constantly.

300K upstream would mean that you would download results from me at ~40KB/sec, which is pretty fast.
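Spelling out the conversion (8 bits per byte, ignoring TCP/IP protocol overhead, which lowers real throughput somewhat):

```latex
\frac{300\ \text{kbit/s}}{8\ \text{bit/byte}} = 37.5\ \text{KB/s} \approx 40\ \text{KB/s}

37.5\ \text{KB/s} \times 86\,400\ \text{s/day} = 3\,240\,000\ \text{KB/day} \approx 3.2\ \text{GB/day}
```

Even a fraction of that is well above the 200MB/day figure discussed earlier in the thread.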



I live in the dorm but go home once a week.  I also have VNC set up in case something goes wrong.  *sob* I miss my home internet connection.  frown The dorm is often worse than dialup.

--
The mark of an educated man is one who knows a lot about something, and a little about everything.
Tom King
2002-11-08 10:19:06
Hear hear.  I only had 128 cable at home, but I'd kill for that here.  Oh well.
Stephen Brooks
2002-11-08 12:23:02
http://stephenbrooks.org/muon1/servers.php

I need to replace the line about "parrots" by some guidelines on how the FTP account should be set up for Muon1, but for those of you who already know how the system works, you can signup now.

The first thing I'll release will be a new manualsend.exe that downloads the servers.csv file if it can, and then uploads to a random server.  I'll then have to modify my scripts here to flush all servers for results.
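A sketch of that download-with-fallback step. The column layout `host,username,password` (no header row) and the `fetch` callback (returning the file's text, or None when the download fails) are assumptions for illustration, not the real servers.csv format. Keeping the last good copy on disk lets the client keep uploading even when stephenbrooks.org itself is unreachable.

```python
import csv
import io
from typing import Callable, List, NamedTuple, Optional

class FtpServer(NamedTuple):
    host: str
    username: str
    password: str

def parse_servers_csv(text: str) -> List[FtpServer]:
    """Parse a server list; assumed columns: host,username,password."""
    servers = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3 or not row[0].strip():
            continue  # skip blank or malformed lines
        servers.append(FtpServer(row[0].strip(), row[1], row[2]))
    return servers

def load_server_list(fetch: Callable[[], Optional[str]], cache_path: str) -> List[FtpServer]:
    """Use a freshly downloaded list if possible, else fall back to the cached copy."""
    text = fetch()
    if text is not None:
        with open(cache_path, "w", encoding="utf-8", newline="") as f:
            f.write(text)  # remember the last good list
    else:
        try:
            with open(cache_path, encoding="utf-8", newline="") as f:
                text = f.read()
        except FileNotFoundError:
            return []  # never downloaded, nothing cached
    return parse_servers_csv(text)
```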

The Infopop account of mine is in that list now mostly as a test - if it's still not working when the new manualsend is out, I'll take it off the list.


DrHanser
2002-11-09 08:01:47
Fantastic!

I'll put mine up tonight when I get home.  smile

Stephen Brooks
2002-11-11 13:52:01
I'm bumping this thread too because it's interesting, but I've just bumped another more important thread:

http://www.stephenbrooks.org/groupee/forums?a=tpc&s=724606111&f=144606111&m=8766031161

...which has a URL for the new release of manualsend which sends to multiple servers.

Note to any FTP server owners: the user "[[[[[[plastic ducks test" is a test by me and their files can be deleted.


[DPC]Stephan202
2002-11-11 14:27:52
Plastic Ducks?????

Are you saying those two ducks on the picture beneath your name (an avatar they call that, right?) are plastic?
The photo looks pretty real to me.

---
Dutch Power Cow.
MOOH!
Stephen Brooks
2002-11-11 15:25:41
No those are geese.  There is a 3ft-long plastic duck in the attic here, but I won't go into that.
Me@Home
2002-11-13 12:07:53
For what it's worth, I don't think 1GB a month will be a problem.  I don't even think 1GB a day is a problem.  I don't even think 3-5GBs a day is a problem.  Or at least, my ISP has never told me it is, though they've told me (When I called tech support) that I'm their highest bandwidth residential user in almost all of WA.  big grin

Me razz
(BTW, if I didn't have a dynamic IP, I'd sign up for a server gladly.  Actually, I could use DynDNS and you could point it at a hostname, if you wanted)
Stephen Brooks
2002-11-13 12:35:11
Try it.
Stephen Brooks
2002-11-19 16:34:07
Just to remind everyone where we'd got to, [ARS]Me@Home was contemplating setting up his own FTP server.  Meanwhile, I've got my fixed IP working here so mine's up too.  However I haven't yet contacted the RAL e-science department about this project (and was avoiding doing so until the stats/thing/queue/logjam was solved).  My results-retriever program here could do with some polishing, but is more-or-less working now, and I've put in place things to prevent loss of results.  In other words, phase 1 of distributing the project is nearly done.  I was going to let phase 2 wait until I either ran out of HDD space or bandwidth here to retrieve the results.  Phase 2 will be "distributed hard-disking"* and databasing, filtering of results, and the reason it will improve my bandwidth usage is simply because only the higher results are _really_ important in an optimisation project.  So we can go totally distributed, with no machine needing immense storage or bandwidth.

(* the phrase "distributed hard-disking" was mentioned by someone on this forum as a joke a while ago but I can't remember who... I'll do a search).
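Why phase 2 saves bandwidth can be made concrete, assuming higher scores are better: the best 100 results overall are always contained in the union of each datacenter's own best 100, because a result in the global top 100 cannot have 100 better results on its own datacenter (ties aside). So each datacenter only needs to forward its local top 100. A sketch with results modelled as hypothetical `(score, record)` pairs; `heapq.nlargest` keeps only `n` items in memory while scanning.

```python
import heapq
from typing import Iterable, List, Tuple

Result = Tuple[float, str]  # hypothetical (score, result record) pair; higher score = better

def best_results(results: Iterable[Result], n: int = 100) -> List[Result]:
    """Keep only the n highest-scoring results, best first."""
    return heapq.nlargest(n, results, key=lambda r: r[0])

def merge_datacenter_bests(per_dc: Iterable[List[Result]], n: int = 100) -> List[Result]:
    """Combine each datacenter's local top n into the global top n."""
    return best_results((r for dc in per_dc for r in dc), n)
```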

My plans are still to let distributed networking phase 2 wait until after v4.22 (which includes re-checking of those rogue results some people are worried about) and then v5 (which simulates a different design and will include infrastructure changes to let it interactively download new designs) have both been released.

I've found that thread now.  Worth reading - it's quite funny.  So the unofficial goal of networking phase 2 is... to assimilate India and China?  eek




[This message was edited by Stephen Brooks on 2002-Nov-20 at 0:42.]