Running muon1 without any sampleresults/results.dat
Rocko
2004-03-28 16:58:41
What would be the best way to find results that have not yet been found by others running the program?  Could this be achieved by removing the sampleresults file and clearing out results.dat?  I did this, but I have really shitty muon %.
What I want to know is how I can simulate unique results - in order to help the project more.  Can this be accomplished while using sampleresults, or will the simulated results not be 'unique'?


Sorry for my poor vocabulary.  I don't really have a good understanding of how the program works; from what I gather it uses a genetic algorithm to mix around random variables and simulate the results, those variables coming from the sampleresults.  Is this true, or am I completely lost?
[DPC]Stephan202
2004-03-28 21:40:31
You're not lost.  :)

The difference between using and not using the sampleresults is this:
When you use it, the results in the sample file are mixed to produce slightly different results.  Since there are quite a lot of good results in the sample file, this results in a high % for your new results.

When you don't use the sample file, you start from scratch, and most early work is almost completely random.  It will take a long time to get high yields.

The latter produces more unique results, but the chance of finding something really good is a lot smaller.  If you have multiple computers, it's best to find a balance between computers on which you do and on which you don't use the sample file.
Rocko
2004-03-29 09:33:00
Thanks for the prompt reply.
I'll have my P2 400 run it without the sampleresults, and my Barton 3200 run it with a 500-result sample file.

Is there any way to 'optimize' the client so I can get better muon %, or is getting a sampleresults file and letting it run 24x7 the only way?  Currently, my best result is around 9.8864, with ~ 263,000 mpts.
[DPC]Stephan202
2004-03-29 13:18:52
There are some ways of manually trying other configurations, but right now I can't tell you about how and what.  Just running the client will be good enough.  Only very few people do more than that.
Rocko
2004-04-02 20:18:55
Just a quick question - if I start with a blank results.dat and tell the program not to download any sampleresults, should my simulations get gradually better (in terms of best muon%) as I run the program for longer and longer?
Boots
2004-04-02 20:31:11
I have been running from scratch without d/l'ing sampleresults.  I have now 4173 processed, with a % of 3.444 at this time.  It looks like getting to 9%+ is going to take a long time!  I do not know if it will get past 10%.
Hope this helps.  I also run another one with sampleresults, which shows 9.893%; I run the from-scratch one just to see what will come of not d/l'ing sampleresults.
Boots
Googlybear
2004-04-23 03:20:55
Hi All,

I have just started using MUON today and have been reading how it works.
As far as the topic of this thread and in terms of the overall project, what is the preferred method you want us to use?  With or without the sample file?

Thanks
Googlybear
[TA]z
2004-04-23 09:01:41
Just an assumption, but I imagine Stephen would almost always prefer that we utilize the sample results.  Right now as 4.34 wraps up, we seem to be finally peaking out the Solenoids to 15cm.

Granted, running without the sample results may result in a more efficient breed, but the odds of that happening at this point are quite minute.

Once 4.4 is published I may take several clients and allow them to operate apart from the public samples, but I really doubt they will ever produce higher yields.
kitsura
2004-04-23 19:08:41
I have always been running all clients without the sample files and I noticed that some clients seem to increase their yield faster than others.  So it's sometimes possible (but very rare) that a client might jump 1% in yield over 1 day.  So far I'm around 6% but if I had faster PCs running the client I might be able to come up with a new breed much faster.  Unfortunately I have no PCs with that much power.  I do have 1 client running off a P4 2.6GHz but it's still below 4% and no all my clients are using Intel CPUs.
Stephen Brooks
2004-04-25 09:17:11
The strange thing is, I haven't really found a good way to analyse the evolution process in order to decide whether or not using the sample files is universally a good thing.  I think the majority of the time it will be a good thing, but it's worth having a few people keeping their machines on separate tracks just to make sure we haven't missed anything.  One thing you can also do is to make your own personal best-results and share those between your computers without using the overall best.  I guess you could also look to see if your gradually-improving results are getting more and more like the project best, or whether they've got some persistent differences.

The analogy that keeps coming to mind is that in nature, isolated habitats end up having weird toucans and stuff on them because they're not homogenised by interacting with everything else.  So there's some reason to think separating a few parts out might get to interesting places.  Sometimes you might just end up with Dodos though.
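To illustrate the suggestion of checking whether your own results are drifting towards the project best, here is a minimal sketch. It assumes genomes are written in the `name=value;` form quoted later in this thread (s1f=264;d1l=368; etc.). The function names and the second genome's values are made up for illustration, and this is not part of Muon1 itself.

```python
def parse_genome(text):
    """Parse a 'name=value;name=value;' string into a dict of floats."""
    params = {}
    for item in text.strip().split(";"):
        if item:  # skip the empty piece after the trailing ';'
            name, value = item.split("=")
            params[name.strip()] = float(value)
    return params


def largest_differences(mine, best, top=3):
    """Return the shared parameters that differ most, largest first."""
    shared = sorted(mine.keys() & best.keys())  # sorted for a stable order
    diffs = [(name, mine[name] - best[name]) for name in shared]
    diffs.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return diffs[:top]


# Hypothetical genomes: the first is the fragment quoted in the thread,
# the second is invented purely for this example.
my_best = parse_genome("s1f=264;d1l=368;")
project_best = parse_genome("s1f=300;d1l=370;")
report = largest_differences(my_best, project_best)
```

Running this on successive snapshots of your own best result would show whether the same few parameters keep standing apart from the project best or whether the gaps shrink over time.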
kitsura
2004-04-26 00:18:04
I don't know how the evolution works but I'm running all clients totally standalone and not sharing any of the results.dat even between my own clients.
[TA]JonB
2004-04-29 04:36:26
Starting yesterday (April 28th), it appears that the downloaded sampleresults file is now causing a rash of 10% finds.  Looking at the graph on the main page near the bottom, it does appear to stay level, then jump to a new efficiency.  Your feedback system of samples definitely works. 

While I am definitely in this for the Mpts (gotta keep my team happy), it is great to see a jump in efficiency like that.
MTX
2004-05-01 11:11:19
Stephen, have you considered using a PLS method within the program to guide the evolution process?  It seems to me that GAs, while they get the job done eventually, are seldom the most efficient approach.  Following the success of QSAR, do you think that this might be a method worth pursuing in this project?  Since the QA approach is important, maybe a QA-SIMCA-PLS cycle would be suitable?

Regards

MTX
MTX
2004-05-01 11:13:23
-- correction: incorrectly typed "GA" as "QA" a couple of times.
Stephen Brooks
2004-05-04 01:10:41
Though I hadn't come across those algorithms by name before (it sometimes seems that each area of science [biologists/chemists/physicists] has to invent the same algorithms under different names!), I've got one TrialType in the current algorithm that does a regression fit (just a linear one) on some of the data.

The "LocalGrad" type first selects a radius and then does a least-squares fit for the yield being a linear function of the unknowns within that radius.  It then takes a step of the order of the radius being used, in the uphill direction.  I think your PLS is a bit more sophisticated but it appears mostly to be used for classification rather than optimisation.  Anyway, in tests I did, the LocalGrad step appeared to do slightly better than the best of the rest I had in there.  The next-best two appeared to be the "extrapolate" (take two good points and pick somewhere along their axis but not between them) and "MuOne" (mutation where you just change one variable!) types.

[edit] Oops - no, LocalGrad is not a least-squares fit because a least-squares fit on _all_ variables, with sparse data, usually ends up fitting in a stupid way.  No, LocalGrad actually does roughly what a non-overtrained single-layer perceptron would do given the same data.  In fact this is simpler to implement as it doesn't involve inverting a matrix and is hence more stable.
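For readers who want to picture the three trial types described above, here is a rough sketch. It is not Muon1's code: the function names, step sizes, learning rate and number of passes are all assumptions. The `local_grad` function uses a few passes of simple error-correcting updates on a linear model (the perceptron-like idea mentioned in the edit, with no matrix inversion) and then steps one radius uphill.

```python
import math
import random


def mu_one(genome, rng, scale=1.0):
    """Mutate exactly one variable, leaving the rest untouched."""
    child = list(genome)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, scale)
    return child


def extrapolate(a, b, rng, max_overshoot=1.0):
    """Pick a point on the line through a and b, beyond b (not between)."""
    t = 1.0 + max_overshoot * (1.0 - rng.random())  # t is in (1, 1 + max]
    return [x + t * (y - x) for x, y in zip(a, b)]


def local_grad(center, samples, radius, epochs=50, rate=0.1):
    """Estimate an uphill direction from nearby (genome, yield) samples.

    Fits yield ~ bias + w . (g - center) / radius by repeated small
    corrections, then returns center moved one radius along w.
    """
    near = [(g, y) for g, y in samples if math.dist(g, center) <= radius]
    if not near:
        return list(center)
    w = [0.0] * len(center)
    bias = 0.0
    for _ in range(epochs):
        for g, y in near:
            d = [(gi - ci) / radius for gi, ci in zip(g, center)]
            err = y - (bias + sum(wi * di for wi, di in zip(w, d)))
            w = [wi + rate * err * di for wi, di in zip(w, d)]
            bias += rate * err
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        return list(center)
    return [ci + radius * wi / norm for ci, wi in zip(center, w)]
```

A seeded generator such as `random.Random(0)` can be passed as `rng` to make the mutation and extrapolation steps repeatable.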
MTX
2004-05-06 10:20:14
That's certainly interesting to know, and I believe you are already going somewhat in the general direction of what I am suggesting here.  I do think, however, that you would be interested in giving PLS (or its parent algorithm, PCA) a whirl; some of the things it can do are really quite incredible.  I can't go into too much detail about what I use it for because of the nature of my work (I'm a physical chemist), but I can assure you that it is a truly invaluable tool for large-scale data interpretation -- and experimental design.  I'd never be without multivariate capability again now after seeing what it can do on such a wide range of data.

The reason I specifically mentioned PLS is because, of course, Muon1 generates a set of "independent" variables (the accelerator parameters) and tests these against a "dependent" variable (percentage of particles conserved), the basic components of a PLS model.  The important bit is that given enough data (and I think you'll agree, you have a lot already, perfect for this technique) you can create a regression model.  You can therefore see if there are a specific set of parameters towards which all the individual "organisms" are evolving, but you can also classify different routes of evolution (which is more the SIMCA part of it).  However, most importantly for you, you can extrapolate from the model as to what might be an effective set of designs and prefer those routes of evolution in some proportion of your clients.  As I mentioned above, a cyclic approach (GA, classification, optimisation, GA...) might be just what you're looking for here.

Overall, my recommendation would be to invest in a copy of SIMCA-P or MINITAB.  Even if it turns out not to be useful for this particular project (and let me assure you, the limits of utility for this technique are not easily reached), you will be (as I was) utterly stunned when you see what results can come out.

Best regards

MTX
Stephen Brooks
2004-05-07 11:49:23
OK, I know about PCA - was in fact going to use it at some stage to create interesting "pictures" of the muon results in 2D to see the primary directions along which we were working.

A while ago someone was asking if I should "look at each parameter individually and see where the optimum value for each was", but I rejected that as too rigid, as my parameters are probably highly correlated.  I think if I first converted to the highest-scoring principal-component variables and then tried that, it could be a useful trialtype.  I just read something (briefly) about PLS, but it appears to be an entirely linear model - the trouble with those is that they can't have maxima (except at the corners of the space), whereas this 'accelerator' function almost certainly does have maxima that aren't on the boundary.
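The point about linear models can be seen in a one-variable toy case. The sketch below is only an illustration: the yield function (a peak at 0.3) and the sample grid are invented, and the fit is an ordinary straight-line least-squares fit rather than PLS. The fitted line puts its best point at the edge of the range even though the true peak is inside it.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def toy_yield(x):
    """Invented stand-in for the yield: a single peak at x = 0.3."""
    return 1.0 - (x - 0.3) ** 2


xs = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
ys = [toy_yield(x) for x in xs]
slope, intercept = fit_line(xs, ys)

# Best point according to the straight-line model versus the toy function.
linear_best = max(xs, key=lambda x: slope * x + intercept)
true_best = max(xs, key=toy_yield)
```

Here the slope comes out negative, so the line's best point is the left end of the range (x = 0.0), while the toy function peaks at x = 0.3.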

One thing you could try, though (if you are interested and have time), is applying your own analysis to the results and submitting the resulting candidate via the TEST capability of Muon1. [Write the genome (s1f=264;d1l=368; etc.) to be tested on one line in queue.txt and TEST <SolenoidsTo15cm> on the second.] If you want more results to work from than your own results.dat you could try fetching the 500-sample regularly - I think it's updated hourly and tends to include different results every time.  Then just run muon on it and it should pick it up (after any queued results) and it will appear as a valid, calculated result in results.txt (and .dat).
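As a sketch of the queue.txt step, the snippet below writes the two lines exactly as given in the post above. The helper names are hypothetical, and two things are assumptions to check against your own Muon1 version: that every genome entry ends with a semicolon, and that the angle brackets in the TEST line are typed literally. The demo writes to a temporary directory rather than a real Muon1 folder.

```python
import tempfile
from pathlib import Path


def format_genome(params):
    """Join parameters as 'name=value;' pairs, e.g. 's1f=264;d1l=368;'."""
    return "".join(f"{name}={value};" for name, value in params.items())


def write_queue(path, params, lattice="SolenoidsTo15cm"):
    """Write the genome on line one and the TEST line on line two."""
    text = f"{format_genome(params)}\nTEST <{lattice}>\n"
    Path(path).write_text(text)
    return text


# Demo with the two example values from the post; a real genome
# would list every parameter of the design.
with tempfile.TemporaryDirectory() as tmp:
    queue_path = Path(tmp) / "queue.txt"
    write_queue(queue_path, {"s1f": 264, "d1l": 368})
    lines = queue_path.read_text().splitlines()
```

To use it for real, point `write_queue` at the queue.txt in the folder you run Muon1 from.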