Muon1 Bug Reports: problems with manualsend for Windows
[SG]proteino.de
2003-09-12 09:57:29
I've uploaded 7971 results with 1500000.0 Mpts. The file sendlog.txt says that only 7720 results were uploaded, and in the stats I'm missing 251 results and 39007.5 Mpts. Should I upload everything again in smaller parts, or what should I do?

cu, proteino
Stephen Brooks
2003-09-12 10:21:59
If 251 of your 7971 results didn't get counted, it might be because they are repeats of results that had previously been added to your file.

It doesn't make any sense: that's why they call it "virtual"
[SG]proteino.de
2003-09-12 10:37:08
No, there can't be any dups in the uploaded data. I checked them with a small tool I wrote for Muon v4.3; it can sort results, check for dups and generate a results.txt with a specific number of Mpts. Many of us (SG) use it and it seems to work fine, so it's safe to assume there aren't any dups. The missing results were simply not sent, but I don't know why. I viewed the results.txt with wordpad.exe and can't see any anomaly in it. Is it a problem for your server if I send the results again in smaller parts? Of course there will then be thousands of dups ...
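The sort / check-for-dups / select-by-Mpts steps proteino describes can be sketched roughly as follows. This is not his tool's Delphi source. It assumes a hypothetical results.txt layout in which each result is a block of lines separated by blank lines, and the second line of a block holds three whitespace-separated fields: muon %, Mpts and checksum (the "second result-row" mentioned later in the thread). The function names are made up for illustration.

```python
def parse_blocks(text):
    """Split results text into blocks; ASSUMPTION: blocks are separated by blank lines."""
    chunks = text.replace("\r\n", "\n").split("\n\n")
    return [c.strip() for c in chunks if c.strip()]


def mpts_of(block):
    """ASSUMPTION: the 2nd line holds 'muon% Mpts checksum'; return the Mpts field."""
    return float(block.splitlines()[1].split()[1])


def select_batch(text, target_mpts):
    """Drop exact duplicate blocks, sort them, then take blocks until target_mpts is reached."""
    unique = sorted(set(parse_blocks(text)))  # sorted only to make the output deterministic
    batch, total = [], 0.0
    for block in unique:
        if total >= target_mpts:
            break
        batch.append(block)
        total += mpts_of(block)
    return batch, total
```

The last block taken can overshoot `target_mpts`; a real tool might prefer to stop just below the target instead.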

cu, proteino

PS: If you want to take a look at the tool, feel free to download it from http://proteino.net -> Muon -> scroll down to Downloads. It's in German, but should be easy to use even if you don't understand German. A multilingual version will follow. The tool is delivered with its Delphi source and is released under the GPL.
[SG]proteino.de
2003-09-12 13:25:10
I've sent part of the file again, and this time manualsend uploaded only 1206 of 1420 results. I can't see any problems in the file: no extra spaces etc. The file was assembled with the same bash scripts I always use, and viewresults.exe shows all the results too. I don't know where the problem with these results is. Do you have any way to test such data? I could send it via email ...

cu, proteino
Stephen Brooks
2003-09-12 13:26:53
What I meant was that it could be a duplicate between the file you uploaded, and some previous file you uploaded for v4.3. The file you sent could be free of duplicates, but it could have repeats in common with something you sent before.
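Stephen's explanation can be tested on the client side by comparing the new batch against every file uploaded earlier, not just against itself. A minimal sketch, assuming the same hypothetical layout (results as blank-line-separated blocks) and exact text comparison; the function names are made up, and this is not what the stats server does internally. An empty result would support the view that cross-file repeats are not the cause.

```python
def parse_blocks(text):
    """ASSUMPTION: results are blocks of lines separated by blank lines."""
    chunks = text.replace("\r\n", "\n").split("\n\n")
    return [c.strip() for c in chunks if c.strip()]


def repeats_of_earlier(new_text, earlier_texts):
    """Return the blocks in new_text that already appear verbatim in any earlier upload."""
    seen = set()
    for text in earlier_texts:
        seen.update(parse_blocks(text))
    return [block for block in parse_blocks(new_text) if block in seen]
```

For example, `repeats_of_earlier(open("new.txt").read(), [open(p).read() for p in old_paths])`, where `old_paths` is a hypothetical list of previously uploaded files.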

[SG]proteino.de
2003-09-12 14:18:34
No. As a test, I merged all the results I have ever crunched for v4.3, and there is no duplicate. The problem must be in manualsend.exe, because uploading dups works without any problem (I hope you will adjust the stats in the near future; I have about 90000 too many Mpts because of testing the upload). As I said before, viewresults.exe shows the correct count of results, but manualsend won't upload all of them.
Sorry if my English is not the best ;o) ...

cu, proteino
[SG]proteino.de
2003-09-16 00:32:00
Hi Stephen,

I still have the problem. Yesterday I sent many more results, and manualsend ignored 616 results with 26967.1 Mpts. I'm not happy about this. (The total of failed results is now 867, with about 57 Gpts.) Do you have a small tool that can check my results, so I know which ones are wrong? I have no idea where to look for the problem. One possibility is that one client in my cluster is working incorrectly and saved false checksums etc. As I said, I examined the results with wordpad and they looked fine, but I can't check all the parameters or verify the checksums ...
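A checker of the kind proteino asks for could at least flag structurally odd results. A minimal sketch, assuming the same hypothetical layout as above (blank-line-separated blocks whose second line is "muon% Mpts checksum"); the names are made up. It cannot verify the checksum itself, because the checksum algorithm is not given in this thread, and it is not known whether these are the criteria manualsend uses to skip results.

```python
def parse_blocks(text):
    """ASSUMPTION: results are blocks of lines separated by blank lines."""
    chunks = text.replace("\r\n", "\n").split("\n\n")
    return [c.strip() for c in chunks if c.strip()]


def find_suspect_blocks(text):
    """Return (block index, reason) for blocks whose assumed 2nd row looks malformed."""
    problems = []
    for i, block in enumerate(parse_blocks(text)):
        lines = block.splitlines()
        if len(lines) < 2:
            problems.append((i, "fewer than two lines"))
            continue
        fields = lines[1].split()
        if len(fields) != 3:
            problems.append((i, f"expected 3 fields, got {len(fields)}"))
            continue
        # The checksum (3rd field) is only counted, never verified.
        for name, value in (("muon %", fields[0]), ("Mpts", fields[1])):
            try:
                float(value)
            except ValueError:
                problems.append((i, f"{name} is not a number"))
    return problems
```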

Please run a dupe check anyway. My account has 121 Gpts of dupes, and [SG]Sobi wrote in our forum that yesterday his manualsend pushed many results three times because the FTP send timed out. He has about 300 Gpts of dupes.

Thx, proteino
Stephen Brooks
2003-09-16 02:57:02
The stats counter is still set to remove duplicates automatically, so they should be already removed from your account.

Actually... no, I'd commented out the line that did the repeat-removal a few weeks ago because some people's files were getting very long and it was taking a long time to check through them all.  I've put it back in now, so the repeats should be removed next time people upload a new batch of results (though I still have the problem of adding some sort of indexing method to make the repeat-removal faster).


[This message was edited by Stephen Brooks on 2003-Sep-16 at 20:39.]
[SG]proteino.de
2003-09-18 13:59:22
Hi Stephen!
After a short discussion in our forum about some problems with the stats and dupes, and some ideas for solving them, I decided to write you an e-mail. Please check your mail, thx.

cu, proteino

Oops, the mail was not sent; please check later, I have to rewrite part of it.
OK, it should be delivered now ...

[This message was edited by [SG]proteino.de on 2003-Sep-18 at 22:17.]
Stephen Brooks
2003-09-18 14:52:02
I'm wondering if I should put some debugging in to log what the program removes from your file as it goes, so I can see if it's removing real duplicates, or whether there's some bug which means it's detecting some as duplicates when they're not.

There are various ways to make the duplicate-checker faster.  The program builds up its own hash-table of the records each time it rechecks a user's file (which happens when a user submits results), so probably the easiest is to just save that table to disk when done, so I don't have to re-parse the whole beginning of the file again on subsequent occasions.
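The save-the-table idea could look roughly like this. It is a sketch only, with several assumptions: one record per line in the user's file, the file only ever grows by appending, and a JSON index file that holds the byte offset parsed so far plus a table from a (deliberately short) hash to the offsets of the records having that hash. All names are hypothetical, and the sketch only reports duplicates rather than removing them. Because equal hashes are confirmed by re-reading the stored record and comparing bytes, a short hash only costs extra comparisons, never false dupes.

```python
import hashlib
import json
import os


def _short_hash(record):
    # Deliberately short (32-bit) key: collisions are possible and handled below.
    return hashlib.sha1(record).hexdigest()[:8]


def _record_at(handle, offset):
    """Re-read the record (one line, ASSUMED format) stored at a byte offset."""
    handle.seek(offset)
    return handle.readline().rstrip(b"\r\n")


def find_new_duplicates(data_path, index_path):
    """Index only the unparsed tail of data_path; return byte offsets of repeated records."""
    index = {"parsed_to": 0, "buckets": {}}
    if os.path.exists(index_path):
        with open(index_path) as fh:
            index = json.load(fh)

    duplicates = []
    with open(data_path, "rb") as f, open(data_path, "rb") as verify:
        f.seek(index["parsed_to"])
        while True:
            offset = f.tell()
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or an unfinished last line left for next time
            index["parsed_to"] = f.tell()
            record = line.rstrip(b"\r\n")
            if not record:
                continue
            bucket = index["buckets"].setdefault(_short_hash(record), [])
            # Equal hashes are only candidates; confirm byte-for-byte before calling it a dupe.
            if any(_record_at(verify, old) == record for old in bucket):
                duplicates.append(offset)
            else:
                bucket.append(offset)

    with open(index_path, "w") as fh:
        json.dump(index, fh)  # saved so the next call skips the already-parsed head
    return duplicates
```

If duplicates are physically removed from the user's file, the stored offsets become invalid, so the index would have to be rebuilt (or rewritten alongside the file) at that point.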

pvs
2003-09-18 16:58:21
Little hint for Stephen:
The missing results are not being removed on your server; manualsend doesn't send them at all.
They are missing from the sendlog.log too.
So the dupe check has nothing to do with proteino's problem.

(OK, while trying to solve this problem he has uploaded many dupes, so a check should be forced.)

Michu
[SG]proteino.de
2003-09-18 23:33:03
Hi Michu!
That's another problem. Maybe one of my cluster nodes is misbehaving and saved corrupted checksums etc. (for 57 Gpts *sniff*); I don't know exactly. Currently, many results that were uploaded correctly to one of the FTP servers are not counted in the stats. It looks as if Stephen's server is discarding them as dupes.

@Stephen: I haven't worked with hash tables yet, but could it be that the hashes are too small, so that one hash can represent different results? I don't know your source; perhaps you use too few parameters when creating the hashes and/or in the dupe check? As you can see in my mail, for the dupe check I compare the full second result row, i.e. the 3 parameters muon %, Mpts and the checksum.

cu, proteino

[This message was edited by [SG]proteino.de on 2003-Sep-19 at 7:50.]
Stephen Brooks
2003-09-20 04:39:04
The hash-tables are fine, because if I find two identical hashes I then go back to the offsets in the file and check that the two results are identical character-by-character.

Something [SG]pvs/michu said reminded me that Muon1 actually does a dupe check _before_ sending results as well, I think. I'll have another look at that part of the code on Monday.

Also, it's possible that one of your installs is generating bad checksums due to a slight error in the solenoidsonly.txt file. This is a long shot, but it might be worth re-copying that file onto that machine.

Contact: e-mail sb@stephenbrooks.org, Twitter stephenjbrooks, Mastodon @sjb@mstdn.io
