Would Terrene Bell like his "team" to be merged into one user?
Stephen Brooks
2004-02-20 07:11:43
Something interesting came out of the duplicate-removal in the stats database just now.  Although on average only 7% of the results were duplicates within the same user's file, the [Terrene Bell] team overall got reduced from 80.1MB of results to 25.0MB of results, between about 15 users.  I thought: "They probably were pasting the sample results into .txt rather than .dat by accident". Fortunately that can no longer happen as at the same time I fixed the duplicate remover, I removed checksums from those files.

Then I got an e-mail from [TA]DanC who had noticed that eight of the Terrene Bell users had identical top scores.  That could also have been an accident (team sharing a common best-results file and pasting that into the wrong file).  But out of curiosity I looked at what happened when I concatenated the team's files together and eliminated duplicates in the resulting file.  It reduced from the already shrunken 25.0MB to only 2.8MB!  That is, of 24420 results, 21689 were duplicated, not even counting the in-file duplicates.
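
For anyone who wants to try the same kind of check on their own team's files, here is a minimal sketch of the merge-and-deduplicate idea. It is not the code the stats program runs; the function name `count_cross_user_duplicates` and the one-record-per-string layout are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Tuple


def count_cross_user_duplicates(files: Dict[str, Iterable[str]]) -> Tuple[int, int]:
    """Return (total, repeated) for a team's result files.

    `files` maps a user name to that user's result records (hypothetical
    layout: one record per string). In-file duplicates are dropped first,
    so `repeated` counts only extra copies that appear across users.
    """
    total = 0
    seen = set()
    for records in files.values():
        own = set(records)        # drop duplicates within this user's file
        total += len(own)
        seen |= own               # merge into the team-wide set
    repeated = total - len(seen)  # copies beyond the first of each record
    return total, repeated


# Toy data, not real results.
team = {
    "user_a": ["r1", "r2", "r3", "r3"],
    "user_b": ["r1", "r2", "r4"],
    "user_c": ["r1"],
}
total, repeated = count_cross_user_duplicates(team)
print(total, repeated)
```

With the toy data, seven results remain after in-file cleaning and three of them are copies of a result that another user also holds.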

So... would it be better if I merged the whole team back into one user?  Or was this some sort of mistake with copying the files?  I also noticed that recently a "University of South Alabama [Terrene Bell]" user appeared, whose results were essentially an exact copy of another user's ("Xul Alal [Terrene Bell]").

The within-user duplicates checker is now working automatically, and I will continue to do "spot checks", particularly on teams, to ensure inter-user duplication isn't going on either...

HB Pencils, also sold as "Moron's Choice" Graphite Cigars.
Stephen Brooks
2004-02-20 07:23:11
The natural next choice would be to scan DanC's team for duplicates, and I got...

team anandtech 188799 results 567 repeated (0.300% bad)

...as you see, very few repeated ones in there.  They have so many results in total that I might even believe some of those were statistical coincidences, though more likely they were pasting team sample files wrongly sometimes.
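
To put the two teams side by side, the percentages can be worked out from the counts quoted in this thread. This is just the arithmetic as a small sketch; `percent_repeated` is a made-up helper name.

```python
def percent_repeated(total: int, repeated: int) -> float:
    """Share of results that were repeats, as a percentage of the total."""
    return 100.0 * repeated / total


# Counts as quoted in the thread.
print(f"anandtech:    {percent_repeated(188799, 567):.3f}% bad")
print(f"Terrene Bell: {percent_repeated(24420, 21689):.1f}% bad")
```

That works out to about 0.300% for anandtech against roughly 88.8% for the merged Terrene Bell files.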

HB Pencils, also sold as "Moron's Choice" Graphite Cigars.
[TA]BlackMountainCow
2004-02-20 09:00:35
Hi!

It's very good that u made DPAD waterproof now.  I don't even think that the double results within [TA] were from pasting team sample files wrongly as afaik we've never done that in the team.  Are u going to scan all teams and single crunchers and remove any doublets, so that we have real and rock-solid stats?


Cheers


Christian Diepold
Stephen Brooks
2004-02-20 09:46:33
The stats program now removes duplicates from within a single user's account as they are received, so that form of cheating is ruled out.
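
As a rough illustration of rejecting duplicates on arrival (a sketch only, not the actual stats program): keep a set of fingerprints per user and refuse anything that user has already sent. The class name `ResultStore`, the `submit` method and the choice of SHA-256 as the fingerprint are all assumptions for the example.

```python
import hashlib
from collections import defaultdict


class ResultStore:
    """Hypothetical per-user store that refuses repeated results."""

    def __init__(self) -> None:
        # user name -> set of digests of results already accepted
        self._seen = defaultdict(set)

    def submit(self, user: str, record: str) -> bool:
        """Return True if the record is new for this user, False if rejected."""
        digest = hashlib.sha256(record.encode("utf-8")).hexdigest()
        if digest in self._seen[user]:
            return False          # same user already sent this result
        self._seen[user].add(digest)
        return True


store = ResultStore()
print(store.submit("alice", "result-1"))  # first time: accepted
print(store.submit("alice", "result-1"))  # repeat from same user: rejected
print(store.submit("bob", "result-1"))    # different user: not caught here
```

The last call shows the limit of this approach: a per-user check says nothing about the same result arriving under a different name.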

I would like to be able to check for duplicates between users automatically too - there are a few ways to do this, but each would take a bit of a while to implement properly (altering the database).  What I'm planning, for now, is to do these "spot checks" on teams from time to time.

HB Pencils, also sold as "Moron's Choice" Graphite Cigars.
[TA]Overkiller
2004-02-20 11:50:26
Thanks for all your excellent work Stephen!!  :)
Jetlag
2004-02-22 15:15:38
BMC asked a question which is of more than a little interest to some of us, and which thus far remains unanswered.  Are you going to scan ALL participants in DPAD in order to "level the playing field," or let it go with your scan of [TA] and [Terrene Bell]?
DanC
2004-02-22 19:04:47
I hope I'm not "speaking out of turn" - but I do believe that Stephen has applied the dupe checker against the entire database at this juncture.

If I'm wrong - then mea culpa - but I could swear I recall his saying this was the case.
Stephen Brooks
2004-02-23 03:08:42
quote:
Originally posted by Kokomo:
BMC asked a question which is of more than a little interest to some of us, and which thus far remains unanswered.  Are you going to scan ALL participants in DPAD in order to "level the playing field," or let it go with your scan of [TA] and [Terrene Bell]?
I've scanned _everyone_ for individual-user duplicates, and this will continue to be the case as the program is scanning every result added.  The other (weaker) sort of cheating is submitting the same result under different names - this can only benefit teams of people.  So I will occasionally do spot-checks on teams for that ([TA] and [Terrene Bell] have been done above).
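
One possible shape for such a team spot-check, sketched under assumptions (the record format, the function name `shared_fraction_by_pair` and the "fraction of the smaller file" measure are all illustrative choices, not how the stats database does it): compare every pair of users and list the pairs whose results overlap the most.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def shared_fraction_by_pair(
    files: Dict[str, Iterable[str]],
) -> List[Tuple[str, str, float]]:
    """For each pair of users, report what fraction of the smaller
    user's distinct results also appears in the other user's file."""
    sets = {user: set(records) for user, records in files.items()}
    report = []
    for a, b in combinations(sorted(sets), 2):
        smaller = min(len(sets[a]), len(sets[b]))
        if smaller == 0:
            continue              # nothing to compare for an empty file
        overlap = len(sets[a] & sets[b]) / smaller
        report.append((a, b, overlap))
    # Most suspicious pairs first.
    report.sort(key=lambda row: row[2], reverse=True)
    return report


# Toy data, not real results.
files = {"x": ["r1", "r2"], "y": ["r1", "r2"], "z": ["r3"]}
for a, b, frac in shared_fraction_by_pair(files):
    print(a, b, f"{frac:.0%}")
```

A pair scoring near 100% would be the "essentially an exact copy" situation; comparing all pairs gets slow for big teams, so this only suits occasional checks.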

HB Pencils, also sold as "Moron's Choice" Graphite Cigars.
amdxborg
2004-02-23 09:34:42
OK, how about just rejecting the duplicates (when I started the project, I thought that this was already being done).  That way someone can send duplicates like mad and not gain any points for it.

______________________________

AMD Athlon XP2600+, Soltek SL-75FRN2-L, Corsair 512MB XMS PC3500, TrueControl 550

[TA]amd.borg
Proud member of [TA] SETI@home & DPAD

[This message was edited by amdxborg on 2004-Feb-23 at 19:16.]
Stephen Brooks
2004-02-23 12:57:11
quote:
Originally posted by amdxborg:
OK, how about just rejecting the duplicates
Yes, that's what is done, for a single user sending duplicates of something they have already sent.

HB Pencils, also sold as "Moron's Choice" Graphite Cigars.
Boots
2004-02-24 06:41:59
Hi Stephen, I have just sent some more processed data, and then had a look at the team stats.  I see that all members had a negative upload of data last week (including myself).  I just do not understand what is going on.  Is this due to your duplicate-removal program?
Boots

Life the Universe and Everything!
amdxborg
2004-02-24 10:54:44
quote:
Originally posted by Boots:
Hi Stephen, I have just sent some more processed data, and then had a look at the team stats.  I see that all members had a negative upload of data last week (including myself).  I just do not understand what is going on.  Is this due to your duplicate-removal program?
Boots

Life the Universe and Everything!


This is what you want to know.  ;)

______________________________

AMD Athlon XP2600+, Soltek SL-75FRN2-L, Corsair 512MB XMS PC3500, TrueControl 550

[TA]amd.borg
Proud member of [TA] SETI@home & DPAD