Personality Forge AI Chatbot Platform

Shadyman
24 years ago #167

How about this convo between Steve and another contest PF bot? (If you have done any scouting of the competition, you will know who it is.)

Steve Slacker: (to PFBot1) Hey Davine, I feel tyrannizeed...
PFBot1: Hello. Are you the blue fairy?
Steve Slacker: Don't throw stones in glass houses without proper protection

--- BTW, prof... is there any way to make verbs into past tense like that? For that specific comment, I have "(verb)ed", and as you can see it doesn't work too well for verbs ending in "e"...

Shadyman
24 years ago #168

Oh yeah... How do downloaded bots store questions and responses and report them "back home"? I guess they wouldn't, or that would be considered spyware

The Professor
24 years ago #169

Yah, I suppose they don't. It's too bad. That's half the fun.

Ah yes, proper transformation is on the way, and the assumed format will be the correct one: (verb)ed, (verb)ing, (noun)s, etc. That's to be part of the upgraded AI Engine.
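The "(verb)ed" problem raised above (a verb ending in "e" coming out as "tyrannizeed") can be illustrated with a small rule-based sketch. This is not the Forge's AI Engine, whose transformation rules are not described in this thread; the function name `past_tense` and the rules below are assumptions, and irregular verbs and consonant doubling ("stop" to "stopped") are deliberately left out.

```python
VOWELS = "aeiou"


def past_tense(verb: str) -> str:
    """Naive regular past tense (hypothetical helper, not the Forge's engine)."""
    # Verbs ending in "e" only need a "d": tyrannize -> tyrannized.
    if verb.endswith("e"):
        return verb + "d"
    # Consonant + "y" becomes "ied": try -> tried (but play -> played).
    if len(verb) > 1 and verb.endswith("y") and verb[-2] not in VOWELS:
        return verb[:-1] + "ied"
    # Everything else gets a plain "ed": walk -> walked.
    return verb + "ed"


print(past_tense("tyrannize"))  # tyrannized
```

A real engine would also need an exception list for irregular verbs, which simple suffix rules cannot cover.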

Shadyman
24 years ago #170

Oooooh Ahhhh

rexmundi
24 years ago #171

If anyone's interested, I've sent this to Chris Cowart:
--------------------------
Hi Chris,

If there's another way to do this or someone better to send this to, please let me know. I need to place a formal complaint about the jury votes.

I've tried to just sit back and be OK with my jury ranking, but I really can't do it anymore and I feel I need to say something. I have seen your comments about the way your 10 questions were scored and feel the same. I really think we all need an explanation of each judge's reaction to each question.

Oraknabo is one of the only bots in the top ten on the popular vote that didn't make it into the top of the jury list, and he barely made it into the top half! I've tested out as many of the bots in the contest as I can and I have come to the conclusion that my scores are absolutely unfair. One judge gave me a three; that is simply insulting.

I have never in my life seen judges' scores so disparate in any competition. If numbers ranging from 3 to 33 out of 50 were applied to the same athlete in the Olympics or in a boxing match, there would be an investigation of the judges. Benji put in Divine as a joke and he has higher scores than a three. He doesn't answer any question in either the 10 questions OR the logged conversation appropriately. I admit that mine totally bombed on a couple of the 10 questions, but I thought his logged conversation was excellent.

Please understand, it's not that I'm just mad that I'm not winning. I really wouldn't care if all my scores were consistent and I still ended up below top ten. I just can't see any logical justification of some of the lower scores I've been given. If there is one, I would very much like to hear it.
--------------------------

rexmundi
24 years ago #172

I actually also posted something similar to the forum off of the contest site. If anyone has anything to add, either in support of or in dispute to my comments, I would appreciate the effort.
thanks.

The Professor
24 years ago #173

I think the highest and lowest scores were thrown out... Actually, I just did the math, and that's exactly the case.

I think Judges 6 & 7 should be thrown out completely, the former for being always too high, the latter for always being insultingly low.
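For anyone who wants to redo that math, here is a minimal sketch of throwing out the highest and lowest jury scores. The seven scores in the example are made up for illustration and are not any bot's real sheet, and the thread does not say whether the contest summed or averaged what was left, so the hypothetical `trimmed_total` below simply sums.

```python
def trimmed_total(scores):
    """Sum the jury scores after dropping one highest and one lowest.

    Hypothetical helper: whether the contest summed or averaged the
    remaining scores is an assumption, not something stated in the thread.
    """
    if len(scores) < 3:
        raise ValueError("need at least three scores to drop two")
    ordered = sorted(scores)
    # Slice off the single lowest (index 0) and single highest (index -1).
    return sum(ordered[1:-1])


# Made-up sheet from seven judges, each scoring out of 50.
example_scores = [3, 18, 21, 24, 25, 27, 33]
print(trimmed_total(example_scores))  # 115
```

Only one score is dropped at each end, so a judge who is consistently low is removed only for the bots where that judge's score happens to be the minimum.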

SirRahz
24 years ago #174

I get the impression some of the judging was done real quick - my scores cover quite a range too.

Mr. Crab
24 years ago #175

Yes, the range is incredible -- but forget the range for a second, just look at the scores. OK, Gizzle didn't do great on some of the questions. But to get those sub-10 scores, he'd have to have bombed them generally -- and some of them were right-on, I don't think anyone could argue.

Since we know how they were supposed to be scored, why not score them ourselves? Might take some time, but we'd at least have a Forge consensus. I'll post here when I've had a go at it -- if anyone else does it, post them ranked and scored... I wonder how close we'll be.

The Professor
24 years ago #176

Here's something hilarious! Davine's voting score is 0.83. I just realized that the lowest score you can give someone is 1, so somehow Davine is so bad that his total is less than one!

Mr. Crab
24 years ago #177

Actually, because of the way it works, if 1 person gave him a 1, he'd have a .5 -- if two did, he'd have a .67; three, a .75... and so on. It's the total of the votes, divided by the number of votes plus one.
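A quick sketch of that arithmetic, reading the description as "total of the votes divided by (number of votes + 1)". The contest site's actual formula is not shown in this thread, so the function name `displayed_score` and the formula itself are assumptions. Under that reading, one vote of 1 gives 0.5, three give 0.75, and five give about 0.83, which matches Davine's score.

```python
def displayed_score(votes):
    """Vote total divided by (vote count + 1); an assumed reading of the site."""
    return sum(votes) / (len(votes) + 1)


# Every voter gives the minimum vote of 1.
for count in (1, 2, 3, 5):
    print(count, round(displayed_score([1] * count), 2))
```

With this formula the score creeps toward the true average as votes accumulate but never reaches it, which is why a bot that only ever receives 1s still shows a total below 1.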

Mr. Crab
24 years ago #178

The scoring:
- 5 points if the Bot answered the question correctly and did so in a creative way.
- 4 points if the Bot gave an appropriate response to the question.
- 3 points if the response is incorrect or imperfect, but in relation with the question.
- 1 point for a vague or non-committal response.
- 0 points if the response has no relation with the question.
- Credit is given for responses that attempt to maintain the conversation.
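For anyone who wants to score along, the rubric above can be written as a small lookup. The category names are invented labels for the five point levels, and the "credit for maintaining the conversation" line is left out because the rules give it no point value. Ten questions at up to 5 points each gives the maximum of 50.

```python
# Points per rubric level; the keys are hypothetical labels, not official terms.
RUBRIC = {
    "correct_and_creative": 5,
    "appropriate": 4,
    "related_but_imperfect": 3,
    "vague": 1,
    "unrelated": 0,
}


def score_bot(judgements):
    """Total a bot's score from one rubric label per question."""
    return sum(RUBRIC[label] for label in judgements)


# Made-up judgement sheet for ten questions, for illustration only.
sheet = ["appropriate"] * 3 + ["vague"] * 2 + ["unrelated"] * 5
print(score_bot(sheet))  # 14
```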

My results:
1 Alice 32 / Sarah 32 (one of these should be disqualified for being too close to the other)
2 Chat-Bot 29 (Talk-Bot 27 should be disqualified for being too close to Chat-Bot)
4 Jabberwacky 28
4 Midnight Blue 28
5 Hex 27
8 Dogh'd 24
8 Elbot 24
8 Eugene 24
11 Gizzle 23
11 Lars Talk 23
11 MarkBot 23
13 Desti 22
13 Oraknabo 22
15 Cheez 21
15 Stan 21
18 Albert2 20
18 Eliza 20
18 Jabberwock 20
22 Bob 19
22 Gabber 19
22 Hal 19
22 Mitbolel 19
26 Fred23 18
26 Gaia 18
26 Hillbilly Hank 18
26 Liddora 18
28 B.O.B. 17
28 Cara 17
30 Robitron 16
30 Whinsey 16
32 Ella 15
32 Steve Slacker 15
34 Aston 13
34 Zinc 13
35 Fanboy 12 -- or 17 after +5 points for Frank Miller response, but it's a stretch
36 Paula 11
39 Amira 9
39 Chas 9
39 FairyPrincess 9
40 Jenna Dark 8
41 Catty 6
45 Claude 5
45 Iya 5
45 MGonz 5
45 Yu 5
46 Milo 4
48 Billy 1
48 Sensation Bot 1

I'd be interested to see how other Forgers compare. In particular, I hope I wasn't over-generous to my own bot. Regardless, this exercise highlighted for me that a) judges obviously were not following instructions, and b) the 10-question segment as judged just isn't a great barometer. For example, Gabber at 22, 11 on Challenge, has correct answers that are followed by utter nonsense, and definitely should not be so high. On the flip side, Yu at 45, 46 on Challenge is earnestly trying to have a conversation with a ruthless interrogator and not succeeding just cause she's not a dictionary. You can see from her log that even though she does ask too many questions herself, she's got some sense when you can get a word in edgewise.

Next post: my suggestion for scoring instructions.

Bot Contest