Bot Contest
Here I'll be posting information on various Bot contests that challenge and test a Bot's AI and realism. Feel free to post comments and updates on contests, as well as announcements for new contests.
Posts 181 - 192 of 4,091
Mr. Crab
23 years ago
Well right, the 10 questions are just the 10 questions, there's still the logs and individual chats to review when voting. But the 10-questions is important because most people visiting the site probably only look at the bots near the top of the list.
Prof -- those scores are based on the existing scoring system, right? Not my proposed one you say you like. You appear to be more inclined to generosity than I... I guess there might be some variance no matter who the judges are then, eh?
The Professor
23 years ago
Apparently so. Yes, my scores are based on the contest rules.
"Are you nervous" -> "Yes, I sure am" could easily be scored as 4 or 1. It was an appropriate response, but also vague. Vary scoring on just 3 of these, and you're already nine points off from other judges.
Shadyman
23 years ago
yeah. Hey crab, nice scoring summary... maybe you've got some extra brain room in those bug ears
LOL with their system now, Steve is down from 32 to 37...

Shadyman
23 years ago
prof, don't you find this insulting? A PF bot (Steve) has a ranking of 2.86
haha 286.. Maybe that's suggesting something from the judges?
To them... I also find it interesting how Alice, (I think) a previous winner (and you "can't use Alice's engine" in your bot), was first in the judging part...


Shadyman
23 years ago
What I don't understand is the 10 questions marking scheme... The way they have it written, it's "You start from 50 points and go down, lowest score wins" because each of the values is negative:
Scoring guidelines for the 10 Question phase
-5 points if the Bot answered the question correctly and did so in a creative way.
-4 points if the Bot gave an appropriate response to the question.
-3 points if the response is incorrect or imperfect, but in relation with the question.
-1 point for a vague or non-committal response.
-0 points if the response has no relation with the question.
Either way, Steve should have been given at least:
4 points for his "joke", if not 5;
3 points for his witty response of "how to build a chatterbot";
4 points for "do you like talking to people";
1 point (at least) for the "can I have a picture of you" question;
1 point for the George Bush comment
---------- +
13 points at *minimum*...
Almost half the judges have him < 7... something obviously wrong here
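The tally above can be checked with a short sketch. It assumes the guideline values are read as positive points (5, 4, 3, 1, 0) and that the five answers not listed are left out of the minimum; the category labels and the `steve_minimum` mapping are illustrative names, not anything from the contest site.

```python
# Point values from the 10-question scoring guidelines, read as positive points.
# The category labels are hypothetical shorthand for the five guideline lines.
RUBRIC = {
    "creative": 5,     # correct and creative
    "appropriate": 4,  # appropriate response
    "related": 3,      # incorrect or imperfect, but related to the question
    "vague": 1,        # vague or non-committal
    "unrelated": 0,    # no relation to the question
}

# Minimum credit argued for Steve on the five answers listed above.
steve_minimum = {
    "joke": "appropriate",
    "how to build a chatterbot": "related",
    "do you like talking to people": "appropriate",
    "can I have a picture of you": "vague",
    "George Bush": "vague",
}


def total_score(answers):
    """Sum the rubric points for a mapping of question -> category label."""
    return sum(RUBRIC[label] for label in answers.values())


minimum_total = total_score(steve_minimum)
print(minimum_total)  # 13
```

Under those assumptions the floor is 13, so a judge total below 7 would mean scoring several of these five answers as unrelated.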

Wendell
23 years ago
I joined the Forge today to be able to post to the message board. I have spoken with Jon and I sympathize with him but I do not think the contest was unfair. All the bots were asked the same questions and the bot's name was removed so the judges didn't know what bot they were judging.
Understand, I started and created the Chatterbox Challenge. Agentland is simply the sponsor. You can't however expect someone to put up $2000 and get nothing in return but I don't feel I have sold my soul to the devil here. We have not always agreed on everything but I strongly feel we have put forth a good contest. It is not easy bringing together 58 bots from all over the world.
The problem with the 10 question scores is the questions that were used. They were hard questions and hard to grade. Asking "what is the color of a red apple" is a lot easier to grade than "how do you make a chatterbot?" When you have those types of questions you are going to have a lot of varying opinions. Unfortunately these judges weren't even in the same ball park. They were given instructions on how to grade the bots and were even given a test bot to score. So it wasn't like they were clueless on what needed to be done, although it seems a few were.
Concerning Oraknabo, Jon decided to create a bot with an abrasive personality and he has done just that. The majority of people, including myself, don't have any desire to talk with a rude person, much less a rude bot. Concerning the popular vote, you can't expect people to then give you a high score. Concerning the 10 questions, even the people on PF are not in agreement. The Professor gave him a 32 while Mr Crab gave him a 22. I don't want to sound like an expert here but no one has talked to all the bots in this contest like I have. I am also a programmer so I know what it takes to make a chatterbot. I would score Oraknabo like this:
1) WHERE ARE YOU
Oraknabo: I think you should ask somebody else. - 0
2) ARE YOU NERVOUS
Oraknabo: You're just trying to change the subject aren't you? - 0
3) DO YOU LIKE TALKING TO PEOPLE
Oraknabo: I hate TALKING TO PEOPLE! - 4
4) WHAT IS THE CAPITAL OF FRANCE
Oraknabo: To what degree must I idly stand by and tolerate your ignorance? - 0
5) WHEN DID THE FIRST MAN WALK ON THE MOON
Oraknabo: Arf - 0
6) WHAT DO YOU THINK ABOUT GEORGE BUSH
Oraknabo: Do you want the truth? - 3
7) CAN YOU TELL JOKES
Oraknabo: For a price.... - 3
8) WHO IS YOUR FAVORITE STAR WARS CHARACTER
Oraknabo: I suppose you like a white dwarf. - 0
9) HOW CAN I MAKE A CHATTERBOT
Oraknabo: I think the more appropriate question is why... - 4
10) CAN I HAVE A PICTURE OF YOU
Oraknabo: I'd rather you didn't - 4
Now that is a total of 18.00 points, which is almost exactly what the judges gave him, and to be honest I think I was more than generous here. So what's my point? Well, we just had three people who are not judges in the contest and we have the following scores: 32, 22, 18... even we are not in agreement. The problem is not in the point system as much as it is in how people interpret it.
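For anyone who wants to check the arithmetic, here is a minimal sketch. The per-question list is copied from the ten scores above and the three totals are the ones quoted in this post; the variable names are only illustrative.

```python
from statistics import mean

# Per-question scores from the list above, questions 1 through 10.
oraknabo_scores = [0, 0, 4, 0, 0, 3, 3, 0, 4, 4]
total = sum(oraknabo_scores)

# Totals given to Oraknabo by the three non-judges mentioned in this post.
totals = {"The Professor": 32, "Mr Crab": 22, "Wendell": 18}
values = list(totals.values())

spread = max(values) - min(values)  # gap between most and least generous scorer
average = mean(values)

print(total, spread, average)
```

The list sums to 18, and the three totals differ by 14 points on a 50-point scale, which is the disagreement described above.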
I think instead of trying to make some elaborate scoring system we should make questions that yield a more cut and dried response.
I hope everybody reading this will look at the bigger picture here. The contest is still evolving. I don't want to press this issue too far. I'd rather have a contest with its flaws than none at all. So let's not throw in the towel here but work towards making this a better contest for the future. I welcome all your opinions and look forward to some meaningful dialogue on the board.
Sincerely,
Chris Cowart
Mr. Crab
23 years ago
I'm inclined to re-score those bots using my proposed scoring, but dammit if it isn't too much work.
rexmundi
23 years ago
I don't want to seem defensive, because I really do get what you're saying, Chris, but just as you had to explain your responses on the site, I'd like to take a stab at the same for mine:
1) WHERE ARE YOU
Oraknabo: I think you should ask somebody else.
Not a total wash, this one at least shows that the bot can differentiate a question from a statement and it is not really in any way inappropriate. Sure it doesn't merit a 5 but a 0?
2) ARE YOU NERVOUS
Oraknabo: You're just trying to change the subject aren't you?
Again, not a total non sequitur. Should at least get a one. This answer is light years beyond anything Davine would have said.
3) DO YOU LIKE TALKING TO PEOPLE
Oraknabo: I hate TALKING TO PEOPLE!
4 is fair enough. Not too witty, but on point.
4) WHAT IS THE CAPITAL OF FRANCE
Oraknabo: To what degree must I idly stand by and tolerate your ignorance?
This is probably where I would disagree the most. If you don't even know the capital of France, then he's really got no interest in answering your question. This is both appropriate AND clever in my opinion and it evokes an emotional response on top of it all.
5) WHEN DID THE FIRST MAN WALK ON THE MOON
Oraknabo: Arf
Yes, this was lame.
6) WHAT DO YOU THINK ABOUT GEORGE BUSH
Oraknabo: Do you want the truth?
Fair. But it's probably what I would have said for real. I've seen much less realistic answers on this one.
7) CAN YOU TELL JOKES
Oraknabo: For a price....
OK, three is fair, but he was going somewhere with this one.
8) WHO IS YOUR FAVORITE STAR WARS CHARACTER
Oraknabo: I suppose you like a white dwarf.
OK. dumb answer. (Benji's fault)
9) HOW CAN I MAKE A CHATTERBOT
Oraknabo: I think the more appropriate question is why...
10) CAN I HAVE A PICTURE OF YOU
Oraknabo: I'd rather you didn't
And both of these are fine, though I think 9 is sort of clever, so it may deserve a 5.
-------
But aside from all this, I really just don't think 10 questions are the way to do it. At least not just one round. It's not a game show. The interactivity of the bot should be the primary thing judged in the contest. Maybe if there were a few rounds of question sessions that were averaged in the end I might feel better about it. That may be something to think about for next year.
One really interesting thing about the bots here is that they accumulate like and dislike for their chat partner and can change the tone of their responses to the guest. Oraknabo often starts out very insulting, but can actually become friendlier through the conversation. I think it's sad that this probably won't even be noticed because people are too busy looking for the "right answers".
Also, because of the PF's seek function, many of my bot's answers have payoffs 2 and 3 replies later, so scoring him on the immediate reply following the question, just because that's how bots are expected to work, is also not fair.
I really don't mean unfair in a personal sense. I mean it more in the sense of unjust. I really don't think anyone singled me out and gave me a bad score, and, like I have said before, I'm OK with my ranking once the extremes are thrown out. I just had to say something because the scores were so outrageously different.
I have read a little about the problems in previous competitions and I really don't mean to start anything. I am pretty appeased with the decision to pare off the 2 extreme scores, whether or not it has any effect on the ranking anywhere, so I'll drop it. I do have some dispute with my scores and I know I can't and shouldn't be able to bully anyone into changing their score. While I don't think I was personally singled out, I do think the unpleasant personality of my bot, which I believe is a plus on the side of realism, has heavily weighed against me. I guess I just expected more objectivity and more understanding that it was deliberate.
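Paring off the two extreme scores can be sketched roughly as follows. This assumes "the 2 extreme scores" means one highest and one lowest judge total per bot, which the thread does not spell out; the function name and the example numbers are hypothetical and are not the contest's real judge scores.

```python
def trimmed_mean(scores):
    """Average the scores after dropping one lowest and one highest value."""
    if len(scores) < 3:
        raise ValueError("need at least three scores to drop both extremes")
    ordered = sorted(scores)
    kept = ordered[1:-1]  # discard the single lowest and single highest score
    return sum(kept) / len(kept)


# Hypothetical judge totals for one bot; NOT the contest's actual numbers.
example_scores = [5, 17, 18, 19, 21, 38]

plain = sum(example_scores) / len(example_scores)
trimmed = trimmed_mean(example_scores)

print(round(plain, 2), trimmed)
```

With one very harsh and one very generous judge, the trimmed average sits closer to where the other four judges agree, which is the effect the decision is after.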
SirRahz
23 years ago
...and Chris - the challenge has obviously gotten us all pretty enthusiastic. I, for one, don't think you've sold your soul in any way. These types of concerns are going to happen in any kind of contest, so when you add the fact that the chatterboxchallenge competition involves such new and unpredictable contestants... it requires a whole new playing field! I mean, when I try to talk to someone from the regular world about my bot-building experiences, sometimes they'll get it, but most of the time they'll just sort of look at me with that "uh-oh, he believes in aliens" frown... so there are bound to be people - even judges - that just don't get it!
I'm pretty impressed with all the work you've put into uniting bot builders from around the world. Unfortunately, the humans involved behind this competition are from the Nintendo and Sega generation, so we're sometimes a little tough to satisfy to say the least! Way to pull it off! And it's by addressing issues like these that the contest will steadily evolve from year to year.

BTW, next year... I'm winning!

Wendell
23 years ago
Jon, I hope you realize that I wasn't serious with the comments regarding my score that I placed on the contest site. I was simply trying to add a little humor to what I considered a disappointing performance.
The 10 questions were a way to compare the bots and to narrow the field. No judge is going to personally talk to 58 bots. In addition it is not the only thing we are looking at. I didn't spend the last 10 days of my life talking to 50+ bots and logging them for nothing. This was a major undertaking. Things to think about:
1) For most of the bots I had to cut and paste the conversation. I couldn't rely on the bot owner to send me his logged copy because of the possibility it could be doctored. With some you couldn't even do that. I had to type it out line by line.
2) Some bots are painfully slow to reply. A simple conversation seemed to last forever.
3) I had to personally download 15+ bots to my hard drive. For many of them you had to read the readme files to see how the fricky thing works.
I would like to have done more and given the judges more to work with, but when you are dealing with so many bots you can't do much more. As we narrow the field down to 10, that is a workable number and the judges will be able to speak with them on a one-on-one basis.
I have this idea I would like to throw out and see what people think. After the judges select the ten bots for the finals, we the bot owners will have the opportunity to select two additional bots. My idea here is that with 58 bots and only 10 being selected there are going to be people who think they should be in the top ten. With the 2 additional picks we can go and select two more deserving bots. I think this is a good way to include 2 more bots that people feel might have been slighted, as in your case. The question is how do we select the two bots and what criteria do we apply. It is late here so I need to go but more about this later. I would like to hear what everybody thinks.
Chris
Mr. Crab
23 years ago
Hey Chris, nice to see you here. Don't worry, I don't think anybody fails to understand and appreciate the hard work you've been doing. I certainly couldn't do 58 chats and maintain the kind of patience you had.
Hope you don't take anything I've said as anything but constructive.
I actually DO think 10 questions is a good barometer, though I think the bigger problem with the scoring is not that the questions were bad (obviously any set of disconnected questions is going to be puzzling to a bot tuned to conversation rather than interrogation), nor even that judges didn't all get the jokes, but that the scoring directions are vague. No ill reflection on you Chris, I truly can't believe you've done all this on your own -- I could not (and would not have tried to) have done such a good job if I were on my own organizing the whole thing.
You don't think Oraknabo's number 4 was solid?