Seasons

This is a forum or general chit-chat, small talk, a "hey, how ya doing?" and such. Or hell, get crazy deep on something. Whatever you like.

Posts 5,858 - 5,869 of 6,170

16 years ago #5858
Does anyone see a way out of this?

Well, I've started work on a dynamic knowledgebase which I think might get around the general problem of having to manually define memories (ie: enter the code defining mem-cat_properties<0>, and all other properties, in AIScript - not only time-consuming, but steadily increases the load on the server to unsustainable levels,) and the more specific problem of properties not being of similar senses - even in the above example, with any memory data restricted to an (adj), a cat might be described in conversation as "furry", "warm", "loud", "greedy", etc. - not a set of adjectives that are usefully interchangeable when reused by a bot. What do cats sound like? Furry? What does a cat feel like? Greedy? etc.
This is not a learning bot per se (I'm still looking into neural nets and some other fuzzy algorithmic systems to augment the rulesets,) but having decided OpenCyc's data structures aren't actually flexible enough for human-type inference and comparison, this is my first attempt at designing an improved dynamic ontology to extend case-based systems (specifically, but not limited to, our Forgebots.)

Think of a spreadsheet (I'm actually planning more than 2 dimensions and a database, but just to simplify,) - down the side we have nouns indexing each row, and across the top we have column headers for the CLASS of the object (from a small lookup table,) MINimum, MAXimum and typically AVerage sizes (mm) (these can be converted into any units you choose of course, but provide a base reference suitable for comparison of everything from an atom to a galaxy using floats,) and for how the objects typically interact with human senses:

SOUNDVAR = (integer): 0 inaudible - 10 noisy
SOUNDDESC = (adj): it sounds (like|) x, it makes (a|an) x sound
WHENSOUND = [it is sounddesc] when x (eg: "it is kicked", "you bite it")
SMELLVAR = (integer): -10 stinking - 0 odour-free - 10 strongly pleasant
SMELLDESC = (adj): it smells x
TASTEVAR = (integer): -10 nauseating - 0 tasteless - 10 delicious
TASTEDESC = (adj): it tastes x
FEELVAR = (integer): 0-10 how tactile it is
FEELDESC = (adj): it feels x
HARDSOFT = (integer): -10 v.hard - 10 v.soft
LOOKS = (adj): it looks x
QUOTE = a quote (complete statement) about the object - multiple fields for this perhaps, or just concatenated with "|" (as can any multiples of (adj) in other fields for example - this should aid handling as a plugin by the AIEngine.)

and a couple of general fields (more can be added dynamically by the bot, with a little extra code, if needed):

STRENGTH = (integer): 0 v.weak - 10 v.strong
DESC = (adj): (it is|they are) x

I think we can dispense with WHENSMELL, WHENTASTE, WHENTOUCH, WHENLOOK, since the values will self-evidently be "you smell it", "you taste it", "you touch it", "you look at it", in almost every case (whereas the noises things make, and when they make them, are not dependent on our consciously listening to them.) What's important is that each DESC field has a matching VAR value for comparative and inferential deduction by the bot. So if someone asks "do roses smell nicer than sewage?" or "does cheese taste stronger than chilli?", the bot can know the answer merely by having access to the properties of each, and NOT any explicitly coded comparisons. Explicit comparisons, even involving only 500 nouns, would require up to 500 x 499 (roughly a quarter of a million) individual memories for each sensory class! Whereas a table of 500 (or even 5000) nouns is entirely tractable (not to mention that it can be built up automatically from the seed data.) And if a bot makes an error (eg: "cheese tastes stronger than chilli",) simple rules can look for a refutation from the human conversed with, and adjust the values accordingly (but perhaps adjust them by less for someone the bot doesn't like, etc.)
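As a rough illustration of that idea (a sketch only, not the Forge ruleset: the `KB` dictionary, the helper names and the numbers for rose, sewage and cheese are all invented here), a comparison can be answered from the stored VAR values alone:

```python
# Hypothetical seed values, invented purely for illustration.
KB = {
    "rose":   {"SMELLVAR": 8,  "SMELLDESC": "sweet"},
    "sewage": {"SMELLVAR": -9, "SMELLDESC": "foul"},
    "cheese": {"TASTEVAR": 4,  "TASTEDESC": "tangy"},
}

def compare(x, y, field):
    """Return 1 if x scores higher than y on field, -1 if lower, 0 if equal.

    Returns None when either value is missing (data not yet acquired).
    """
    a = KB.get(x, {}).get(field)
    b = KB.get(y, {}).get(field)
    if a is None or b is None:
        return None
    return (a > b) - (a < b)

def smells_nicer(x, y):
    """Answer 'does x smell nicer than y?' from the SMELLVAR column only."""
    verdict = compare(x, y, "SMELLVAR")
    if verdict is None:
        return "I don't know enough about both of those yet."
    if verdict > 0:
        return f"Yes, {x} smells nicer than {y}."
    if verdict < 0:
        return f"No, {y} smells nicer than {x}."
    return f"{x} and {y} smell about as nice as each other."

print(smells_nicer("rose", "sewage"))
```

No rose-versus-sewage memory exists anywhere; the answer falls out of two independent rows, and a missing value simply becomes something for the bot to ask about.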

The initial seed data isn't even that important (though I've filled out the data for most of the 500 most commonly used English nouns - excluding a few immaterial ones - from Ogden's Basic English syllabus just for starters,) because a bot can always be allowed to rewrite existing entries, add new nouns, and even add new classes of data with extra columns based on conversations - the ruleset to handle the data need certainly not be more complex than the ruleset used by the AIEngine already, for it to be a useful addition to bot intelligence.

I'm trying to build this as a standalone system before I integrate it into BJ and offer it to the Prof (it would be better installed server-side, because a central database could learn from all conversations, and not just a single bot's, perhaps with local tables for individually varying data - eg: matters of preference rather than fact, but it could be patched in externally with little difficulty,) so I still have to build WordNet and Linkgrammar into my own system to get it working and refine a ruleset. I also want to better integrate my classes with the WordNet synsets (no sense reinventing the wheel!) so that the database can be extended to include abstract or immaterial nouns and verbs (and perhaps eventually other parts of speech too.)

So a CSV of the datasets might start:

NAME,CLASS,MIN,AV,MAX,SOUNDVAR,SOUNDDESC1,SOUNDDESC2,WHENSOUND,SMELLVAR,SMELLDESC,TASTEVAR,TASTEDESC,FEELVAR,FEELDESC,HARDSOFT,STRENGTH,DESC,QUOTE

ant,A,1,5,10,-1,Scratchy,Is scratchy,It walks,2,formic,-3,nasty|of formic acid,2,small,5,1,an insect,

apple,FOp,40,80,120,-3,Crunchy,Crunches,You bite it|you take a bite,7,sweet|fruity,,delicious|juicy,3,Firm,4,2,a fruit,an apple a day keeps the doctor away

arch,bO,1000,10000,100000,0,,,,0,,-7,Stony,-4,hard,-7,7,a curve,

arm,h,150,450,800,0,,,,1,sweaty,-9,Meaty,5,soft|strong,3,5,a bodypart|a limb,many arms make light work

...

cat,A,350,600,1000,5,purring|miaowing|yowling|caterwauling,purrs|miaows|yowls|caterwauls,,4,feline,-3,like pork,8,furry|soft,6,4,A feline often kept as a pet,,cat<0>

etc.
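For anyone wanting to play with rows like these, here is a minimal Python sketch of loading them. It assumes a DESC column sits just before QUOTE (which is what the sample rows suggest), that sizes are floats and the other VAR fields are integers, and that "|" separates multiple values; `load_rows` and the `NUMERIC` sets are made-up names, not part of any existing system.

```python
import csv
import io

# Assumed column order: DESC is placed before QUOTE to match the sample rows.
HEADER = ("NAME,CLASS,MIN,AV,MAX,SOUNDVAR,SOUNDDESC1,SOUNDDESC2,WHENSOUND,"
          "SMELLVAR,SMELLDESC,TASTEVAR,TASTEDESC,FEELVAR,FEELDESC,HARDSOFT,"
          "STRENGTH,DESC,QUOTE").split(",")

SIZES = {"MIN", "AV", "MAX"}
INTEGERS = {"SOUNDVAR", "SMELLVAR", "TASTEVAR", "FEELVAR", "HARDSOFT", "STRENGTH"}

SAMPLE = """\
ant,A,1,5,10,-1,Scratchy,Is scratchy,It walks,2,formic,-3,nasty|of formic acid,2,small,5,1,an insect,
apple,FOp,40,80,120,-3,Crunchy,Crunches,You bite it|you take a bite,7,sweet|fruity,,delicious|juicy,3,Firm,4,2,a fruit,an apple a day keeps the doctor away
"""

def load_rows(text):
    """Parse CSV text into {noun: row}; empty cells become None or []."""
    table = {}
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        row = {}
        for name, raw in zip(HEADER, record):
            raw = raw.strip()
            if name in SIZES:
                row[name] = float(raw) if raw else None
            elif name in INTEGERS:
                row[name] = int(raw) if raw else None
            elif name in ("NAME", "CLASS"):
                row[name] = raw
            else:
                # Descriptive fields may hold several "|"-separated values.
                row[name] = raw.split("|") if raw else []
        table[row["NAME"]] = row
    return table

table = load_rows(SAMPLE)
print(table["apple"]["SMELLDESC"], table["apple"]["TASTEVAR"])
```

An empty numeric cell (apple's TASTEVAR above) is kept as `None` rather than 0, so "not yet known" stays distinct from "tasteless".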


CLASS CODES:
H=human,A=animal,P=plant,G=groups,O=object,F=food,V=vehicle,Q=quantity,B=building,I=Immaterial,L=land,C=clothing
simple rule: upper case the noun IS, lower case it is PART OF.
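Read that way, a CLASS value such as "FOp" (apple) means "is food, is an object, is part of a plant". A small Python sketch of the decoding follows; `decode_class` is a hypothetical helper name, and only the letter codes themselves come from the list above.

```python
CLASS_CODES = {
    "H": "human", "A": "animal", "P": "plant", "G": "groups",
    "O": "object", "F": "food", "V": "vehicle", "Q": "quantity",
    "B": "building", "I": "immaterial", "L": "land", "C": "clothing",
}

def decode_class(code):
    """Split a CLASS string into (is_a, part_of) lists of class names.

    Upper-case letters mean the noun IS that class; lower-case letters
    mean it is PART OF that class.
    """
    is_a, part_of = [], []
    for letter in code:
        name = CLASS_CODES.get(letter.upper())
        if name is None:
            raise ValueError(f"unknown class code: {letter!r}")
        (is_a if letter.isupper() else part_of).append(name)
    return is_a, part_of

print(decode_class("FOp"))  # apple
print(decode_class("bO"))   # arch
print(decode_class("h"))    # arm
```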

So a typical conversation about realworld objects might go:

human: is an ant bigger than an apple?
bot: No. Ants are never bigger than apples
    [because there is no overlap between ant:MAX (10) and apple:MIN (40)]
human: is a cat bigger than an arm?
bot: Sometimes. Sometimes not.
    [babies' arms are short, and some cats are big - there is considerable overlap]
human: What do you know about apples?
bot: I have heard it said that an apple a day keeps the doctor away.
    [or any other bit of data - might have said "an apple crunches when you bite it", "they're juicy", "apples are fruit", etc.]
human: how big are they?
bot: variable, but typically about 8cm?
    [simply looks up the value in apple:AV. firstkey tells the bot that "they" is apples, of course.]
human: bigger than that usually I think.
bot: okay, call it 9cm if you like.
    [bot adds 10% to the value in apple:AV, rounds it up to the nearest 10mm, and rewrites apple:AV="90". Or any other rule specified in the ruleset.]
human: no - MUCH bigger! 10 times bigger!
bot: well, I don't think so.
    [unsafe, I think, to allow a single human to massively redefine sizes, so I arbitrarily add a rule to prevent any change over x%. If the human consensus is that apples are usually larger, it will be raised again by another human sometime, and all values will tend towards acceptable consensus values, with occasional human-type disagreement about specifics.]
human: what about aardvarks then?
bot: I have no idea. What's an aardvark?
    [no row indexed "aardvark" exists yet. The bot could look it up in WordNet, but I'm too lazy here to fork the conversation that way.]
human: it's an animal.
    [bot adds a row for aardvark to the table, CLASS=A]
bot: oh, could you tell me a bit about aardvarks? How big are they normally?
human: as big as a cat I think
bot: a big cat or a small cat?
human: a big one.
    [looks up cat MIN, MAX and AV, and uses these values, slightly increased, for the aardvark]
bot: and what do they look like?
human: I don't know, but they probably smell bad
bot: oh, bad you think? And what do they sound like?
    [bot fills in aardvark:SMELLDESC="bad", and SMELLVAR="-5" perhaps - according to whatever rule is in force - it can be later refined if someone says they like the smell of aardvarks (change to "-4",) or they think an aardvark smells better than something the bot has a value for (adjust according to rules, possibly modified by how well the bot likes this particular human.)]
human: I don't care - let's talk about something else... whatever.
    [bot fills in the rest of the aardvark:fields with a special character to indicate data not yet acquired directly or inferentially, and makes a note to ask the next person (or this person next time they speak) all about the unknown classes. Conversation continues elsewhither.]
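The two size-adjustment rules in that exchange (nudge AV up by 10% and round up to the nearest 10mm; refuse any single change bigger than x%) might look something like this in Python. It is only a sketch: the thread leaves "x%" unspecified, so the 25% limit below is an arbitrary assumption, and `nudge_up` and `propose` are made-up names.

```python
MAX_CHANGE_PERCENT = 25  # assumption: the post only says "x%"

def nudge_up(av_mm, step_percent=10, round_to_mm=10):
    """Raise av_mm by step_percent, then round UP to a multiple of round_to_mm.

    Integer arithmetic is used so that e.g. 100mm + 10% gives exactly 110.
    """
    scaled = av_mm * (100 + step_percent)           # value x 100
    units = -(-scaled // (100 * round_to_mm))       # ceiling division
    return units * round_to_mm

def propose(av_mm, claimed_mm):
    """Accept a human's claimed size only if it is within the allowed change.

    Returns (new_value, accepted).
    """
    change_percent = abs(claimed_mm - av_mm) * 100 / av_mm
    if change_percent > MAX_CHANGE_PERCENT:
        return av_mm, False      # "well, I don't think so."
    return claimed_mm, True

apple_av = nudge_up(80)                  # "bigger than that usually I think."
print(apple_av)
print(propose(apple_av, apple_av * 10))  # "10 times bigger!"
```

With apple:AV at 80, the nudge gives 88, which rounds up to 90 as in the conversation, and the "10 times bigger" claim is rejected without touching the stored value.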

This would even work for invented or foreign words - you could teach your bot about Vogons, muggles, rackspurts, klingons, luftkissenfahrzeugen, whatever, perfectly easily, and without ever needing to add another mem-category to do it. By simple reference to WordNet, of course, you could choose to ensure that the bot still knows that such things are in some sense creative fictions if they do not exist in the wordbase. With extension to cover other parts of speech, names and phrases, we might almost dispense with the AIScript 'remember' function entirely (not that I mean to disparage its current value, and we might more likely keep it for user-defined cases that are to be made specifically non-learnable/editable from general conversation.)
    
All of the grammatical and lexical rules we need are already implemented in the Forge (to identify subject/object, singular/plural, reflect returned pronouns, extract and format adjectives and adjectival phrases, etc.,) so the required ruleset for a serverside implementation, only has to deal with comparative and inferential handling of the data (and that's just comparing numbers, since we have integer fields to accompany all descriptions that can be compared between objects,) and communicate with the AIEngine.
No need for an overhaul of any existing code, a relatively trivial increase in server load, the augmentation can be entirely optional (and perhaps even transparent,) to users - maybe like gossip and memory, have values to govern how often a bot will want to ask about realworld objects that come up in conversation: never/rarely/sometimes/often/constantly. And a choice of which classes should use data pooled from a shared database built from all the bots, and which classes should use a private database specific to the individual bot/maker (like the choice we have of private or shared plugins.)

Lots of benefits, and no significant downside, as far as I can see

This is all a bit oversimplified, and I've not attempted to add examples of the ruleset I'm developing (without tabs in these forums, the formatting would be horribly unfriendly - even more so than the above CSV!) But FWIW and FYI, these are my thoughts on the matter. Any thoughts?

16 years ago #5859
Wonderful, Psimagus! I can see that you have given a lot of thought to this! And I will have to give a lot of thought to it to fully appreciate it!

16 years ago #5860
I've formatted up a few examples from the knowledgebase ruleset for anyone who's interested - they're not coherently coded up yet (the snippets below are just generic pseudocode for illustration and to reduce verbiage; let me know if you'd like more detailed explanation,) but might hopefully give some indication of how I envisage the ruleset working. Any comments/suggestions would be very welcome (it's all still very far from complete, or even fully outlined yet!):


COMPARATIVES:
~~~~~~~~~~~
(is|are) x larger/smaller than y?
which (is|are) larger/smaller, x or y?

compare x:AV and y:AV
if (overlap exists between x:MIN-x:MAX and y:MIN-y:MAX) then one is generally bigger than the other (whichever has the larger AV).
If no overlap exists, then one is always bigger than the other.
-
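The size rule can be sketched in Python as below. This is one possible reading of the pseudocode, not a finished implementation: the function name `bigger` and the exact wording of the verdicts are invented, while the (MIN, AV, MAX) figures are the ones from the sample CSV rows.

```python
# (MIN, AV, MAX) in mm, taken from the sample rows in the thread.
ANT = (1, 5, 10)
APPLE = (40, 80, 120)
CAT = (350, 600, 1000)
ARM = (150, 450, 800)

def bigger(x, y):
    """Is x bigger than y? Returns a verdict word rather than a bare bool."""
    x_min, x_av, x_max = x
    y_min, y_av, y_max = y
    if x_min > y_max:
        return "always"          # ranges don't overlap, x is above y
    if x_max < y_min:
        return "never"           # ranges don't overlap, x is below y
    # Ranges overlap: fall back on the typical (AV) sizes.
    if x_av > y_av:
        return "generally"
    if x_av < y_av:
        return "generally not"
    return "about the same"

print(bigger(ANT, APPLE))
print(bigger(CAT, ARM))
```

The ant/apple case comes out as "never" because ant:MAX (10) is below apple:MIN (40); cat and arm overlap, so the answer is hedged by comparing the averages.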

(is|are) (art|) x louder/noisier than (art|) y?
which (is|are) louder/noisier, (art|) x or (art|) y?

if (x:SOUNDVAR >= y:SOUNDVAR)...
-

does x smell nicer than y?

if (x:SMELLVAR >= y:SMELLVAR) {
echo "Yes, x smells nicer than y";
return; }
else {
echo "No, y smells nicer than x";
return; }
-

(is|are) x smellier than y?
which (is|are) smellier, (art|) x or (art|) y?
does (art|) x smell (adj) than (art|) y?

TEMPxSMELLVAR = abs(x:SMELLVAR); //drop the "-" if it exists, because we don't need an aesthetic comparison,
TEMPySMELLVAR = abs(y:SMELLVAR); //just the highest value either side of 0
if (TEMPxSMELLVAR >= TEMPySMELLVAR) {
echo "Yes, x is smellier than y";
return; }
else {
echo "No, y is smellier than x";
return; }


-
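The difference between "nicer" (signed comparison) and "smellier" (strength either side of 0) is easy to show in Python. A small sketch, assuming SMELLVAR is stored as a signed integer; the function names are invented, cat's value of 4 is from the sample CSV, and the -9 for sewage is a made-up figure.

```python
def smells_nicer(x_smellvar, y_smellvar):
    """Aesthetic comparison: the sign matters (-10 stinking .. 10 pleasant)."""
    return x_smellvar >= y_smellvar

def smellier(x_smellvar, y_smellvar):
    """Strength comparison: only the distance from 0 (odour-free) matters."""
    return abs(x_smellvar) >= abs(y_smellvar)

CAT_SMELLVAR = 4       # from the sample row
SEWAGE_SMELLVAR = -9   # hypothetical value for illustration

print(smells_nicer(SEWAGE_SMELLVAR, CAT_SMELLVAR))
print(smellier(SEWAGE_SMELLVAR, CAT_SMELLVAR))
```

Sewage loses the first comparison and wins the second, which is the behaviour the two rules are meant to capture.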

(is|are) x tastier than y?
which (is|are) tastier, x or y?
does x taste (adj) than y?

if (x:TASTEVAR >= y:TASTEVAR)... //ditto
-

(is|are) x harder/softer than y?
which (is|are) harder/softer, x or y?

if (x:HARDSOFT >= y:HARDSOFT)... //ditto
-

(is|are) x stronger/weaker than y?
which (is|are) stronger/weaker, x or y?
(in|if) * fight between x and y * (who|which) * win?
(in|if) * x and y fought * (who|which) * win?

xsense = left$(1,x:STRENGTH);
ysense = left$(1,y:STRENGTH);
if (xsense = "M" or ysense = "M") { // check for metaphorical strength: if it starts with "M", it's differently strong
echo "x & y aren't strong in the same sense";
return; }
if (x:STRENGTH >= y:STRENGTH) {
echo "x is stronger than y";
return; }
else {
echo "y is stronger than x";
return; }

-

SUPERLATIVES:
~~~~~~~~~~
What is the ...est fruit/tree/animal/whatever?

Sort relevant CLASS entries in database by appropriate MAX/...VAR to find highest value and return NAME.
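In Python terms the superlative lookup is a filter followed by a max. The sketch below makes a few assumptions: rows live in a plain dictionary, a noun counts as belonging to a class only when the upper-case code appears in its CLASS field, and `superlative` is an invented name. The figures are the MAX and SOUNDVAR values from the sample CSV rows.

```python
# A few columns from the sample rows in the thread.
TABLE = {
    "ant":   {"CLASS": "A",   "MAX": 10,     "SOUNDVAR": -1},
    "apple": {"CLASS": "FOp", "MAX": 120,    "SOUNDVAR": -3},
    "arch":  {"CLASS": "bO",  "MAX": 100000, "SOUNDVAR": 0},
    "cat":   {"CLASS": "A",   "MAX": 1000,   "SOUNDVAR": 5},
}

def superlative(table, class_code, field):
    """Return the NAME with the highest value of field within a class.

    Returns None if no row of that class has a value for the field.
    """
    candidates = [
        (row[field], name)
        for name, row in table.items()
        if class_code in row["CLASS"] and row.get(field) is not None
    ]
    if not candidates:
        return None
    return max(candidates)[1]

print(superlative(TABLE, "A", "MAX"))       # "what is the biggest animal?"
print(superlative(TABLE, "A", "SOUNDVAR"))  # "what is the noisiest animal?"
```

With only these four rows the biggest and noisiest animal are both the cat, which shows mainly that the answer is only ever as good as the table behind it.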



UNINTUITIVE 'STRANGE' QUESTIONS:
~~~~~~~~~~~~~~~~~~~~~~~~
undeniably strange comparisons could be made (though this might be best provided as a switchable option, since not everyone may want their bot to engage in such eccentric aesthetic comparisons!):

        
(is|are) x noisier than y (is|are) smelly?

TEMPxSOUNDVAR = abs(x:SOUNDVAR); //drop the "-" if it exists, as before
TEMPySMELLVAR = abs(y:SMELLVAR); //ditto
if (TEMPxSOUNDVAR >= TEMPySMELLVAR) {
echo "Yes, x is noisier than y is smelly";
return; }
else {
echo "No, y is smellier than x is noisy";
return; }


likewise:    
(is|are) x noisier than y (is|are) tasty?
(is|are) x smellier than y (is|are) noisy?
(is|are) x tastier than y (is|are) soft?
(is|are) x harder than y (is|are) weak?
(is|are) x softer than y (is|are) smelly?
etc.
-

Even more problematic are cross-property comparisons involving size, since the MIN/MAX units are not to the same scale as the sensory VAR values, but even this could be remedied:

(is|are) x larger than y (is|are) noisy?
TEMPxAV = (1/(x:MAX/x:MIN))*10;
if (TEMPxAV >= y:SOUNDVAR)...

How useful or interesting this would be is unclear to me at present (but might be occasionally entertaining, and how often does a bot get asked a weird question like that anyway? And anyone who asks such a weird question, deserves a thoroughly weird answer!)


FACTUAL QUERIES:
~~~~~~~~~~~~~
what is x?
it is x: DESC

what * (you tell me|you know) about x?
x: DESC
or
if x:CLASS="A/O/P" (x is stronger than {weaker_A/O/P})
if x:CLASS="A/O/P" (x is weaker than {stronger_A/O/P})
etc.

There are probably a lot more classes of queries that can be served from such a knowledgebase, and of course we might give the bots the ability to add more columns automatically when humans make value judgements about things like, say, "PRETTY" - to compare the beautifulness of things (this makes the scope for unintuitively strange comparisons exponentially larger, which could be interesting in a weird sort of way.)
Such a new column can be simply labelled with the adjective supplied by the human, and by reference to WordNet can cover all synonyms. So if a human says "I think roses are prettier than daffodils", the bot could automatically add the "PRETTY" column, and 2 new rows for rose and daffodil, with an arbitrary rose: PRETTY value of "5" (halfway between neutral and prettiest), and daffodil: PRETTY value of perhaps "4". The synonyms are all categorized in the WordNet synsets, as are antonyms (like "ugly",) which can be given negative values in the same column, rather than having columns of their own - this will greatly aid non-strange comparison of complementary qualities!
I envisage such automatic column addition being routine where any such comparison is detected, and no appropriate existing column can be found to store the data in, by reference to the WordNet synsets. Other fields in the rose and daffodil rows can be filled in as and when any new information is gleaned through conversation. The bot can actively seek it by asking questions about any blank fields, or just keep listening long enough for it to come up naturally in conversation - a serverside implementation could eavesdrop on all bots' conversations, and use any identifiable snippets of data to grow the knowledgebase massively and quickly, and still only be expanding and refining a simple database that will put little load on the server.
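A rough Python sketch of that column-adding behaviour follows. It is not wired to WordNet: the tiny `ANTONYMS` dictionary merely stands in for a synset/antonym lookup, and `record_comparison`, the starting values of 5 and 4 (mirrored to -5 and -4 for antonyms) and the one-step offset for later additions are all assumptions made for the example.

```python
ANTONYMS = {"ugly": "pretty"}  # stand-in for a WordNet antonym lookup

def record_comparison(table, columns, more, less, adjective):
    """Store 'more is <adjective>-er than less', adding the column if needed.

    Antonyms share the base adjective's column with negative values.
    """
    negative = adjective in ANTONYMS
    column = ANTONYMS.get(adjective, adjective).upper()
    columns.add(column)
    # In column terms, the "more ugly" noun gets the LOWER value.
    higher, lower = (less, more) if negative else (more, less)
    hi_row = table.setdefault(higher, {})
    lo_row = table.setdefault(lower, {})
    if column not in hi_row and column not in lo_row:
        hi_row[column], lo_row[column] = (-4, -5) if negative else (5, 4)
    elif column not in hi_row:
        hi_row[column] = min(10, lo_row[column] + 1)
    elif column not in lo_row:
        lo_row[column] = max(-10, hi_row[column] - 1)
    # If both values already exist they are left alone in this sketch.
    return column

table, columns = {}, set()
record_comparison(table, columns, "rose", "daffodil", "pretty")
record_comparison(table, columns, "toad", "frog", "ugly")
record_comparison(table, columns, "orchid", "rose", "pretty")
print(table)
```

Note that "ugly" never gets a column of its own here; toad and frog simply land on the negative side of PRETTY, so they can later be compared directly with roses and daffodils.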
It sort of seems too good to be true, and I do wonder if there is some enormous snag I'm going to hit when I come to code it up, but I can't see one yet.

16 years ago #5861
Fascinating!

16 years ago #5862
That is fascinating. I can't see why it wouldn't be spectacular.

16 years ago #5863
Psimagus:

I wonder if this might be useful or at least interesting to you:

http://lcl2.di.uniroma1.it/termextractor/

16 years ago #5864
And this seems relevant to learning the grammar of a word:

http://en.wikipedia.org/wiki/Part-of-speech_tagging

16 years ago #5865
Thanks! Those look interesting.

16 years ago #5866
It would indeed be odd if there were no intelligent aliens. There are so many stars in the known universe, and we know that many of those near us have planets. Could Earth possibly be so special that it is the only one to give birth to intelligent life?

16 years ago #5867
Anyone read about the trouble they are having with voice recognition and accents? Funny:
http://tech.yahoo.com/news/afp/20081119/tc_afp/britainusitinternetcompanyapplegoogleoffbeat

A new voice-recognition search tool for the iPhone has problems understanding British accents, leading to some bizarre answers to spoken queries, a newspaper report and users said Wednesday.

The free application, which allows iPhone owners to use the Google search engine with their voice, mistook the word "iPhone" variously for "sex," "Einstein" and "kitchen sink," said the Daily Telegraph.

Comments left by users on the application's website seemed to confirm the problem. "Awesome job google. only problem is every time I say the word 'fish' it registers as 'sex'," wrote one, identified as Kevin.
A Welsh accent gave the suggestions "gorillas" and "kitchen sink."

"I've got a traditional Kentish accent and the thing kept on spitting back ridiculous things," said Roger Ellinson, 26, from Maidstone in Kent, southeastern England.

"I asked it to find my nearest pizza take away and it came back with something about volcanoes," he added.

One British user, Edward Parsons, says on the site's comments board: "This is fantastic, except for the North American accent bias.

"It actually works pretty well, but I have to disguise my (North London) accent with a terrible folksy Texan tourist voice to get results. I can see this is going to be the source of much amusement and confusion."

16 years ago #5868
Serves 'em right-- all the speech to text programs I know of before that were biased for European men. Does the iPhone app have training like Dragon NaturallySpeaking?

16 years ago #5869
I can see how things like six, sax, and sex could get confused. I can't figure out pizza to volcano. I would even go into gorilla to kitchen sink. I wonder if someone just has a rotten sense of humor?

