Bug Stomp

Upgrades and changes sometimes have unpredictable results, so post your bugs and glitches in here and I'll get out my trusty wrench and get to fixin'!

Posts 6,046 - 6,057 of 8,681

psimagus
19 years ago #6046

re: regexes:

backslashes - yes, unfortunately the AIEngine habitually adds an unnecessary backslash whenever you edit a regex, so they multiply. I think it's trying to escape the existing backslash with another one. The quick fix is simply to omit all backslashes (but not the leading space where applicable), and to delete any backslashes that have been added when editing existing regexes - let the AIEngine apply them from scratch where necessary.
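
To illustrate how this kind of multiplication can happen, here is a minimal Python sketch. It assumes (and this is only a guess at the mechanism, not the AIEngine's actual code) that a save step escapes every backslash each time, including ones it added itself. The function name `naive_escape` and the sample pattern are hypothetical.

```python
def naive_escape(pattern: str) -> str:
    """Hypothetical save-time step: escape every backslash, even already-escaped ones."""
    return pattern.replace("\\", "\\\\")


# Sample pattern with one backslash before each parenthesis (an assumption for illustration).
pattern = r"movie\(s\)"

saved = pattern
for _ in range(3):  # simulate three edit/save cycles
    saved = naive_escape(saved)

# Each cycle doubles the count: 2 -> 4 -> 8 -> 16
print(pattern.count("\\"), saved.count("\\"))  # 2 16
```

If the engine behaves anything like this, stripping the backslashes before saving and letting it re-apply them once would keep the count stable, which matches the workaround described above.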

the [A-Z] class appears to match any character, not only uppercase letters. Indeed, there appears to be no intrinsic case sensitivity in this regex flavour. I expect there's a way, but I haven't seen any need for it. I tend to define characters explicitly (eg: [abcdefghijklmnopqrstuvwxyz]), since I have had problems with ranges in the past. In particular, using ranges in the run-up to x-noneitis seems to leave BJ more prone to contracting it early. (BTW Mick, have you been getting any more unexplained hangups?)
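
For comparison, this is how a range behaves in a standard engine (Python's `re`, used here purely as a stand-in, since the flavour behind the AIEngine isn't known). With case-insensitive matching switched on, [A-Z] matches lowercase letters too, which would explain part of the behaviour above if the AIEngine matches case-insensitively throughout. That is an assumption, not something confirmed here.

```python
import re

ranged = re.compile(r"[A-Z]")                    # case-sensitive range
ranged_ci = re.compile(r"[A-Z]", re.IGNORECASE)  # same range, case-insensitive
explicit = re.compile(r"[abcdefghijklmnopqrstuvwxyz]")  # explicit list, as in the post

for ch in ["q", "Q", "7"]:
    print(
        ch,
        bool(ranged.fullmatch(ch)),
        bool(ranged_ci.fullmatch(ch)),
        bool(explicit.fullmatch(ch)),
    )
```

Even with case-insensitivity, a standard engine does not let [A-Z] match digits or punctuation, so "any character" would point to something beyond case folding.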

Bug report about hard wildcard:
...
(like-v2229) * (artpos-x2229) * movie(s|)
...
(verb) * (artpos-x2229) * movie(s|) (*)

There is an unpredictability with trailing wildcards in un-$-stopped keyphrases - it does appear to be a bit buggy, or at least not properly documented. The solution to differentiate the two is to use a regex $ with a slightly higher rank to indicate the stopped keyphrase, rather than using a hard wildcard to indicate the unstopped one:

(like-v2229) * (artpos-x2229) * movie(s|)$ (re) rank higher
(verb) * (artpos-x2229) * movie(s|) rank lower
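
The idea of ranking an end-anchored pattern above an unanchored one can be sketched in plain Python. The regexes below are rough, hypothetical translations of the two keyphrases (a literal "like" stands in for both the verb plugins, and a short article list stands in for the artpos plugin); the rank numbers and the `best_match` helper are also assumptions for illustration only.

```python
import re

# (rank, label, regex): hypothetical stand-ins for the two keyphrases above.
KEYPHRASES = [
    (2, "stopped", re.compile(r"\blike\b.*\b(the|a|my)\b.*\bmovies?$")),
    (1, "unstopped", re.compile(r"\blike\b.*\b(the|a|my)\b.*\bmovies?\b")),
]


def best_match(text):
    """Return the label of the highest-ranked keyphrase that matches, or None."""
    hits = [(rank, label) for rank, label, rx in KEYPHRASES if rx.search(text)]
    return max(hits)[1] if hits else None


print(best_match("i like the movie"))          # stopped
print(best_match("i like the movies a lot"))   # unstopped
```

When the sentence ends at "movie(s)", both patterns match and the higher rank wins; when anything follows, only the unanchored pattern matches, so no trailing hard wildcard is needed to tell the two apart.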

psimagus
19 years ago #6047

"It would be helpful if someone would post their most complex regex that works"

For me that's probably:

^m([ei]+)n(e|)(s|)(n|)(r|) l([ueü]+)ft([ -]|)ki([s]+)en([fv])a(h|)rz([eu]+)g ist vo([l]+) (*) ([a]+)([l]+)en$ (re)

not actually as complex as it looks - just a bit of Monty Python meets experimental German spell/grammar-checking that came out of a previous discussion some months back.
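
For anyone who wants to try the pattern outside the AIEngine, here is a rough Python equivalent. The only change is an assumption: the hard wildcard (*) is written as (.*), since a bare (*) is not valid in a standard regex. The test sentences are made up for illustration.

```python
import re

# The keyphrase above, with the AIEngine hard wildcard (*) replaced by (.*).
HOVERCRAFT = re.compile(
    r"^m([ei]+)n(e|)(s|)(n|)(r|) l([ueü]+)ft([ -]|)ki([s]+)en([fv])a(h|)rz([eu]+)g"
    r" ist vo([l]+) (.*) ([a]+)([l]+)en$"
)

samples = [
    "mein luftkissenfahrzeug ist voll mit aalen",    # correct spelling
    "mien luftkisenvarzug ist vol mit alen",         # several typos
    "meines luft-kissenfahrzeug ist voll von aalen", # inflected, hyphenated
    "mein auto ist voll mit aalen",                  # different noun
]

for text in samples:
    print(bool(HOVERCRAFT.match(text)), text)
```

The first three samples match and the last does not: each ([x]+) group tolerates doubled or swapped letters, and each (x|) group makes a single letter optional.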

Mr.W.
19 years ago #6048

Either I missed something, or I'm just stupid. What is "regex"?

psimagus
19 years ago #6049

a regular expression

Calandale
19 years ago #6050

Hmmm...I was hoping to see something using ? or .

These are pretty standard, but I haven't gotten them working correctly.

psimagus
19 years ago #6051

? is handy in some numerical contexts, but in a linguistic setting when what we primarily need to match are typos, gerunds, conjugations and occasional punctuation, what would a single character wildcard actually be useful for? I imagine it works, but I have never found the need to try it.

herode
19 years ago #6052

I tried using "?", with no success. Also, the "*" and "+" quantifiers both seem to behave as "*".

Obviously, there is a syntactic problem with "." and "?". These metacharacters are also punctuation symbols. In a true regex, you have to escape them to match the literal punctuation symbol; unescaped, they are read as metacharacters. But in our keyphrases, a period is a period, and so on. And you cannot enter punctuation.
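
The escaping rule in a "true" regex can be seen in a few lines of Python (a standard engine used only for comparison; nothing here describes how the AIEngine itself treats these characters).

```python
import re

unescaped_dot = re.compile(r"what.")       # "." matches any single character
escaped_dot = re.compile(r"what\.")        # "\." matches only a literal period
unescaped_q = re.compile(r"what?")         # "?" makes the preceding "t" optional
auto_escaped = re.compile(re.escape("what?"))  # re.escape adds the backslash for us

print(bool(unescaped_dot.fullmatch("whatX")))  # True
print(bool(escaped_dot.fullmatch("whatX")))    # False
print(bool(unescaped_q.fullmatch("wha")))      # True
print(bool(auto_escaped.fullmatch("what?")))   # True
```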

Calandale
19 years ago #6053

I'm used to using it to make a specific string optional, as in (his|her)? Meaning zero or one occurrences of that group.

psimagus
19 years ago #6054

"Also, the '*' and '+' quantifiers both seem to behave as '*'."

Uh, no. * is a standard soft wildcard, and ([s]+) in a regex = one or more s's.

I think a lot of the problem here is that the regex system we're using is not a full regex environment, it's a subset. The AIEngine processes all keyphrases using */(*) wildcards and local plugins as regexes, even without (re) - basically it's in regex mode the whole time. When we specify "(re)" we get a sort of shell access to deeper regex functions, but it's not the whole regex environment - some functions are reserved by the AIEngine (including . I imagine).
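
For reference, this is the difference between the two quantifiers in a full regex engine (Python's `re`, as a stand-in). In a keyphrase a bare * is a wildcard rather than a quantifier, so the comparison only applies to the regex sense. The sample words are made up.

```python
import re

star = re.compile(r"ki(s*)en")    # zero or more s
plus = re.compile(r"ki([s]+)en")  # one or more s, as in the keyphrase above

for word in ["kien", "kisen", "kissen", "kisssen"]:
    print(word, bool(star.fullmatch(word)), bool(plus.fullmatch(word)))
```

Only "kien" separates the two: the * version accepts it, the + version does not.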

Calandale
19 years ago #6055

I really wish that the link to the regex system was working. I suppose no one has an electronic copy of it lying about?

psimagus
19 years ago #6056

"I'm used to using it to make a specific string optional, as in (his|her)? Meaning zero or one occurrences of that group."

Well, (his|her|) would do that, wouldn't it? "his", "her" or nothing...
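
A quick check in Python (a standard engine, not the AIEngine) suggests the two spellings accept the same strings. The sketch uses the his/her alternation from post #6053 inside a made-up phrase; the pattern names are hypothetical.

```python
import re

optional_group = re.compile(r"lost (his |her )?keys")  # group followed by ?
empty_alt = re.compile(r"lost (his |her |)keys")       # empty alternative instead

for text in ["lost his keys", "lost her keys", "lost keys", "lost their keys"]:
    print(text, bool(optional_group.fullmatch(text)), bool(empty_alt.fullmatch(text)))
```

The one visible difference in Python is the captured value when nothing is there: the ? form leaves the group as None, while the empty alternative captures an empty string.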

Calandale
19 years ago #6057

Sure, I'm just used to thinking differently, I guess. I had just thought of that, and came to post about it.
