Bug Stomp
Upgrades and changes sometimes have unpredictable results, so post your bugs and glitches in here and I'll get out my trusty wrench and get to fixin'!
Posts 6,046 - 6,057 of 8,680
"It would be helpful if someone would post their most complex regex that works"
For me that's probably:
^m([ei]+)n(e|)(s|)(n|)(r|) l([ueü]+)ft([ -]|)ki([s]+)en([fv])a(h|)rz([eu]+)g ist vo([l]+) (*) ([a]+)([l]+)en$ (re)
not actually as complex as it looks - just a bit of Monty Python meets experimental German spell/grammar-checking that came out of a previous discussion some months back.
Also, the "*" and "+" quantifiers both seem to behave as "*".
Uh, no. * is a standard soft wildcard, and ([s]+) in a regex = one or more s's.
I think a lot of the problem here is that the regex system we're using is not a full regex environment, it's a subset. The AIEngine processes all keyphrases using */(*) wildcards and local plugins as regexes, even without (re); basically it's in regex mode the whole time. When we specify "(re)" we get a sort of shell access to deeper regex functions, but it's not the whole regex environment - some functions are reserved by the AIEngine (including ".", I imagine).
I'm used to using it to make a specific string optional, as in (his|here)? Meaning zero or one occurrences of that group.
Well, (his|here|) would do that, wouldn't it? "his", "here" or nothing...
psimagus
18 years ago
re: regexes:
backslashes - yes, unfortunately the AIEngine habitually adds an unnecessary backslash when you've edited a regex, so they multiply. I think it's trying to escape the backslash with another one. The quick fix is simply to omit all backslashes (but not the leading space where applicable), and to delete the backslashes which have been added when editing existing regexes - allow the AIEngine to apply all backslashes from scratch where necessary.
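The multiplying is what you would expect if each save re-escapes a pattern that is already escaped. A minimal Python sketch of that failure mode, where `naive_escape` is a hypothetical stand-in for the save step and not the AIEngine's actual code:

```python
def naive_escape(pattern: str) -> str:
    """Hypothetical save step: escape every backslash, even ones already escaped."""
    return pattern.replace("\\", "\\\\")


stored = r"movie\.$"      # starts with a single backslash
for _ in range(3):        # three edit-and-save cycles
    stored = naive_escape(stored)

# Each cycle doubles the count: 1, 2, 4, 8.
backslashes = stored.count("\\")
print(stored, backslashes)
```

Under that assumption, stripping the backslashes before saving, as suggested above, resets the count so the engine only escapes once.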
the [A-Z] class appears to match any character, not only uppercase letters. Indeed, there appears to be no intrinsic case sensitivity in this regex flavour. I expect there's a way to get it, but I haven't seen any need for it. I tend to define characters explicitly (eg: [abcdefghijklmnopqrstuvwxyz]), since I have had problems with ranges in the past - particularly in the run-up to x-noneitis, where ranges seem to leave BJ more prone to contracting it early (BTW Mick, have you been getting any more unexplained hangups?)
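For comparison, here is how a standard regex engine behaves when case sensitivity is switched off. This is only a sketch in Python's `re` module, on the assumption that the AIEngine matches case-insensitively; it would explain [A-Z] matching lowercase letters, though not non-letters.

```python
import re

word = "hovercraft"

# Assumption: the engine behaves as if re.IGNORECASE were always on.
insensitive = re.compile(r"^[A-Z]+$", re.IGNORECASE)
sensitive = re.compile(r"^[A-Z]+$")

matches_insensitive = bool(insensitive.match(word))  # [A-Z] also accepts a-z
matches_sensitive = bool(sensitive.match(word))      # uppercase only, so no match

# Spelling the class out, as described above, avoids ranges entirely.
explicit = re.compile(r"^[abcdefghijklmnopqrstuvwxyz]+$")
matches_explicit = bool(explicit.match(word))
```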
Bug report about hard wildcard:
...
(like-v2229) * (artpos-x2229) * movie(s|)
...
(verb) * (artpos-x2229) * movie(s|) (*)
There is some unpredictability with trailing wildcards in keyphrases that are not stopped with $ - it does appear to be a bit buggy, or at least not properly documented. The solution to differentiate the two is to give the stopped keyphrase a $-terminated regex with a slightly higher rank, rather than using a hard wildcard to indicate the unstopped one:
(like-v2229) * (artpos-x2229) * movie(s|)$ (re) rank higher
(verb) * (artpos-x2229) * movie(s|) rank lower
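The ranking idea can be sketched in Python. Everything here is an assumption for illustration: the word lists stand in for the (like-v2229), (verb) and (artpos-x2229) plugins, "*" is modelled as ".*", and `best_match` is a hypothetical selector, not how the AIEngine actually picks a keyphrase.

```python
import re
from typing import Optional

# Hypothetical stand-ins for the site's plugins.
LIKE = r"(?:like|love|enjoy)"
VERB = r"(?:\w+)"
ARTPOS = r"(?:the|a|an|my|your)"

# (rank, name, pattern): the $-stopped keyphrase gets the higher rank.
KEYPHRASES = [
    (2, "stopped", re.compile(rf"\b{LIKE}\b.*\b{ARTPOS}\b.*\bmovies?$")),
    (1, "unstopped", re.compile(rf"\b{VERB}\b.*\b{ARTPOS}\b.*\bmovies?\b")),
]


def best_match(message: str) -> Optional[str]:
    """Return the name of the highest-ranked keyphrase that matches, if any."""
    text = message.lower()
    hits = [(rank, name) for rank, name, rx in KEYPHRASES if rx.search(text)]
    return max(hits)[1] if hits else None


print(best_match("I like the new movie"))       # sentence ends at "movie"
print(best_match("I like the movie a lot"))     # text continues after "movie"
```

When both patterns match, the rank decides, so the stopped keyphrase wins only for messages that really end in "movie(s)".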
psimagus
18 years ago
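For me that's probably:
^m([ei]+)n(e|)(s|)(n|)(r|) l([ueü]+)ft([ -]|)ki([s]+)en([fv])a(h|)rz([eu]+)g ist vo([l]+) (*) ([a]+)([l]+)en$ (re)
not actually as complex as it looks - just a bit of Monty Python meets experimental German spell/grammar-checking that came out of a previous discussion some months back.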
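To see what the hovercraft pattern quoted in this thread accepts, here is a rough Python port. Two things are assumptions rather than AIEngine facts: "(*)" is replaced by "(.*)" because a bare "*" is not valid in standard regex syntax, and matching is made case-insensitive. The trailing "(re)" marker is engine syntax and is dropped.

```python
import re

HOVERCRAFT = re.compile(
    r"^m([ei]+)n(e|)(s|)(n|)(r|) l([ueü]+)ft([ -]|)ki([s]+)en([fv])a(h|)rz([eu]+)g"
    r" ist vo([l]+) (.*) ([a]+)([l]+)en$",
    re.IGNORECASE,
)


def is_hovercraft_sentence(text: str) -> bool:
    """True if the text is some spelling of the hovercraft sentence."""
    return HOVERCRAFT.match(text) is not None


for sample in (
    "mein luftkissenfahrzeug ist voll mit aalen",  # clean spelling
    "mien lüft-kisenvarzug ist vol mit alen",      # typo-ridden variant
    "mein auto ist voll mit aalen",                # different noun
):
    print(sample, "->", is_hovercraft_sentence(sample))
```

The character classes with "+" soak up doubled or swapped letters, and the empty alternatives such as (h|) make single letters optional.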
Calandale
18 years ago
Hmmm...I was hoping to see something using ? or .
These are pretty standard, but I haven't gotten them working correctly.
psimagus
18 years ago
? is handy in some numerical contexts, but in a linguistic setting when what we primarily need to match are typos, gerunds, conjugations and occasional punctuation, what would a single character wildcard actually be useful for? I imagine it works, but I have never found the need to try it.
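Part of the confusion may be that "?" means different things in different pattern languages. The Python sketch below contrasts the two readings using the standard `fnmatch` and `re` modules; it says nothing about which reading the AIEngine itself uses.

```python
import fnmatch
import re

words = ["cat", "cut", "ct", "coat"]

# Shell-style glob: "?" stands for exactly one character.
glob_hits = [w for w in words if fnmatch.fnmatchcase(w, "c?t")]

# Regex: the single-character wildcard is "." rather than "?".
dot_hits = [w for w in words if re.fullmatch(r"c.t", w)]

# Regex: "?" makes the preceding item optional (zero or one occurrence).
spellings = ["color", "colour", "colouur"]
regex_hits = [w for w in spellings if re.fullmatch(r"colou?r", w)]

print(glob_hits, dot_hits, regex_hits)
```

In a linguistic setting the regex reading is the more useful one, since an optional letter covers spelling variants and common typos.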
herode
18 years ago
I tried using "?", with no success. Also, the "*" and "+" quantifiers both seem to behave as "*".
Obviously, there is a syntactic problem with "." and "?". These metacharacters are also punctuation symbols. In a true regex, you have to escape them to match the literal punctuation symbol. Unescaped, they are read as metacharacters. But in our keyphrases, a period is a period and so on, and you cannot enter punctuation.
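For reference, this is how the escaping rule plays out in a standard regex engine. The sketch uses Python's `re` module and is not a statement about the AIEngine's subset.

```python
import re

text = "Is that so? Yes."

# Unescaped, "." matches any character, so "Yes!" matches as well.
loose = re.compile(r"Yes.")
# Escaped, "\." only matches a literal period.
strict = re.compile(r"Yes\.")

# Unescaped, "?" makes the preceding "o" optional; re.escape yields the literal form.
optional_o = re.compile(r"so?")
literal_question = re.compile(re.escape("so?"))

print(bool(loose.search("Yes!")), bool(strict.search("Yes!")))
print(bool(optional_o.fullmatch("s")), bool(literal_question.search(text)))
```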
Calandale
18 years ago
I'm used to using it to make a specific string optional, as in (his|her)? Meaning zero or one occurrences of that group.
psimagus
18 years ago
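Uh, no. * is a standard soft wildcard, and ([s]+) in a regex = one or more s's.
I think a lot of the problem here is that the regex system we're using is not a full regex environment, it's a subset. The AIEngine processes all keyphrases using */(*) wildcards and local plugins as regexes, even without (re); basically it's in regex mode the whole time. When we specify "(re)" we get a sort of shell access to deeper regex functions, but it's not the whole regex environment - some functions are reserved by the AIEngine (including ".", I imagine).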
Calandale
18 years ago
I really wish that the link to the regex system was working. I suppose no one has an electronic copy of it lying about?
psimagus
18 years ago
Well, (his|here|) would do that, wouldn't it? "his", "here" or nothing...
Calandale
18 years ago
Sure, I'm just used to thinking differently, I guess. I had just thought of that, and came to post about it.
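The two ways of writing an optional group are interchangeable in a standard regex engine, which a short Python check makes concrete. The sample sentences are made up for illustration.

```python
import re

# Optional group written with the "?" quantifier.
QUANTIFIED = re.compile(r"^take (his |her )?hand$")
# Same thing written with an empty alternative, the form that works in keyphrases.
ALTERNATED = re.compile(r"^take (his |her |)hand$")

samples = ["take his hand", "take her hand", "take hand", "take their hand"]
results = [(bool(QUANTIFIED.match(s)), bool(ALTERNATED.match(s))) for s in samples]

for sample, pair in zip(samples, results):
    print(sample, pair)
```

Both patterns accept the first three samples and reject the last, so the empty alternative is a drop-in replacement wherever "?" is unavailable.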