Bug Stomp
Upgrades and changes sometimes have unpredictable results, so post your bugs and glitches in here and I'll get out my trusty wrench and get to fixin'!
Posts 6,040 - 6,051 of 8,681
"It would be helpful if someone would post their most complex regex that works"
For me that's probably:
^m([ei]+)n(e|)(s|)(n|)(r|) l([ueü]+)ft([ -]|)ki([s]+)en([fv])a(h|)rz([eu]+)g ist vo([l]+) (*) ([a]+)([l]+)en$ (re)
not actually as complex as it looks - just a bit of Monty Python meets experimental German spell/grammar-checking that came out of a previous discussion some months back.
herode
19 years ago
Hi everybody.
Bug report about the use of regex in keyphrases. Here is an example. I have a keyphrase like this:
([A-Z]\w+) (verb)(re)
When I register it the first time, it's OK. The backslash is doubled, which I guess is for the regex engine's purposes. However, changing the regex itself is not a good idea, because it has side effects. If I edit it and save the new responses, the backslashes multiply like this:
([A-Z]\\\\w+) (verb)(re)
PS: what regex engine do you use?
MickMcA
19 years ago
Herode --
I haven't been able to tell what level or degree of regex the system uses. It would be helpful if someone would post their most complex regex that works. I've tried some very complicated ones I use in TextPad and they have been unsuccessful.
MickMcA
19 years ago
Herode again --
My experience has been that the Import/Export process is not truly reversible. Your observation seems to confirm that. I struggled for a number of days with goto references that were obviously correct but could not get translated correctly back into numerics. At one point, I was sure that the code that failed to transfer was one I inherited from an Export. I have also had trouble both importing and exporting Once markers.
In fact, now that I think of it, my failed regexes may have included slashes. Hmm.
M
herode
19 years ago
I was not using Import/Export. Well, frankly speaking, I *tried* to use this feature, but it doesn't work (something to do with my tab options in SciTE, I suppose). Anyway, the bug occurs in the web interface. Assuming PHP code behind the scenes, I'd suspect an unpaired use of the addslashes() function or something like that. I mean: an addslashes() when reading the input must be matched by a stripslashes() when writing the output.
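To illustrate the suspicion above, here is a minimal Python sketch of how an unpaired escape step makes backslashes multiply on every save. It is only a guess at the mechanism: add_slashes() and strip_slashes() are rough stand-ins for PHP's addslashes() and stripslashes(), and save_unpaired() and save_paired() are hypothetical names, not anything known about the site's code.

```python
def add_slashes(text: str) -> str:
    """Rough stand-in for PHP's addslashes(): escape backslashes and quotes."""
    # Backslashes must be escaped first, or the ones added for quotes get doubled too.
    return text.replace("\\", "\\\\").replace("'", "\\'").replace('"', '\\"')


def strip_slashes(text: str) -> str:
    """Rough stand-in for PHP's stripslashes(): remove one level of escaping."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\":
            if i + 1 < len(text):
                out.append(text[i + 1])  # keep the escaped character
            i += 2                       # a trailing lone backslash is dropped
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def save_unpaired(stored: str) -> str:
    """Hypothetical buggy save: escapes text that is already escaped."""
    return add_slashes(stored)


def save_paired(stored: str) -> str:
    """Hypothetical fixed save: unescape what was displayed, then escape once."""
    return add_slashes(strip_slashes(stored))


keyphrase = r"([A-Z]\w+) (verb)(re)"
first_save = add_slashes(keyphrase)
print(first_save)                 # one backslash has become two
print(save_unpaired(first_save))  # two have become four, as in the report
print(save_paired(first_save))    # stays at two, however often it is saved
```

If the engine works roughly like this, the fix would be on the display or save side, which matches the observation that the regex only degrades when it is edited and saved again.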
herode
19 years ago
@MickMcA: "It would be helpful if someone would post their most complex regex that works"
Indeed.
As a matter of fact, the one I give here doesn't work, at least not as I expected it to: the [A-Z] class appears to match any character, not only uppercase letters. Hence, the related keyphrase is triggered far too often now. Too bad...
herode
19 years ago
Bug report about the hard wildcard:
"The hard wildcard (*) will only match something", says the Book of AI. That's not what the debug output shows:
----
This Phrase: "do you love this movie"
Find: (like-v2229) * (artpos-x2229) * movie(s|) (26) Time: 0.92
(Found)
Rank & Length Bonus: 26
Position Score: 4 (12 / (2+1))
Sentence Score: 0
(Total Rank: 30)
Highest!
[...]
Find: (verb) * (artpos-x2229) * movie(s|) (*) (26) Time: 0.92
(Found)
Rank & Length Bonus: 26
Position Score: 12 (12 / (0+1))
Sentence Score: 0
(Total Rank: 38)
Highest!
----
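The scores in the debug output above can be re-derived with a few lines of Python. The formula is only inferred from the two traces shown (position score = 12 / (offset + 1), total = rank and length bonus + position score + sentence score); it is an assumption, not documented engine behaviour, and position_score() and total_rank() are hypothetical names.

```python
def position_score(offset: int, base: int = 12) -> int:
    """Inferred from the traces: "12 / (2+1)" gives 4 and "12 / (0+1)" gives 12."""
    # Integer division is an assumption; both traced cases divide evenly.
    return base // (offset + 1)


def total_rank(rank_bonus: int, offset: int, sentence_score: int = 0) -> int:
    """Sum the three components that the debug output lists before "Total Rank"."""
    return rank_bonus + position_score(offset) + sentence_score


print(total_rank(26, offset=2))  # (like-v2229) keyphrase: 26 + 4 + 0 = 30
print(total_rank(26, offset=0))  # (verb) keyphrase:       26 + 12 + 0 = 38
```

On this reading, the offset looks like the position of the first matched word ("love" is the third word, "do" is the first), so the second keyphrase would outrank the first because its match starts earlier, whether or not the trailing (*) matched anything.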
psimagus
19 years ago
re: regexes:
backslashes - yes, unfortunately the AIEngine habitually adds an unnecessary slash when you've edited a regex, so they multiply. I think it's trying to escape the slash with another one. The quick fix is simply to omit all slashes (but not the leading space where applicable), and to delete the slashes which have been added when editing existing regexes - allow the AIEngine to apply all slashes from scratch where necessary.
"the [A-Z] class appears to match any character, not only uppercases." Indeed, there appears to be no intrinsic case sensitivity in this regex flavour. I expect there's a way, but I haven't seen any need for it. I tend to define characters explicitly (e.g. [abcdefghijklmnopqrstuvwxyz]), since I have had problems with ranges in the past - particularly in the run-up to x-noneitis, where using them seems to leave BJ more prone to contracting it early. (BTW Mick, have you been getting any more unexplained hangups?)
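For comparison, this is how a case-insensitive match behaves in Python's re module. It is only an analogy: nobody in the thread knows which engine or flags the AIEngine uses, and starts_with_capital() is a hypothetical helper.

```python
import re

PATTERN = r"[A-Z]\w+"


def starts_with_capital(word: str, ignore_case: bool = False) -> bool:
    """Test a word against [A-Z]\\w+, optionally with case-insensitive matching."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(PATTERN, word, flags) is not None


# Case-sensitive matching: only the capitalised word passes.
print(starts_with_capital("Movie"), starts_with_capital("movie"))
# Case-insensitive matching: [A-Z] now accepts lowercase letters as well.
print(starts_with_capital("Movie", ignore_case=True),
      starts_with_capital("movie", ignore_case=True))
```

If the engine always matches this way, [A-Z] and [a-z] would be interchangeable, which would explain why the keyphrase fires on nearly every word.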
Bug report about hard wildcard :
...
(like-v2229) * (artpos-x2229) * movie(s|)
...
(verb) * (artpos-x2229) * movie(s|) (*)
There is an unpredictability with trailing wildcards in un-$-stopped keyphrases - it does appear to be a bit buggy, or at least not properly documented. The solution to differentiate the two is to use a regex$ with a slightly higher rank to indicate the stopped keyphrase, rather than using a hard wildcard to indicate the unstopped one:
(like-v2229) * (artpos-x2229) * movie(s|)$ (re) rank higher
(verb) * (artpos-x2229) * movie(s|) rank lower
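The idea behind this workaround can be sketched with ordinary Python regexes. Everything here is an approximation: the word lists stand in for the (like-v2229), (verb) and (artpos-x2229) plug-ins, the ranks 27 and 26 are made up, and best_match() is a hypothetical helper, not the engine's selection logic.

```python
import re

# (compiled pattern, rank, label); the $-stopped version gets the higher rank.
KEYPHRASES = [
    (re.compile(r"\b(like|love)\b.*\b(this|the|that)\b.*\bmovies?$"), 27, "stopped"),
    (re.compile(r"\b(like|love)\b.*\b(this|the|that)\b.*\bmovies?\b"), 26, "unstopped"),
]


def best_match(phrase: str):
    """Return the label of the highest-ranked keyphrase that matches, or None."""
    hits = [(rank, label) for pattern, rank, label in KEYPHRASES
            if pattern.search(phrase)]
    return max(hits)[1] if hits else None


print(best_match("do you love this movie"))        # both match, higher rank wins
print(best_match("do you love this movie a lot"))  # only the unstopped one matches
```

The two cases are told apart by the end-of-input anchor and the rank difference alone, so nothing depends on how the trailing wildcard behaves.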
psimagus
19 years ago
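"It would be helpful if someone would post their most complex regex that works"
For me that's probably:
^m([ei]+)n(e|)(s|)(n|)(r|) l([ueü]+)ft([ -]|)ki([s]+)en([fv])a(h|)rz([eu]+)g ist vo([l]+) (*) ([a]+)([l]+)en$ (re)
not actually as complex as it looks - just a bit of Monty Python meets experimental German spell/grammar-checking that came out of a previous discussion some months back.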
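For anyone who wants to try psimagus's hovercraft keyphrase outside the AIEngine, here is a rough Python translation. Two assumptions are baked in: the hard wildcard (*) is rendered as (.+), and the trailing (re) marker is dropped because it appears to be the engine's own flag rather than part of the pattern. The test sentences are made up.

```python
import re

# The same pattern, split over two string literals for readability.
HOVERCRAFT = re.compile(
    r"^m([ei]+)n(e|)(s|)(n|)(r|) l([ueü]+)ft([ -]|)ki([s]+)en([fv])a(h|)rz([eu]+)g"
    r" ist vo([l]+) (.+) ([a]+)([l]+)en$"
)


def is_hovercraft(sentence: str) -> bool:
    """True if the sentence is some spelling of the hovercraft line."""
    return HOVERCRAFT.match(sentence) is not None


print(is_hovercraft("mein luftkissenfahrzeug ist voll mit aalen"))  # textbook spelling
print(is_hovercraft("meine lüft-kisenvarzug ist vol von alen"))     # heavily misspelt
print(is_hovercraft("mein auto ist voll mit aalen"))                # a different noun
```

Each (x|) group is an optional letter and each [..]+ group tolerates a doubled or swapped vowel or consonant, which is why one keyphrase covers so many misspellings.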
Calandale
19 years ago
Hmmm...I was hoping to see something using ? or .
These are pretty standard, but I haven't gotten them working correctly.
psimagus
19 years ago
19 years ago
? is handy in some numerical contexts, but in a linguistic setting when what we primarily need to match are typos, gerunds, conjugations and occasional punctuation, what would a single character wildcard actually be useful for? I imagine it works, but I have never found the need to try it.