r94702 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r94701‎ | r94702 | r94703 >
Date:22:42, 16 August 2011
Author:brion
Status:ok (Comments)
Tags:
Comment:
Fix a regression in search highlighting from r90092

Followup on r94609, r94681 test cases.
This fixes the test case with a period that was failing; problem was not the period, but the space before.
The changed regex in r90092 ended up including the preceding whitespace in the match, so the position of the match was coming up one character before the actual word -- thus offsetting the highlight by one char.

Changed from search to match, using the regex's index property (same as search returns) and adding in the length of the first match component which covers the space/whatever
Modified paths:
  • /trunk/phase3/resources/jquery/jquery.highlightText.js (modified) (history)

Diff [purge]

Index: trunk/phase3/resources/jquery/jquery.highlightText.js
@@ -24,8 +24,9 @@
2525 // non latin characters can make regex think a new word has begun: do not use \b
2626 // http://stackoverflow.com/questions/3787072/regex-wordwrap-with-utf8-characters-in-js
2727 // look for an occurence of our pattern and store the starting position
28 - var pos = node.data.search( new RegExp( "(^|\\s)" + $.escapeRE( pat ), "i" ) );
29 - if ( pos >= 0 ) {
 28+ var match = node.data.match( new RegExp( "(^|\\s)" + $.escapeRE( pat ), "i" ) );
 29+ if ( match ) {
 30+ var pos = match.index + match[1].length; // include length of any matched spaces
3031 // create the span wrapper for the matched text
3132 var spannode = document.createElement( 'span' );
3233 spannode.className = 'highlight';

Follow-up revisions

RevisionCommit summaryAuthorDate
r96622MFT r94702: fix for a jquery.highlightText regression in r90092brion22:26, 8 September 2011
r96623MFT r90092, r94702 - jquery.highlightText.js fixes for non-ASCII chars in Vec...brion22:27, 8 September 2011
r96625MFT r90092, r94702 - jquery.highlightText.js fixes for non-ASCII chars in Vec...brion22:28, 8 September 2011

Past revisions this follows-up on

RevisionCommit summaryAuthorDate
r90092fix for bug29371 . regex wordwrap with UTF8: do not use \b metacharacter. The...wikinaut21:42, 14 June 2011
r94609tests for jquery.highlightText...hashar08:04, 16 August 2011
r94681Followup r94609: fix jquery.highlightText test cases on IE, simplify test cas...brion19:34, 16 August 2011

Comments

#Comment by Wikinaut (talk | contribs)   00:32, 17 August 2011

works for me. Thanks

#Comment by Wikinaut (talk | contribs)   19:47, 17 August 2011

Summary: works.

Further comment regarding the way, the highlighter works (it splits the "needle" into space-separated "needles", which are then searched and marked).

Hi, I succesfully and for the first time used http://localhost/tests/qunit/?filter=jquery.highlightText%3A%20Check . this is a neat tool, thanks for the hint. In my local copy I added some more test vectors, which I will commit later.

The current function $.highlightText says clearly in the beginning "// Split our pattern string at spaces and run our highlight function on the results"

Thus the result of the highlighter of "Blue Öyster Haystack" with needle "Blue Ö" is

<span class="highlight">Blue</span> <span class="highlight">Ö</span>yster Haystack

i.e. "Blue" and "Ö" matched and marked separately.

Others (like me) would (perhaps) expect to receive

<span class="highlight">Blue Ö</span>yster Haystack

which looks smarter when you have the highlight function in suggestions.js with extension Vector.

I just wanted to point out, that the current function indeed treats a sequence of words as single space-separated terms, which are then searched, and the matches are highlighted separately. I am going to add test vectors and expected results for this behaviour.

#Comment by Wikinaut (talk | contribs)   22:08, 23 August 2011

highlight still fails for test case 025a committed in rev r95341

#Comment by Brion VIBBER (talk | contribs)   22:10, 23 August 2011

That highlight appears to work; it just matches in two places, while your expected text only lists one.

#Comment by Wikinaut (talk | contribs)   22:13, 23 August 2011

"« L" was the "needle" - why should it match with "..... l´île", where the "«" is not present?

#Comment by Wikinaut (talk | contribs)   22:14, 23 August 2011

oops -- correcting myself

#Comment by Wikinaut (talk | contribs)   22:15, 23 August 2011

the highlighter matches space separated string separately... thus it looks for "«" and for "L"

#Comment by Brion VIBBER (talk | contribs)   22:15, 23 August 2011

Well now we're getting into the ultimate problem: this highlighter code doesn't necessarily know anything about how the matches are being made by the search engine.

Do multiple words/symbols/tokens need to appear in the same order? Not according to what jquery.highlight does, since it breaks them into separate tokens and looks for them individually.

The default opensearch prefix search implementation does raw prefix matches only, so would not tend to match something like that -- what you get out of Lucene / MWSearch might be different.

#Comment by Wikinaut (talk | contribs)   22:22, 23 August 2011

Is ok.

#Comment by Wikinaut (talk | contribs)   06:10, 3 September 2011

tagging for back porting to 1.18 and also 1.17wmf1

#Comment by Wikinaut (talk | contribs)   06:11, 3 September 2011

tag for 1.18

#Comment by Wikinaut (talk | contribs)   06:11, 3 September 2011

tag for 1.17wmf1

#Comment by Catrope (talk | contribs)   17:33, 7 September 2011

Why is this tagged 1.17wmf1? The revision that it fixes doesn't seem to have been deployed.

#Comment by Wikinaut (talk | contribs)   20:45, 7 September 2011

I wanted to say that Brion's fix

+ var match = node.data.match( new RegExp( "(^|\\s)" + $.escapeRE( pat ), "i" ) ); + if ( match ) { + var pos = match.index + match[1].length; // include length of any matched spaces

is fine and should be backported to 1.17wmf1. I was not sure how exactly to communicate this to you.

In my view it should be patched into this http://svn.wikimedia.org/viewvc/mediawiki/branches/REL1_17/phase3/resources/jquery/jquery.highlightText.js?revision=79129&view=markup revision.

#Comment by Catrope (talk | contribs)   20:49, 7 September 2011

Aha, I do see the code in 1.17wmf1 now. I'll merge it later.

#Comment by Brion VIBBER (talk | contribs)   22:30, 8 September 2011

I've merged this along with r90092 to 1.18, 1.17, and 1.17wmf1 now. (1.18 had the earlier commit but not its correction here)

Status & tagging log