r44791 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: r44791
Date: 01:50, 19 December 2008
Author: brion
Status: ok
Tags:
Comment:
* (bug 15027) Internet domain names and IP addresses can now be indexed and searched sensibly with the default MySQL search backend.

Previously things like "192.168.1.1" couldn't be searched very cleanly in the MySQL backend for two reasons:
* First, the periods were stripped out. This broke the address into multiple short words, "192 168 1 1", leading at best to false positives and general weirdness.
* Second, for IP addresses the resulting words were shorter than the default minimum word length of 4 and thus didn't even get indexed!
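
The two problems above can be reproduced outside MySQL with a small sketch. The helper name `splitOnPeriods` and the variable `$minWordLen` are hypothetical and are not part of this revision; the value 4 is the default minimum word length cited in the comment.

```php
<?php
// Hypothetical helper: mimics an indexer that strips periods
// and then splits the text on whitespace.
function splitOnPeriods( string $text ): array {
	$stripped = str_replace( '.', ' ', $text );
	return preg_split( '/\s+/', trim( $stripped ) );
}

// Assumed value: the default minimum word length mentioned in the comment.
$minWordLen = 4;

$words = splitOnPeriods( '192.168.1.1' );

// Keep only the words long enough to be indexed.
$indexable = array_filter(
	$words,
	function ( $word ) use ( $minWordLen ) {
		return strlen( $word ) >= $minWordLen;
	}
);

echo implode( ' ', $words ), "\n"; // 192 168 1 1
echo count( $indexable ), "\n";    // 0
```

None of the four fragments reaches the minimum length, so in this model nothing from the address would be indexed.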

The addition of padding for short words let them at least get indexed, but they still didn't turn up cleanly due to the word split. This change allows periods through to the indexed text and encodes periods that appear within a compound word, so they get caught more cleanly.
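
As a rough illustration of the encoding step, the sketch below applies the same pattern and replacement that this revision adds to Language.php. The wrapper function `encodeInnerPeriods` is hypothetical; only the regular expression and the replacement string come from the diff.

```php
<?php
// Hypothetical wrapper around the replacement added in this revision.
// A period with a word character before it and a word character or "*"
// after it is rewritten to the marker "U82e".
function encodeInnerPeriods( string $text ): string {
	return preg_replace( '/(\w)\.(\w|\*)/u', '$1U82e$2', $text );
}

echo encodeInnerPeriods( 'example.com' ), "\n";           // exampleU82ecom
echo encodeInnerPeriods( 'example.*' ), "\n";             // exampleU82e*
echo encodeInnerPeriods( 'End of sentence. Next' ), "\n"; // End of sentence. Next
echo encodeInnerPeriods( '192.168.1.1' ), "\n";           // 192U82e168U82e1.1
```

A period at the end of a sentence is left alone because it is followed by a space. In the last line, the final period stays unencoded: regex matches cannot overlap, so the single "1" consumed as the right-hand side of one match is not available as the left-hand side of the next. This may be one of the period cases that the comment describes as not fully handled.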

Also made a tweak so highlighting works a bit better on word boundaries: e.g. a search for "192.168.1.1" no longer produces a highlight match inside "192.168.1.100". However, some cases with periods are still not handled correctly. Sigh.
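
The effect of the word-boundary tweak can be sketched as follows. The helper `highlightPattern`, the `/` delimiters, and the standalone "old" pattern are assumptions made for illustration; only the `\b` anchoring mirrors the SearchMySQL.php change in this revision.

```php
<?php
// Hypothetical helper modeled on the SearchMySQL.php change:
// anchor the quoted term on word boundaries. A wildcard term (foo*)
// is anchored only at the start.
function highlightPattern( string $term, bool $wildcard ): string {
	$regexp = preg_quote( $term, '/' );
	$regexp = $wildcard ? "\\b$regexp" : "\\b$regexp\\b";
	return "/$regexp/";
}

$text = 'Blocked 192.168.1.100 yesterday';

// Without anchoring (assumed shape of the previous behavior),
// the term matches inside the longer address.
$old = '/' . preg_quote( '192.168.1.1', '/' ) . '/';
var_dump( preg_match( $old, $text ) );                                     // int(1)

// With \b on both sides, the trailing "00" prevents the match.
var_dump( preg_match( highlightPattern( '192.168.1.1', false ), $text ) ); // int(0)

// A wildcard term still matches as a prefix.
var_dump( preg_match( highlightPattern( '192.168', true ), $text ) );      // int(1)
```

Because `\b` treats a period as a boundary, the anchored term would still match inside a string such as "192.168.1.1.5", so the boundary check does not fully resolve period handling.
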
Modified paths:
  • /trunk/phase3/RELEASE-NOTES (modified)
  • /trunk/phase3/includes/SearchEngine.php (modified)
  • /trunk/phase3/includes/SearchMySQL.php (modified)
  • /trunk/phase3/languages/Language.php (modified)

Diff

Index: trunk/phase3/includes/SearchEngine.php
@@ -150,7 +150,7 @@
 	}
 
 	public static function legalSearchChars() {
-		return "A-Za-z_'0-9\\x80-\\xFF\\-";
+		return "A-Za-z_'.0-9\\x80-\\xFF\\-";
 	}
 
 	/**
Index: trunk/phase3/includes/SearchMySQL.php
@@ -54,7 +54,11 @@
 				if( !empty( $terms[3] ) ) {
 					// Match individual terms in result highlighting...
 					$regexp = preg_quote( $terms[3], '/' );
-					if( $terms[4] ) $regexp .= "[0-9A-Za-z_]+";
+					if( $terms[4] ) {
+						$regexp = "\b$regexp"; // foo*
+					} else {
+						$regexp = "\b$regexp\b";
+					}
 				} else {
 					// Match the quoted term in result highlighting...
 					$regexp = preg_quote( str_replace( '"', '', $terms[2] ), '/' );
Index: trunk/phase3/languages/Language.php
@@ -1549,6 +1549,17 @@
 				$out );
 		}
 
+		// Periods within things like hostnames and IP addresses
+		// are also important -- we want a search for "example.com"
+		// or "192.168.1.1" to work sanely.
+		//
+		// MySQL's search seems to ignore them, so you'd match on
+		// "example.wikipedia.com" and "192.168.83.1" as well.
+		$out = preg_replace(
+			"/(\w)\.(\w|\*)/u",
+			"$1U82e$2",
+			$out );
+
 		wfProfileOut( __METHOD__ );
 		return $out;
 	}
Index: trunk/phase3/RELEASE-NOTES
@@ -428,6 +428,8 @@
   DB environments when $wgDBserver isn't set.
 * (bug 3691) Aspect ratio from viewBox attribute is now preserved for SVG
   images which do not specify width and height attributes.
+* (bug 15027) Internet domain names and IP addresses can now be indexed and
+  searched sensibly with the default MySQL search backend.
 
 === API changes in 1.14 ===
 
