r104635 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: r104635
Date: 00:36, 30 November 2011
Author: brion
Status: ok
Tags:
Comment:
* (bug 32712) Fix for search indexing of pages with certain unicode chars following URL

A regex in SearchUpdate was written for pure ISO 8859-1 text and matched only bytes in the \xa0-\xff range. When a UTF-8 character contained a byte in the \x80-\x9f range, the match stopped partway through that character.
Fixed regex to pass \x80-\xff instead.
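
As a rough illustration of the byte-range problem described above, here is a PHP sketch built around a simplified, hypothetical URL-stripping pattern. The function name stripUrls and the pattern itself are assumptions made for this example, not the actual SearchUpdate code. The closing quote “ is encoded as the bytes E2 80 9C, so a class ending in \xa0-\xff accepts E2 but stops at 80, while \x80-\xff consumes the whole sequence.

```php
<?php
// Hypothetical, simplified URL stripper for illustration only.
// $highBytes is the byte range appended to the "URL character" class.
function stripUrls( string $text, string $highBytes ): string {
	$uc = 'A-Za-z0-9_\\/:.,~%\\-+&;#?!=()@' . $highBytes;
	// No /u modifier: the pattern works on raw bytes, not on characters.
	$pat = "/(^|[^\\[])(http|https):[{$uc}]+([^{$uc}]|$)/";
	return preg_replace( $pat, '\\1 \\3', $text );
}

// Same sample text as the test case: a URL wrapped in „ “ quotes.
$text = "text \u{201E}http://example.com\u{201C} text";

$old = stripUrls( $text, '\\xa0-\\xff' ); // range before the fix
$new = stripUrls( $text, '\\x80-\\xff' ); // range after the fix

// An empty pattern with /u matches only if the subject is valid UTF-8.
var_dump( preg_match( '//u', $old ) === 1 ); // bool(false): stray 80 9C bytes remain
var_dump( preg_match( '//u', $new ) === 1 ); // bool(true)
```

With the wider range the closing quote is swallowed together with the URL, which is harmless for search indexing; the point is that no partial multi-byte sequence is left behind.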

Added a test case to SearchUpdateTest for this: the example text is run through the update squash algorithm, then through preg_replace with the /u modifier so that it is treated as UTF-8, and the test checks whether that breaks.
Modified paths:
  • /trunk/phase3/tests/phpunit/includes/search/SearchUpdateTest.php (modified)

Diff

Index: trunk/phase3/tests/phpunit/includes/search/SearchUpdateTest.php
@@ -77,4 +77,14 @@
 			'Bug 18609'
 		);
 	}
+
+	function testBug32712() {
+		$text = "text „http://example.com“ text";
+		$result = $this->updateText( $text );
+		$processed = preg_replace( '/Q/u', 'Q', $result );
+		$this->assertTrue(
+			$processed != '',
+			'Link surrounded by unicode quotes should not fail UTF-8 validation'
+		);
+	}
 }
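
The check in the test above relies on preg_replace() returning NULL when the /u modifier is set and the subject is not valid UTF-8. A minimal standalone sketch of that behaviour follows; the helper name survivesUtf8Check and the sample strings are assumptions for illustration and are not part of the test suite.

```php
<?php
// Hypothetical helper mirroring the assertion in testBug32712().
function survivesUtf8Check( string $s ): bool {
	// 'Q' -> 'Q' changes nothing; the call exists only to force UTF-8
	// validation of $s. On invalid input preg_replace() returns NULL.
	$processed = preg_replace( '/Q/u', 'Q', $s );
	// Loose comparison: NULL != '' evaluates to false.
	return $processed != '';
}

$valid  = "text \u{201E} text";   // well-formed UTF-8
$broken = "text \x80\x9C text";   // stray continuation bytes

var_dump( survivesUtf8Check( $valid ) );                // bool(true)
var_dump( survivesUtf8Check( $broken ) );               // bool(false)
var_dump( preg_last_error() === PREG_BAD_UTF8_ERROR );  // bool(true)
```

Because the comparison is loose, an empty result string would also fail the check, so this pattern only suits inputs that are expected to be non-empty.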

Follow-up revisions

Revision | Commit summary | Author | Date
r104636 | Apply the actual fix from r104635. :P | brion | 00:37, 30 November 2011
r104637 | MFT r104635, r104636 - test & fix for bug 32712. | brion | 00:41, 30 November 2011

Status & tagging log