r104635 MediaWiki - Code Review archive

Repository:	MediaWiki
Revision:	< r104634 | r104635 | r104636 >
Date:	00:36, 30 November 2011
Author:	brion
Status:	ok
Tags:
Comment:	* (bug 32712) Fix for search indexing of pages with certain unicode chars following URL
A regex in SearchUpdate was built for ancient pure ISO 8859-1 and looked for \xa0-\xff bytes -- this caused the regex to cut off partway through if there was a char containing a byte in the \x80-\x9f range. Fixed regex to pass \x80-\xff instead.
Added a test case to SearchUpdateTest which checks for this case (example text run through the update squash algo, then run through preg_replace with a /u param to make sure it gets treated as UTF-8 and checking whether it breaks.)
Modified paths:	/trunk/phase3/tests/phpunit/includes/search/SearchUpdateTest.php (modified)

Diff

Index: trunk/phase3/tests/phpunit/includes/search/SearchUpdateTest.php
@@ -77,4 +77,14 @@
 			'Bug 18609'
 		);
 	}
+
+	function testBug32712() {
+		$text = "text „http://example.com“ text";
+		$result = $this->updateText( $text );
+		$processed = preg_replace( '/Q/u', 'Q', $result );
+		$this->assertTrue(
+			$processed != '',
+			'Link surrounded by unicode quotes should not fail UTF-8 validation'
+		);
+	}
 }
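
The failure mode described in the commit comment can be reproduced outside MediaWiki with a small standalone sketch. The helper names `stripUrls()` and `isValidUtf8()` and the simplified URL pattern below are assumptions made for illustration; they are not the actual SearchUpdate code. The sketch only shows how a byte-oriented character class of `\xA0-\xFF` can consume the lead byte of a following UTF-8 character and leave its continuation bytes behind, while `\x80-\xFF` consumes the whole character.

```php
<?php
// Hypothetical helper (not the real SearchUpdate code): strips external URLs
// with a byte-oriented pattern, i.e. without the /u modifier.
// $highBytes is the range of non-ASCII bytes treated as URL characters.
function stripUrls( $text, $highBytes ) {
	$uc = "A-Za-z0-9_\\/:.,~%\\-+&;#?!=()@" . $highBytes;
	$pat = "/(^|[^\\[])(http|https|ftp):[{$uc}]+([^{$uc}]|$)/";
	// Keep the character before the URL and the first character after it.
	return preg_replace( $pat, "\\1 \\3", $text );
}

// Hypothetical helper: mirrors the check in testBug32712().
// preg_replace() with /u returns null when the subject is not valid UTF-8.
function isValidUtf8( $text ) {
	return preg_replace( '/Q/u', 'Q', $text ) !== null;
}

// Same text as the test case, with the quotes written as explicit bytes:
// U+201E is E2 80 9E, U+201C is E2 80 9C.
$text = "text \xE2\x80\x9Ehttp://example.com\xE2\x80\x9C text";

// Old range: \xE2 is swallowed as part of the URL, \x80 \x9C are left dangling.
$old = stripUrls( $text, "\\xA0-\\xFF" );

// New range: all three bytes of the closing quote are swallowed together.
$new = stripUrls( $text, "\\x80-\\xFF" );

var_dump( isValidUtf8( $old ) );
var_dump( isValidUtf8( $new ) );
```

Under these assumptions the first check reports invalid UTF-8 and the second reports valid UTF-8. This is also why the test compares `$processed != ''`: a `null` result from `preg_replace()` compares loosely equal to the empty string, so the assertion fails when the text has been corrupted.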

Follow-up revisions

Revision	Commit summary	Author	Date
r104636	Apply the actual fix from r104635. :P	brion	00:37, 30 November 2011
r104637	MFT r104635, r104636 - test & fix for bug 32712.	brion	00:41, 30 November 2011

Status & tagging log

16:19, 5 December 2011 Nikerabbit changed the status of r104635 [removed: new added: ok]