r85327 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r85327
Date:12:59, 4 April 2011
Author:tstarling
Status:resolved
Tags:
Comment:
The beginnings of HipHop compiled mode support. It works now for parser cache hits.

* Work around HipHop issue 314 (volatile broken) and issue 308 (no compilation detection) by adding some large and ugly compilation detection code to WebStart.php and doMaintenance.php.
* Provide an MW_COMPILED constant which can be used to detect compiled mode throughout the codebase.
* Introduced wfIsHipHop(), which detects either compiled or interpreted mode. Used this to work around unusual eval() return value in eval.php.
* Work around lack of ini_get() in Maintenance.php, by duplicating wfIsHipHop().
* In Maintenance::shouldExecute(), accept "include" as an inclusion function name, since all kinds of inclusion give this string in HipHop.
* Introduced new class MWInit, which provides some static functions in the pre-autoloader environment.
* Introduced MWInit::compiledPath(), which provides a relative path for invoking a compiled file, and MWInit::interpretedPath(), which provides an absolute path for interpreting a PHP file. Used these new functions in the appropriate places.
* When we are running compiled code, don't include files which would generate duplicate class, function or constant definitions. Documented the new requirements on the contents of Defines.php and UtfNormalDefines.php.
* In HipHop compiled mode, it's not possible to have executable code in the same file as a class definition.
** Moved MimeMagic initialisation to the constructor.
** Moved Namespace.php global variable initialisation to Setup.php.
** Moved MemcachedSessions.php initialisation to the caller in GlobalFunctions.php.
** Moved Sanitizer.php constants and global variables to static class members. Introduced an accessor function for the attribs regex, as a new place to put code formerly at file level.
** Moved Language.php initialisation of $wgLanguageNames to Language::getLanguageNames(). Removed the global variable, marked "private" since forever.

* In two places: don't use error_log() with type=3 to append to a file, because HipHop doesn't support it. Use file_put_contents() with FILE_APPEND instead.
* Work around the terrible breakage of class_exists() by using MWInit::classExists() instead in various places. In WebInstaller::getPageByName(), the class_exists() was marked with a fixme comment already, so I replaced it with an autoloader solution.
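
The rule that a class file may not contain file-level executable code is typically met by building values on first use, as the Sanitizer change does for its attribs regex. A minimal sketch of that lazy-initialisation pattern, assuming a hypothetical class ExampleSanitizer and a much simplified regex (neither is the committed code):

```php
<?php
// Hypothetical class for illustration only; not part of MediaWiki.
class ExampleSanitizer {
	/** Built on first use instead of at file level. */
	private static $attribsRegex = null;

	public static function getAttribsRegex() {
		if ( self::$attribsRegex === null ) {
			// Code that used to run when the file was included now runs here.
			$attrib = '[A-Za-z0-9_:.-]';
			$space = '[\x09\x0a\x0d\x20]';
			self::$attribsRegex =
				"/(?:^|$space)($attrib+)$space*=$space*\"([^<\"]*)\"/";
		}
		return self::$attribsRegex;
	}
}

// Usage: callers go through the accessor rather than a global constant.
preg_match( ExampleSanitizer::getAttribsRegex(), 'class="foo"', $m );
echo $m[1], '=', $m[2], "\n";
```

The file now contains only declarations, so including it has no side effects, and the regex is assembled at most once per request.
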
Modified paths:
  • /trunk/phase3/includes/AutoLoader.php (modified)
  • /trunk/phase3/includes/DefaultSettings.php (modified)
  • /trunk/phase3/includes/Defines.php (modified)
  • /trunk/phase3/includes/ExternalStore.php (modified)
  • /trunk/phase3/includes/GlobalFunctions.php (modified)
  • /trunk/phase3/includes/MemcachedSessions.php (modified)
  • /trunk/phase3/includes/MimeMagic.php (modified)
  • /trunk/phase3/includes/Namespace.php (modified)
  • /trunk/phase3/includes/Sanitizer.php (modified)
  • /trunk/phase3/includes/Setup.php (modified)
  • /trunk/phase3/includes/Skin.php (modified)
  • /trunk/phase3/includes/Title.php (modified)
  • /trunk/phase3/includes/User.php (modified)
  • /trunk/phase3/includes/WebStart.php (modified)
  • /trunk/phase3/includes/installer/WebInstaller.php (modified)
  • /trunk/phase3/includes/normal/UtfNormalDefines.php (modified)
  • /trunk/phase3/includes/normal/UtfNormalUtil.php (modified)
  • /trunk/phase3/languages/Language.php (modified)
  • /trunk/phase3/languages/LanguageConverter.php (modified)
  • /trunk/phase3/languages/Names.php (modified)
  • /trunk/phase3/maintenance/Maintenance.php (modified)
  • /trunk/phase3/maintenance/addwiki.php (modified)
  • /trunk/phase3/maintenance/doMaintenance.php (modified)
  • /trunk/phase3/maintenance/eval.php (modified)
  • /trunk/phase3/maintenance/hiphop/file-list.small (modified)

Diff

Index: trunk/phase3/languages/Language.php
@@ -174,20 +174,22 @@
 			$class = 'Language';
 		} else {
 			$class = 'Language' . str_replace( '-', '_', ucfirst( $code ) );
-			// Preload base classes to work around APC/PHP5 bug
-			if ( file_exists( "$IP/languages/classes/$class.deps.php" ) ) {
-				include_once( "$IP/languages/classes/$class.deps.php" );
+			if ( !defined( 'MW_COMPILED' ) ) {
+				// Preload base classes to work around APC/PHP5 bug
+				if ( file_exists( "$IP/languages/classes/$class.deps.php" ) ) {
+					include_once( "$IP/languages/classes/$class.deps.php" );
+				}
+				if ( file_exists( "$IP/languages/classes/$class.php" ) ) {
+					include_once( "$IP/languages/classes/$class.php" );
+				}
 			}
-			if ( file_exists( "$IP/languages/classes/$class.php" ) ) {
-				include_once( "$IP/languages/classes/$class.php" );
-			}
 		}

 		if ( $recursionLevel > 5 ) {
 			throw new MWException( "Language fallback loop detected when creating class $class\n" );
 		}

-		if ( !class_exists( $class ) ) {
+		if ( !MWInit::classExists( $class ) ) {
 			$fallback = Language::getFallbackFor( $code );
 			++$recursionLevel;
 			$lang = Language::newFromCode( $fallback );
@@ -540,8 +542,14 @@
 	 * If $customisedOnly is true, only returns codes with a messages file
 	 */
 	public static function getLanguageNames( $customisedOnly = false ) {
-		global $wgLanguageNames, $wgExtraLanguageNames;
-		$allNames = $wgExtraLanguageNames + $wgLanguageNames;
+		global $wgExtraLanguageNames;
+		static $coreLanguageNames;
+
+		if ( $coreLanguageNames === null ) {
+			include( MWInit::compiledPath( 'languages/Names.php' ) );
+		}
+
+		$allNames = $wgExtraLanguageNames + $coreLanguageNames;
 		if ( !$customisedOnly ) {
 			return $allNames;
 		}
Index: trunk/phase3/languages/Names.php
@@ -5,7 +5,7 @@
  *
  * @ingroup Language
  */
-/* private */ $wgLanguageNames = array(
+/* private */ $coreLanguageNames = array(
 	'aa' => 'Qafár af',	# Afar
 	'ab' => 'Аҧсуа',	# Abkhaz, should possibly add ' бысжѡа'
 	'ace' => 'Acèh',	# Aceh
Index: trunk/phase3/languages/LanguageConverter.php
@@ -67,12 +67,12 @@
 	public function __construct( $langobj, $maincode, $variants = array(),
 			$variantfallbacks = array(), $flags = array(),
 			$manualLevel = array() ) {
-		global $wgDisabledVariants, $wgLanguageNames;
+		global $wgDisabledVariants;
 		$this->mLangObj = $langobj;
 		$this->mMainLanguageCode = $maincode;
 		$this->mVariants = array_diff( $variants, $wgDisabledVariants );
 		$this->mVariantFallbacks = $variantfallbacks;
-		$this->mVariantNames = $wgLanguageNames;
+		$this->mVariantNames = Language::getLanguageNames();
 		$this->mCacheKey = wfMemcKey( 'conversiontables', $maincode );
 		$defaultflags = array(
 			// 'S' show converted text
Index: trunk/phase3/maintenance/doMaintenance.php
@@ -53,17 +53,35 @@
 // to $maintenance->mSelf. Keep that here for b/c
 $self = $maintenance->getName();

+// Detect compiled mode
+try {
+	$r = new ReflectionFunction( 'wfHipHopCompilerVersion' );
+} catch ( ReflectionException $e ) {
+	$r = false;
+}
+
+if ( $r ) {
+	define( 'MW_COMPILED', 1 );
+}
+
+# Get the MWInit class
+if ( !defined( 'MW_COMPILED' ) ) {
+	require_once( "$IP/includes/Init.php" );
+}
+
 # Setup the profiler
 global $IP;
-if ( file_exists( "$IP/StartProfiler.php" ) ) {
+if ( !defined( 'MW_COMPILED' ) && file_exists( "$IP/StartProfiler.php" ) ) {
 	require_once( "$IP/StartProfiler.php" );
 } else {
-	require_once( "$IP/includes/ProfilerStub.php" );
+	require_once( MWInit::compiledPath( 'includes/ProfilerStub.php' ) );
 }

 // Some other requires
-require_once( "$IP/includes/AutoLoader.php" );
-require_once( "$IP/includes/Defines.php" );
+if ( !defined( 'MW_COMPILED' ) ) {
+	require_once( "$IP/includes/AutoLoader.php" );
+	require_once( "$IP/includes/Defines.php" );
+}
 require_once( "$IP/includes/DefaultSettings.php" );

 if ( defined( 'MW_CONFIG_CALLBACK' ) ) {
@@ -77,10 +95,10 @@
 	global $cluster;
 	$wgWikiFarm = true;
 	$cluster = 'pmtpa';
-	require_once( "$IP/includes/SiteConfiguration.php" );
-	require( "$IP/wmf-config/wgConf.php" );
+	require_once( MWInit::compiledPath( 'includes/SiteConfiguration.php' ) );
+	require( MWInit::interpretedPath( 'wmf-config/wgConf.php' ) );
 	$maintenance->loadWikimediaSettings();
-	require( $IP . '/wmf-config/CommonSettings.php' );
+	require( MWInit::interpretedPath( '/wmf-config/CommonSettings.php' ) );
 } else {
 	require_once( $maintenance->loadSettings() );
 }
@@ -88,12 +106,12 @@
 if ( $maintenance->getDbType() === Maintenance::DB_ADMIN &&
 	is_readable( "$IP/AdminSettings.php" ) )
 {
-	require( "$IP/AdminSettings.php" );
+	require( MWInit::interpretedPath( 'AdminSettings.php' ) );
 }
 $maintenance->finalSetup();
 // Some last includes
-require_once( "$IP/includes/Setup.php" );
-require_once( "$IP/maintenance/install-utils.inc" );
+require_once( MWInit::compiledPath( 'includes/Setup.php' ) );
+require_once( MWInit::compiledPath( 'maintenance/install-utils.inc' ) );

 // Much much faster startup than creating a title object
 $wgTitle = null;
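
The detection step in doMaintenance.php relies on ReflectionFunction throwing ReflectionException when the named function does not exist. A minimal sketch of the same idea, assuming a hypothetical helper functionIsDefined() that is not part of the commit:

```php
<?php
// Hypothetical helper for illustration; the commit inlines this logic instead.
function functionIsDefined( $name ) {
	try {
		new ReflectionFunction( $name );
		return true;
	} catch ( ReflectionException $e ) {
		// Thrown when no function with this name exists.
		return false;
	}
}

// wfHipHopCompilerVersion is the marker function the commit probes for.
if ( functionIsDefined( 'wfHipHopCompilerVersion' ) && !defined( 'MW_COMPILED' ) ) {
	define( 'MW_COMPILED', 1 );
}

echo defined( 'MW_COMPILED' ) ? "compiled\n" : "interpreted\n";
```

Under a stock PHP interpreter, where the marker function is absent, this takes the interpreted branch.
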
Index: trunk/phase3/maintenance/addwiki.php
@@ -57,7 +57,7 @@
 		$languageNames = Language::getLanguageNames();

 		if ( !isset( $languageNames[$lang] ) ) {
-			$this->error( "Language $lang not found in \$wgLanguageNames", true );
+			$this->error( "Language $lang not found in Names.php", true );
 		}
 		$name = $languageNames[$lang];

Index: trunk/phase3/maintenance/Maintenance.php
@@ -139,7 +139,7 @@
 		if( count( $bt ) !== 2 ) {
 			return false;
 		}
-		return ( $bt[1]['function'] == 'require_once' || $bt[1]['function'] == 'require' ) &&
+		return in_array( $bt[1]['function'], array( 'require_once', 'require', 'include' ) ) &&
 			$bt[0]['class'] == 'Maintenance' &&
 			$bt[0]['function'] == 'shouldExecute';
 	}
@@ -430,11 +430,11 @@
 	 */
 	public function runChild( $maintClass, $classFile = null ) {
 		// Make sure the class is loaded first
-		if ( !class_exists( $maintClass ) ) {
+		if ( !MWInit::classExists( $maintClass ) ) {
 			if ( $classFile ) {
 				require_once( $classFile );
 			}
-			if ( !class_exists( $maintClass ) ) {
+			if ( !MWInit::classExists( $maintClass ) ) {
 				$this->error( "Cannot spawn child: $maintClass" );
 			}
 		}
@@ -456,7 +456,7 @@
 		}

 		# Make sure we can handle script parameters
-		if ( !ini_get( 'register_argc_argv' ) ) {
+		if ( !function_exists( 'hphp_thread_set_warmup_enabled' ) && !ini_get( 'register_argc_argv' ) ) {
 			$this->error( 'Cannot get command line arguments, register_argc_argv is set to false', true );
 		}

Index: trunk/phase3/maintenance/eval.php
@@ -74,7 +74,7 @@
 		readline_write_history( $historyFile );
 	}
 	$val = eval( $line . ";" );
-	if ( is_null( $val ) ) {
+	if ( wfIsHipHop() || is_null( $val ) ) {
 		echo "\n";
 	} elseif ( is_string( $val ) || is_numeric( $val ) ) {
 		echo "$val\n";
Index: trunk/phase3/maintenance/hiphop/file-list.small
@@ -158,6 +158,7 @@
 includes/filerepo/OldLocalFile.php
 includes/filerepo/RepoGroup.php
 includes/filerepo/UnregisteredLocalFile.php
+includes/installer/Installer.php
 includes/job/DoubleRedirectJob.php
 includes/job/EmaillingJob.php
 includes/job/EnotifNotifyJob.php
@@ -280,6 +281,8 @@
 redirect.php
 resources/Resources.php
 serialized/serialize.php
+skins/MonoBook.deps.php
+skins/MonoBook.php
 skins/Vector.deps.php
 skins/Vector.php
 thumb.php
Index: trunk/phase3/includes/AutoLoader.php
@@ -457,6 +457,22 @@
 	'LBFactory_InstallerFake' => 'includes/installer/Installer.php',
 	'LocalSettingsGenerator' => 'includes/installer/LocalSettingsGenerator.php',
 	'WebInstaller' => 'includes/installer/WebInstaller.php',
+	'WebInstaller_Complete' => 'includes/installer/WebInstallerPage.php',
+	'WebInstaller_Copying' => 'includes/installer/WebInstallerPage.php',
+	'WebInstaller_DBConnect' => 'includes/installer/WebInstallerPage.php',
+	'WebInstaller_DBSettings' => 'includes/installer/WebInstallerPage.php',
+	'WebInstaller_Document' => 'includes/installer/WebInstallerPage.php',
+	'WebInstaller_ExistingWiki' => 'includes/installer/WebInstallerPage.php',
+	'WebInstaller_Install' => 'includes/installer/WebInstallerPage.php',
+	'WebInstaller_Language' => 'includes/installer/WebInstallerPage.php',
+	'WebInstaller_Name' => 'includes/installer/WebInstallerPage.php',
+	'WebInstaller_Options' => 'includes/installer/WebInstallerPage.php',
+	'WebInstaller_Readme' => 'includes/installer/WebInstallerPage.php',
+	'WebInstaller_ReleaseNotes' => 'includes/installer/WebInstallerPage.php',
+	'WebInstaller_Restart' => 'includes/installer/WebInstallerPage.php',
+	'WebInstaller_Upgrade' => 'includes/installer/WebInstallerPage.php',
+	'WebInstaller_UpgradeDoc' => 'includes/installer/WebInstallerPage.php',
+	'WebInstaller_Welcome' => 'includes/installer/WebInstallerPage.php',
 	'WebInstallerPage' => 'includes/installer/WebInstallerPage.php',
 	'WebInstallerOutput' => 'includes/installer/WebInstallerOutput.php',
 	'MysqlInstaller' => 'includes/installer/MysqlInstaller.php',
Index: trunk/phase3/includes/Setup.php
@@ -218,6 +218,35 @@
 	$wgMetaNamespace = str_replace( ' ', '_', $wgSitename );
 }

+/**
+ * Definitions of the NS_ constants are in Defines.php
+ * @private
+ */
+$wgCanonicalNamespaceNames = array(
+	NS_MEDIA => 'Media',
+	NS_SPECIAL => 'Special',
+	NS_TALK => 'Talk',
+	NS_USER => 'User',
+	NS_USER_TALK => 'User_talk',
+	NS_PROJECT => 'Project',
+	NS_PROJECT_TALK => 'Project_talk',
+	NS_FILE => 'File',
+	NS_FILE_TALK => 'File_talk',
+	NS_MEDIAWIKI => 'MediaWiki',
+	NS_MEDIAWIKI_TALK => 'MediaWiki_talk',
+	NS_TEMPLATE => 'Template',
+	NS_TEMPLATE_TALK => 'Template_talk',
+	NS_HELP => 'Help',
+	NS_HELP_TALK => 'Help_talk',
+	NS_CATEGORY => 'Category',
+	NS_CATEGORY_TALK => 'Category_talk',
+);
+
+/// @todo UGLY UGLY
+if( is_array( $wgExtraNamespaces ) ) {
+	$wgCanonicalNamespaceNames = $wgCanonicalNamespaceNames + $wgExtraNamespaces;
+}
+
 # These are now the same, always
 # To determine the user language, use $wgLang->getCode()
 $wgContLanguageCode = $wgLanguageCode;
@@ -274,22 +303,24 @@
 	$wgLogActions['newusers/autocreate'] = 'newuserlog-autocreate-entry';
 }

-if ( !class_exists( 'AutoLoader' ) ) {
-	require_once( "$IP/includes/AutoLoader.php" );
-}
+if ( !defined( 'MW_COMPILED' ) ) {
+	if ( !MWInit::classExists( 'AutoLoader' ) ) {
+		require_once( "$IP/includes/AutoLoader.php" );
+	}

-wfProfileIn( $fname . '-exception' );
-require_once( "$IP/includes/Exception.php" );
-wfInstallExceptionHandler();
-wfProfileOut( $fname . '-exception' );
+	wfProfileIn( $fname . '-exception' );
+	require_once( "$IP/includes/Exception.php" );
+	wfInstallExceptionHandler();
+	wfProfileOut( $fname . '-exception' );

-wfProfileIn( $fname . '-includes' );
-require_once( "$IP/includes/GlobalFunctions.php" );
-require_once( "$IP/includes/Hooks.php" );
-require_once( "$IP/includes/Namespace.php" );
-require_once( "$IP/includes/ProxyTools.php" );
-require_once( "$IP/includes/ImageFunctions.php" );
-wfProfileOut( $fname . '-includes' );
+	wfProfileIn( $fname . '-includes' );
+	require_once( "$IP/includes/GlobalFunctions.php" );
+	require_once( "$IP/includes/Hooks.php" );
+	require_once( "$IP/includes/Namespace.php" );
+	require_once( "$IP/includes/ProxyTools.php" );
+	require_once( "$IP/includes/ImageFunctions.php" );
+	wfProfileOut( $fname . '-includes' );
+}
 wfProfileIn( $fname . '-misc1' );

 # Raise the memory limit if it's too low
Index: trunk/phase3/includes/User.php
@@ -1333,7 +1333,7 @@
 			if( $count > $max ) {
 				wfDebug( __METHOD__ . ": tripped! $key at $count $summary\n" );
 				if( $wgRateLimitLog ) {
-					@error_log( wfTimestamp( TS_MW ) . ' ' . wfWikiID() . ': ' . $this->getName() . " tripped $key at $count $summary\n", 3, $wgRateLimitLog );
+					@file_put_contents( $wgRateLimitLog, wfTimestamp( TS_MW ) . ' ' . wfWikiID() . ': ' . $this->getName() . " tripped $key at $count $summary\n", FILE_APPEND );
 				}
 				$triggered = true;
 			} else {
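
The replacement of error_log() type 3 with file_put_contents() keeps the append-only behaviour through the FILE_APPEND flag. A minimal sketch, assuming a hypothetical helper appendLogLine() and a temporary file standing in for $wgRateLimitLog:

```php
<?php
// Hypothetical helper for illustration; the commit calls file_put_contents() inline.
function appendLogLine( $file, $line ) {
	// FILE_APPEND writes at the end of the file instead of truncating it.
	return file_put_contents( $file, $line, FILE_APPEND ) !== false;
}

// Usage: two calls leave both lines in the file, in order.
$log = tempnam( sys_get_temp_dir(), 'ratelimit' );
appendLogLine( $log, "first entry\n" );
appendLogLine( $log, "second entry\n" );
echo file_get_contents( $log );
unlink( $log );
```
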
Index: trunk/phase3/includes/MemcachedSessions.php
@@ -96,10 +96,3 @@
 	session_write_close();
 }

-session_set_save_handler( 'memsess_open', 'memsess_close', 'memsess_read', 'memsess_write', 'memsess_destroy', 'memsess_gc' );
-
-// It's necessary to register a shutdown function to call session_write_close(),
-// because by the time the request shutdown function for the session module is
-// called, $wgMemc has already been destroyed. Shutdown functions registered
-// this way are called before object destruction.
-register_shutdown_function( 'memsess_write_close' );
Index: trunk/phase3/includes/installer/WebInstaller.php
@@ -408,9 +408,6 @@
 	 * @return WebInstallerPage
 	 */
 	public function getPageByName( $pageName ) {
-		// Totally lame way to force autoload of WebInstallerPage.php
-		class_exists( 'WebInstallerPage' );
-
 		$pageClass = 'WebInstaller_' . $pageName;

 		return new $pageClass( $this );
Index: trunk/phase3/includes/Sanitizer.php
@@ -25,322 +25,323 @@
  */

 /**
- * Regular expression to match various types of character references in
- * Sanitizer::normalizeCharReferences and Sanitizer::decodeCharReferences
+ * XHTML sanitizer for MediaWiki
+ * @ingroup Parser
  */
-define( 'MW_CHAR_REFS_REGEX',
-	'/&([A-Za-z0-9\x80-\xff]+);
-	 |&\#([0-9]+);
-	 |&\#[xX]([0-9A-Fa-f]+);
-	 |(&)/x' );
+class Sanitizer {
+	/**
+	 * Regular expression to match various types of character references in
+	 * Sanitizer::normalizeCharReferences and Sanitizer::decodeCharReferences
+	 */
+	const CHAR_REFS_REGEX =
+		'/&([A-Za-z0-9\x80-\xff]+);
+		 |&\#([0-9]+);
+		 |&\#[xX]([0-9A-Fa-f]+);
+		 |(&)/x';

-/**
- * Regular expression to match HTML/XML attribute pairs within a tag.
- * Allows some... latitude.
- * Used in Sanitizer::fixTagAttributes and Sanitizer::decodeTagAttributes
- */
-$attribFirst = '[:A-Z_a-z0-9]';
-$attrib = '[:A-Z_a-z-.0-9]';
-$space = '[\x09\x0a\x0d\x20]';
-define( 'MW_ATTRIBS_REGEX',
-	"/(?:^|$space)({$attribFirst}{$attrib}*)
-	  ($space*=$space*
-		(?:
-		 # The attribute value: quoted or alone
-		  \"([^<\"]*)\"
-		 | '([^<']*)'
-		 | ([a-zA-Z0-9!#$%&()*,\\-.\\/:;<>?@[\\]^_`{|}~]+)
-		 | (\#[0-9a-fA-F]+) # Technically wrong, but lots of
-		                      # colors are specified like this.
-		                      # We'll be normalizing it.
-		)
-	   )?(?=$space|\$)/sx" );
+	const EVIL_URI_PATTERN = '!(^|\s|\*/\s*)(javascript|vbscript)([^\w]|$)!i';
+	const XMLNS_ATTRIBUTE_PATTERN = "/^xmlns:[:A-Z_a-z-.0-9]+$/";

-/**
- * Regular expression to match URIs that could trigger script execution
- */
-define( 'MW_EVIL_URI_PATTERN', '!(^|\s|\*/\s*)(javascript|vbscript)([^\w]|$)!i' );
+	/**
+	 * List of all named character entities defined in HTML 4.01
+	 * http://www.w3.org/TR/html4/sgml/entities.html
+	 * @private
+	 */
+	static $htmlEntities = array(
+		'Aacute' => 193,
+		'aacute' => 225,
+		'Acirc' => 194,
+		'acirc' => 226,
+		'acute' => 180,
+		'AElig' => 198,
+		'aelig' => 230,
+		'Agrave' => 192,
+		'agrave' => 224,
+		'alefsym' => 8501,
+		'Alpha' => 913,
+		'alpha' => 945,
+		'amp' => 38,
+		'and' => 8743,
+		'ang' => 8736,
+		'Aring' => 197,
+		'aring' => 229,
+		'asymp' => 8776,
+		'Atilde' => 195,
+		'atilde' => 227,
+		'Auml' => 196,
+		'auml' => 228,
+		'bdquo' => 8222,
+		'Beta' => 914,
+		'beta' => 946,
+		'brvbar' => 166,
+		'bull' => 8226,
+		'cap' => 8745,
+		'Ccedil' => 199,
+		'ccedil' => 231,
+		'cedil' => 184,
+		'cent' => 162,
+		'Chi' => 935,
+		'chi' => 967,
+		'circ' => 710,
+		'clubs' => 9827,
+		'cong' => 8773,
+		'copy' => 169,
+		'crarr' => 8629,
+		'cup' => 8746,
+		'curren' => 164,
+		'dagger' => 8224,
+		'Dagger' => 8225,
+		'darr' => 8595,
+		'dArr' => 8659,
+		'deg' => 176,
+		'Delta' => 916,
+		'delta' => 948,
+		'diams' => 9830,
+		'divide' => 247,
+		'Eacute' => 201,
+		'eacute' => 233,
+		'Ecirc' => 202,
+		'ecirc' => 234,
+		'Egrave' => 200,
+		'egrave' => 232,
+		'empty' => 8709,
+		'emsp' => 8195,
+		'ensp' => 8194,
+		'Epsilon' => 917,
+		'epsilon' => 949,
+		'equiv' => 8801,
+		'Eta' => 919,
+		'eta' => 951,
+		'ETH' => 208,
+		'eth' => 240,
+		'Euml' => 203,
+		'euml' => 235,
+		'euro' => 8364,
+		'exist' => 8707,
+		'fnof' => 402,
+		'forall' => 8704,
+		'frac12' => 189,
+		'frac14' => 188,
+		'frac34' => 190,
+		'frasl' => 8260,
+		'Gamma' => 915,
+		'gamma' => 947,
+		'ge' => 8805,
+		'gt' => 62,
+		'harr' => 8596,
+		'hArr' => 8660,
+		'hearts' => 9829,
+		'hellip' => 8230,
+		'Iacute' => 205,
+		'iacute' => 237,
+		'Icirc' => 206,
+		'icirc' => 238,
+		'iexcl' => 161,
+		'Igrave' => 204,
+		'igrave' => 236,
+		'image' => 8465,
+		'infin' => 8734,
+		'int' => 8747,
+		'Iota' => 921,
+		'iota' => 953,
+		'iquest' => 191,
+		'isin' => 8712,
+		'Iuml' => 207,
+		'iuml' => 239,
+		'Kappa' => 922,
+		'kappa' => 954,
+		'Lambda' => 923,
+		'lambda' => 955,
+		'lang' => 9001,
+		'laquo' => 171,
+		'larr' => 8592,
+		'lArr' => 8656,
+		'lceil' => 8968,
+		'ldquo' => 8220,
+		'le' => 8804,
+		'lfloor' => 8970,
+		'lowast' => 8727,
+		'loz' => 9674,
+		'lrm' => 8206,
+		'lsaquo' => 8249,
+		'lsquo' => 8216,
+		'lt' => 60,
+		'macr' => 175,
+		'mdash' => 8212,
+		'micro' => 181,
+		'middot' => 183,
+		'minus' => 8722,
+		'Mu' => 924,
+		'mu' => 956,
+		'nabla' => 8711,
+		'nbsp' => 160,
+		'ndash' => 8211,
+		'ne' => 8800,
+		'ni' => 8715,
+		'not' => 172,
+		'notin' => 8713,
+		'nsub' => 8836,
+		'Ntilde' => 209,
+		'ntilde' => 241,
+		'Nu' => 925,
+		'nu' => 957,
+		'Oacute' => 211,
+		'oacute' => 243,
+		'Ocirc' => 212,
+		'ocirc' => 244,
+		'OElig' => 338,
+		'oelig' => 339,
+		'Ograve' => 210,
+		'ograve' => 242,
+		'oline' => 8254,
+		'Omega' => 937,
+		'omega' => 969,
+		'Omicron' => 927,
+		'omicron' => 959,
+		'oplus' => 8853,
+		'or' => 8744,
+		'ordf' => 170,
+		'ordm' => 186,
+		'Oslash' => 216,
+		'oslash' => 248,
+		'Otilde' => 213,
+		'otilde' => 245,
+		'otimes' => 8855,
+		'Ouml' => 214,
+		'ouml' => 246,
+		'para' => 182,
+		'part' => 8706,
+		'permil' => 8240,
+		'perp' => 8869,
+		'Phi' => 934,
+		'phi' => 966,
+		'Pi' => 928,
+		'pi' => 960,
+		'piv' => 982,
+		'plusmn' => 177,
+		'pound' => 163,
+		'prime' => 8242,
+		'Prime' => 8243,
+		'prod' => 8719,
+		'prop' => 8733,
+		'Psi' => 936,
+		'psi' => 968,
+		'quot' => 34,
+		'radic' => 8730,
+		'rang' => 9002,
+		'raquo' => 187,
+		'rarr' => 8594,
+		'rArr' => 8658,
+		'rceil' => 8969,
+		'rdquo' => 8221,
+		'real' => 8476,
+		'reg' => 174,
+		'rfloor' => 8971,
+		'Rho' => 929,
+		'rho' => 961,
+		'rlm' => 8207,
+		'rsaquo' => 8250,
+		'rsquo' => 8217,
+		'sbquo' => 8218,
+		'Scaron' => 352,
+		'scaron' => 353,
+		'sdot' => 8901,
+		'sect' => 167,
+		'shy' => 173,
+		'Sigma' => 931,
+		'sigma' => 963,
+		'sigmaf' => 962,
+		'sim' => 8764,
+		'spades' => 9824,
+		'sub' => 8834,
+		'sube' => 8838,
+		'sum' => 8721,
+		'sup' => 8835,
+		'sup1' => 185,
+		'sup2' => 178,
+		'sup3' => 179,
+		'supe' => 8839,
+		'szlig' => 223,
+		'Tau' => 932,
+		'tau' => 964,
+		'there4' => 8756,
+		'Theta' => 920,
+		'theta' => 952,
+		'thetasym' => 977,
+		'thinsp' => 8201,
+		'THORN' => 222,
+		'thorn' => 254,
+		'tilde' => 732,
+		'times' => 215,
+		'trade' => 8482,
+		'Uacute' => 218,
+		'uacute' => 250,
+		'uarr' => 8593,
+		'uArr' => 8657,
+		'Ucirc' => 219,
+		'ucirc' => 251,
+		'Ugrave' => 217,
+		'ugrave' => 249,
+		'uml' => 168,
+		'upsih' => 978,
+		'Upsilon' => 933,
+		'upsilon' => 965,
+		'Uuml' => 220,
+		'uuml' => 252,
+		'weierp' => 8472,
+		'Xi' => 926,
+		'xi' => 958,
+		'Yacute' => 221,
+		'yacute' => 253,
+		'yen' => 165,
+		'Yuml' => 376,
+		'yuml' => 255,
+		'Zeta' => 918,
+		'zeta' => 950,
+		'zwj' => 8205,
+		'zwnj' => 8204
+	);

-/**
- * Regular expression to match namespace attributes
- */
-define( 'MW_XMLNS_ATTRIBUTE_PATTRN', "/^xmlns:$attrib+$/" );
+	/**
+	 * Character entity aliases accepted by MediaWiki
+	 */
+	static $htmlEntityAliases = array(
+		'רלמ' => 'rlm',
+		'رلم' => 'rlm',
+	);

-/**
- * List of all named character entities defined in HTML 4.01
- * http://www.w3.org/TR/html4/sgml/entities.html
- * @private
- */
-global $wgHtmlEntities;
-$wgHtmlEntities = array(
-	'Aacute' => 193,
-	'aacute' => 225,
-	'Acirc' => 194,
-	'acirc' => 226,
-	'acute' => 180,
-	'AElig' => 198,
-	'aelig' => 230,
-	'Agrave' => 192,
-	'agrave' => 224,
-	'alefsym' => 8501,
-	'Alpha' => 913,
-	'alpha' => 945,
-	'amp' => 38,
-	'and' => 8743,
-	'ang' => 8736,
-	'Aring' => 197,
-	'aring' => 229,
-	'asymp' => 8776,
-	'Atilde' => 195,
-	'atilde' => 227,
-	'Auml' => 196,
-	'auml' => 228,
-	'bdquo' => 8222,
-	'Beta' => 914,
-	'beta' => 946,
-	'brvbar' => 166,
-	'bull' => 8226,
-	'cap' => 8745,
-	'Ccedil' => 199,
-	'ccedil' => 231,
-	'cedil' => 184,
-	'cent' => 162,
-	'Chi' => 935,
-	'chi' => 967,
-	'circ' => 710,
-	'clubs' => 9827,
-	'cong' => 8773,
-	'copy' => 169,
-	'crarr' => 8629,
-	'cup' => 8746,
-	'curren' => 164,
-	'dagger' => 8224,
-	'Dagger' => 8225,
-	'darr' => 8595,
-	'dArr' => 8659,
-	'deg' => 176,
-	'Delta' => 916,
-	'delta' => 948,
-	'diams' => 9830,
-	'divide' => 247,
-	'Eacute' => 201,
-	'eacute' => 233,
-	'Ecirc' => 202,
-	'ecirc' => 234,
-	'Egrave' => 200,
-	'egrave' => 232,
-	'empty' => 8709,
-	'emsp' => 8195,
-	'ensp' => 8194,
-	'Epsilon' => 917,
-	'epsilon' => 949,
-	'equiv' => 8801,
-	'Eta' => 919,
-	'eta' => 951,
-	'ETH' => 208,
-	'eth' => 240,
-	'Euml' => 203,
-	'euml' => 235,
-	'euro' => 8364,
-	'exist' => 8707,
-	'fnof' => 402,
-	'forall' => 8704,
-	'frac12' => 189,
-	'frac14' => 188,
-	'frac34' => 190,
-	'frasl' => 8260,
-	'Gamma' => 915,
-	'gamma' => 947,
-	'ge' => 8805,
-	'gt' => 62,
-	'harr' => 8596,
-	'hArr' => 8660,
-	'hearts' => 9829,
-	'hellip' => 8230,
-	'Iacute' => 205,
-	'iacute' => 237,
-	'Icirc' => 206,
-	'icirc' => 238,
-	'iexcl' => 161,
-	'Igrave' => 204,
-	'igrave' => 236,
-	'image' => 8465,
-	'infin' => 8734,
-	'int' => 8747,
-	'Iota' => 921,
-	'iota' => 953,
-	'iquest' => 191,
-	'isin' => 8712,
-	'Iuml' => 207,
-	'iuml' => 239,
-	'Kappa' => 922,
-	'kappa' => 954,
-	'Lambda' => 923,
-	'lambda' => 955,
-	'lang' => 9001,
-	'laquo' => 171,
-	'larr' => 8592,
-	'lArr' => 8656,
-	'lceil' => 8968,
-	'ldquo' => 8220,
-	'le' => 8804,
-	'lfloor' => 8970,
-	'lowast' => 8727,
-	'loz' => 9674,
-	'lrm' => 8206,
-	'lsaquo' => 8249,
-	'lsquo' => 8216,
-	'lt' => 60,
-	'macr' => 175,
-	'mdash' => 8212,
-	'micro' => 181,
-	'middot' => 183,
-	'minus' => 8722,
-	'Mu' => 924,
-	'mu' => 956,
-	'nabla' => 8711,
-	'nbsp' => 160,
-	'ndash' => 8211,
-	'ne' => 8800,
-	'ni' => 8715,
-	'not' => 172,
-	'notin' => 8713,
-	'nsub' => 8836,
-	'Ntilde' => 209,
-	'ntilde' => 241,
-	'Nu' => 925,
-	'nu' => 957,
-	'Oacute' => 211,
-	'oacute' => 243,
-	'Ocirc' => 212,
-	'ocirc' => 244,
-	'OElig' => 338,
-	'oelig' => 339,
-	'Ograve' => 210,
-	'ograve' => 242,
-	'oline' => 8254,
-	'Omega' => 937,
-	'omega' => 969,
-	'Omicron' => 927,
-	'omicron' => 959,
-	'oplus' => 8853,
-	'or' => 8744,
-	'ordf' => 170,
-	'ordm' => 186,
-	'Oslash' => 216,
-	'oslash' => 248,
-	'Otilde' => 213,
-	'otilde' => 245,
-	'otimes' => 8855,
-	'Ouml' => 214,
-	'ouml' => 246,
-	'para' => 182,
-	'part' => 8706,
-	'permil' => 8240,
-	'perp' => 8869,
-	'Phi' => 934,
-	'phi' => 966,
-	'Pi' => 928,
-	'pi' => 960,
-	'piv' => 982,
-	'plusmn' => 177,
-	'pound' => 163,
-	'prime' => 8242,
-	'Prime' => 8243,
-	'prod' => 8719,
-	'prop' => 8733,
-	'Psi' => 936,
-	'psi' => 968,
-	'quot' => 34,
-	'radic' => 8730,
-	'rang' => 9002,
-	'raquo' => 187,
-	'rarr' => 8594,
-	'rArr' => 8658,
-	'rceil' => 8969,
-	'rdquo' => 8221,
-	'real' => 8476,
-	'reg' => 174,
-	'rfloor' => 8971,
-	'Rho' => 929,
-	'rho' => 961,
-	'rlm' => 8207,
-	'rsaquo' => 8250,
-	'rsquo' => 8217,
-	'sbquo' => 8218,
-	'Scaron' => 352,
-	'scaron' => 353,
-	'sdot' => 8901,
-	'sect' => 167,
-	'shy' => 173,
-	'Sigma' => 931,
-	'sigma' => 963,
-	'sigmaf' => 962,
-	'sim' => 8764,
-	'spades' => 9824,
-	'sub' => 8834,
-	'sube' => 8838,
-	'sum' => 8721,
-	'sup' => 8835,
-	'sup1' => 185,
-	'sup2' => 178,
-	'sup3' => 179,
-	'supe' => 8839,
-	'szlig' => 223,
-	'Tau' => 932,
-	'tau' => 964,
-	'there4' => 8756,
-	'Theta' => 920,
-	'theta' => 952,
-	'thetasym' => 977,
-	'thinsp' => 8201,
-	'THORN' => 222,
-	'thorn' => 254,
-	'tilde' => 732,
-	'times' => 215,
-	'trade' => 8482,
-	'Uacute' => 218,
-	'uacute' => 250,
-	'uarr' => 8593,
-	'uArr' => 8657,
-	'Ucirc' => 219,
-	'ucirc' => 251,
-	'Ugrave' => 217,
-	'ugrave' => 249,
-	'uml' => 168,
-	'upsih' => 978,
-	'Upsilon' => 933,
-	'upsilon' => 965,
-	'Uuml' => 220,
-	'uuml' => 252,
-	'weierp' => 8472,
-	'Xi' => 926,
-	'xi' => 958,
-	'Yacute' => 221,
-	'yacute' => 253,
-	'yen' => 165,
-	'Yuml' => 376,
-	'yuml' => 255,
-	'Zeta' => 918,
-	'zeta' => 950,
-	'zwj' => 8205,
-	'zwnj' => 8204 );
+	/**
+	 * Lazy-initialised attributes regex, see getAttribsRegex()
+	 */
+	static $attribsRegex;

-/**
- * Character entity aliases accepted by MediaWiki
- */
-global $wgHtmlEntityAliases;
-$wgHtmlEntityAliases = array(
-	'רלמ' => 'rlm',
-	'رلم' => 'rlm',
-);
+	/**
+	 * Regular expression to match HTML/XML attribute pairs within a tag.
+	 * Allows some... latitude.
+	 * Used in Sanitizer::fixTagAttributes and Sanitizer::decodeTagAttributes
+	 */
+	static function getAttribsRegex() {
+		if ( self::$attribsRegex === null ) {
+			$attribFirst = '[:A-Z_a-z0-9]';
+			$attrib = '[:A-Z_a-z-.0-9]';
+			$space = '[\x09\x0a\x0d\x20]';
+			self::$attribsRegex =
+				"/(?:^|$space)({$attribFirst}{$attrib}*)
+				  ($space*=$space*
+					(?:
+					 # The attribute value: quoted or alone
+					  \"([^<\"]*)\"
+					 | '([^<']*)'
+					 | ([a-zA-Z0-9!#$%&()*,\\-.\\/:;<>?@[\\]^_`{|}~]+)
+					 | (\#[0-9a-fA-F]+) # Technically wrong, but lots of
+					                      # colors are specified like this.
+					                      # We'll be normalizing it.
+					)
+				   )?(?=$space|\$)/sx";
+		}
+		return self::$attribsRegex;
+	}

-
-/**
- * XHTML sanitizer for MediaWiki
- * @ingroup Parser
- */
-class Sanitizer {
 	/**
 	 * Cleans up HTML, removes dangerous tags and attributes, and
 	 * removes HTML comments
@@ -635,8 +636,8 @@
 		$out = array();
 		foreach( $attribs as $attribute => $value ) {
 			#allow XML namespace declaration if RDFa is enabled
-			if ( $wgAllowRdfaAttributes && preg_match( MW_XMLNS_ATTRIBUTE_PATTRN, $attribute ) ) {
-				if ( !preg_match( MW_EVIL_URI_PATTERN, $value ) ) {
+			if ( $wgAllowRdfaAttributes && preg_match( self::XMLNS_ATTRIBUTE_PATTERN, $attribute ) ) {
+				if ( !preg_match( self::EVIL_URI_PATTERN, $value ) ) {
 					$out[$attribute] = $value;
 				}

@@ -666,7 +667,7 @@
 				$attribute === 'itemscope' || $attribute === 'itemtype' ) { #HTML5 microdata

 				//Paranoia. Allow "simple" values but suppress javascript
-				if ( preg_match( MW_EVIL_URI_PATTERN, $value ) ) {
+				if ( preg_match( self::EVIL_URI_PATTERN, $value ) ) {
 					continue;
 				}
 			}
@@ -1002,7 +1003,7 @@
 		$attribs = array();
 		$pairs = array();
10051006 if( !preg_match_all(
1006 - MW_ATTRIBS_REGEX,
 1007+ self::getAttribsRegex(),
10071008 $text,
10081009 $pairs,
10091010 PREG_SET_ORDER ) ) {
@@ -1025,7 +1026,7 @@
10261027
10271028 /**
10281029 * Pick the appropriate attribute value from a match set from the
1029 - * MW_ATTRIBS_REGEX matches.
 1030+ * attribs regex matches.
10301031 *
10311032 * @param $set Array
10321033 * @return String
@@ -1105,7 +1106,7 @@
11061107 */
11071108 static function normalizeCharReferences( $text ) {
11081109 return preg_replace_callback(
1109 - MW_CHAR_REFS_REGEX,
 1110+ self::CHAR_REFS_REGEX,
11101111 array( 'Sanitizer', 'normalizeCharReferencesCallback' ),
11111112 $text );
11121113 }
@@ -1140,14 +1141,13 @@
11411142 * @return String
11421143 */
11431144 static function normalizeEntity( $name ) {
1144 - global $wgHtmlEntities, $wgHtmlEntityAliases;
1145 - if ( isset( $wgHtmlEntityAliases[$name] ) ) {
1146 - return "&{$wgHtmlEntityAliases[$name]};";
 1145+ if ( isset( self::$htmlEntityAliases[$name] ) ) {
 1146+ return '&' . self::$htmlEntityAliases[$name] . ';';
11471147 } elseif ( in_array( $name,
11481148 array( 'lt', 'gt', 'amp', 'quot' ) ) ) {
11491149 return "&$name;";
1150 - } elseif ( isset( $wgHtmlEntities[$name] ) ) {
1151 - return "&#{$wgHtmlEntities[$name]};";
 1150+ } elseif ( isset( self::$htmlEntities[$name] ) ) {
 1151+ return '&#' . self::$htmlEntities[$name] . ';';
11521152 } else {
11531153 return "&amp;$name;";
11541154 }
@@ -1194,7 +1194,7 @@
11951195 */
11961196 public static function decodeCharReferences( $text ) {
11971197 return preg_replace_callback(
1198 - MW_CHAR_REFS_REGEX,
 1198+ self::CHAR_REFS_REGEX,
11991199 array( 'Sanitizer', 'decodeCharReferencesCallback' ),
12001200 $text );
12011201 }
@@ -1212,7 +1212,7 @@
12131213 public static function decodeCharReferencesAndNormalize( $text ) {
12141214 global $wgContLang;
12151215 $text = preg_replace_callback(
1216 - MW_CHAR_REFS_REGEX,
 1216+ self::CHAR_REFS_REGEX,
12171217 array( 'Sanitizer', 'decodeCharReferencesCallback' ),
12181218 $text, /* limit */ -1, $count );
12191219
@@ -1263,12 +1263,11 @@
12641264 * @return String
12651265 */
12661266 static function decodeEntity( $name ) {
1267 - global $wgHtmlEntities, $wgHtmlEntityAliases;
1268 - if ( isset( $wgHtmlEntityAliases[$name] ) ) {
1269 - $name = $wgHtmlEntityAliases[$name];
 1267+ if ( isset( self::$htmlEntityAliases[$name] ) ) {
 1268+ $name = self::$htmlEntityAliases[$name];
12701269 }
1271 - if( isset( $wgHtmlEntities[$name] ) ) {
1272 - return codepointToUtf8( $wgHtmlEntities[$name] );
 1270+ if( isset( self::$htmlEntities[$name] ) ) {
 1271+ return codepointToUtf8( self::$htmlEntities[$name] );
12731272 } else {
12741273 return "&$name;";
12751274 }
@@ -1493,9 +1492,8 @@
14941493 * @return String
14951494 */
14961495 static function hackDocType() {
1497 - global $wgHtmlEntities;
14981496 $out = "<!DOCTYPE html [\n";
1499 - foreach( $wgHtmlEntities as $entity => $codepoint ) {
 1497+ foreach( self::$htmlEntities as $entity => $codepoint ) {
15001498 $out .= "<!ENTITY $entity \"&#$codepoint;\">";
15011499 }
15021500 $out .= "]>\n";
Index: trunk/phase3/includes/Defines.php
@@ -1,6 +1,11 @@
22 <?php
33 /**
4 - * A few constants that might be needed during LocalSettings.php
 4+ * A few constants that might be needed during LocalSettings.php.
 5+ *
 6+ * Note: these constants must all be resolvable at compile time by HipHop,
 7+ * since this file will not be executed during request startup for a compiled
 8+ * MediaWiki.
 9+ *
510 * @file
611 */
712
Index: trunk/phase3/includes/Title.php
@@ -5,15 +5,6 @@
66 */
77
88 /**
9 - * @todo: determine if it is really necessary to load this. Appears to be left over from pre-autoloader versions, and
10 - * is only really needed to provide access to constant UTF8_REPLACEMENT, which actually resides in UtfNormalDefines.php
11 - * and is loaded by UtfNormalUtil.php, which is loaded by UtfNormal.php.
12 - */
13 -if ( !class_exists( 'UtfNormal' ) ) {
14 - require_once( dirname( __FILE__ ) . '/normal/UtfNormal.php' );
15 -}
16 -
17 -/**
189 * @deprecated This used to be a define, but was moved to
1910 * Title::GAID_FOR_UPDATE in 1.17. This will probably be removed in 1.18
2011 */
Index: trunk/phase3/includes/GlobalFunctions.php
@@ -8,7 +8,9 @@
99 die( "This file is part of MediaWiki, it is not a valid entry point" );
1010 }
1111
12 -require_once dirname( __FILE__ ) . '/normal/UtfNormalUtil.php';
 12+if ( !defined( 'MW_COMPILED' ) ) {
 13+ require_once( dirname( __FILE__ ) . '/normal/UtfNormalUtil.php' );
 14+}
1315
1416 // Hide compatibility functions from Doxygen
1517 /// @cond
@@ -329,7 +331,7 @@
330332 $exists = file_exists( $file );
331333 $size = $exists ? filesize( $file ) : false;
332334 if ( !$exists || ( $size !== false && $size + strlen( $text ) < 0x7fffffff ) ) {
333 - error_log( $text, 3, $file );
 335+ file_put_contents( $file, $text, FILE_APPEND );
334336 }
335337 wfRestoreWarnings();
336338 }
@@ -495,7 +497,7 @@
496498 */
497499 function wfMessageFallback( /*...*/ ) {
498500 $args = func_get_args();
499 - return call_user_func_array( array( 'Message', 'newFallbackSequence' ), $args );
 501+ return MWFunction::callArray( 'Message::newFallbackSequence', $args );
500502 }
501503
502504 /**
@@ -1994,6 +1996,13 @@
19951997 }
19961998
19971999 /**
 2000+ * Check if we are running under HipHop
 2001+ */
 2002+function wfIsHipHop() {
 2003+ return function_exists( 'hphp_thread_set_warmup_enabled' );
 2004+}
 2005+
 2006+/**
19982007 * Swap two variables
19992008 */
20002009 function swap( &$x, &$y ) {
@@ -2781,7 +2790,17 @@
27822791 global $wgSessionsInMemcached, $wgCookiePath, $wgCookieDomain,
27832792 $wgCookieSecure, $wgCookieHttpOnly, $wgSessionHandler;
27842793 if( $wgSessionsInMemcached ) {
2785 - require_once( 'MemcachedSessions.php' );
 2794+ if ( !defined( 'MW_COMPILED' ) ) {
 2795+ require_once( 'MemcachedSessions.php' );
 2796+ }
 2797+ session_set_save_handler( 'memsess_open', 'memsess_close', 'memsess_read',
 2798+ 'memsess_write', 'memsess_destroy', 'memsess_gc' );
 2799+
 2800+ // It's necessary to register a shutdown function to call session_write_close(),
 2801+ // because by the time the request shutdown function for the session module is
 2802+ // called, $wgMemc has already been destroyed. Shutdown functions registered
 2803+ // this way are called before object destruction.
 2804+ register_shutdown_function( 'memsess_write_close' );
27862805 } elseif( $wgSessionHandler && $wgSessionHandler != ini_get( 'session.save_handler' ) ) {
27872806 # Only set this if $wgSessionHandler isn't null and session.save_handler
27882807 # hasn't already been set to the desired value (that causes errors)
Index: trunk/phase3/includes/Namespace.php
@@ -5,35 +5,6 @@
66 */
77
88 /**
9 - * Definitions of the NS_ constants are in Defines.php
10 - * @private
11 - */
12 -$wgCanonicalNamespaceNames = array(
13 - NS_MEDIA => 'Media',
14 - NS_SPECIAL => 'Special',
15 - NS_TALK => 'Talk',
16 - NS_USER => 'User',
17 - NS_USER_TALK => 'User_talk',
18 - NS_PROJECT => 'Project',
19 - NS_PROJECT_TALK => 'Project_talk',
20 - NS_FILE => 'File',
21 - NS_FILE_TALK => 'File_talk',
22 - NS_MEDIAWIKI => 'MediaWiki',
23 - NS_MEDIAWIKI_TALK => 'MediaWiki_talk',
24 - NS_TEMPLATE => 'Template',
25 - NS_TEMPLATE_TALK => 'Template_talk',
26 - NS_HELP => 'Help',
27 - NS_HELP_TALK => 'Help_talk',
28 - NS_CATEGORY => 'Category',
29 - NS_CATEGORY_TALK => 'Category_talk',
30 -);
31 -
32 -/// @todo UGLY UGLY
33 -if( is_array( $wgExtraNamespaces ) ) {
34 - $wgCanonicalNamespaceNames = $wgCanonicalNamespaceNames + $wgExtraNamespaces;
35 -}
36 -
37 -/**
389 * This is a utility class with only static functions
3910 * for dealing with namespaces that encodes all the
4011 * "magic" behaviors of them based on index. The textual
Index: trunk/phase3/includes/ExternalStore.php
@@ -67,7 +67,7 @@
6868
6969 $class = 'ExternalStore' . ucfirst( $proto );
7070 /* Any custom modules should be added to $wgAutoLoadClasses for on-demand loading */
71 - if( !class_exists( $class ) ) {
 71+ if( !MWInit::classExists( $class ) ) {
7272 return false;
7373 }
7474
Index: trunk/phase3/includes/normal/UtfNormalUtil.php
@@ -25,7 +25,7 @@
2626 * @ingroup UtfNormal
2727 */
2828
29 -require_once dirname(__FILE__).'/UtfNormalDefines.php';
 29+require_once( MWInit::compiledPath( 'includes/normal/UtfNormalDefines.php' ) );
3030
3131 /**
3232 * Return UTF-8 sequence for a given Unicode code point.
Index: trunk/phase3/includes/normal/UtfNormalDefines.php
@@ -1,7 +1,11 @@
22 <?php
33 /**
4 - * Some constant definitions for the unicode normalization module
 4+ * Some constant definitions for the unicode normalization module.
55 *
 6+ * Note: these constants must all be resolvable at compile time by HipHop,
 7+ * since this file will not be executed during request startup for a compiled
 8+ * MediaWiki.
 9+ *
610 * @file
711 * @ingroup UtfNormal
812 */
Index: trunk/phase3/includes/WebStart.php
@@ -8,6 +8,20 @@
99 * @file
1010 */
1111
 12+/**
 13+ * Detect compiled mode by looking for a function that only exists if compiled
 14+ * in. Note that we can't use function_exists(), because it is terribly broken
 15+ * under HipHop due to the "volatile" feature.
 16+ */
 17+function wfDetectCompiledMode() {
 18+ try {
 19+ $r = new ReflectionFunction( 'wfHipHopCompilerVersion' );
 20+ } catch ( ReflectionException $e ) {
 21+ $r = false;
 22+ }
 23+ return $r !== false;
 24+}
 25+
1226 # Protect against register_globals
1327 # This must be done before any globals are set by the code
1428 if ( ini_get( 'register_globals' ) ) {
@@ -67,40 +81,51 @@
6882 $IP = realpath( '.' );
6983 }
7084
71 -
72 -# Start profiler
73 -if( file_exists("$IP/StartProfiler.php") ) {
74 - require_once( "$IP/StartProfiler.php" );
75 -} else {
76 - require_once( "$IP/includes/ProfilerStub.php" );
 85+if ( wfDetectCompiledMode() ) {
 86+ define( 'MW_COMPILED', 1 );
7787 }
78 -wfProfileIn( 'WebStart.php-conf' );
7988
80 -# Load up some global defines.
81 -require_once( "$IP/includes/Defines.php" );
 89+if ( !defined( 'MW_COMPILED' ) ) {
 90+ # Get MWInit class
 91+ require_once( "$IP/includes/Init.php" );
8292
83 -# Check for PHP 5
84 -if ( !function_exists( 'version_compare' )
85 - || version_compare( phpversion(), '5.0.0' ) < 0
86 -) {
87 - define( 'MW_PHP4', '1' );
88 - require( "$IP/includes/DefaultSettings.php" );
89 - require( "$IP/includes/templates/PHP4.php" );
90 - exit;
 93+ # Start profiler
 94+ # FIXME: rewrite wfProfileIn/wfProfileOut so that they can work in compiled mode
 95+ if ( file_exists( "$IP/StartProfiler.php" ) ) {
 96+ require_once( "$IP/StartProfiler.php" );
 97+ } else {
 98+ require_once( "$IP/includes/ProfilerStub.php" );
 99+ }
 100+
 101+ # Load up some global defines.
 102+ require_once( "$IP/includes/Defines.php" );
 103+
 104+ # Check for PHP 5
 105+ if ( !function_exists( 'version_compare' )
 106+ || version_compare( phpversion(), '5.0.0' ) < 0
 107+ ) {
 108+ define( 'MW_PHP4', '1' );
 109+ require( "$IP/includes/DefaultSettings.php" );
 110+ require( "$IP/includes/templates/PHP4.php" );
 111+ exit;
 112+ }
 113+
 114+ # Start the autoloader, so that extensions can derive classes from core files
 115+ require_once( "$IP/includes/AutoLoader.php" );
91116 }
92117
93 -# Start the autoloader, so that extensions can derive classes from core files
94 -require_once( "$IP/includes/AutoLoader.php" );
 118+wfProfileIn( 'WebStart.php-conf' );
 119+
95120 # Load default settings
96 -require_once( "$IP/includes/DefaultSettings.php" );
 121+require_once( MWInit::compiledPath( "includes/DefaultSettings.php" ) );
97122
98123 if ( defined( 'MW_CONFIG_CALLBACK' ) ) {
99124 # Use a callback function to configure MediaWiki
100125 MWFunction::call( MW_CONFIG_CALLBACK );
101 -
102126 } else {
103 - if ( !defined('MW_CONFIG_FILE') )
104 - define('MW_CONFIG_FILE', "$IP/LocalSettings.php");
 127+ if ( !defined( 'MW_CONFIG_FILE' ) ) {
 128+ define('MW_CONFIG_FILE', MWInit::interpretedPath( 'LocalSettings.php' ) );
 129+ }
105130
106131 # LocalSettings.php is the per site customization file. If it does not exist
107132 # the wiki installer needs to be launched or the generated file uploaded to
@@ -115,7 +140,7 @@
116141 }
117142
118143 if ( $wgEnableSelenium ) {
119 - require_once( "$IP/includes/SeleniumWebSettings.php" );
 144+ require_once( MWInit::compiledPath( "includes/SeleniumWebSettings.php" ) );
120145 }
121146
122147 wfProfileOut( 'WebStart.php-conf' );
@@ -126,12 +151,14 @@
127152 # that would cause us to potentially mix gzip and non-gzip output, creating a
128153 # big mess.
129154 if ( !defined( 'MW_NO_OUTPUT_BUFFER' ) && ob_get_level() == 0 ) {
130 - require_once( "$IP/includes/OutputHandler.php" );
 155+ if ( !defined( 'MW_COMPILED' ) ) {
 156+ require_once( "$IP/includes/OutputHandler.php" );
 157+ }
131158 ob_start( 'wfOutputHandler' );
132159 }
133160 wfProfileOut( 'WebStart.php-ob_start' );
134161
135162 if ( !defined( 'MW_NO_SETUP' ) ) {
136 - require_once( "$IP/includes/Setup.php" );
 163+ require_once( MWInit::compiledPath( "includes/Setup.php" ) );
137164 }
138165
Index: trunk/phase3/includes/Skin.php
@@ -138,24 +138,28 @@
139139 $className = "Skin{$skinName}";
140140
141141 # Grab the skin class and initialise it.
142 - if ( !class_exists( $className ) ) {
143 - // Preload base classes to work around APC/PHP5 bug
144 - $deps = "{$wgStyleDirectory}/{$skinName}.deps.php";
 142+ if ( !MWInit::classExists( $className ) ) {
145143
146 - if ( file_exists( $deps ) ) {
147 - include_once( $deps );
 144+ if ( !defined( 'MW_COMPILED' ) ) {
 145+ // Preload base classes to work around APC/PHP5 bug
 146+ $deps = "{$wgStyleDirectory}/{$skinName}.deps.php";
 147+ if ( file_exists( $deps ) ) {
 148+ include_once( $deps );
 149+ }
 150+ require_once( "{$wgStyleDirectory}/{$skinName}.php" );
148151 }
149 - require_once( "{$wgStyleDirectory}/{$skinName}.php" );
150152
151153 # Check if we got if not failback to default skin
152 - if ( !class_exists( $className ) ) {
 154+ if ( !MWInit::classExists( $className ) ) {
153155 # DO NOT die if the class isn't found. This breaks maintenance
154156 # scripts and can cause a user account to be unrecoverable
155157 # except by SQL manipulation if a previously valid skin name
156158 # is no longer valid.
157159 wfDebug( "Skin class does not exist: $className\n" );
158160 $className = 'SkinVector';
159 - require_once( "{$wgStyleDirectory}/Vector.php" );
 161+ if ( !defined( 'MW_COMPILED' ) ) {
 162+ require_once( "{$wgStyleDirectory}/Vector.php" );
 163+ }
160164 }
161165 }
162166 $skin = new $className;
Index: trunk/phase3/includes/MimeMagic.php
@@ -117,14 +117,6 @@
118118 END_STRING
119119 );
120120
121 -// Note: because this file is possibly included by a function,
122 -// we need to access the global scope explicitely!
123 -global $wgLoadFileinfoExtension;
124 -
125 -if ( $wgLoadFileinfoExtension ) {
126 - wfDl( 'fileinfo' );
127 -}
128 -
129121 /**
130122 * Implements functions related to mime types such as detection and mapping to
131123 * file extension.
@@ -160,6 +152,10 @@
161153 */
162154 private static $instance;
163155
 156+ /** True if the fileinfo extension has been loaded
 157+ */
 158+ private static $extensionLoaded = false;
 159+
164160 /** Initializes the MimeMagic object. This is called by MimeMagic::singleton().
165161 *
166162 * This constructor parses the mime.types and mime.info files and build internal mappings.
@@ -169,7 +165,7 @@
170166 * --- load mime.types ---
171167 */
172168
173 - global $wgMimeTypeFile, $IP;
 169+ global $wgMimeTypeFile, $IP, $wgLoadFileinfoExtension;
174170
175171 $types = MM_WELL_KNOWN_MIME_TYPES;
176172
@@ -177,6 +173,11 @@
178174 $wgMimeTypeFile = "$IP/$wgMimeTypeFile";
179175 }
180176
 177+ if ( $wgLoadFileinfoExtension && !self::$extensionLoaded ) {
 178+ self::$extensionLoaded = true;
 179+ wfDl( 'fileinfo' );
 180+ }
 181+
181182 if ( $wgMimeTypeFile ) {
182183 if ( is_file( $wgMimeTypeFile ) and is_readable( $wgMimeTypeFile ) ) {
183184 wfDebug( __METHOD__.": loading mime types from $wgMimeTypeFile\n" );
Index: trunk/phase3/includes/DefaultSettings.php
@@ -28,7 +28,9 @@
2929
3030 # Create a site configuration object. Not used for much in a default install
3131 if ( !defined( 'MW_PHP4' ) ) {
32 - require_once( "$IP/includes/SiteConfiguration.php" );
 32+ if ( !defined( 'MW_COMPILED' ) ) {
 33+ require_once( "$IP/includes/SiteConfiguration.php" );
 34+ }
3335 $wgConf = new SiteConfiguration;
3436 }
3537 /** @endcond */

Follow-up revisions

Revision | Commit summary | Author | Date
r85330 | Added missing file from r85327. | tstarling | 14:40, 4 April 2011
r85664 | No need to force inclusion of Namespace.php since r85327 | ialex | 07:19, 8 April 2011
r85918 | Improvements to handling of 'catastrophic' errors, like unsupported PHP versi... | happy-melon | 20:38, 12 April 2011
r86245 | (patch by Yuvipanda) Change all uses of $wgLanguageNames to Language::getLang... | bawolff | 22:52, 16 April 2011
r87748 | (bug 28864) Fix UtfNormal benchmark & test case runners from regression in r8...... | brion | 17:43, 9 May 2011

Comments

#Comment by Raymond (talk | contribs)   14:34, 4 April 2011

Seen on Translatewiki and on my local wiki:

Warning: require_once(D:\F_Programmierung\xampp\htdocs\wiki2/includes/Init.php) [function.require-once]: failed to open stream: No such file or directory in D:\F_Programmierung\xampp\htdocs\wiki2\includes\WebStart.php on line 90

Fatal error: require_once() [function.require]: Failed opening required 'D:\F_Programmierung\xampp\htdocs\wiki2/includes/Init.php' (include_path='.;D:\F_Programmierung\xampp\php\PEAR') in D:\F_Programmierung\xampp\htdocs\wiki2\includes\WebStart.php on line 90

#Comment by Krinkle (talk | contribs)   17:41, 5 April 2011

I haven't read all the diffs, but I noticed a few changes that may not be obvious and/or are conflicting with our current conventions.

Since HipHop is apparently the way we are going, perhaps you (Tim) could take a look at these three pages to make sure they are still correct:

#Comment by Platonides (talk | contribs)   21:44, 7 April 2011

You are using too many ways of detecting HipHop: wfDetectCompiledMode(), wfIsHipHop(), defined( 'MW_COMPILED' ), function_exists( 'hphp_thread_set_warmup_enabled' ) and ReflectionFunction( 'wfHipHopCompilerVersion' ).

I think Sanitizer::$attribsRegex should be a function static.

I'm not convinced about the change in Names.php. In any case, you missed removing its require and global from the top of Language.php

Otherwise it's ok
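
For illustration, the two lazy-initialisation styles being compared here can be sketched side by side. This is a minimal sketch, not the committed code: the class name RegexHolderSketch, both method names, and the simplified pattern are hypothetical. Only the idea of building the regex on first use comes from the diff above.

```php
<?php
// Hypothetical class for illustration only; not part of MediaWiki.
class RegexHolderSketch {
	/** Class-static cache, the style used in this revision */
	static $regex;

	static function getViaClassStatic() {
		if ( self::$regex === null ) {
			// Simplified stand-in pattern, not the real attribs regex
			self::$regex = '/([a-z]+)=([a-z0-9]+)/';
		}
		return self::$regex;
	}

	static function getViaFunctionStatic() {
		// Function-static cache: the variable is visible only inside this method
		static $regex = null;
		if ( $regex === null ) {
			$regex = '/([a-z]+)=([a-z0-9]+)/';
		}
		return $regex;
	}
}
```

Both accessors return the same string and build it only once. The difference is scope: the class static can be read from anywhere in the class, while the function static is private to the accessor.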

#Comment by Tim Starling (talk | contribs)   00:28, 12 April 2011

The ReflectionFunction boilerplate and the function_exists() detect two different things: one detects compiled mode and the other detects HipHop, whether it's compiled or interpreted. These correspond to defined( 'MW_COMPILED' ) and wfIsHipHop() in the bulk of the codebase, however it's not possible to use them before they exist, and it's not possible to make sure they exist without first checking for HipHop. That's why the relevant code needs to be duplicated in WebStart.php and Maintenance.php. wfDetectCompiledMode() is a helper function for use inside WebStart.php only, and is used to set MW_COMPILED.
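
The distinction described above can be sketched as two independent checks. The function bodies mirror wfDetectCompiledMode() and wfIsHipHop() from the diff; the sketch* names are hypothetical and are used only so the example can run outside MediaWiki.

```php
<?php
// Sketch only: mirrors the two checks in the diff under hypothetical names.

// True only when wfHipHopCompilerVersion() was compiled in (compiled mode).
function sketchDetectCompiledMode() {
	try {
		$r = new ReflectionFunction( 'wfHipHopCompilerVersion' );
	} catch ( ReflectionException $e ) {
		$r = false;
	}
	return $r !== false;
}

// True under HipHop, whether the code is compiled or interpreted.
function sketchIsHipHop() {
	return function_exists( 'hphp_thread_set_warmup_enabled' );
}

// Set the constant once, early, so later code can test defined( 'MW_COMPILED' ).
if ( sketchDetectCompiledMode() && !defined( 'MW_COMPILED' ) ) {
	define( 'MW_COMPILED', 1 );
}
```

Under a stock PHP interpreter neither function exists, so both checks return false and MW_COMPILED stays undefined.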

#Comment by Reedy (talk | contribs)   17:44, 11 April 2011

See also bug 28470

#Comment by DieBuche (talk | contribs)   18:24, 12 April 2011

This broke parserTests.php.

#Comment by YuviPanda (talk | contribs)   21:15, 16 April 2011

It also broke the following extensions on trunk, since they were referring to $wgLanguageNames:

  • CentralAuth
  • LanguageSelector
  • Polyglot
  • SiteMatrix
  • GetFamily
  • OtherSites
  • WikimediaIncubator
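
As a rough sketch of how an affected extension could read the language names after this change, assuming it wants to keep working on older trees too: the helper name sketchGetLanguageNames() is hypothetical, and the stub Language class with its made-up data exists only so the example runs standalone. The accessor name Language::getLanguageNames() is taken from the commit message.

```php
<?php
// Stub standing in for MediaWiki's Language class so the sketch runs standalone.
// The returned data is a placeholder for illustration.
if ( !class_exists( 'Language' ) ) {
	class Language {
		static function getLanguageNames() {
			return array( 'en' => 'English', 'de' => 'Deutsch' );
		}
	}
}

// Hypothetical helper: prefer the accessor, fall back to the old global.
function sketchGetLanguageNames() {
	if ( method_exists( 'Language', 'getLanguageNames' ) ) {
		return Language::getLanguageNames();
	}
	global $wgLanguageNames;
	return isset( $wgLanguageNames ) ? $wgLanguageNames : array();
}
```

The fallback branch is only reached on code that predates the accessor; the fix listed in the follow-up revisions (r86245) changed the callers to use the accessor.
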
#Comment by Brion VIBBER (talk | contribs)   17:45, 9 May 2011

Note that r87748 restores the original version of the require_once line in UtfNormalUtil.php, without the MWInit::compiledPath() call added here. That call broke the generator, test, and benchmark code completely, since those scripts do not depend on the rest of MediaWiki. The line had been moved to Setup.php in r85944, which still left all the UtfNormal scripts broken (bug 28864).

This doesn't seem to have any problems with regular MediaWiki -- as the file is already now included from Setup.php -- but it might have some effect on the HipHop builds, untested.
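
One way to picture the constraint is a path helper that uses MWInit only when it is already loaded. This is a hypothetical sketch, not what r87748 did (that revision restored the original require_once line); the name sketchDefinesPath() and its $baseDir parameter are assumptions made for the example.

```php
<?php
// Hypothetical helper, not part of MediaWiki: choose the include path for
// UtfNormalDefines.php depending on whether MWInit has been loaded.
function sketchDefinesPath( $baseDir ) {
	if ( class_exists( 'MWInit', false ) ) {
		// Inside MediaWiki: let MWInit pick the path, as in r85327
		return MWInit::compiledPath( 'includes/normal/UtfNormalDefines.php' );
	}
	// Standalone scripts: plain path relative to the given directory
	return $baseDir . '/UtfNormalDefines.php';
}
```

A standalone script would call it as sketchDefinesPath( dirname( __FILE__ ) ), which gives the same path as the original line when MWInit is absent.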

#Comment by Tim Starling (talk | contribs)   04:06, 29 August 2011

I suggest a resolved status, since the extensions and parserTests.php have been fixed, and Platonides' complaint was never a problem in the first place.

#Comment by 😂 (talk | contribs)   23:08, 2 September 2011

Agreed, marking as such.

Status & tagging log