r79520 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r79519‎ | r79520 | r79521 >
Date:20:17, 3 January 2011
Author:dantman
Status:resolved (Comments)
Tags:
Comment:
Try to output editsection markers instead of rendered editsection links and defer the rendering to a point where the markup does not need to be stored in the cache.
Doing this allows skins to override doEditSectionLinks without poluting the parser cache or fragmenting the parser cache more.
As a side effect it eliminates the primary cause of user language based parser cache fragmentation.
Because this makes most old parser cache entries invalid $wgUseEditSectionTokens is provided so that large installations like Wikipedia can keep their old parser cache entries.
Modified paths:
  • /trunk/phase3/RELEASE-NOTES (modified) (history)
  • /trunk/phase3/includes/DefaultSettings.php (modified) (history)
  • /trunk/phase3/includes/parser/Parser.php (modified) (history)
  • /trunk/phase3/includes/parser/ParserOptions.php (modified) (history)
  • /trunk/phase3/includes/parser/ParserOutput.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/parser/Parser.php
@@ -3696,6 +3696,12 @@
36973697 } else {
36983698 $showEditLink = $this->mOptions->getEditSection();
36993699 }
 3700+ if ( $showEditLink ) {
 3701+ $editLinkAsToken = $this->mOptions->getEditSectionTokens();
 3702+ if ( $editLinkAsToken ) {
 3703+ $this->mOutput->setEditSectionTokens( "{$this->mUniqPrefix}-editsection-", self::MARKER_SUFFIX );
 3704+ }
 3705+ }
37003706
37013707 # Get all headlines for numbering them and adding funky stuff like [edit]
37023708 # links - this is for later, but we need the number of headlines right now
@@ -3949,12 +3955,28 @@
39503956
39513957 # give headline the correct <h#> tag
39523958 if ( $showEditLink && $sectionIndex !== false ) {
3953 - if ( $isTemplate ) {
3954 - # Put a T flag in the section identifier, to indicate to extractSections()
3955 - # that sections inside <includeonly> should be counted.
3956 - $editlink = $sk->doEditSectionLink( Title::newFromText( $titleText ), "T-$sectionIndex", null, $this->mOptions->getUserLang() );
 3959+ if ( $editLinkAsToken ) {
 3960+ // Output edit section links as markers with styles that can be customized by skins
 3961+ if ( $isTemplate ) {
 3962+ # Put a T flag in the section identifier, to indicate to extractSections()
 3963+ # that sections inside <includeonly> should be counted.
 3964+ $editlinkArgs = array( $titleText, "T-$sectionIndex", null );
 3965+ } else {
 3966+ $editlinkArgs = array( $this->mTitle->getPrefixedText(), $sectionIndex, $headlineHint );
 3967+ }
 3968+ // We use nearly the same structure as uniqPrefix and the marker stuffix (besides there being nothing random)
 3969+ // However the this is output into the parser output itself not replaced early, so we hardcode this in case
 3970+ // the constants change in a different version of MediaWiki, which would break this code.
 3971+ $editlink = "{$this->mUniqPrefix}-editsection-" . serialize($editlinkArgs) . self::MARKER_SUFFIX;
39573972 } else {
3958 - $editlink = $sk->doEditSectionLink( $this->mTitle, $sectionIndex, $headlineHint, $this->mOptions->getUserLang() );
 3973+ // Output edit section links directly as markup like we used to
 3974+ if ( $isTemplate ) {
 3975+ # Put a T flag in the section identifier, to indicate to extractSections()
 3976+ # that sections inside <includeonly> should be counted.
 3977+ $editlink = $sk->doEditSectionLink( Title::newFromText( $titleText ), "T-$sectionIndex", null, $this->mOptions->getUserLang() );
 3978+ } else {
 3979+ $editlink = $sk->doEditSectionLink( $this->mTitle, $sectionIndex, $headlineHint, $this->mOptions->getUserLang() );
 3980+ }
39593981 }
39603982 } else {
39613983 $editlink = '';
Index: trunk/phase3/includes/parser/ParserOutput.php
@@ -119,6 +119,7 @@
120120 $mOutputHooks = array(), # Hook tags as per $wgParserOutputHooks
121121 $mWarnings = array(), # Warning text to be returned to the user. Wikitext formatted, in the key only
122122 $mSections = array(), # Table of contents
 123+ $mEditSectionTokens = false, # prefix/suffix markers if edit sections were output as tokens
123124 $mProperties = array(), # Name/value pairs to be cached in the DB
124125 $mTOCHTML = ''; # HTML of the TOC
125126 private $mIndexPolicy = ''; # 'index' or 'noindex'? Any other value will result in no change.
@@ -134,13 +135,34 @@
135136 $this->mTitleText = $titletext;
136137 }
137138
138 - function getText() { return $this->mText; }
 139+ function getText() {
 140+ if ( $this->mEditSectionTokens ) {
 141+ $editSectionTokens = $this->mEditSectionTokens;
 142+ return preg_replace_callback( "#{$editSectionTokens[0]}(.*?){$editSectionTokens[1]}#", array( &$this, 'replaceEditSectionLinksCallback' ), $this->mText );
 143+ }
 144+ return $this->mText;
 145+ }
 146+
 147+ /**
 148+ * callback used by getText to replace editsection tokens
 149+ * @private
 150+ */
 151+ function replaceEditSectionLinksCallback( $m ) {
 152+ global $wgUser, $wgLang;
 153+ $args = unserialize($m[1]);
 154+ $args[0] = Title::newFromText( $args[0] );
 155+ $args[] = $wgLang->getCode();
 156+ $skin = $wgUser->getSkin();
 157+ return call_user_func_array( array( $skin, 'doEditSectionLink' ), $args );
 158+ }
 159+
139160 function &getLanguageLinks() { return $this->mLanguageLinks; }
140161 function getInterwikiLinks() { return $this->mInterwikiLinks; }
141162 function getCategoryLinks() { return array_keys( $this->mCategories ); }
142163 function &getCategories() { return $this->mCategories; }
143164 function getTitleText() { return $this->mTitleText; }
144165 function getSections() { return $this->mSections; }
 166+ function getEditSectionTokens() { return $this->mEditSectionTokens; }
145167 function &getLinks() { return $this->mLinks; }
146168 function &getTemplates() { return $this->mTemplates; }
147169 function &getImages() { return $this->mImages; }
@@ -159,6 +181,7 @@
160182
161183 function setTitleText( $t ) { return wfSetVar( $this->mTitleText, $t ); }
162184 function setSections( $toc ) { return wfSetVar( $this->mSections, $toc ); }
 185+ function setEditSectionTokens( $p, $s ) { return wfSetVar( $this->mEditSectionTokens, array( $p, $s ) ); }
163186 function setIndexPolicy( $policy ) { return wfSetVar( $this->mIndexPolicy, $policy ); }
164187 function setTOCHTML( $tochtml ) { return wfSetVar( $this->mTOCHTML, $tochtml ); }
165188
Index: trunk/phase3/includes/parser/ParserOptions.php
@@ -21,6 +21,7 @@
2222 var $mSkin = null; # Reference to the preferred skin
2323 var $mDateFormat = null; # Date format index
2424 var $mEditSection = true; # Create "edit section" links
 25+ var $mEditSectionTokens = false; # Output replaceable tokens for editsections instead of markup
2526 var $mAllowSpecialInclusion; # Allow inclusion of special pages
2627 var $mTidy = false; # Ask for tidy cleanup
2728 var $mInterfaceMessage = false; # Which lang to call for PLURAL and GRAMMAR
@@ -58,6 +59,8 @@
5960 function getEnableImageWhitelist() { return $this->mEnableImageWhitelist; }
6061 function getEditSection() { $this->optionUsed('editsection');
6162 return $this->mEditSection; }
 63+ function getEditSectionTokens() { $this->optionUsed('editsectiontokens');
 64+ return $this->mEditSectionTokens; }
6265 function getNumberHeadings() { $this->optionUsed('numberheadings');
6366 return $this->mNumberHeadings; }
6467 function getAllowSpecialInclusion() { return $this->mAllowSpecialInclusion; }
@@ -123,6 +126,7 @@
124127 function setEnableImageWhitelist( $x ) { return wfSetVar( $this->mEnableImageWhitelist, $x ); }
125128 function setDateFormat( $x ) { return wfSetVar( $this->mDateFormat, $x ); }
126129 function setEditSection( $x ) { return wfSetVar( $this->mEditSection, $x ); }
 130+ function setEditSectionTokens( $x ) { return wfSetVar( $this->mEditSectionTokens, $x ); }
127131 function setNumberHeadings( $x ) { return wfSetVar( $this->mNumberHeadings, $x ); }
128132 function setAllowSpecialInclusion( $x ) { return wfSetVar( $this->mAllowSpecialInclusion, $x ); }
129133 function setTidy( $x ) { return wfSetVar( $this->mTidy, $x); }
@@ -171,7 +175,7 @@
172176 function initialiseFromUser( $userInput ) {
173177 global $wgUseDynamicDates, $wgInterwikiMagic, $wgAllowExternalImages;
174178 global $wgAllowExternalImagesFrom, $wgEnableImageWhitelist, $wgAllowSpecialInclusion, $wgMaxArticleSize;
175 - global $wgMaxPPNodeCount, $wgMaxTemplateDepth, $wgMaxPPExpandDepth, $wgCleanSignatures;
 179+ global $wgMaxPPNodeCount, $wgMaxTemplateDepth, $wgMaxPPExpandDepth, $wgCleanSignatures, $wgUseEditSectionTokens;
176180 global $wgExternalLinkTarget, $wgLang;
177181
178182 wfProfileIn( __METHOD__ );
@@ -201,6 +205,7 @@
202206 $this->mMaxTemplateDepth = $wgMaxTemplateDepth;
203207 $this->mCleanSignatures = $wgCleanSignatures;
204208 $this->mExternalLinkTarget = $wgExternalLinkTarget;
 209+ $this->mEditSectionTokens = $wgUseEditSectionTokens;
205210
206211 $this->mNumberHeadings = $user->getOption( 'numberheadings' );
207212 $this->mMath = $user->getOption( 'math' );
@@ -307,6 +312,8 @@
308313
309314 if ( !$this->mEditSection && in_array( 'editsection', $forOptions ) )
310315 $confstr .= '!edit=0';
 316+ if ( $this->mEditSectionTokens && in_array( 'editsectiontokens', $forOptions ) )
 317+ $confstr .= '!estok=1';
311318 if ( $this->mIsPrintable && in_array( 'printable', $forOptions ) )
312319 $confstr .= '!printable=1';
313320
Index: trunk/phase3/includes/DefaultSettings.php
@@ -2825,6 +2825,18 @@
28262826 */
28272827 $wgTranscludeCacheExpiry = 3600;
28282828
 2829+/**
 2830+ * Output edit section links as tokens in the parser output for articles instead
 2831+ * of directly as markup.
 2832+ * This feature changes the default parser cache key so it's presented with a
 2833+ * configuration option so that large installations with a large number of already
 2834+ * existing parser cache keys can retain them.
 2835+ * The purpose of this feature is to allow skins to customize the editsection
 2836+ * links, however it has the side effect of also removing the most common use of
 2837+ * the getUserLang parser option which causes cache fragmentation by user lang.
 2838+ */
 2839+$wgUseEditSectionTokens = true;
 2840+
28292841 /** @} */ # end of parser settings }
28302842
28312843 /************************************************************************//**
Index: trunk/phase3/RELEASE-NOTES
@@ -26,6 +26,11 @@
2727 * Skin names are no longer created based on a ucfirst version of the key in $wgValidSkinNames but now
2828 the value. This means for $wgValidSkinNames["monobook"] = "MonoBook"; the skin
2929 loader will no longer try loading SkinMonobook and will instead load SkinMonoBook.
 30+* The parser now attempts to output markers for editsection tokens and defer the rendering
 31+ of them so skin and language specific markup does not need to be saved inside the parser cache
 32+ note that this changes the cache key making all old entries in the parser cache invalid you
 33+ can set $wgUseEditSectionTokens to false to disable this and keep your old parser cache entries.
 34+ Note that this feature should reduce parser cache fragmentation when enabled.
3035
3136 === New features in 1.18 ===
3237 * Added a special page, disabled by default, that allows users with the

Sign-offs

UserFlagDate
Hasharinspected07:49, 24 March 2011

Follow-up revisions

RevisionCommit summaryAuthorDate
r79528Followup r79520, some characters inside the encoded data were being modified ...dantman21:04, 3 January 2011
r79558Followup r79520 and r79528, one of the args was missing when calling doEditSe...dantman04:20, 4 January 2011
r79568Partial revert of r79520, follow up to r79558. Unconditionally use the new fo...platonides11:31, 4 January 2011
r79714Prepare for r79568. This uses a different hash key for pages without edit links,...platonides10:51, 6 January 2011
r84661Better RELEASE-NOTES for r79520, also taking it's follow ups into account.dantman03:00, 24 March 2011

Comments

#Comment by Raymond (talk | contribs)   20:54, 3 January 2011

Seen on Translatewiki:

 PHP Catchable fatal error: Argument 1 passed to Linker::doEditSectionLink() must be an instance of Title, null given in /www/w/includes/Linker.php on line 1383
#Comment by Dantman (talk | contribs)   21:05, 3 January 2011

Should be fixed now. The parser was modifying the data inside the marker causing that bug, switched to outputting the data in a way that shouldn't be touched by the parser.

#Comment by Platonides (talk | contribs)   20:58, 3 January 2011

I'm not convinced by the $wgUseEditSectionTokens. I think it could be made with a nicer fallback.

#Comment by OverlordQ (talk | contribs)   01:08, 4 January 2011

Dont forget to fix the parser tests you broke.

#Comment by Platonides (talk | contribs)   11:32, 4 January 2011

Should we revert now r70940 ?

#Comment by Happy-melon (talk | contribs)   15:00, 5 January 2011

I really really dislike embedding this data in the wikitext output. Can't we store it as a variable in ParserOutput, keyed to the id of the section?

#Comment by Catrope (talk | contribs)   15:05, 5 January 2011

Sounds good. You'd still have to insert the UNIQ string though, except it'd just contain the key into the data array.

#Comment by Happy-melon (talk | contribs)   15:09, 5 January 2011

We go to so much trouble generating a unique HTML id for each section header for the ToC, why not just use that?

#Comment by Catrope (talk | contribs)   15:13, 5 January 2011

Because it might occur in other locations in the wikitext too? In this case, it's actually pretty likely to (if it's pure ASCII and doesn't contain spaces). UNIQ-blahblah-QINU markers, however, are so crazy and random that they're practically never gonna produce collisions with actual content.

#Comment by Dantman (talk | contribs)   16:06, 5 January 2011

Even worse, the LanguageConverter completely screws everything over. It converts UNIQ to things like УНИQ and completely breaks the edit section tokens. And it expects certain parts of the edit token to be language transcoded, while expecting other parts to be left untouched. And the only solution seams to be to encode things in a <editsection page="Title" section="1">Text</editsection> format so that the LanguageConverter will properly convert "Text" but will not touch the page title, section hint, and actual editsection markers.

#Comment by Dantman (talk | contribs)   16:13, 5 January 2011

And just since someone is probably going to make a comment on it.

Tags are already escaped at this point in time, so using <editsection> in the output doesn't cause any issue with user content, '<editsection page="Title" section="1">Text</editsection>' in an article will not be turned into an arbitrary editsection token unless you actually create a tag hook to do that for you. It's even safer than the UNIQ tokens which have the (even though it's insanely remote) chance that the random part of the marker might be a match if a user tries to insert some arbitrary text in the same format as the UNIQ tokens.

#Comment by Happy-melon (talk | contribs)   16:34, 5 January 2011

If the LanguageConverter is hacking at HTML IDs, that's a bug; surely that's going to break css all over the place? But the best way to 'protect' this content from it is not to include it in page text at all. It's not unreasonable for the Converter to assume that anything in the text output is fair game for conversion; the problem is arising from inserting material into the text which is not text.

#Comment by Dantman (talk | contribs)   16:59, 5 January 2011

No, it's hacking away at the UNIQ tags left in the page. The LanguageConverter has some intelligent handling for html tags, it translates content, and some attributes like title, but leaves tag names and the rest of the attributes alone.

The other issue is that we actually have to let the LanguageConverter do transcoding of some of the editsection, namely allowing it to transcode the headline hint. Otherwise we break parser tests.

#Comment by Happy-melon (talk | contribs)   17:16, 5 January 2011

Exactly, if you didn't use UNIQ tags to embed this stuff, you wouldn't be fighting with the Converter. You're trying to reuse various parser structures in a novel context, and finding that other elements of the parser are unhappy because they don't expect those elements to be there at the point when they run. I'm saying you shouldn't be altering the text output at all, you should be looking at what's in the output anyway that you can use to find sections (eg the HTML IDs), and store the necessary data elsewhere in ParserOutput, where other bits of the parser ecosystem aren't going to attack it. Surely you can call the LanguageConverter separately on the bits of text you need converted?

Ideally, something like this would be implemented entirely with client-side JS; failing that, a server-side postprocessing step. Basically, you're correct to want it out of the parser text output. Why not take it all the way out?

#Comment by PhiLiP (talk | contribs)   16:58, 5 January 2011

LanguageConverter avoid converting UNIQ stuffs and HTML attrbutes with some regular expressions in autoConvert(). It's not the best solution but still effective.

#Comment by Catrope (talk | contribs)   15:03, 5 January 2011

Instead of using $wgUseEditSectionTokens, you should just bump the parser version. That'll cause any cached ParserOutputs generated by the old version to automatically be considered invalid.

#Comment by Platonides (talk | contribs)   21:58, 5 January 2011

I already killed $wgUseEditSectionTokens in r79568

#Comment by Hashar (talk | contribs)   02:18, 18 March 2011

Can you please rephrase the RELEASE NOTE entry. It is a five lines sentence!

#Comment by Dantman (talk | contribs)   03:01, 24 March 2011

Ok, done.

#Comment by Werdna (talk | contribs)   23:25, 18 July 2011

I've done something similar to this before, for embedding dynamic LiquidThreads pages into other pages with wiki markup.

Given how that went for me, I'd recommend the following:

  • Use an HTML comment with a prefix and a random hexadecimal string instead of parser strip markers. This won't be touched by the language converter.
  • Generate the section edit links at parse time instead of at replace time, and store them in the Parser Output as a straight associative array of string to string replacements. Then, in OutputPage::addHTML or thereabouts, figure out whether you're going to show them or not, and either do the replacement or strip them out.
  • There's no need to purge the parser cache en-masse. I'm not quite sure how you'd factor it, but if the appropriate entry is available in the parser cache with section edit links substituted in, then in theory there's no need to regenerate the parser output with them optional if the user wants section edit links.
#Comment by Dantman (talk | contribs)   05:49, 19 July 2011
  • Editsection tokens DO need to be partially touched by the LanguageConverter, if specific parts aren't touched by the LC we introduce a regression
  • We don't want to generate edit section links at parse time because edit section links are rendered in the user's lang not the content lang, and we don't want to re-render the same page for each different user lang just because a piece of the ui we could defer has to be re-rendered.
  • We replace stuff inside ParserOutput::getText because it's the common api, we're just trying to defer editsection link rendering, the OutputPage::addHTML area is only one of the possible places we may grab the ParserOutput's text, if we only do the replacements there we'll introduce a regression where we need to make the same code refactoring to other places, and potentially even inside extension code.
  • Stuff to do with the parser cache has already been handled in followup commits.
#Comment by Platonides (talk | contribs)   18:34, 19 July 2011

The parser cache mass-purge was fixed by r79568

Status & tagging log