r52393 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r52393
Date:11:05, 25 June 2009
Author:catrope
Status:deferred
Tags:
Comment:
Core changes for NavigableTOC extension:
* Always generate the section tree, even when we're not generating a TOC
* Add Parser::mergeSectionTrees() to merge two section trees into one
* Add Linker::generateTOC() to generate the HTML for a TOC from a section tree, and add the section anchor to the section tree to facilitate this. This adds the ability to generate TOCs in extensions; haven't converted Parser.php to use it (yet?). As a side effect, this fixes API bug 18720
Modified paths:
  • /trunk/phase3/RELEASE-NOTES (modified)
  • /trunk/phase3/includes/Linker.php (modified)
  • /trunk/phase3/includes/parser/Parser.php (modified)

Diff

Index: trunk/phase3/includes/parser/Parser.php
@@ -3489,64 +3489,61 @@
 			}
 			$level = $matches[1][$headlineCount];
 
-			if( $doNumberHeadings || $enoughToc ) {
-
-				if ( $level > $prevlevel ) {
-					# Increase TOC level
-					$toclevel++;
-					$sublevelCount[$toclevel] = 0;
-					if( $toclevel<$wgMaxTocLevel ) {
-						$prevtoclevel = $toclevel;
-						$toc .= $sk->tocIndent();
-						$numVisible++;
-					}
+			if ( $level > $prevlevel ) {
+				# Increase TOC level
+				$toclevel++;
+				$sublevelCount[$toclevel] = 0;
+				if( $toclevel<$wgMaxTocLevel ) {
+					$prevtoclevel = $toclevel;
+					$toc .= $sk->tocIndent();
+					$numVisible++;
 				}
-				elseif ( $level < $prevlevel && $toclevel > 1 ) {
-					# Decrease TOC level, find level to jump to
+			}
+			elseif ( $level < $prevlevel && $toclevel > 1 ) {
+				# Decrease TOC level, find level to jump to
 
-					for ($i = $toclevel; $i > 0; $i--) {
-						if ( $levelCount[$i] == $level ) {
-							# Found last matching level
-							$toclevel = $i;
-							break;
-						}
-						elseif ( $levelCount[$i] < $level ) {
-							# Found first matching level below current level
-							$toclevel = $i + 1;
-							break;
-						}
+				for ($i = $toclevel; $i > 0; $i--) {
+					if ( $levelCount[$i] == $level ) {
+						# Found last matching level
+						$toclevel = $i;
+						break;
 					}
-					if( $i == 0 ) $toclevel = 1;
-					if( $toclevel<$wgMaxTocLevel ) {
-						if($prevtoclevel < $wgMaxTocLevel) {
-							# Unindent only if the previous toc level was shown :p
-							$toc .= $sk->tocUnindent( $prevtoclevel - $toclevel );
-							$prevtoclevel = $toclevel;
-						} else {
-							$toc .= $sk->tocLineEnd();
-						}
+					elseif ( $levelCount[$i] < $level ) {
+						# Found first matching level below current level
+						$toclevel = $i + 1;
+						break;
 					}
 				}
-				else {
-					# No change in level, end TOC line
-					if( $toclevel<$wgMaxTocLevel ) {
+				if( $i == 0 ) $toclevel = 1;
+				if( $toclevel<$wgMaxTocLevel ) {
+					if($prevtoclevel < $wgMaxTocLevel) {
+						# Unindent only if the previous toc level was shown :p
+						$toc .= $sk->tocUnindent( $prevtoclevel - $toclevel );
+						$prevtoclevel = $toclevel;
+					} else {
 						$toc .= $sk->tocLineEnd();
 					}
 				}
+			}
+			else {
+				# No change in level, end TOC line
+				if( $toclevel<$wgMaxTocLevel ) {
+					$toc .= $sk->tocLineEnd();
+				}
+			}
 
-				$levelCount[$toclevel] = $level;
+			$levelCount[$toclevel] = $level;
 
-				# count number of headlines for each level
-				@$sublevelCount[$toclevel]++;
-				$dot = 0;
-				for( $i = 1; $i <= $toclevel; $i++ ) {
-					if( !empty( $sublevelCount[$i] ) ) {
-						if( $dot ) {
-							$numbering .= '.';
-						}
-						$numbering .= $wgContLang->formatNum( $sublevelCount[$i] );
-						$dot = 1;
+			# count number of headlines for each level
+			@$sublevelCount[$toclevel]++;
+			$dot = 0;
+			for( $i = 1; $i <= $toclevel; $i++ ) {
+				if( !empty( $sublevelCount[$i] ) ) {
+					if( $dot ) {
+						$numbering .= '.';
 					}
+					$numbering .= $wgContLang->formatNum( $sublevelCount[$i] );
+					$dot = 1;
 				}
 			}
 
@@ -3648,28 +3645,31 @@
 			if( $enoughToc && ( !isset($wgMaxTocLevel) || $toclevel<$wgMaxTocLevel ) ) {
 				$toc .= $sk->tocLine($anchor, $tocline,
 					$numbering, $toclevel, ($isTemplate ? false : $sectionIndex));
-
-				# Find the DOM node for this header
-				while ( $node && !$isTemplate ) {
-					if ( $node->getName() === 'h' ) {
-						$bits = $node->splitHeading();
-						if ( $bits['i'] == $sectionIndex )
-							break;
-					}
-					$byteOffset += mb_strlen( $this->mStripState->unstripBoth(
-						$frame->expand( $node, PPFrame::RECOVER_ORIG ) ) );
-					$node = $node->getNextSibling();
+			}
+
+			# Add the section to the section tree
+			# Find the DOM node for this header
+			while ( $node && !$isTemplate ) {
+				if ( $node->getName() === 'h' ) {
+					$bits = $node->splitHeading();
+					if ( $bits['i'] == $sectionIndex )
+						break;
 				}
-				$tocraw[] = array(
-					'toclevel' => $toclevel,
-					'level' => $level,
-					'line' => $tocline,
-					'number' => $numbering,
-					'index' => $sectionIndex,
-					'fromtitle' => $titleText,
-					'byteoffset' => ( $isTemplate ? null : $byteOffset ),
-				);
+				$byteOffset += mb_strlen( $this->mStripState->unstripBoth(
+					$frame->expand( $node, PPFrame::RECOVER_ORIG ) ) );
+				$node = $node->getNextSibling();
 			}
+			$tocraw[] = array(
+				'toclevel' => $toclevel,
+				'level' => $level,
+				'line' => $tocline,
+				'number' => $numbering,
+				'index' => ($isTemplate ? 'T-' : '' ) . $sectionIndex,
+				'fromtitle' => $titleText,
+				'byteoffset' => ( $isTemplate ? null : $byteOffset ),
+				'anchor' => $anchor,
+			);
+
 			# give headline the correct <h#> tag
 			if( $showEditLink && $sectionIndex !== false ) {
 				if( $isTemplate ) {
@@ -3741,6 +3741,96 @@
 	}
 
 	/**
+	 * Merge $tree2 into $tree1 by replacing the section with index
+	 * $section in $tree1 and its descendants with the sections in $tree2.
+	 * Note that in the returned section tree, only the 'index' and
+	 * 'byteoffset' fields are guaranteed to be correct.
+	 * @param $tree1 array Section tree from ParserOutput::getSectons()
+	 * @param $tree2 array Section tree
+	 * @param $section int Section index
+	 * @param $title Title Title both section trees come from
+	 * @param $len2 int Length of the original wikitext for $tree2
+	 * @return array Merged section tree
+	 */
+	public static function mergeSectionTrees( $tree1, $tree2, $section, $title, $len2 ) {
+		global $wgContLang;
+		$newTree = array();
+		$targetLevel = false;
+		$merged = false;
+		$lastLevel = 1;
+		$nextIndex = 1;
+		$numbering = array( 0 );
+		$titletext = $title->getPrefixedDBkey();
+		foreach ( $tree1 as $s ) {
+			if ( $targetLevel !== false ) {
+				if ( $s['level'] <= $targetLevel )
+					// We've skipped enough
+					$targetLevel = false;
+				else
+					continue;
+			}
+			if ( $s['index'] != $section ||
+					$s['fromtitle'] != $titletext ) {
+				self::incrementNumbering( $numbering,
+					$s['toclevel'], $lastLevel );
+
+				// Rewrite index, byteoffset and number
+				if ( $s['fromtitle'] == $titletext ) {
+					$s['index'] = $nextIndex++;
+					if ( $merged )
+						$s['byteoffset'] += $len2;
+				}
+				$s['number'] = implode( '.', array_map(
+					array( $wgContLang, 'formatnum' ),
+					$numbering ) );
+				$lastLevel = $s['toclevel'];
+				$newTree[] = $s;
+			} else {
+				// We're at $section
+				// Insert sections from $tree2 here
+				foreach ( $tree2 as $s2 ) {
+					// Rewrite the fields in $s2
+					// before inserting it
+					$s2['toclevel'] += $s['toclevel'] - 1;
+					$s2['level'] += $s['level'] - 1;
+					$s2['index'] = $nextIndex++;
+					$s2['byteoffset'] += $s['byteoffset'];
+
+					self::incrementNumbering( $numbering,
+						$s2['toclevel'], $lastLevel );
+					$s2['number'] = implode( '.', array_map(
+						array( $wgContLang, 'formatnum' ),
+						$numbering ) );
+					$lastLevel = $s2['toclevel'];
+					$newTree[] = $s2;
+				}
+				// Skip all descendants of $section in $tree1
+				$targetLevel = $s['level'];
+				$merged = true;
+			}
+		}
+		return $newTree;
+	}
+
+	/**
+	 * Increment a section number. Helper function for mergeSectionTrees()
+	 * @param $number array Array representing a section number
+	 * @param $level int Current TOC level (depth)
+	 * @param $lastLevel int Level of previous TOC entry
+	 */
+	private static function incrementNumbering( &$number, $level, $lastLevel ) {
+		if ( $level > $lastLevel )
+			$number[$level - 1] = 1;
+		else if ( $level < $lastLevel ) {
+			foreach ( $number as $key => $unused )
+				if ( $key >= $level )
+					unset( $number[$key] );
+			$number[$level - 1]++;
+		} else
+			$number[$level - 1]++;
+	}
+
+	/**
 	 * Transform wiki markup when saving a page by doing \r\n -> \n
 	 * conversion, substitting signatures, {{subst:}} templates, etc.
 	 *
Index: trunk/phase3/includes/Linker.php
@@ -1176,6 +1176,32 @@
 		 . ' } '
 		 . "</script>\n";
 	}
+
+	/**
+	 * Generate a table of contents from a section tree
+	 * @param $tree Return value of ParserOutput::getSections()
+	 * @return string HTML
+	 */
+	public function generateTOC( $tree ) {
+		$toc = '';
+		$lastLevel = 0;
+		foreach ( $tree as $section ) {
+			if ( $section['toclevel'] > $lastLevel )
+				$toc .= $this->tocIndent();
+			else if ( $secton['toclevel'] < $lastLevel )
+				$toc .= $this->tocUnindent(
+					$lastLevel - $section['toclevel'] );
+			else
+				$toc .= $this->tocLineEnd();
+
+			$toc .= $this->tocLine( $section['anchor'],
+				$section['line'], $section['number'],
+				$section['toclevel'], $section['index'] );
+			$lastLevel = $section['toclevel'];
+		}
+		$toc .= $this->tocLineEnd();
+		return $this->tocList( $toc );
+	}
 
 	/**
 	 * Create a section edit link. This supersedes editSectionLink() and
Index: trunk/phase3/RELEASE-NOTES
@@ -237,6 +237,7 @@
 * (bug 19313) action=rollback returns wrong revid on master/slave setups
 * (bug 19323) action=parse doesn't return section tree on pages with Cite
   warnings
+* (bug 18720) Add anchor field to action=parse&prop=sections output
 
 === Languages updated in 1.16 ===
 
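
A standalone sketch of the list-building loop in the new Linker::generateTOC() may make the intended output easier to see. The four helper functions are simplified, hypothetical stand-ins for tocIndent(), tocUnindent(), tocLineEnd() and tocLine(), not the real markup, and the sample tree is invented. Two deliberate deviations from the committed code: the else-if branch reads $section (the committed line has $secton, which looks like a typo), and the sketch closes every list still open at the end instead of calling tocList().

```php
<?php
// Simplified, hypothetical stand-ins for the skin's TOC helpers.
// The real methods emit richer markup; values are assumed to be HTML-safe.
function tocIndent() {
	return "\n<ul>";
}

function tocUnindent( $levels ) {
	// Close the current item, then one nested list and its parent item per level.
	return "</li>\n" . str_repeat( "</ul>\n</li>\n", max( $levels, 0 ) );
}

function tocLineEnd() {
	return "</li>\n";
}

function tocLine( $anchor, $line, $number, $level ) {
	return "\n<li class=\"toclevel-$level\"><a href=\"#$anchor\">$number $line</a>";
}

/**
 * Build nested-list HTML from a flat section tree.
 * Each entry needs 'toclevel', 'anchor', 'line' and 'number'.
 */
function generateTocSketch( array $tree ) {
	$toc = '';
	$lastLevel = 0;
	foreach ( $tree as $section ) {
		if ( $section['toclevel'] > $lastLevel ) {
			$toc .= tocIndent();
		} elseif ( $section['toclevel'] < $lastLevel ) {
			$toc .= tocUnindent( $lastLevel - $section['toclevel'] );
		} else {
			$toc .= tocLineEnd();
		}
		$toc .= tocLine( $section['anchor'], $section['line'],
			$section['number'], $section['toclevel'] );
		$lastLevel = $section['toclevel'];
	}
	if ( $lastLevel > 0 ) {
		// Assumption of this sketch: close all lists that are still open.
		$toc .= tocUnindent( $lastLevel - 1 ) . "</ul>\n";
	}
	return $toc;
}

// Invented sample tree: two top-level sections, one subsection.
$tree = array(
	array( 'toclevel' => 1, 'anchor' => 'History', 'line' => 'History', 'number' => '1' ),
	array( 'toclevel' => 2, 'anchor' => 'Early_years', 'line' => 'Early years', 'number' => '1.1' ),
	array( 'toclevel' => 1, 'anchor' => 'Usage', 'line' => 'Usage', 'number' => '2' ),
);
echo generateTocSketch( $tree );
```

Fed the three-entry tree, this produces a two-level list in which "Early years" is nested under "History".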

Comments

#Comment by Tim Starling   12:46, 26 August 2010

What is byteoffset meant to be? It seems to be trying to count wikitext, but it fails because RECOVER_ORIG does not disable extension tag processing, and expansion is done in HTML mode. So if there are any extension tags in the text before a section, the byteoffset will be wrong.

#Comment by Catrope   16:31, 16 October 2010

byteoffset is supposed to count the number of bytes of wikitext preceding the start of the section, yes. So how should I do it right?
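
To make the definition concrete, here is a minimal sketch, assuming UTF-8 wikitext, of what "bytes of wikitext preceding the start of the section" means for a single heading. byteOffsetOfHeading() and the sample text are made up for illustration and ignore everything the real parser has to handle (templates, extension tags, nowiki). It also shows that byte and character counts diverge once the preceding text contains a multibyte character, which may matter since the diff accumulates the offset with mb_strlen().

```php
<?php
/**
 * Byte offset of the first occurrence of $headingLine in $wikitext,
 * or false if it does not occur. strpos() reports offsets in bytes.
 * Hypothetical helper, not part of MediaWiki.
 */
function byteOffsetOfHeading( $wikitext, $headingLine ) {
	return strpos( $wikitext, $headingLine );
}

// "\xC3\xA9" is the two-byte UTF-8 encoding of e-acute.
$wikitext = "Caf\xC3\xA9\n== History ==\nText\n";
$offset = byteOffsetOfHeading( $wikitext, '== History ==' );
$prefix = substr( $wikitext, 0, $offset );

echo "bytes before section: ", strlen( $prefix ), "\n";
if ( function_exists( 'mb_strlen' ) ) {
	// Requires the mbstring extension; counts characters, not bytes.
	echo "characters before section: ", mb_strlen( $prefix, 'UTF-8' ), "\n";
}
```

Here the prefix is 6 bytes long but only 5 characters long.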

#Comment by Tim Starling   01:38, 18 October 2010

One possible solution would be to fix RECOVER_ORIG so it works properly. But there's a potential performance issue here, since calling expand() on every node in the document could be quite slow, so there may be a case for changing the architecture more aggressively, say by having the preprocessor store source byte offsets in header nodes. I'd have to look at the applications.
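
A rough sketch of the second idea: record the source byte offset on each heading at the moment it is recognised, so that nothing has to be re-expanded afterwards. This is not the MediaWiki preprocessor. scanHeadings() is a hypothetical line-based scanner that only understands "== Heading ==" lines in raw wikitext, and its field names merely echo those of the section tree in the diff.

```php
<?php
/**
 * Hypothetical single-pass scanner: returns one entry per heading line,
 * each carrying the byte offset at which that line starts in the source.
 */
function scanHeadings( $wikitext ) {
	$nodes = array();
	$offset = 0; // bytes consumed so far
	foreach ( explode( "\n", $wikitext ) as $line ) {
		// Same number of "=" on both sides, optional trailing blanks.
		if ( preg_match( '/^(={1,6})(.+?)\1[ \t]*$/', $line, $m ) ) {
			$nodes[] = array(
				'level' => strlen( $m[1] ),
				'line' => trim( $m[2] ),
				'byteoffset' => $offset,
			);
		}
		// strlen() counts bytes; +1 restores the "\n" that explode() removed.
		$offset += strlen( $line ) + 1;
	}
	return $nodes;
}

// Invented sample; "\xC3\xA9" is a two-byte UTF-8 character.
$sample = "Intro\n== A ==\nbody \xC3\xA9\n=== B ===\nmore";
print_r( scanHeadings( $sample ) );
```

Because the offset is accumulated while scanning, the cost stays linear in the length of the text and does not depend on how nodes would later be expanded.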

#Comment by MarkAHershberger   20:52, 2 February 2011

Moved this to bug 27118

#Comment by Platonides   22:20, 15 October 2010

Parser.php changes reverted in r69586

#Comment by Tim Starling   01:39, 18 October 2010

No, the new functions were removed, but the patch to formatHeadings() remains.
