r44919 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r44918‎ | r44919 | r44920 >
Date:23:38, 22 December 2008
Author:brion
Status:ok
Tags:
Comment:
* (bug 2585) HTTP 404 return code is now given for a page view if the page
does not exist, allowing spiders and link checkers to detect broken links.

This is less expansive than the old 2005 implementation (r11307), hitting
only page views (won't affect action=edit) and doesn't attempt to cover
error conditions either (many of which should probably return a different code).

Pages which exist in the DB or return true for Title::isAlwaysKnown() such
as file pages for existing files, as well as category pages that exist, are
treated as existing by returning true for Article::hasViewableContent().
Modified paths:
  • /trunk/phase3/RELEASE-NOTES (modified) (history)
  • /trunk/phase3/includes/Article.php (modified) (history)
  • /trunk/phase3/includes/CategoryPage.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/CategoryPage.php
@@ -36,6 +36,19 @@
3737 $this->closeShowCategory();
3838 }
3939 }
 40+
 41+ /**
 42+ * Don't return a 404 for categories in use.
 43+ */
 44+ function hasViewableContent() {
 45+ if( parent::hasViewableContent() ) {
 46+ return true;
 47+ } else {
 48+ $cat = Category::newFromTitle( $this->mTitle );
 49+ return $cat->getId() != 0;
 50+ }
 51+
 52+ }
4053
4154 function openShowCategory() {
4255 # For overloading
Index: trunk/phase3/includes/Article.php
@@ -508,6 +508,18 @@
509509 public function exists() {
510510 return $this->getId() > 0;
511511 }
 512+
 513+ /**
 514+ * Check if this page is something we're going to be showing
 515+ * some sort of sensible content for. If we return false, page
 516+ * views (plain action=view) will return an HTTP 404 response,
 517+ * so spiders and robots can know they're following a bad link.
 518+ *
 519+ * @return bool
 520+ */
 521+ public function hasViewableContent() {
 522+ return $this->exists() || $this->mTitle->isAlwaysKnown();
 523+ }
512524
513525 /**
514526 * @return int The view count for the page
@@ -714,6 +726,7 @@
715727 $rdfrom = $wgRequest->getVal( 'rdfrom' );
716728 $diffOnly = $wgRequest->getBool( 'diffonly', $wgUser->getOption( 'diffonly' ) );
717729 $purge = $wgRequest->getVal( 'action' ) == 'purge';
 730+ $return404 = false;
718731
719732 $wgOut->setArticleFlag( true );
720733
@@ -813,12 +826,22 @@
814827 $text = wfMsg( 'noarticletext' );
815828 }
816829 }
 830+
817831 # Non-existent pages
818832 if( $this->getID() === 0 ) {
819833 $wgOut->setRobotPolicy( 'noindex,nofollow' );
820834 $text = "<div class='noarticletext'>\n$text\n</div>";
 835+ if( !$this->hasViewableContent() ) {
 836+ // If there's no backing content, send a 404 Not Found
 837+ // for better machine handling of broken links.
 838+ $return404 = true;
 839+ }
821840 }
822841
 842+ if( $return404 ) {
 843+ $wgRequest->response()->header( "HTTP/1.x 404 Not Found" );
 844+ }
 845+
823846 # Another whitelist check in case oldid is altering the title
824847 if( !$this->mTitle->userCanRead() ) {
825848 $wgOut->loginToUse();
Index: trunk/phase3/RELEASE-NOTES
@@ -241,8 +241,9 @@
242242 * (bug 16760) Add CSS-class to action links of Special:Log
243243 * (bug 505) Time zones can now be specified by location in user preferences,
244244 avoiding the need to manually update for DST. Patch by Brad Jorsch.
 245+* (bug 2585) HTTP 404 return code is now given for a page view if the page
 246+ does not exist, allowing spiders and link checkers to detect broken links.
245247
246 -
247248 === Bug fixes in 1.14 ===
248249
249250 * (bug 14907) DatabasePostgres::fieldType now defined.

Past revisions this follows-up on

RevisionCommit summaryAuthorDate
r11307(bug 2585) Return proper 404 code when pages don't existtomgilder03:12, 12 October 2005

Status & tagging log