r46078 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: r46078 (< r46077 | r46079 >)
Date: 18:03, 23 January 2009
Author: simetrical
Status: ok (Comments)
Tags: todo
Comment:
Allow exempting domain names from rel="nofollow"

This introduces a new configuration option, $wgNoFollowDomainExceptions.
By default this is an empty array; perhaps it should be null by default
and be initialized to something extracted from $wgServer. An appropriate
value for Wikimedia would be something like:

$wgNoFollowDomainExceptions = array( 'wikipedia.org', 'wiktionary.org',
'wikibooks.org', ... );

It's fairly silly that we're nofollowing links to our own sites. :)
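
As a rough sketch of the "initialize from $wgServer" idea above (hypothetical, not part of this revision): if the option defaulted to null, Setup.php could seed it with the wiki's own host, e.g.

# Hypothetical default-initialization sketch; not in this commit.
# Exempt the wiki's own host if no explicit exceptions were configured.
if ( is_null( $wgNoFollowDomainExceptions ) ) {
	$host = parse_url( $wgServer, PHP_URL_HOST );
	$wgNoFollowDomainExceptions = $host ? array( $host ) : array();
}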
Modified paths:
  • /trunk/phase3/RELEASE-NOTES (modified) (history)
  • /trunk/phase3/includes/DefaultSettings.php (modified) (history)
  • /trunk/phase3/includes/parser/Parser.php (modified) (history)

Diff

Index: trunk/phase3/includes/parser/Parser.php
@@ -1130,7 +1130,7 @@
 	if ( $text === false ) {
 		# Not an image, make a link
 		$text = $sk->makeExternalLink( $url, $wgContLang->markNoConversion($url), true, 'free',
-			$this->getExternalLinkAttribs() );
+			$this->getExternalLinkAttribs( $url ) );
 		# Register it in the output object...
 		# Replace unnecessary URL escape codes with their equivalent characters
 		$pasteurized = self::replaceUnusualEscapes( $url );
@@ -1410,8 +1410,8 @@
 	# This means that users can paste URLs directly into the text
 	# Funny characters like ö aren't valid in URLs anyway
 	# This was changed in August 2004
-	$s .= $sk->makeExternalLink( $url, $text, false, $linktype, $this->getExternalLinkAttribs() )
-		. $dtrail . $trail;
+	$s .= $sk->makeExternalLink( $url, $text, false, $linktype,
+		$this->getExternalLinkAttribs( $url ) ) . $dtrail . $trail;

 	# Register link in the output object.
 	# Replace unnecessary URL escape codes with the referenced character
@@ -1424,12 +1424,36 @@
 	return $s;
 }

-function getExternalLinkAttribs() {
+/**
+ * Get an associative array of additional HTML attributes appropriate for a
+ * particular external link. This currently may include rel => nofollow
+ * (depending on configuration, namespace, and the URL's domain) and/or a
+ * target attribute (depending on configuration).
+ *
+ * @param string $url Optional URL, to extract the domain from for rel =>
+ *   nofollow if appropriate
+ * @return array Associative array of HTML attributes
+ */
+function getExternalLinkAttribs( $url = false ) {
 	$attribs = array();
 	global $wgNoFollowLinks, $wgNoFollowNsExceptions;
 	$ns = $this->mTitle->getNamespace();
 	if( $wgNoFollowLinks && !in_array($ns, $wgNoFollowNsExceptions) ) {
 		$attribs['rel'] = 'nofollow';
+
+		global $wgNoFollowDomainExceptions;
+		if ( $wgNoFollowDomainExceptions ) {
+			$bits = wfParseUrl( $url );
+			if ( is_array( $bits ) && isset( $bits['host'] ) ) {
+				foreach ( $wgNoFollowDomainExceptions as $domain ) {
+					if( substr( $bits['host'], -strlen( $domain ) )
+					== $domain ) {
+						unset( $attribs['rel'] );
+						break;
+					}
+				}
+			}
+		}
 	}
 	if ( $this->mOptions->getExternalLinkTarget() ) {
 		$attribs['target'] = $this->mOptions->getExternalLinkTarget();
Index: trunk/phase3/includes/DefaultSettings.php
@@ -3085,6 +3085,19 @@
 $wgNoFollowNsExceptions = array();

 /**
+ * If this is set to an array of domains, external links to these domain names
+ * (or any subdomains) will not be set to rel="nofollow" regardless of the
+ * value of $wgNoFollowLinks. For instance:
+ *
+ * $wgNoFollowDomainExceptions = array( 'en.wikipedia.org', 'wiktionary.org' );
+ *
+ * This would add rel="nofollow" to links to de.wikipedia.org, but not
+ * en.wikipedia.org, wiktionary.org, en.wiktionary.org, us.en.wikipedia.org,
+ * etc.
+ */
+$wgNoFollowDomainExceptions = array();
+
+/**
 * Default robot policy. The default policy is to encourage indexing and fol-
 * lowing of links. It may be overridden on a per-namespace and/or per-page
 * basis.
Index: trunk/phase3/RELEASE-NOTES
@@ -23,6 +23,8 @@
 * Added $wgNewPasswordExpiry, to specify an expiry time (in seconds) to
   temporary passwords
 * Added $wgUseTwoButtonsSearchForm to choose the Search form behavior/look
+* Added $wgNoFollowDomainExceptions to allow exempting particular domain names
+  from rel="nofollow" on external links

 === New features in 1.15 ===
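
As a standalone illustration of the domain suffix match described in the DefaultSettings.php comment above (the helper name below is invented for this sketch; the patch itself inlines the loop in Parser::getExternalLinkAttribs()):

# Sketch only: a host is exempt if it ends with one of the listed domains,
# so subdomains of an exempted domain are exempted too.
function isNoFollowExempt( $url, $exceptions ) {
	$host = parse_url( $url, PHP_URL_HOST );
	if ( !$host ) {
		return false;
	}
	foreach ( $exceptions as $domain ) {
		if ( substr( $host, -strlen( $domain ) ) == $domain ) {
			return true;
		}
	}
	return false;
}

$exceptions = array( 'en.wikipedia.org', 'wiktionary.org' );
var_dump( isNoFollowExempt( 'http://us.en.wikipedia.org/wiki/Foo', $exceptions ) ); # true
var_dump( isNoFollowExempt( 'http://de.wikipedia.org/wiki/Foo', $exceptions ) );    # false
var_dump( isNoFollowExempt( 'http://en.wiktionary.org/wiki/bar', $exceptions ) );   # true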

Comments

#Comment by Brion VIBBER (talk | contribs)   22:36, 26 January 2009

This isn't very flexible; it seems to need a complete in-memory list of domains. Perhaps better to allow a hook to pass particular URLs or domain names through for checking, which might provide a fancier system?

#Comment by Simetrical (talk | contribs)   01:37, 27 January 2009

I intended this to be a basic "let's not block our own sites here" kind of thing, which it does well for the overwhelming majority of users (e.g., Wikimedia). I guess you're thinking about something more like "let's have sysops maintain a giant list of thousands of sites known not to be spammy" or "let's use some complicated heuristic to figure out whether links to a given site tend to stay in legitimate large articles" or something like that. That might be a good idea, but it's not what I meant to fix and I'd consider it a separate (and considerably more complicated) feature.

#Comment by Brion VIBBER (talk | contribs)   22:29, 27 January 2009

Should be cleaned up with a sensible interface for isolated URL lookups and a one-line hook point which a fancier system could hook into.
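
Purely as an illustration of that suggestion, such a hook point inside getExternalLinkAttribs() might look roughly like the following; the hook name is invented here and does not exist in the codebase:

# Hypothetical one-line hook point; 'ParserExternalLinkNoFollow' is invented
# for this sketch. A fancier extension could override the default decision.
$noFollow = true;
wfRunHooks( 'ParserExternalLinkNoFollow', array( $url, &$noFollow ) );
if ( $noFollow ) {
	$attribs['rel'] = 'nofollow';
}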
