r90526 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r90526
Date:14:08, 21 June 2011
Author:kbrown
Status:deferred (Comments)
Tags:
Comment:
add hook and function for rewriting external links to have a link to the archived version
Modified paths:
  • /trunk/extensions/ArchiveLinks/ArchiveLinks.php (modified)

Diff

Index: trunk/extensions/ArchiveLinks/ArchiveLinks.php
@@ -45,10 +45,16 @@
 
 //$wgHooks['LinkerMakeExternalLink'][] = 'getExternalLinks';
 //$wgHooks['EditPage::attemptSave'][] = 'getExternalLinks';
-$wgHooks['ArticleSaveComplete'][] = 'ArchiveLinks::getExternalLinks'; #We want to use this hook in production
 
+$wgHooks['ArticleSaveComplete'][] = 'ArchiveLinks::queueExternalLinks';
+$wgHooks['LinkerMakeExternalLink'][] = 'ArchiveLinks::rewriteLinks';
+
+$wgArchiveService = 'wikiwix';
+$wgUseMultipleArchives = false;
+$wgWhatToCallArchive = '[cache]';
+
 class ArchiveLinks {
-    public static function getExternalLinks ( &$article ) {
+    public static function queueExternalLinks ( &$article ) {
         global $wgParser;
         $external_links = $wgParser->getOutput();
         $external_links = $external_links->mExternalLinks;
@@ -59,7 +65,7 @@
         $db_slave = wfGetDB( DB_SLAVE );
         $db_result = array();
 
-        //$db_master->begin();
+        $db_master->begin();
 
         foreach ( $external_links as $link => $unused_value ) {
             //$db_result['resource'] = $db_slave->select( 'el_archive_resource', '*', '`el_archive_resource`.`resource_url` = "' . $db_slave->strencode( $link ) . '"');
@@ -76,7 +82,7 @@
             $db_master->insert( 'el_archive_queue', array (
                 'page_id' => $article->getID(),
                 'url' => $link,
-                //'delay_time' => '',
+                'delay_time' => '0',
                 'insertion_time' => time(),
                 'in_progress' => '0',
             ));
@@ -94,10 +100,47 @@
             //$db_master->insert('el_archive_queue', $array );
         }
 
-        //$db_master->commit();
+        $db_master->commit();
 
         return true;
     }
+
+    public static function rewriteLinks ( &$url, &$text, &$link, &$attributes ) {
+        if ( array_key_exists('rel', $attributes) && $attributes['rel'] === 'nofollow' ) {
+            global $wgArchiveService;
+            global $wgUseMultipleArchives;
+            global $wgWhatToCallArchive;
+            if ( $wgUseMultipleArchives ) {
+                //add support for more than one archival service at once
+                // (a page where you can select more than one)
+            } else {
+                switch ( $wgArchiveService ) {
+                    case 'local':
+                        //We need to have something to figure out where the filestore is...
+                        $link_to_archive = urlencode( substr_replace( $url, '', 0, 7 ) );
+                        break;
+                    case 'wikiwix':
+                        $link_to_archive = 'http://archive.wikiwix.org/cache/?url=' . $link;
+                        break;
+                    case 'internet_archive':
+                        $link_to_archive = 'http://wayback.archive.org/web/*/' . $link;
+                        break;
+                    case 'webcitation':
+                        $link_to_archive = 'http://webcitation.org/query?url=' . $link;
+                        break;
+                }
+            }
+            $link = "<a rel=\"nofollow\" class=\"{$attributes['class']}\" href=\"{$url}\">{$text}</a>&nbsp;<sup><small><a href=\""
+                . $link_to_archive . "\">{$wgWhatToCallArchive}</a></small></sup>&nbsp;";
+            return false;
+        } else {
+            return true;
+        }
+    }
+
+    /*function retrieveLinks ( ) {
+
+    }*/
 
     /*function queueURL ( $url, &$db_master ) {
 
@@ -115,17 +158,19 @@
             'bl_reason' => 'test'
         ));*/
 
-        $db_slave = wfGetDB( DB_SLAVE );
+        //$db_slave = wfGetDB( DB_SLAVE );
 
         /*$db_result = $db_slave->select( 'el_archive_blacklist', '*',
             '`el_archive_blacklist`.`bl_url` = "' . $db_slave->strencode( 'http://example.com' ) . '"');
         */
-        $db_result['queue'] = $db_slave->select( 'el_archive_queue', '*', '`el_archive_queue`.`url` = "' . $db_slave->strencode( 'http://example.com' ) . '"' );
+        //$db_result['queue'] = $db_slave->select( 'el_archive_queue', '*', '`el_archive_queue`.`url` = "' . $db_slave->strencode( 'http://example.com' ) . '"' );
 
-        file_put_contents ( './extensions/ArchiveLinks/stuff.txt', var_export( $db_result['queue']->numRows() , TRUE ));
+        //file_put_contents ( './extensions/ArchiveLinks/stuff.txt', var_export( $db_result['queue']->numRows() , TRUE ));
         //$add_quotes = 'http://example.com';
         //file_put_contents ( './extensions/ArchiveLinks/stuff.txt', var_export( $db_slave->addQuotes( $add_quotes ) , TRUE ));
 
+
+
         return false;
     }
     /*

Follow-up revisions

Revision | Commit summary | Author | Date
r90691 | Fix problems in r90526. Started adding proper i18n support as well as replace... | kbrown | 06:11, 24 June 2011

Comments

#Comment by NeilK (talk | contribs)   18:20, 23 June 2011

1) don't use a global for $wgWhatToCallArchive -- what you want to do is create your own messages file and then the translatewiki people will pick that up and add the translations for you.

- create an ArchiveLinks.i18n.php file (look at others for what they're supposed to look like... just start the English section, others will add more)
- in ArchiveLinks.php, add your .i18n.php to $wgExtensionMessagesFiles['ArchiveLinks']
- use wfMsg() to get the string into your HTML

More info here - http://www.mediawiki.org/wiki/Localisation

Hopefully, this is the one and only thing you have to localize, though.
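
For illustration, the messages-file approach from point 1 might look roughly like the sketch below. The message key 'archivelinks-cache-title' and the helper archiveLinksLabel() are hypothetical names chosen for this example; inside MediaWiki the lookup would go through wfMsg() after the file is registered in $wgExtensionMessagesFiles, and the helper only stands in for that lookup so the snippet runs on its own.

```php
<?php
// --- ArchiveLinks.i18n.php (sketch) ---
// The key 'archivelinks-cache-title' is a hypothetical name for this example.
$messages = array();
$messages['en'] = array(
	'archivelinks-cache-title' => '[cache]',
);

// --- ArchiveLinks.php (sketch) ---
// Inside MediaWiki the file above would be registered like this:
//   $wgExtensionMessagesFiles['ArchiveLinks'] = dirname( __FILE__ ) . '/ArchiveLinks.i18n.php';
// and the label fetched with:
//   $label = wfMsg( 'archivelinks-cache-title' );

// Hypothetical stand-in for the lookup, so the sketch runs without MediaWiki.
// It falls back to English, then to the bracketed key, when a message is missing.
function archiveLinksLabel( array $messages, $key, $lang = 'en' ) {
	if ( isset( $messages[$lang][$key] ) ) {
		return $messages[$lang][$key];
	}
	if ( isset( $messages['en'][$key] ) ) {
		return $messages['en'][$key];
	}
	return "<$key>";
}

$archiveLabel = archiveLinksLabel( $messages, 'archivelinks-cache-title', 'fr' );
echo $archiveLabel, "\n";
```

With only an English section defined, a request for another language falls back to the English string, which is the state the extension would be in until translations are contributed.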

2) Some people prefer it if you use the includes/Html.php functions to create all your HTML. Less chance of someone sneaking evil HTML into your code. It can be super-verbose. You'll have an easier time with production review if you do it that way.
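
To make the concern in point 2 concrete, here is a small standalone sketch of what an escaping element builder does. sketchElement() is a hypothetical stand-in written for this example, not MediaWiki code; with MediaWiki loaded, the corresponding call would be Html::element( 'a', array( 'rel' => 'nofollow', 'href' => $url ), $text ).

```php
<?php
// Hypothetical stand-in for an escaping HTML builder such as Html::element().
// Attribute values and text content are escaped, so untrusted input cannot
// close the attribute or inject new tags.
function sketchElement( $tag, array $attribs, $text ) {
	$out = "<$tag";
	foreach ( $attribs as $name => $value ) {
		$out .= ' ' . $name . '="' . htmlspecialchars( $value, ENT_QUOTES, 'UTF-8' ) . '"';
	}
	return $out . '>' . htmlspecialchars( $text, ENT_QUOTES, 'UTF-8' ) . "</$tag>";
}

// A URL crafted to break out of a hand-built href="..." string.
$url = 'http://example.com/?a=1&b="><script>';
$html = sketchElement( 'a', array( 'rel' => 'nofollow', 'href' => $url ), 'example' );
echo $html, "\n";
```

Building the same anchor by string interpolation, as the diff does, would pass the quote and angle brackets through unchanged.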

3) You've added two config globals so far. Not a serious problem, but try to avoid adding globals... one way is to have a global like $wgArchiveLinksConfig that is an array of values, you can add anything into that.
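
A minimal sketch of the single-array idea from point 3 follows. $wgArchiveLinksConfig is the name suggested above; the array keys and the archiveLinksSetting() helper are assumptions made for this example.

```php
<?php
// One configuration global instead of several. The key names are hypothetical.
$wgArchiveLinksConfig = array(
	'archive_service' => 'wikiwix',
	'use_multiple_archives' => false,
);

// Hypothetical helper: read a setting, falling back to a default when the
// site configuration does not define it.
function archiveLinksSetting( array $config, $key, $default = null ) {
	return array_key_exists( $key, $config ) ? $config[$key] : $default;
}

$service = archiveLinksSetting( $wgArchiveLinksConfig, 'archive_service', 'wikiwix' );
$multiple = archiveLinksSetting( $wgArchiveLinksConfig, 'use_multiple_archives', false );
echo $service, "\n";
```

New options can then be added as array keys without introducing further globals.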

#Comment by Bawolff (talk | contribs)   18:36, 23 June 2011

btw, I believe we're trying to avoid using named entities for html5+xml compatibility, and prefer &#160; instead.

#Comment by NeilK (talk | contribs)   18:41, 23 June 2011

That doesn't make sense to me. HTML5 has a vastly *expanded* set of named entities, including of course  .

  http://www.w3.org/TR/html5/named-character-references.html

And as for XML, nothing in MediaWiki markup makes sense if you don't have an HTML doctype defined, which would also pull in all the named entities.

#Comment by Brion VIBBER (talk | contribs)   18:46, 23 June 2011

HTML5's <!DOCTYPE html> doesn't include a DTD reference, which leaves XML parsers perfectly able to parse the element structures (if kept in well-formed XML compatibility formats), but unable to parse the named character references other than lt/gt/quot/apos which are defined as part of XML spec.
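
The behaviour described here can be checked with a short standalone sketch, assuming PHP's SimpleXML and libxml extensions are available. Neither input references a DTD, so the parser has no definition for the named entity, while the numeric reference needs none.

```php
<?php
// Collect libxml parse errors instead of emitting PHP warnings.
libxml_use_internal_errors( true );

// No DTD is referenced, so an XML parser does not know what &nbsp; means.
$named = simplexml_load_string( '<p>a&nbsp;b</p>' );

// A numeric character reference is resolved without any DTD.
$numeric = simplexml_load_string( '<p>a&#160;b</p>' );

var_dump( $named === false );   // whether the named-entity input was rejected
var_dump( $numeric !== false ); // whether the numeric-reference input parsed
```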

#Comment by NeilK (talk | contribs)   18:42, 23 June 2011

That blank space in my other comment is of course &nbsp;

#Comment by Bawolff (talk | contribs)   18:44, 23 June 2011

See r67090
