r85911 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r85911
Date:19:25, 12 April 2011
Author:btongminh
Status:ok (Comments)
Tags:todo 
Comment:
Add support for importing/exporting files. This can be done by embedding the image as base64 in the XML stream or by copying the images directory manually and pointing the importer to the base images directory.
Currently only the backend code is available, and a few member variables must be modified to enable the functionality.

Export.php:
* Add <rel> and <sha1base36> elements to the XML output
* Add optional <archivename> and <contents> elements to the XML output. <contents> contains an encoding attribute, which is currently only set to base64.
Import.php:
* Add Import::$mImageBasePath which should point to the images/ directory to import from
* Add methods to WikiRevision (terrible name btw) to set the rel, hash, archivename and filesrc.
* Clean up WikiRevision::importUpload and make it work. It's still quite a mess though
OldLocalFile.php:
* Fix a few timestamp related things from r85635
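
As a rough illustration of the output format described above, the following standalone sketch builds an <upload> element with the optional <archivename> and <contents> children. It is not the MediaWiki implementation: buildUploadXml() and the keys of $info are hypothetical names chosen for this example, plain htmlspecialchars() stands in for the Xml helper class, and only a subset of the child elements is emitted.

```php
<?php
// Standalone sketch, not MediaWiki code: buildUploadXml() and the keys of
// $info are hypothetical, and only a subset of the child elements is shown.
function buildUploadXml( array $info, $dumpContents = false ) {
    // Escape the value so it is safe as XML element content
    $el = function ( $name, $value ) {
        return "      <$name>" .
            htmlspecialchars( (string)$value, ENT_QUOTES, 'UTF-8' ) .
            "</$name>\n";
    };
    $out = "    <upload>\n" . $el( 'filename', $info['filename'] );
    if ( isset( $info['archivename'] ) ) {
        // Only archived (old) versions carry an archive name
        $out .= $el( 'archivename', $info['archivename'] );
    }
    $out .= $el( 'size', strlen( $info['data'] ) ) . $el( 'rel', $info['rel'] );
    if ( $dumpContents ) {
        // base64 uses only XML-safe characters, so it needs no escaping
        $out .= '      <contents encoding="base64">' .
            chunk_split( base64_encode( $info['data'] ) ) .
            "      </contents>\n";
    }
    return $out . "    </upload>\n";
}

// Usage: a current version (no archive name) with embedded contents
echo buildUploadXml( array(
    'filename' => 'Example.png',
    'rel' => 'a/ab/Example.png',
    'data' => 'hello',
), true );
```

Because chunk_split() inserts line breaks into the base64 text, a reader of this format has to tolerate whitespace inside <contents>; PHP's base64_decode() does so in its default non-strict mode.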
Modified paths:
  • /trunk/phase3/includes/Export.php (modified)
  • /trunk/phase3/includes/Import.php (modified)
  • /trunk/phase3/includes/filerepo/OldLocalFile.php (modified)

Diff

Index: trunk/phase3/includes/filerepo/OldLocalFile.php
@@ -227,12 +227,14 @@
      * @param $archiveName string Full archive name of the file, in the form
      * $timestamp!$filename, where $filename must match $this->getName()
      */
-    function uploadOld( $srcPath, $archiveName, $comment, $user ) {
+    function uploadOld( $srcPath, $archiveName, $timestamp, $comment, $user, $flags = 0 ) {
         $this->lock();
-        $status = $this->publish( $srcPath, $flags, $archiveName );
+        $status = $this->publish( $srcPath,
+            $flags & File::DELETE_SOURCE ? FileRepo::DELETE_SOURCE : 0,
+            $archiveName );
 
         if ( $status->isGood() ) {
-            if ( !$this->recordOldUpload( $srcPath, $archiveName, $comment, $user ) ) {
+            if ( !$this->recordOldUpload( $srcPath, $archiveName, $timestamp, $comment, $user ) ) {
                 $status->fatal( 'filenotfound', $srcPath );
             }
         }
@@ -251,7 +253,7 @@
      * @param $user User User who did this upload
      * @return bool
      */
-    function recordOldUpload( $srcPath, $archiveName, $comment, $user ) {
+    function recordOldUpload( $srcPath, $archiveName, $timestamp, $comment, $user ) {
         $dbw = $this->repo->getMasterDB();
         $dbw->begin();
 
@@ -269,7 +271,7 @@
                 'oi_width' => intval( $props['width'] ),
                 'oi_height' => intval( $props['height'] ),
                 'oi_bits' => $props['bits'],
-                'oi_timestamp' => $props['timestamp'],
+                'oi_timestamp' => $dbw->timestamp( $timestamp ),
                 'oi_description' => $comment,
                 'oi_user' => $user->getId(),
                 'oi_user_text' => $user->getName(),
Index: trunk/phase3/includes/Export.php
@@ -35,6 +35,7 @@
     var $author_list = "" ;
 
     var $dumpUploads = false;
+    var $dumpUploadFileContents = false;
 
     const FULL = 1;
     const CURRENT = 2;
@@ -318,7 +319,7 @@
         if ( isset( $last ) ) {
             $output = '';
             if ( $this->dumpUploads ) {
-                $output .= $this->writer->writeUploads( $last );
+                $output .= $this->writer->writeUploads( $last, $this->dumpUploadFileContents );
             }
             $output .= $this->writer->closePage();
             $this->sink->writeClosePage( $output );
@@ -333,7 +334,7 @@
         if ( isset( $last ) ) {
             $output = '';
             if ( $this->dumpUploads ) {
-                $output .= $this->writer->writeUploads( $last );
+                $output .= $this->writer->writeUploads( $last, $this->dumpUploadFileContents );
             }
             $output .= $this->author_list;
             $output .= $this->writer->closePage();
@@ -600,29 +601,48 @@
     /**
      * Warning! This data is potentially inconsistent. :(
      */
-    function writeUploads( $row ) {
+    function writeUploads( $row, $dumpContents = false ) {
         if ( $row->page_namespace == NS_IMAGE ) {
             $img = wfFindFile( $row->page_title );
             if ( $img ) {
                 $out = '';
                 foreach ( array_reverse( $img->getHistory() ) as $ver ) {
-                    $out .= $this->writeUpload( $ver );
+                    $out .= $this->writeUpload( $ver, $dumpContents );
                 }
-                $out .= $this->writeUpload( $img );
+                $out .= $this->writeUpload( $img, $dumpContents );
                 return $out;
             }
         }
         return '';
     }
 
-    function writeUpload( $file ) {
+    function writeUpload( $file, $dumpContents = false ) {
+        if ( $file->isOld() ) {
+            $archiveName = " " .
+                Xml::element( 'archivename', null, $file->getArchiveName() ) . "\n";
+        } else {
+            $archiveName = '';
+        }
+        if ( $dumpContents ) {
+            # Dump file as base64
+            # Uses only XML-safe characters, so does not need escaping
+            $contents = ' <contents encoding="base64">' .
+                chunk_split( base64_encode( file_get_contents( $file->getPath() ) ) ) .
+                " </contents>\n";
+        } else {
+            $contents = '';
+        }
         return " <upload>\n" .
             $this->writeTimestamp( $file->getTimestamp() ) .
             $this->writeContributor( $file->getUser( 'id' ), $file->getUser( 'text' ) ) .
             " " . Xml::elementClean( 'comment', null, $file->getDescription() ) . "\n" .
             " " . Xml::element( 'filename', null, $file->getName() ) . "\n" .
+            $archiveName .
             " " . Xml::element( 'src', null, $file->getFullUrl() ) . "\n" .
             " " . Xml::element( 'size', null, $file->getSize() ) . "\n" .
+            " " . Xml::element( 'sha1base36', null, $file->getSha1() ) . "\n" .
+            " " . Xml::element( 'rel', null, $file->getRel() ) . "\n" .
+            $contents .
             " </upload>\n";
     }
 
Index: trunk/phase3/includes/Import.php
@@ -35,6 +35,7 @@
     private $mLogItemCallback, $mUploadCallback, $mRevisionCallback, $mPageCallback;
     private $mSiteInfoCallback, $mTargetNamespace, $mPageOutCallback;
     private $mDebug;
+    private $mImportUploads, $mImageBasePath;
 
     /**
      * Creates an ImportXMLReader drawing from the source provided
@@ -169,6 +170,13 @@
             return false;
         }
     }
+
+    /**
+     *
+     */
+    public function setImageBasePath( $dir ) {
+        $this->mImageBasePath = $dir;
+    }
 
     /**
      * Default per-revision callback, performs the import.
@@ -192,9 +200,8 @@
      * Dummy for now...
      */
     public function importUpload( $revision ) {
-        $revision->importUpload();
-        //$dbw = wfGetDB( DB_MASTER );
-        //return $dbw->deadlockLoop( array( $revision, 'importUpload' ) );
+        $dbw = wfGetDB( DB_MASTER );
+        return $dbw->deadlockLoop( array( $revision, 'importUpload' ) );
         return false;
     }
 
@@ -582,7 +589,7 @@
         $uploadInfo = array();
 
         $normalFields = array( 'timestamp', 'comment', 'filename', 'text',
-            'src', 'size' );
+            'src', 'size', 'sha1base36', 'archivename', 'rel' );
 
         $skip = false;
 
@@ -601,26 +608,53 @@
                 $uploadInfo[$tag] = $this->nodeContents();
             } elseif ( $tag == 'contributor' ) {
                 $uploadInfo['contributor'] = $this->handleContributor();
+            } elseif ( $tag == 'contents' ) {
+                $contents = $this->nodeContents();
+                $encoding = $this->reader->getAttribute( 'encoding' );
+                if ( $encoding === 'base64' ) {
+                    $uploadInfo['fileSrc'] = $this->dumpTemp( base64_decode( $contents ) );
+                }
             } elseif ( $tag != '#text' ) {
                 $this->warn( "Unhandled upload XML tag $tag" );
                 $skip = true;
             }
         }
+
+        if ( $this->mImageBasePath && isset( $uploadInfo['rel'] ) ) {
+            $path = "{$this->mImageBasePath}/{$uploadInfo['rel']}";
+            if ( file_exists( $path ) ) {
+                $uploadInfo['fileSrc'] = $path;
+            }
+        }
 
-        return $this->processUpload( $pageInfo, $uploadInfo );
+        if ( $this->mImportUploads ) {
+            return $this->processUpload( $pageInfo, $uploadInfo );
+        }
     }
+
+    private function dumpTemp( $contents ) {
+        $filename = tempnam( wfTempDir(), 'importupload' );
+        file_put_contents( $filename, $contents );
+        return $filename;
+    }
 
 
     private function processUpload( $pageInfo, $uploadInfo ) {
         $revision = new WikiRevision;
-        $text = isset( $uploadInfo['text'] ) ? $uploadInfo['text'] : '';
+        $text = isset( $uploadInfo['text'] ) ? $uploadInfo['text'] : '';
 
         $revision->setTitle( $pageInfo['_title'] );
-        $revision->setID( $pageInfo['id'] );
+        $revision->setID( $pageInfo['id'] );
         $revision->setTimestamp( $uploadInfo['timestamp'] );
-        $revision->setText( $text );
+        $revision->setText( $text );
         $revision->setFilename( $uploadInfo['filename'] );
+        if ( isset( $uploadInfo['archivename'] ) ) {
+            $revision->setArchiveName( $uploadInfo['archivename'] );
+        }
         $revision->setSrc( $uploadInfo['src'] );
+        if ( isset( $uploadInfo['fileSrc'] ) ) {
+            $revision->setFileSrc( $uploadInfo['fileSrc'] );
+        }
         $revision->setSize( intval( $uploadInfo['size'] ) );
         $revision->setComment( $uploadInfo['comment'] );
 
@@ -631,7 +665,7 @@
             $revision->setUserName( $uploadInfo['contributor']['username'] );
         }
 
-        return call_user_func( $this->mUploadCallback, $revision );
+        return call_user_func( $this->mUploadCallback, $revision );
     }
 
     private function handleContributor() {
@@ -790,6 +824,7 @@
  * @ingroup SpecialPage
  */
 class WikiRevision {
+    var $importer = null;
     var $title = null;
     var $id = 0;
     var $timestamp = "20010115000000";
@@ -801,6 +836,8 @@
     var $type = "";
     var $action = "";
     var $params = "";
+    var $fileSrc = '';
+    var $archiveName = '';
 
     function setTitle( $title ) {
         if( is_object( $title ) ) {
@@ -844,10 +881,16 @@
     function setSrc( $src ) {
         $this->src = $src;
     }
+    function setFileSrc( $src ) {
+        $this->fileSrc = $src;
+    }
 
     function setFilename( $filename ) {
         $this->filename = $filename;
     }
+    function setArchiveName( $archiveName ) {
+        $this->archiveName = $archiveName;
+    }
 
     function setSize( $size ) {
         $this->size = intval( $size );
@@ -896,10 +939,16 @@
     function getSrc() {
         return $this->src;
     }
+    function getFileSrc() {
+        return $this->fileSrc;
+    }
 
     function getFilename() {
         return $this->filename;
     }
+    function getArchiveName() {
+        return $this->archiveName;
+    }
 
     function getSize() {
         return $this->size;
@@ -1044,62 +1093,55 @@
     }
 
     function importUpload() {
-        wfDebug( __METHOD__ . ": STUB\n" );
-
-        /**
-        // from file revert...
-        $source = $this->file->getArchiveVirtualUrl( $this->oldimage );
-        $comment = $wgRequest->getText( 'wpComment' );
-        // TODO: Preserve file properties from database instead of reloading from file
-        $status = $this->file->upload( $source, $comment, $comment );
-        if( $status->isGood() ) {
-        */
-
-        /**
-        // from file upload...
-        $this->mLocalFile = wfLocalFile( $nt );
-        $this->mDestName = $this->mLocalFile->getName();
-        //....
-        $status = $this->mLocalFile->upload( $this->mTempPath, $this->mComment, $pageText,
-            File::DELETE_SOURCE, $this->mFileProps );
-        if ( !$status->isGood() ) {
-            $resultDetails = array( 'internal' => $status->getWikiText() );
-        */
-
-        // @todo Fixme: it may create a page without our desire, also wrong potentially.
-        // and, it will record a *current* upload, but we might want an archive version here
-
-        $file = wfLocalFile( $this->getTitle() );
+        # Construct a file
+        $archiveName = $this->getArchiveName();
+        if ( $archiveName ) {
+            wfDebug( __METHOD__ . "Importing archived file as $archiveName\n" );
+            $file = OldLocalFile::newFromArchiveName( $this->getTitle(),
+                RepoGroup::singleton()->getLocalRepo(), $archiveName );
+        } else {
+            $file = wfLocalFile( $this->getTitle() );
+            wfDebug( __METHOD__ . 'Importing new file as ' . $file->getName() . "\n" );
+            if ( $file->exists() && $file->getTimestamp() > $this->getTimestamp() ) {
+                $archiveName = $file->getTimestamp() . '!' . $file->getName();
+                $file = OldLocalFile::newFromArchiveName( $this->getTitle(),
+                    RepoGroup::singleton()->getLocalRepo(), $archiveName );
+                wfDebug( __METHOD__ . "File already exists; importing as $archiveName\n" );
+            }
+        }
         if( !$file ) {
-            wfDebug( "IMPORT: Bad file. :(\n" );
+            wfDebug( __METHOD__ . ': Bad file for ' . $this->getTitle() . "\n" );
             return false;
         }
-
-        $source = $this->downloadSource();
+
+        # Get the file source or download if necessary
+        $source = $this->getFileSrc();
+        if ( !$source ) {
+            $source = $this->downloadSource();
+        }
         if( !$source ) {
-            wfDebug( "IMPORT: Could not fetch remote file. :(\n" );
+            wfDebug( __METHOD__ . ": Could not fetch remote file.\n" );
             return false;
         }
 
         $user = User::newFromName( $this->user_text );
-
-        $status = $file->upload( $source,
-            $this->getComment(),
-            $this->getComment(), // Initial page, if none present...
-            File::DELETE_SOURCE,
-            false, // props...
-            $this->getTimestamp(),
-            is_object( $user ) ? ( $user->isLoggedIn() ? $user : null ) : null );
-
-        if( $status->isGood() ) {
-            // yay?
-            wfDebug( "IMPORT: is ok?\n" );
+
+        # Do the actual upload
+        if ( $archiveName ) {
+            $status = $file->uploadOld( $source, $archiveName,
+                $this->getTimestamp(), $this->getComment(), $user, File::DELETE_SOURCE );
+        } else {
+            $status = $file->upload( $source, $this->getComment(), $this->getComment(),
+                File::DELETE_SOURCE, false, $this->getTimestamp(), $user );
+        }
+
+        if ( $status->isGood() ) {
+            wfDebug( __METHOD__ . ": Succesful\n" );
             return true;
+        } else {
+            wfDebug( __METHOD__ . ': failed: ' . $status->getXml() . "\n" );
+            return false;
         }
-
-        wfDebug( "IMPORT: is bad? " . $status->getXml() . "\n" );
-        return false;
-
     }
 
     function downloadSource() {

Follow-up revisions

Revision | Commit summary | Author | Date
r87176 | Add --include-files option to dumpBackup.php to include the uploaded files in... | btongminh | 21:35, 30 April 2011

Past revisions this follows-up on

Revision | Commit summary | Author | Date
r85635 | First part of bug 22881: Allow uploading directly into the archive to support... | btongminh | 20:19, 7 April 2011

Comments

#Comment by Brion VIBBER   23:36, 7 June 2011

This output format has some potential problems with very large files; videos can easily be hundreds of megabytes, and multi-gigabyte files (e.g. feature-length high-resolution movies) are not unthinkable.

While it should be possible in principle to handle a really huge incoming data file in the stream, currently it'll be buffered up into memory, requiring *at least* ~2.5x the size of the original file to hold both the base64 string and the decoded binary string before the latter is written out to a file.

This doesn't block working on it as a non-default experimental feature, but it's worth looking out for... especially if it turns out that XMLReader will try to batch up an entire multi-hundred-megabyte string into one node's 'value' property or something, it may be hard to actually read the value in a streaming way.
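
To make the memory concern concrete: base64 text is about 4/3 the size of the binary it encodes (slightly more with the line breaks from chunk_split()), so holding it next to the fully decoded copy costs roughly 2.3 to 2.4 times the file size, in line with the ~2.5x estimate above. The sketch below shows one way the decoding step alone could run in bounded memory. It assumes the base64 text is available as a readable stream, which, as noted above, XMLReader may not provide. decodeBase64Stream() is a hypothetical helper, not part of MediaWiki.

```php
<?php
// Hypothetical helper, not part of MediaWiki: decodes base64 text from the
// stream $in to the stream $out in bounded chunks, so memory use does not
// grow with the size of the file.
function decodeBase64Stream( $in, $out, $chunkSize = 8192 ) {
    $carry = '';
    while ( !feof( $in ) ) {
        $chunk = fread( $in, $chunkSize );
        if ( $chunk === false ) {
            return false;
        }
        // Drop the line breaks added by chunk_split() and any indentation
        $data = $carry . preg_replace( '/\s+/', '', $chunk );
        // Decode only whole 4-character groups; keep the rest for the next round
        $usable = strlen( $data ) - ( strlen( $data ) % 4 );
        $carry = (string)substr( $data, $usable );
        if ( $usable > 0 ) {
            $decoded = base64_decode( substr( $data, 0, $usable ), true );
            if ( $decoded === false ) {
                return false;
            }
            fwrite( $out, $decoded );
        }
    }
    // A non-empty remainder means the input was truncated
    return $carry === '';
}

// Usage: decode from one in-memory stream to another, 100 bytes at a time
$in = fopen( 'php://memory', 'w+' );
fwrite( $in, chunk_split( base64_encode( str_repeat( 'abcdefg', 1000 ) ) ) );
rewind( $in );
$out = fopen( 'php://memory', 'w+' );
$ok = decodeBase64Stream( $in, $out, 100 );
```

Carrying over the incomplete 4-character group is what allows chunk boundaries to fall anywhere in the input, including in the middle of a line.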

#Comment by Bryan   08:00, 18 June 2011

It's possible to not include the files in the XML stream, and import them from a separate directory. That's in one of the follow-up revisions.
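
The directory-based import can be sketched along the lines of the lookup in this revision, which joins the base path set through setImageBasePath() with the <rel> value and checks that the file exists. The version below is a standalone approximation: resolveImportFile() is a hypothetical helper, and the realpath() containment check is an extra precaution that is not in the revision.

```php
<?php
// Hypothetical helper, not a MediaWiki function: maps a <rel> value onto a
// file under the base images directory, or returns false.
function resolveImportFile( $basePath, $rel ) {
    $base = realpath( $basePath );
    $path = realpath( $basePath . '/' . $rel );
    if ( $base === false || $path === false || !is_file( $path ) ) {
        // Base directory or target file is missing
        return false;
    }
    // Extra precaution, not in the revision: reject paths that escape $base
    $prefix = rtrim( $base, DIRECTORY_SEPARATOR ) . DIRECTORY_SEPARATOR;
    return strpos( $path, $prefix ) === 0 ? $path : false;
}
```

If no file is found this way, the revision falls back to any decoded <contents> data and then to downloading from <src>, so a false return here need not abort the import.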
