r95260 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r95259‎ | r95260 | r95261 >
Date:22:01, 22 August 2011
Author:ariel
Status:resolved (Comments)
Tags:
Comment:
add functions that support close and rename of output files as they are being written, used to write out checkpoint files at regular intervals during XML dump production
Modified paths:
  • /trunk/phase3/includes/Export.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/Export.php
@@ -354,6 +354,9 @@
355355 * @ingroup Dump
356356 */
357357 class XmlDumpWriter {
 358+ var $firstPageWritten = 0;
 359+ var $lastPageWritten = 0;
 360+ var $pageInProgress = 0;
358361
359362 /**
360363 * Returns the export schema version.
@@ -458,6 +461,7 @@
459462 $title = Title::makeTitle( $row->page_namespace, $row->page_title );
460463 $out .= ' ' . Xml::elementClean( 'title', array(), $title->getPrefixedText() ) . "\n";
461464 $out .= ' ' . Xml::element( 'id', array(), strval( $row->page_id ) ) . "\n";
 465+ $this->pageInProgress = $row->page_id;
462466 if ( $row->page_is_redirect ) {
463467 $out .= ' ' . Xml::element( 'redirect', array() ) . "\n";
464468 }
@@ -478,6 +482,10 @@
479483 */
480484 function closePage() {
481485 return " </page>\n";
 486+ if (! $this->firstPageWritten) {
 487+ $this->firstPageWritten = $this->pageInProgress;
 488+ }
 489+ $this->lastPageWritten = $this->pageInProgress;
482490 }
483491
484492 /**
@@ -691,6 +699,18 @@
692700 function write( $string ) {
693701 print $string;
694702 }
 703+
 704+ function closeRenameAndReopen( $newname ) {
 705+ return;
 706+ }
 707+
 708+ function rename( $newname ) {
 709+ return;
 710+ }
 711+
 712+ function getFilename() {
 713+ return NULL;
 714+ }
695715 }
696716
697717 /**
@@ -699,14 +719,56 @@
700720 */
701721 class DumpFileOutput extends DumpOutput {
702722 var $handle;
 723+ var $filename;
703724
704725 function __construct( $file ) {
705726 $this->handle = fopen( $file, "wt" );
 727+ $this->filename = $file;
706728 }
707729
708730 function write( $string ) {
709731 fputs( $this->handle, $string );
710732 }
 733+
 734+ /**
 735+ * Close the old file, move it to a specified name,
 736+ * and reopen new file with the old name. Use this
 737+ * for writing out a file in multiple pieces
 738+ * at specified checkpoints (e.g. every n hours).
 739+ */
 740+ function closeRenameAndReopen( $newname ) {
 741+ if ( is_array($newname) ) {
 742+ if (count($newname) > 1) {
 743+ WfDie("Export closeRenameAndReopen: passed multiple argumnts for rename of single file\n");
 744+ }
 745+ else {
 746+ $newname = $newname[0];
 747+ }
 748+ }
 749+ if ( $newname ) {
 750+ fclose( $this->handle );
 751+ rename( $this->filename, $newname );
 752+ $this->handle = fopen( $this->filename, "wt" );
 753+ }
 754+ }
 755+
 756+ function rename( $newname ) {
 757+ if ( is_array($newname) ) {
 758+ if (count($newname) > 1) {
 759+ WfDie("Export closeRenameAndReopen: passed multiple argumnts for rename of single file\n");
 760+ }
 761+ else {
 762+ $newname = $newname[0];
 763+ }
 764+ }
 765+ if ( $newname ) {
 766+ rename( $this->filename, $newname );
 767+ }
 768+ }
 769+
 770+ function getFilename() {
 771+ return $this->filename;
 772+ }
711773 }
712774
713775 /**
@@ -716,12 +778,52 @@
717779 * @ingroup Dump
718780 */
719781 class DumpPipeOutput extends DumpFileOutput {
 782+ var $command;
 783+
720784 function __construct( $command, $file = null ) {
721785 if ( !is_null( $file ) ) {
722786 $command .= " > " . wfEscapeShellArg( $file );
723787 }
724788 $this->handle = popen( $command, "w" );
 789+ $this->command = $command;
 790+ $this->filename = $file;
725791 }
 792+
 793+ /**
 794+ * Close the old file, move it to a specified name,
 795+ * and reopen new file with the old name.
 796+ */
 797+ function closeRenameAndReopen( $newname ) {
 798+ if ( is_array($newname) ) {
 799+ if (count($newname) > 1) {
 800+ WfDie("Export closeRenameAndReopen: passed multiple argumnts for rename of single file\n");
 801+ }
 802+ else {
 803+ $newname = $newname[0];
 804+ }
 805+ }
 806+ if ( $newname ) {
 807+ pclose( $this->handle );
 808+ rename( $this->filename, $newname );
 809+ $command = $this->command;
 810+ $command .= " > " . wfEscapeShellArg( $this->filename );
 811+ $this->handle = popen( $command, "w" );
 812+ }
 813+ }
 814+
 815+ function rename( $newname ) {
 816+ if ( is_array($newname) ) {
 817+ if (count($newname) > 1) {
 818+ WfDie("Export closeRenameAndReopen: passed multiple argumnts for rename of single file\n");
 819+ }
 820+ else {
 821+ $newname = $newname[0];
 822+ }
 823+ }
 824+ if ( $newname ) {
 825+ rename( $this->filename, $newname );
 826+ }
 827+ }
726828 }
727829
728830 /**
@@ -749,13 +851,48 @@
750852 * @ingroup Dump
751853 */
752854 class Dump7ZipOutput extends DumpPipeOutput {
 855+ var $filename;
 856+
753857 function __construct( $file ) {
754858 $command = "7za a -bd -si " . wfEscapeShellArg( $file );
755859 // Suppress annoying useless crap from p7zip
756860 // Unfortunately this could suppress real error messages too
757861 $command .= ' >' . wfGetNull() . ' 2>&1';
758862 parent::__construct( $command );
 863+ $this->filename = $file;
759864 }
 865+
 866+ function closeRenameAndReopen( $newname ) {
 867+ if ( is_array($newname) ) {
 868+ if (count($newname) > 1) {
 869+ WfDie("Export closeRenameAndReopen: passed multiple argumnts for rename of single file\n");
 870+ }
 871+ else {
 872+ $newname = $newname[0];
 873+ }
 874+ }
 875+ if ( $newname ) {
 876+ pclose( $this->handle );
 877+ rename( $this->filename, $newname );
 878+ $command = "7za a -bd -si " . wfEscapeShellArg( $file );
 879+ $command .= ' >' . wfGetNull() . ' 2>&1';
 880+ $this->handle = popen( $command, "w" );
 881+ }
 882+ }
 883+
 884+ function rename( $newname ) {
 885+ if ( is_array($newname) ) {
 886+ if (count($newname) > 1) {
 887+ WfDie("Export closeRenameAndReopen: passed multiple argumnts for rename of single file\n");
 888+ }
 889+ else {
 890+ $newname = $newname[0];
 891+ }
 892+ }
 893+ if ( $newname ) {
 894+ rename( $this->filename, $newname );
 895+ }
 896+ }
760897 }
761898
762899
@@ -803,6 +940,18 @@
804941 $this->sink->writeRevision( $rev, $string );
805942 }
806943
 944+ function closeRenameAndReopen( $newname ) {
 945+ $this->sink->closeRenameAndReopen( $newname );
 946+ }
 947+
 948+ function rename( $newname ) {
 949+ $this->sink->rename( $newname );
 950+ }
 951+
 952+ function getFilename() {
 953+ return $this->sink->getFilename();
 954+ }
 955+
807956 /**
808957 * Override for page-based filter types.
809958 * @return bool
@@ -950,6 +1099,27 @@
9511100 $this->sinks[$i]->writeRevision( $rev, $string );
9521101 }
9531102 }
 1103+
 1104+ function closeRenameAndReopen( $newnames ) {
 1105+ for( $i = 0; $i < $this->count; $i++ ) {
 1106+ $this->sinks[$i]->closeRenameAndReopen( $newnames[$i] );
 1107+ }
 1108+ }
 1109+
 1110+ function rename( $newnames ) {
 1111+ for( $i = 0; $i < $this->count; $i++ ) {
 1112+ $this->sinks[$i]->rename( $newnames[$i] );
 1113+ }
 1114+ }
 1115+
 1116+ function getFilename() {
 1117+ $filenames = array();
 1118+ for( $i = 0; $i < $this->count; $i++ ) {
 1119+ $filenames[] = $this->sinks[$i]->getFilename();
 1120+ }
 1121+ return $filenames;
 1122+ }
 1123+
9541124 }
9551125
9561126 function xmlsafe( $string ) {

Follow-up revisions

RevisionCommit summaryAuthorDate
r95601replace WfDie()ariel15:50, 27 August 2011
r96275Fixes for r95260:...catrope11:16, 5 September 2011
r964651.17wmf1: MFT r94135, r95260, r95311, r95601, r95790, r96434catrope18:33, 7 September 2011
r96486get rid of duplication, remove unused function rename(), add documentation as...ariel20:21, 7 September 2011
r96556MFT r95260, r95272, r95288, r95290, r95443, r95601, r95604, r95634, r95720, r...reedy12:28, 8 September 2011
r97744Comment out unreachable code...reedy16:34, 21 September 2011
r97800remove cruft that didn't all get cleaned up before commit in r95260, addressi...ariel07:03, 22 September 2011

Comments

#Comment by Platonides (talk | contribs)   15:29, 27 August 2011

wfDie() was removed in r91602

#Comment by 😂 (talk | contribs)   15:34, 27 August 2011

Not sure how I missed that. Should be replaced with throwing an exception.

#Comment by ArielGlenn (talk | contribs)   15:59, 27 August 2011

fixme addressed in r95601

#Comment by Catrope (talk | contribs)   11:18, 5 September 2011

Fixed some things that irked me in r96275.

+	function closeRenameAndReopen( $newname ) {
+	function rename( $newname ) {
+	function getFilename() {

These functions are undocumented. closeRenameAndReopen() was documented in a subclass, so I moved that doc comment up to the parent class, but the other two aren't documented anywhere.

+			$command = "7za a -bd -si " . wfEscapeShellArg( $file );
+			$command .= ' >' . wfGetNull() . ' 2>&1';

The command building, both here and in DumpPipeOutput, is duplicated and should be factored out.

Good enough for deployment, so marking OK, but please do document the new code and fix the code duplication you introduced.

#Comment by Reedy (talk | contribs)   21:57, 21 September 2011

Per the fixme on r97744

 	function closePage() {
 		return "  </page>\n";
+		if (! $this->firstPageWritten) {
+			$this->firstPageWritten = $this->pageInProgress;
+		}
+		$this->lastPageWritten = $this->pageInProgress;
 	}


Did you mean to add the code AFTER the return, so hence, it will never be reached?

#Comment by Brion VIBBER (talk | contribs)   21:57, 21 September 2011

r97744 comments out some unreachable code in XmlDumpWriter::closePage -- something is amiss.

#Comment by ArielGlenn (talk | contribs)   07:05, 22 September 2011

Ugh, I meant that to get removed, not committed. Fixed in r97800.

#Comment by Brion VIBBER (talk | contribs)   22:40, 22 September 2011

yay!

Status & tagging log