r94514 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r94513‎ | r94514 | r94515 >
Date:13:56, 15 August 2011
Author:reedy
Status:deferred (Comments)
Tags:
Comment:
Create MetricsMaintenance.php base class to house DB creation

Add basic "generic" data table

Swap FetchGoogleSpreadsheet to subclass MetricsMaintenance

Added ImportCSVFile maintenance script to pull from a generic CSV file and push it into the metrics database
Modified paths:
  • /trunk/extensions/MetricsReporting/GenericDataTable.sql (added) (history)
  • /trunk/extensions/MetricsReporting/MetricsMaintenance.php (added) (history)
  • /trunk/extensions/MetricsReporting/fetchGoogleSpreadsheet.php (modified) (history)
  • /trunk/extensions/MetricsReporting/importCSVFile.php (added) (history)

Diff [purge]

Index: trunk/extensions/MetricsReporting/MetricsMaintenance.php
@@ -0,0 +1,31 @@
 2+<?php
 3+
 4+$IP = getenv( 'MW_INSTALL_PATH' );
 5+if ( $IP === false ) {
 6+ $IP = dirname( __FILE__ ) . '/../..';
 7+}
 8+require( "$IP/maintenance/Maintenance.php" );
 9+
 10+abstract class MetricsMaintenance extends Maintenance {
 11+ private $db;
 12+
 13+ /**
 14+ * @return DatabaseBase
 15+ */
 16+ protected function getDB() {
 17+ if ( is_null( $this->db ) ) {
 18+ global $wgMetricsDBserver, $wgMetricsDBname, $wgMetricsDBuser, $wgMetricsDBpassword, $wgMetricsDBtype;
 19+ $this->db = DatabaseBase::factory( $wgMetricsDBtype,
 20+ array(
 21+ 'host' => $wgMetricsDBserver,
 22+ 'user' => $wgMetricsDBuser,
 23+ 'password' => $wgMetricsDBpassword,
 24+ 'dbname' => $wgMetricsDBname,
 25+ 'tablePrefix' => '',
 26+ )
 27+ );
 28+ //$this->db->query( "SET names utf8" );
 29+ }
 30+ return $this->db;
 31+ }
 32+}
Property changes on: trunk/extensions/MetricsReporting/MetricsMaintenance.php
___________________________________________________________________
Added: svn:eol-style
133 + native
Index: trunk/extensions/MetricsReporting/fetchGoogleSpreadsheet.php
@@ -1,12 +1,8 @@
22 <?php
33
4 -$IP = getenv( 'MW_INSTALL_PATH' );
5 -if ( $IP === false ) {
6 - $IP = dirname( __FILE__ ) . '/../..';
7 -}
8 -require( "$IP/maintenance/Maintenance.php" );
 4+require( "MetricsMaintenance.php" );
95
10 -class FetchGoogleSpreadsheet extends Maintenance {
 6+class FetchGoogleSpreadsheet extends MetricsMaintenance {
117 public function __construct() {
128 parent::__construct();
139 $this->mDescription = "Grabs and does stuff with a Google Documents Spreadsheet";
Index: trunk/extensions/MetricsReporting/GenericDataTable.sql
@@ -0,0 +1,9 @@
 2+CREATE TABLE `GenericDataTable` (
 3+ `date` date NOT NULL,
 4+ `language_code` char (15) DEFAULT NULL,
 5+ `project_code` varchar (10) DEFAULT NULL,
 6+ `country_code` varchar (3) DEFAULT NULL,
 7+ `value` bigint (12),-- Needs to be decimal?
 8+ PRIMARY KEY (date,language_code,project_code,country_code,value),
 9+ -- More indexes may be needed where language, project and country could be null
 10+) ;
\ No newline at end of file
Property changes on: trunk/extensions/MetricsReporting/GenericDataTable.sql
___________________________________________________________________
Added: svn:eol-style
111 + native
Index: trunk/extensions/MetricsReporting/importCSVFile.php
@@ -0,0 +1,58 @@
 2+<?php
 3+
 4+require( "MetricsMaintenance.php" );
 5+
 6+class ImportCSVFile extends MetricsMaintenance {
 7+
 8+ public function __construct() {
 9+ parent::__construct();
 10+ $this->mDescription = "Imports a generic MetricsReporting CSV file into its table";
 11+ $this->addArg( 'table', 'Table to import file to' );
 12+ $this->addArg( 'file', 'File to import' );
 13+ }
 14+
 15+ public function execute() {
 16+ $table = $this->getArg( 0 );
 17+ $fileName = $this->getArg( 1 );
 18+
 19+ if ( !file_exists( $fileName ) ) {
 20+ $this->error( 'Input file does not exist', true );
 21+ }
 22+
 23+ $db = $this->getDB();
 24+ if ( !$db->tableExists( $table ) ) {
 25+ $this->error( "Target table doesn't exist. Please create it" );
 26+ }
 27+
 28+ $file = fopen( $fileName, 'r' );
 29+ if ( $file === false ) {
 30+ $this->error( 'Unable to open input file', true );
 31+ }
 32+
 33+ $db->begin( __METHOD__ );
 34+
 35+ feof( $file ); // Strip 1st/header line. Might need to be disableable
 36+
 37+ while( !feof( $file ) ) {
 38+ list( $date, $lang, $project, $country, $value ) = explode( ',', fgets( $file ) );
 39+ $db->insert(
 40+ $table,
 41+ array(
 42+ 'date' => $date,
 43+ 'language_code' => $lang,
 44+ 'project_code' => $project,
 45+ 'country_code' => $country,
 46+ 'value' => $value,
 47+ ),
 48+ __METHOD__,
 49+ array( 'IGNORE' ) // If we're importing duplicates, just skip
 50+ );
 51+ }
 52+ fclose( $file );
 53+ $db->commit( __METHOD__ );
 54+ $this->output( 'Complete!' );
 55+ }
 56+}
 57+
 58+$maintClass = "ImportCSVFile";
 59+require_once( DO_MAINTENANCE );
\ No newline at end of file
Property changes on: trunk/extensions/MetricsReporting/importCSVFile.php
___________________________________________________________________
Added: svn:eol-style
160 + native

Comments

#Comment by MZMcBride (talk | contribs)   18:08, 18 August 2011
CREATE TABLE `GenericDataTable` (
  `date`           date NOT NULL,
  `language_code`  char (15) DEFAULT NULL,
  `project_code`   varchar (10) DEFAULT NULL,
  `country_code`   varchar (3) DEFAULT NULL,
  `value`          bigint (12),-- Needs to be decimal?
  PRIMARY KEY (date,language_code,project_code,country_code,value),
  -- More indexes may be needed where language, project and country could be null
) ;

I see two potential issues here:

  1. Is this valid syntax? I'm don't think you can use a trailing comma like that after the primary key.
  2. Does this syntax meet current coding conventions? I thought indexes were generally defined outside of create table (cf. rev:94791#c21055). Maybe that's only for "secondary indexes" (I'm not really sure what that means in context)?
#Comment by Reedy (talk | contribs)   18:14, 18 August 2011

The trailing comma is indeed wrong

Primary key is fine.

Secondary indexes are indexes that aren't the primary index

#Comment by MZMcBride (talk | contribs)   19:19, 18 August 2011

There should be a common prefix for the database columns, right?

Plus this will need table prefix magic ($wgDBprefix) like every other extension, I assume.

I also don't see much value in the backticks, but those seem to be a style preference more than anything.

#Comment by Reedy (talk | contribs)   19:23, 18 August 2011

This doesn't need the table prefix magic, as it's not designed for use than other for Wikimedias Stats needs. And it's designed with the intention of using a 3rd party database instance (ie not the one MediaWiki is currently running from)

The backticks there are as done by example from [1]

Common prefixes for columns is personal preference, if the other tables (see link above) were using them, sure, but it seems somewhat daft to change the convention for these. We could change the original also, if we really cared that much

#Comment by MZMcBride (talk | contribs)   19:44, 18 August 2011

If the tables are going to be replicated or dumped (I have no idea whether this info will be public), it makes life vastly easier to use a common prefix. It's also simply good form.

#Comment by Reedy (talk | contribs)   19:46, 18 August 2011

Some of the stats core data are restricted, and would be only publishable in an aggregated form or similar

But then, that's why we're creating the API, so it can be reused ;)

Status & tagging log