r52549 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r52548‎ | r52549 | r52550 >
Date:16:16, 29 June 2009
Author:daniel
Status:deferred
Tags:
Comment:
documenting
Modified paths:
  • /trunk/WikiWord/WikiWordIntegrator/src/docbook/Manual.xml (modified) (history)

Diff

Index: trunk/WikiWord/WikiWordIntegrator/src/docbook/Manual.xml
@@ -3,10 +3,13 @@
44 <article lang="en-US">
55 <title>WikiWord: Integrator</title>
66
 7+<sect1>
 8+ <title>Intro</title>
79 <para>WikiWord is a system for extracting a thesaurus from Wikipedia.
810 The Integrator module is designed to use this data as glue between
911 different data sets, that is, to map between different vocabularies,
10 - standardized or natural.</param>
 12+ standardized or natural.</para>
 13+</sect1>
1114
1215 <sect1>
1316 <title>Process</title>
@@ -57,9 +60,10 @@
5861 <para>Sometimes only <emphasis>exact, exclusive</emphasis> matches are wanted, that is, matches that exclude not only any foreign concept with more than one mapping to WikiWord, but also any WikiWord concept mapped to more than one foreign concept. This yields a strict 1:1 relationship and avoids any mismatches in scope or granularity. This is particularly useful when transferring definitions from one authority to another.</para>
5962 </sect3>
6063 </sect2>
 64+</sect1>
6165
6266 <sect1>
63 -<title>Architecture</title>
 67+ <title>Architecture</title>
6468
6569 <sect2>
6670 <title>Classes</title>
@@ -76,26 +80,53 @@
7781 <listitem>
7882 <para>The Processor fetches one entry after another from the DataCursor and passes it to the StoreBuilder. Note that any logic for filtering, grouping and converting entries is usually implemented in the DataCursor, not in the Processor.</para>
7983 </listitem>
80 -</orderedlist>
 84+<listitem>
8185 <para>Application</para>
 86+</listitem>
 87+<listitem>
8288 <para>DB Configuration, Tweaks, SourceDescriptor</para>
 89+</listitem>
 90+<listitem>
8391 <para>Store, StoreBuilder</para>
 92+</listitem>
 93+<listitem>
8494 <para>FeatureSet</para>
 95+</listitem>
 96+<listitem>
8597 <para>DataCursor</para>
 98+</listitem>
 99+<listitem>
86100 <para>FeatureSetSourceDescriptor</para>
 101+</listitem>
 102+<listitem>
87103 <para>Processor</para>
 104+</listitem>
 105+<listitem>
88106 <para>Associations</para>
 107+</listitem>
 108+<listitem>
89109 <para>MappingCandidates</para>
 110+</listitem>
 111+<listitem>
90112 <para>Filter, Selector</para>
 113+</listitem>
 114+<listitem>
91115 <para>Scorer</para>
 116+</listitem>
 117+<listitem>
92118 <para>Aggregator, Accessor</para>
 119+</listitem>
 123+</orderedlist>
93124 </sect2>
94125
95126
96127 <sect2>
97128 <title>Database</title>
 129+<para>...</para>
98130 </sect2>
99 -
100131 </sect1>
101132
102133 <sect1>
@@ -103,14 +134,17 @@
104135
105136 <sect2>
106137 <title>Configuration files</title>
 138+<para>...</para>
107139 </sect2>
108140
109141 <sect2>
110142 <title>Command Line</title>
 143+<para>...</para>
111144 </sect2>
112145
113146 <sect2>
114147 <title>BeanShell Commands</title>
 148+<para>...</para>
115149 </sect2>
116150
117151 <sect2>
@@ -118,27 +152,440 @@
119153
120154 <sect3>
121155 <title>Database</title>
 156+<para>...</para>
122157 </sect3>
123158
124159 <sect3>
125160 <title>Tweaks</title>
 161+<para>...</para>
126162 </sect3>
127163
128164 <sect3>
129165 <title>Source Descriptor</title>
 166+<para>...syntax...</para>
 167+<para>...syntax in beanshell...</para>
 168+ <note><para>In parameter names, "-" and "_" are interchangeable.</para></note>
 169+ <variablelist><title>Source Descriptor Parameters</title>
 170+
 171+ <varlistentry>
 172+ <term>
 173+ <parameter>association-annotation-field</parameter>
 174+ </term>
 175+ <listitem>
 176+ <para>The field/column that contains the annotation string. The annotation could be any additional info attached to a mapping. Used with <classname>BuildConceptMappings</classname>.</para>
 177+ </listitem>
 178+ </varlistentry>
 179+
 180+ <varlistentry>
 181+ <term>
 182+ <parameter>association-value-field</parameter>
 183+ </term>
 184+ <listitem>
 185+ <para>The field/column that contains the association value. That is the value that was used to derive the association. Used with <classname>BuildConceptAssociations</classname>.</para>
 186+ </listitem>
 187+ </varlistentry>
 188+
 189+ <varlistentry>
 190+ <term>
 191+ <parameter>association-weight-field</parameter>
 192+ </term>
 193+ <listitem>
 194+ <para>The field/column that contains the association weight. The weight may be used for filtering. Used with <classname>BuildConceptAssociations</classname> as well as <classname>BuildConceptMappings</classname> and <classname>FilterConceptMappings</classname>.</para>
 195+ </listitem>
 196+ </varlistentry>
 197+
 198+ <varlistentry>
 199+ <term>
 200+ <parameter>authority</parameter>
 201+ </term>
 202+ <listitem>
 203+ <para>The name of an external authority; the authority name serves as the namespace for foreign property names and foreign entity IDs. Instead of setting <parameter>authority</parameter> to a fixed value, it can also be taken from a data field/column specified by <parameter>foreign-authority-field</parameter>.</para>
 204+ </listitem>
 205+ </varlistentry>
 206+
 207+ <varlistentry>
 208+ <term>
 209+ <parameter>concept-fields</parameter>
 210+ </term>
 211+ <listitem>
 212+ <para>The list of field names that will be taken to belong to the concept (the mapping target, i.e. the object) when building an <classname>Association</classname> from a single <classname>FeatureSet</classname> instance. Used with <classname>BuildConceptAssociations</classname>.</para>
 213+ </listitem>
 214+ </varlistentry>
 215+
 216+ <varlistentry>
 217+ <term>
 218+ <parameter>concept-id-field</parameter>
 219+ </term>
 220+ <listitem>
 221+ <para>???</para>
 222+ </listitem>
 223+ </varlistentry>
 224+
 225+ <varlistentry>
 226+ <term>
 227+ <parameter>concept-name-field</parameter>
 228+ </term>
 229+ <listitem>
 230+ <para>???</para>
 231+ </listitem>
 232+ </varlistentry>
 233+
 234+ <varlistentry>
 235+ <term>
 236+ <parameter>concept-property-field</parameter>
 237+ </term>
 238+ <listitem>
 239+ <para>???</para>
 240+ </listitem>
 241+ </varlistentry>
 242+
 243+ <varlistentry>
 244+ <term>
 245+ <parameter>concept-property-freq-field</parameter>
 246+ </term>
 247+ <listitem>
 248+ <para>???</para>
 249+ </listitem>
 250+ </varlistentry>
 251+
 252+ <varlistentry>
 253+ <term>
 254+ <parameter>concept-property-source-field</parameter>
 255+ </term>
 256+ <listitem>
 257+ <para>???</para>
 258+ </listitem>
 259+ </varlistentry>
 260+
 261+ <varlistentry>
 262+ <term>
 263+ <parameter>console-encoding</parameter>
 264+ </term>
 265+ <listitem>
 266+ <para>???</para>
 267+ </listitem>
 268+ </varlistentry>
 269+
 270+ <varlistentry>
 271+ <term>
 272+ <parameter>csv-backslash-escape</parameter>
 273+ </term>
 274+ <listitem>
 275+ <para>???</para>
 276+ </listitem>
 277+ </varlistentry>
 278+
 279+ <varlistentry>
 280+ <term>
 281+ <parameter>csv-chunker</parameter>
 282+ </term>
 283+ <listitem>
 284+ <para>???</para>
 285+ </listitem>
 286+ </varlistentry>
 287+
 288+ <varlistentry>
 289+ <term>
 290+ <parameter>csv-separator</parameter>
 291+ </term>
 292+ <listitem>
 293+ <para>???</para>
 294+ </listitem>
 295+ </varlistentry>
 296+
 297+ <varlistentry>
 298+ <term>
 299+ <parameter>csv-skip-bad-rows</parameter>
 300+ </term>
 301+ <listitem>
 302+ <para>???</para>
 303+ </listitem>
 304+ </varlistentry>
 305+
 306+ <varlistentry>
 307+ <term>
 308+ <parameter>csv-skip-header</parameter>
 309+ </term>
 310+ <listitem>
 311+ <para>???</para>
 312+ </listitem>
 313+ </varlistentry>
 314+
 315+ <varlistentry>
 316+ <term>
 317+ <parameter>defaults</parameter>
 318+ </term>
 319+ <listitem>
 320+ <para>???</para>
 321+ </listitem>
 322+ </varlistentry>
 323+
 324+ <varlistentry>
 325+ <term>
 326+ <parameter>encoding</parameter>
 327+ </term>
 328+ <listitem>
 329+ <para>???</para>
 330+ </listitem>
 331+ </varlistentry>
 332+
 333+ <varlistentry>
 334+ <term>
 335+ <parameter>field-chunkers</parameter>
 336+ </term>
 337+ <listitem>
 338+ <para>???</para>
 339+ </listitem>
 340+ </varlistentry>
 341+
 342+ <varlistentry>
 343+ <term>
 344+ <parameter>fields</parameter>
 345+ </term>
 346+ <listitem>
 347+ <para>???</para>
 348+ </listitem>
 349+ </varlistentry>
 350+
 351+ <varlistentry>
 352+ <term>
 353+ <parameter>file</parameter>
 354+ </term>
 355+ <listitem>
 356+ <para>???</para>
 357+ </listitem>
 358+ </varlistentry>
 359+
 360+ <varlistentry>
 361+ <term>
 362+ <parameter>file-format</parameter>
 363+ </term>
 364+ <listitem>
 365+ <para>???</para>
 366+ </listitem>
 367+ </varlistentry>
 368+
 369+ <varlistentry>
 370+ <term>
 371+ <parameter>foreign-authority-field</parameter>
 372+ </term>
 373+ <listitem>
 374+ <para>???</para>
 375+ </listitem>
 376+ </varlistentry>
 377+
 378+ <varlistentry>
 379+ <term>
 380+ <parameter>foreign-fields</parameter>
 381+ </term>
 382+ <listitem>
 383+ <para>The list of field names that will be taken to belong to the foreign entity (the mapping source, i.e. the subject) when building an <classname>Association</classname> from a single <classname>FeatureSet</classname> instance. Used with <classname>BuildConceptAssociations</classname>.</para>
 384+ </listitem>
 385+ </varlistentry>
 386+
 387+ <varlistentry>
 388+ <term>
 389+ <parameter>foreign-id-field</parameter>
 390+ </term>
 391+ <listitem>
 392+ <para>Field containing an ID unique in the context of the foreign authority, used to identify foreign entities.</para>
 393+ </listitem>
 394+ </varlistentry>
 395+
 396+ <varlistentry>
 397+ <term>
 398+ <parameter>foreign-name-field</parameter>
 399+ </term>
 400+ <listitem>
 401+ <para>Field containing a display name for foreign entities. Defaults to the value of <parameter>foreign-id-field</parameter>.</para>
 402+ </listitem>
 403+ </varlistentry>
 404+
 405+ <varlistentry>
 406+ <term>
 407+ <parameter>foreign-property-field</parameter>
 408+ </term>
 409+ <listitem>
 410+ <para>Field containing the name of the property of a foreign entity. Used with <classname>BuildConceptAssociations</classname>.</para>
 411+ </listitem>
 412+ </varlistentry>
 413+
 414+ <varlistentry>
 415+ <term>
 416+ <parameter>mapping-filter</parameter>
 417+ </term>
 418+ <listitem>
 419+ <para>Instance of <classname>MappingCandidateFilter</classname> to use for filtering. Used with <classname>FilterConceptMappings</classname>.
 420+ Overrides <property>mapping-filter-aggregator</property>, <property>mapping-filter-field</property>, <property>mapping-filter-scorer</property>, <property>mapping-filter-selector</property>, etc.
 421+ </para>
 422+ </listitem>
 423+ </varlistentry>
 424+
 425+ <varlistentry>
 426+ <term>
 427+ <parameter>mapping-filter-aggregator</parameter>
 428+ </term>
 429+ <listitem>
 430+ <para>Instance of <classname>Functor2</classname> to use as an aggregator to aggregate multiple score values into one. Overridden by <property>mapping-filter</property>, <property>mapping-selector</property>, <parameter>mapping-filter-scorer</parameter> and <parameter>mapping-filter-accessor</parameter>.</para>
 431+ </listitem>
 432+ </varlistentry>
 433+
 434+ <varlistentry>
 435+ <term>
 436+ <parameter>mapping-filter-aggregator-function</parameter>
 437+ </term>
 438+ <listitem>
 439+ <para>Name of the aggregator function used to merge multiple score values into one; must be either "max" or "sum", default is "sum". Used with <classname>FilterConceptMappings</classname>. Overridden by <property>mapping-filter</property>, <property>mapping-selector</property>, <parameter>mapping-filter-scorer</parameter>, <parameter>mapping-filter-accessor</parameter> and <parameter>mapping-filter-aggregator</parameter>.</para>
 440+ </listitem>
 441+ </varlistentry>
 442+
 443+ <varlistentry>
 444+ <term>
 445+ <parameter>mapping-filter-field</parameter>
 446+ </term>
 447+ <listitem>
 448+ <para>The field from which to take the score for a candidate <classname>FeatureSet</classname>. Used with <classname>FilterConceptMappings</classname>. Overridden by <property>mapping-filter</property>, <property>mapping-selector</property>, <parameter>mapping-filter-scorer</parameter> and <parameter>mapping-filter-accessor</parameter>.</para>
 449+ </listitem>
 450+ </varlistentry>
 451+
 452+ <varlistentry>
 453+ <term>
 454+ <parameter>mapping-filter-field-accessor</parameter>
 455+ </term>
 456+ <listitem>
 457+ <para>Instance of <classname>PropertyAccessor</classname> to extract the score from a candidate <classname>FeatureSet</classname>. Used with <classname>FilterConceptMappings</classname>. Overridden by <property>mapping-filter</property>, <property>mapping-selector</property> and <parameter>mapping-filter-scorer</parameter>.</para>
 458+ </listitem>
 459+ </varlistentry>
 460+
 461+ <varlistentry>
 462+ <term>
 463+ <parameter>mapping-filter-scorer</parameter>
 464+ </term>
 465+ <listitem>
 466+ <para>Instance of <classname>MappingCandidateScorer</classname> to use when filtering (for best candidate, or by threshold). Used with <classname>FilterConceptMappings</classname>. Overridden by <property>mapping-filter</property> and <property>mapping-selector</property>,
 467+ overrides <property>mapping-filter-aggregator</property>, <property>mapping-filter-field</property> etc.</para>
 468+ </listitem>
 469+ </varlistentry>
 470+
 471+ <varlistentry>
 472+ <term>
 473+ <parameter>mapping-filter-threshold</parameter>
 474+ </term>
 475+ <listitem>
 476+ <para>Threshold value to use when filtering mapping candidates. If set, all candidates with a score equal to or better than the threshold will pass.
 477+ Otherwise, the one candidate with the best score will pass. Used with <classname>FilterConceptMappings</classname>.
 478+ Overridden by <property>mapping-filter</property> and <property>mapping-selector</property>.</para>
 479+ </listitem>
 480+ </varlistentry>
 481+
 482+ <varlistentry>
 483+ <term>
 484+ <parameter>mapping-selector</parameter>
 485+ </term>
 486+ <listitem>
 487+ <para>Instance of <classname>MappingCandidateSelector</classname> to use for filtering. Used with <classname>FilterConceptMappings</classname>.
 488+ Overridden by <property>mapping-filter</property>, overrides <property>mapping-filter-aggregator</property>, <property>mapping-filter-field</property>, <property>mapping-filter-scorer</property> etc.
 489+ </para>
 490+ </listitem>
 491+ </varlistentry>
 492+
 493+ <varlistentry>
 494+ <term>
 495+ <parameter>property-fields</parameter>
 496+ </term>
 497+ <listitem>
 498+ <para>The list of field names that will be taken to belong to the association itself (mapping properties) when building an <classname>Association</classname> from a single <classname>FeatureSet</classname> instance. Used with <classname>BuildConceptAssociations</classname>.</para>
 499+ </listitem>
 500+ </varlistentry>
 501+
 502+ <varlistentry>
 503+ <term>
 504+ <parameter>property-name-field</parameter>
 505+ </term>
 506+ <listitem>
 507+ <para>???</para>
 508+ </listitem>
 509+ </varlistentry>
 510+
 511+ <varlistentry>
 512+ <term>
 513+ <parameter>property-qualifier</parameter>
 514+ </term>
 515+ <listitem>
 516+ <para>???</para>
 517+ </listitem>
 518+ </varlistentry>
 519+
 520+ <varlistentry>
 521+ <term>
 522+ <parameter>property-subject-name-field</parameter>
 523+ </term>
 524+ <listitem>
 525+ <para>???</para>
 526+ </listitem>
 527+ </varlistentry>
 528+
 529+ <varlistentry>
 530+ <term>
 531+ <parameter>query</parameter>
 532+ </term>
 533+ <listitem>
 534+ <para>???</para>
 535+ </listitem>
 536+ </varlistentry>
 537+
 538+ <varlistentry>
 539+ <term>
 540+ <parameter>queryGenerator</parameter>
 541+ </term>
 542+ <listitem>
 543+ <para>???</para>
 544+ </listitem>
 545+ </varlistentry>
 546+
 547+ <varlistentry>
 548+ <term>
 549+ <parameter>source-table</parameter>
 550+ </term>
 551+ <listitem>
 552+ <para>???</para>
 553+ </listitem>
 554+ </varlistentry>
 555+
 556+ <varlistentry>
 557+ <term>
 558+ <parameter>sql-comment-subst</parameter>
 559+ </term>
 560+ <listitem>
 561+ <para>???</para>
 562+ </listitem>
 563+ </varlistentry>
 564+
 565+ <varlistentry>
 566+ <term>
 567+ <parameter>sql-manglers</parameter>
 568+ </term>
 569+ <listitem>
 570+ <para>???</para>
 571+ </listitem>
 572+ </varlistentry>
 573+ </variablelist>
 574+
130575 </sect3>
131576
132577 <sect3>
133578 <title>Source Descriptor Defaults</title>
 579+<para>...</para>
134580 </sect3>
135581
136582 <sect3>
137583 <title>Built-In Scripts</title>
 584+<para>...</para>
138585 </sect3>
139586
140587 </sect2>
141588
142589
143 -</sect1>
 590+</sect1>
144591
145592 </article>
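
Note on the "exact, exclusive matches" paragraph in the Process section of the manual (new line 61 of the diff): the strict 1:1 rule might be easier to follow with a small illustration. The sketch below is not taken from the WikiWord code base; the function name `exclusive_matches` and the representation of mappings as (foreign, concept) pairs are assumptions made only for this example, and Python is used purely for brevity.

```python
from collections import Counter


def exclusive_matches(mappings):
    """Keep only pairs that form a strict 1:1 relationship.

    `mappings` is a list of distinct (foreign_entity, concept) pairs.
    This representation is hypothetical and is not WikiWord's data model.
    """
    # Count how often each side takes part in any mapping.
    foreign_counts = Counter(foreign for foreign, _ in mappings)
    concept_counts = Counter(concept for _, concept in mappings)
    # A pair survives only if both of its ends are used exactly once.
    return [
        (foreign, concept)
        for foreign, concept in mappings
        if foreign_counts[foreign] == 1 and concept_counts[concept] == 1
    ]


candidates = [
    ("f1", "c1"),
    ("f2", "c2"),
    ("f2", "c3"),  # f2 maps to two concepts, so both f2 pairs are dropped
    ("f3", "c4"),
    ("f4", "c4"),  # c4 is mapped from two foreign entities, so both are dropped
]
print(exclusive_matches(candidates))
```

Only the pair for `f1` and `c1` remains here, because every other pair shares either its foreign entity or its concept with another mapping.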

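
Note on `mapping-filter-threshold` and `mapping-filter-aggregator-function` in the parameter list above: the documented behaviour (aggregate several score values with "sum" or "max", then pass either every candidate at or above the threshold or only the single best one) could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `FilterConceptMappings` logic: the function `filter_candidates`, the dict-of-score-lists input and the reading of "better" as "numerically higher" are all assumptions.

```python
# Aggregator names follow the documented values "sum" and "max".
AGGREGATORS = {"sum": sum, "max": max}


def filter_candidates(candidates, threshold=None, aggregator="sum"):
    """Filter mapping candidates by aggregated score.

    `candidates` maps a candidate name to a non-empty list of score values.
    This input shape is hypothetical; higher scores are assumed to be better.
    """
    aggregate = AGGREGATORS[aggregator]
    scores = {name: aggregate(values) for name, values in candidates.items()}
    if not scores:
        return []
    if threshold is not None:
        # Threshold set: every candidate scoring at least the threshold passes.
        return [name for name, score in scores.items() if score >= threshold]
    # No threshold: only the single best-scoring candidate passes.
    return [max(scores, key=scores.get)]


candidates = {
    "c1": [0.5, 0.5],
    "c2": [0.5],
    "c3": [0.25, 0.25, 0.25],
}
print(filter_candidates(candidates, threshold=0.5))
print(filter_candidates(candidates, threshold=0.5, aggregator="max"))
print(filter_candidates(candidates))
```

With "sum", `c3` reaches the threshold through three small scores; with "max" it does not, which is the practical difference between the two aggregator functions.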