Index: trunk/WikiWord/WikiWordIntegrator/src/docbook/Manual.xml |
— | — | @@ -3,10 +3,13 @@ |
4 | 4 | <article lang="en-US"> |
5 | 5 | <title>WikiWord: Integrator</title> |
6 | 6 | |
| 7 | +<sect1> |
| 8 | + <title>Intro</title> |
7 | 9 | <para>WikiWord is a system for extracting a thesaurus from Wikipedia. |
8 | 10 | The Integrator module is designed to use this data as glue between |
9 | 11 | different data sets, that is, to map between different vocabularies, |
10 | | - standardized or natural.</param> |
| 12 | + standardized or natural.</para> |
| 13 | +</sect1> |
11 | 14 | |
12 | 15 | <sect1> |
13 | 16 | <title>Process</title> |
— | — | @@ -57,9 +60,10 @@ |
58 | 61 | <para>Sometimes only <emphasis>exact, exclusive</emphasis> matches are desired: that is, matches that exclude not only any foreign concept for which there exists more than one mapping to WikiWord, but also all WikiWord concepts mapped to more than one foreign concept. This yields a strict 1:1 relationship and avoids any mismatches in scope or granularity. This is particularly useful when transferring definitions from one authority to another.</para> |
59 | 62 | </sect3> |
60 | 63 | </sect2> |
| 64 | +</sect1> |
61 | 65 | |
62 | 66 | <sect1> |
63 | | -<title>Architecture</title> |
| 67 | + <title>Architecture</title> |
64 | 68 | |
65 | 69 | <sect2> |
66 | 70 | <title>Classes</title> |
— | — | @@ -76,26 +80,53 @@ |
77 | 81 | <listitem> |
78 | 82 | <para>The Processor fetches one entry after another from the DataCursor and passes it to the StoreBuilder. Note that any logic for filtering, grouping and converting entries is usually implemented in the DataCursor, not in the Processor.</para> |
79 | 83 | </listitem> |
80 | | -</orderedlist> |
| 84 | +<listitem> |
81 | 85 | <para>Application</para> |
| 86 | +</listitem> |
| 87 | +<listitem> |
82 | 88 | <para>DB Configuration, Tweaks, SourceDescriptor</para> |
| 89 | +</listitem> |
| 90 | +<listitem> |
83 | 91 | <para>Store, StoreBuilder</para> |
| 92 | +</listitem> |
| 93 | +<listitem> |
84 | 94 | <para>FeatureSet</para> |
| 95 | +</listitem> |
| 96 | +<listitem> |
85 | 97 | <para>DataCursor</para> |
| 98 | +</listitem> |
| 99 | +<listitem> |
86 | 100 | <para>FeatureSetSourceDescriptor</para> |
| 101 | +</listitem> |
| 102 | +<listitem> |
87 | 103 | <para>Processor</para> |
| 104 | +</listitem> |
| 105 | +<listitem> |
88 | 106 | <para>Associations</para> |
| 107 | +</listitem> |
| 108 | +<listitem> |
89 | 109 | <para>MappingCandidates</para> |
| 110 | +</listitem> |
| 111 | +<listitem> |
90 | 112 | <para>Filter, Selector</para> |
| 113 | +</listitem> |
| 114 | +<listitem> |
91 | 115 | <para>Scorer</para> |
| 116 | +</listitem> |
| 117 | +<listitem> |
92 | 118 | <para>Aggregator, Accessor</para> |
| 119 | +</listitem> |
| 123 | +</orderedlist> |
93 | 124 | </sect2> |
94 | 125 | |
95 | 126 | |
96 | 127 | <sect2> |
97 | 128 | <title>Database</title> |
| 129 | +<para>...</para> |
98 | 130 | </sect2> |
99 | | - |
100 | 131 | </sect1> |
101 | 132 | |
102 | 133 | <sect1> |
— | — | @@ -103,14 +134,17 @@ |
104 | 135 | |
105 | 136 | <sect2> |
106 | 137 | <title>Configuration files</title> |
| 138 | +<para>...</para> |
107 | 139 | </sect2> |
108 | 140 | |
109 | 141 | <sect2> |
110 | 142 | <title>Command Line</title> |
| 143 | +<para>...</para> |
111 | 144 | </sect2> |
112 | 145 | |
113 | 146 | <sect2> |
114 | 147 | <title>BeanShell Commands</title> |
| 148 | +<para>...</para> |
115 | 149 | </sect2> |
116 | 150 | |
117 | 151 | <sect2> |
— | — | @@ -118,27 +152,440 @@ |
119 | 153 | |
120 | 154 | <sect3> |
121 | 155 | <title>Database</title> |
| 156 | +<para>...</para> |
122 | 157 | </sect3> |
123 | 158 | |
124 | 159 | <sect3> |
125 | 160 | <title>Tweaks</title> |
| 161 | +<para>...</para> |
126 | 162 | </sect3> |
127 | 163 | |
128 | 164 | <sect3> |
129 | 165 | <title>Source Descriptor</title> |
| 166 | +<para>...syntax...</para> |
| 167 | +<para>...syntax in beanshell...</para> |
| 168 | + <note><para>In parameter names, "-" and "_" are interchangeable.</para></note> |
| 169 | + <variablelist><title>Source Descriptor Parameters</title> |
| 170 | + |
| 171 | + <varlistentry> |
| 172 | + <term> |
| 173 | + <parameter>association-annotation-field</parameter> |
| 174 | + </term> |
| 175 | + <listitem> |
| 176 | + <para>The field/column that contains the annotation string. The annotation could be any additional info attached to a mapping. Used with <classname>BuildConceptMappings</classname>.</para> |
| 177 | + </listitem> |
| 178 | + </varlistentry> |
| 179 | + |
| 180 | + <varlistentry> |
| 181 | + <term> |
| 182 | + <parameter>association-value-field</parameter> |
| 183 | + </term> |
| 184 | + <listitem> |
| 185 | + <para>The field/column that contains the association value, that is, the value that was used to derive the association. Used with <classname>BuildConceptAssociations</classname>.</para> |
| 186 | + </listitem> |
| 187 | + </varlistentry> |
| 188 | + |
| 189 | + <varlistentry> |
| 190 | + <term> |
| 191 | + <parameter>association-weight-field</parameter> |
| 192 | + </term> |
| 193 | + <listitem> |
| 194 | + <para>The field/column that contains the association weight. The weight may be used for filtering. Used with <classname>BuildConceptAssociations</classname> as well as <classname>BuildConceptMappings</classname> and <classname>FilterConceptMappings</classname>.</para> |
| 195 | + </listitem> |
| 196 | + </varlistentry> |
| 197 | + |
| 198 | + <varlistentry> |
| 199 | + <term> |
| 200 | + <parameter>authority</parameter> |
| 201 | + </term> |
| 202 | + <listitem> |
| 203 | + <para>The name of an external authority; the authority name serves as the namespace for foreign property names and foreign entity IDs. Instead of setting <parameter>authority</parameter> to a fixed value, it can also be taken from a data field/column specified by <parameter>foreign-authority-field</parameter>.</para> |
| 204 | + </listitem> |
| 205 | + </varlistentry> |
| 206 | + |
| 207 | + <varlistentry> |
| 208 | + <term> |
| 209 | + <parameter>concept-fields</parameter> |
| 210 | + </term> |
| 211 | + <listitem> |
| 212 | + <para>The list of field names that will be taken to belong to the concept (mapping target resp. object) when building an <classname>Association</classname> from a single <classname>FeatureSet</classname> instance. Used with <classname>BuildConceptAssociations</classname>.</para> |
| 213 | + </listitem> |
| 214 | + </varlistentry> |
| 215 | + |
| 216 | + <varlistentry> |
| 217 | + <term> |
| 218 | + <parameter>concept-id-field</parameter> |
| 219 | + </term> |
| 220 | + <listitem> |
| 221 | + <para>???</para> |
| 222 | + </listitem> |
| 223 | + </varlistentry> |
| 224 | + |
| 225 | + <varlistentry> |
| 226 | + <term> |
| 227 | + <parameter>concept-name-field</parameter> |
| 228 | + </term> |
| 229 | + <listitem> |
| 230 | + <para>???</para> |
| 231 | + </listitem> |
| 232 | + </varlistentry> |
| 233 | + |
| 234 | + <varlistentry> |
| 235 | + <term> |
| 236 | + <parameter>concept-property-field</parameter> |
| 237 | + </term> |
| 238 | + <listitem> |
| 239 | + <para>???</para> |
| 240 | + </listitem> |
| 241 | + </varlistentry> |
| 242 | + |
| 243 | + <varlistentry> |
| 244 | + <term> |
| 245 | + <parameter>concept-property-freq-field</parameter> |
| 246 | + </term> |
| 247 | + <listitem> |
| 248 | + <para>???</para> |
| 249 | + </listitem> |
| 250 | + </varlistentry> |
| 251 | + |
| 252 | + <varlistentry> |
| 253 | + <term> |
| 254 | + <parameter>concept-property-source-field</parameter> |
| 255 | + </term> |
| 256 | + <listitem> |
| 257 | + <para>???</para> |
| 258 | + </listitem> |
| 259 | + </varlistentry> |
| 260 | + |
| 261 | + <varlistentry> |
| 262 | + <term> |
| 263 | + <parameter>console-encoding</parameter> |
| 264 | + </term> |
| 265 | + <listitem> |
| 266 | + <para>???</para> |
| 267 | + </listitem> |
| 268 | + </varlistentry> |
| 269 | + |
| 270 | + <varlistentry> |
| 271 | + <term> |
| 272 | + <parameter>csv-backslash-escape</parameter> |
| 273 | + </term> |
| 274 | + <listitem> |
| 275 | + <para>???</para> |
| 276 | + </listitem> |
| 277 | + </varlistentry> |
| 278 | + |
| 279 | + <varlistentry> |
| 280 | + <term> |
| 281 | + <parameter>csv-chunker</parameter> |
| 282 | + </term> |
| 283 | + <listitem> |
| 284 | + <para>???</para> |
| 285 | + </listitem> |
| 286 | + </varlistentry> |
| 287 | + |
| 288 | + <varlistentry> |
| 289 | + <term> |
| 290 | + <parameter>csv-separator</parameter> |
| 291 | + </term> |
| 292 | + <listitem> |
| 293 | + <para>???</para> |
| 294 | + </listitem> |
| 295 | + </varlistentry> |
| 296 | + |
| 297 | + <varlistentry> |
| 298 | + <term> |
| 299 | + <parameter>csv-skip-bad-rows</parameter> |
| 300 | + </term> |
| 301 | + <listitem> |
| 302 | + <para>???</para> |
| 303 | + </listitem> |
| 304 | + </varlistentry> |
| 305 | + |
| 306 | + <varlistentry> |
| 307 | + <term> |
| 308 | + <parameter>csv-skip-header</parameter> |
| 309 | + </term> |
| 310 | + <listitem> |
| 311 | + <para>???</para> |
| 312 | + </listitem> |
| 313 | + </varlistentry> |
| 314 | + |
| 315 | + <varlistentry> |
| 316 | + <term> |
| 317 | + <parameter>defaults</parameter> |
| 318 | + </term> |
| 319 | + <listitem> |
| 320 | + <para>???</para> |
| 321 | + </listitem> |
| 322 | + </varlistentry> |
| 323 | + |
| 324 | + <varlistentry> |
| 325 | + <term> |
| 326 | + <parameter>encoding</parameter> |
| 327 | + </term> |
| 328 | + <listitem> |
| 329 | + <para>???</para> |
| 330 | + </listitem> |
| 331 | + </varlistentry> |
| 332 | + |
| 333 | + <varlistentry> |
| 334 | + <term> |
| 335 | + <parameter>field-chunkers</parameter> |
| 336 | + </term> |
| 337 | + <listitem> |
| 338 | + <para>???</para> |
| 339 | + </listitem> |
| 340 | + </varlistentry> |
| 341 | + |
| 342 | + <varlistentry> |
| 343 | + <term> |
| 344 | + <parameter>fields</parameter> |
| 345 | + </term> |
| 346 | + <listitem> |
| 347 | + <para>???</para> |
| 348 | + </listitem> |
| 349 | + </varlistentry> |
| 350 | + |
| 351 | + <varlistentry> |
| 352 | + <term> |
| 353 | + <parameter>file</parameter> |
| 354 | + </term> |
| 355 | + <listitem> |
| 356 | + <para>???</para> |
| 357 | + </listitem> |
| 358 | + </varlistentry> |
| 359 | + |
| 360 | + <varlistentry> |
| 361 | + <term> |
| 362 | + <parameter>file-format</parameter> |
| 363 | + </term> |
| 364 | + <listitem> |
| 365 | + <para>???</para> |
| 366 | + </listitem> |
| 367 | + </varlistentry> |
| 368 | + |
| 369 | + <varlistentry> |
| 370 | + <term> |
| 371 | + <parameter>foreign-authority-field</parameter> |
| 372 | + </term> |
| 373 | + <listitem> |
| 374 | + <para>???</para> |
| 375 | + </listitem> |
| 376 | + </varlistentry> |
| 377 | + |
| 378 | + <varlistentry> |
| 379 | + <term> |
| 380 | + <parameter>foreign-fields</parameter> |
| 381 | + </term> |
| 382 | + <listitem> |
| 383 | + <para>The list of field names that will be taken to belong to the foreign entity (mapping source resp. subject) when building an <classname>Association</classname> from a single <classname>FeatureSet</classname> instance. Used with <classname>BuildConceptAssociations</classname>.</para> |
| 384 | + </listitem> |
| 385 | + </varlistentry> |
| 386 | + |
| 387 | + <varlistentry> |
| 388 | + <term> |
| 389 | + <parameter>foreign-id-field</parameter> |
| 390 | + </term> |
| 391 | + <listitem> |
| 392 | + <para>Field containing an ID unique in the context of the foreign authority, used to identify foreign entities.</para> |
| 393 | + </listitem> |
| 394 | + </varlistentry> |
| 395 | + |
| 396 | + <varlistentry> |
| 397 | + <term> |
| 398 | + <parameter>foreign-name-field</parameter> |
| 399 | + </term> |
| 400 | + <listitem> |
| 401 | + <para>Field containing a display name for foreign entities. Defaults to the value of <parameter>foreign-id-field</parameter>.</para> |
| 402 | + </listitem> |
| 403 | + </varlistentry> |
| 404 | + |
| 405 | + <varlistentry> |
| 406 | + <term> |
| 407 | + <parameter>foreign-property-field</parameter> |
| 408 | + </term> |
| 409 | + <listitem> |
| 410 | + <para>Field containing the name of the property of a foreign entity. Used with <classname>BuildConceptAssociations</classname>.</para> |
| 411 | + </listitem> |
| 412 | + </varlistentry> |
| 413 | + |
| 414 | + <varlistentry> |
| 415 | + <term> |
| 416 | + <parameter>mapping-filter</parameter> |
| 417 | + </term> |
| 418 | + <listitem> |
| 419 | + <para>Instance of <classname>MappingCandidateFilter</classname> to use for filtering. Used with <classname>FilterConceptMappings</classname>. |
| 420 | + Overrides <property>mapping-filter-aggregator</property>, <property>mapping-filter-field</property>, <property>mapping-filter-scorer</property>, <property>mapping-filter-selector</property>, etc. |
| 421 | + </para> |
| 422 | + </listitem> |
| 423 | + </varlistentry> |
| 424 | + |
| 425 | + <varlistentry> |
| 426 | + <term> |
| 427 | + <parameter>mapping-filter-aggregator</parameter> |
| 428 | + </term> |
| 429 | + <listitem> |
| 430 | + <para>Instance of <classname>Functor2</classname> to use as an aggregator to aggregate multiple score values into one. Overridden by <property>mapping-filter</property>, <property>mapping-selector</property>, <parameter>mapping-filter-scorer</parameter> and <parameter>mapping-filter-accessor</parameter>.</para> |
| 431 | + </listitem> |
| 432 | + </varlistentry> |
| 433 | + |
| 434 | + <varlistentry> |
| 435 | + <term> |
| 436 | + <parameter>mapping-filter-aggregator-function</parameter> |
| 437 | + </term> |
| 438 | + <listitem> |
| 439 | + <para>Name of the aggregator function used to merge multiple score values into one; must be either "max" or "sum", default is "sum". Used with <classname>FilterConceptMappings</classname>. Overridden by <property>mapping-filter</property>, <property>mapping-selector</property>, <parameter>mapping-filter-scorer</parameter>, <parameter>mapping-filter-accessor</parameter> and <parameter>mapping-filter-aggregator</parameter>.</para> |
| 440 | + </listitem> |
| 441 | + </varlistentry> |
| 442 | + |
| 443 | + <varlistentry> |
| 444 | + <term> |
| 445 | + <parameter>mapping-filter-field</parameter> |
| 446 | + </term> |
| 447 | + <listitem> |
| 448 | + <para>The field to take the score for a candidate <classname>FeatureSet</classname> from. Used with <classname>FilterConceptMappings</classname>. Overridden by <property>mapping-filter</property>, <property>mapping-selector</property>, <parameter>mapping-filter-scorer</parameter> and <parameter>mapping-filter-accessor</parameter>.</para> |
| 449 | + </listitem> |
| 450 | + </varlistentry> |
| 451 | + |
| 452 | + <varlistentry> |
| 453 | + <term> |
| 454 | + <parameter>mapping-filter-field-accessor</parameter> |
| 455 | + </term> |
| 456 | + <listitem> |
| 457 | + <para>Instance of <classname>PropertyAccessor</classname> to extract the score from a candidate <classname>FeatureSet</classname>. Used with <classname>FilterConceptMappings</classname>. Overridden by <property>mapping-filter</property>, <property>mapping-selector</property> and <parameter>mapping-filter-scorer</parameter>.</para> |
| 458 | + </listitem> |
| 459 | + </varlistentry> |
| 460 | + |
| 461 | + <varlistentry> |
| 462 | + <term> |
| 463 | + <parameter>mapping-filter-scorer</parameter> |
| 464 | + </term> |
| 465 | + <listitem> |
| 466 | + <para>Instance of <classname>MappingCandidateScorer</classname> to use when filtering (for best candidate, or by threshold). Used with <classname>FilterConceptMappings</classname>. Overridden by <property>mapping-filter</property> and <property>mapping-selector</property>, |
| 467 | overrides <property>mapping-filter-aggregator</property>, <property>mapping-filter-field</property> etc.</para> |
| 468 | + </listitem> |
| 469 | + </varlistentry> |
| 470 | + |
| 471 | + <varlistentry> |
| 472 | + <term> |
| 473 | + <parameter>mapping-filter-threshold</parameter> |
| 474 | + </term> |
| 475 | + <listitem> |
| 476 | + <para>Threshold value to use when filtering mapping candidates. If set, all candidates with a score equal to or better than the threshold will pass. |
| 477 | + Otherwise, the one candidate with the best score will pass. Used with <classname>FilterConceptMappings</classname>. |
| 478 | Overridden by <property>mapping-filter</property> and <property>mapping-selector</property>.</para> |
| 479 | + </listitem> |
| 480 | + </varlistentry> |
| 481 | + |
| 482 | + <varlistentry> |
| 483 | + <term> |
| 484 | + <parameter>mapping-selector</parameter> |
| 485 | + </term> |
| 486 | + <listitem> |
| 487 | + <para>Instance of <classname>MappingCandidateSelector</classname> to use for filtering. Used with <classname>FilterConceptMappings</classname>. |
| 488 | + Overridden by <property>mapping-filter</property>, overrides <property>mapping-filter-aggregator</property>, <property>mapping-filter-field</property>, <property>mapping-filter-scorer</property> etc. |
| 489 | + </para> |
| 490 | + </listitem> |
| 491 | + </varlistentry> |
| 492 | + |
| 493 | + <varlistentry> |
| 494 | + <term> |
| 495 | + <parameter>property-fields</parameter> |
| 496 | + </term> |
| 497 | + <listitem> |
| 498 | + <para>The list of field names that will be taken to belong to the association itself (mapping properties) when building an <classname>Association</classname> from a single <classname>FeatureSet</classname> instance. Used with <classname>BuildConceptAssociations</classname>.</para> |
| 499 | + </listitem> |
| 500 | + </varlistentry> |
| 501 | + |
| 502 | + <varlistentry> |
| 503 | + <term> |
| 504 | + <parameter>property-name-field</parameter> |
| 505 | + </term> |
| 506 | + <listitem> |
| 507 | + <para>???</para> |
| 508 | + </listitem> |
| 509 | + </varlistentry> |
| 510 | + |
| 511 | + <varlistentry> |
| 512 | + <term> |
| 513 | + <parameter>property-qualifier</parameter> |
| 514 | + </term> |
| 515 | + <listitem> |
| 516 | + <para>???</para> |
| 517 | + </listitem> |
| 518 | + </varlistentry> |
| 519 | + |
| 520 | + <varlistentry> |
| 521 | + <term> |
| 522 | + <parameter>property-subject-name-field</parameter> |
| 523 | + </term> |
| 524 | + <listitem> |
| 525 | + <para>???</para> |
| 526 | + </listitem> |
| 527 | + </varlistentry> |
| 528 | + |
| 529 | + <varlistentry> |
| 530 | + <term> |
| 531 | + <parameter>query</parameter> |
| 532 | + </term> |
| 533 | + <listitem> |
| 534 | + <para>???</para> |
| 535 | + </listitem> |
| 536 | + </varlistentry> |
| 537 | + |
| 538 | + <varlistentry> |
| 539 | + <term> |
| 540 | <parameter>queryGenerator</parameter> |
| 541 | + </term> |
| 542 | + <listitem> |
| 543 | + <para>???</para> |
| 544 | + </listitem> |
| 545 | + </varlistentry> |
| 546 | + |
| 547 | + <varlistentry> |
| 548 | + <term> |
| 549 | + <parameter>source-table</parameter> |
| 550 | + </term> |
| 551 | + <listitem> |
| 552 | + <para>???</para> |
| 553 | + </listitem> |
| 554 | + </varlistentry> |
| 555 | + |
| 556 | + <varlistentry> |
| 557 | + <term> |
| 558 | + <parameter>sql-comment-subst</parameter> |
| 559 | + </term> |
| 560 | + <listitem> |
| 561 | + <para>???</para> |
| 562 | + </listitem> |
| 563 | + </varlistentry> |
| 564 | + |
| 565 | + <varlistentry> |
| 566 | + <term> |
| 567 | + <parameter>sql-manglers</parameter> |
| 568 | + </term> |
| 569 | + <listitem> |
| 570 | + <para>???</para> |
| 571 | + </listitem> |
| 572 | + </varlistentry> |
| 573 | + </variablelist> |
| 574 | + |
130 | 575 | </sect3> |
131 | 576 | |
132 | 577 | <sect3> |
133 | 578 | <title>Source Descriptor Defaults</title> |
| 579 | +<para>...</para> |
134 | 580 | </sect3> |
135 | 581 | |
136 | 582 | <sect3> |
137 | 583 | <title>Built-In Scripts</title> |
| 584 | +<para>...</para> |
138 | 585 | </sect3> |
139 | 586 | |
140 | 587 | </sect2> |
141 | 588 | |
142 | 589 | |
143 | | -</sect1> |
| 590 | +</sect1> |
144 | 591 | |
145 | 592 | </article> |
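
The descriptions of mapping-filter-aggregator-function and mapping-filter-threshold added in this change can be illustrated with a small, self-contained sketch. It is not taken from the WikiWord code base: the class ThresholdFilterSketch and its methods are hypothetical, a candidate is reduced to a name with a list of scores, and "better" is assumed to mean "numerically higher". It only shows the documented behaviour: scores are merged with "sum" (the default) or "max"; with a threshold, every candidate scoring at or above it passes; without one, only the single best candidate passes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in WikiWord.
public class ThresholdFilterSketch {

    /** Merges several scores into one using "max" or (by default) "sum". */
    static double aggregate(List<Double> scores, String function) {
        double result = 0;
        boolean first = true;
        for (double s : scores) {
            if ("max".equals(function)) {
                result = first ? s : Math.max(result, s);
            } else {
                // Assumption: anything other than "max" is treated as "sum".
                result += s;
            }
            first = false;
        }
        return result;
    }

    /**
     * With a threshold: all candidates scoring at or above it pass.
     * Without a threshold (null): only the best candidate passes.
     * Assumption: a higher score is better; ties go to the first candidate seen.
     */
    static List<String> select(Map<String, List<Double>> candidates,
                               String function, Double threshold) {
        List<String> passed = new ArrayList<>();
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, List<Double>> e : candidates.entrySet()) {
            double score = aggregate(e.getValue(), function);
            if (threshold != null) {
                if (score >= threshold) {
                    passed.add(e.getKey());
                }
            } else if (best == null || score > bestScore) {
                best = e.getKey();
                bestScore = score;
            }
        }
        if (threshold == null && best != null) {
            passed.add(best);
        }
        return passed;
    }

    public static void main(String[] args) {
        Map<String, List<Double>> candidates = new LinkedHashMap<>();
        candidates.put("A", List.of(0.4, 0.5));
        candidates.put("B", List.of(0.8));
        candidates.put("C", List.of(0.1, 0.1));
        System.out.println(select(candidates, "sum", null)); // best by summed score
        System.out.println(select(candidates, "max", null)); // best by maximum score
        System.out.println(select(candidates, "sum", 0.5));  // all at or above 0.5
    }
}
```

In this sketch the choice of aggregator changes the winner: candidate A has the highest summed score, while candidate B has the highest single score. That is the kind of difference the "max" versus "sum" setting is meant to control.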