Georges Sauvet (CREAP Cartailhac, CNRS, Toulouse)


Symbolic and Statistical Analyses of Meta-data Using the Semana Platform: a bundle of tools for KDD research

Georges Sauvet (CNRS, Toulouse)
Centre de Recherche et d'Étude de l'Art Préhistorique
UMR 5608: Travaux et Recherches Archéologiques sur les Cultures, les Espaces et les Sociétés
CASK Sorbonne 2008, Paris, June 13th

SEMANA and Data Mining

[Figure: the data mining process, after B. Wüthrich, 1998: data warehouse, sampling, data coding, KDD techniques (Rough Set, FCA, statistical analysis, etc.), interpretation]

SEMANA is a bundle of tools aimed at making these tasks easier.

Architecture of the SEMANA platform

A software bundle written in Transcript, the programming language of Revolution.
Standalone applications for Macintosh and Windows.

- Dynamic DB Builder: data sheets, data coding, data storage
- Attribute Editor: discretization, logical scaling
- Tree Builder Assistant: aid to code structuration
- Tables (various formats)
- Multi-valued tables
  - Rough Set Theory: upper approx., lower approx., reducts, core, discriminating power (Pawlak)
  - Decision Logic: minimal rules, attribute strength (Bolc, Cytowski and Stacewicz)
- One-valued tables
  - Formal Concept Analysis: Galois lattice, central concepts (Wille, Ganter)
  - Statistical tools: correlation matrix, Correspondence Factor Analysis, hierarchical classifications (Benzécri)

Working with the SEMANA platform

SEMANA is twofold:
1) Tools for Intelligent Database Designing => Dynamic DB Builder
   - providing statistical information about the use of attribute-value pairs (AV)
   - suggesting iterative restructuration of AV
2) Tools for KDD research: integration of RST, FCA, Statistical Data Analyses

Three illustrations:
- Ten-ta-to: the proximal deictic adjectives in Polish
- The category of Aspect in Polish
- Representations of women in Palaeolithic Art

Case 1: the Proximal Deictic Adjectives in Polish

The proximal deictic adjectives in Polish

In Polish School Grammar, the adjective declension consists in the amalgamation of three morphological categories.
Case = {Nominative, Accusative, Genitive, Dative, Instrumental, Locative}
Number = {singular, plural}
Gender = in Polish linguistics (cf. SALONI, Z. 1976), up to 7 gender classes have been proposed:
  Singular:
    1. feminine
    2. neuter
    3. animal masculine ("animal" corresponds to the feature "animate" in descriptions of other European languages)
    4. non-animal masculine
  Plural:
    5. personal masculine ("personal" corresponds to the feature "human")
    6. non-personal masculine
    7. pluralia tantum (defective nouns with no singular form)

The root of these adjectives is a single phoneme t-. 13 forms are used:
ten, ta, to, tym, tymi, tych, te, te*, temu, tej, tego, ta*, ci

Examples (nominative case only):

Gender      Polish singular   Polish plural   English translation
Masculine   ten dom           te domy         this/these house(s)
            ten pies          te psy          this/these dog(s)
            ten pan           ci panowie      this/these sir(s)
Feminine    ta deska          te deski        this/these board(s)
            ta gęś            te gęsi         this/these goose/geese
            ta pani           te panie        this/these lady/ladies
Neuter      to pióro          te pióra        this/these feather(s)
            to kurczę         te kurczęta     this/these chicken(s)
            to dziecko        te dzieci       this/these child/children
...

In order to elucidate the problem of Gender in Polish noun morphology, H. and A. Włodarczyk have built a database of usages of the proximal deictic adjectives. As the 7 sub-genders of Polish School Grammars correspond neither to any known semantic or ontological categories nor to any known grammatical sub-gender in other languages, they proposed to split the sub-genders of the Gender attribute into three attributes:
gender = {feminine, neuter, masculine}
animacy = {animate, inanimate}
humanity = {human, non_human}

TENTATO: database first version

[Figure: database entry screen showing the morpheme, a sample, and the attribute-value features chosen for each entry]

An AV table is automatically collected. The program suggests the possibility to merge the attributes listed in the report. It also indicates that the pair {inanimate-human} does not exist (for obvious reasons).

Objects = 108
Distinct objects = 108
Duplicates = 0
Duplicate ratio = 0
Attributes = 5 (with resp. 6,2,3,2,2 values)
NB: in this calculation, non-used attributes (*) have been replaced by a null value ('nAtt')
==================================================
Theoretical Number of Combinations = 144
Apparent Saturation Index : 75%
==================================================
The following pairs of attributes could be merged:
[hum|ina] Confidence index = 99.9%
[hum|nhu] Confidence index = 99.9%
[ina|nhu] Confidence index = 99.9%
==================================================
STATISTICAL USE OF AV
Attr   Value    occur
Ani    anim     72
Ani    inanim   36
Case   A        18
Case   D        18
Case   G        18
Case   I        18
Case   L        18
Case   N        18
Gnd    fem      36
Gnd    masc     36
Gnd    neu      36
Hum    hum      36
Hum    nhum     72
Nb     plur     54
Nb     sing     54
==================================================
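The counts in this report can be reproduced from the table of objects. The following is a minimal Python sketch, not Semana's implementation. It assumes a hypothetical reconstruction of the 108 entries (every case, number and gender crossed with the three attested animacy/humanity pairs), and it assumes that the apparent saturation index is the ratio of distinct objects to theoretical combinations, which agrees with the reported 108/144 = 75%. All function names are illustrative.

```python
from collections import Counter
from itertools import product
from math import prod

# Assumption: hypothetical reconstruction of the 108 TENTATO entries.
CASES = ["N", "A", "G", "D", "I", "L"]
NUMBERS = ["sing", "plur"]
GENDERS = ["fem", "masc", "neu"]
# Only three (animacy, humanity) pairs occur; (inanim, hum) is never used.
ANIMACY_HUMANITY = [("anim", "hum"), ("anim", "nhum"), ("inanim", "nhum")]


def build_objects():
    """Return one dict of attribute -> value per database entry."""
    return [
        {"Case": c, "Nb": n, "Gnd": g, "Ani": a, "Hum": h}
        for c, n, g, (a, h) in product(CASES, NUMBERS, GENDERS, ANIMACY_HUMANITY)
    ]


def av_statistics(objects):
    """Count how many objects use each (attribute, value) pair."""
    return Counter((attr, val) for obj in objects for attr, val in obj.items())


def saturation(objects):
    """Return (theoretical number of combinations, apparent saturation in %)."""
    attributes = list(objects[0])
    sizes = [len({obj[a] for obj in objects}) for a in attributes]
    theoretical = prod(sizes)
    distinct = {tuple(obj[a] for a in attributes) for obj in objects}
    return theoretical, 100 * len(distinct) / theoretical


objects = build_objects()
stats = av_statistics(objects)
theoretical, apparent = saturation(objects)

print("Objects =", len(objects))
print("Theoretical Number of Combinations =", theoretical)
print(f"Apparent Saturation Index : {apparent:.0f}%")
for (attr, val), occur in sorted(stats.items()):
    print(attr, val, occur)
```

Deriving the attribute sizes from the data, rather than hard-coding 6,2,3,2,2, keeps the sketch usable on a restructured table such as the second version of the database.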

==================================================
Non-Attested Pairs of Values = 1
ina,hum,2,4
--------------------------------------------------
Assuming that all non-attested pairs are impossible:
Maximum number of combinations = 108
Corrected Saturation Index : 100%
--------------------------------------------------

TENTATO (Version 1): Formal Concept Analysis

[Figure: TENTATO complete lattice and simplified lattice]

Test of dependence:
- Inanimate depends on non-human
- Human depends on animate

Total dependence:
ina => nhu (36/36)
hum => an (36/36)
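The non-attested pair and the two total dependences can be checked directly on a table of objects. The following is a minimal Python sketch, not Semana's implementation. It assumes a hypothetical reconstruction of the 108 entries (every case, number and gender crossed with the three attested animacy/humanity pairs); the function names are illustrative. A pair of values is treated as non-attested when no object carries both, and x => y is treated as a total dependence when every object carrying x also carries y.

```python
from itertools import combinations, product

# Assumption: hypothetical reconstruction of the 108 TENTATO entries.
CASES = ["N", "A", "G", "D", "I", "L"]
NUMBERS = ["sing", "plur"]
GENDERS = ["fem", "masc", "neu"]
ANIMACY_HUMANITY = [("anim", "hum"), ("anim", "nhum"), ("inanim", "nhum")]


def build_objects():
    """Return one dict of attribute -> value per database entry."""
    return [
        {"Case": c, "Nb": n, "Gnd": g, "Ani": a, "Hum": h}
        for c, n, g, (a, h) in product(CASES, NUMBERS, GENDERS, ANIMACY_HUMANITY)
    ]


def non_attested_pairs(objects):
    """List value pairs from two different attributes that never co-occur."""
    attributes = list(objects[0])
    values = {a: sorted({obj[a] for obj in objects}) for a in attributes}
    missing = []
    for a, b in combinations(attributes, 2):
        attested = {(obj[a], obj[b]) for obj in objects}
        for va, vb in product(values[a], values[b]):
            if (va, vb) not in attested:
                missing.append(((a, va), (b, vb)))
    return missing


def max_combinations(objects, impossible):
    """Count the value combinations that contain no impossible pair."""
    attributes = list(objects[0])
    values = [sorted({obj[a] for obj in objects}) for a in attributes]
    count = 0
    for combo in product(*values):
        av = set(zip(attributes, combo))
        if not any(p in av and q in av for p, q in impossible):
            count += 1
    return count


def dependence(objects, premise, conclusion):
    """Return (supporting, total) for the implication premise => conclusion."""
    (p_attr, p_val), (c_attr, c_val) = premise, conclusion
    carriers = [obj for obj in objects if obj[p_attr] == p_val]
    supporting = sum(1 for obj in carriers if obj[c_attr] == c_val)
    return supporting, len(carriers)


objects = build_objects()
missing = non_attested_pairs(objects)
corrected_max = max_combinations(objects, missing)
ina_nhu = dependence(objects, ("Ani", "inanim"), ("Hum", "nhum"))
hum_an = dependence(objects, ("Hum", "hum"), ("Ani", "anim"))

print("Non-attested pairs:", missing)
print("Maximum number of combinations =", corrected_max)
print("ina => nhu", ina_nhu)
print("hum => an", hum_an)
```

On this table the single missing pair (inanimate, human) is what produces both dependences: removing it from the 144 theoretical combinations leaves the 108 that are actually observed.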

TENTATO: second version

In a second trial, the attributes ANIMACY ({ANI}=[animate|inanimate]) and HUMANITY ({HUM}=[human|nhuman]) are merged into a three-valued attribute: {ANY}=[nhuman|inanimate|human]

Objects = 108
Distinct objects = 108
Duplicates = 0
Duplicate ratio = 0
Attributes = 4 (with resp. 3,6,3,2 values)
NB: in this calculation, non-used attributes (*) have been replaced by a null value ('nAtt')
==================================================
Theoretical Number of Combinations = 108
Apparent Saturation Index : 100%
==================================================
No attributes could be merged
==================================================
STATISTICAL USE OF AV
Attr   Value          occur
ANY    human          36
ANY    inanimate      36
ANY    nhuman         36
CAS    accusative     18
CAS    dative         18
CAS    genitive       18
CAS    instrumental   18
CAS    locative       18
CAS    nominative     18
GND    feminine       36
GND    masculine      36
GND    neuter         36
NBR    plural         54
NBR    singular       54
==================================================
Non-Attested Pairs of Values = 0
--------------------------------------------------
Assuming that all non-attested pairs are impossible:
Maximum number of combinations = 108
Corrected Saturation Index : 100%

No attribute merging is possible; all pairs of values are attested.

TENTATO: Formal Concept Analysis

[Figure: complete and simplified lattices for the first and second versions of TENTATO]

- First version: test of dependence => Inanimate depends on non-human; Human depends on animate. Total dependence: ina => nhu (36/36), hum => an (36/36).
- Second version: all the attributes are at the same level: no hierarchy. Total dependence: none.

TENTATO-2: Rough Set Theory and Minimal Rules

A procedure derived from Rough Set Theory allows us to calculate the minimal rules (i.e. the values of the attributes which condition the morpheme to be used).

r1 (9) : CASdat,NBRplu --> tym
r2 (3) : CASins,GNDmas,NBRsin --> tym
r3 (3) : CASins,GNDneu,NBRsin --> tym
r4 (3) : CASloc,GNDmas,NBRsin --> tym
r5 (3) : CASloc,GNDneu,NBRsin --> tym
r20 (1) : CASacc,ANYina,GNDmas,NBRsin --> ten
r21 (3) : CASnom,GNDmas,NBRsin --> ten
r6 (9) : CASins,NBRplu --> tymi
r24 (3) : CASdat,GNDfem,NBRsin --> tej
r25 (3) : CASgen,GNDfem,NBRsin --> tej
r26 (3) : CASloc,GNDfem,NBRsin --> tej
r7 (1) : CASacc,ANYhum,GNDmas,NBRplu --> tych
r8 (9) : CASgen,NBRplu --> tych
r9 (9) : CASloc,NBRplu --> tych
r10 (3) : CASacc,GNDneu,NBRsin --> to
r11 (3) : CASnom,GNDneu,NBRsin --> to
r12 (3) : CASacc,ANYina,NBRplu --> te
r13 (3) : CASacc,ANYnhu,NBRplu --> te
r14 (3) : CASacc,GNDfem,NBRplu --> te
r15 (3) : CASacc,GNDneu,NBRplu --> te
r16 (3) : CASnom,ANYina,NBRplu --> te
r17 (3) : CASnom,ANYnhu,NBRplu --> te
r18 (3) : CASnom,GNDfem,NBRplu --> te
r19 (3) : CASnom,GNDneu,NBRplu --> te
r22 (3) : CASdat,GNDmas,NBRsin --> temu
r23 (3) : CASdat,GNDneu,NBRsin --> temu
r27 (1) : CASacc,ANYhum,GNDmas,NBRsin --> tego
r28 (1) : CASacc,ANYnhu,GNDmas,NBRsin --> tego
r29 (3) : CASgen,GNDmas,NBRsin --> tego
r30 (3) : CASgen,GNDneu,NBRsin --> tego
r31 (3) : CASacc,GNDfem,NBRsin --> te*
r32 (3) : CASnom,GNDfem,NBRsin --> ta
r33 (3) : CASins,GNDfem,NBRsin --> ta*
r34 (1) : CASnom,ANYhum,GNDmas,NBRplu --> ci

The 108 distinct objects of the DB can be described by only 34 morphological rules. Note that CAS and NBR are required in every rule, GND in 26/34 and ANY in only 9/34.

TENTATO-2: Statistical analysis

The multi-valued table is unfolded into a one-valued table, and the one-valued table is transformed into a Burt table. A Burt table is a square symmetrical table giving the number of co-occurrences of the attributes.

TENTATO-2: Correspondence Factor Analysis (CFA)

BURT TABLE
(The table is symmetrical: the 13 morpheme columns mirror the morpheme rows and are not repeated here. FJ is the marginal total over all 27 columns.)

        acc  dat  gen  ins  loc  nom | hum  ina  nhu | fem  mas  neu | plu  sin |   FJ
acc      18    0    0    0    0    0 |   6    6    6 |   6    6    6 |   9    9 |   90
dat       0   18    0    0    0    0 |   6    6    6 |   6    6    6 |   9    9 |   90
gen       0    0   18    0    0    0 |   6    6    6 |   6    6    6 |   9    9 |   90
ins       0    0    0   18    0    0 |   6    6    6 |   6    6    6 |   9    9 |   90
loc       0    0    0    0   18    0 |   6    6    6 |   6    6    6 |   9    9 |   90
nom       0    0    0    0    0   18 |   6    6    6 |   6    6    6 |   9    9 |   90
hum       6    6    6    6    6    6 |  36    0    0 |  12   12   12 |  18   18 |  180
ina       6    6    6    6    6    6 |   0   36    0 |  12   12   12 |  18   18 |  180
nhu       6    6    6    6    6    6 |   0    0   36 |  12   12   12 |  18   18 |  180
fem       6    6    6    6    6    6 |  12   12   12 |  36    0    0 |  18   18 |  180
mas       6    6    6    6    6    6 |  12   12   12 |   0   36    0 |  18   18 |  180
neu       6    6    6    6    6    6 |  12   12   12 |   0    0   36 |  18   18 |  180
plu       9    9    9    9    9    9 |  18   18   18 |  18   18   18 |  54    0 |  270
sin       9    9    9    9    9    9 |  18   18   18 |  18   18   18 |   0   54 |  270
ci        0    0    0    0    0    1 |   1    0    0 |   0    1    0 |   1    0 |    5
ta        0    0    0    0    0    3 |   1    1    1 |   3    0    0 |   0    3 |   15
ta*       0    0    0    3    0    0 |   1    1    1 |   3    0    0 |   0    3 |   15
te        8    0    0    0    0    8 |   4    6    6 |   6    4    6 |  16    0 |   80
te*       3    0    0    0    0    0 |   1    1    1 |   3    0    0 |   0    3 |   15
tego      2    0    6    0    0    0 |   3    2    3 |   0    5    3 |   0    8 |   40
tej       0    3    3    0    3    0 |   3    3    3 |   9    0    0 |   0    9 |   45
temu      0    6    0    0    0    0 |   2    2    2 |   0    3    3 |   0    6 |   30
ten       1    0    0    0    0    3 |   1    2    1 |   0    4    0 |   0    4 |   20
to        3    0    0    0    0    3 |   2    2    2 |   0    0    6 |   0    6 |   30
tych      1    0    9    0    9    0 |   7    6    6 |   6    7    6 |  19    0 |   95
tym       0    9    0    6    6    0 |   7    7    7 |   3    9    9 |   9   12 |  105
tymi      0    0    0    9    0    0 |   3    3    3 |   3    3    3 |   9    0 |   45

Numbers in the table are considered as coordinates of points in an N-dimensional space. CFA calculates the axes of inertia of the cloud of points (F1, F2, F3 ...) and displays projections in planes [F1,F2], [F1,F3], etc. CFA is implemented in Semana.

[Figure: cloud of points in a space (x, y, z) with the axes of inertia F1, F2, F3]

TENTATO-2: Correspondence Factor Analysis (CFA)

In the output below, F#n is the coordinate of object J on factor n, COR is the contribution of factor n to the description of object J, and CTR is the contribution of object J to the definition of factor n.

CLOUD J
J      FREQ  QLT  INR |   F#1  COR  CTR |   F#2  COR  CTR |   F#3  COR  CTR |   F#4  COR  CTR |
acc      33  397   40 |   -70    3    1 |  -745  391  120 |    47    2    1 |   -48    2    1 |
dat      33  588   42 |   643  271   88 |   483  153   50 |    97    6    2 |   489  157   66 |
gen      33  596   40 |   -15    0    0 |   153   16    5 |  -870  522  186 |  -290   58   23 |
ins      33  869   48 |  -362   77   28 |   682  271  100 |   924  499  210 |  -195   22   11 |
loc      33  326   35 |  -117   11    3 |   366  106   29 |  -504  201   62 |  -103    8    3 |
nom      33  633   44 |   -78    4    1 |  -938  556  190 |   306   59   23 |   147   14    6 |
hum      67    5   23 |     0    0    0 |    29    2    0 |   -34    3    1 |     5    0    0 |
ina      67    5   23 |    -1    0    0 |   -26    2    0 |    34    3    1 |     6    0    0 |
nhu      67    0   22 |     1    0    0 |    -3    0    0 |     0    0    0 |   -11    0    0 |
fem      67  768   33 |    20    1    0 |   -26    1    0 |    85   12    4 |  -669  754  247 |
mas      67  245   28 |    -9    0    0 |    56    6    1 |   -97   19    5 |   332  220   61 |
neu      67  232   28 |   -11    0    0 |   -29    2    0 |    12    0    0 |   337  229   63 |
plu     100  873   30 |  -546  823  189 |    43    5    1 |   -68   13    3 |   108   32   10 |
sin     100  873   30 |   546  823  189 |   -43    5    1 |    68   13    3 |  -108   32   10 |
ci        2   76   36 |  -644   18    5 |  -841   30    8 |   128    1    0 |   804   28   10 |
ta        6  276   40 |   497   29    9 | -1046  127   39 |   545   35   12 |  -856   85   34 |
ta*       6  445   40 |   207    5    2 |   635   47   15 |  1278  190   67 | -1321  203   80 |
te       30  651   44 |  -630  225   75 |  -839  399  135 |   148   12    5 |   156   14    6 |
te*       6  265   40 |   505   30    9 |  -845   83   26 |   237    7    2 | -1121  146   58 |
tego     15  249   42 |   516   79   25 |   -91    2    1 |  -751  167   62 |    -6    0    0 |
tej      17  588   42 |   749  187   60 |   274   25    8 |  -324   35   13 | -1012  341  142 |
temu     11  559   44 |  1200  306  102 |   469   47   16 |   145    4    2 |   973  201   87 |
ten       7  208   39 |   469   34   10 |  -917  132   40 |   262   11    4 |   440   30   12 |
to       11  308   41 |   468   50   15 |  -948  204   65 |   305   21    8 |   378   32   13 |
tych     35  812   44 |  -624  263   87 |   264   47   16 |  -858  498  191 |   -86    5    2 |
tym      39  481   35 |   214   43   11 |   527  256   70 |   174   28    9 |   408  154   54 |
tymi     17  726   47 |  -924  251   91 |   752  166   61 |  1016  304  127 |  -119    4    2 |
(Output by Stat-3)

- Note that the quality of the description of the attribute animacy is very poor: these elements have no contribution to the first 4 factors.
- Note that the number (singular/plural) has the highest contribution to axis 1.

TENTATO-2: CFA representation in plane [1,2]

- Morphemes are widely spread over plane [1,2].
- Axis 1 separates NUMBER => singular vs plural.
- Axis 2 separates syntactic relators (CASE) => {nom, acc} vs {gen, loc, dat, ins}.
- ANIMACY & GENDER are not differentiated on axes 1 and 2.

TENTATO-2: Axis 1 separates quantifiers

- Morphemes strictly associated to plural: => ci, te, tych, tymi
- Morphemes strictly associated to singular: => ta, to, ten, te*, tego, tej, temu, ta*
- One exception: tym may be either singular or plural.

TENTATO-2: Axis 2 separates syntactic relators

- Morphemes strictly associated to genitive, locative, dative and/or instrumental: => tej, tych, temu, tymi, ta*, tym

- Morphemes strictly associated to nominative and/or accusative: => ta, to, ten, te*, ci, te
- One exception: tego may be either accusative or genitive.

TENTATO-2: Axis 3 separates {gen, loc} vs {ins}

- Morphemes tymi, ta* strictly associated to instrumental.
- Morphemes tych, tego, tej strictly associated to genitive or locative.
- One exception: tym may be either instrumental or locative.

TENTATO-2: Axis 4 separates gender {fem} vs {mas, neu}

- Morphemes ta*, te*, tej, ta strictly associated to feminine.

- Morphemes tego, to, ten, temu, ci strictly associated to masculine or neuter.
- One exception: tym may be associated to any gender.
- Note that the attribute [ANIMACY]={human, nhuman, inanimate} is still not differentiated on axis 4.

TENTATO-2: Animacy appears only on axis 9 !!!

- Morpheme ci strictly associated to human.

TENTATO-2: CFA and Minimal Rules (RST)

Axis (% inertia)    Attribute (use in rules)   Opposition
Axis 1 (13.05%)     NUMBER (34/34 rules)       singular vs plural
Axis 2 (12.81%)     CASE (34/34 rules)         nom, acc vs gen, loc, dat, ins
Axis 3 (11.27%)     CASE (34/34 rules)         gen, loc (dat) vs ins
Axis 4 (10.0%)      GENDER (26/34 rules)       feminine vs masculine
...
Axis 9 (4.35%)      ANIMACY (9/34 rules)       human vs nhum, ina

The relative strength of the attributes is revealed both by their contribution to the axes of inertia in Factor Analysis and by their weight in Minimal Rules.

Case 2: the category of Aspect in Polish

A Database built with Dynamic DB-Builder

- A classical data sheet to fill in for each specimen.
- The grammatical form of each specimen is used as index.
- Attributes and values are chosen in a list, and the resulting AVs appear in a field.

A test of consistency

Each specimen is characterized by a set of AV and by its grammatical form (used as index). It may be written as a rule:
    if {given set of AV} then index
This allows index inconsistencies to be detected (a test of consistency is provided in Semana).

This is a warning to the expert: probably the AV do not describe properly the different aspectual situations! 9 different forms applying to exactly the same situation?

Polish Aspect using Dynamic DB Builder

All specimens are automatically collected in a contingency table and statistics are reported. In this initial version, there were more than 2 million theoretical combinations, and 9 pairs of attributes could be merged!

Polish Aspect using Dynamic DB Builder: improvements by trial and error

DB version     Distinct objects   Number of attributes   Number of theor. combin.   Number of merging attributes
HW-Aspect-V1   61                 12                     2,064,384                  9
HW-Aspect-V2   60                 11                     1,032,192                  9
HW-Aspect-V3   77                 11                     829,000                    6
HW-Aspect-V4   79                 9                      408,240                    1
HW-Aspect-V5   79                 8                      136,080                    1
HW-Aspect-V6   69                 8                      45,360                     1
HW-Aspect-V7   74                 8                      61,440                     0
HW-Aspect-V8   78                 7                      58,320                     0

From Dynamic DB Builder to STAT-3

The multi-valued table is transformed into a one-valued table for STAT analyses.

Polish Aspect: Correspondence Factor Analysis

[Figure: CFA plot, axis 1 vs axis 2]

- Factor Analysis of the contingency table shows a clear Guttman effect (i.e. a sequential order of the attributes).
- Ascending Hierarchical Classification shows two well-defined classes.
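Two of the preprocessing steps described in this case lend themselves to a short illustration: the test of consistency (if {given set of AV} then index) and the transformation of a multi-valued table into a one-valued table. The following Python sketch rests on stated assumptions: the three specimens and their forms are hypothetical toy data (only the attribute names VAL and MOD and their values come from the slides), and the function names are illustrative rather than Semana's.

```python
from collections import defaultdict

# Hypothetical toy specimens: (grammatical form used as index, attribute -> value).
specimens = [
    ("form_A", {"VAL": "perfective", "MOD": "stop"}),
    ("form_B", {"VAL": "imperfective", "MOD": "keep"}),
    ("form_C", {"VAL": "perfective", "MOD": "stop"}),  # same AV set as form_A
]


def inconsistencies(specimens):
    """Return the AV sets that lead to more than one index."""
    forms_by_av = defaultdict(set)
    for index, av in specimens:
        forms_by_av[frozenset(av.items())].add(index)
    return {av: forms for av, forms in forms_by_av.items() if len(forms) > 1}


def unfold(specimens):
    """Turn the multi-valued table into a one-valued (0/1) table.

    Each (attribute, value) pair becomes one column; a cell is 1 when the
    specimen carries that value, 0 otherwise.
    """
    columns = sorted({(a, v) for _, av in specimens for a, v in av.items()})
    rows = [[1 if av.get(a) == v else 0 for a, v in columns] for _, av in specimens]
    return columns, rows


conflicts = inconsistencies(specimens)
columns, rows = unfold(specimens)

for av, forms in conflicts.items():
    print("if", sorted(av), "then", sorted(forms))
print(columns)
for (index, _), row in zip(specimens, rows):
    print(index, row)
```

A non-empty result from `inconsistencies` plays the role of the warning to the expert: several forms share exactly the same description, so the attributes probably need restructuring.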

Polish Aspect: Correspondence Factor Analysis

- A clear partition in two classes according to the attribute [VAL] = {perfective | imperfective}.
- The Guttman effect shows that attributes are sequentially ordered:
    attribute MCMP (morph. comp.): pip > ip > pp > pi > ii
    attribute MOD: parallel > sequential > trans > resume > stop > interrupt > keep > OffAndOn

[Figure: distribution of features along the perfective-to-imperfective path (% association with imperfective), for the attributes VAL, MCMP, CRE, MOD, ANA, ITS and TYP. Callout: all the features at the perfective end of the path imperatively require the perfective.]

Case 3: Images of the Woman in Palaeolithic Art
(Raphaëlle Bourrillon, PhD, Univ. Toulouse-Le Mirail)

- Customized DB-builder: for each figure, AV are selected with check-box buttons.
- CFA and HAC show three classes of representations: realist and slim; schematic / abstract; realist and fatty.

Detailed study of the schematic women representations

- CFA and HAC split the schematic feminine figures into five sub-classes.
- Formal concept analysis.

SEMANA: a bundle of tools for KDD research at hand in a single box

FROM PREPROCESSING
Building / Editing DB
- Structuration of AV
- Statistics
- AV edition (merging, splitting, etc.)
- Edition/conversion of tables in various formats

TO MINING
Complementary KDD procedures (RST, FCA ...) with special emphasis on the powerful tools of statistical data analyses (CFA, HAC)

with applications in many domains (within and out of Linguistics!)
