Participant Presentations
Today: 1:35 Benjamin Leinwand, 1:40 Keerthi Anand
Notes: Check Website For Schedule
Can Use USB Device Or Remote Log-In, Set Up Before Class
Have Title Page With: Your Name, Your Affiliation

Data Visualization
Overview of Visualization Methods:
Curve Views (FDA)
Scatterplot Matrices (Shows Relationships)
Marginal Distributions
Heat-Map Views

Heat-Map Views
Some Principles:

Population Structure Invisible w/ Random Order
Ordering Rows and Columns Really Matters
Clustering Columns & Rows is Crucial

Heat-Map Views
Some Principles: Color Scale is Also Critical
Toy Example: Distribution of Values, with Usual Jitter Plot + Smooth Histogram
Equal Bin Spacing
Many Small Values & Few Large Ones
Results in Very Poor Contrast

Heat-Map Views
Natural Approach (to Skewed Distribution): Log Transformation
Same Toy Example
Much Better Scaling, Using More of the Color Range
Much Better Contrast, Now See 4 Peaks
Can Look at Either Original Or Log View

Heat-Map Views
Natural Approach (to Skewed Distribution): Quantile Scaling (Equal #s in Each Bin)
Non-Equally Spaced Bins
Same Toy Example
Far More Contrast
Good or Bad? Peaks Less Prominent

Heat-Map Views
Color Choice When 0 Needs Highlighting
A Recommendation (of Many):
Important: Middle White Highlights 0
Shade Indicates Magnitude
Red for < 0? (Econ.) Or for > 0? (Climatology)
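A minimal sketch of how the three color-scale choices discussed for skewed heat-map values (equal bin spacing, log transformation, quantile scaling) assign values to color levels. The toy sample here is synthetic lognormal data, not the toy example from the slides, and the function names and the choice of 8 levels are illustrative assumptions.

```python
import numpy as np

def equal_width_levels(values, n_levels):
    """Color level (0 .. n_levels-1) per value, using equally spaced bins."""
    edges = np.linspace(values.min(), values.max(), n_levels + 1)
    # Only interior edges are passed, so the max value lands in the last level.
    return np.digitize(values, edges[1:-1])

def log_levels(values, n_levels):
    """Equally spaced bins on the log scale (values must be positive)."""
    return equal_width_levels(np.log(values), n_levels)

def quantile_levels(values, n_levels):
    """Non-equally spaced bins chosen so each level holds about the same count."""
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_levels + 1))
    return np.digitize(values, edges[1:-1])

# Hypothetical toy data: many small values and a few large ones.
rng = np.random.default_rng(0)
toy = rng.lognormal(mean=0.0, sigma=2.0, size=1000)
n_levels = 8

schemes = {
    "equal": equal_width_levels,
    "log": log_levels,
    "quantile": quantile_levels,
}
counts = {
    name: np.bincount(fn(toy, n_levels), minlength=n_levels)
    for name, fn in schemes.items()
}
for name, c in counts.items():
    print(f"{name:9s}", c)
```

With equal spacing, almost all values fall into the lowest level, which is the poor-contrast effect. The log scale spreads values over more levels, and quantile scaling uses every level about equally, at the cost of distorting how prominent the large values look.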

Heat-Map Views
Larger Example:
Will See OK For This Data
Simple Approach Here: Study 1st Rows
Looks Totally Random?
Have Done Both Column And Row Clustering

Heat-Map Views
Larger Example: Revisit With Scatterplot Matrix View
Conclude: Heat Map Worse Than Scatterplot In This Case
Colored 1st 100 & 2nd 100
Clear Cluster Structure Hidden by Noise in Heatmap View!
Is This Real (or Artifact)?

Heat-Map Views
Which Data View Is Best???
Simon Sheather Quote: Every Dog Has His Day

Scatterplot Views
Summarize Other Directions to Project On:
Classification Directions (e.g. DWD)
Independent Component Analysis
Fourier Basis Directions

Wavelet Basis Directions
Nonnegative Matrix Factorization
Might Discuss Later

Distance Methods
Given a metric $d$ on the Object Space
And Data Objects $X_1, \dots, X_n$
Define Distance Matrix: $D = \left( d(X_i, X_j) \right)_{i,j = 1, \dots, n}$

Distance Methods
Fréchet Sample Mean, Toy Example
Start With Candidate Point
Move Around to Min Sum of Squared Dists
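A minimal sketch of the two ideas above: build the distance matrix for a set of data objects, then find a Fréchet sample mean by starting at a candidate point and moving it around to minimize the sum of squared distances. The five 2-d points, the Euclidean metric, and the simple coordinate search are illustrative assumptions, not the toy example or the method used in the slides.

```python
import numpy as np

def distance_matrix(points):
    """n x n matrix of pairwise Euclidean distances between rows of `points`."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

def sum_squared_dists(candidate, points):
    """Fréchet objective: sum of squared distances from candidate to the data."""
    return float(((points - candidate) ** 2).sum())

def frechet_mean_by_search(points, start, step=1.0, tol=1e-6):
    """Move a candidate point along coordinate directions while the objective
    decreases; halve the step when no move helps (a simple pattern search)."""
    candidate = np.asarray(start, dtype=float)
    best = sum_squared_dists(candidate, points)
    dim = points.shape[1]
    directions = np.vstack([np.eye(dim), -np.eye(dim)])
    while step > tol:
        improved = False
        for direction in directions:
            trial = candidate + step * direction
            value = sum_squared_dists(trial, points)
            if value < best:
                candidate, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return candidate

# Hypothetical toy data: four points on a square plus one far-away point.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [10.0, 10.0]])
D = distance_matrix(points)
frechet_mean = frechet_mean_by_search(points, start=[0.0, 0.0])
print(frechet_mean, points.mean(axis=0))
```

For squared Euclidean distance the search lands on the ordinary sample mean. The same search runs with any other metric by swapping the distance inside `sum_squared_dists`.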

Distance Methods
Fréchet Sample Mean, Toy Example
Min Looks Like Sample Mean???
Recall Center Of Mass Interpretation
Note: Outlier Pulls Mean Out of Convex Hull Of Other 4 Points
Known Problem with Mean: Not Robust Against Outliers

Distance Methods
Fréchet Sample Median, Toy Example
Distance Depends on Coordinates, Called Manhattan Distance
Result Is Also Very Robust

Distance Methods
Recall General Approach: Replace Linear Ops with Optimization
Distance Based Version of PCA?

Distance Methods
Multi Dimensional Scaling (MDS)
Name & Psychometric References: Torgerson (1952, 1958), Gower (1966)
Earlier Versions of Ideas: Eckart & Young (1936), Young and Householder (1938)

Big Picture
Object Oriented Data Analysis
Have done detailed study of Data Objects In Euclidean Space
Next: OODA in Non-Euclidean Spaces

Shapes As Data Objects
Several Different Notions of Shape
Oldest and Best Known (in Statistics): Landmark Based

Shapes As Data Objects

Landmark Based Shape Analysis:
Kendall et al. (1999)
Bookstein (1991)
Dryden & Mardia (2016)
Recommended as Most Accessible

Landmark Based Shape Analysis
Start by Representing Shapes by Landmarks (points in $\mathbb{R}^2$ or $\mathbb{R}^3$)
$(x_1, y_1), (x_2, y_2), (x_3, y_3) \;\mapsto\; (x_1, y_1, x_2, y_2, x_3, y_3)^T \in \mathbb{R}^6$ (Feature Vector)

Landmark Based Shape Analysis
Approach: Identify objects that are:
Translations
Rotations
Scalings
of each other
Mathematics: Equivalence Relation
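A minimal sketch of what identifying landmark configurations up to translation, rotation and scaling can look like in practice. It centers each configuration, scales it to unit size, and then rotates one onto the other with an orthogonal Procrustes fit. The function names and the example triangle are illustrative assumptions, and this is one common way to do it, not necessarily the construction used in the references above.

```python
import numpy as np

def preshape(landmarks):
    """Remove translation (center at origin) and scaling (unit Frobenius norm)."""
    centered = landmarks - landmarks.mean(axis=0)
    return centered / np.linalg.norm(centered)

def align_rotation(shape, reference):
    """Rotate `shape` to best match `reference` (orthogonal Procrustes via SVD).
    The sign correction keeps a proper rotation, so reflections are not removed."""
    u, _, vt = np.linalg.svd(shape.T @ reference)
    d = np.sign(np.linalg.det(u @ vt))
    correction = np.diag([1.0] * (shape.shape[1] - 1) + [d])
    return shape @ (u @ correction @ vt)

def shape_distance(a, b):
    """Distance between two landmark sets after removing translation,
    scaling and rotation; zero when they are in the same equivalence class."""
    pa, pb = preshape(a), preshape(b)
    return float(np.linalg.norm(align_rotation(pa, pb) - pb))

# Hypothetical example: a triangle and a translated, rotated, scaled copy.
triangle = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
moved = 3.0 * triangle @ rot.T + np.array([5.0, -2.0])
mirrored = triangle * np.array([1.0, -1.0])

print(shape_distance(triangle, moved))     # same shape
print(shape_distance(triangle, mirrored))  # mirror image, treated as different
```

The feature vector in $\mathbb{R}^6$ is just `triangle.ravel()`; the functions above work on the (landmarks x coordinates) layout because centering and rotation are easier to write that way.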

Equivalence Relations
Deeper Example: Group Transformations
Mathematical Terminology: Quotient Operation
Set of Equiv. Classes = Quotient Space
Denoted

Landmark Based Shape Analysis
Approach: Identify objects that are:
Translations
Rotations
Scalings
of each other
Mathematics: Group of Transformations
Results in: Equivalence Relation
Equivalence Classes (i.e. Orbits)
Which become the Data Objects

Landmark Based Shape Analysis
Equivalence Classes become Data Objects
Mathematics: Called Quotient Space
Intuitive Representation: Manifold (curved surface)

Landmark Based Shape Analysis
Triangle Shape Space: Represent as Sphere
Translation, Rotation, Scaling (thanks to Wikipedia)
Equilateral Triangles
Co-Linear Point Triples
Hemispheres Are Reflections

OODA in Image Analysis
First Generation Problems:
Denoising
Segmentation
Registration
(all about single images, still interesting challenges)

OODA in Image Analysis
Second Generation Problems:
Populations of Images
Understanding Population Variation
Discrimination (a.k.a. Classification)
Complex Data Structures (& Spaces)
HDLSS Statistics (High Dimension, Low Sample Size)

Image Object Representation
Major Approaches for Image Data Objects:
Landmark Representations
Boundary Representations
Skeletal Representations

Landmark Representations
Landmarks for Fly Wing Data: Thanks to George Gilchrist

Landmark Representations
Major Drawback of Landmarks:
Need to always find each landmark
Need same relationship
I.e. Landmarks need to correspond
Often fails for medical images
E.g. How many corresponding landmarks on a set of kidneys, livers or brains???

Boundary Representations
Traditional Major Sets of Ideas:
Triangular Meshes, Survey: Owen (1998)
Active Shape Models, Cootes, et al (1993)
Fourier Boundary Representations, Kelemen, et al (1997 & 1999)

Boundary Representations
Example of triangular mesh repn: From: www.geometry.caltech.edu/pubs.html

Boundary Representations
Main Drawback: Correspondence
For OODA (on vectors of parameters): Need to match up points
Easy to find triangular mesh (Lots of research on this driven by gamers)
Challenge to match mesh across objects

Boundary Representations
Correspondence for Mesh Objects:
1. Active Shape Models (PCA like)
2. Automatic Landmark Choice, Cates, et al (2007)
Based on Optimization Problem: Good Correspondence & Separation (Formulate via Entropy)

Skeletal Representations
Main Idea: Represent Objects as:
Discretized skeletons (medial atoms)
Plus spokes from center to edge
Which imply a boundary
Very accessible early reference: Yushkevich, et al (2001)

Skeletal Representations
2-d S-Rep Example: Corpus Callosum (Yushkevich)
Atoms
Spokes
Implied Boundary
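A minimal sketch of the atoms, spokes and implied boundary idea in 2-d: each skeletal atom carries spokes (a direction and a length), and the spoke tips are the implied boundary points. The array layout, the function name, and the tiny three-atom object are hypothetical; real s-reps carry more structure than this.

```python
import numpy as np

def implied_boundary(atoms, spoke_angles, spoke_lengths):
    """Boundary points implied by a discretized 2-d skeleton.

    atoms:         (n, 2) skeletal positions
    spoke_angles:  (n, k) spoke directions in radians
    spoke_lengths: (n, k) spoke lengths (radii)
    returns:       (n, k, 2) spoke tips, i.e. atom + length * unit direction
    """
    directions = np.stack([np.cos(spoke_angles), np.sin(spoke_angles)], axis=-1)
    return atoms[:, None, :] + spoke_lengths[..., None] * directions

# Hypothetical object: three atoms on the x-axis, each with an up and a down spoke.
atoms = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
spoke_angles = np.tile([np.pi / 2, -np.pi / 2], (3, 1))
spoke_lengths = np.array([[0.5, 0.5], [1.0, 1.0], [0.5, 0.5]])

boundary = implied_boundary(atoms, spoke_angles, spoke_lengths)
print(boundary.reshape(-1, 2))
```

The parameters that would go into a statistical analysis are then the atom locations, the spoke lengths and the spoke angles, rather than the boundary points themselves.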

A Challenging Example
Male Pelvis: Bladder, Prostate, Rectum
Prostate: Attached to the Bladder, Common Area for Cancer in Males
Useful Approach: Radiation
Challenge: Design Treatment
Hit Prostate
Miss Bladder & Rectum
Over Many Days

A Challenging Example
Radiation Treatment in Male Pelvis: Bladder, Prostate, Rectum
Central Question: How do they move over time (days)?
Work with 3-d CT (Computed Tomography, = 3d version of X-ray)
Very Challenging to Segment
Find boundary of each object?
Represent each Object?

Male Pelvis Raw Data
One CT Slice (in 3d image)
Like X-ray: White = Dense (Bone), Black = Gas
Tail Bone, Rectum, Bladder, Prostate

Male Pelvis Raw Data
Bladder: manual segmentation
Slice by slice
Reassembled

Male Pelvis Raw Data
Bladder: Slices: Reassembled in 3d
How to represent?
Thanks: Ja-Yeon Jeong

Object Representation
Above Shape Approaches:
Landmarks (hard to find)
Boundary Repns (no correspondence)
Skeletal Representations

Skeletal Representations
3-d S-Rep Example: From Ja-Yeon Jeong
Bladder - Prostate - Rectum

Skeletal Representations
3-d S-reps: there are several variations
Two choices: From Fletcher (2004)

Skeletal Representations
Detailed discussion of mathematics of S-reps: Siddiqi & Pizer (2008)
More Applications of S-reps in Imaging: Pizer & Marron (2017)

Skeletal Representations
Statistical Challenge
S-rep parameters are:
Locations
Radii (> 0)
Angles
(not comparable)
Stuffed into a long vector
I.e. many direct products of these
Appropriate View: Data Lie on Curved Manifold
Embedded in Higher Dimensional Euclidean Space

3-d s-reps
S-rep model fitting

Easy, when starting from binary (blue)
But very expensive (30-40 minutes of technician's time)
Want automatic approach
Challenging, because of poor contrast, noise, ...
Need to borrow information across training sample
Use Bayes approach: prior & likelihood → posterior
(A surrogate for anatomical knowledge)
~Conjugate Gaussians (Embarrassingly Straightforward?), but there are issues:
Major HDLSS challenges
Manifold aspect of data
Handle With Variation on PCA
Careful Handling Very Useful

3-d s-reps
S-rep model fitting
Very Successful, Jeong (2009)
Basis of Startup Company: Morphormics
Since Purchased By Accuray

Mildly Non-Euclidean Spaces
Statistical Analysis of S-rep Data
Recall: Many direct products of: Locations, Radii, Angles
Useful View: Data Objects on Curved Manifold
Data in non-Euclidean Space
But only mildly non-Euclidean

Data Lying On a Manifold
Major issue: s-reps live in a product space (locations, radii and angles)
Note on Terminology: Manifold Data vs. Manifold Learning
Manifold Data: Data Naturally Lie on Known Manifold
Manifold Learning: Try to Find Low-d Approximating Manifold

Data Lying On a Manifold
Major issue: s-reps live in a product space (locations, radii and angles)
E.g. average of: ???
Should Use Unit Circle Structure
Natural Data Space is: Smooth, Curved Manifold (Differential Geometry)

Manifold Feature Spaces
Standard Statistical Example: Directional Data (aka Circular Data)
Idea: Angles as Data Objects
Wind Directions
Magnetic Compass Headings
Cracks in Mines
Classical References: Mardia (2014), Fisher (1995), Jammalamadaka & SenGupta (2001)
Reasonable View: Points on Unit Circle

Manifold Feature Spaces
Main Idea: Curved Surface, With Approximating Tangent Plane At Each Point (In Limit of Shrinking Neighborhoods)

Manifold Feature Spaces
Important Mappings:
Plane → Surface
Important Point: Common Length (along surface)
Surface → Plane

Manifold Feature Spaces
Log & Exp Memory Device: Complex Numbers, $i\theta \mapsto e^{i\theta}$
Exponential: Tangent Plane → Manifold (Note: Common Length)
Logarithm: Manifold → Tangent Plane
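A minimal sketch of the exponential and logarithm maps, assuming the manifold is the unit sphere in $\mathbb{R}^3$ (the slides do not fix a particular manifold here). The exponential takes a tangent vector at a base point to the sphere, the logarithm goes back, and the length of the tangent vector equals the distance travelled along the surface. Function names are illustrative.

```python
import numpy as np

def exp_map(base, tangent):
    """Tangent plane -> sphere. `base` is a unit vector and `tangent` is
    orthogonal to it; walk a distance |tangent| along the great circle."""
    length = np.linalg.norm(tangent)
    if length < 1e-12:
        return base.copy()
    return np.cos(length) * base + np.sin(length) * tangent / length

def log_map(base, point):
    """Sphere -> tangent plane at `base`. Returns the tangent vector whose
    length is the arc length from `base` to `point`. For `point` equal or
    antipodal to `base` the direction is undefined and zero is returned."""
    cos_angle = np.clip(np.dot(base, point), -1.0, 1.0)
    residual = point - cos_angle * base
    res_norm = np.linalg.norm(residual)
    if res_norm < 1e-12:
        return np.zeros_like(base)
    return np.arccos(cos_angle) * residual / res_norm

# Hypothetical example at the north pole.
base = np.array([0.0, 0.0, 1.0])
tangent = np.array([0.3, -0.4, 0.0])   # length 0.5, orthogonal to base

point = exp_map(base, tangent)
recovered = log_map(base, point)
arc_length = np.arccos(np.clip(np.dot(base, point), -1.0, 1.0))
print(point, recovered, arc_length)
```

On the unit circle the same pair of maps is the complex exponential and its inverse, which is the memory device above.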

Manifold Feature Spaces
Important Mappings:
Plane → Surface
Surface → Plane
(matrix versions)

Manifold Feature Spaces
Natural Choice of Tangent Point For Data Analysis: A Centerpoint
Hard To Use: Usual Mean $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

Manifold Feature Spaces
Extrinsic Centerpoint: Compute $\bar{X}$ Anyway And Project Back To Manifold

Manifold Feature Spaces
Intrinsic Centerpoint: Work Really Inside The Manifold

Manifold Feature Spaces
Recall Useful Center in Metric Spaces: Fréchet Mean, Fréchet (1948)
Works in Any Metric Space (e.g. Manifolds)

Manifold Feature Spaces
Fréchet Mean of Numbers: $\bar{X} = \arg\min_{x} \sum_{i=1}^{n} (X_i - x)^2$
Fréchet Mean in Euclidean Space ($\mathbb{R}^d$): $\bar{X} = \arg\min_{x} \sum_{i=1}^{n} \| X_i - x \|^2 = \arg\min_{x} \sum_{i=1}^{n} d(X_i, x)^2$
(Intrinsic) Fréchet Mean on a Manifold: Replace Euclidean $d$ by Geodesic $d$

Manifold Feature Spaces
Geodesics: Idea: March Along Manifold Without Turning (Defined in Tangent Plane)
E.g. Surface of the Earth: Great Circle
E.g. Lines of Longitude (Not Latitude)

Manifold Feature Spaces
Geodesic Distance: Given Points $x$ & $y$, define $d(x, y)$ = length of the shortest geodesic from $x$ to $y$
Can Show: $d$ is a metric (distance)
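A minimal sketch of geodesic distance, assuming the manifold is a unit sphere standing in for the surface of the Earth. It computes great-circle distance from latitude and longitude, and checks numerically that travelling along a line of latitude can be longer than the geodesic. The function names and the test points are illustrative assumptions.

```python
import numpy as np

def to_unit_vector(lat_deg, lon_deg):
    """Point on the unit sphere from latitude and longitude in degrees."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

def geodesic_distance(u, v):
    """Great-circle distance between unit vectors: the angle between them."""
    return float(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))

# Two points on the same line of longitude: the meridian is the geodesic,
# so the distance is just the latitude difference (40 degrees, in radians).
same_longitude = geodesic_distance(to_unit_vector(10, 30), to_unit_vector(50, 30))

# Two points on the 60-degree latitude circle, half way around from each other.
a, b = to_unit_vector(60, 0), to_unit_vector(60, 180)
over_the_pole = geodesic_distance(a, b)            # geodesic goes over the pole
along_latitude = np.cos(np.radians(60)) * np.pi    # half of the latitude circle

print(same_longitude, over_the_pole, along_latitude)

# Spot check of the triangle inequality on random points.
rng = np.random.default_rng(1)
pts = rng.normal(size=(50, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
triangle_ok = all(
    geodesic_distance(p, r) <= geodesic_distance(p, q) + geodesic_distance(q, r) + 1e-12
    for p, q, r in zip(pts[:-2], pts[1:-1], pts[2:])
)
```

The random spot check is only a sanity check of the metric property, not a proof.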

Manifold Feature Spaces
Extrinsic vs. Intrinsic Centerpoints
Extrinsic: Mean in Ambient Space & Project Back to Manifold
Intrinsic: E.g. Fréchet Mean
Compare on Unit Circle:
Often Similar
Note Big Difference!

Manifold Feature Spaces
Extrinsic vs. Intrinsic Centerpoints
Which is Better?
Intrinsic More Like Mean?
Intrinsic More Stable Shifting a Green Down
Extrinsic More Stable Shifting Greens Left
Problem as Hard as Center of Bimodal Distribution (Median, Mean)
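A minimal sketch comparing the two centerpoints on the unit circle. The extrinsic version averages the points in the plane and projects back to the circle; the intrinsic version is a Fréchet mean that minimizes the sum of squared arc distances, found here by a plain grid search. The function names, the grid search, and both sets of angles are illustrative assumptions, not the examples pictured in the slides.

```python
import numpy as np

TWO_PI = 2.0 * np.pi

def arc_distance(a, b):
    """Geodesic distance on the unit circle between angles in radians."""
    diff = np.abs(a - b) % TWO_PI
    return np.minimum(diff, TWO_PI - diff)

def extrinsic_mean(angles):
    """Average the points in the plane, then project back to the circle."""
    angle = np.arctan2(np.sin(angles).mean(), np.cos(angles).mean())
    return float(angle % TWO_PI)

def intrinsic_mean(angles, grid_size=3600):
    """Fréchet mean by grid search: minimize the sum of squared arc distances."""
    grid = np.linspace(0.0, TWO_PI, grid_size, endpoint=False)
    cost = (arc_distance(grid[:, None], angles[None, :]) ** 2).sum(axis=1)
    return float(grid[np.argmin(cost)])

# Tight cluster straddling 0 degrees: the arithmetic average of the numbers
# is 180 degrees, while both circle-based centerpoints sit at 0 degrees.
cluster = np.radians([350.0, 355.0, 5.0, 10.0])
naive_average = float(np.degrees(cluster).mean())
cluster_extrinsic = extrinsic_mean(cluster)
cluster_intrinsic = intrinsic_mean(cluster)

# More spread out sample: the two centerpoints no longer agree.
spread = np.radians([0.0, 0.0, 120.0])
spread_extrinsic = extrinsic_mean(spread)   # about 30 degrees
spread_intrinsic = intrinsic_mean(spread)   # about 40 degrees

print(naive_average, np.degrees(cluster_extrinsic), np.degrees(cluster_intrinsic))
print(np.degrees(spread_extrinsic), np.degrees(spread_intrinsic))
```

The grid search returns only one minimizer, so it says nothing about whether the Fréchet mean is unique for a given sample.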

Manifold Feature Spaces
Directional Data Examples of Fréchet Mean:
Not always easily interpretable
Think about distances along arc
Not about points in $\mathbb{R}^2$
Sum of squared distances strongly feels the largest
Not always unique
But unique with probability one
Non-unique requires strong symmetry
But possible to have many means

Participant Presentations
Benjamin Leinwand: Longitudinal Brain Scans
Keerthi Anand: Blind Source Separation