Which Spatial Partition Trees Are Adaptive to Intrinsic Dimension?
Nakul Verma, Samory Kpotufe, and Sanjoy Dasgupta
{naverma, skpotufe, dasgupta}@ucsd.edu
University of California, San Diego
Local covariance dimension
Why is this important?
Spatial trees are at the heart of many machine learning tasks (e.g. regression, near neighbor search, vector quantization).
However, they tend to suffer from the curse of dimensionality: the rate
at which the diameter of the data decreases as we go down the tree
depends on the dimension of the space. In particular, we might require
partitions of size O(2D) to attain small data diameters.
Fortunately, many real world data have low intrinsic dimension (e.g.
manifolds, sparse datasets), and we would like to benefit from such
situations.
Theoretical Guarantees
A set S D is said to have local covariance dimension (d, ) if the
largest d eigenvalues of its covariance matrix satisfy:
1
2
1
2
d
2
1
2
D
d=2
Diameter decrease rate (k): Smallest k such that data diameter is halved
every k levels.
We show that:
k O d
RP tree:
number of levels
PD tree:
d=3
d=1
Empirical estimates of local covariance dimension
2M tree:
k O
k O min
2 2
1
d ,
2
i
2
i
2 2
1
needed to halve
the diameter
( k)
dyadic tree / kd tree: k O D log D
Axis parallel splitting rules (dyadic / kd tree) dont always adapt to intrinsic
dimension; the upper bounds have matching lower bounds.
On the other hand, the irregular splitting rules (RP / PD / 2M trees) always
adapt to intrinsic dimension. They therefore tend to perform better on real
world tasks.
We show that
Trees such as RPTree, PDTree and 2-MeansTree adapt to the intrinsic
dimension of the data in terms of the rate at which they decrease diameter
down the tree.
This has strong implications on the performance of these trees on the
various learning tasks they are used for.
Experiments
Vector Quantization
Some real world data with low intrinsic dimension
Rotating teapot. One degree
of freedom (rotation angle).
Movement of a robotic arm.
Two degrees of freedom.
one for each joint.
Loc. cov. dim. estimate at different scales for some real-world datasets.
Spatial Partition Trees
Handwritten characters. The
tilt angle, thickness, etc.
govern the final written form.
Speech. Few anatomical characteristics govern the spoken
phonemes.
Hand gestures in Sign Language. Few
gestures can follow other gestures.
Standard characterizations of intrinsic dimension
Common notions of intrinsic dimension (e.g. Box dimension, Doubling
dimension, etc.) originally emerged from fractal geometry. They,
however, have the following issues in the context of machine learning:
These notions are purely geometrical and dont account for the underlying
distribution.
They are not robust to distributional noise: e.g. for a noisy manifold, these
dimensions can be very high.
They are difficult to verify empirically.
Need a more statistical notion of intrinsic dimension that
characterizes the underlying distribution, is robust to noise, and is
easy to verify for real world datasets.
Builds a hierarchy of nested partitions of the data space by recursively
bisecting the space.
dyadic tree
kd tree
rp tree
Quantization error of test data at different levels for various partition trees
(built using separate training data). 2-means and PD trees perform the best.
Nearest Neighbor
Level 1
Level 2
The trees we consider:
dyadic tree: Pick a coordinate direction and split the data at the mid point
along this direction.
Quality of the found neighbor at various levels of the partition trees.
Regression
kd tree: Pick a coordinate direction and split the data at the median along
this direction.
RP tree: Pick a random direction and split the data at the median along
this direction.
PCA/PD tree: Split the data at the median along the principal direction.
2Means tree: Compute the 2-means solution, and split the data as per the
cluster assignment.
l2 regression error in predicting the rotation angle at different tree levels.
All experiments are done with 10-fold cross validation.