Living in Higher Dimensions
Sir David MacKay revolutionised Machine Learning. There’s no question about it. His breadth of knowledge was clear from his research, his selflessness, and his groundbreaking work on Information Theory. Both the fields of Gaussian Processes and Neural Networks owe him a lot.
During my studies I came across the following question in his book which, unbeknownst to me at the time, would revolutionise my way of thinking about Machine Learning.
The question itself is nothing more than a graduate-level maths question, but some things register in different ways. In particular, I had always visualised probability as a two-, or at most three-dimensional, problem. This was wrong.
Multivariate problems are hard to understand and harder to visualise, but imagine that the density in some ‘space’ or ‘object’ follows a uniform distribution. That is, density is spread evenly throughout the space. From here, the problem poses the following:
Consider a sphere of radius r in an N-dimensional real space. Show that the fraction f of the volume of the sphere that is in the surface shell lying at values of the radius between r − ϵ and r, where 0 < ϵ < r, is:

f = 1 − (1 − ϵ/r)^N

Further, evaluate the function for N = 2, 10, 1000 and for ϵ/r = 0.01.
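To put numbers on this before working through it, here is a quick sketch in Python (the helper name is my own, not from the book) that evaluates the fraction for the requested dimensions at ϵ/r = 0.01:

```python
# Fraction of an N-dimensional sphere's volume lying in the outer
# shell of relative thickness eps_over_r: f = 1 - (1 - eps/r)**N.
def shell_fraction(n_dims, eps_over_r):
    return 1.0 - (1.0 - eps_over_r) ** n_dims

for n in (2, 10, 1000):
    print(f"N = {n:4d}: f = {shell_fraction(n, 0.01):.5f}")
```

Even before any derivation, the trend is striking: the same 1% shell holds almost none of the volume in two dimensions and almost all of it in a thousand.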
Now there are two points to understand thus far:
- The density in this sphere is the density of a probability distribution. Just as you look at a bell curve in two dimensions, project that image onto a ball: the volume inside the ball is your probability distribution
- The question actually gives you the answer, in that, as the number of dimensions increases, the proportion of the density held in the ‘shell’ increases, implying that more of the information is stored towards the boundary, at the intersections of the variables
Asymptotically, we need to think about how the equation would look in N dimensions. Before that, let’s first look at the equations for the volume of a sphere in low dimensions: V₂(r) = πr² for a circle and V₃(r) = (4/3)πr³ for a ball.
So we can see that, roughly speaking, the volume of an N-dimensional sphere scales with the radius raised to the power of the number of dimensions, so:

V_N(r) = c_N · r^N

where c_N is a constant that depends only on N.
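As an aside, the constant in front of r^N has a closed form, V_N(r) = π^(N/2)/Γ(N/2 + 1) · r^N. A small check (the function name is my own) confirms it recovers the familiar low-dimensional formulas:

```python
import math

# Exact volume of an N-ball of given radius:
# V_N(r) = pi**(N/2) / Gamma(N/2 + 1) * r**N
def ball_volume(n_dims, radius=1.0):
    return math.pi ** (n_dims / 2) / math.gamma(n_dims / 2 + 1) * radius ** n_dims

print(ball_volume(2))  # area of the unit circle: pi
print(ball_volume(3))  # volume of the unit ball: (4/3) * pi
```

The exact constant won’t matter below, because it cancels when we take ratios.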
So if we want to calculate a ‘shell’ or ‘surface’ volume, that equates to the difference between this sphere and a slightly smaller one:

V_N(r) − V_N(r − ϵ) = c_N · (r^N − (r − ϵ)^N)
And the proportion of the volume stored in this residual is calculated as a simple ratio, with the constant c_N cancelling:

f = (r^N − (r − ϵ)^N) / r^N = 1 − (1 − ϵ/r)^N
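If you don’t trust the algebra, a quick Monte Carlo sanity check agrees with it (a sketch with my own naming): for a point uniform inside the unit N-ball, the CDF of its radius is r^N, so drawing r = u^(1/N) from uniform u gives the right radial distribution.

```python
import random

# Estimate the fraction of a unit N-ball's volume in the outer shell
# of thickness eps by sampling radii with CDF r**N.
def shell_fraction_mc(n_dims, eps, n_samples=200_000, seed=0):
    rng = random.Random(seed)
    hits = sum(rng.random() ** (1.0 / n_dims) > 1.0 - eps
               for _ in range(n_samples))
    return hits / n_samples

print(shell_fraction_mc(10, 0.01))  # close to 1 - 0.99**10, about 0.0956
```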
Now the ratio ϵ/r reflects the proportion of the radius that is classed as the shell. We see that in two dimensions, if 1% of the radius is in the surface shell, then roughly 2% of the density is stored within it. However, if we move to a space of 1000 dimensions, then 99.996% of the density is stored in the shell.
Now think about this a little bit more. We agree that the density within the N-sphere is equivalent to a probability density. Also, as the number of dimensions increases, the density in the shell increases. What does this teach us about Machine Learning?
It teaches us that if we want to better model multivariate phenomena in higher dimensions, then we need to model the boundaries between these variables because that’s where the information exists.
There’s no point modelling just one dimension, because the information captured by a single dimension diminishes as the number of dimensions increases. There’s also no point modelling just two dimensions, because that still ignores the vast majority of the density. This is why Neural Networks and other higher-dimensional methods work so well: they take into account information from all variables and can, therefore, model the intersections much better.
Given that, I hope that as a reader you can appreciate why we want to model the intersection of variables, rather than being happy with just modelling one or two variables that ‘seem important’.
Identifying key features is still important because, often, the density isn’t uniformly distributed. However, we should appreciate that the intersection of these variables is where a lot of the juice lies.
Thanks for reading again!! Let me know if you have any questions and I’ll be happy to help.
Keep up to date with my latest work here!