The Intersection of Statistical Physics and Machine Learning

Living in Higher Dimensions


Sir David MacKay revolutionised Machine Learning. There’s no question about it. His depth of knowledge was clear from his research, his selflessness, and his groundbreaking work on Information Theory. Both the fields of Gaussian Processes and Neural Networks owe him a lot.

During my studies I came across the following question in his book, Information Theory, Inference, and Learning Algorithms, which, unbeknownst to me at the time, would revolutionise my way of thinking about Machine Learning.

The question itself is nothing more than a graduate-level maths question, but some things register in different ways. In particular, I had always visualised probability as a 2- or at most 3-dimensional problem. This was wrong.

Multivariate distributions are hard to understand and harder to visualise, but imagine that the density over some ‘space’ or ‘object’ is uniform; that is, probability mass is spread evenly throughout the space. From here, the problem posits that:

Probability distributions and volumes have some unexpected properties in high-dimensional spaces.


Consider a sphere of radius r in an N-dimensional real space. Show that the fraction f of the volume of the sphere that is in the surface shell lying at values of the radius between r − ϵ and r, where 0 < ϵ < r, is:

f = 1 − (1 − ϵ/r)^N

Further, evaluate f for N = 2, 10, 1000 and for ϵ/r = 0.01, 0.5.
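Working through the numbers by hand is tedious, so here is a short sketch (the function name is my own) that evaluates the closed-form fraction f = 1 − (1 − ϵ/r)^N for the values the exercise asks for:

```python
def shell_fraction(n, eps_over_r):
    # Fraction of an N-ball's volume in the outer shell of relative thickness eps/r:
    # f = 1 - (1 - eps/r)**N, which follows because volume scales as r**N.
    return 1.0 - (1.0 - eps_over_r) ** n

for n in (2, 10, 1000):
    for ratio in (0.01, 0.5):
        print(f"N={n:4d}, eps/r={ratio}: f={shell_fraction(n, ratio):.6f}")
```

Even a 1% shell swallows essentially all of the volume by N = 1000.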

Now there are two points to understand thus far:

  1. The density in this sphere is the density of a probability distribution. Just as you look at a bell curve in two dimensions, project that image onto a ball: the volume inside the ball now represents probability mass
  2. The question effectively gives you the answer: as the number of dimensions increases, the proportion of the density concentrated in the ‘shell’ of the sphere increases, implying that more and more of the information is stored near the boundary

Asymptotically, we need to think about how the equation would look with N dimensions. To get there, let’s first look at the formulas for the volume of a sphere in low dimensions:

  1. N = 1: 2r
  2. N = 2: πr²
  3. N = 3: (4/3)πr³
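The low-dimensional formulas above are all instances of the general pattern V_N(r) = π^(N/2) / Γ(N/2 + 1) · r^N. A small sketch (the function name is my own) that checks the general formula reproduces the familiar cases:

```python
from math import pi, gamma

def ball_volume(n, r=1.0):
    # Volume of an n-dimensional ball of radius r:
    # V_n(r) = pi^(n/2) / Gamma(n/2 + 1) * r^n
    return pi ** (n / 2) / gamma(n / 2 + 1) * r ** n

# n=1 gives 2r, n=2 gives pi*r^2, n=3 gives (4/3)*pi*r^3
for n in (1, 2, 3):
    print(f"V_{n}(1) = {ball_volume(n):.6f}")
```

The constant in front depends only on N; what matters for the shell argument is the r^N factor.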

So we can see that, roughly speaking, the volume of the N-dimensional sphere scales as the radius raised to the power of the number of dimensions:

V_N(r) = k_N · r^N

where k_N is a constant depending only on N.

So if we want to calculate the volume of a ‘shell’ or ‘surface’ layer, that equates to the difference between this and a smaller sphere of radius r − ϵ:

V_shell = k_N · r^N − k_N · (r − ϵ)^N

And the proportion of the volume stored in this residual is, as in traditional settings, the shell volume over the total volume, (b − a)/b:

f = (k_N · r^N − k_N · (r − ϵ)^N) / (k_N · r^N) = 1 − (1 − ϵ/r)^N

Now the ratio ϵ/r reflects the proportion of the radius that is classed as the shell. We see that in two dimensions, if 1% of the radius is in the surface, then roughly 2% of the density is stored within the surface. However, if we move to a space of 1000 dimensions, then 99.996% of the density is stored in that same thin surface.
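This concentration is easy to see empirically. Below is a rough Monte Carlo sanity check of my own (not from the exercise): it draws points uniformly from the unit 1000-ball, using the standard trick of an isotropic Gaussian direction scaled by a U^(1/N) radius, and counts how many land in the outer 1% shell.

```python
import math
import random

def sample_in_ball(n, rng):
    # Uniform point in the unit n-ball: isotropic direction times U^(1/n) radius
    g = [rng.gauss(0, 1) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    radius = rng.random() ** (1.0 / n)
    return [radius * x / norm for x in g]

rng = random.Random(0)
n, eps, trials = 1000, 0.01, 2000
in_shell = sum(
    1 for _ in range(trials)
    if math.sqrt(sum(x * x for x in sample_in_ball(n, rng))) > 1 - eps
)
print(f"empirical shell fraction: {in_shell / trials:.4f}")  # theory predicts ~0.99996
```

Virtually every sampled point lands in the shell, matching the closed-form answer.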


Now think about this a little bit more. We agree that the density within the n-sphere is equivalent to a probability density. Also, as the number of dimensions increases, the density in the shell increases. What does this teach us about Machine Learning?

It teaches us that if we want to better model multivariate phenomena in higher dimensions, then we need to model the boundaries between these variables, because that’s where the information lives.

There’s no point modelling just one dimension, because the information captured by modelling one dimension diminishes as the number of dimensions increases. There’s also no point modelling just two dimensions, because that still ignores the vast majority of the density. This is why Neural Networks and other high-dimensional methods work so well: they take into account information from all variables and can, therefore, model the intersection much better.

Given that, I hope that as a reader you can appreciate why we want to model the intersection of variables, rather than being happy with just modelling one or two variables that ‘seem important’.

Identifying key features is still important because, often, the density isn’t uniformly distributed; however, we should appreciate that the intersection of these variables is where a lot of the juice lies.

Thanks for reading again!! Let me know if you have any questions and I’ll be happy to help.

Keep up to date with my latest work here!
