Photo by Fotis Fotopoulos on Unsplash
Streamlit Is The Game Changing Python Library That We’ve Been Waiting For
Developing a user interface is not easy. I’ve always been a mathematician and, for me, coding was a functional tool to solve an equation and to create a model, rather than a way of providing the user with an experience. I’m not artsy and nor... Continue Reading →

Pearson’s Correlation reflects the dispersion of a linear relationship (top row), but not the angle of that relationship (middle), nor many aspects of nonlinear relationships (bottom). [Source]
How Data Scientists can get the most out of this statistic
People are quite familiar with the colloquial usage of the term ‘correlation’: that it tends to resemble a phenomenon... Continue Reading →
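The caption's point can be checked numerically. The following is a minimal sketch, not code from the post: the helper name `pearson_r` and the synthetic data are assumptions made for illustration. It shows that the coefficient is the same for a shallow and a steep line, and close to zero for a symmetric parabola even though y is fully determined by x.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two 1-D samples (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centre both samples
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Illustrative data: evenly spaced points, symmetric around zero
x = np.linspace(-1.0, 1.0, 201)

r_shallow = pearson_r(x, 0.1 * x)   # shallow line
r_steep = pearson_r(x, 10.0 * x)    # steep line
r_parabola = pearson_r(x, x ** 2)   # nonlinear, symmetric relationship

print(r_shallow, r_steep, r_parabola)
```

The two linear cases give the same coefficient because the slope cancels in the ratio, and the parabola scores near zero because its left and right halves offset each other.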

Figure 0: Pair Plot using Seaborn [more information]
4 Reasons Why and 3 Examples How
import seaborn as sns
Finding a pattern can sometimes be the easy bit when researching, so let’s be honest: conveying a pattern to the team or your customers is sometimes a lot more difficult than it should be. Not only are some patterns hard to... Continue Reading →

Pareto’s Power-Law Distribution Explaining the Laws of Nature (Including the Golden Ratio)
The laws of nature are complicated and, throughout time, scientists from all corners of the world have attempted to model and re-engineer what they see around them to extract some value from it. Quite often we see a pattern that comes up time and time... Continue Reading →

Probability Density Function for the Student t-Distribution.
For the Sake of Statistics, Forget the Normal Distribution.
To be clear: this is targeted at Data Scientists/Machine Learning Researchers and not at Physicists.
Statistical normality is overused. It’s not that common and only really occurs in the impractical ‘limits’ [2][3][4]. To garner normality, you need to have substantial well-behaved independent... Continue Reading →

Anomalies hidden in plain sight. Chart from Liu and Neilson (2016)
Methods that Data Scientists Should Love
A robust statistic is a type of estimator used when the distribution of the data set is not certain, or when egregious anomalies exist. If we’re confident in the distributional properties of our data set, then traditional statistics like the Sample Mean... Continue Reading →
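As a rough illustration of that definition, the sketch below compares the sample mean with the sample median, a standard robust estimator of location, before and after one egregious anomaly is added. The numbers are made up for the example and do not come from the post or from Liu and Neilson (2016).

```python
import numpy as np

# Illustrative measurements clustered around 10 (assumed data)
clean = np.array([9.8, 10.1, 10.0, 9.9, 10.2])

# The same sample with a single egregious anomaly appended
contaminated = np.append(clean, 1000.0)

# How far does each estimator move when the anomaly is introduced?
mean_shift = abs(contaminated.mean() - clean.mean())
median_shift = abs(np.median(contaminated) - np.median(clean))

print(f"mean moved by   {mean_shift:.2f}")
print(f"median moved by {median_shift:.2f}")
```

One bad point drags the mean far from the bulk of the data, while the median barely moves.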

OLS Regression on sample data
Details, details: it’s all about the details!
Ordinary Least Squares (OLS) is usually the first method every student learns as they embark on a journey of statistical euphoria. It’s a method that quite simply finds the line of best fit within a two-dimensional dataset. Now the assumptions behind the model, along with... Continue Reading →
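To make "line of best fit" concrete, here is a minimal sketch that fits a straight line by least squares with NumPy. The synthetic data, its true slope and intercept, and the random seed are all assumptions chosen for illustration; they are not the sample data shown in the post.

```python
import numpy as np

# Synthetic two-dimensional dataset: y = 2x + 1 plus Gaussian noise (assumed)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Design matrix with a column of ones so the fit includes an intercept
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution: minimises the sum of squared residuals ||y - X @ beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

The fitted coefficients should land near the values used to generate the data; `np.polyfit(x, y, 1)` solves the same problem and returns the slope first.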

Euler’s Infinity
Infinity (and beyond…)
The study of asymptotic distributions looks to understand how the distribution of a phenomenon changes as the number of samples taken into account grows, n → ∞. Say we’re trying to make a binary guess on where the stock market is going to close tomorrow (like a Bernoulli trial): how does the... Continue Reading →

One-sided test on a distribution that is shaped like a Bell Curve. [Image from Jill Mac from Source (CC0)]
All Machine Learning Researchers should know this
Most machine learning and mathematical problems involve extrapolating from a subset of data to infer something about a global population. As an example, we may only get 100 replies on a survey to our... Continue Reading →

Visualising the principal components of portrait facial images. ‘Eigenfaces’ are the decomposed images in the direction of largest variance.
Why we can’t relate to eigenfaces
Traditional methods like Principal Component Analysis (PCA) would decompose a dataset into some form of latent representation, e.g. eigenvectors, which at times can be meaningless when visualised: what actually is my first principal... Continue Reading →