# Using Deep Learning to Count Repetitions

Photo by Efe Kurnaz on Unsplash

Repeating actions occur frequently in our daily lives. They range from organic cycles such as heartbeats and breathing, through programming and manufacturing, to planetary cycles like the day-night rotation and the seasons. The need to recognise these repetitions, such as those in videos, is unavoidable and requires a system that...

# The Sampling Distribution of Pearson’s Correlation

Pearson’s Correlation reflects the dispersion of a linear relationship (top row), but not the angle of that relationship (middle), nor many aspects of nonlinear relationships (bottom). [Source]

How a Data Scientist can get the most out of this statistic

People are quite familiar with the colloquial usage of the term ‘correlation’: that it tends to resemble a phenomenon...
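The caption's claim is easy to check numerically. Below is a minimal plain-Python sketch (the helper `pearson_r` is our own illustration, not code from the post) that computes the sample Pearson coefficient for perfect straight lines of different slopes and for a perfect but symmetric quadratic relationship.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centred cross-product and sums of squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # e.g. a perfectly horizontal line: r is undefined.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


xs = [float(i) for i in range(-5, 6)]

# Perfect lines with very different slopes (angles) all give r = 1.
for slope in (0.1, 1.0, 10.0):
    print(slope, pearson_r(xs, [slope * x + 3 for x in xs]))

# A perfect but nonlinear (symmetric quadratic) relationship gives r = 0.
print(pearson_r(xs, [x * x for x in xs]))
```

Every straight line with a positive slope scores r = 1 (up to floating-point rounding) whatever its angle, while y = x² on a symmetric range scores 0 even though y is completely determined by x. A horizontal line has zero variance in y, so r is undefined and the sketch returns NaN.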

# Plotting with Seaborn in Python

Figure 0: Pair Plot using Seaborn ([more information])

4 Reasons Why and 3 Examples How

`import seaborn as sns`

Finding a pattern can sometimes be the easy part of research, so let’s be honest: conveying a pattern to your team or your customers is sometimes a lot more difficult than it should be. Not only are some patterns hard to...

# The Student t-Distribution

Probability Density Function for the Student t-Distribution.

For the Sake of Statistics, Forget the Normal Distribution

To be clear: this is targeted at Data Scientists/Machine Learning Researchers and not at Physicists.

Statistical normality is overused. It’s not as common as is often assumed and only really occurs in the impractical ‘limits’ [2][3][4]. To garner normality, you need to have substantial well-behaved independent...
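To put numbers on the density in the figure, here is a minimal sketch using only Python's `math` module. It implements the standard Student t density, f(t) = Γ((ν+1)/2) / (√(νπ) Γ(ν/2)) · (1 + t²/ν)^(-(ν+1)/2), next to the standard normal density; the function names are our own and the choice of ν = 3 is purely illustrative.

```python
import math


def student_t_pdf(t: float, nu: float) -> float:
    """Density of the Student t-distribution with nu > 0 degrees of freedom."""
    if nu <= 0:
        raise ValueError("degrees of freedom must be positive")
    # Work with log-gamma so large nu does not overflow math.gamma.
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(t * t / nu))


def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


# Compare the two densities at the centre and in the tails.
for x in (0.0, 2.0, 4.0):
    print(x, student_t_pdf(x, nu=3), normal_pdf(x))
```

Four units from the centre, the t-distribution with 3 degrees of freedom carries dozens of times more density than the normal. With ν = 1 the formula reduces to the Cauchy distribution, and as ν grows it approaches the standard normal.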

# Robust Statistical Methods

Anomalies hidden in plain sight. Chart from Liu and Neilson (2016).

Methods that Data Scientists Should Love

A robust statistic is a type of estimator used when the distribution of the data set is uncertain, or when egregious anomalies exist. If we’re confident in the distributional properties of our data set, then traditional statistics like the Sample Mean...
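As a minimal illustration of why robust estimators matter, the sketch below compares the sample mean and standard deviation with the median and the median absolute deviation (MAD) once a single egregious anomaly is added. The data are made up, and the outlier rule (flag points more than 3 scaled MADs from the median) is a common rule of thumb we have assumed, not necessarily the method used in the post.

```python
import statistics


def mad(data, scale=1.4826):
    """Median absolute deviation about the median.

    The factor 1.4826 (about 1 / 0.6745) makes the MAD estimate the
    standard deviation when the data really are normal.
    """
    med = statistics.median(data)
    return scale * statistics.median([abs(x - med) for x in data])


def flag_outliers(data, threshold=3.0):
    """Return points more than `threshold` scaled MADs from the median (assumed rule)."""
    med = statistics.median(data)
    spread = mad(data)
    if spread == 0:
        # More than half the points coincide; treat every other point as anomalous.
        return [x for x in data if x != med]
    return [x for x in data if abs(x - med) / spread > threshold]


clean = [10, 11, 9, 10, 12, 10, 11]
dirty = clean + [1000]  # one egregious anomaly

for label, data in (("clean", clean), ("dirty", dirty)):
    print(f"{label}: mean={statistics.mean(data):.3f} "
          f"median={statistics.median(data):.3f} "
          f"stdev={statistics.stdev(data):.3f} MAD={mad(data):.3f}")
print("flagged:", flag_outliers(dirty))
```

One bad value drags the mean from about 10.4 to over 134 and inflates the standard deviation enormously, while the median only moves from 10 to 10.5 and the MAD stays small enough to single out the anomaly.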

# The Sampling Distribution of OLS Estimators

OLS regression on sample data

Details, details: it’s all about the details!

Ordinary Least Squares (OLS) is usually the first method every student learns as they embark on a journey of statistical euphoria. It’s a method that quite simply finds the line of best fit within a two-dimensional dataset, where "best" means minimising the sum of squared vertical distances (residuals) between the points and the line. Now the assumptions behind the model, along with...

# Asymptotic Distributions

Euler’s Infinity

Infinity (and beyond…)

The study of asymptotic distributions looks to understand how the distribution of a phenomenon changes as the number of samples taken into account grows without bound (n → ∞). Say we’re trying to make a binary guess on where the stock market is going to close tomorrow (like a Bernoulli trial): how does the...
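The Bernoulli example can be made concrete without any simulation. The sketch below, a minimal illustration under our own assumptions (a hypothetical up-probability p = 0.55, a ±0.05 window, and helper names of our choosing), computes the exact distribution of the sample proportion over n trials from the binomial pmf. It shows the spread shrinking like √(p(1 − p)/n), the probability of landing near p approaching 1, and the central limit theorem's normal approximation catching up with the exact answer.

```python
import math


def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def sample_proportion_summary(n, p, eps=0.05):
    """Exact mean, standard deviation and P(|p_hat - p| <= eps) of the
    sample proportion p_hat = K / n over n Bernoulli(p) trials."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k / n * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    # A tiny tolerance keeps boundary values such as k/n = p - eps inside
    # the window despite floating-point rounding.
    within = sum(w for k, w in enumerate(pmf) if abs(k / n - p) <= eps + 1e-12)
    return mean, math.sqrt(var), within


def normal_approx_within(n, p, eps=0.05):
    """CLT approximation: P(|Z| <= eps / sd) with sd = sqrt(p(1 - p) / n)."""
    sd = math.sqrt(p * (1 - p) / n)
    return math.erf(eps / (sd * math.sqrt(2)))


p = 0.55  # hypothetical probability that the market closes up
for n in (10, 100, 1000):
    mean, sd, within = sample_proportion_summary(n, p)
    print(f"n={n:5d} mean={mean:.4f} sd={sd:.4f} "
          f"P(within 0.05)={within:.4f} normal approx={normal_approx_within(n, p):.4f}")
```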

# The Distribution of the Sample Mean

One-sided test on a distribution that is shaped like a Bell Curve. [Image from Jill Mac from Source (CC0)]

All Machine Learning Researchers Should Know This

Most machine learning and mathematical problems involve using a subset of data to make inferences about a global population. As an example, we may only get 100 replies to a survey sent to our...
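As a rough illustration of the one-sided test in the image, the sketch below treats the mean of 100 survey replies as approximately normal with standard error s/√n (the central limit theorem at work) and computes an upper-tail p-value. The replies, the hypothesised mean `mu0` and the function name are all hypothetical, and a large-sample z-test is only one of several reasonable choices (a t-test is the usual small-sample alternative).

```python
import math
import statistics


def one_sided_mean_test(sample, mu0):
    """Large-sample one-sided z-test of H0: mu <= mu0 against H1: mu > mu0.

    By the central limit theorem the sample mean is approximately
    Normal(mu, sigma^2 / n) for large n; sigma is estimated by the sample
    standard deviation s, so the standard error is s / sqrt(n).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    if std_err == 0:
        raise ValueError("all observations are identical; the test is undefined")
    z = (mean - mu0) / std_err
    p_value = 1.0 - statistics.NormalDist().cdf(z)  # upper-tail probability
    return mean, std_err, z, p_value


# Hypothetical survey: 100 replies on a 1-5 scale.
replies = [3] * 40 + [4] * 40 + [5] * 20
mean, se, z, p = one_sided_mean_test(replies, mu0=3.5)
print(f"mean={mean:.3f} se={se:.4f} z={z:.2f} one-sided p={p:.2g}")
```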

# Parts-based learning by Non-Negative Matrix Factorisation

Visualising the principal components of portrait facial images. ‘Eigenfaces’ are the decomposed images in the direction of largest variance.

Why We Can’t Relate to Eigenfaces

Traditional methods like Principal Component Analysis (PCA) decompose a dataset into some form of latent representation, e.g. eigenvectors, which can at times be meaningless when visualised: what actually is my first principal...
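Non-negative matrix factorisation approximates a non-negative data matrix V as W·H with both factors constrained to be non-negative, so each row is rebuilt by adding parts rather than cancelling positive and negative components. The sketch below is a minimal NumPy version using the multiplicative update rules of Lee and Seung for the squared reconstruction error; it is one common NMF solver, assumed here for illustration rather than taken from the post, and the toy data and function names are hypothetical.

```python
import numpy as np


def nmf(V, rank, n_iter=2000, eps=1e-10, seed=0):
    """Approximate a non-negative matrix V (m x n) by W @ H, with W (m x rank)
    and H (rank x n) both non-negative, using multiplicative updates for the
    squared Frobenius reconstruction error."""
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Strictly positive random start; multiplicative updates never make
    # a non-negative entry negative. eps guards against division by zero.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(V, W, H):
    """Frobenius-norm reconstruction error relative to the size of V."""
    V = np.asarray(V, dtype=float)
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)


# Toy data built by adding together two non-negative "parts" (rows of H_true).
W_true = np.array([[1, 0], [2, 0], [0, 1], [0, 3], [1, 1], [2, 1]], dtype=float)
H_true = np.array([[1, 2, 0, 1], [0, 1, 3, 1]], dtype=float)
V = W_true @ H_true

W, H = nmf(V, rank=2)
print("relative error:", relative_error(V, W, H))
print("learned parts (rows of H):")
print(np.round(H, 2))
```

Because every entry of W and H stays non-negative, each learned row of H can be read directly as an additive building block of the data, which is the property that makes NMF's factors easier to relate to than eigenvectors.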

# Gross Failures in COVID Reporting

Photo by Deniz Fuchidzhiev on Unsplash

Reporting Inextricable Statistics is a Problem

If its use of these items is typical of the NHS at large, the range of daily demand would be between 7.5 million to 12 million, more than the 5.5 million actually supplied. [Source]

It’s been quite clear from the beginning of the epidemic that statistical modelling is...