Top 5 Linux Commands for Beginners

Data Science on the Command Line
Photo by Nathan Dumlao on Unsplash
As data sets grow larger and more prevalent, researchers are having to do a lot more of the legwork in core programming, and so spend more time with tools like Git and Linux (something we rarely had to do before!). For the software engineers reading this post:... Continue Reading →

How to Derive an OLS Estimator in 3 Easy Steps

Mohammad Hasan on [Pixabay]
A Data Scientist’s Must-Know
OLS Estimation was originally derived in 1795 by Gauss. Aged 17 at the time, the genius mathematician was attempting to define the dynamics of planetary orbits and comets alike and, in the process, derived much of modern-day statistics. Now the methodology I show below is a hell of a... Continue Reading →

Flask’s Latest Rival in Data Science

Photo by Fotis Fotopoulos on Unsplash
Streamlit Is The Game-Changing Python Library That We’ve Been Waiting For
Developing a user interface is not easy. I’ve always been a mathematician and, for me, coding was a functional tool to solve an equation and to create a model, rather than a way to provide the user with an experience. I’m not artsy and nor... Continue Reading →

The Sampling Distribution of Pearson’s Correlation

Pearson’s Correlation reflects the dispersion of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom). [Source]
How a Data Scientist can get the most out of this statistic
People are quite familiar with the colloquial usage of the term ‘correlation’: that it tends to resemble a phenomenon... Continue Reading →

Plotting with Seaborn in Python

Figure 0: Pair Plot using Seaborn [more information]
4 Reasons Why and 3 Examples How
import seaborn as sns
Finding a pattern can sometimes be the easy bit when researching, so let’s be honest: conveying a pattern to the team or your customers is sometimes a lot more difficult than it should be. Not only are some patterns hard to... Continue Reading →

The Power-Law Distribution

Pareto’s Power-Law Distribution
Explaining the Laws of Nature (Including the Golden Ratio)
The laws of nature are complicated and, throughout time, scientists from all corners of the world have attempted to model and reengineer what they see around them to extract some value from it. Quite often we see a pattern that comes up time and time... Continue Reading →

The Student t-Distribution

Probability Density Function for the Student t-Distribution.
For the Sake of Statistics, forget the Normal Distribution.
To be clear: This is targeted at Data Scientists/Machine Learning Researchers and not at Physicists
Statistical normality is overused. It’s less common than often assumed and only really occurs in the impractical ‘limits’ [2][3][4]. To garner normality, you need to have substantial well-behaved independent... Continue Reading →

Robust Statistical Methods

Anomalies hidden in plain sight. Chart from Liu and Neilson (2016)
Methods that Data Scientists Should Love
A robust statistic is a type of estimator used when the distribution of the data set is not certain, or when egregious anomalies exist. If we’re confident in the distributional properties of our data set, then traditional statistics like the Sample Mean... Continue Reading →
