How to Join DataFrames in Pandas in Python

{inner, outer, left, right}
Photo by Sid Balachandran on Unsplash
In 2008, Wes McKinney was at the hedge fund AQR, where he developed a small piece of software that became the precursor to Pandas, which he went on to develop and finalise later. Since then, Pandas has become one of the most important Python libraries that most Data Scientists (if not all) use...

Top 5 Linux Commands for Beginners

Data Science on the Command Line
Photo by Nathan Dumlao on Unsplash
As data sets get larger and more prevalent, researchers are having to do a lot more of the legwork with regard to core programming, and so spend more time with tools like Git and Linux (something we rarely had to do before!). For the software engineers reading this post:...

What does the keyword “yield” do in Python?

Handling Python Memory Issues when faced with Big Data
Smileys [Pixabay]
As the programming language Python develops over time, added functionality improves both its usability and performance. Python has become one of the foremost languages (if not the foremost) in Data Science, and its handling of big data sets is one of the reasons why. It’s no wonder that the language...

How to Derive an OLS Estimator in 3 Easy Steps

Mohammad Hasan on [Pixabay]
A Data Scientist’s Must-Know
OLS Estimation was originally derived in 1795 by Gauss. 17 at the time, the genius mathematician was attempting to define the dynamics of planetary orbits and comets alike and, in the process, derived much of modern-day statistics. Now the methodology I show below is a hell of a...

Flask’s Latest Rival in Data Science

Photo by Fotis Fotopoulos on Unsplash
Streamlit Is The Game-Changing Python Library That We’ve Been Waiting For
Developing a user interface is not easy. I’ve always been a mathematician, and for me coding was a functional tool to solve an equation and to create a model, rather than a way of providing the user with an experience. I’m not artsy and nor...

The Sampling Distribution of Pearson’s Correlation

Pearson’s Correlation reflects the dispersion of a linear relationship (top row), but not the angle of that relationship (middle), nor many aspects of nonlinear relationships (bottom). [Source]
How a Data Scientist can get the most out of this statistic
People are quite familiar with the colloquial usage of the term ‘correlation’: that it tends to resemble a phenomenon...

Plotting with Seaborn in Python

Figure 0: Pair Plot using Seaborn [more information]
4 Reasons Why and 3 Examples How
import seaborn as sns
Finding a pattern can sometimes be the easy bit when researching, so let’s be honest: conveying a pattern to the team or your customers is sometimes a lot more difficult than it should be. Not only are some patterns hard to...

The Power-Law Distribution

Pareto’s Power-Law Distribution Explaining the Laws of Nature (Including the Golden Ratio)
The laws of nature are complicated, and throughout time scientists from all corners of the world have attempted to model and reengineer what they see around them to extract some value from it. Quite often we see a pattern that comes up time and time...
