{inner, outer, left, right}
Photo by Sid Balachandran on Unsplash
In 2008, Wes McKinney was working at the hedge fund AQR, where he developed a small piece of software that became the precursor to Pandas, which he went on to develop and finalise later. Since then, Pandas has become one of the most important Python libraries that most (if not all) Data Scientists use... Continue Reading →
Sorry, the TensorFlow Developer Certificate is Pointless
Plenty of Better Alternatives Exist to Prove your Skillset
Photo by Daniel Mingook Kim on Unsplash
Google’s overall openness and investment in the space of AI has been phenomenal. I really think that, unequivocally, the whole world has a lot to thank them for. Academic breakthroughs are published and code is often made free on GitHub. What more could... Continue Reading →
How to Remove Racial Discrimination from Data Science
On Finding and Fixing Latent Racial Bias
Photo by Tim Mossholder on Unsplash
The recent protests that unfolded across the United States (and more recently across the world) reminded us how important it is to acknowledge and resolve both unfair and undue bias in society. It’s important that events like these teach and remind us to take a look at... Continue Reading →
Top 5 Linux Commands for Beginners
Data Science on the Command Line
Photo by Nathan Dumlao on Unsplash
As data sets get larger and more prevalent, researchers are having to do a lot more of the legwork with regard to core programming, and so are spending more time with tools like Git and Linux (something we rarely had to do before!). For the software engineers reading this post:... Continue Reading →
What does the keyword “yield” do in Python?
Handling Python Memory Issues when faced with Big Data
Smileys [Pixabay]
As the programming language Python develops over time, added functionality improves both its usability and performance. Python has become one of the foremost languages (if not the foremost) in Data Science, and its handling of big data sets is one of the reasons why. It’s no wonder that the language... Continue Reading →
How to Derive an OLS Estimator in 3 Easy Steps
Mohammad Hasan on [Pixabay]
A Data Scientist’s Must-Know
OLS Estimation was originally derived in 1795 by Gauss. Aged 17 at the time, the genius mathematician was attempting to define the dynamics of planetary orbits and comets alike and, in the process, derived much of modern-day statistics. Now the methodology I show below is a hell of a... Continue Reading →
Flask’s Latest Rival in Data Science
Photo by Fotis Fotopoulos on Unsplash
Streamlit Is The Game-Changing Python Library That We’ve Been Waiting For
Developing a user interface is not easy. I’ve always been a mathematician and, for me, coding was a functional tool to solve an equation and to create a model, rather than a way of providing the user with an experience. I’m not artsy and nor... Continue Reading →
The Sampling Distribution of Pearson’s Correlation
Pearson’s Correlation reflects the dispersion of a linear relationship (top row), but not the angle of that relationship (middle), nor many aspects of nonlinear relationships (bottom). [Source]
How a Data Scientist can get the most out of this statistic
People are quite familiar with the colloquial usage of the term ‘correlation’: that it tends to resemble a phenomenon... Continue Reading →
Plotting with Seaborn in Python
Figure 0: Pair Plot using Seaborn [more information]
4 Reasons Why and 3 Examples How
import seaborn as sns
Finding a pattern can sometimes be the easy bit when researching, so let’s be honest: conveying a pattern to the team or your customers is sometimes a lot more difficult than it should be. Not only are some patterns hard to... Continue Reading →
The Power-Law Distribution
Pareto’s Power-Law Distribution Explaining the Laws of Nature (Including the Golden Ratio)
The laws of nature are complicated and, throughout time, scientists from all corners of the world have attempted to model and re-engineer what they see around them to extract some value from it. Quite often we see a pattern that comes up time and time... Continue Reading →