4 Reasons Why and 3 Examples How
Let’s be honest: when you’re researching, finding a pattern can be the easy bit, and conveying it to your team or your customers is often a lot more difficult than it should be. Not only are some patterns hard to explain in layman’s terms (try explaining a principal component to non-mathematicians), but sometimes you’re trying to convey dependence in a conditional joint distribution… say what?
Charting is central to our job as researchers, so we need to be able to tell our story well. Without good charts, our knowledge and findings carry much less weight; with the best visuals, we give our story the best chance of landing.
In the following article, I’ll discuss Seaborn and why I prefer it to other libraries. I’ll also give my top 3 charts that I use daily.
Popular Python charting libraries are surprisingly few and far between because it’s hard to build a one-size-fits-all tool: think of Matplotlib, designed to mirror MATLAB’s output, and ggplot, a port of the R original.
As for why I prefer Seaborn to the other top libraries:
- Seaborn requires a lot less code than Matplotlib to produce similarly high-quality output.
- Chartify’s visuals aren’t that great (sorry Spotify, it’s just a bit too blocky).
- ggplot doesn’t seem to be native to Python, so it feels like I’m always stretching to make it work for me.
- Plotly has a ‘community edition’, which makes me uneasy about licensing, and I generally stay away from anything involving legal sign-offs. In design and functionality it’s actually pretty good and has a broad set of offerings, but for the added headache I’d say it’s not much better than Seaborn, if at all.
Most importantly, a researcher spends a lot of time plotting distributions, and if you can’t plot distributions easily, your plotting package is essentially useless. Seaborn overlays histograms and KDEs (kernel density estimates) cleanly, which other packages really struggle to do (Plotly is the exception here).
Finally, Seaborn has the whole design side of things covered which leaves you, the researcher, with more time to research. Matplotlib sucks for visuals and Chartify is too blocky for my liking.
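To make the "design side covered" point concrete, here is a minimal sketch, assuming Seaborn 0.11 or later (where `sns.set_theme` is available). The data is a made-up random walk, and the `Agg` backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# One call restyles every subsequent Matplotlib figure.
sns.set_theme(style="whitegrid")

rng = np.random.default_rng(0)
x = np.arange(50)
y = rng.normal(size=50).cumsum()  # synthetic random walk, purely illustrative

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, y)  # a plain Matplotlib call, now drawn with the Seaborn theme
ax.set(xlabel="step", ylabel="value")
fig.tight_layout()
```

The plotting call itself is untouched Matplotlib; only the theme line changes how it looks.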
I’m going to keep my conclusion short and sweet: Seaborn is awesome. There’s no hiding that I use it a lot more than other libraries, and I recommend you do the same. Let’s now move on to some charts that I use daily.
If you’ve found a random variable whose distribution makes for an interesting story, then Seaborn's displot function works great. With kde=True, it conveys the picture by showing:
- The underlying empirical distribution in the form of a histogram
- A kernel density estimate drawn over the top to give a smoothed picture
The colours (a nice translucent unoffensive blue) with the grid lines and clear fonts make for a simple and effective offering!
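If you're curious what that smoothed curve actually is, the sketch below builds a Gaussian kernel density estimate by hand with NumPy. The function name `gaussian_kde_1d` is my own, and the bandwidth uses Scott's rule of thumb as an assumption; it is meant to illustrate the idea, not to reproduce Seaborn's internals exactly.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb for one-dimensional data (an assumption here).
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per observation, averaged into a single smooth curve.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(42)
x = rng.normal(size=100)
grid = np.linspace(x.min() - 3, x.max() + 3, 400)

density = gaussian_kde_1d(x, grid)
# Histogram normalised to unit area, so it is comparable to the density curve.
heights, edges = np.histogram(x, bins=15, density=True)
```

Both `heights` and `density` describe the same sample: the histogram is the blocky view and the KDE is the smoothed one drawn over it.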
Here we try to convey a slightly more complicated dynamic. We have two variables that we feel should be related, but how can we visualise this relationship?
The two distributions plotted on the sides of the chart are great for seeing how the marginal distributions look, while the central density plot is perfect for identifying the regions where density concentrates.
I use this plot in both my research and my decks, as it lets me keep the univariate dynamics (the kernel plots) and the joint dynamics at the forefront of my mind and my audience’s, all whilst conveying the story I’m trying to tell. It’s been super useful in layering discussion and I’d highly recommend it.
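For anyone who wants to see how the side panels relate to the centre panel, here is a small NumPy sketch under simple assumptions: synthetic bivariate normal data and a coarse 10-by-10 grid of bins instead of a kernel estimate. Summing the joint counts along one axis gives back each marginal.

```python
import numpy as np

rng = np.random.default_rng(0)
mean, cov = [0, 1], [(1, 0.5), (0.5, 1)]
data = rng.multivariate_normal(mean, cov, size=200)
x, y = data[:, 0], data[:, 1]

# Joint counts on a shared grid: rows index x bins, columns index y bins.
joint, x_edges, y_edges = np.histogram2d(x, y, bins=10)

# Summing the joint counts over one axis recovers each marginal.
marginal_x = joint.sum(axis=1)
marginal_y = joint.sum(axis=0)

# The sample correlation summarises the dependence the central panel shows.
corr = np.corrcoef(x, y)[0, 1]
```

The marginals alone can't tell you about `corr`; that information only lives in the joint panel, which is why the combined chart is so useful.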
Box and Whisker Plots
The problem with distributional plots is that they can often be skewed by outliers, which really distorts the story unless you know those outliers exist and deal with them in advance.
Box plots are so widely used because they’re an effective way to display robust statistics like the median and the interquartile range, which are much more resilient to outliers (due to their high breakdown points).
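A quick way to convince yourself of that robustness is the sketch below. The data is made up: 99 well-behaved points plus a single extreme value. The `summary` helper is my own name, and the 1.5 × IQR fence is the usual Tukey rule that Seaborn's box plot uses by default for its whiskers.

```python
import numpy as np

rng = np.random.default_rng(1)
clean = rng.normal(loc=10.0, scale=2.0, size=99)
contaminated = np.append(clean, 1_000.0)  # one extreme, made-up outlier

def summary(values):
    """Return classical and robust summaries of a 1-D sample."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return {
        "mean": values.mean(),
        "std": values.std(ddof=1),
        "median": median,
        "iqr": q3 - q1,
    }

before, after = summary(clean), summary(contaminated)

# Tukey's rule: points beyond 1.5 * IQR above the upper quartile get flagged.
q1, q3 = np.percentile(contaminated, [25, 75])
upper_fence = q3 + 1.5 * (q3 - q1)
flagged = contaminated[contaminated > upper_fence]
```

Comparing `before` and `after`, the mean and standard deviation jump while the median and IQR barely move, and the outlier ends up in `flagged`.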
Seaborn's implementation of the box plot looks fantastic: it can convey a fairly complicated story by highlighting a number of dimensions, whilst also looking good enough for an academic journal. Moreover, Seaborn keeps the code incredibly concise, so the researcher doesn’t have to spend time tweaking the plot to make it readable.
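As a rough illustration of "a number of dimensions", the sketch below adds a second categorical variable through `hue`. The DataFrame and its column names (`strategy`, `regime`, `ret`) are entirely hypothetical, and the `Agg` backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(7)
n = 240
df = pd.DataFrame({
    "strategy": rng.choice(["A", "B", "C"], size=n),
    "regime": rng.choice(["calm", "volatile"], size=n),
})
# Hypothetical returns with a wider spread in the "volatile" regime.
scale = np.where(df["regime"] == "volatile", 2.0, 1.0)
df["ret"] = rng.normal(loc=0.0, scale=scale)

# x splits the boxes by strategy; hue splits each strategy by regime.
ax = sns.boxplot(data=df, x="strategy", y="ret", hue="regime")
ax.set(xlabel="Strategy", ylabel="Return")
```

One call gives three groups, two regimes per group, and the median, quartiles and outliers for each, which is a lot of story for one line of plotting code.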
Being able to discern and discuss a multitude of features and patterns at the same time is imperative to the success of your research, so I highly recommend using this chart. At the same time, you need to make sure you target the chart to your audience: at times you don’t want to go into too much detail!
In this article, I broadly discussed why, for me, Seaborn is the best plotting package and gave my top 3 examples of charts that I use. I’m a strong believer in conveying a message in an easy and understandable manner: the fewer words the better! Cogency is key!
These charts make it so much easier for you to do that and so if you’re a visual thinker, a storyteller, or if you love to see the big picture, then Seaborn is for you.
Thanks again, message me if you have any questions!
The following pieces of code are the simple snippets to recreate the awesome charts above!
Figure 0: Pair Plotting
import seaborn as sns
df = sns.load_dataset("iris")
sns.pairplot(df)
Figure 1: Univariate Distribution
import numpy as np
x = np.random.normal(size=100)
sns.displot(x, kde=True)
Figure 2: Joint Distribution
import numpy as np
import pandas as pd
mean, cov = [0, 1], [(1, .5), (.5, 1)]
data = np.random.multivariate_normal(mean, cov, 200)
df = pd.DataFrame(data, columns=["x", "y"])
sns.jointplot(x="x", y="y", data=df, kind="kde")
Figure 3: Lots of Joint Distributions
iris = sns.load_dataset("iris")
g = sns.PairGrid(iris)
g.map_diag(sns.kdeplot)
g.map_offdiag(sns.kdeplot)
Figure 4: Box and Whisker Plot
import seaborn as sns
import matplotlib.pyplot as plt
# Initialize the figure with a logarithmic x axis
f, ax = plt.subplots(figsize=(7, 6))
ax.set_xscale("log")
# Load the example planets dataset
planets = sns.load_dataset("planets")
# Plot the distance with horizontal boxes; whiskers span the full data range
sns.boxplot(x="distance", y="method", data=planets, whis=(0, 100), palette="vlag")
# Add in points to show each observation
sns.swarmplot(x="distance", y="method", data=planets, size=2, color=".3", linewidth=0)
# Tweak the visual presentation
ax.xaxis.grid(True)
ax.set(ylabel="")