How to Join DataFrames in Pandas in Python

{inner, outer, left, right}

Photo by Sid Balachandran on Unsplash

In 2008, Wes Mckinney was at the Hedge Fund AQR and developed a small piece of software which became the pre-cursor to Pandas, developing and finalising it later on. Since then, Pandas has become one of the most important Python libraries that most Data Scientists (if not all) use on a daily basis.

It allows users to manage and manipulate data with levels of efficiency not seen before. From loading data, managing missing data, pivoting and reworking data: Panda’s doesn’t (really) achieve anything new, but, existing tools were inefficient, whereas Panda’s makes light work of these complicated tasks.

Now regardless of whether you use SQL or Pandas, you need to know how to join tables. A table join is a process by which you combine two separate ‘tables’ (or in Pandas land, DataFrames) together.

Different types of joins [source]

Let’s play with some actual data and say we have the following DataFrame (or in SQL land, Table):

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3'],'C': ['C0', 'C1', 'C2', 'C3'],'D': ['D0', 'D1', 'D2', 'D3']},index=[0, 1, 2, 3])

which looks something like this:

Note that the index values have been set to [0,1,2,3]. Remember that an index to a table is an identifier to the location of a particular row in a table. For example, your office may be on the 3rd floor in an building. But in some other buildings, the may call the 3rd floor something arbitrary “Floor C” for example. In this case, Floor C and Floor 3 are equivalent, their just identifiers of a location, like an index.

Say we want to combine df1 with the following DataFrame, df4:

df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'], 'D': ['D2', 'D3', 'D6', 'D7'],'F': ['F2', 'F3', 'F6', 'F7']}, index=[2, 3, 6, 7])

which looks like this:

Now depending on how we want to combine the data depends on what type of a join we use. Note that we generally ‘join’ tables (or DataFrames) based on their index.

If we want to combine the data so we only have index rows that are in both A and B, then we would use an ‘inner’ join.

Inner Join

So as you can see, here we simply use the pd.concat function to bring the data together, setting the join setting to 'inner’:

result = pd.concat([df1, df4], axis=1, join='inner')

which as can be seen, only has index rows [2,3] . That’s because index values of 2 and 3 exist in both DataFrames 1 and 4.

Now all the other rows (or index values) in each DataFrame that are not in the other DataFrame (e.g. df1 contains index row values of [0, 1] which are not in df4, these are thrown away).

Now if we want to retain all the index values from both DataFrame, we can use an Outer Join.

Outer Join

So when we change the join setting to outer, we can see that now, all index values are present ({0,1,2,3} from df1, and {2,3,6,7} from df2).

Let’s now move onto something which some of you may have deduced:

Left Join

Say we want to stick to the index values in df1, for that, we would then do a left join:

df1.merge(df4, how=’left’)

This would give us the corresponding DataFrame:

and likewise on the other side:

Right Join

Say we want to stick to the index values in df4, for that, we would then do a right join:

df1.merge(df4, how=’right’)

This would give us the corresponding DataFrame:


So there you go! Definitely take this away and have a go yourself. Joins are a bit confusing and it takes a few different tries to fully get your head around it but try to remember the following for if you have two tables A and B:

  1. Inner joins only take index rows for what’s in A and B
  2. Outer joins make a new table that contain index values for both A or B, and set the missing segments to nan.
  3. Left joins make a new table that contain index values for both only the ‘left’ table, which in our case would be df1, or A.
  4. Right joins make a new table that contain index values for both only the ‘right’ table, which in our case would be df4, or B.

Thanks for reading again!! Let me know if you have any questions and I’ll be happy to help.

Keep up to date with my latest work here!

Photo by Charles Ray on Unsplash

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Powered by WordPress.com.

Up ↑

%d bloggers like this: