What does the keyword “yield” do in Python?

Handling Python Memory Issues when faced with Big Data


As the programming language Python develops over time, added functionality improves both its usability and performance. Python has become one of, if not the, foremost languages in Data Science, and its handling of big data sets is one of the reasons why.

It’s no wonder that the language is so highly favoured, with libraries like Pandas and core functionality like generators that are both so easy to use. Some commentators expect newer languages like Julia or Go to take precedence, but regardless, Python is here for the long run.

Researchers from all corners of academia complain about computational memory issues. Large data sets are inherent to fields like Bioinformatics, Finance and, more broadly, Machine Learning, so efficient and effective memory handling is required as standard.

With Big Data comes Big Memory Issues

Let’s take the following example. Say we want to count how many rows are in a file so we write some inefficient code as follows:

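# Assumed naive version: reads every row into a list held in memory
def read_csv(file_name):
    with open(file_name, "r") as csv_file:
        return csv_file.readlines()

open_file = read_csv("standard_csv_file.csv")
row_count = 0

for row in open_file:
    row_count += 1

print("Row count is {}".format(row_count))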

Now looking at this example, the function read_csv opens the file and loads all the contents into open_file. Then the program iterates over this list and increments row_count.

However, once the file grows large (let’s say, to 2 GB), does this piece of code still work as well? What if the file is larger than the memory you have available?

Now, unfortunately, if the file is stupendously large, you’ll probably notice your computer grinds to a halt as you try to load it in. You might even need to kill the program entirely.

So, how can you handle these huge data files?

Generator functions allow you to declare a function that behaves like an iterator. Once an item has been produced by the iterator, it’s not expected to be used again and can be cleared from memory. That means that at any one point in time you only hold one item in memory, rather than the entire data set.
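To make the memory difference concrete, here is a rough sketch comparing a fully built list with a generator expression over the same values. The figure of one million items is an arbitrary choice for illustration, and sys.getsizeof only measures the container object itself (not the integers it refers to), so treat the numbers as indicative rather than exact.

```python
import sys

# Assumption: one million items is just an illustrative size.
ITEM_COUNT = 1_000_000

# A list holds references to every item at once.
squares_list = [n * n for n in range(ITEM_COUNT)]

# A generator expression produces items one at a time, on demand.
squares_gen = (n * n for n in range(ITEM_COUNT))

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

print("List container: {} bytes".format(list_size))
print("Generator object: {} bytes".format(gen_size))

# Both produce the same values; the generator never stores them all.
gen_total = sum(squares_gen)
list_total = sum(squares_list)
print("Totals match: {}".format(gen_total == list_total))
```

The generator object stays the same small size however many items it will eventually produce, whereas the list grows with the data.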


So in terms of counting rows in a file, we now load up one row at a time instead of loading the whole file all at once. To do this, we can simply rework the code and introduce the keyword yield:

def read_csv(file_name):
    # Open the file and hand back one row at a time
    with open(file_name, "r") as csv_file:
        for row in csv_file:
            yield row

By introducing the keyword yield, we’ve essentially turned the function into a generator function. This new version of our code opens a file, loops through each line, and yields each row.
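For completeness, here is a sketch of the row count from earlier running on top of the generator version of read_csv. The temporary three-line file is an assumption made so that the snippet runs on its own; in practice you would pass the path to your own large file.

```python
import os
import tempfile

def read_csv(file_name):
    # Yield one line at a time instead of loading the whole file.
    with open(file_name, "r") as csv_file:
        for row in csv_file:
            yield row

# Assumption: a small temporary file stands in for a large CSV.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id,name\n1,alice\n2,bob\n")
    sample_path = tmp.name

row_count = 0
for row in read_csv(sample_path):
    row_count += 1

print("Row count is {}".format(row_count))

# Clean up the temporary file.
os.remove(sample_path)
```

The counting loop is unchanged from the first version. Only read_csv differs, and it now keeps a single row in memory at any moment.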

The Python Yield Statement

When the code you’ve written reaches the yield statement, the function suspends execution there and hands the yielded value back to the caller. When a function is suspended like this, its state is saved inside the generator object rather than thrown away. Everything linked to the state of that function is kept, including any variable bindings local to the generator, the instruction pointer, the internal stack, and any exception handling.
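A small sketch may help make the saved state visible. The running_total generator below is a hypothetical example (it is not part of the file-reading code), and the standard library inspect module is used to peek at the generator between calls.

```python
import inspect

def running_total(values):
    # 'total' is a local variable; it survives every pause at 'yield'.
    total = 0
    for value in values:
        total += value
        yield total

totals = running_total([10, 20, 30])

# The body has not started running yet.
state_before = inspect.getgeneratorstate(totals)

first = next(totals)

# The generator is now paused at 'yield' with its locals intact.
state_paused = inspect.getgeneratorstate(totals)
total_paused = inspect.getgeneratorlocals(totals)["total"]

# Resuming picks up the old 'total' and adds the next value to it.
second = next(totals)

print(state_before)
print(state_paused)
print("total while paused: {}".format(total_paused))
print("first: {}, second: {}".format(first, second))
```

The second value builds on the first, which is only possible because total was kept alive while the function was suspended.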

Let’s make a literal example. If the yield statement pauses the code and suspends execution, then calling next() on the generator again should continue where it left off, so let’s make a function with multiple yield statements:

>>> def double_yield():
...     yield "This will print string number one"
...     yield "This will print string number two"
...
>>> double_obj = double_yield()
>>> print(next(double_obj))
This will print string number one
>>> print(next(double_obj))
This will print string number two
>>> print(next(double_obj))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

The third call raises StopIteration because the generator has run out of yield statements and has nothing left to produce.
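In everyday code you rarely call next() by hand until it fails. As a hedged sketch reusing double_yield, a for loop stops cleanly when the generator is exhausted, and next() also accepts a default value to return in place of an exception. The "No more values" string is just a placeholder chosen for this example.

```python
def double_yield():
    yield "This will print string number one"
    yield "This will print string number two"

# A for loop calls next() for you and stops when the generator ends.
collected = []
for message in double_yield():
    collected.append(message)
    print(message)

# next() takes an optional default that is returned instead of raising.
double_obj = double_yield()
next(double_obj)
next(double_obj)
fallback = next(double_obj, "No more values")
print(fallback)

# Calling the generator function again creates a fresh generator.
fresh_first = next(double_yield())
print(fresh_first)
```

An exhausted generator cannot be rewound, so calling the generator function again is the way to start over from the first yield.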

Given the advent of Big Data, large data sets are incredibly prevalent these days, so memory-efficient coding is a must for Data Scientists and Machine Learning practitioners alike.

The article above highlights the key benefits of using generators and walks through an example in which a generator is clearly preferable to loading everything at once, as it greatly improves memory handling when faced with large data sets.

Generators have become an integral part of my coding and as a practitioner myself, I encourage you to try them out!

Thanks again! Please send me a message if you have any questions! =]
