The Top 4 Virtual Environments in Python for Data Scientists

Which Environment Is Yours?

Photo by Shahadat Rahman on Unsplash

Virtual environments are a relatively difficult thing for new programmers to understand. One problem I had in understanding them was that my environment already seemed to exist within macOS: I was using PyCharm and my code was running, so what else did I need?

However, as your career as a Data Scientist or Machine Learning Engineer progresses, you realise that you run into these annoying-as-hell dependency conflicts between projects, and as a self-taught amateur in this space (like many readers here), it takes forever to figure them out.

In what follows, I go through the most common virtual environment tools and why and when you should use each. To be honest, you should probably use Docker, as it's the most recent of these technologies and it's what everyone is using (and if you're interviewing, you'll be asked about it). I talk about Docker here.

However, it’s super important to appreciate existing technology and how it works. Here it goes!


VENV

Photo by Lucrezia Carnelos on Unsplash

virtualenv was (and kind of still is) the default virtual environment tool for most programmers, and since Python 3.3 a slimmed-down version of it, venv, has shipped with Python itself. If you want the standalone virtualenv package, you can install it using pip as follows

pip install virtualenv

and once it's installed (or if you just stick with the built-in venv module), go to your chosen directory and, to create a virtual environment, run the following command:

python3 -m venv env

Before you can start installing or using packages in your virtual environment you’ll need to activate it. Activating a virtual environment will put the virtual environment-specific python and pip executables into your shell’s PATH.

source env/bin/activate

And now that you’re in an activated virtual environment, you can start installing libraries as normal:

pip install requests
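If you want to check that the activation worked, which python should now point at the interpreter inside your env/ folder rather than your system-wide Python:

which python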

Next, to make your repo reusable, create a record of everything installed in your new environment by running

pip freeze > requirements.txt

If you are creating a new virtual environment from a requirements.txt file, you can run

pip install -r requirements.txt

If you open your requirements file, you will see one package with its pinned version on each line.
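As an illustration, a small requirements file might look something like this (the packages and version numbers here are just examples):

requests==2.28.1
numpy==1.23.5
pandas==1.5.2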

Finally, to deactivate the virtual environment, you can simply use the deactivate command, shown below. If you want to re-enter the virtual environment later, just follow the same activation instructions as above; there's no need to re-create it.
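deactivate is available in your shell only while an environment is active:

deactivate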

So far, we've had to manually create a virtual environment, activate it, and then also freeze the session and save everything into a requirements.txt file to make it portable. But what if we didn't have to do this in two separate steps?

Enter pipenv.


PipEnv

While venv is still the official virtual environment tool that ships with the latest version of Python, Pipenv is gaining ground in the Python Community.

For example, in what we just described above with venv, in order to create virtual environments so you could run multiple projects on the same computer, you'd need:

  • A tool for creating a virtual environment (like venv)
  • A utility for installing packages (like pip or easy_install)
  • A tool/utility for managing virtual environments (like virtualenvwrapper or pyenv)

Pipenv includes all of the above, and more, out of the box.

Moreover, Pipenv handles dependency management really well compared to requirements.txt and pip freeze. Pipenv works the same as pip when it comes to installing dependencies and if you get a conflict you still have to manage it (although you can issue pipenv graph to view a full dependency tree, which should help).

But once you've solved the conflict, Pipfile.lock keeps track of all of your application's dependencies and sub-dependencies, including their versions, for each environment, so you can basically forget about them. This is really a step up.

To install pipenv, you need to install pip first. Then do

pip install pipenv

Next, you create a new environment by using

pipenv install

This will look for a Pipfile; if one doesn't exist, pipenv will create it along with a new virtual environment.

To activate the environment, you can simply run the following command:

pipenv shell
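As an aside, pipenv shell spawns a sub-shell, so when you're finished you leave it with a plain exit rather than deactivate:

exit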

To install new packages in this environment, you simply use pipenv install <package>, and pipenv will automatically add the package to a file called the Pipfile.

You can also install a package for just the dev environment by calling

pipenv install <package> --dev
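To give you a feel for what pipenv maintains on your behalf, a Pipfile for the commands above might look roughly like this (the pinned Python version and the pytest dev dependency are just illustrative):

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[packages]
requests = "*"

[dev-packages]
pytest = "*"

[requires]
python_version = "3.8"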

And once you’re ready to ship to production, all you do is:

pipenv lock

This will create/update your Pipfile.lock, which you’ll never need to edit manually. You should always use the generated file. Now, once you get your code and Pipfile.lock in your production environment, you should install the last successful environment recorded:

pipenv install --ignore-pipfile 

This tells pipenv to ignore the pipfile for installation and use what’s in the Pipfile.lock. Given this Pipfile.lock, pipenv will create the exact same environment you had when you ran pipenv lock, sub-dependencies and all.

The lock file enables deterministic builds by taking a snapshot of all the versions of packages in an environment (similar to the result of a pip freeze).
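Putting the pieces together, a typical pipenv workflow from development through to production might look something like this (requests is just an example package):

pipenv install requests            # add a dependency and record it in the Pipfile
pipenv shell                       # work inside the activated environment
pipenv lock                        # pin every dependency and sub-dependency into Pipfile.lock
pipenv install --ignore-pipfile    # on the production machine, recreate exactly what was locked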

There you have it! Now we’ve compared pipenv and venv and shown that pipenv is a much easier solution.


Conda Environment

Photo by Marius Masalar on Unsplash

Anaconda is a distribution of Python that makes it simple to install packages, and it's generally a good place for Python beginners to start. Anaconda also has its own virtual environment system, conda. Similar to the above, to create an environment:

conda create --name environment_name python=3.6

You can save all the info necessary to recreate the environment in a file by calling

conda env export > environment.yml
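The exported file is plain YAML; a trimmed-down environment.yml might look something like this (the packages listed are just examples):

name: environment_name
channels:
  - defaults
dependencies:
  - python=3.6
  - numpy
  - pandas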

To recreate the environment you can do the following:

conda env create -f environment.yml

Last, you can activate your environment with the invocation:

conda activate environment_name

And deactivate it with:

conda deactivate

Environments created with conda live by default in the envs/ folder of your Conda directory.
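If you want to see every environment you've created and where it lives on disk, you can list them all with:

conda env list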

Now in my experience, conda is OK, but I prefer the approach taken by venv for two reasons. Firstly, it makes it easy to tell whether a project uses an isolated environment, because the environment is included as a sub-directory of the project itself.

Secondly, it allows you to use the same name (such as env) for all of your environments, meaning you can activate each one with the same command. That said, because conda keeps all environments in one central folder rather than inside each project, it does make spinning up a new environment slightly more convenient.


Docker

Photo by Iswanto Arif on Unsplash

In a previous blog post I talk about Docker and go into detail explaining how to use it, so I won’t bore you here.

Docker is a tool that builds and runs containers. A container is created from an image that captures what your entire operating system looks like, whereas virtualenv only captures the dependency structure of your Python project. So, a virtualenv only encapsulates Python dependencies, while a Docker container encapsulates an entire OS.

Because of this, with a Python virtualenv, you can easily switch between Python versions and dependencies, but you’re stuck with your host OS. However with a docker image, you can swap out the entire OS — install and run Python on Ubuntu, Debian, Alpine, even Windows Server Core.

There are Docker images out there with every combination of OS and Python versions you can think of, ready to pull down and use on any system with docker installed.
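To make that concrete, here's a minimal sketch of a Dockerfile, assuming a project that just has a requirements.txt and a main.py entry point (both names are only examples):

FROM python:3.8-slim                    # a ready-made image containing the OS plus Python
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt     # install your Python dependencies inside the container
COPY . .
CMD ["python", "main.py"]               # the command the container runs on start

You'd build it with docker build -t my-app . and run it with docker run my-app, and the result behaves the same on any machine with Docker installed.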


If you think about each of the environments listed above, you'll realise that there's a natural divide between them. Conda is better suited (naturally) to those who are using the Anaconda distribution (so mostly beginners in Python), whereas pipenv and venv are for those who are more seasoned and know the ropes. Of these two, if you're starting something from scratch I'd really recommend going with pipenv, as it's been built with the difficulties of venv in mind.

However, Docker is both easy to use and has such widespread recognition that you just have to know how it works. All of these tools work out of the tin and do what they need to, but portability between operating systems is what makes Docker the real stand-out: when it comes to production, you don't need to worry about the OS on your server, as the container has it all sorted for you.


Thanks for reading! If you have any messages, please let me know!

Keep up to date with my latest articles here!
