Why I hate virtualenv and pip

Update 20th July 2015:

I wrote this over 18 months ago mostly in response to debates that had been raging for over a year at my place of work. I felt that virtualenv was fine, but dispensable and was consistently not helping our projects to build and run out of the box when a new developer would need to work on them. I felt solving this obsoleted virtualenv and I still do. My article was deliberately vitriolic because I was annoyed by the bickering in my place of work and I wanted to polarise readers and find out what people really think in the real world.

With the proliferation of container technology, the standardisation of virtualenv and pip into python3 and the fact that I don’t work in that workplace anymore, this article and the opinions in it at pretty much obsolete. So I’m going to lock the comments and forget about it.

Thanks world, it’s been really interesting.

Original article begins below…

I don’t like virtualenv and I don’t like pip. I think they are not only unnecessary, but that they are misleading and harmful. Python programmers are generally not going to agree with me. Virtualenv and pip are almost defacto standards among much of the python community. This is why I am taking the time to write this, because I know how I sound when voice this opinion. Sure, I frequently go ahead and voice it anyway because I like to wind people up, but I’m conscious that I don’t fully justify myself verbally. Instead of trying to articulate the nuances behind my view, I hope to just point people to this article instead. Maybe I’ll get some support, which so far I’ve had almost none of. Maybe, once my arguments are fully understood, they’ll be soundly refuted. I’m pretty happy either way.

Virtualenv and the illusion of isolation

Isolation and repeatable clean room development without hidden dependencies on the base system is a good thing. Virtualenv’s primary aim is to conveniently provide python-level isolation. For python packages that depend on system libraries, only the python-level part of those packages are isolated. Provided the developer is fully conscious that this is python-level-only isolation, then it is useful. If the developer lapses into believing their virtualenv provides true isolation, then the net result is negative.

Full methods of isolation make virtualenv redundant

There are isolation methods that isolate the entire root filesystem. A heavy weight but comprehensive option is a virtual machine running under a hypervisor. Workflows to assist with this have been provided by software such as Vagrant for some time. At the other end of the spectrum are chroot environments and especially light weight operating system level containers such as LXC on Linux. LXC can even leverage a copy-on-write filesystem such as btrfs to the creation of an environment to be even faster use less disk space than a virtualenv.

Virtualenv for deployment is an antipattern

I can sense some readers bristle at the mention of tech such as LXC. We cannot require our target environment to be LXC-capable or for root access (which LXC still requires) to be granted simply to deploy our application! My response to this is that virtualenv is not useful for deployments at all. As stated already, virtualenv’s value lies only in conveniently allowing a user to _interactively_ create a python sandbox. Deployment should be at least semi-automatic and easy to repeat, thus scripting virtualenv to do what is convenient to do manually is actually more work than just setting up your PYTHONPATH variable properly in your entry points. It is very, very easy to install something as large as a Django application into a prefix. Easier, I would argue, then indirectly driving virtualenv and messing with python shebangs. And lets not forget that if you don’t have control over your target environment, you’re going to have to politely ask for the mysql client libraries and header files to be installed, system-wide, so you can *compile* mysql-python against them during deployment! Shipping software commercially is hard, and virtualenv doesn’t help at all.

Virtualenv is full of messy hacks

When you install a virtualenv, it’s not empty. In lib/ you’ll have a copy of the python standard library. In include/, a bunch of python headers. These appear spurious to me (but more in the next section), but it’s bin/ that bothers me the most. In bin/ you’ll have pip and easy_install. Virtualenv has munged both of their shebangs to run a copy of the python binary that sits beside them in the same directory. Any other scripts provided by packages will get their shebangs similarly munged too. You need to preserve this behaviour right down the line if you want to run things in this virtualenv from the outside, like a cron job. You will need to effectively hardcode the path of the virtualenv to run the correct python. This is at least as fiddly as manually setting up your PATH/PYTHONPATH. It’s actually way easier to do neither, but I’ll come back to that shortly…

I forgot to mention bin/activate

Sets PATH and changes your prompt. If you find this exciting, you’ve been living under a rock. Same goes for virtualenv wrapper. .NET developers on Windows are mocking you.

–no-site-packages

Virtualenv will fuck with sys.path in one of two ways. The –system-site-packages option will prepend the virtualenv site-packages to the existing paths so that your globally installed python modules can be used in the virtualenv. The default is –no-site-packages, which will make sure nothing from the global python installation will be loadable within the virtualenv. This would be why there are copies of things like the stdlib and the headers cluttering up the virtualenv. I find the existence of this option and the choice of it as a default very telling. Clearly virtualenv advocates don’t want any hidden dependencies or incorrect versions leaking into their environment. However their virtualenv will always be on the path first, so there’s little real danger (I haven’t forgotten about pip freeze – that’s coming later). It’s somewhat paranoid, but here lies the paradox. They never had complete isolation in the first place! What is the use of being 100% sure you’re not using the system version of the mysql-python python package when you are also 100% sure that you ARE using the system version of libmysqlclient! You can’t care and not care about isolation at the same time.

Pip and virtualenv seem to be buddies

It’s because they are both written by Ian Bicking. Both programs promote his personal philosophy and workflows. I don’t like virtualenv, mostly because of what it makes people believe, but I can accept it has its place. Actually I use it sometimes for ad hoc throwaway tests. Pip on the other hand simply shouldn’t exist at all. Pip is just an almost-compatible alternative for easy_install with some additional features that I personally wish didn’t exist. Interactively and non-interactively from things like puppet and binary package building I don’t use it, preferring easy_install because I have a prejudice against pip. Unfortunately, this isn’t true. There’s something a lot more satisfying about typing “pip install” than “easy_install”. I can’t deny it. easy_install is a stupid name. Having an underscore in it isn’t marketable. I would speculate that this is at least part of the reason pip is popular.

Pip always, always builds from source

Eggs are to Pythons as Jars are to Java…

pip appears to have deliberately dropped easy_installs ability to install a package from a binary egg. Somebody has decided this is a bad idea, despite binary egg distrubtion being a well established and mature part of the python platform. Of course, always building from source is good because you don’t need a separate prebuilt egg for every different target system. It’s inversely bad when you know exactly what your target platform is and you don’t want to require a compiler to be present on it (the .NET and Java folks are mocking you again). Stupidest of all is if you’re using a virtualenv with –no-site-packages and compiling scores of python modules that you didn’t even write every time someone in your team wants to run up a dev environment in a SOE.

God damn requirements.txt

The python way for a package to depend on other packages is install_requires in setup.py. setupools/distribute provide this mechanism which is used routinely by both easy_install and pip to automatically download and satisfy dependencies from Pypi. For reasons which I’ll pretend not to understand for a few sentences, pip also allows you to specify a list of dependencies in a text file. Typically, this will be requirements.txt. The syntax is the same as what you get in setup.py, plus you can nest other requirements files and point directly to file paths, URIs and even things like Mercurial and Git repositories.

File paths, URIs and VCS I’ll address in the next section. I believe these features are opportunistic, not the reason we have requirements.txt. The real reason is because there are two classes of python projects – packages which are intended to be reused and use setup.py, and applications that use them. The sort of developers that only write applications don’t really understand packaging and are happy to hardcode an assortment of modules into their application and hook them in with the convenient requirements.txt. These developers will most likely tell people to set up a virtualenv and pip install -r requirements.txt.

The result is a subset of python developers who consider requirements.txt all they need. They never bother to learn about setuptools. They are easily seduced by the apparent convenience of pointing directly to tarballs floating about on the net, and various types of VCS URI. It irks me that they think this is fantastically pragmatic and evangelise virtualenv and pip as indispensable tools of a python programmer.

URIs as dependencies sucks

setuptools lets you specify a package name and a version match string and, by default, downloads this from Pypi. Pypi provides the index. You can provide your own simple HTML index pages too, and have them get checked first before Pypi. Whoever wrote this stuff was trying to get developers to depend on packages by name, not by physical location or transport protocol. They were doing it right.

If you point to local file paths or a remote tarball in requirements.txt, you’re hardcoding things you don’t need to. You aren’t using a package repository. People can’t set up mirrors. You can’t specify a minimum version, only an exact version. One day that revision of your code will cease to work because the object will no longer be there. This should be really obviously something we don’t want to do, right?

Then we have dependencies that look like this:

git+https://github.org/my/stupid/fucking/package#egg=1.2.3

This requires the user to now have git installed and for pip to download an entire clone. A lot of the time, people don’t even use the version notation and assume the master branch is stable. This is all uncool. I know it is currently fashionable to install things directly from DVCS, but committing these URLs into your project? This is questionable, but when it is coming at the expense of properly written setup.py files it’s a bad thing.

If you like pip freeze you’re doing it wrong

I’m good at managing and tracking my dependencies. I do it without pip freeze. One might use pip freeze to make sure they haven’t missed any python dependencies late in a dev cycle. If you think pip freeze is giving you a list of dependencies to paste into requirements.txt (which you also don’t need), then you’re using –no-site-packages (which you also don’t need) with virtualenv and a whole stack of your dependencies are system global and not python anyway. Oh, and it has no way of telling which are your direct dependencies and which were pulled in by others.

The other way to find these deps is to destroy your environment and recreate it. With virtualenv+pip, this is going to take you ages. With LXC CoW and prebuilt eggs of all your deps you’re not actively working on, you will catch your system-wide missing dependencies and your direct python package dependencies very quickly. There is nothing wrong with pip freeze as such, it’s just that people think it’s useful as a side effect of other anti patterns.

Conclusion

This is my critical, albeit entirely subjective and somewhat speculative, analysis of the utility of virtualenv and pip and the developer culture that surrounds them. I love python as a language, but less as a platform because the fragmented standards of packaging and development workflow. For me it means I spend less time working in python, and more time working against it. More time arguing with intelligent people who genuinely believe that virtualenv and pip are all they need to develop, collaborate and deploy their applications. I do not use virtualenv or pip to do python development. I hope this article shows, at the least, that it’s possible to understand these programs and still be critical of them.

Advertisements