Why I hate virtualenv and pip

I don’t like virtualenv and I don’t like pip. I think they are not only unnecessary, but that they are misleading and harmful. Python programmers are generally not going to agree with me. Virtualenv and pip are almost de facto standards among much of the python community. This is why I am taking the time to write this, because I know how I sound when I voice this opinion. Sure, I frequently go ahead and voice it anyway because I like to wind people up, but I’m conscious that I don’t fully justify myself verbally. Instead of trying to articulate the nuances behind my view, I hope to just point people to this article. Maybe I’ll get some support, which so far I’ve had almost none of. Maybe, once my arguments are fully understood, they’ll be soundly refuted. I’m pretty happy either way.

Virtualenv and the illusion of isolation

Isolation and repeatable clean room development without hidden dependencies on the base system is a good thing. Virtualenv’s primary aim is to conveniently provide python-level isolation. For python packages that depend on system libraries, only the python-level part of those packages is isolated. Provided the developer is fully conscious that this is python-level-only isolation, then it is useful. If the developer lapses into believing their virtualenv provides true isolation, then the net result is negative.

Full methods of isolation make virtualenv redundant

There are isolation methods that isolate the entire root filesystem. A heavyweight but comprehensive option is a virtual machine running under a hypervisor. Workflows to assist with this have been provided by software such as Vagrant for some time. At the other end of the spectrum are chroot environments and especially lightweight operating system level containers such as LXC on Linux. LXC can even leverage a copy-on-write filesystem such as btrfs so that creating an environment is even faster, and uses less disk space, than creating a virtualenv.

Virtualenv for deployment is an antipattern

I can sense some readers bristle at the mention of tech such as LXC. We cannot require our target environment to be LXC-capable or for root access (which LXC still requires) to be granted simply to deploy our application! My response to this is that virtualenv is not useful for deployments at all. As stated already, virtualenv’s value lies only in conveniently allowing a user to _interactively_ create a python sandbox. Deployment should be at least semi-automatic and easy to repeat, thus scripting virtualenv to do what is convenient to do manually is actually more work than just setting up your PYTHONPATH variable properly in your entry points. It is very, very easy to install something as large as a Django application into a prefix. Easier, I would argue, than indirectly driving virtualenv and messing with python shebangs. And let’s not forget that if you don’t have control over your target environment, you’re going to have to politely ask for the mysql client libraries and header files to be installed, system-wide, so you can *compile* mysql-python against them during deployment! Shipping software commercially is hard, and virtualenv doesn’t help at all.
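As a rough illustration of what "setting up your path in your entry points" can look like, here is a minimal sketch. The prefix `/opt/myapp`, the `MYAPP_PREFIX` variable and both function names are made up for the example, and the directory layout assumes a POSIX-style `--prefix` install.

```python
import os
import site
import sys

# Hypothetical install prefix; the article does not name one.
APP_PREFIX = os.environ.get("MYAPP_PREFIX", "/opt/myapp")


def prefix_site_packages(prefix, version_info=sys.version_info):
    """Return the site-packages directory a POSIX-style prefix install would use."""
    python_dir = "python%d.%d" % (version_info[0], version_info[1])
    return os.path.join(prefix, "lib", python_dir, "site-packages")


def add_prefix_to_path(prefix):
    """Make packages installed under `prefix` importable in this process."""
    path = prefix_site_packages(prefix)
    # addsitedir appends the directory to sys.path and also processes
    # any .pth files found in it (eggs installed by easy_install rely on these).
    site.addsitedir(path)
    return path
```

An entry point such as a wsgi script or a cron job wrapper would call `add_prefix_to_path(APP_PREFIX)` before importing the application. Exporting PYTHONPATH in the calling environment achieves much the same thing without any code.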

Virtualenv is full of messy hacks

When you install a virtualenv, it’s not empty. In lib/ you’ll have a copy of the python standard library. In include/, a bunch of python headers. These appear spurious to me (but more in the next section), but it’s bin/ that bothers me the most. In bin/ you’ll have pip and easy_install. Virtualenv has munged both of their shebangs to run a copy of the python binary that sits beside them in the same directory. Any other scripts provided by packages will get their shebangs similarly munged too. You need to preserve this behaviour right down the line if you want to run things in this virtualenv from the outside, like a cron job. You will need to effectively hardcode the path of the virtualenv to run the correct python. This is at least as fiddly as manually setting up your PATH/PYTHONPATH. It’s actually way easier to do neither, but I’ll come back to that shortly…
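To see the shebang rewriting for yourself, something like the following sketch can list which scripts in a bin/ directory are pinned to an interpreter sitting in that same directory. The function names are invented for this example, and it assumes interpreter paths without spaces.

```python
import os


def read_shebang(script_path):
    """Return the interpreter path from a script's #! line, or None if it has none."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", "replace").strip()
    if not first_line.startswith("#!"):
        return None
    # Drop the "#!" marker and any interpreter arguments.
    parts = first_line[2:].strip().split()
    return parts[0] if parts else None


def scripts_pinned_to(bin_dir):
    """Map script name -> interpreter for scripts whose shebang points inside bin_dir."""
    bin_dir = os.path.abspath(bin_dir)
    pinned = {}
    for name in sorted(os.listdir(bin_dir)):
        path = os.path.join(bin_dir, name)
        if not os.path.isfile(path):
            continue
        interpreter = read_shebang(path)
        if interpreter and os.path.dirname(interpreter) == bin_dir:
            pinned[name] = interpreter
    return pinned
```

Anything this reports is a script that only works while the virtualenv stays at that exact path, which is the hardcoding a cron job or init script has to reproduce.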

I forgot to mention bin/activate

Sets PATH and changes your prompt. If you find this exciting, you’ve been living under a rock. Same goes for virtualenv wrapper. .NET developers on Windows are mocking you.

--no-site-packages

Virtualenv will fuck with sys.path in one of two ways. The --system-site-packages option will prepend the virtualenv site-packages to the existing paths so that your globally installed python modules can be used in the virtualenv. The default is --no-site-packages, which will make sure nothing from the global python installation will be loadable within the virtualenv. This would be why there are copies of things like the stdlib and the headers cluttering up the virtualenv. I find the existence of this option and the choice of it as a default very telling. Clearly virtualenv advocates don’t want any hidden dependencies or incorrect versions leaking into their environment. However their virtualenv will always be on the path first, so there’s little real danger (I haven’t forgotten about pip freeze – that’s coming later). It’s somewhat paranoid, but here lies the paradox. They never had complete isolation in the first place! What is the use of being 100% sure you’re not using the system version of the mysql-python python package when you are also 100% sure that you ARE using the system version of libmysqlclient! You can’t care and not care about isolation at the same time.
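The paradox can be made visible from inside the interpreter. The sketch below is only a best-effort check: the function names are invented here, and the `real_prefix` test assumes the behaviour of older virtualenv releases, while `base_prefix` covers venv and newer virtualenv.

```python
import ctypes.util
import sys


def in_virtualenv():
    """Best-effort check for a virtualenv or venv interpreter."""
    # Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # leave sys.base_prefix pointing at the original installation.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def describe_isolation(library="mysqlclient"):
    """Report what is and is not isolated for the running interpreter."""
    return {
        "in_virtualenv": in_virtualenv(),
        "python_prefix": sys.prefix,
        # find_library consults the system's linker search locations, which a
        # virtualenv does not change; it returns a library name or path, or None.
        "system_library": ctypes.util.find_library(library),
    }
```

Run inside and outside a virtualenv, `python_prefix` changes while `system_library` should stay the same, which is the python-level-only isolation in a nutshell.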

Pip and virtualenv seem to be buddies

It’s because they are both written by Ian Bicking. Both programs promote his personal philosophy and workflows. I don’t like virtualenv, mostly because of what it makes people believe, but I can accept it has its place. Actually I use it sometimes for ad hoc throwaway tests. Pip on the other hand simply shouldn’t exist at all. Pip is just an almost-compatible alternative for easy_install with some additional features that I personally wish didn’t exist. Interactively and non-interactively from things like puppet and binary package building I don’t use it, preferring easy_install because I have a prejudice against pip. Unfortunately, this isn’t true. There’s something a lot more satisfying about typing “pip install” than “easy_install”. I can’t deny it. easy_install is a stupid name. Having an underscore in it isn’t marketable. I would speculate that this is at least part of the reason pip is popular.

Pip always, always builds from source

Eggs are to Pythons as Jars are to Java…

pip appears to have deliberately dropped easy_install’s ability to install a package from a binary egg. Somebody has decided this is a bad idea, despite binary egg distribution being a well established and mature part of the python platform. Of course, always building from source is good because you don’t need a separate prebuilt egg for every different target system. It’s inversely bad when you know exactly what your target platform is and you don’t want to require a compiler to be present on it (the .NET and Java folks are mocking you again). Stupidest of all is if you’re using a virtualenv with --no-site-packages and compiling scores of python modules that you didn’t even write every time someone in your team wants to run up a dev environment in a SOE.

God damn requirements.txt

The python way for a package to depend on other packages is install_requires in setup.py. setuptools/distribute provide this mechanism, which is used routinely by both easy_install and pip to automatically download and satisfy dependencies from Pypi. For reasons which I’ll pretend not to understand for a few sentences, pip also allows you to specify a list of dependencies in a text file. Typically, this will be requirements.txt. The syntax is the same as what you get in setup.py, plus you can nest other requirements files and point directly to file paths, URIs and even things like Mercurial and Git repositories.

File paths, URIs and VCS I’ll address in the next section. I believe these features are opportunistic, not the reason we have requirements.txt. The real reason is because there are two classes of python projects – packages which are intended to be reused and use setup.py, and applications that use them. The sort of developers that only write applications don’t really understand packaging and are happy to hardcode an assortment of modules into their application and hook them in with the convenient requirements.txt. These developers will most likely tell people to set up a virtualenv and pip install -r requirements.txt.

The result is a subset of python developers who consider requirements.txt all they need. They never bother to learn about setuptools. They are easily seduced by the apparent convenience of pointing directly to tarballs floating about on the net, and various types of VCS URI. It irks me that they think this is fantastically pragmatic and evangelise virtualenv and pip as indispensable tools of a python programmer.
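For comparison, the install_requires route is not much more typing than a requirements.txt. This is a minimal sketch of a setup.py: the project name, package name and version ranges are purely illustrative, and the `main()` wrapper exists only so the metadata can be read without setuptools being importable.

```python
# setup.py (sketch): dependencies are declared by name and version range only.
SETUP_KWARGS = dict(
    name="myapp",                 # hypothetical project name
    version="0.1.0",
    packages=["myapp"],
    install_requires=[
        "Django>=1.4,<1.5",       # illustrative ranges, not recommendations
        "MySQL-python>=1.2.3",
    ],
)


def main():
    # Imported here so the metadata above can be inspected without setuptools.
    from setuptools import setup
    setup(**SETUP_KWARGS)
```

A real setup.py would simply call `main()` (or `setup(...)` directly) at module level. Note there is not a single URL or file path in it: where the packages come from is left to the index.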

URIs as dependencies suck

setuptools lets you specify a package name and a version match string and, by default, downloads this from Pypi. Pypi provides the index. You can provide your own simple HTML index pages too, and have them get checked first before Pypi. Whoever wrote this stuff was trying to get developers to depend on packages by name, not by physical location or transport protocol. They were doing it right.

If you point to local file paths or a remote tarball in requirements.txt, you’re hardcoding things you don’t need to. You aren’t using a package repository. People can’t set up mirrors. You can’t specify a minimum version, only an exact version. One day that revision of your code will cease to work because the object will no longer be there. This should be really obviously something we don’t want to do, right?

Then we have dependencies that look like this:

git+https://github.org/my/stupid/fucking/package#egg=1.2.3

This requires the user to now have git installed and for pip to download an entire clone. A lot of the time, people don’t even use the version notation and assume the master branch is stable. This is all uncool. I know it is currently fashionable to install things directly from DVCS, but committing these URLs into your project? This is questionable, but when it is coming at the expense of properly written setup.py files it’s a bad thing.
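If you want to know how much of a requirements file depends on names and how much on locations, a rough classifier like the one below will tell you. It is a sketch only: it is not pip's actual parser, the category names are mine, and the regular expression covers just the common name-plus-version-specifier form.

```python
import re
from collections import Counter

VCS_PREFIXES = ("git+", "hg+", "svn+", "bzr+")
# A project name, optional [extras], optional version specifiers.
NAMED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*\s*(\[[^\]]*\])?\s*((==|>=|<=|!=|~=|<|>).*)?$"
)


def classify_requirement(line):
    """Classify one requirements.txt line as named, vcs, url, path, include, blank or other."""
    # Strip whole-line comments and trailing " # ..." comments, but keep
    # URL fragments such as "#egg=name", which have no whitespace before "#".
    text = re.sub(r"(^|\s+)#.*$", "", line).strip()
    if not text:
        return "blank"
    if text.startswith(("-r ", "--requirement ")):
        return "include"
    if text.startswith(("-e ", "--editable ")):
        text = text.split(None, 1)[1]
    if text.lower().startswith(VCS_PREFIXES):
        return "vcs"
    if "://" in text:
        return "url"
    if text.startswith((".", "/", "~")):
        return "path"
    if NAMED.match(text):
        return "named"
    return "other"


def count_by_kind(lines):
    """Count how many lines fall into each category, ignoring blanks."""
    kinds = (classify_requirement(line) for line in lines)
    return Counter(kind for kind in kinds if kind != "blank")
```

Everything that does not come back as "named" is a dependency on a location or a transport rather than on a package, and could not be served from a mirror or a private index.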

If you like pip freeze you’re doing it wrong

I’m good at managing and tracking my dependencies. I do it without pip freeze. One might use pip freeze to make sure they haven’t missed any python dependencies late in a dev cycle. If you think pip freeze is giving you a list of dependencies to paste into requirements.txt (which you also don’t need), then you’re using --no-site-packages (which you also don’t need) with virtualenv and a whole stack of your dependencies are system global and not python anyway. Oh, and it has no way of telling which are your direct dependencies and which were pulled in by others.

The other way to find these deps is to destroy your environment and recreate it. With virtualenv+pip, this is going to take you ages. With LXC CoW and prebuilt eggs of all your deps you’re not actively working on, you will catch your system-wide missing dependencies and your direct python package dependencies very quickly. There is nothing wrong with pip freeze as such, it’s just that people think it’s useful as a side effect of other antipatterns.
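The point about pip freeze not distinguishing direct dependencies from pulled-in ones can be illustrated with the installed package metadata. This is an approximation written for this article, using the standard library's importlib.metadata (Python 3.8+): the function names are mine, and it ignores environment markers and extras, so treat the result as a hint rather than the truth.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalise a project name so 'MySQL_python' and 'mysql-python' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract the project name from a requirement string such as 'six>=1.5'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else None


def split_top_level_and_pulled_in(requires_by_project):
    """Split projects into those nothing else requires and those another project requires."""
    projects = {normalize(name) for name in requires_by_project}
    pulled_in = set()
    for name, requirements in requires_by_project.items():
        for requirement in requirements:
            dep = requirement_name(requirement)
            if dep in projects and dep != normalize(name):
                pulled_in.add(dep)
    return sorted(projects - pulled_in), sorted(pulled_in)


def installed_requires():
    """Collect {project name: [requirement strings]} for the running environment."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            result[name] = dist.requires or []
    return result
```

Calling `split_top_level_and_pulled_in(installed_requires())` gives two lists where pip freeze gives one. Even then, "nothing else requires it" is not the same as "my application depends on it", which is why the declaration belongs in setup.py.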

Conclusion

This is my critical, albeit entirely subjective and somewhat speculative, analysis of the utility of virtualenv and pip and the developer culture that surrounds them. I love python as a language, but less as a platform because of the fragmented standards of packaging and development workflow. For me it means I spend less time working in python, and more time working against it. More time arguing with intelligent people who genuinely believe that virtualenv and pip are all they need to develop, collaborate and deploy their applications. I do not use virtualenv or pip to do python development. I hope this article shows, at the least, that it’s possible to understand these programs and still be critical of them.

56 thoughts on “Why I hate virtualenv and pip”

  2. Pingback: Why I hate virtualenv and pip | Enjoying The Moment

    • Ah, the typical fallacy. So what does this even mean? That those who come with criticisms must be prepared enough to elaborate alternatives? You should realize that elaborating alternatives is an order of magnitude harder than plainly saying the current way of doing things is wrong. Plus, before anyone knows that we need an alternative, there must be many criticisms out there. An argument is strong when the reasons it contains are strong, and nothing more.

      • I’d elaborate by saying that I’ve had heated arguments with developers who insist virtualenv is comparable to full isolation. Isolation methods abound now, you just need to look. chroot-based, LXC, Vagrant, VMware, Docker. My entire point is virtualenv is a placebo for those proper solutions.

    • I’m tired of reading that everywhere lately. Not everyone is an inventor — some are content to actually *use* software, and provide really good or not feedback about what it’s like to use your software.

  3. To be perfectly honest, I think your arguments are weak and you make no suggestion for improvement.
    Like the “take down the government” people, OK, assume you’re right, take them down, but what next?

    Saying that virtualenv is a hack because it fakes a few things is implying that virtual machines are some huge hack, as they fake everything. Is it not?

    • No improvement to make. Virtualenv solves the wrong problem. A virtual machine with a full blown hypervisor gives system level isolation, but it’s heavy. I suggested LXC, which is a lot lighter.

      • Good example. First off because the non-python deps are going to cause you more trouble. Second, because it demonstrates there’s a big difference between your desktop environment and your dev runtime. Unless you’re actually developing for OSX, you probably want to be running a hypervisor with a Linux virtual machine on it. Under *there* is where our debate takes place. You could run LXC under there. If that sounds odd, I would advocate Vagrant over Virtualenv for OSX development for Linux production environments.

      • You’re missing the point. It would not be unreasonable for a javascript or, say, a ruby process to call out to a system program. That system program could be written in python. Of course, those sold on virtualenv will naturally have an aversion to anything that isn’t python because it doesn’t fit into their workflow.

      • NPM is sane dependency management? Because npm recursively installs dependencies it creates an Ops nightmare and bloats applications. Why do I need 10 copies of requests when all of the dependencies use the same version of requests? How can I force all of those dependencies to use a non-exploitable version of library X? Joyent even says[1] “don’t use npm for production deployment even when using npm-shrinkwrap.” The Joyent guys build using npm, test their app, and then they deploy a tarball. How is that a package manager if you can’t (shouldn’t) deploy with it?

        [1] http://blog.nodejs.org/2012/02/27/managing-node-js-dependencies-with-shrinkwrap/

    • That’s because “take down the government” *is* the alternative not the argument. The argument is that organisms have an innate ability to self-organize, and you’re invited to criticise it without giving any alternative.

    • > Like the “take down the government” people, OK, assume you’re right, take them down, but what next?

      Some of us “take down the government” people don’t advocate for anything next. That’s the point :)

      > Saying that virtualenv is a hack because it fakes a few things is implying that virtual machines are some huge hack, as they fake everything. Is it not?

      Not at all – the core of his argument seems to be that virtualenv only fakes the Python part. A VM reproduces the entire system, thereby eliminating the problems that are introduced with virtualenv.

      An example: Say I have a project with PIL in the requirements.txt. On my Mac, I can `pip install -r requirements.txt` and be up and running. On Linux, you’re likely to be able to do the same. On Windows, you’re going to get to figure out how to install binary packages into a virtual environment, or you’re going to install it system-wide using a .exe.

      Had I packaged my project with a Vagrantfile instead of (or in addition to) a requirements.txt, the Windows user wouldn’t have such trouble.

      I’m not subscribing to Andrew’s argument here – I’m merely recognizing that it has merit. Being able to create a fully reproducible Python environment isn’t something that I’ve found to be easy (or even 100% possible) today. As more and more scientific work is done in tools like IPython and SciPy, this is something we need to solve. While virtualenv+pip are part of my daily workflow and work quite well for what they are, they won’t be the tools that get us to truly reproducible results.

  4. So here’s how we deploy python projects. We use requirements.txt, virtualenv, and easy_install along with our own repository of eggs. We build a virtualenv with all the requirements from requirements.txt, install the project into the virtualenv, put everything inside a debian package, slap on all the OS level dependencies, version it with git tags and put the whole thing in a debian repository so that it can be deployed with “sudo apt-get install”.

    This actually works out pretty well but the bottom line is that all of it is a hack and it’s not like it’s any better in the Ruby or JavaScript world. Those worlds also have their own hacks. I personally prefer Ruby for personal projects and the situation there is even worse because there are things like rvm, rbenv, Gemfile, a few other things used to isolate Ruby environments and project dependencies. I don’t really know what the Node.js guys do but I hear they have something called npm.

    All of the above is the status quo but lately I have been using debootstrap and schroot to isolate my environments and it’s pretty awesome. I no longer have to worry about bullshit like Gemfile and Bundler and simply install all the gems and libraries I need with “gem” and “apt-get”. If I want a deployable diff of everything then I just use aufs to mount a writable branch on top of the chroot, install all my dependencies, put the top level directory in a debian package and then just ship that or the entire chroot with the writable branch and just set it up on the production box. This setup gives me a truly isolated lightweight environment and reproducible deployment environments in production. The problem is that I can’t quite convince people that this is the right approach because they are so used to rvm, rbenv, Gemfile, virtualenv, requirements.txt, pip and Bundler that to say otherwise is almost blasphemous. Even though we now have way better tools for isolating our environment and creating truly reproducible deployment artifacts people still prefer shitty hacks.

  5. I like your point of view, but ch0p is right. You should explain better what the right way is for a newbie like me :-)

    I quote this sentence: ” I’m good at managing and tracking my dependencies. I do it without pip freeze. One might use pip freeze to make..”

    I would appreciate it if you wrote a post on how to manage dependencies, or just shared some links instead.

    • So for a newbie, imagine if virtualenv didn’t exist. Instead of wasting time on that sugar pill, you’d suffer without isolation until you discovered proper isolation. If you’re on Linux, check out LXC and its command line tools. You’ll be impressed. If you’re on Windows, you have bigger problems :)

  6. I actually totally agree with much in this post, I hate requirements files and git paths to packages.

    I think we would all agree system level isolation is > than a virtually isolated environment. That said tools for LXC usage are still young and I don’t really want a single VM for every project. Sometimes you need a box which runs multiple python applications all with different deps, installing these into a venv is a simple easy way of isolating these.

    • Yeah sure. If you want to manipulate PYTHONPATH and PATH with virtualenv then who’s to argue. If that was all people claimed about it I’d not have written the article. It was written in response to heated debates with people who I respect who claim virtualenv is some kind of panacea.

  7. Completely agree with this view – the fact that Virtualenv hacks around with my path, dependencies and auto-magically isolates them when it doesn’t consider _their_ dependencies and source-lib requirements is insane, it means there’s just another layer of magic between me and what I’m building, which means it’s just another thing to have to debug if something goes wrong, and who in their right minds wants to debug path-level dependencies?

  9. I came to Python from a Ruby background and found getting a working environment was a lot more problematic.

    Do you think these tools have anything to learn from the tools in the Ruby ecosystem i.e. Bundler, RVM and rbenv?

    • Maybe a little, but I don’t see full isolation as the responsibility of a high level language with loads of bindings to the underlying system. There is only so much Python or Ruby *can* do to isolate themselves. As I’ve said elsewhere, if I found it convenient for a python project to call out to ruby (for example, to run puppet or something), then I would not be served by either virtualenv or rvm.

  10. I think you’re missing part of the problem; I’ve written a couple of rants about Python packaging myself (e.g. http://ollivander.franzoni.eu/2013/01/21/python-packaging-woes/ ).

    1) It’s Python packaging that is broken, not virtualenv; python packaging and deployment tools were driven by several different parties with different motivations (core developers, original setuptools creator, then forked, then pip, with virtualenv or zc.buildout for more input); now the Python Packaging Authority (https://github.com/pypa) is trying to fix that and steer the community towards shared tools and best practices that work together. That’s not an easy task and will require quite a lot of time.

    2) What happens if you’d like to install two/three/four different Python applications, with possibly different, conflicting upstream dependencies, on just one server? Virtualenv for deployment (especially with relocatable environments) makes this possible; you just pick virtualenv with all the deps, add the system-level dependencies, and you’re done.

    3) Saying that Python requires a whole OS-level as its dependency… would make Python a very poor language choice for almost any situation. Think about all the people who dislike Java because “it’s heavyweight” – and most of those love Python. Java only has a VM and a runtime – but how would you classify a language that requires a whole OS container as its runtime?

    4) requirements.txt is a workaround for something that is completely broken in install_requires: you cannot handle dependency conflicts. If your software and some of your upstream deps both depend on a certain package, but on a different version, you’re screwed: there’s no way around that with standard setuptools. requirements.txt may be an imperfect solution, but it’s a solution at least.

    • 1) It’s not great, I agree, but I have issues with any language level isolation method that calls itself “virtual”.

      2) We actually currently *do*, for historical reasons, develop in a virtualenv under a full VM. We happen to build it out to RPM then. The RPM installs a django app into a directory in production. The wsgi script sets sys.path. No virtualenv is involved, and in fact it’s easier to script the RPM build without it. That’s why I say it’s an interactive development tool only. Isolating python libs is easy without it.

      3) The Java VM is closely analogous to full isolation, JNDI aside. Python, Ruby, Perl and PHP can’t offer this. Their analogy is, by definition IMO, the entire system.

      4) Conflicts are valid in a dependency tree. You deal with them. Consider that if you’re writing a pypi package you could run into the same problem, but pip specifically will not read a requirements.txt file from an upstream (unless I’m very much mistaken).

  12. Great post.

    For anyone interested in container-level isolation that works with Macs as well, I recommend taking a look at Docker (http://docker.io). It’s got a lot of big names supporting it, and they’ll be hitting v1 soon (although many are already using it in dev environments).

    • Yes agreed. It looks good. My post is more about the problem than the solution. If you abandon a bad solution when there are excellent ones available, you’ll soon discover them.

  13. You can only talk about the problem? No, you can fix it! So shut up and offer or make another acceptable solution. Or use the solutions for you from other developers.

    • I think you need to read the article again. I don’t say there is a problem with virtualenv, I say it’s a false solution and there are better options. I go on to list a few of them.

  14. I’m used to Windows. If I want a proper deployment, I can use Py2Exe or the like to actually collect everything my application needs, including extension libraries, into a complete bundle that can be tested without a Python environment, “virtual” or otherwise.
    On the other hand, if I have something lightweight I can just demand libraries to be installed in a server or in the user’s general-purpose Python.
    Library version conflicts are not a problem of applications, they are a problem of users trying to use multiple applications: handling conflicts appropriately is their responsibility. They might run a new web server in a new VM or on a new computer, install a proper Python interpreter side by side (possibly taking advantage of LXC or the like), or use virtualenv of their own accord.
    Why should I waste time with complicated ways to make my application harder to install and library conflicts more subtle? Why should paranoid insulation be considered a reasonable policy?

    • I agree with you, but there’s an assumed context here of deploying to your own equipment. My position is, if you’re deploying to an environment you control, virtualenv is of little help. If you’re replicating a target environment for development, virtualenv is even less help. The current buzz around Docker is a good gauge of how significant this use case is.

  16. Ok, I have to admit that I’m anything but a Python programmer. I have just experimented with some little scripts to solve my day-to-day routine tasks.
    I came from different languages, all provided with an IDE and with no need to abstract either the compiler version or the packages.

    I’ve read around, and found plenty of articles that suggest using pip and virtualenv.
    I have to admit that the system per se can make sense, but it is difficult to master. Activate the environment, change the path, remember to install the package in the right folder.

    Frankly, I do prefer something visual like the virtual environment offered by PyCharm, although I don’t know what sort of tricks it does in the background. But this won’t solve the need to have something executable from the command line….

    That’s my little two cents.

  17. Of note, system dependencies generally are a bunch of libraries (headers, libs, dynamically linkable libs). They can be perfectly isolated with INCLUDE_PATH, LD_LIBRARY_PATH (and equivalents). ros.org is a vast example of a ton of isolated packages (including python packages).

    And yes, lightweight, full virtualization is a great solution.

  19. I don’t see why you liked easy_install over pip. The only advantage that easy_install has over pip is that it can install from binary packages, which hopefully (big hope…) will be less of an issue if more packages now adopt the wheel format.

    As long as you only use a pure python application, IMO pip is a good package manager. The drawbacks of easy_install are too many to list, but my main gripe with it is that easy_install cannot uninstall packages. I often had to install, uninstall, and reinstall packages while trying out different versions of a dependency to find out the one that works.

    Also, you’re conflating requirements.txt with a packaging solution. requirements.txt and setup.py solve different problems; the former is a simple way to describe and install dependencies, which is all that end applications often need, the latter is a full-blown packaging solution. Many end applications simply do not need the power or complexity of full-blown package management. You generally only should write setup.py if you’re writing a library or you need to distribute an application through PyPI (which is also slightly wrong as PyPI is primarily meant to distribute libraries, not end applications). You should use OS-level package managers if you want to distribute Python end applications.

  20. Great brain dump! For being more of a novice in Python programming, this was really useful information as my team and I have been discussing isolation and virtualized environments for our deployment needs. Virtualenv was never a possibility, but as part of my research in learning about virtualization solutions this turned out to be quite informative.

  21. Going to try to keep this short: Admittedly, I sort of skimmed, but I think I was able to take away the point you were trying to make. Based on that, I’d have to agree with you on most of the flaws pointed out, but I’d argue against your suggestion of system-level isolation. That approach is definitely the preferred solution when your project actually warrants it, but I think the bigger problem that hasn’t been addressed is developers treating dependencies like social networking. (This applies to Python a lot less than other languages, but it’s growing increasingly more relevant as more people start using Python for web development – shocker)
