Python Hack: Finding Automatically Generated Packages with Setuptools

Setuptools is the de facto standard for building and distributing Python packages. It is rather flexible, but can become difficult to use when your package starts to go in a non-standard direction.

Today, I will show a hack that allowed me to automatically discover packages generated at build time.

The Problem

With setuptools, the build process is defined in a Python script, conventionally called setup.py. A basic setup.py could look like this (taken from the official docs):

from setuptools import setup, find_packages
setup(
    name="HelloWorld",
    version="0.1",
    packages=find_packages(),
)

We are interested in the following line: packages=find_packages(). Setuptools requires that every package that should be distributed be listed explicitly. In particular, if you have a nested structure with packages foo, foo.bar and foo.baz, you need to specify all three: just specifying foo won’t work.

Listing packages by hand quickly becomes annoying and error-prone, which is why setuptools provides the find_packages function. It recursively walks the directory tree and identifies which directories look like Python packages. It can be configured by passing arguments to it, but we will not look at that here.
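To give an idea of what such a discovery does, here is a minimal sketch of the principle. It is a simplified stand-in written for illustration, not the actual setuptools implementation: simple_find_packages and make_demo_tree are hypothetical names, and the only rule applied is "a directory is a package if it contains an __init__.py and all its parents do too".

```python
import os
import tempfile


def simple_find_packages(where="."):
    """Simplified stand-in for find_packages (illustration only)."""
    packages = []
    for dirpath, dirnames, filenames in os.walk(where):
        if dirpath == where:
            continue  # the root directory itself is not a package
        if "__init__.py" not in filenames:
            dirnames[:] = []  # not a package: do not descend any further
            continue
        relative = os.path.relpath(dirpath, where)
        packages.append(relative.replace(os.sep, "."))
    return sorted(packages)


def make_demo_tree(root):
    """Create the foo, foo.bar, foo.baz layout, plus a non-package directory."""
    for package in ("foo", os.path.join("foo", "bar"), os.path.join("foo", "baz")):
        path = os.path.join(root, package)
        os.makedirs(path)
        open(os.path.join(path, "__init__.py"), "w").close()
    os.makedirs(os.path.join(root, "docs"))  # no __init__.py here


with tempfile.TemporaryDirectory() as root:
    make_demo_tree(root)
    found = simple_find_packages(root)

print(found)  # ['foo', 'foo.bar', 'foo.baz']
```

The important point for the rest of this post is that the result only reflects what is on disk at the moment the function is called.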

What is the problem, I hear you say? Pretty often, there is none. You run setup.py, and packages get identified, packaged and distributed. The problem appears when code generation happens as part of your build process. In my case, I am generating PEP 484 stub files for Java classes to be accessed with JPype, but there are thousands of other reasons (OK, maybe hundreds) why you might want to do that. This code generation needs to be part of the build process, because the user should be able to configure it at build time (for instance, choosing which Java classes should have typing information generated). Such a build configuration would look like this (for more information on how to actually implement a setuptools command, see e.g. here):

from setuptools import setup, find_packages, Command

# Import the standard "build" command to allow customizing it
from distutils.command.build import build

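class CodeGenerationCommand(Command):
    description = "generate code before the build"
    user_options = []

    # These three methods are required by the Command base class
    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        # In the real world, please do something here ;-)
        pass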

class MyBuild(build):
    # Add code generation as the first step of the "build" command
    sub_commands = [('codegen', None)] + build.sub_commands

setup(
    name="HelloWorld",
    version="0.1",
    packages=find_packages(),
    # This maps the names "codegen" to our code generation command,
    # and "build" to our extended build command
    cmdclass={
      'codegen': CodeGenerationCommand,
      'build': MyBuild
    },
)

What happens in this case? When setup gets called, its arguments are evaluated first, in particular find_packages(). Then, as part of its logic, setup runs the code generation if needed, and then builds the rest of the package… But the generated packages were not there when find_packages was run, and are thus not included in the distribution. Ouch.
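The ordering issue can be reproduced without setuptools at all. In the sketch below, fake_find_packages, fake_setup and the generated list are hypothetical stand-ins: the list plays the role of the directory tree, and fake_setup "generates" a package before returning what it was given.

```python
generated = []  # stands in for the packages present on disk


def fake_find_packages():
    # Returns a snapshot of what exists right now
    return list(generated)


def fake_setup(packages):
    generated.append("generated_pkg")  # the "code generation" step
    return packages                    # what would end up being distributed


# The argument is evaluated before fake_setup starts running
distributed = fake_setup(packages=fake_find_packages())

print(distributed)           # []
print(fake_find_packages())  # ['generated_pkg']
```

The generated package exists after the call, but the list handed to fake_setup was computed too early to contain it.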

The Solution

The solution is to delay the evaluation of find_packages until its result is needed. Setuptools does not provide a way to do this, so we need a way to look like we are passing a value while actually passing a function. This can be implemented with properties:

from setuptools import setup, find_packages

class PackageFinder:
    @property
    def find_packages(self):
        return find_packages()

setup(
    name="HelloWorld",
    version="0.1",
    packages=PackageFinder().find_packages,
)

Properties are the “pythonic” version of getters and setters: they look like values, when in fact they are backed by functions. From the perspective of the calling code, a property is a value (i.e. a variable), but what happens in the background is that, each time the attribute is accessed, a function is evaluated, potentially returning a different value.[1]

So the idea is that each time the value of packages is looked up through the property, the getter runs again, effectively calling find_packages with the latest state of the directory structure.
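To make the behaviour of properties concrete, here is a small sketch that does not involve setuptools. DemoFinder and the disk list are hypothetical stand-ins for the finder class and the directory tree. It shows two things: the getter runs on every attribute access, and a value that was already read from the property (for instance while building an argument list) is an ordinary list that no longer changes.

```python
class DemoFinder:
    def __init__(self, disk):
        self._disk = disk

    @property
    def packages(self):
        # Runs on every access to finder.packages
        return list(self._disk)


disk = ["foo"]
finder = DemoFinder(disk)

snapshot = finder.packages    # getter runs now, result is a plain list
disk.append("generated_pkg")  # "code generation" happens afterwards
live = finder.packages        # getter runs again and sees the new state

print(snapshot)  # ['foo']
print(live)      # ['foo', 'generated_pkg']
```

The second point matters for the trick: the re-evaluation only helps where the reading code goes through the attribute again, so it is worth checking that the generated packages really end up in your distribution.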

Outlook

That solution worked in my tests and allowed me to finally move forward in my project, and to forget about setuptools until the next time I have to tame it. I would not call it elegant or even robust: in particular, it relies on implementation details of the setup function to work properly. It would, for instance, break if setup were to copy the list of packages into a new list instance. It is, however, the best solution I could find that still allows code generation to be part of the setup call. If it is not a problem for you to run the code generation outside of setup (which includes running it in setup.py, but before calling setup), go for it. Otherwise, use at your own risk.
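The fragility with respect to copies can be illustrated with a list-like object that asks for the packages again on every access. This is a sketch under assumptions, not something I can claim setuptools supports: LazyPackages is a hypothetical name, the lambda stands in for find_packages, and whether a given setuptools version accepts such an object for its packages argument would need to be checked.

```python
from collections.abc import Sequence


class LazyPackages(Sequence):
    """List-like view that calls the finder again on every access."""

    def __init__(self, finder):
        self._finder = finder

    def __getitem__(self, index):
        return self._finder()[index]

    def __len__(self):
        return len(self._finder())


disk = ["foo"]  # stands in for the directory tree
lazy = LazyPackages(lambda: list(disk))

copied = list(lazy)           # a copy freezes the current state
disk.append("generated_pkg")  # "code generation" happens afterwards

print(list(lazy))  # ['foo', 'generated_pkg']
print(copied)      # ['foo']
```

As long as the consumer keeps iterating over the original object, it sees the generated package; as soon as it works on a copy made earlier, the laziness is lost.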

If you encountered a similar problem, I would love to hear about how you solved it in the comments!

  1. There is more to properties, of course, but those are the aspects that matter here. If you are interested in other aspects of properties, such as how to implement setters or deleters, the documentation is pretty nice.
