Python Hack: Finding Automatically Generated Packages with Setuptools
Setuptools is the de facto standard for building and distributing Python packages. It is quite flexible, but it can become difficult to use once your package starts to go in a non-standard direction.
Today, I will show a hack that allowed me to automatically discover packages generated at build time.
The Problem
Setuptools works by defining the build process in a Python script, conventionally called setup.py.
A basic setup.py could look like this (taken from the official docs):
from setuptools import setup, find_packages

setup(
    name="HelloWorld",
    version="0.1",
    packages=find_packages(),
)
We are interested in the following line: packages=find_packages().
Setuptools requires that all packages that should be distributed be named.
In particular, if you have a nested structure, with packages foo, foo.bar and foo.baz, you need to specify those three packages: just specifying foo won’t work.
This can quickly become annoying and error prone, and this is why setuptools provides the find_packages function.
This function recursively looks into directories and identifies which ones look like Python packages.
It can be configured by passing arguments to it, but we will not look at that here.
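To make the recursive lookup concrete, here is a minimal sketch of the idea. It is a simplified stand-in written for this post, not the setuptools implementation: the name find_packages_sketch is hypothetical, and it assumes that a directory counts as a package exactly when it contains an __init__.py file.

```python
import os
import tempfile


def find_packages_sketch(where="."):
    """Return the dotted names of directories under `where` that contain __init__.py."""
    packages = []
    for root, dirs, files in os.walk(where):
        if root == where:
            # The project root itself is not a package, but we still descend into it.
            continue
        if "__init__.py" not in files:
            # Not a package: prune the walk so nothing below it is reported.
            dirs[:] = []
            continue
        packages.append(os.path.relpath(root, where).replace(os.sep, "."))
    return sorted(packages)


# Build a throwaway project with the nested structure from the text.
with tempfile.TemporaryDirectory() as project:
    for package_dir in ("foo", os.path.join("foo", "bar"), os.path.join("foo", "baz")):
        os.makedirs(os.path.join(project, package_dir))
        open(os.path.join(project, package_dir, "__init__.py"), "w").close()
    os.makedirs(os.path.join(project, "docs"))  # no __init__.py, so it is skipped
    found = find_packages_sketch(project)

print(found)
```

All three packages are reported individually, which is exactly the list you would otherwise have to maintain by hand.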
What is the problem, I hear you say?
Pretty often, no problem.
You run setup.py, and packages get identified, packaged and distributed.
The problem arises when code generation happens as part of your build process.
In my case, I am generating PEP 484 stub files for Java classes to be accessed with JPype,
but there are thousands of other reasons (OK, maybe hundreds) why you might want to do that.
This code generation needs to be part of the build process,
because the user should be able to configure it at build time
(for instance, configuring which Java classes should have typing information generated).
Such a build configuration would look like this
(for more information on how to actually implement a setuptools command,
look e.g. here):
from setuptools import setup, find_packages, Command
# Import the standard "build" command to allow customizing it
from distutils.command.build import build


class CodeGenerationCommand(Command):
    # This command takes no command-line options
    user_options = []

    # distutils expects every command to define these three methods
    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        # In the real world, please do something here ;-)
        pass


class MyBuild(build):
    # Add code generation as the first step of the "build" command
    sub_commands = [('codegen', None)] + build.sub_commands


setup(
    name="HelloWorld",
    version="0.1",
    packages=find_packages(),
    # This maps the name "codegen" to our code generation command,
    # and "build" to our extended build command
    cmdclass={
        'codegen': CodeGenerationCommand,
        'build': MyBuild,
    },
)
What happens in this case?
Before setup is even called, its arguments get evaluated, in particular find_packages().
Then, as part of its logic, setup runs the code generation if needed,
and then builds the rest of the package…
But the generated packages were not there when find_packages was run, and are thus not included in the distribution.
Ouch.
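The ordering issue can be reproduced without setuptools at all. In the sketch below, fake_setup, fake_find_packages and the generated list are hypothetical stand-ins for setup, find_packages and the directory tree on disk; the only point is to show that the argument is computed before the function body runs.

```python
generated = []  # stands in for packages that code generation creates on disk


def fake_find_packages():
    # Returns a snapshot of what exists at the moment of the call.
    return ["foo"] + generated


def fake_setup(packages):
    # Stand-in for setup(): code generation only runs once we are inside the call,
    # that is, after the `packages` argument has already been computed.
    generated.append("foo.generated")
    return packages


distributed = fake_setup(packages=fake_find_packages())
print(distributed)  # the generated package is missing from this list
```

The generated package exists by the time fake_setup returns, but the list passed in was built too early to contain it.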
The Solution
The solution is to delay evaluation of find_packages until it is needed.
Setuptools does not provide a way to do this, so one needs a way to pretend we are passing a value, but passing a function instead.
This can be implemented with properties:
from setuptools import setup, find_packages


class PackageFinder:
    @property
    def find_packages(self):
        return find_packages()


setup(
    name="HelloWorld",
    version="0.1",
    packages=PackageFinder().find_packages,
)
Properties are the “pythonic” version of getters and setters: they pretend to present values, when in fact they present functions. From the perspective of the calling code, the property is a value (i.e. a variable), but what happens in the background is that, each time the value is accessed, a function is evaluated, potentially returning a different value. [1]
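The idea is that every time the property is read, find_packages is called again with the latest state of the directory structure.
Be aware, though, that a property is evaluated at the moment the attribute is read, and in the snippet above that read happens in the argument list, before setup starts running any command.
For the lookup to be truly deferred, the object passed as packages has to delay the call itself, for instance a list subclass that only calls find_packages when it is iterated.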
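It is worth checking when the lookup actually fires. The sketch below contrasts the property-based finder with a list subclass that postpones the call until it is iterated. It does not use setuptools: fake_find_packages and generated are hypothetical stand-ins, PackageFinder is adapted to take the finder as an argument, and LazyPackages is a name made up for this example. Whether setuptools only iterates over the object, and never copies it before code generation has run, is an assumption to verify against your setuptools version.

```python
class PackageFinder:
    """The property-based finder from above, adapted to take a stand-in finder."""

    def __init__(self, finder):
        self._finder = finder

    @property
    def find_packages(self):
        return self._finder()


class LazyPackages(list):
    """List subclass that calls `finder` again each time it is iterated or measured."""

    def __init__(self, finder):
        super().__init__()
        self._finder = finder

    def __iter__(self):
        return iter(self._finder())

    def __len__(self):
        return len(self._finder())

    def __bool__(self):
        return bool(self._finder())


generated = []  # stands in for packages that code generation creates on disk


def fake_find_packages():
    return ["foo"] + generated


eager = PackageFinder(fake_find_packages).find_packages  # the finder runs right here
lazy = LazyPackages(fake_find_packages)                  # the finder has not run yet

generated.append("foo.generated")  # "code generation" happens afterwards

eager_result = eager                      # a plain list, fixed before code generation
lazy_result = [name for name in lazy]     # the finder runs now, during iteration
print(eager_result, lazy_result)
```

Subclassing list is a defensive choice in case the consumer type-checks its argument; the three overridden methods cover iteration, len() and truth tests, which is as far as this sketch goes.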
Outlook
That solution worked in my tests, and allowed me to finally move forward in my project,
and forget about setuptools until the next time I have to tame it.
I would not call it elegant or even robust:
in particular, it relies on implementation details of the setup function to work properly.
It would, for instance, break if setup were to copy the list of packages into a new list instance.
It is, however, the best solution I could find that still allows code generation to be part of the setup call.
If it is not a problem for you to run the code generation outside of setup
(this includes running it inside setup.py, as long as it happens before the call to setup), go for it.
Otherwise, use at your own risk.
If you encountered a similar problem, I would love to hear about how you solved it in the comments!
[1] There is more to properties, of course, but those are the aspects that matter here. If you are interested in other aspects of properties, such as how to implement setters or deleters, the documentation is pretty nice.