Reviving PyMiniRacer
JS-in-Python, in-process, redux!
In my last blog post, I created a helper to call JavaScript from Python using a NodeJS sidecar process. In that post I commented that in-process JS evaluation might be nicer. The old Sqreen PyMiniRacer project had only recently fallen into disrepair. Can we revive it?
TL;DR: Yes! It just took a couple weeks of elbow grease, and new ownership
(it me). I now own the two best ways to run JavaScript from a Python program: PyMiniRacer and nodejs-eval.
from py_mini_racer import MiniRacer
mr = MiniRacer()
# Let's run some JavaScript from Python!
>>> mr.eval("Math.pow(7, 3);")
343
# Updated, for the first time since 2021!
>>> mr.v8_version
"b'12.2.281.23'"
# Now supported: the Intl API!
>>> mr.eval('Intl.DateTimeFormat(["ban", "id"]).format(new Date())')
'16/3/2024'
# As of the v0.9.0 release, async execution (as a side effect, Control+C works):
>>> mr.eval('while (1) {}')
^CTraceback (most recent call last):
...
KeyboardInterrupt
Other new features can be found on the relnotes page, where v0.7.0 is the first new version since 2021.
A lineage
- therubyracer (2009-2018) Charles Lowell of Frontside Software created The Ruby Racer to embed V8 (the JavaScript engine used by Chrome, NodeJS, etc) into Ruby for direct JS execution from Ruby programs.
  - Unfortunately, the rich integration between Ruby Racer and V8 became a pain point for Ruby Racer, because upgrading V8 often meant revamping Ruby Racer to fit interface changes. So this project was eventually archived, and replaced with…
- mini_racer (2016-) Sam Saffron and others created mini_racer, a new Ruby / V8 integration, stripped down relative to Ruby Racer. This version is still maintained.
- sqreen/PyMiniRacer (2016-2021) Sqreen, a web app security startup, created PyMiniRacer, a Python module modeled after Ruby’s mini_racer. This followed the same model of minimizing the interface with V8, and also used a Python ctypes integration (as opposed to a Python extension module) which furthermore minimized the interface with Python, resulting in a JS/Python integration with relatively little support burden.
  - Unfortunately, PyMiniRacer wasn’t updated after 2021 when Sqreen was acquired by DataDog.
- bpcreech/PyMiniRacer (2024-) This is what you’re reading about now. :)
After discussion with the Sqreen (now DataDog) folks, we decided to host my revival of their PyMiniRacer project as a fork, which lives at bpcreech/PyMiniRacer.
General updates
Other than upgrading V8—which has its own section below—I took the opportunity to dust off various parts of this project.
Python ecosystem updates
In particular, lots of things have happened in the Python world!
Python versions (drop Python 2, add up to 3.12)
First, we can drop Python 2 which was globally EOL’d in 2020 (after a deprecation plan over a decade long!). Because the world is big, folks are still using Python 2 out there, but we don’t need to maintain an up-to-date V8 integration for them.
Meanwhile, as of this writing, Python is up to 3.12, which for PyMiniRacer added some minor breakage here and there. For example, some change to the Python memoryview system now requires explicit memoryview casting.
I also added support for the fancy new Python importlib.resources specification which lets Python directly run modules and load their dependencies, e.g., PyMiniRacer’s DLL file out of unusual non-filesystem places (such as from within zip files). This system was added in Python 3.7 and revamped in Python 3.9; PyMiniRacer now supports both incantations of loading-data-files-from-the-package.
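As a rough sketch of what supporting both incantations can look like (this is not PyMiniRacer's actual code; the helper name `package_file` and the use of the standard-library `json` package as a stand-in are assumptions for illustration), a small context manager can prefer the Python 3.9+ `files()` / `as_file()` API and fall back to the older `path()` API:

```python
import importlib.resources
import sys
from contextlib import contextmanager


@contextmanager
def package_file(package, name):
    """Yield a real filesystem path for a data file shipped inside a package.

    Hypothetical helper, for illustration only.
    """
    if sys.version_info >= (3, 9):
        # Python 3.9+: the "traversable" API. as_file() extracts the resource
        # to a temporary file if the package lives somewhere like a zip file.
        resource = importlib.resources.files(package) / name
        with importlib.resources.as_file(resource) as path:
            yield path
    else:
        # Python 3.7 and 3.8: the original API.
        with importlib.resources.path(package, name) as path:
            yield path


# Stand-in usage: locate a file inside the standard-library "json" package.
with package_file("json", "__init__.py") as path:
    print(path.name)
```

A library wrapping a DLL would presumably load it (for example with `ctypes.CDLL`) while still inside the `with` block, since a temporarily extracted file may be cleaned up on exit.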
Packaging with Hatch
PyMiniRacer originally built binary distributions using setuptools, and managed its various bits of automation using a hand-written Makefile. Python now has a standardized pluggable packaging system for building binary distributions, and Hatch is the most popular implementation of it. “Hatch is trying to be the Cargo or Go CLI equivalent for Python” per its author. By using Hatch (and accepting its various opinions) we can drop a lot of developer tooling configuration from PyMiniRacer.
Hatch bundles an opinionated linter and code formatter in Ruff, which lets us drop flake8 and isort as development dependencies, along with their config files. The only default setting I changed was the line length, from 120 to Black’s default of 88. I thought this pointless debate was finally settled by Black when it won the formatting war, but for some reason Hatch overrides this setting to 120, so I put it back where Black (and Ruff) default to.
Hatch also includes built-in support for Python version matrix testing. It works super well (modulo, not on Alpine for reasons) and lets us drop tox as a development dependency.
pytest instead of unittest
This is more of a no-brainer these days. unittest has been built into Python since forever, but these days everyone (in the OSS community anyway) seems to be converging on pytest. So I converted all the tests to pytest, which makes for slightly simpler-looking tests and prettier console output, yay.
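To illustrate the difference in style (a generic sketch, not taken from PyMiniRacer's test suite; `cube` is a hypothetical stand-in for the code under test), here is the same check written both ways:

```python
import unittest


def cube(x):
    """Hypothetical function under test."""
    return x**3


# unittest style: a TestCase subclass and assert* helper methods.
class CubeTest(unittest.TestCase):
    def test_cube(self):
        self.assertEqual(cube(7), 343)


# pytest style: a plain function and a plain assert. pytest discovers
# functions named test_* and rewrites the assert so that a failure
# reports both operands.
def test_cube():
    assert cube(7) == 343
```

pytest can also collect and run existing `unittest.TestCase` classes, so a conversion like this can be done one test file at a time.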
Docs!
Inspired by this post, I figured we should have an ARCHITECTURE.md, so I wrote one (or see it on the mkdocs site).
I also sprinkled in a ton of comments. PyMiniRacer’s V8 build (see below) is full of workarounds, written as config tweaks, patches, and little extra steps. These workarounds do not help the forwards compatibility story, because each little tweak to the build process is a potential source of future breakage when the upstream V8 build process changes. Now, we at least have a paper trail of where those tweaks came from!
Finally, and most dramatically from a cosmetic perspective, I migrated the PyMiniRacer docs from Sphinx to Material for MkDocs and created a docs build pipeline (AFAICT there wasn’t one!). Sphinx has been around and working forever, but the current ecosystem mindshare seems to be pouring into mkdocs-material lately.
I am a little worried about maintainability since mkdocs-material is a complicated and load-bearing plugin for mkdocs, and itself works best only when combined with other plugins from mkdocs-material’s own plugin system. It’s a setup ripe for this situation. But, I went with peer pressure, and the new docs look great with very little configuration, because mkdocs-material is indeed fantastic. The new docs live here.
Actually building V8
Okay, the main work here is updating V8. The last Sqreen version of PyMiniRacer, from 2021, used V8 8.9, and no longer builds. V8 is up to 12.2 today.
General challenges in building V8
There is no official binary distribution of the V8 library as a standalone unit. The only way to use V8 is to build it yourself. Unless, perhaps, you use the whole NodeJS binary, which brings us back to my last blog post—maybe, after all, the best way to use V8 is via a server running inside NodeJS?
But building V8 is hard! Fun challenges in building V8:
- V8 is enormous and building it is slow: V8 contains or dynamically downloads 6.6 GB of source! (Okay, not all “source”, actually: this includes some vendored copies of operating system roots, like /usr from flavors of Debian.) There are about 2.4k build steps, which include building tons of code generated by the Torque compiler. It takes over an hour to build from scratch on a free GitHub Actions runner (currently, a 4-CPU machine for Linux, etc). To build for Linux aarch64, GitHub doesn’t provide any free hosted runners, so we run via emulation, and it takes several days. This far exceeds the maximum 6 hours provided by GitHub Actions, meaning builds fail due to the time limit. We can work around that limitation by using sccache to cache and catch up on builds; after enough retries our GitHub Actions builds do eventually succeed. (And hopefully, one day, GitHub will provide free aarch64 Linux runners!)
- V8 wants to set up its own build ecosystem: To build V8, you first download another set of utilities called depot_tools. depot_tools includes its very own binaries built for some but not all of our target platforms, for things like Python, Goma (a build cache we’re not using), Ninja (a build system we do use), GN (a meta-build system we also use), etc. The depot_tools fetch tool acts as a recursive dependency module grabber (like Git submodules, but fancy). Once we have all the source, V8 uses a series of Python scripts to wrap GN, which in turn wraps Ninja.
- That build ecosystem, and the build in general, doesn’t actually work on Alpine or Linux-on-Arm: For PyMiniRacer we want to target at least { Windows, Mac, Linux [glibc], Linux [musl] } × { x86_64, aarch64 } (aarch64 by popular demand). V8 doesn’t support building on Linux-on-arm64, although it does support cross-compiling for it. V8 doesn’t support musl (Alpine’s libc) in either on-host building or cross-compiling. So we need to do various fun config tweaks to make it actually work.
- V8 and the build system change all the time: V8 is under very heavy development at Google, for a variety of products (Chromium of course, but also ChromeOS, etc). The available and default config options change over time, meaning any intricate build setup we do in PyMiniRacer is likely to break with newer V8 versions. So we want to minimize the amount of build configuration we do in PyMiniRacer, to future-proof it as best we can.
- V8 needs a bleeding-edge LLVM (particularly, clang) and wants its own libstdc++: V8 uses brand new features of clang, including an ML-driven optimization model. It comes with a build of the llvm toolchain, but only for supported platforms (thus excluding Alpine, and excluding building on Linux aarch64). We work around this by installing the latest LLVM from the LLVM project where the binaries vendored into V8 itself don’t work. But even this version isn’t new enough for V8! We still have to tweak the build config to make it build even with the latest stable LLVM.
Meanwhile, we impose another challenge by sticking to the free GitHub Actions runners: unfortunately, GitHub Actions has no native aarch64 hosted runners, and no native Alpine runners. We work around this using Umberto Raimondi’s fantastic run-on-arch-action. This GitHub Action plug-in helps us build Docker containers for Linux distributions and architectures, and then build PyMiniRacer there.
Extra features added while updating V8
Aside from all the V8 updates from v8.9 to v12.2, I plumbed in the following which had been disabled in prior PyMiniRacer builds:
- Support for the ECMAScript internationalization API and thus the ECMA Intl API.
- V8 fast startup snapshots.
Both of these require pulling generated data files into the Python package, alongside the compiled DLL.
Potential future work in simplifying the V8 build
The Ruby mini_racer project mentioned above actually split the V8 build out into a separate project, libv8-node: “A project for distributing the v8 runtime libraries and headers in both source and binary form, packaged as a language-independent zip and as a Ruby gem.” This project takes a different tack on the problem by reusing NodeJS’s opinionated vendored copy and build of V8 instead of trying to build V8 from Google’s directions. We might be able to simplify PyMiniRacer by rebasing it upon the libv8-node build of V8. We’d need to ensure libv8-node is up-to-date and stable (it’s not totally clear to me that it is) and, because we’re dropping the V8 build entirely, move the compilation of PyMiniRacer’s custom C++ code out of GN+ninja and into another to-be-determined build system.
Alternatively, it would be nice if V8 lived within a common C/C++ package system. The winning multi-platform C/C++ package system today seems to be https://conan.io. Making V8 work with Conan (well enough for official upload to Conan Center) would be tough because, along with all the difficulties cited above, V8 loves to download its own dependencies in violation of Conan’s common-sense One-Definition Rule (ODR).
Other future work in PyMiniRacer
Future work may include:
- Updates to new V8 releases, which we can assume will appear unabated.
- Support for Python asyncio.
- Other stuff from the old GitHub issues list.
- Standard library stuff. PyMiniRacer has no console.log (and no window object for console to live on), no setTimeout, etc. Providing such functions would be handy, but also if we’re not careful may act as a breach of the security sandbox provided by PyMiniRacer, will move away from the minimal-interface rule we’re going for, and may trend toward “just being NodeJS” with its rich standard library. At this point, we’d be better off by embedding NodeJS, or just running it as a sidecar.
If you’re reading this and want to contribute, go for it! See the contribution guide.