Architecture¶
This document contains some notes about the design of PyMiniRacer.
Security goals¶
First and foremost, PyMiniRacer makes no guarantees or warrantees, as noted in the license. This section documents the security goals of PyMiniRacer. Anything that doesn't meet these goals should be considered to be a bug (but with no warrantee or even a guaranteed path to remediation).
PyMiniRacer should be able to run untrusted JavaScript code¶
The ability for PyMiniRacer to run untrusted JavaScript code was an original design goal for Sqreen in developing PyMiniRacer, and continues to be a design goal today.
To that end, PyMiniRacer provides:
-
The innate sandboxing properties of V8. V8 is trusted by billions of folks to run untrusted JavaScript every day, as a part of Chrome and other web browsers. It has many features like the security sandbox and undergoes close security scrutiny.
-
The ability to create multiple
MiniRacer
instances which each have separate V8 isolates, to separate different blobs of untrusted code from each other. -
Optional timeouts and memory constraints on code being executed.
Caveats:
-
The continual security research is V8 under yields a corresponding stream of vulnerability reports.
-
... and while V8 as embedded in a web browser will typically receive (funded!) updates to correct those vulnerabilities, PyMiniRacer is unlikely to see as aggressive and consistent an update schedule.
-
... and of course PyMiniRacer itself may have vulnerabilities. This has happened before.
-
... and even if PyMiniRacer is updated to accomodate a vulnerability fix in itself or V8, it is incumbent upon Python applications which integrate it to actually redeploy with the new PyMiniRacer version.
If running potentially adversarial JavaScript code in a high-security environment, it might be a better choice to run code using a purpose-built isolation environment such as containers on gVisor, than to rely on PyMiniRacer for isolation.
JavaScript-to-Python callbacks may breach any isolation boundary¶
The MiniRacer.wrap_py_function
method allows PyMiniRacer users to expose Python
functions they write to JavaScript. This creates an extension framework which
essentially breaches the isolation boundary provided by V8.
This feature should only be used if the underlying JavaScript code is trusted, or if the author is certain the exposed Python function is safe for calls from untrusted code. (I.e., if you expose a Python function which allows reading arbitrary files from disk, this would obviously be bad if the JavaScript code which may call it is itself untrusted.)
Brief catalog of key components¶
docs/
¶
This is the mkdocs
site for PyMiniRacer. To maximize
compatibility with standard open-source repository layout, this directory is just a
bunch of stubs which include files from the package root.
hatch_build.py
¶
This is a Hatch build hook which builds Python wheels, by calling helpers/v8_build.py
.
helpers/v8_build.py
¶
This is the PyMiniRacer V8 build wrapper. Building V8 for many platforms (Windows, Mac,
glibc Linux, musl Linux) and architectures (x86_64, aarch64) is hard, especially since
V8 is primarily intended to be built by Google engineers on a somewhat different set of
of platforms (i.e., those Chrome runs on), and typically via cross-compilation from
relatively curated build hosts. So this file is complicated and full of if
statements.
src/v8_py_frontend/
¶
This is a small frontend for V8, written in C++. It manages initialization, context,
marshals and unmarshals inputs and outputs through V8's type system, etc. The front-end
exposes simple functions and types which are friendly to the Python ctypes
system.
These simple C++ functions in turn call the C++ V8 APIs.
As noted below, v8_py_frontend
is not a Python extension (it does not include
Python.h
or link libpython
, and it does not touch Python types).
(Compiled) src/py_mini_racer/libmini_racer.so
, src/py_mini_racer/mini_racer.dll
, src/py_mini_racer/libmini_racer.dylib
¶
These files (which one depends on the platform) contain the compiled V8 build,
complete with the frontend from src/v8_py_frontend
.
(Compiled) src/py_mini_racer/icudtl.dat
¶
This is a build-time-generated internationalization artifact, used at runtime by V8 and thus shipped with PyMiniRacer.
(Compiled) src/py_mini_racer/snapshot_blob.bin
¶
This is a build-time-generated startup snapshot, used at runtime by V8 and thus shipped with PyMiniRacer. This is a snapshot of the JavaScript heap including JavaScript built-ins, which accelerates JS engine startup.
src/py_mini_racer/
¶
This is the pure-Python implementation of PyMiniRacer. This loads the
(Python-independent) PyMiniRacer dynamic-link library (.dll
on windows, .so
on
Linux, .dylib
on MacOS) and uses the Python ctypes
system to call methods within it,
to manage V8 context and actually evaluate JavaScript code.
.github/workflows/pypi-build.yml
¶
This is the primary build script for PyMiniRacer, implemented as a GitHub Actions workflow.
Design decisions¶
These are listed in a topological sort, from most-fundamental to most-derived decisions.
In theory, answers to questions in the vein of "Why is it done this way?" belong in this document.
Minimize the interface with V8¶
V8 is extremely complex and is under continual, heavy development. Such development can result in interface changes, which may in turn break PyMiniRacer.
To mitigate the risk of breakage with new V8 builds, we seek to minimize the "API surface area" between PyMiniRacer and V8. This means we seek to limit "advanced" use of both:
- The V8 C++ API, and
- The V8 build system (GN) and build options.
Our success at minimizing the interface with the V8 build system can be measured by the
length of helpers/v8.build.py
(444 lines as of this writing!). Making V8 build on
multiple platforms takes a lot of trickery...
Minimize the interface with the CPython API (don't make an extension)¶
For similar reasons (the CPython API is complex and always in flux, although not as
much as V8), combined with the proliferation of Python versions (many versions of
CPython, PyPy, etc), we'd rather avoid directly interfacing with the CPython API. Thus,
instead of an extension module (which includes Python.h
and links against
libpython
), we build an ordinary Python-independent C++ library, and use ctypes
to
access it.
Build V8 from source¶
The V8 project does not produce stable binary distributions, i.e., static or dynamic
libraries. (In Linux terms, this would probably look like libv8
and libv8-dev
.)
Instead, any project (like NodeJS, Chromium, or... PyMiniRacer!) which wants to
integrate V8 must first build it.
Build V8 with our frontend (v8_py_frontend
) as a snuck-in component¶
We could just get a static library (i.e., libv8.a
) from the V8 build, and link that
into a dynamic-link library [i.e., libmini_racer.so
]) ourselves.
However:
- We do have one more C++ file to compile (the C++ code in
v8_py_frontend
) - Because we're not making a true Python extension module (see above), we aren't using
Python's
setuptools
Extension
infrastructure to perform a build.
This does, however, leave us needing some platform-independent C++ toolchain.
V8 already has such a toolchain, based on Ninja and Generated Ninja files (GN).
Rather than bringing in another toolchain, we sneak v8_py_frontend
(which is, after
all, just one C++ file) into the V8 tree itself, as a "custom dep". We then instruct GN
to build it as if it were an ordinary part of V8.
The result is a dynamic-link library which contains an ordinary release build of V8,
plus our Python ctypes
-friendly frontend.
Build PyPI wheels¶
Because V8 takes so long to build (about 2-3 hours at present on the free GitHub Actions
runners, and >12 hours when emulating aarch64
on them), we want to build wheels for
PyPI. We don't want folks to have to build V8 when they pip install mini-racer
!.
We build wheels for many operating systems and architectures based on popular demand via
GitHib issues. Currently the list is
{x86_64, aarch64} × {Debian Linux, Alpine Linux, Mac, Windows}
(but skipping Windows
aarch64
for now since there is not yet either a GitHub Actions runner, or emulation
layer for it).
Use the free GitHub Actions hosted runners¶
PyMiniRacer is not a funded project, so we run on the free GitHub Actions hosted runners. These currently let us build for many key platforms (including via emulation).
This also lets contributors easily run the same build automation by simply forking the PyMiniRacer repo and running the workflows (for free!) within their own forks.
Use sccache
to patch around build timeouts¶
As of this writing, the Linux aarch64
builds run on emulation GitHub because has no
free hosted aarch64
runners for Linux. This makes them so slow, they struggle to
complete at all. They take about 24 hours to run. The GitHub Actions
job timeout is only 6 hours,
so we have to restart the jobs multiple times. We rely on
sccache
to catch the build up to prior progress.
It would in theory be less ugly to segment the build into small interlinked jobs of less
than 6 hours each so they each succeed, but for now it's simpler to just manually
restart the failed jobs, each time loading from the build cache and making progress,
until they finally succeed. Hopefully at some point GitHub will provide native aarch64
Linux runners, which will alleviate this problem.
Use uraimo/run-on-arch-action
(and not cibuildwheel
)¶
So, we need to build wheels for multiple architectures. For Windows and Mac (x86_64
on
Windows, and both x86_64
and aarch64
on Mac) we can can use GitHub hosted runners.
For Linux builds (Debian and Alpine, and x86_64
and aarch64
), we use the fantastic
GitHub Action workflow step
uraimo/run-on-arch-action
, which lets
us build a docker container on the fly and run it on QEMU.
Many modern Python projects which need to build wheels with native code use
the cibuildwheel
project to manange their
builds. However, cibuildwheel
isn't a perfect fit here. Because we are building
Python-independent dynamic-link libraries instead of Python extension modules modules,
we aren't linking with any particular Python ABI. Thus we need only
(operating systems × architectures)
builds, whereas cibuildwheel
generates
(operating systems × architecture × Python flavors × Python versions)
wheels.
That's a ton of wheels!
Given that it takes hours to days to build PyMiniRacer for one target OS and
architecture, doing redundant builds is undesirable.
It might be possible to use cibuildwheel
with PyMiniRacer by segmenting the build of
the dynamic-link library (i.e., libmini_racer.so
) from the actual wheel build. That
is, we could have the following separate components:
- Create a separate Github Actions workflow to build the
libmini_racer.so
binary (i.e., the hard part). Publish that as a release, using the GitHub release artifact management system as a distribution mechanism. - The wheel build step could then simply download a pre-built binary from the latest
GitHub release. We could use
cibuildwheel
to manage this step. This would generate many redundant wheels (because the wheels we'd generate for, say, CPython 3.9 and 3.10 would be identical), but it wouldn't matter because it would be cheap and automatic.
This is similar to how the Ruby mini_racer
and
libv8-node
projects, which inspired
PyMiniRacer, work together today.
To sum up, to use cibuildwheel
, we would still need our own separate
multi-architecture build workflow for V8, ahead of the cibuildwheel
step. So
cibuildwheel
could potentially simplify the actual wheel distribution for us, but it
wouldn't simplify the overall workflow management.