PyMiniRacer v0.12.2
Call your Python from your JavaScript from your Python
The dynamic language embedding turducken is complete: you can now call Python from JavaScript from Python!
As of PyMiniRacer v0.12.2, you can write logic in Python and expose it to V8’s
JavaScript sandbox. You can now, in theory, while writing only Python and
JavaScript, create your own JS extension library similar to NodeJS’s standard
library. Obviously, anything created this way will almost certainly be less
extensive, less standardized, and less efficient than the NodeJS standard
library; but it will be more tailored to your needs.
This post follows up on prior content related to reviving PyMiniRacer and then giving Python code the power to directly poke JS Objects, call JS Functions, and await JS Promises.
Adding Python extensions to V8 without writing any C++ code
PyMiniRacer runs a pretty vanilla V8 sandbox, and thus it doesn’t have any Web APIs, Web Workers APIs, or any of the NodeJS SDK. This provides, by default, both simplicity and security.
Thus, until now, you couldn’t, say, go and grab a random URL off the Internet,
or even log to stdout (e.g., using console.log). No APIs for either operation
are bundled with V8.
Well, now you can use PyMiniRacer to extend the JavaScript API, thereby adding that functionality yourself!
import aiohttp
import asyncio
from py_mini_racer import MiniRacer

mr = MiniRacer()

async def demo():
    async def log(content):
        print(content)

    async def antigravity():
        import antigravity

    async with aiohttp.ClientSession() as session:
        async def get_url(url):
            async with session.get(url) as resp:
                return await resp.text()

        async with (
            mr.wrap_py_function(log) as log_js,
            mr.wrap_py_function(get_url) as get_url_js,
            mr.wrap_py_function(antigravity) as antigravity_js,
        ):
            # Add a basic log-to-stdout capability to JavaScript:
            mr.eval('this')['log'] = log_js

            # Add a basic url fetch capability to JavaScript:
            mr.eval('this')['get_url'] = get_url_js

            # Add antigravity:
            mr.eval('this')['antigravity'] = antigravity_js

            await mr.eval("""
async () => {
    const content = await get_url("https://xkcd.com/353/");
    await log(content);
    await antigravity();
}
""")()

# prints the contents of https://xkcd.com/353/ to stdout... and then also loads it in
# your browser:
asyncio.run(demo())
Security note
It is possible (modulo open-source disclaimers of warranty, etc) to use PyMiniRacer to run untrusted JS code; this is a stated security goal.
However, exposing your Python functions to JavaScript of course breaches the
hermetic V8 sandbox. If you extend PyMiniRacer by exposing Python functions, you
are obviously taking security matters into your own hands. The above demo, by
exposing an arbitrary get_url function, would expose an obvious data
exfiltration and denial-of-service vector if we were running untrusted JS code
with it.
About async
You will note that the above demo is using async heavily, both on the Python
and JavaScript side. PyMiniRacer only lets you expose async Python functions
to V8. These are represented as async on the JavaScript side, meaning you need
to await them, meaning in turn that the easiest way to even call them is from
async JavaScript code. Everything gets red very fast!
It turns out this is the only way to reliably expose Python functionality to V8.
V8 runs JavaScript in a single-threaded fashion, and doing anything synchronous
in a callback to Python would block the entire V8 isolate. Worse, things would
likely deadlock very quickly—if your Python extension function tries to call
back into V8 it will freeze. The only thing we can reasonably do, when
JavaScript calls outside the V8 sandbox, is create and return a Promise, and
then do the work actually needed to fulfill that Promise out of band.
Internal changes to PyMiniRacer
Implementing wrap_py_function
I put some general design ideas about wrap_py_function here. What landed in
PyMiniRacer is basically “Alternate implementation idea 3” on the GitHub issue
outlining this feature.
Generally speaking:
Generalizing PyMiniRacer callbacks and sharing allocated objects with V8
PyMiniRacer already had a C++-to-Python callback mechanism; this was used to
await Promise objects. We just had to generalize it!
… Which specifically means allowing V8 to reuse a callback. With Promises,
callbacks are only used 0.5 times on average (either the resolve or reject
is used, exactly once). PyMiniRacer handled cleanup of these on its own; the
first time either resolve or reject was called, PyMiniRacer could destroy
both callbacks. This technique was borrowed from V8’s d8 tool.
But now we want to expose arbitrary-use callbacks, which need to live longer… We just need to attach a tiny bit of C++ state (the Python address of the callback function, and a pointer to our V8 value wrapper) to a callback object, and hand that off to V8.
Unfortunately, V8 is pretty apathetic about telling us when it’s done with an
external C++ object we give it. V8 will happily accept a raw pointer to an
external object, but it doesn’t reliably call you back to tell you when it’s
done with that raw pointer. I couldn’t make the V8 finalizer callback work at
all, and all commentary I can find on the matter says trying to get V8 MakeWeak
and finalizers to work reliably is a fool’s errand. This creates a memory
management conundrum: we need to create a small bit of per-callback C++ state
and hand it to V8, but V8 won’t reliably tell us when it has finally dropped all
references to that state.
So I moved to a model of creating one such callback object per MiniRacer
context. This object outlives the v8::Isolate, so we’re no longer relying on
V8 to tell us when it’s done with the pointer we gave it. In order to multiplex
many callbacks through that one object, we can just give JavaScript an ID number
(because V8 does manage to reliably destruct numbers!). This new logic lives
in MiniRacer::JSCallbackMaker.
Meanwhile the Python side of things can manage its own map of ID number to
callback, and remove entries from that map when it wants to. (This is what the
wrap_py_function context manager does on __exit__.) Since Python is tearing
down the callback and its ID on its own schedule, in total ignorance of
JavaScript’s references to that same callback ID, it’s possible for JS to keep
trying to call the callback after Python already tore it down. This is working
as designed; such calls can be easily spotted when either callback_caller_id
or callback_id doesn’t reference an active CallbackCaller or callback,
respectively (see diagram below). Calls to invalidated ID numbers can be easily
and safely ignored.
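To make the ID-number scheme concrete, here is a minimal pure-Python sketch of a registry that hands out IDs, tears entries down on its own schedule, and ignores calls to stale IDs. `CallbackRegistry`, `register`, and `dispatch` are hypothetical names for illustration only; the real wrap_py_function is an async context manager and its internals differ.

```python
import contextlib
import itertools


class CallbackRegistry:
    """Hypothetical Python-side map of ID number -> callback."""

    def __init__(self):
        self._callbacks = {}
        # IDs only ever count up, so a torn-down ID is never reused.
        self._next_id = itertools.count(1)

    @contextlib.contextmanager
    def register(self, func):
        callback_id = next(self._next_id)
        self._callbacks[callback_id] = func
        try:
            yield callback_id  # the other side only ever holds this number
        finally:
            # Tear down on our own schedule, whoever still holds the ID.
            del self._callbacks[callback_id]

    def dispatch(self, callback_id, *args):
        func = self._callbacks.get(callback_id)
        if func is None:
            return None  # stale or unknown ID: safely ignore the call
        return func(*args)


registry = CallbackRegistry()
with registry.register(lambda name: f"hello, {name}") as cb_id:
    live = registry.dispatch(cb_id, "v8")  # ID is active: callback runs
stale = registry.dispatch(cb_id, "v8")  # ID was torn down: call is ignored
```

Because the other side only holds a number, a late call is a failed dictionary lookup, not a use-after-free.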
I think this could be turned into a generalized strategy for safely sharing allocated C++ memory with V8:
- Assume any raw pointers and references to C++ objects which you directly share with V8 will need to live at least as long as the v8::Isolate.
- If you want to be able to delete any objects before the v8::Isolate exits, don’t share raw pointers and references. Instead:
  - On the C++ side, create an ID-number-to-pointer map, and give V8 only ID numbers, as the data argument of v8::Function::New.
  - Your v8::Function callback implementation can read the ID number and safely convert it to a C++ pointer by looking it up in the map.
  - Design an API, whether in C++ or JavaScript (or Python which calls C++ in PyMiniRacer’s case) which authoritatively tears down the C++ object and the ID-number-to-pointer map entry. (Don’t rely on V8 to tell you when all references to the object have dropped; it won’t do this reliably.)
  - Because we aren’t tracking dangling IDs on the JavaScript side (we can’t!), be prepared for JavaScript code to try and use the ID after you’ve torn down the object. The C++ code can easily detect this (because the ID is not in the map), and safely reject such attempts.
System diagram
Here’s roughly what the system looks like:
[Figure: PyMiniRacer callback system diagram]
Making teardown more deterministic
In my last post, I talked about “regressing to C++ developer phase 1” by using
std::shared_ptr everywhere to manage object lifecycles. As of PyMiniRacer
v0.11.1 we were using a DAG of std::shared_ptr references to manage the
lifecycle of a dozen different classes.
I discovered a bug that this logic left behind. If JavaScript code launched
never-ending background work (e.g., setTimeout(() => { while (1) {} }, 100)),
the C++ MiniRacer::Context and its v8::Isolate wouldn’t actually shut down when
told to by the Python side.
The problem with the laissez-faire std::shared_ptr-all-the-things memory
management pattern is that we leak memory when we have reference cycles.
Unfortunately we do have reference cycles in PyMiniRacer. In particular, all
over PyMiniRacer’s C++ codebase, we create and throw function closures onto the
v8::Isolate message queue which contain shared_ptr references to objects…
which transitively contain shared_ptr references to the v8::Isolate itself…
which contains references to everything in the message queue: a reference cycle!
A simple way to resolve that in another system might be to explicitly clear
out the v8::Isolate’s message queue on shutdown. Basically, by wiping out all
the not-yet-executed closures, we can “cut” all the reference cycles on exit.
However, a sticky point with PyMiniRacer’s design is that it uses that same
message queue even for teardown: we put tasks on the message queue which delete
C++ objects, because it’s the easiest way to ensure they’re deleted under the
v8::Isolate lock (discussion of that here). So PyMiniRacer can’t simply clear
out the message queue, or it will leak C++ objects on exit.
After playing with this some and failing to get the lifecycles just right, and
struggling to even understand the effective teardown order of dozens of
different reference-counted objects, I realized we could switch to a different,
easier-to-think-about pattern: every object putting tasks onto the v8::Isolate
simply needs to ensure those tasks complete before anything it puts into those
tasks is torn down.
I’ll restate that rule: if you call MiniRacer::IsolateManager::Run(xyz), you
are responsible for ensuring that task is done before any objects you bound
into the function closure xyz are destroyed.
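As a rough Python analogy for that rule (not the C++ code in PyMiniRacer), the sketch below uses a single-worker executor as a stand-in for the message queue. `TaskQueue` and `Resource` are hypothetical names. The point is that whoever submits a task keeps the returned future and waits on it before tearing down anything the task captured.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class TaskQueue:
    """Hypothetical stand-in for the isolate's message queue: one worker thread."""

    def __init__(self):
        self._executor = ThreadPoolExecutor(max_workers=1)

    def run(self, func):
        # Always hand back a future so the caller can wait on the task.
        return self._executor.submit(func)

    def shutdown(self):
        self._executor.shutdown(wait=True)


class Resource:
    """An object whose queued tasks capture `self` in their closures."""

    def __init__(self, queue):
        self._queue = queue
        self._pending = []
        self.log = []
        self.closed = False

    def do_work(self, value):
        def task():
            # The rule guarantees this never runs after close().
            assert not self.closed
            self.log.append(value)

        self._pending.append(self._queue.run(task))

    def close(self):
        # Wait for every task we queued before tearing ourselves down.
        wait(self._pending)
        self.closed = True


queue = TaskQueue()
resource = Resource(queue)
for i in range(3):
    resource.do_work(i)
resource.close()
queue.shutdown()
```

With one worker thread the tasks run in submission order, so `resource.log` ends up as `[0, 1, 2]` and no task ever sees a closed resource.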
This rule seems obvious in retrospect! But it’s hard to implement. I:
- Modified MiniRacer::IsolateManager::Run to always return a future, to make it easier to wait on these tasks. This is used in various places to ensure the above rule is honored.
- Refactored CancelableTaskManager completely to make it explicitly track all the tasks it makes, just so it can reliably cancel and await them all upon teardown (where before it would just sort of “fire and forget” tasks, not at all caring if they ever actually finished).
- Added a simple new garbage-collection algorithm in IsolateObjectCollector to ensure all the garbage is gone before we move on to tearing down the IsolateManager and its message pump loop.
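The "track every task, then cancel and await them all on teardown" idea can be sketched with asyncio. This is an analogy under stated assumptions, not the C++ CancelableTaskManager: `TrackedTaskManager` is a hypothetical name, and asyncio cancellation is cooperative, so it only interrupts tasks that reach an await.

```python
import asyncio


class TrackedTaskManager:
    """Hypothetical sketch: remember every task so teardown can stop them all."""

    def __init__(self):
        self._tasks = set()

    def start(self, coro):
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        # Forget the task automatically once it finishes or is cancelled.
        task.add_done_callback(self._tasks.discard)
        return task

    async def shutdown(self):
        tasks = list(self._tasks)
        for task in tasks:
            task.cancel()
        # return_exceptions=True collects each CancelledError instead of raising.
        await asyncio.gather(*tasks, return_exceptions=True)
        return len(tasks)


async def forever():
    # Never-ending background work that yields to the event loop.
    while True:
        await asyncio.sleep(3600)


async def main():
    manager = TrackedTaskManager()
    manager.start(forever())
    manager.start(forever())
    await asyncio.sleep(0)  # let both tasks begin running
    cancelled = await manager.shutdown()
    return cancelled, len(manager._tasks)


outcome = asyncio.run(main())
```

Without the explicit set of tasks, teardown has nothing to cancel or wait on, which is the "fire and forget" failure mode described above.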
This is a little more code than we had before, but things are much simpler to reason about now! I can confidently tell you that PyMiniRacer shuts down when you tell it to. :)