PyMiniRacer v0.12.2

Call your Python from your JavaScript from your Python


The dynamic language embedding turducken is complete: you can now call Python from JavaScript from Python!

As of PyMiniRacer v0.12.2, you can write logic in Python and expose it to V8’s JavaScript sandbox. In theory, you can now create your own JS extension library similar to NodeJS’s standard library while writing only Python and JavaScript. Obviously, anything created this way will almost certainly be less extensive, less standardized, and less efficient than the NodeJS standard library, but it will be more tailored to your needs.

This post follows up on earlier posts about reviving PyMiniRacer and then giving Python code the power to directly poke JS Objects, call JS Functions, and await JS Promises.

Adding Python extensions to V8 without writing any C++ code

PyMiniRacer runs a pretty vanilla V8 sandbox, and thus it doesn’t have any Web APIs, Web Workers APIs, or any of the NodeJS SDK. This provides, by default, both simplicity and security.

Thus, until now, you couldn’t, say, go and grab a random URL off the Internet, or even log to stdout (e.g., using console.log). No APIs for either operation are bundled with V8.

Well, now you can use PyMiniRacer to extend the JavaScript API, thereby adding that functionality yourself!

import aiohttp
import asyncio
from py_mini_racer import MiniRacer

mr = MiniRacer()

async def demo():
    async def log(content):
        print(content)

    async def antigravity():
        import antigravity

    async with aiohttp.ClientSession() as session:
        async def get_url(url):
            async with session.get(url) as resp:
                return await resp.text()

        # Parenthesized context managers require Python 3.10+.
        async with (
            mr.wrap_py_function(log) as log_js,
            mr.wrap_py_function(get_url) as get_url_js,
            mr.wrap_py_function(antigravity) as antigravity_js,
        ):
            # Add a basic log-to-stdout capability to JavaScript:
            mr.eval('this')['log'] = log_js

            # Add a basic url fetch capability to JavaScript:
            mr.eval('this')['get_url'] = get_url_js

            # Add antigravity:
            mr.eval('this')['antigravity'] = antigravity_js

            await mr.eval("""
async () => {
    const content = await get_url("https://xkcd.com/353/");
    await log(content);
    await antigravity();
}
""")()

# prints the contents of https://xkcd.com/353/ to stdout... and then also loads it in
# your browser:
asyncio.run(demo())

Security note

It is possible (modulo open-source disclaimers of warranty, etc.) to use PyMiniRacer to run untrusted JS code; this is a stated security goal.

However, exposing your Python functions to JavaScript necessarily breaches the hermetic V8 sandbox. If you extend PyMiniRacer by exposing Python functions, you are taking security matters into your own hands. By exposing an arbitrary get_url function, the demo above would open an obvious data exfiltration and denial-of-service vector if we ran untrusted JS code with it.

About `async`

You will note that the above demo uses async heavily, on both the Python and JavaScript sides. PyMiniRacer only lets you expose async Python functions to V8. These appear as async functions on the JavaScript side, so you need to await them, which in turn means the easiest place to call them from is async JavaScript code. Everything turns “red” (in the function-coloring sense) very fast!

It turns out this is the only way to reliably expose Python functionality to V8. V8 runs JavaScript in a single-threaded fashion, so doing anything synchronous in a callback to Python would block the entire V8 isolate. Worse, things would likely deadlock very quickly: if your Python extension function tries to call back into V8, it will freeze, because V8 is still blocked waiting for that same extension function to return. The only thing we can reasonably do, when JavaScript calls outside the V8 sandbox, is create and return a Promise, and then do the work needed to fulfill that Promise out of band.
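To make the deadlock argument concrete, here is a small Python analogy (not PyMiniRacer code): a one-thread `ThreadPoolExecutor` plays the single-threaded isolate, and all function names are hypothetical. A synchronous callback that needs the isolate again ends up waiting on a task queued behind itself (a timeout stands in for the hang), while the Promise-style callback hands back a future immediately and lets other threads do the work.

```python
import concurrent.futures

# One worker thread stands in for V8's single-threaded isolate (an analogy
# only; this is not how PyMiniRacer is implemented).
isolate = concurrent.futures.ThreadPoolExecutor(max_workers=1)
# Separate threads do the Python-side work "out of band".
outside = concurrent.futures.ThreadPoolExecutor(max_workers=2)


def python_extension(x, timeout=None):
    # The extension needs the isolate again (e.g., to touch a JS object).
    inner = isolate.submit(lambda: x * 2)
    return inner.result(timeout=timeout)


def js_calls_python_sync(x):
    # Runs ON the isolate thread and blocks it while waiting for a task
    # queued behind itself: a deadlock, bounded here by a timeout.
    return python_extension(x, timeout=0.2)


def js_calls_python_async(x):
    # Runs ON the isolate thread but returns a future ("Promise") at once,
    # leaving the isolate free to run the extension's nested task.
    return outside.submit(python_extension, x)


try:
    isolate.submit(js_calls_python_sync, 21).result()
    sync_result = "finished"
except concurrent.futures.TimeoutError:
    sync_result = "deadlocked"

promise = isolate.submit(js_calls_python_async, 21).result()
async_result = promise.result()
print(sync_result, async_result)  # deadlocked 42

isolate.shutdown()
outside.shutdown()
```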

Internal changes to PyMiniRacer

Implementing `wrap_py_function`

I put some general design ideas about wrap_py_function here. What landed in PyMiniRacer is basically “Alternate implementation idea 3” on the GitHub issue outlining this feature. Generally speaking:

PyMiniRacer already had a C++-to-Python callback mechanism; this was used to await Promise objects. We just had to generalize it!

… Which specifically means allowing V8 to reuse a callback. With Promises, callbacks are only used 0.5 times on average (either the resolve or reject is used, exactly once). PyMiniRacer handled cleanup of these on its own; the first time either resolve or reject was called, PyMiniRacer could destroy both callbacks. This technique was borrowed from V8’s d8 tool.
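A minimal Python sketch of that one-shot lifecycle, assuming a hypothetical `OneShotSettler` class (the real logic lives in PyMiniRacer’s C++ layer): the resolve/reject pair shares a single piece of state, and whichever callback fires first consumes it, so the whole pair can be discarded after one use.

```python
class OneShotSettler:
    """Hypothetical resolve/reject pair sharing one piece of state.

    Whichever callback fires first consumes the state; later calls are
    no-ops, so both callbacks can be thrown away after the first use.
    """

    def __init__(self, on_settle):
        self._on_settle = on_settle  # the per-Promise state

    def resolve(self, value):
        self._settle("fulfilled", value)

    def reject(self, error):
        self._settle("rejected", error)

    def _settle(self, outcome, payload):
        if self._on_settle is None:
            return  # already settled: ignore the second call
        callback, self._on_settle = self._on_settle, None  # release state now
        callback(outcome, payload)


events = []
settler = OneShotSettler(lambda outcome, payload: events.append((outcome, payload)))
settler.resolve(42)
settler.reject(RuntimeError("too late"))  # ignored
print(events)  # [('fulfilled', 42)]
```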

But now we want to expose arbitrary-use callbacks, which need to live longer… We just need to attach a tiny bit of C++ state (the Python address of the callback function, and a pointer to our V8 value wrapper) to a callback object, and hand that off to V8.

Unfortunately, V8 is pretty apathetic about telling us when it’s done with an external C++ object we give it. V8 will happily accept a raw pointer to an external object, but it doesn’t reliably call you back to tell you when it’s done with that raw pointer. I couldn’t make the V8 finalizer callback work at all, and all the commentary I can find on the matter says that trying to get V8’s MakeWeak and finalizers to work reliably is a fool’s errand. This creates a memory management conundrum: we need to create a small bit of per-callback C++ state and hand it to V8, but V8 won’t reliably tell us when it has finally dropped all references to that state.

So I moved to a model of creating one such callback object per MiniRacer context. This object outlives the v8::Isolate, so we’re no longer relying on V8 to tell us when it’s done with the pointer we gave it. In order to multiplex many callbacks through that one object, we can just give JavaScript an ID number (because V8 does manage to reliably destruct numbers!). This new logic lives in MiniRacer::JSCallbackMaker.

Meanwhile, the Python side manages its own map of ID numbers to callbacks, and removes entries from that map when it wants to. (This is what the wrap_py_function context manager does on __exit__.) Since Python tears down the callback and its ID on its own schedule, in total ignorance of JavaScript’s references to that same callback ID, JS may keep trying to call the callback after Python has already torn it down. This is working as designed: such calls are easy to spot, because either the callback_caller_id or the callback_id doesn’t reference an active CallbackCaller or callback, respectively (see the diagram below). Calls to invalidated ID numbers can be easily and safely ignored.

I think this could be turned into a generalized strategy for safely sharing allocated C++ memory with V8:

Assume any raw pointers and references to C++ objects which you directly share with V8 will need to live at least as long as the v8::Isolate.
If you want to be able to delete any objects before the v8::Isolate exits, don’t share raw pointers or references. Instead:
1. On the C++ side, create an ID-number-to-pointer map, and give V8 only ID numbers, as the data argument of v8::Function::New.
2. Your v8::Function callback implementation can read the ID number and safely convert it to a C++ pointer by looking it up in the map.
3. Design an API, whether in C++ or JavaScript (or Python which calls C++, in PyMiniRacer’s case), which authoritatively tears down the C++ object and the ID-number-to-pointer map entry. (Don’t rely on V8 to tell you when all references to the object have dropped; it won’t do this reliably.)
4. Because we aren’t tracking dangling IDs on the JavaScript side (we can’t!), be prepared for JavaScript code to try and use the ID after you’ve torn down the object. The C++ code can easily detect this (because the ID is not in the map), and safely reject such attempts.
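The four steps can be sketched in Python with a hypothetical `HandleTable` class. Here the “engine” is just a function that receives an integer, standing in for a v8::Function callback that reads its data argument; none of these names come from PyMiniRacer.

```python
import itertools


class HandleTable:
    """Hypothetical ID-number-to-object map (step 1)."""

    def __init__(self):
        self._ids = itertools.count(1)  # monotonic: IDs are never reused
        self._objects = {}

    def add(self, obj):
        handle = next(self._ids)
        self._objects[handle] = obj
        return handle  # only this integer is handed to the engine

    def lookup(self, handle):
        return self._objects.get(handle)  # None once torn down (step 4)

    def remove(self, handle):
        # Authoritative teardown (step 3): no waiting on the engine's GC.
        self._objects.pop(handle, None)


def on_function_called(table, handle, *args):
    """Stands in for the engine-invoked callback (step 2)."""
    target = table.lookup(handle)
    if target is None:
        return None  # stale ID from JavaScript: safely rejected
    return target(*args)


table = HandleTable()
handle = table.add(lambda x: x + 1)
print(on_function_called(table, handle, 41))  # 42
table.remove(handle)
print(on_function_called(table, handle, 41))  # None
```

Allocating IDs from a monotonic counter instead of recycling freed ones is a deliberate choice in this sketch: a stale ID still held by JavaScript can never accidentally resolve to a newer object.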

System diagram

Here’s roughly what the system looks like:

1. MiniRacer Python user code instantiates a py_mini_racer.MiniRacer object, which contains a py_mini_racer._Context object.
2. The py_mini_racer._Context Python object instantiates a C++ MiniRacer::Context, passing in a pointer to a generic callback in the Python-side _CallbackRegistry. The MiniRacer::Context creates a MiniRacer::CallbackCaller, which has a process-scope callback_caller_id associated with it.
3. MiniRacer Python user code passes the Python function my_callback_func into MiniRacer.wrap_py_function. MiniRacer stores a wrapper of my_callback_func in its _CallbackRegistry, thus generating a callback_id.
4. MiniRacer.wrap_py_function passes this callback_id down to the C++ side of the house to generate a V8 callback function.
5. MiniRacer::JSCallbackMaker creates a v8::Function within the v8::Isolate, with attached data containing the array [callback_id, callback_caller_id]. This data is all MiniRacer::JSCallbackMaker needs to later find the right Python-side callback when this function is called. MiniRacer::JSCallbackMaker returns a handle to this v8::Function all the way back up to Python, which can then take that handle (represented as a JSFunction in Python) and pass it around to JavaScript code.
6. Eventually, JavaScript code calls the callback created above.
7. V8 dispatches that call to MiniRacer::JSCallbackMaker::OnCalledStatic.
8. MiniRacer::JSCallbackMaker::OnCalledStatic digs out the [callback_id, callback_caller_id] array, uses the callback_caller_id to find the MiniRacer::CallbackCaller, and passes it the callback_id along with the call’s arguments.
9. MiniRacer::CallbackCaller converts the arguments (a V8 value) to a MiniRacer::BinaryValue, and calls back into the Python C function pointer with that value and the callback_id.
10. The Python-side _CallbackRegistry converts the callback_id to the destination Python function object (my_callback_func), and finally calls it with the function arguments.

Making teardown more deterministic


In my last post, I talked about “regressing to C++ developer phase 1” by using std::shared_ptr everywhere to manage object lifecycles. As of PyMiniRacer v0.11.1 we were using a DAG of std::shared_ptr references to manage lifecycle of a dozen different classes.

I then discovered a bug in this logic. If JavaScript code launched never-ending background work (e.g., setTimeout(() => { while (1) {} }, 100)), the C++ MiniRacer::Context and its v8::Isolate wouldn’t actually shut down when told to by the Python side.

The problem with the laissez-faire std::shared_ptr-all-the-things memory management pattern is that we leak memory when we have reference cycles. Unfortunately we do have reference cycles in PyMiniRacer. In particular, all over PyMiniRacer’s C++ codebase, we create and throw function closures onto the v8::Isolate message queue which contain shared_ptr references to objects… which transitively contain shared_ptr references to the v8::Isolate itself… which contains references to everything in the message queue: a reference cycle!

A simple way to resolve that in another system might be to explicitly clear out the v8::Isolate’s message queue on shutdown. Basically, by wiping out all the not-yet-executed closures, we can “cut” all the reference cycles on exit. However, a sticky point with PyMiniRacer’s design is that it uses that same message queue even for teardown: we put tasks on the message queue which delete C++ objects, because it’s the easiest way to ensure they’re deleted under the v8::Isolate lock (discussion of that here). So PyMiniRacer can’t simply clear out the message queue, or it will leak C++ objects on exit.
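The cycle, and the queue-clearing cut that PyMiniRacer could not use, can be reproduced as a Python analogy. With Python’s cycle collector disabled, objects are freed by reference counting alone, much like std::shared_ptr; `Isolate` and `Context` are hypothetical stand-ins for the C++ objects, and the behavior described assumes CPython.

```python
import gc
import weakref


class Isolate:
    def __init__(self):
        self.message_queue = []  # closures waiting to run


class Context:
    def __init__(self, isolate):
        self.isolate = isolate  # strong reference: context -> isolate


def make_cycle():
    isolate = Isolate()
    context = Context(isolate)
    # A queued closure that captures the context closes the loop:
    # isolate -> message_queue -> closure -> context -> isolate
    isolate.message_queue.append(lambda: context)
    return weakref.ref(isolate)


gc.disable()  # rely on reference counting alone, as std::shared_ptr does
try:
    isolate_ref = make_cycle()
    leaked = isolate_ref() is not None  # the cycle keeps everything alive
    isolate_ref().message_queue.clear()  # "cut" the cycle by wiping the queue
    freed = isolate_ref() is None  # now the refcounts reach zero
finally:
    gc.enable()

print(leaked, freed)  # True True
```

In PyMiniRacer’s case, that `clear()` would also discard the queued tasks that delete C++ objects, which is exactly why this shortcut leaks.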

After playing with this some and failing to get the lifecycles just right, and struggling to even understand the effective teardown order of dozens of different reference-counted objects, I realized we could switch to a different, easier-to-think-about pattern: every object putting tasks onto the v8::Isolate simply needs to ensure those tasks complete before anything it puts into those tasks is torn down.

I’ll restate that rule: if you call MiniRacer::IsolateManager::Run(xyz), you are responsible for ensuring that the task is done before any objects you bound into the function closure xyz are destroyed.
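A minimal Python sketch of the rule, assuming a one-thread executor as the isolate’s message pump. `IsolateManager` and `run` only loosely mirror the C++ names, `Counter` is hypothetical, and setting `_state` to None plays the role of deleting a C++ object that a queued task still references.

```python
import concurrent.futures


class IsolateManager:
    """Hypothetical stand-in: runs every task on one thread, returns a future."""

    def __init__(self):
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)

    def run(self, fn, *args):
        return self._executor.submit(fn, *args)

    def shutdown(self):
        self._executor.shutdown(wait=True)


class Counter:
    """Binds its own state into isolate tasks, so it must outlive them."""

    def __init__(self, isolate):
        self._isolate = isolate
        self._state = {"count": 0}
        self._pending = []

    def increment(self):
        self._pending.append(self._isolate.run(self._do_increment))

    def _do_increment(self):
        self._state["count"] += 1  # would fail if _state were already gone

    def close(self):
        # The rule: wait for every task we queued before tearing down
        # anything those tasks reference.
        for future in self._pending:
            future.result()
        self._pending.clear()
        final, self._state = self._state["count"], None
        return final


isolate = IsolateManager()
counter = Counter(isolate)
for _ in range(1000):
    counter.increment()
print(counter.close())  # 1000
isolate.shutdown()
```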

This rule seems obvious in retrospect! But it’s hard to implement. I:

Modified MiniRacer::IsolateManager::Run to always return a future, to make it easier to wait on these tasks. This is used in various places to ensure the above rule is honored.
Refactored CancelableTaskManager completely to make it explicitly track all the tasks it makes, just so it can reliably cancel and await them all upon teardown (where before it would just sort of “fire and forget” tasks, not at all caring if they ever actually finished).
Added a simple new garbage-collection algorithm in IsolateObjectCollector to ensure all the garbage is gone before we move on to tearing down the IsolateManager and its message pump loop.
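The “track every task, then cancel and await them all on teardown” idea behind the CancelableTaskManager refactor can be sketched with asyncio. `TaskTracker` is a hypothetical Python analogue rather than the C++ class, and the never-ending coroutine stands in for background work that would otherwise outlive teardown.

```python
import asyncio


class TaskTracker:
    """Hypothetical: remembers every task it launches so teardown can finish them."""

    def __init__(self):
        self._tasks = set()

    def schedule(self, coro):
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)  # forget finished tasks
        return task

    async def teardown(self):
        tasks = list(self._tasks)
        for task in tasks:
            task.cancel()
        # Await the cancellations instead of firing and forgetting them.
        await asyncio.gather(*tasks, return_exceptions=True)


async def main():
    tracker = TaskTracker()

    async def never_ends():
        while True:
            await asyncio.sleep(0.01)

    task = tracker.schedule(never_ends())
    await asyncio.sleep(0.05)
    await tracker.teardown()
    return task.cancelled()


print(asyncio.run(main()))  # True
```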

This is a little more code than we had before, but things are much simpler to reason about now! I can confidently tell you that PyMiniRacer shuts down when you tell it to. :)