A Datasheet for NextGen MW

The opposite of Thomas Dolby

I was terrible at the first four weeks of organic chemistry. I just couldn’t get the right pictures into my head.

The depictions of the chemical reaction mechanisms I was supposed to memorize seemed like just so many Cs (and Hs and Os and, alarmingly, Fs) laid out randomly as if I were playing Scrabble. And I swear the letters rearranged themselves every time I looked away, like a scene out of a movie about an art student’s science-class nightmares (minus the extended fantasy sequence in which the letters grow fangs and leap off the page to menace the poor protagonist – unless I’ve blocked that part out).

Fortunately, I knew exactly what to do: I had to start picturing molecules in 3D, and in motion, as soon as possible. That ability seemed to take its own sweet time to develop. But once things “clicked” and I could visualize molecules in motion, the reactions finally made sense, as did all the associated talk of electronegativity, nucleophilic attack, and inside-out umbrellas. I aced the final.

Now, our Molecular Workbench software isn’t specifically designed to help undergraduates get through organic chemistry. It is designed to help students at many levels by letting them interact with simulations of the molecular world so they get the right pictures into their heads, sooner. It’s here to help that future art student and movie director beginning to nurse a complex about the 10th grade science class he’s stuck in right now.

The weight of history

But the “Classic” Molecular Workbench we have now was built for a different world. It runs in desktop Java, for one thing, meaning (among other things) that it’ll never run on iPads. More fundamentally, it was built to be “Microsoft Word for molecules” in a time when Microsoft Word was the dominant model for thinking about how to use a computer:

“Hello, blank page! Let’s see, today I’ll make a diffusion simulation. I should write something about it … Let’s make that 12-point Comic Sans. No, my graphic designer brother-in-law keeps telling me not use that so much, so Verdana it is, then. Now how do I add that model again? Oh yeah, Tools -> Insert -> Molecular Model…”

This model is constraining even though it’s always been possible to download and open Molecular Workbench via the Web, and even though MW saves simulation-containing activities to special URLs.

We have somewhat different expectations these days because of the Web, social media, mobile apps, and casual games. If I build a great in-class “activity” based on a series of molecular models, then I should be able to share that activity with the world with minimum difficulty. And if you find one of the simulations I created particularly illustrative, you should be able to put that model in a blog you control, or include the model as part of your answer to a question on http://physicsforums.com/.

Moreover you ought to be able to perturb the running simulation by reaching out and touching it with your fingers, or simply by shaking your tablet to see what effect that has on the simulation “inside” it. You shouldn’t be required to operate the simulation at one remove, via a mouse and keyboard, when it’s not necessary.

That’s why we’re excited about the Google-funded, next-generation Molecular Workbench we have started to build. The HTML5 + JavaScript technology we’re using to build the next generation of our MW software (hereafter called NextGen MW for short) will make it much more practical to enable these kinds of uses.

Boldly doing that thing you should never do

But designing NextGen MW to be a native of the real-time Web of 2012 rather than a visitor from the land of 1990s desktop computing means that we’re committed to rebuilding the capabilities of “Classic” Molecular Workbench from scratch. That is, we’re doing the very thing Joel Spolsky says you must never do! But ignoring platforms which run Java badly or not at all isn’t an option, and neither is trying to run Classic MW in a Google Web Toolkit-style compatibility layer that compiles Java to JavaScript. (With the latter option, we would almost surely be unable to optimize either the computational speed or the overall user experience well enough to make it practical to use NextGen MW on phones, inexpensive tablets, or even expensive tablets. But even that misses the point. We’re not a consumer products company trying to optimize the return on our past investment. We’re an R&D lab. We try new things.)

But writing things from scratch poses a challenge. We want the molecular dynamics simulations run by NextGen MW to run “the same” as the equivalent simulations run in Classic MW. But “the same” is a slippery concept. In traditional software development, asking two different implementations of a function or method to produce the “same” result often means simply that they return identical data given identical input, modulo a few unimportant differences.

It would be nice to extend this idea to the two-dimensional molecular dynamics simulations we are now implementing in NextGen MW. Classic MW doesn’t have a test suite that we can simply adapt and reuse. But, still, we might think to set up identical initial conditions in NextGen MW and Classic MW, let the simulations run for the same length of simulated time, and then check back to make sure that the atoms and molecules end up in approximately the same places, and the measurements (temperature, pressure, etc.) are sufficiently close. And, voilà, proof that at least this NextGen MW model works “the same” as the Classic MW model. (Or that it doesn’t, and NextGen MW needs to be fixed.)

Never the same thing twice?

Unfortunately, this won’t work. Not even a little bit, and the reason is kind of deep. The trajectories of the particles in a molecular dynamics simulation (and in reality) exhibit a phenomenon known as sensitive dependence on initial conditions. Think of two identical simulations with exactly the same initial conditions except a tiny difference.

Now, pick a favorite particle and watch “the same” particle in each simulation as you let the simulations run. (And assume the simulations run in lockstep.) For a very short time, the particle will appear to follow the same trajectory in simulation 1 as in simulation 2. But as you let the simulation run a little longer, the trajectories of the two particles will grow farther and farther apart, until, very quickly, looking at simulation 1 tells you nothing about where to find the particle in simulation 2.

Very well, you say: maybe simulation 1 and simulation 2 started a little too far apart. So let’s make the difference in the initial conditions a little smaller. Sure enough, the trajectories stay correlated a little bit longer. But a very little bit. Here’s the rub: if you want to simulation 2 to match simulation 1 for twice as long, you need the initial conditions to be some number, let’s say 10, times closer. But if you need the simulations to match for 1 more “time” as long, that is, 3 times as long, you need the initial conditions to be 10 times closer still, or 100 times closer. And if you want simulation 1 to make a meaningful prediction about simulation 2 for ten times as long? Now you need the initial conditions to be a billion(10⁹) times closer. In practice, this means that if there’s any difference at all between the two initial conditions, no matter how seemingly insignificant, then outside of a short window of time the two simulations will predict very different particle locations and velocities.

Perhaps you think this is a contrived situation having nothing to do with comparing Classic MW and NextGen MW. Can’t we start them with, not just similar, but identical initial conditions? Unfortunately, this escape hatch is barred, too. The tiniest and most seemingly insignificant difference between the algorithms NextGen MW runs and the algorithms Classic MW runs right away result in a small difference in the trajectories, and after that point, sensitive dependence on initial conditions takes over: the subsequent trajectories soon become totally different. Trying to run precisely the same algorithms in NextGen MW as in Classic, down to the exact order of operations, would not only intolerably constrain our ability to develop new capabilities in NextGen MW, but would be futile: the differing numerical approximations made by Java and JavaScript would result in yet another small difference which would in short order become a big difference.

Science!

So, wait a minute: You can’t test NextGen MW against Classic MW because even the tiniest difference between them makes them behave … totally differently? How do we trust either program, then? And how is this science again?

Well, notice that I didn’t say quite say the two programs behave totally differently. Yes, the exact trajectories of the molecules will quickly diverge, but the properties we can actually measure in the real world — temperature, pressure, and the like — unfold according to laws we understand, and should be the same in each (not counting minor, and predictable, statistical fluctuations.) After all, we can do beautifully repeatable experiments on “molecules in a box” in the real world without knowing the location of the molecules exactly. Indeed, when van der Waals improved on the ideal gas law by introducing his equation of state, which includes corrections for molecular volume and intermolecular attraction, the notion that molecules actually existed was not yet universally accepted.

So what we need are molecular models whose temperature, pressure, diffusion coefficient, heat capacity, or the like depend in some way on the correctness of the underlying physics. Ideally, we would like to be able to run a Classic MW model and have it reliably produce a single number which (whatever property it actually measures) is demonstrably different when the physics have been calculated incorrectly. Then we could really compare NextGen MW and Classic MW — and perhaps even find a few lingering errors in Classic MW!

Unfortunately for this dream, our library of models created for Classic MW tend to be complex interactives which require user input and aim to get across the “gestalt” of molecular phenomena (e.g., one model encourages students to recognize that water molecules diffusing across a membrane aren’t actively “aiming for” the partition with a higher solute concentrations but move randomly). The models are not intended to be part of numerical experiments designed carefully to produce estimates otherwise-difficult-to-measure properties of the real world. They require substantial rework if they are to generate single numbers that are known to reliably test the physics calculations. For that matter, there aren’t many Classic models at all that conveniently limit themselves to just the features we have working right now in NextGen MW, and we can’t just wait until we develop all the features before we begin testing.

Charts and graphs that should finally make it clear

Therefore, we have turned to making new Classic MW models that demonstrate the physics we want NextGen MW to calculate, and comparing the numbers generated in Classic MW to the numbers generated when the equivalent model is run in NextGen MW. I’ve begun to think of this process as creating the “datasheet” for Classic and NextGen MW, after the datasheets which contain charts and graphs detailing the performance characteristics of an electronics part, and which an engineer using the part can expect it to obey.

So far, we’ve just gotten started creating the MW datasheet. I’ve written a few ugly scripts in half-remembered Python to create models and plot the results and so far, sure enough, it looks like an issue with the NextGen MW physics engine that I knew needed fixing, needs fixing! (The issue is an overly clever, ad hoc correction I introduced to smooth out some of the peculiar behavior of our pre-existing “Simple Atoms Model.” But that’s good fodder for a future blog post.)

But we have ambitions for these characterization tests. Using the applet form of Classic MW, we hope to make it possible to run each of these “characterization tests” by visiting a page with equivalent Classic and NextGen MW models side by side, with output going to an interactive graph. But with or without this interactive form of the test, once characterization tests have been done they will help us to find appropriate parameters for automated tests that will run whenever we update NextGen MW, so that we can be sure that the physics remain reliable.

I’ll update you as we make progress.