Speeding up PyAlembic

I posted this write-up to the Alembic Google group, but it seems like something that should also end up here.

Loading alembic vertex data for use in Python (and numpy) can be very slow, so I went looking for a faster way. My sample data is 562 frames of a 35877-vertex mesh, and prop in my examples is retrieved from IPolyMeshSchema.getPositionsProperty().
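For context, here’s roughly how prop gets set up. This is just a sketch; the archive path and mesh name are placeholders for my actual data, and the traversal may differ for your hierarchy:

import numpy as np
from alembic.Abc import IArchive
from alembic.AbcGeom import IPolyMesh

archive = IArchive("/path/to/cache.abc")           # placeholder path
mesh = IPolyMesh(archive.getTop(), "myMeshShape")  # placeholder name
prop = mesh.getSchema().getPositionsProperty()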

Option 1: Naive vertex access. This is the slow way that I wanted to improve upon. On my sample data, this took about 64 seconds.

data1 = np.array([[(x,y,z) for x,y,z in sample] for sample in prop.samples])

Option 1b: It’s noticeably faster to index the vertex components than it is to unpack them. On my sample data, this took about 48 seconds.

data1 = np.array([[(v[0], v[1], v[2]) for v in sample] for sample in prop.samples])

Option 2: Component access.
The sample objects in prop.samples are imath.V3fArray objects, and they have component accessors. So, rather than reading each vertex (with a lot of Python object creation overhead), you can access the .x, .y, and .z components of the arrays individually. On my sample data, this took about 26 seconds.

data2 = np.array([(list(sample.x), list(sample.y), list(sample.z)) for sample in prop.samples])
# transpose (not reshape) so each row is an (x, y, z) vertex again
data2 = data2.transpose(0, 2, 1)

Option 3: imathnumpy
Did you know that, along with the imath module, there’s a *separate* imathnumpy module? Because I sure didn’t (and it isn’t included with Maya). And as of this writing, there are only 4 Google results for it, so it seems that nobody else knew either. That said, there’s a *bit* of a caveat with this one. The array returned from imathnumpy.arrayToNumpy is just a memoryview onto the imath array’s data, so if the imath array gets garbage collected, your numpy array will contain junk data. To fix this, wrap your call in an array copy as shown.
On my sample data, this took about 5 seconds on a fresh run, and about 1.2 seconds on subsequent runs.

import imathnumpy
data3 = np.array([np.array(imathnumpy.arrayToNumpy(s), copy=True) for s in prop.samples])

Setting data back is easy too. Because the numpy array is a memoryview (two objects with pointers to the same underlying data), we can simply write to the numpy object we created, and it will fill the V3fArray.

from imath import V3fArray
from imathnumpy import arrayToNumpy

# pts is an (N, 3) shaped np.array
array = V3fArray(len(pts))
memView = arrayToNumpy(array)  # a numpy view over the V3fArray's memory
np.copyto(memView, pts)        # writing through the view fills the V3fArray

So I think we have a winner. However, there was something else I saw that *may* have beaten imathnumpy … if it weren’t bugged.

Option 4: Serialization *BUGGED*
IArrayProperty has a serialize() method. It looks like it *should* read the sample data of a property and return it as a string. This would, of course, be extremely useful for reading data directly into numpy without the slow stopover in Python. However, every single type of property I tried gives me this error:

TypeError: No to_python (by-value) converter found for C++ type: class std::basic_stringstream<char,struct std::char_traits<char>,class std::allocator<char> >
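For the record, here’s the call that triggers it. As far as I can tell, serialize() takes no arguments from Python, though I may have missed a working signature:

raw = prop.serialize()  # raises the TypeError above for every property type I tried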

TL;DR: Use imathnumpy. It’s more than 10x faster than the naive vertex unpacking on a fresh run, and closer to 50x once things are warm. Just make sure to copy the array like in the example, otherwise you’ll get garbage data.

Python CAN be fast for Maya vertex access

Accessing vertex data in Python can be trying at times. Whether you go through cmds or OpenMaya, the time it takes to get, and especially set, the data can be the difference between building a tool in Python, or pulling out the big guns and writing a C++ plugin. But fear not: there is a way to get fast data access, so long as you have numpy installed for Maya.

Edit: Since this got a little traction on Facebook, here’s a link to a github gist with a module that can freely convert between numpy and Maya’s array types.
https://gist.github.com/tbttfox/9ca775bf629c7a1285c27c8d9d961bca

If you don’t have numpy installed for Maya, you can get builds for 2014-2016 here, and for 2018 here. One of those should work for 2017, but I don’t have it installed to test with.

Speaking of testing, I’m going to subdivide the heck out of a sphere and see how long it takes to get a Python list of the verts using both cmds and OpenMaya.

I subdivided a poly sphere 5 times to get it up to about 400k verts, just to make the numbers bigger.
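If you want to follow along, building the test mesh looks something like this (a sketch, not necessarily the exact commands I ran):

from maya import cmds

# a default poly sphere has 400 faces; five levels of smoothing
# multiplies that by 4**5, which lands right around 400k verts
cmds.polySphere(name='pSphere1')
cmds.polySmooth('pSphere1', divisions=5)

Then I ran this completely un-scientific test: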

from maya import OpenMaya as om
from maya import cmds

def getPointsOpenMaya(pyList=True):
	sel = om.MSelectionList()
	sel.add('pSphere1')

	dagPath = om.MDagPath()
	sel.getDagPath(0, dagPath)
	verts = om.MPointArray()
	fnMesh = om.MFnMesh(dagPath)
	fnMesh.getPoints(verts, om.MSpace.kWorld)

	if pyList:
		return [[verts[i][0], verts[i][1], verts[i][2]] for i in range(verts.length())]
	return verts

def getPointsCmds():
	flatverts = cmds.xform("pSphere1.vtx[*]", translation=1, query=1, worldSpace=False)
	args = [iter(flatverts)] * 3
	return zip(*args)


import time
start = time.time()
getPointsCmds()
took = time.time() - start
print "cmds Took", took

start = time.time()
getPointsOpenMaya(True)
took = time.time() - start
print "OpenMaya Took", took

start = time.time()
getPointsOpenMaya(False)
took = time.time() - start
print "MPointArray Took", took

#cmds Took 0.594000101089
#OpenMaya Took 2.10800004005
#MPointArray Took 0.0269999504089

So cmds took about half a second to get that data. That’s not bad if you only have to do it once. Turning an MPointArray into a list of tuples took about 2 seconds. Ugh. But just getting the MPointArray by itself took about 1/40 of a second. That’s not too shabby, but we can’t really do anything with it without running it through that Python list conversion that took nearly 80x longer. But there’s a better way.

The trick is MScriptUtil: the plain-old-data getters/setters on the Maya numeric types return SWIG pointer objects. There’s also an undocumented function OpenMaya.MFnMesh.getRawPoints() (you can find it in the C++ docs, though) that returns a SWIG float pointer. So what does that give us? Well, if you call int() on one of these objects, it returns the object’s memory address. That’s a step in the right direction … but it’s not quite there yet. numpy, though, does have ctypeslib for reading C-style data, so if we convert the SWIG object to a ctypes array, we can pass that into numpy.

from maya import OpenMaya as om
from ctypes import c_float
import numpy as np
import time

start = time.time()
sel = om.MSelectionList()
sel.add('pSphere1')
dagPath = om.MDagPath()
sel.getDagPath(0, dagPath)

fnMesh = om.MFnMesh(dagPath)
# rawPts is a SWIG float pointer
rawPts = fnMesh.getRawPoints()
ptCount = fnMesh.numVertices()


# Cast the SWIG float pointer to a ctypes (ptCount x 3) array
cta = (c_float * 3 * ptCount).from_address(int(rawPts))
# Memory-map the ctypes array into numpy
out = np.ctypeslib.as_array(cta)
# rawPts, cta, and out all point to the same memory address now,
# so writing to out would change the mesh data in place

# for safety, make a copy of out so I don't corrupt memory
out = np.copy(out)
end = time.time()

print "Getting the numpy interface took:", end - start

# Getting the numpy interface took: 0.00299978256226

So yeah, getting that data straight into numpy is roughly another 9x faster than just grabbing the MPointArray. Be careful, though, because you can very easily corrupt memory and cause all kinds of havoc. All of this deals with raw pointers and memory views, which don’t participate in Python’s reference counting. So, just like with MScriptUtil objects, you either have to keep the pointer object around, or copy your data out of it before the pointer gets garbage collected. In the script above, you can see I run np.copy(out) to do just that.

However, this isn’t just a one-way street. If you use MScriptUtil to build a float pointer, you can fill that pointer with data from numpy, and it will then be usable by OpenMaya functions.
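Something like this sketch, for example (untested, from memory; which OpenMaya function you ultimately hand the pointer to will vary):

from maya import OpenMaya as om
from ctypes import c_float
import numpy as np

# pts is an (N, 3) float32 numpy array we want to hand to OpenMaya
pts = np.zeros((10, 3), dtype=np.float32)

util = om.MScriptUtil()
util.createFromList([0.0] * pts.size, pts.size)
ptr = util.asFloatPtr()

# Wrap the MScriptUtil buffer the same way as before, then fill it from numpy
cta = (c_float * 3 * len(pts)).from_address(int(ptr))
view = np.ctypeslib.as_array(cta)
np.copyto(view, pts)
# ptr can now be passed to OpenMaya methods that expect a float pointer.
# Just keep util alive for as long as ptr is in use, or the memory goes away.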

Now if only Autodesk would include the imathnumpy module (which is part of pyalembic) so I could write a stupidly fast alembic exporter in python.