Speeding up PyAlembic

I posted this write-up to the Alembic Google group, but it seems like something that should also end up here.

Loading Alembic vertex data for use in Python (and numpy) can be very slow, so I went looking for a faster way. My sample data is 562 frames of a 35,877-vertex mesh, and prop in my examples is the positions property retrieved from IPolyMeshSchema.getPositionsProperty() (roughly as sketched below).
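For reference, here's roughly how that property gets pulled out of an archive. This is only a minimal sketch: the file path and node names are placeholders, and the traversal will depend on your own scene hierarchy.

import numpy as np
from alembic import Abc, AbcGeom

# Placeholder path and node names -- adjust for your own scene hierarchy.
archive = Abc.IArchive("shot.abc")
top = archive.getTop()
xform = top.getChild("myMesh")                   # transform node (hypothetical name)
mesh = AbcGeom.IPolyMesh(xform, "myMeshShape")   # shape node (hypothetical name)
prop = mesh.getSchema().getPositionsProperty()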

Option 1: Naive vertex access. This is the slow way that I wanted to improve upon. On my sample data, this took about 64 seconds.

data1 = np.array([[(x,y,z) for x,y,z in sample] for sample in prop.samples])

Option 1b: It’s noticeably faster to index the vertex components than it is to unpack them. On my sample data, this took about 48 seconds.

data1 = np.array([[(v[0], v[1], v[2]) for v in sample] for sample in prop.samples])

Option 2: Component access.
The sample objects in prop.samples are imath.V3fArray objects, and they have component accessors. So, rather than reading each vertex (with a lot of Python object-creation overhead), you can access the .x, .y, and .z components of the arrays individually. On my sample data, this took about 26 seconds.

data2 = np.array([(list(sample.x), list(sample.y), list(sample.z)) for sample in prop.samples])
# transpose from (frames, 3, verts) to (frames, verts, 3); a plain reshape would scramble the components
data2 = data2.transpose(0, 2, 1)

Option 3: imathnumpy
Did you know that, along with the imath module, there's a *separate* imathnumpy module? Because I sure didn't (and it doesn't ship with Maya). As of this writing there are only 4 Google results for it, so it seems nobody else knew either.

That said, there's a *bit* of a caveat with this one. The numpy array returned from imathnumpy.arrayToNumpy is just a view into the imath array's memory, so if that imath array gets garbage collected, your numpy array will contain junk data. To fix this, wrap the call in an array copy as shown.

On my sample data, this took about 5 seconds for a fresh run (and about 1.2 seconds on subsequent runs).

import imathnumpy
data3 = np.array([np.array(imathnumpy.arrayToNumpy(s), copy=True) for s in prop.samples])
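For contrast, this is the pattern to avoid: holding onto the raw views without copying. Once the V3fArray samples backing them get garbage collected, those arrays can end up pointing at freed memory.

views = [imathnumpy.arrayToNumpy(s) for s in prop.samples]  # don't do this -- the views can go stale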

Setting data back into a V3fArray is easy too. Because arrayToNumpy returns a view (two objects pointing at the same underlying data), we can simply write into the numpy array it gives us, and that fills the V3fArray.

from imath import V3fArray
from imathnumpy import arrayToNumpy

# pts is a (N, 3) shaped np.array
array = V3fArray(len(pts))
memView = arrayToNumpy(array)
np.copyto(memView, pts)
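And a quick sanity check that the round trip worked:

roundTrip = np.array(arrayToNumpy(array), copy=True)
assert np.allclose(roundTrip, pts)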

So I think we have a winner. However, there was something else I saw that *may* have beaten imathnumpy … if it weren't bugged.

Option 4: Serialization *BUGGED*
IArrayProperty has a serialize() method. It looks like it *should* read the sample data of a property and return it as a string. That would, of course, be extremely useful for reading data directly into numpy without the slow stopover in Python. However, every single type of property I tried gives me this error:

TypeError: No to_python (by-value) converter found for C++ type: class std::basic_stringstream<char,struct std::char_traits<char>,class std::allocator<char> >

TL;DR: Use imathnumpy. On my data it was more than 10x faster than the naive vertex unpacking (closer to 50x on a warm rerun). Just make sure to copy the array as in the example, otherwise you'll get garbage data.
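Putting it together, here's a small helper that loads every positions sample into one (frames, verts, 3) array. The function name is mine, but the calls are the same ones used in Option 3 above.

import imathnumpy
import numpy as np

def loadPositions(prop):
    """Read all samples of a positions property into a (frames, verts, 3) array."""
    # Copy each view immediately so the data survives garbage collection
    # of the underlying imath arrays.
    return np.array([np.array(imathnumpy.arrayToNumpy(s), copy=True)
                     for s in prop.samples])

data = loadPositions(prop)  # e.g. (562, 35877, 3) for the sample data above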
