Speeding up PyAlembic

I posted this write-up to the Alembic Google group, but it seems like something that should also end up here.

Loading Alembic vertex data for use in Python (and numpy) can be very slow, so I went looking for a faster way. My sample data is 562 frames of a 35,877-vertex mesh, and prop in my examples is the positions property retrieved from IPolyMeshSchema.getPositionsProperty() (roughly as sketched below).
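For reference, here's roughly how that property gets pulled out of an archive. This is only a minimal sketch: the file path and node names are placeholders, and the traversal will depend on your own scene hierarchy.

import numpy as np
from alembic import Abc, AbcGeom

# Placeholder path and node names -- adjust for your own scene hierarchy.
archive = Abc.IArchive("shot.abc")
top = archive.getTop()
xform = top.getChild("myMesh")                   # transform node (hypothetical name)
mesh = AbcGeom.IPolyMesh(xform, "myMeshShape")   # shape node (hypothetical name)
prop = mesh.getSchema().getPositionsProperty()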

Option 1: Naive vertex access. This is the slow way that I wanted to improve upon. On my sample data, this took about 64 seconds.

data1 = np.array([[(x,y,z) for x,y,z in sample] for sample in prop.samples])

Option 1b: It’s noticeably faster to index the vertex components than it is to unpack them. On my sample data, this took about 48 seconds.

data1 = np.array([[(v[0], v[1], v[2]) for v in sample] for sample in prop.samples])

Option 2: Component access.
The sample objects in prop.samples are imath.V3fArray objects, and they have component accessors. So, rather than reading each vertex (with a lot of Python object-creation overhead), you can access the .x, .y, and .z components of the arrays individually. On my sample data, this took about 26 seconds.

data2 = np.array([(list(sample.x), list(sample.y), list(sample.z)) for sample in prop.samples])
# transpose from (frames, 3, verts) to (frames, verts, 3); a plain reshape would scramble the components
data2 = data2.transpose(0, 2, 1)

Option 3: imathnumpy
Did you know that, along with the imath module, there's a *separate* imathnumpy module? Because I sure didn't (and it doesn't ship with Maya). As of this writing there are only 4 Google results for it, so it seems nobody else knew either.

That said, there's a *bit* of a caveat with this one. The numpy array returned from imathnumpy.arrayToNumpy is just a view into the imath array's memory, so if that imath array gets garbage collected, your numpy array will contain junk data. To fix this, wrap the call in an array copy as shown.

On my sample data, this took about 5 seconds for a fresh run (and about 1.2 seconds on subsequent runs).

import imathnumpy
data3 = np.array([np.array(imathnumpy.arrayToNumpy(s), copy=True) for s in prop.samples])
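For contrast, this is the pattern to avoid: holding onto the raw views without copying. Once the V3fArray samples backing them get garbage collected, those arrays can end up pointing at freed memory.

views = [imathnumpy.arrayToNumpy(s) for s in prop.samples]  # don't do this -- the views can go stale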

Setting data back into a V3fArray is easy too. Because arrayToNumpy returns a view (two objects pointing at the same underlying data), we can simply write into the numpy array it gives us, and that fills the V3fArray.

from imath import V3fArray
from imathnumpy import arrayToNumpy

# pts is a (N, 3) shaped np.array
array = V3fArray(len(pts))
memView = arrayToNumpy(array)
np.copyto(memView, pts)
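And a quick sanity check that the round trip worked:

roundTrip = np.array(arrayToNumpy(array), copy=True)
assert np.allclose(roundTrip, pts)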

So I think we have a winner. However, there was something else I saw that *may* have beaten imathnumpy … if it weren't bugged.

Option 4: Serialization *BUGGED*
IArrayProperty has a serialize() method. It looks like it *should* read the sample data of a property and return it as a string. That would, of course, be extremely useful for reading data directly into numpy without the slow stopover in Python. However, every single type of property I tried gives me this error:

TypeError: No to_python (by-value) converter found for C++ type: class std::basic_stringstream<char,struct std::char_traits<char>,class std::allocator<char> >

TL;DR: Use imathnumpy. On my data it was more than 10x faster than the naive vertex unpacking (closer to 50x on a warm rerun). Just make sure to copy the array as in the example, otherwise you'll get garbage data.
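Putting it together, here's a small helper that loads every positions sample into one (frames, verts, 3) array. The function name is mine, but the calls are the same ones used in Option 3 above.

import imathnumpy
import numpy as np

def loadPositions(prop):
    """Read all samples of a positions property into a (frames, verts, 3) array."""
    # Copy each view immediately so the data survives garbage collection
    # of the underlying imath arrays.
    return np.array([np.array(imathnumpy.arrayToNumpy(s), copy=True)
                     for s in prop.samples])

data = loadPositions(prop)  # e.g. (562, 35877, 3) for the sample data above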
