When I was working temporarily at my former university, UPC (Universitat Politècnica de Catalunya), in the TALP department (Center for Language and Speech Technologies and Applications), I found myself in the following situation:
My employers had a lot of Python code that created matrices and other arrays and saved them to .npy files (NumPy's binary array format), but they wanted to speed up their processes.
I thought about loading those files into existing C code, so that the GPU could be used effectively with OpenGL/CUDA. Once everyone was convinced, I spent some time developing a small library to do so. In the following post, you will find an explanation of the NumPy format and the code for the C library.