PaperBoat, Sailing in the deep waters of Machine Learning
Ismion PaperBoat is a machine learning library with some cool features that can make your life easier when you are developing new machine learning algorithms or applications.
- PaperBoat is built on C++ template metaprogramming principles. This approach makes PaperBoat easy to configure and integrate with other platforms (it is already integrated in Logicblox and HPCCSystems). Most importantly, it helps tremendously with performance.
- Data is always stored at the minimum precision needed. Every column is stored in the precision specified by the user, and columns with the same precision are stored next to each other. This layout enables the vectorization speedups offered by any modern processor. Libraries like BLAS/LAPACK/FLAME can also speed up vector operations.
- Templatization also avoids virtual function overhead and allows the compiler to optimize extensively, since all the code is available at compile time. Our experiments showed a 4x speedup over an implementation of the library with virtual functions.
- Another feature of PaperBoat is threading. All fundamental algorithms are tasks that are executed asynchronously. Synchronization of tasks is based on a data availability model, inspired by Datalog.
- Another advantage of PaperBoat is its multidimensional indexing structures, such as kd-trees, which can offer orders-of-magnitude speedups.
- With kd-trees being a first-tier feature of the library, geometric tricks can be easily applied to algorithms. Multidimensional trees can improve performance in two ways: through clever stratified sampling algorithms, or through geometric tricks that lead to efficient branch and bound pruning.
- The unique data layer abstraction allows easy integration with other host platforms on three levels: (a) light, where data is copied; (b) deep, working directly on the host platform's data structures; and (c) sanitized, running in a separate process or even on a separate machine, in client-server mode for more safety.
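To illustrate the point about templates replacing virtual functions, here is a minimal sketch of compile-time polymorphism. The names `SquaredEuclidean`, `Manhattan`, and `Distance` are hypothetical and chosen only for illustration; they are not taken from the PaperBoat source. The sketch also assumes the storage precision is a template parameter, in the spirit of the per-column precision described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical metric policies (illustrative names, not PaperBoat API).
struct SquaredEuclidean {
  static double apply(double a, double b) { return (a - b) * (a - b); }
};

struct Manhattan {
  static double apply(double a, double b) { return a > b ? a - b : b - a; }
};

// The metric is a template parameter, so Metric::apply is resolved at
// compile time and can be inlined into the loop. No virtual dispatch occurs.
// Precision is the storage type of the column (for example float or double).
template <typename Metric, typename Precision>
double Distance(const std::vector<Precision>& x,
                const std::vector<Precision>& y) {
  assert(x.size() == y.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum += Metric::apply(static_cast<double>(x[i]),
                         static_cast<double>(y[i]));
  }
  return sum;
}
```

A call such as `Distance<SquaredEuclidean>(a, b)` selects the metric at compile time, while the precision is deduced from the vector type. With a virtual `Metric` base class, each `apply` call would instead go through a vtable lookup that the compiler usually cannot inline.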
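The branch and bound pruning mentioned for kd-trees can be sketched with a small two-dimensional nearest-neighbor search. This is a textbook version written for illustration, not PaperBoat's implementation; `KdNode`, `Build`, `Nearest`, and `FindNearest` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <memory>
#include <vector>

using Point = std::array<double, 2>;
using Iter = std::vector<Point>::iterator;

struct KdNode {
  Point point{};
  std::size_t axis = 0;
  std::unique_ptr<KdNode> left;
  std::unique_ptr<KdNode> right;
};

// Builds a kd-tree over [begin, end) by splitting on the median of
// alternating axes.
std::unique_ptr<KdNode> Build(Iter begin, Iter end, std::size_t depth) {
  if (begin == end) return nullptr;
  const std::size_t axis = depth % 2;
  Iter mid = begin + (end - begin) / 2;
  std::nth_element(begin, mid, end, [axis](const Point& a, const Point& b) {
    return a[axis] < b[axis];
  });
  auto node = std::make_unique<KdNode>();
  node->point = *mid;
  node->axis = axis;
  node->left = Build(begin, mid, depth + 1);
  node->right = Build(mid + 1, end, depth + 1);
  return node;
}

double SquaredDistance(const Point& a, const Point& b) {
  const double dx = a[0] - b[0];
  const double dy = a[1] - b[1];
  return dx * dx + dy * dy;
}

// Recursive search. `best` is the smallest squared distance found so far
// and acts as the bound.
void Nearest(const KdNode* node, const Point& query, double& best,
             Point& best_point) {
  if (node == nullptr) return;
  const double d = SquaredDistance(node->point, query);
  if (d < best) {
    best = d;
    best_point = node->point;
  }
  const double delta = query[node->axis] - node->point[node->axis];
  const KdNode* near_side = delta < 0 ? node->left.get() : node->right.get();
  const KdNode* far_side = delta < 0 ? node->right.get() : node->left.get();
  Nearest(near_side, query, best, best_point);
  // Prune: every point in the far subtree is at least |delta| away along
  // the split axis, so it is visited only if the splitting plane is closer
  // than the current best.
  if (delta * delta < best) {
    Nearest(far_side, query, best, best_point);
  }
}

// Takes the points by value because building the tree reorders them.
Point FindNearest(std::vector<Point> points, const Point& query) {
  assert(!points.empty());
  const auto root = Build(points.begin(), points.end(), 0);
  double best = std::numeric_limits<double>::infinity();
  Point best_point = points.front();
  Nearest(root.get(), query, best, best_point);
  return best_point;
}
```

The pruning test is what lets the search skip whole subtrees: once a close candidate is found, most far-side branches fail the `delta * delta < best` check and are never visited.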
PaperBoat is currently in beta. New algorithms are added and tested every day. The full PaperBoat toolbox has been licensed by Ismion clients and has been tested in production. Feel free to download and use the PaperBoat library for your projects. If you want to use it commercially, please contact us at email@example.com to obtain a license. Ismion will also customize the code for specific machine learning project requirements. Source code is hosted on GitHub under this license.
ECL-PB, for the HPCC platform, is hosted on GitHub.
Check our Tutorial section for examples and usage of the code.