Most enterprises have business-critical code that is well maintained and high-performance. The cost of rewriting or porting this code, together with the technical debt involved, often prevents adoption of new frameworks. Adding another level of indirection through network proxies often results in an unacceptable performance hit. This problem is particularly acute in edge compute workloads, where high-throughput sensors feed real-time processing and storage pipelines. Illuminate Technologies’ threat detection solutions for 5G networks apply Spark and custom data sources to implement this workload efficiently. In this talk, we pursue an alternative approach: integrating proven native code with the Spark DataSourceV2 API. This allows the power of the Spark platform for ETL, structured streaming and data formatting to be combined with the data processing logic of existing code.
The talk will walk through the structure of a custom data source, usable in streaming or file modes, that wraps a native C++ processing engine and ships in a single JAR. Techniques such as JNI wrappers, autoloading shared libraries and Maven build integration will be shown. The talk will also cover pitfalls such as multi-platform support and library dependencies. The talk will include demos and benchmarks, and framework code will be made available on GitHub.
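As a rough illustration of the shared-library autoloading technique mentioned above, the following minimal Java sketch extracts a platform-specific library bundled inside the JAR to a temporary file and loads it with `System.load`. It is not the framework code from the talk: the class name `NativeLoader`, the library name `engine`, and the `native/<os>-<arch>/` resource layout are all hypothetical choices made for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: the names and the resource layout are assumptions for illustration.
final class NativeLoader {
    // Libraries already loaded in this JVM, so repeated calls are harmless.
    private static final Set<String> LOADED = new HashSet<>();

    private NativeLoader() {}

    /** Maps an OS name and CPU architecture to the assumed classpath location of the library. */
    static String resourcePath(String libName, String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        // Normalise common aliases reported by different JVMs.
        if (arch.equals("amd64")) arch = "x86_64";
        if (arch.equals("arm64")) arch = "aarch64";

        String dir;
        String file;
        if (os.contains("linux")) {
            dir = "linux";
            file = "lib" + libName + ".so";
        } else if (os.contains("mac")) {
            dir = "darwin";
            file = "lib" + libName + ".dylib";
        } else if (os.contains("windows")) {
            dir = "windows";
            file = libName + ".dll";
        } else {
            throw new UnsupportedOperationException("Unsupported OS: " + osName);
        }
        return "native/" + dir + "-" + arch + "/" + file;
    }

    /** Copies the bundled library out of the JAR to a temp file and loads it once. */
    static synchronized void load(String libName) throws IOException {
        if (LOADED.contains(libName)) {
            return;
        }
        String resource = resourcePath(
                libName, System.getProperty("os.name"), System.getProperty("os.arch"));
        try (InputStream in = NativeLoader.class.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new UnsatisfiedLinkError("No bundled library at " + resource);
            }
            String suffix = resource.substring(resource.lastIndexOf('.'));
            Path tmp = Files.createTempFile(libName, suffix);
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            // System.load needs an absolute filesystem path, hence the extraction step.
            System.load(tmp.toAbsolutePath().toString());
            LOADED.add(libName);
        }
    }
}
```

A JNI wrapper class would typically call something like `NativeLoader.load("engine")` from a static initializer before any `native` method is invoked. Keeping one directory per platform inside the JAR is one way to approach the multi-platform pitfall, although any libraries the native code itself depends on would still need to be present on each executor.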
Doug Carson is a Solutions Architect working for Illuminate Technologies in Edinburgh. During his career he has architected measurement and processing solutions for leading-edge telecoms technologies such as intelligent networks, voice over packet, mobile networks and, recently, virtualized networks. He holds nine patents in telecommunications protocol processing techniques. He has been using Spark since 2015 and is a co-author of a paper that applied genetic improvement to Spark queries, presented at the BCS Real AI 2015 conference and GECCO '16. He currently uses Spark with custom data sources for cyber threat detection in telecommunications infrastructure.