The ever-growing influx of data strains every component in a system. GPUs and ASICs help on the compute side, while in-memory and flash storage devices keep up with local IOPS demands. All of these can perform extremely well in smaller setups and under contained workloads. However, today's workloads demand ever more processing power, which translates directly into larger scale.
Training major AI models no longer fits into modest setups, and streaming ingestion systems barely keep up with the load. These are just a few examples of why enterprises require massive, versatile infrastructure that continuously grows and scales. The problems start when workloads are scaled out, exposing how poorly traditional network infrastructures cope with bandwidth-hungry and latency-sensitive applications.
In this talk, we will dive into how intelligent hardware offloads can mitigate network bottlenecks in Big Data and AI platforms, and compare the offerings and performance of what is available in major public clouds, as well as in a la carte on-premises solutions.
Microsoft Azure's advanced compute and network infrastructure allows Spark to run in the cloud without compromising on performance. With the growing arsenal of hardware offloads available on cloud VMs, owning and maintaining bleeding-edge hardware is no longer a prerequisite for accelerated compute. In this talk, we will demonstrate how hardware acceleration in Azure can be used to speed up Spark jobs seamlessly, with the aid of RDMA (Remote Direct Memory Access) support in the VM. We will present benchmarks and real-world applications that achieve impressive performance improvements with minimal configuration. Session hashtag: #HWCSAIS18
The opportunity to accelerate Spark by improving its network data transfer facilities has been much debated in the last few years. RDMA (Remote Direct Memory Access) is a network acceleration technology that is very prominent in the HPC (high-performance computing) world, but has not yet made its way into mainstream Apache Spark. A proper implementation of RDMA in network-oriented applications can improve scalability, throughput, latency, and CPU utilization. In this talk, we will present a new RDMA solution for Apache Spark that shows substantial improvements in multiple Spark use cases. The solution is under development in our labs and will be released to the public as an open-source plug-in. Session hashtag: #EUres3