Karthikeyan Nagalingam

Technical Marketing Engineer, NetApp

I work as a Big Data Analytics Technical Marketing Engineer at NetApp Inc. I architect Hadoop solutions, build proofs of concept, and present Hadoop solutions to customers, field experts, and partners through events such as NetApp Insight, Forsight, Research Triangle Park local meetups, and the NetApp Executive Briefing Center. I also support presales and postsales engagements and assist customers.



Analyzing IoT Data in Apache Spark Across Data Centers and Cloud with NetApp Data Fabric and NetApp Private Storage

Summit 2017

This session will explain how NetApp simplifies the analysis of IoT data with Apache Spark clusters that span data centers and the cloud, using NetApp Private Storage (NPS) for AWS/Azure, NetApp Data Fabric, and the NetApp Connectors for NFS and S3. IoT data originates at the edge in different geographical locations, and it can arrive at different data centers or in the cloud depending on sensor location. The challenge is how to combine these data streams across data centers to generate wider insights. Learn how NetApp Data Fabric helps solve this challenge.

In the Data Fabric architecture, IoT data is ingested via Kafka into an Apache Spark cluster running in AWS/Azure, but the data is stored in an NPS-provisioned NFS share through the NFS Connector. The IoT data in NPS can then be moved to on-prem data centers, or on-prem IoT data can be moved to NPS or ONTAP Cloud for processing in AWS/Azure, using NetApp SnapMirror, FlexClone, or the NFS Connector. We'll also review how NetApp StorageGRID object storage retains IoT data for archival purposes using an S3 target. These options allow you to analyze IoT data from AWS, StorageGRID, HDFS, or NFS, providing a feasible solution for deploying Spark clusters across data centers.

Takeaways will include identifying Spark challenges that can be remedied by extending your Spark environment to take advantage of NPS; understanding how NPS and StorageGRID can provide a cost-effective alternative for dev/test and DR for Spark analytics; and understanding Spark architecture and deployment options that use data from multiple locations, including on-prem and cloud-based repositories. Session hashtag: #SFeco4