Claudiu Barbura

Director of Engineering, Blueprint Technologies

Claudiu is Director of Engineering at Blueprint Technologies, where he oversees Product Engineering and builds large-scale advanced analytics pipelines, IoT and Data Science applications for customers in the oil & gas, energy and retail industries. He was formerly VP of Engineering at Ubix.io, automating data science at scale, and Sr. Director of Engineering, xPatterns Platform Services at Atigeo, building several advanced analytics platforms and applications for the healthcare and financial industries. Claudiu is a hands-on architect, dev manager and executive with 20+ years of experience in Open Source, Big Data Science and Microsoft technology stacks, and a frequent speaker at data conferences.

Past sessions

A live demo and lessons learned from building and publishing an advanced video analytics solution in the Azure Marketplace. This is a deep technical dive into the engineering and data science employed throughout, covering the challenges encountered in combining Deep Learning and Computer Vision for object detection and tracking, the operational management and tool-building efforts for scaling video processing and insights extraction to large GPU/CPU Databricks clusters, and the machine learning required to detect behavioral patterns, anomalies and scene similarities across processed video tracks.

The entire solution was built using open source Scala, Python, Spark 3.0, MXNet, PyTorch and scikit-learn, as well as Databricks Connect.

In this session watch:
Claudiu Barbura, Director of Engineering, Blueprint Technologies

Summit 2014: xPatterns on Spark, Shark, Tachyon and Mesos

June 29, 2014 05:00 PM PT

xPatterns is a big data analytics platform as a service that enables rapid development of enterprise-grade analytical applications. It provides tools, API sets and a management console for building an ELT pipeline with data monitoring and quality gates; a data warehouse for ad-hoc and scheduled querying, analysis, model building and experimentation; tools for exporting data to NoSQL and SolrCloud, feeding real-time access through low-latency, high-throughput APIs; and dashboard and visualization APIs and tools that leverage the available data and models. We will showcase the entire lifecycle of one of the xPatterns applications built for our largest production customer (20 billion medical, pharmacy and lab data records, worth 200 TB of compressed HDFS data) while evolving our infrastructure from Hadoop and Hive to Spark, Shark, Tachyon and Mesos. We will provide detailed ELT pipeline stats with lessons learned (Hadoop vs Spark, Hive vs Shark vs Shark w/ Tachyon) and tips & tricks for fine-tuning performance on various EC2 hardware configurations. Live demos will cover Jaws, our RESTful SharkServer and GUI for exploring the warehouse through Shark queries; Mesos, providing resource management for multiple workloads (Hadoop/Hive, Spark, multiple instances of load-balanced Jaws); Tachyon, an in-memory distributed file system backed by HDFS that allows for a better-performing and more resilient Spark/Shark stack; the Export to NoSql API console (which generates geo-replicated APIs for real-time access to Cassandra data exported from the warehouse through Spark jobs); the Referral Provider Network, a user-facing dashboard application (D3.js); and finally, monitoring and instrumentation consoles (Nagios, Ganglia and Graphite).