Modern infrastructure and applications generate extraordinary volumes of log and telemetry data. At Pure Storage, we know this first hand: we have over 5PB of log data from production customers running our all-flash storage systems, from our engineering testbeds, and from test stations at manufacturing partners. Every part of our company — from engineering to sales — now depends on the insights we gather from this data. Given the diversity of our end users, it’s no surprise that our analysis tools comprise a broad mix of reporting queries, stream-processing operations, ad-hoc analyses, and deeper machine-learning algorithms. In this session, we will cover lessons learned from scaling our data warehouse and how we are leveraging Apache Spark’s capabilities as a central hub to meet our analytics demands.
Brian Gold is an engineering director at Pure Storage and part of the founding team for FlashBlade, Pure’s scale-out, all-flash file and object storage platform. He’s contributed to nearly every part of the FlashBlade architecture and development from inception to production, but fortunately most of his code has been rewritten by others. Brian received a PhD from Carnegie Mellon University, focusing on computer architecture and resilient computing.