Since the invention of SQL and relational databases, data production has been about specifying how data is transformed through queries. While Apache Spark can certainly be used as a general distributed query engine, the power and granularity of Spark’s APIs enable a revolutionary increase in data engineering productivity: goal-based data production. Goal-based data production concerns itself with specifying WHAT the desired result is, leaving the details of HOW the result is achieved to a smart data warehouse running on top of Spark. This approach not only substantially increases productivity, but also significantly expands the audience that can work directly with Spark: from developers and data scientists to technical business users. With specific data and architecture patterns spanning the range from ETL to machine learning data prep, and with live demos, this session will demonstrate how Spark users can gain the benefits of goal-based data production.
Session hashtag: #EUent1
Sim Simeonov is an entrepreneur, investor and startup mentor. He is the founding CTO of Swoop and IPM.ai, startups that use privacy-preserving AI to improve patient outcomes and marketing effectiveness in life sciences and healthcare. Previously, Sim was the founding CTO of Evidon (CrownPeak) and Thing Labs (AOL) and a founding investor in Veracode (Broadcom). In his VC days, Sim was an EIR at General Catalyst Partners and a technology partner at Polaris Partners, where he helped start five companies the firms invested in, three of which have already been acquired. Before his days as an investor, Sim was vice president of emerging technologies and chief architect at Macromedia (now Adobe). Earlier, he was a founding member and chief architect at Allaire, one of the first Internet platform companies, whose flagship product, ColdFusion, ran thousands of sites such as Priceline and MySpace.