Sandy May

Co-organiser of Data Science London and Lead Data Engineer, Elastacloud

Sandy is a Lead Data Engineer at Elastacloud, where he has worked for 4 years on myriad projects for customers ranging from SMEs to FTSE 100 companies. He is a strong advocate of Databricks on Azure and of using Spark to solve Big Data problems, and he has recently become a Databricks Champion. Having worked on one of the original Databricks on Azure projects, he continues to expand his Big Data knowledge using new Open Source technologies such as Delta Lake and MLflow.

Sandy co-organises the Data Science London meet-up and continues to share what he picks up with the community so they can learn from his mistakes. He has spoken at Spark Summit, Future Decoded, Red Shirt tours and more, and his knowledge covers most of the Azure Data Stack, with a keen interest in Big Data, Machine Learning and Data Visualisation.

PAST SESSIONS

Using Azure Databricks, Structured Streaming, and Deep Learning Pipelines to Monitor 1,000+ Solar Farms in Real Time

Renewables AI is at the forefront of innovation in the solar energy market. As the name suggests, we use AI to make predictions on energy output from large portfolios of solar farms. This talk lays out the fundamental architecture, technology and approaches that make the platform work, beginning with key features of the Azure Databricks cloud and how it works seamlessly with Azure Data Lake and Azure Event Hubs. There will be good coverage of ML and DL Pipelines and how they are used with image recognition and machine learning through Structured Streaming to make real-time decisions.

Key Takeaways:
- Prediction of next-day irradiance and power ratios with real-time accuracies of 95%
- Structured Streaming of IoT data from hundreds of thousands of inverters at 5-minute intervals
- Real-time joining of weather data and several other external datasets
- Use of Deep Learning Pipelines and advanced time series methods to predict 48 hours of future energy production
- Near-real-time processing of image data at frequent intervals to predict cloud cover from onsite cameras and drones
- Analysis of data and preventative maintenance for fan failures in solar inverters

Session hashtag: #SAISDD11