If you’re subscribed to email@example.com, or work in a large company, you may have seen some common Spark error messages. If you have attended Spark Summit over the past few years, you have probably also seen talks like “Top K Mistakes in Spark.” Cool tools that don’t rely on machine learning do exist to examine Spark’s logs, but because they don’t use machine learning they are not as cool, and they are also limited by the amount of effort humans can put into writing rules for them. This talk will look at what happens when we train “regular” clustering models on stack traces, and explore DL models for classifying user messages to the Spark list. Come for the reassurance that the robots are not yet able to fix themselves, and stay to learn how to work better with the help of our robot friends. The tl;dr of this talk: Spark ML on Spark output, plus a little bit of TensorFlow, is fun for the whole family, but it probably shouldn’t automatically respond to user list posts just yet.
Session hashtag: #SAISML10
Holden is an Apache Spark committer and PMC member who focuses on PySpark and Kubernetes support. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Her current side project is working on a book to teach children distributed systems, http://www.distributedcomputing4kids.com/.
Gris Cuevas is an Open Source Program Manager at Google Cloud and an aspiring Data Scientist. She recently graduated with a Master’s in Operations Research and Data Science from UC Berkeley. Gris has worked on developing online communities for the past 7 years and is now collaborating at Google on the design of an algorithm to predict author quality in online forums. Gris is interested in Natural Language Processing, Information Retrieval, and Open Source technologies. She loves The Beatles, juggling, and, of course, Mexican food.