Using Spark ML on Spark Errors – What Do the Clusters Tell Us?


If you’re subscribed to the Spark user list, or work in a large company, you may see some common Spark error messages. Even if you’ve only attended Spark Summit over the past few years, you have probably seen talks like “Top K Mistakes in Spark.” Cool non-machine-learning-based tools do exist to examine Spark’s logs. But they don’t use machine learning, so they are not as cool, and they are limited by the amount of effort humans can put into writing rules for them. This talk will look at what happens when we train “regular” clustering models on stack traces, and will explore DL models for classifying user messages to the Spark list. Come for the reassurance that the robots are not yet able to fix themselves, and stay to learn how to work better with the help of our robot friends. The tl;dr of this talk is that Spark ML on Spark output, plus a little bit of TensorFlow, is fun for the whole family, but probably shouldn’t automatically respond to user list posts just yet.
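To make “clustering models on stack traces” concrete, here is a small, dependency-free Python stand-in. It is not the talk’s code and it is not Spark ML. It only mimics the shape of a Spark ML pipeline: tokenize each trace, hash the tokens into a fixed-size term-frequency vector (the hashing trick that `HashingTF` uses), and then run k-means. In Spark itself you would chain `RegexTokenizer`, `HashingTF` and `IDF` from `pyspark.ml.feature` with `KMeans` from `pyspark.ml.clustering`. `NUM_FEATURES`, the tokenizer regex, the naive first-k initialization, and the sample traces are all hypothetical choices made for illustration.

```python
import math
import re
import zlib
from collections import Counter

# Hypothetical bucket count, analogous to HashingTF's numFeatures parameter.
NUM_FEATURES = 64

# Made-up toy traces (not real logs): two out-of-memory errors and two NPEs.
TRACES = [
    "java.lang.OutOfMemoryError: Java heap space at com.example.Sorter.insertAll(Sorter.scala:190)",
    "java.lang.NullPointerException at com.example.MyJob$.parse(MyJob.scala:42)",
    "java.lang.OutOfMemoryError: GC overhead limit exceeded at com.example.Sorter.insertAll(Sorter.scala:204)",
    "java.lang.NullPointerException at com.example.MyJob$.parse(MyJob.scala:57)",
]


def tokenize(trace):
    # Split on anything that is not a letter, which drops line numbers and
    # punctuation, so "MyJob.scala:42" and "MyJob.scala:57" yield the same tokens.
    return [t.lower() for t in re.split(r"[^A-Za-z]+", trace) if t]


def hashed_tf(tokens):
    # Hashing trick: map each token to a bucket using a stable hash (crc32),
    # then L2-normalize so long traces do not dominate the distance.
    vec = [0.0] * NUM_FEATURES
    for tok, count in Counter(tokens).items():
        vec[zlib.crc32(tok.encode()) % NUM_FEATURES] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    # Simplification: use the first k points as initial centroids.
    # Spark's KMeans uses k-means|| initialization by default.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


labels = kmeans([hashed_tf(tokenize(t)) for t in TRACES], k=2)
for label, trace in zip(labels, TRACES):
    print(label, trace[:45])
```

On these toy inputs the two out-of-memory traces land in one cluster and the two NullPointerException traces in the other. Stripping digits during tokenization is what makes traces that differ only in line numbers look identical.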

Session hashtag: #SAISML10

About Griselda Cuevas

Gris Cuevas is an Open Source Program Manager at Google Cloud and an aspiring Data Scientist. She recently graduated from UC Berkeley with a Master’s in Operations Research and Data Science. Gris has worked on developing online communities for the past 7 years and is now collaborating at Google on the design of an algorithm to predict author quality in online forums. Gris is interested in Natural Language Processing, Information Retrieval, and Open Source technologies. She loves The Beatles, juggling, and, of course, Mexican food.