Analysing Large Datasets Efficiently with Apache Spark and IntelliJ IDEA - Databricks

As a Data Engineer, you’re building the foundation for the success of any data-driven product or project your company may envision. No pressure. And, of course, every time you write Spark code, everything runs flawlessly and blazingly fast on the first try, right? Sadly, that’s not the case. From a typo in a method name to a suboptimal way of writing a job, numerous things can waste hours of work: you wait for that Spark job to finish, only for it to crash on the last line. It doesn’t have to be that way. In this talk, I’ll show you how to be more productive when writing Spark jobs to process data. We’ll also look into monitoring and optimizing Spark jobs, all without leaving IntelliJ IDEA.



About Maria Khalusova

JetBrains

Maria is a Developer Advocate at JetBrains, where she focuses on data science, data engineering, and machine learning. Before joining the advocacy team, she was part of projects such as IntelliJ IDEA, TeamCity, and Upsource. She is also one of the organizers and co-founders of PyData Montreal, and she has spoken at a number of industry events.