As one of the world’s leading news organisations, the Guardian wants its journalism to reach the widest audience possible. That’s why we’ve invested in our own analytics platform, Ophan, to learn from the tens of millions of page views our content receives each day. In this talk, I’ll describe why we wanted to introduce Spark to an already successful platform, the challenges we’ve faced in adoption and how we’re now using it. Expect a warts-and-all talk covering common problems in writing and debugging Spark applications and some mistakes entirely of my own making, as well as where we’ve found Spark to be really useful and how we’ve got the most out of it.
Phil Wills is lead software architect for The Guardian. He doesn't believe it's possible to do this well without regularly writing production code. Phil has helped build and scale theguardian.com, the tools used to produce it, and Ophan (https://www.journalism.co.uk/news/how-ophan-offers-bespoke-data-to-inform-content-at-the-guardian/s2/a563349/), the analytics tool used to ensure our journalism reaches the widest possible audience. Within the team, he's driven the adoption of Scala and Continuous Delivery, which he's written about as part of Build Quality In (https://leanpub.com/buildqualityin). More recently, he's been looking at how Spark can be harnessed to improve the way we consume news.