For Roularta, a news and media publishing company, it is of great importance to understand reader behavior and which content attracts, engages and converts readers. At Roularta, we have built an AI-driven article quality scoring solution using Spark for parallelized compute, Delta for efficient data lake use, BERT for NLP and MLflow for model management. The solution is an NLP-based ML model that gives every published article a calculated and forecasted quality score based on three dimensions: conversion, traffic and engagement.
The score helps editorial and data teams make data-driven decisions about an article, such as launching another social post, placing the article behind the paywall and/or top-listing it on the homepage.
The article quality score gives the editorial team a quantitative basis for writing more impactful articles and running a better news desk. In this talk, we will cover how the article quality score tool works, including:
– The role of Delta in accelerating the data ingestion and feature engineering pipelines
– The use of a Dutch-language BERT model for extracting features from the article text in a Spark environment
– The use of MLflow for experiment tracking and model management
– The use of MLflow to serve the model as a REST endpoint within Databricks in order to score newly published articles
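To make the last point concrete, here is a minimal sketch of how a newly published article could be sent to an MLflow-served model for scoring. It is an illustration only, not Roularta's implementation: the endpoint URL, the access token and the column names (article_id, title, body) are hypothetical placeholders. It also assumes an MLflow 2.x scoring server, which accepts JSON input under the "dataframe_split" key; older MLflow versions expect a different request format. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical placeholder: not an endpoint named in the talk.
ENDPOINT_URL = (
    "https://example.cloud.databricks.com/serving-endpoints/"
    "article-quality/invocations"
)


def build_scoring_request(articles, url=ENDPOINT_URL, token="<access-token>"):
    """Build (without sending) a POST request for an MLflow scoring endpoint."""
    # Assumed input columns; the real model's features are not described here.
    columns = ["article_id", "title", "body"]
    payload = {
        # "dataframe_split" is the pandas split-orientation input format
        # accepted by MLflow 2.x scoring servers.
        "dataframe_split": {
            "columns": columns,
            "data": [[article[c] for c in columns] for article in articles],
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_scoring_request(
    [{"article_id": 1, "title": "Voorbeeld", "body": "Tekst van het artikel."}]
)
# Against a live endpoint, urllib.request.urlopen(request) would send the
# article and return the model's predictions as JSON.
```

Building the request separately from sending it keeps the payload easy to inspect and test before a real endpoint exists.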
Speaker: Ivana Pejeva
Ivana is a data scientist, passionate about machine learning and artificial intelligence. As part of the Data Science and Strategy competence center at element61, she helps organizations build and grow business with data. Ivana is an engineering professional with a Master's degree in Artificial Intelligence from KU Leuven.