NLP with MLlib: Global Empire-Building for Fun and Profit

To establish a user base across the globe, a product needs to support a variety of locales. The challenge with supporting multiple locales is generating and maintaining localized strings, which are deeply integrated into many facets of a product. To address this challenge, Qordoba is using scalable, open-source machine learning to automate the process. Specifically, the company needs to generate high-quality translations in many different languages and make them available in real time across platforms, e.g. mobile, print, and web.
In this session, you’ll learn about the techniques Qordoba is using to provide:
– Continuous deployment of localized strings
– Live syncing across platforms (mobile, web, Photoshop, Sketch, help desk, etc.)
– Content generation for any locale
– Emotional response

You’ll also hear about their architecture for handling billions of localized strings in many different languages, including their use of:
– Scala and Akka as an orchestration layer
– Apache Cassandra and MariaDB as a storage layer
– Apache Spark and Apache PredictionIO (incubating) for natural language processing
– Apache Kafka as a message bus for reporting, billing and notifications
– Docker, Marathon and Apache Mesos for containerized deployment

Session hashtag: #SFds17

About Michelle Casbon

Michelle Casbon is Director of Data Science at Qordoba, a platform that uses machine learning to help companies globalize their products. Her focus is on scalable NLP that generalizes across (natural) languages. Previously, she was a Senior Data Science Engineer at Idibon, where she built tools for generating predictions on text datasets. Michelle completed a Master's at the University of Cambridge, focusing on NLP, speech recognition, speech synthesis, and machine translation. She loves working with open source projects and has contributed to Apache Spark and Apache Flume.