Global Empire-Building for Fun and Profit - Databricks



To build a user base across the globe, a product needs to support a variety of locales. The challenge with supporting multiple locales is generating and maintaining localized strings, which are deeply integrated into many facets of a product. To address these challenges at Qordoba, we’re using highly scalable technologies and machine learning to automate the process. Specifically, we need to generate high-quality translations in many different languages and make them available in real time across platforms, e.g., mobile, print, and web.

In this talk, we describe the techniques we’re using to provide:
* Continuous deployment of localized strings
* Live syncing across platforms (mobile, web, Photoshop, Sketch, help desk, etc.)
* Content generation for any locale
* Emotional response

We will also share our architecture for handling billions of localized strings in many different languages, including our use of:
* Scala and Akka as an orchestration layer
* Apache Cassandra and MariaDB as a storage layer
* Apache Spark for natural language processing
* Apache Kafka as a message bus for reporting, billing, & notifications
* Docker, Marathon, & Apache Mesos for containerized deployment

We present our solution in the context of a platform that makes it feasible to build products that feel native to every user, regardless of language.

About Michelle Casbon

Michelle Casbon is Director of Data Science at Qordoba, a platform that uses machine learning to help companies globalize their products. Her focus is on scalable NLP that generalizes across (natural) languages. Previously, she was a Senior Data Science Engineer at Idibon, where she built tools for generating predictions on text datasets. Michelle completed a Master's degree at the University of Cambridge, focusing on NLP, speech recognition, speech synthesis, and machine translation. She loves working with open source projects and has contributed to Apache Spark and Apache Flume.