Advanced Natural Language Processing with Apache Spark NLP

NLP is a key component in many data science systems that must understand or reason about text. This hands-on tutorial uses the open-source Spark NLP library to explore advanced NLP in Python. Spark NLP provides state-of-the-art accuracy, speed, and scalability for language understanding by delivering production-grade implementations of some of the most recent research in applied deep learning. It’s the most widely used NLP library in the enterprise today.

You’ll edit and extend a set of executable Python notebooks by implementing these common NLP tasks: named entity recognition, sentiment analysis, spell checking and correction, document classification, and multilingual and multi-domain support. The discussion of each NLP task covers the latest advances in deep learning used to tackle it, including the prebuilt BERT embeddings available in Spark NLP, tuned embeddings, and “post-BERT” research results like XLNet, ALBERT, and RoBERTa. Spark NLP builds on the Apache Spark and TensorFlow ecosystems, and as such it’s the only open-source NLP library that can natively scale to use any Spark cluster, as well as take advantage of the latest processors from Intel and Nvidia. You’ll run the notebooks locally on your laptop, but we’ll also present a complete case study and benchmarks showing how to scale an NLP pipeline for both training and inference.

Speakers: David Talby and Veysel Kocaman

About David Talby

John Snow Labs

David Talby is chief technology officer at John Snow Labs, helping healthcare and life science companies put AI to good use. David is the creator of Spark NLP, the world’s most widely used natural language processing library in the enterprise. He has extensive experience building and running web-scale software platforms and teams: in startups, for Microsoft’s Bing in the US and Europe, and scaling Amazon’s financial systems in Seattle and the UK. David holds a PhD in computer science and master’s degrees in both computer science and business administration.

About Veysel Kocaman

John Snow Labs

Veysel is a seasoned data scientist with over ten years of experience and a strong background in every aspect of data science, including machine learning, artificial intelligence, and big data. He is currently a Senior Data Scientist at John Snow Labs, improving the Spark NLP for Healthcare library and delivering hands-on projects in healthcare and life science. He's the instructor of advanced NLP courses on Experfy and Udemy. Veysel has broad experience consulting on statistics, data science, software architecture, DevOps, machine learning, and AI for start-ups, bootcamps, and companies around the globe.