Natural language processing is a key component of many data science systems that must understand or reason about text. Common use cases include question answering, paraphrasing or summarization, sentiment analysis, natural language BI, language modeling, and disambiguation. Building such systems usually requires combining three types of software libraries: NLP annotation frameworks, machine learning frameworks, and deep learning frameworks. Ideally, all three pieces integrate into a single workflow, which makes development, experimentation, and deployment much easier. Spark’s MLlib provides a number of machine learning algorithms, and there are now projects that bring deep learning to MLlib pipelines. The missing piece is an NLP annotation framework. SparkNLP fills that gap by adding NLP annotations to the MLlib ecosystem. This talk will introduce SparkNLP: how to use it, its current functionality, and where it is going in the future.
Session hashtag: #EUdd4
Alex Thomas is a data scientist at Indeed. Over his career, Alex has used natural language processing (NLP) and machine learning with clinical data, identity data, and (now) employer and jobseeker data. He has worked with Apache Spark since version 0.9, as well as with NLP libraries and frameworks including UIMA and OpenNLP.