Senior Data Engineer - Databricks

Senior Data Engineer

Elsevier is in the midst of a transformation, evolving from a publishing company that assures quality control in scientific output (although this will remain important) into a researcher productivity and analytics company that assures better outcomes in every interaction within the scientific world. We are focused on building an integrated, social and personal toolset that delivers differentiated value to researchers and research entities by helping them with the challenges they face. Linking the traditional strengths of our publishing business to leading-edge technology is critical to our success.

The Elsevier Research Products division focuses on enhancing the performance of Elsevier’s online Researcher Platforms, specifically those devoted to Publishing (ScienceDirect, Health Advance, EVISE, and more), Research Enablement (Scopus, Mendeley) and Research Intelligence (SciVal, Pure, and more). We drive both the integration and the personalization of this portfolio, creating an interoperable environment that can be customized to meet the specific needs of researchers.

Our global centre of excellence for big data and search, based in London, is engaged in creating a world-class big data, machine-learning and search platform, the primary asset of Elsevier Research Products.

Role Purpose

This is a newly created role within a newly created global centre of excellence for search, based in London, consisting of our own in-house search engineers and supported by specialists from the world’s leading search consulting companies. You will apply your deep experience of, and enthusiasm for, the latest massively parallel distributed technologies to power the discoverability of our scientific and academic content by researchers worldwide.

What you will be doing

Developing and improving the customer search and discovery experience across our research products
Building our knowledge platform for the next generation of applications
Designing next-generation search capabilities using the latest very-large-scale and distributed cloud platforms

What we are looking for

Excellent analytical and problem-solving skills
Excellent understanding of property graphs, RDF and triple stores (e.g. Neo4j, Ontotext)
Excellent knowledge of, and proficiency in, one or more programming languages (e.g. Python, Java)
Knowledge of machine learning, NLP and other modern data processing techniques will be a huge plus
Software development experience in a curly-brace language or Python, plus scripting ability, including writing queries, handling data (ETL), and using *nix systems and open-source software and libraries
Proficiency in data modelling and data architecture
Experience in environments for big data engineering and distributed computation (e.g. Spark environments such as Databricks or Zeppelin, or Elasticsearch)
Basic knowledge of software version control systems (git preferred)
Ability to write scripts for task automation
Experience in gathering requirements for software
Familiarity with cloud technologies such as Amazon Web Services (AWS)
An open mind towards working with new technologies
Curiosity about algorithm development

Education, Knowledge, Skills and Experience

University degree (Master's level) in computer science or a related area.
Ability to drive new developments and implement process changes and disruptive technologies in the organization.
Familiarity with agile software development.
Experience with stream processing systems (e.g. Spark, Kafka) is a bonus.
Good communication and documentation skills with the ability to convey complex technical concepts to non-technical professionals.