Nicholas Nolasco - Databricks

Nicholas Nolasco

Software Architect, Blackbaud

Nick Nolasco is a software architect with a wide range of experience across critical production systems in multiple industries. He has worked on real-time software for automated guided vehicles, both worked on and led payments and email teams and is most recently building out Blackbaud’s big data platform. The common thread across his career has been a focus on business critical areas and building out scalable data-driven solutions.


Democratizing DataSummit 2020

We present our solution for building an AI Architecture that provides engineering teams the ability to leverage data to drive insight and help our customers solve their problems. We started with siloed data, entities that were described differently by each product, different formats, complicated security and access schemes, data spread over numerous locations and systems. We discuss how we created a Delta Lake to bring this data together, instrumenting data ingestion from the various external data sources, setting up legal and security protocols for accessing the Delta Lake, and go into detail about our method of making all the data conformed into a Common Data Model using a metadata driven pipeline.

This metadata driven pipeline or Configuration Driven Pipeline (CDP) uses Spark Structured Streaming to take change events from the ingested data, references a Data Catalog Service to obtain mapping and the transformations required to push this conformed data into the Common Data Model. The pipeline uses extensive Spark API to perform the numerous types of transformations required to take these change events as they come in and UPSERT into a Delta CDM. This model can take any set of relational databases (1000s in our case), and transform them into a big data format (Delta Lake/parquet) CDM in a scalable, performant way all from metadata. It can then perform schema-on-read to project from this CDM into any requested destination location (database, filesystem, stream, etc). This provides the ability for Data Scientists to request data by specifying metadata, and the pipeline will automatically run producing the schema they require with all data types conformed to a standard value and depositing it to their specified destination.