Enterprise Data Governance and Compliance at Scale – Databricks

Enterprise Data Governance and Compliance at Scale


Twilio is a cloud communications platform supporting 40,000+ customers and 1 million+ developers, handling millions of messages per minute across the globe for customers in many different sectors. In many regulated industries and parts of the world, data must be moved, stored and accessed securely. Twilio provides a firm foundation for that and is focused on giving customers a secure and scalable telecommunications cloud platform.

Handling this massive amount of data securely is made possible by Kafka and Spark. Twilio’s Data Platform team is building a compliance layer on top of its Data Pipeline, Data Lake and Bulk Data Transformer to handle compliance requirements such as GDPR, HIPAA and PCI. The Secured Data Pipeline is a streaming channel that feeds the Data Lake, the BI data warehouse and Elasticsearch, whereas the Bulk Data Transformer is an ETL channel that transfers and transforms bulk data from relational databases (RDBMS). Kafka Connect, Spark SQL and DataFrames power the streaming channel and make data wrangling and de-duplication efficient.
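The abstract does not say how de-duplication is performed. The sketch below is a minimal illustration only, assuming each streamed record carries a unique `message_id` and an `event_time` (both hypothetical field names): when delivery is at-least-once, retries can produce duplicate records, and this keeps the most recent copy per key. In Spark the same idea is commonly expressed with `dropDuplicates` or a window over the key; the plain-Python version just shows the logic and is not Twilio's implementation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical record shape: unique id, event timestamp, payload.
    message_id: str
    event_time: int  # epoch seconds
    body: str


def dedupe_latest(events: Iterable[Event]) -> list:
    """Keep one event per message_id: the one with the greatest event_time.

    On a tie in event_time, the copy seen first is kept (strict comparison).
    """
    latest = {}
    for event in events:
        current = latest.get(event.message_id)
        if current is None or event.event_time > current.event_time:
            latest[event.message_id] = event
    # Deterministic output order: by event_time, then message_id.
    return sorted(latest.values(), key=lambda e: (e.event_time, e.message_id))


events = [
    Event("m1", 100, "hello"),
    Event("m2", 101, "world"),
    Event("m1", 105, "hello (retry)"),
]
print([(e.message_id, e.event_time) for e in dedupe_latest(events)])
# → [('m2', 101), ('m1', 105)]
```

Keeping the latest copy (rather than any copy) is a design choice that matters only when duplicates can differ, for example when a retried record carries an updated status.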

The Data Compliance layer has several components, such as Data Anonymization, Authentication, Authorization, Auditing, Custom Retention and Data Deletion, to meet the requirements of both data processors and data controllers. Anonymization as a service provides redaction, encryption and data obfuscation, applied according to the varying needs of each compliance regime and customer. Role-based access control (RBAC) is applied at the Kafka and S3 layers so that only authorized systems and users can access critical data, while everyone else can see only redacted data. An auditing service tracks all access to the various resources from both the processor and the controller perspective. Spark's distributed executor model makes deleting petabytes of data efficient once the custom retention period has passed. Together, Kafka, Kafka Connect and Spark make a scalable, fault-tolerant, distributed, secure and audited data governance pipeline possible.
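To make the combination of anonymization and role-based access concrete, here is a minimal sketch under stated assumptions; it is not Twilio's anonymization service. The field policy, the role name `compliance_admin` and the `tok_` token prefix are all hypothetical. Privileged roles see raw values; everyone else gets message bodies redacted and phone numbers obfuscated with a keyed HMAC, which yields stable pseudonyms that can still be joined on. Encryption, the third option the abstract mentions, is omitted because it would need a third-party library.

```python
import hashlib
import hmac

# Hypothetical policy: how each sensitive field is anonymized for unprivileged readers.
FIELD_POLICY = {
    "to": "obfuscate",    # pseudonymize so records can still be joined on it
    "from": "obfuscate",
    "body": "redact",     # hide the content entirely
}

# Hypothetical roles allowed to read raw values.
PRIVILEGED_ROLES = {"compliance_admin"}

REDACTED = "[REDACTED]"


def obfuscate(value: str, key: bytes) -> str:
    """Deterministic keyed pseudonym: the same value and key always give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def view_for(record: dict, role: str, key: bytes) -> dict:
    """Return the version of `record` that a caller with `role` may see."""
    if role in PRIVILEGED_ROLES:
        return dict(record)
    out = {}
    for field, value in record.items():
        action = FIELD_POLICY.get(field)
        if action == "redact":
            out[field] = REDACTED
        elif action == "obfuscate":
            out[field] = obfuscate(value, key)
        else:
            out[field] = value  # fields without a policy pass through unchanged
    return out


key = b"example-secret-key"  # assumption: a real key would come from a secrets manager
msg = {"sid": "SM123", "to": "+15550100", "from": "+15550199", "body": "Your code is 4821"}
print(view_for(msg, "analyst", key))           # body redacted, numbers tokenized
print(view_for(msg, "compliance_admin", key))  # raw record
```

Using a keyed HMAC rather than a plain hash means someone without the key cannot rebuild the token table by hashing every possible phone number.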

Session hashtag: #EntSAIS11

About Sri Esha Subbiah

Sri Esha Subbiah is a Senior Engineering Manager at Twilio, where she leads the Data Platform team. She is a technically well-versed, business-savvy leader who combines strong management experience with hands-on technical experience. Earlier she worked for eBay and Oracle for 7 years. She has 17+ years of industry experience in areas such as Oracle Middleware, Oracle ERP, front-end frameworks, back-end Java frameworks, big data and AI. She holds an M.S. (by research) and has published many research papers and journal articles on intelligent software agents and IoT devices.

About Sunil Patil

Sunil Patil is a Senior Engineer on Twilio's Data Platform team. He excels at architecting and developing big data solutions that meet enterprise business needs. Earlier he worked for MapR and IBM for 10 years. He has 15+ years of industry experience in areas such as IBM middleware, front-end technologies, Java/Scala frameworks and big data. Sunil has published many articles on websites such as JavaWorld and O'Reilly, and has presented at various conferences on topics such as IBM's middleware, big data and IoT.