Shawn Benjamin has nearly 20 years of experience in Federal Sector Information Technology experience. Joining U.S. Citizenship and Immigration Services in 2006, he is a founding member of USCIS’s enterprise data warehouse program and has continued to apply innovation in analytics and data strategy. Shawn now serves as the Chief of Data & Business Intelligence for USCIS Office of Information Technology.
U.S. Citizenship and Immigration Services (USCIS) is the government agency that oversees lawful immigration to the United States. USCIS seeks to secure America's promise as a nation of immigrants by providing accurate and useful information to our customers, granting immigration and citizenship benefits, promoting an awareness and understanding of citizenship, and ensuring the integrity of our immigration system. To keep up with the growing demand for timely and efficient data accessibility for immigration, USCIS had to continuously improve, evaluate, streamline and revise our data analytics processes.
Apache Spark with Databricks, cloud native data lake with Delta Lake, MLflow, and other tools became a crucial factor for the success of our Agency's programs such as Electronic Immigration System (ELIS), eProcessing, Operational and case status reporting, Fraud detection, Refugee, Asylum, and International Operations (RAIO), Forecasting, etc. by liberating the data. Prior, USCIS was overwhelmed by legacy systems with operational data stores and a dated data warehouse containing disparate datasets from mainframes to relational databases to unstructured data all needing continuous updates. Fortunately, there was a way to remain current with the source systems with a more reliable platform to reduce risks, while providing better function and containerized applications. We were outgrowing our traditional relational database capacity to service the added demand from the user community and stakeholders. Although a recent move to the cloud improved our capability, we required a dynamically scalable platform that could adapt and cater to the growing data demand. This presentation and technical demo will have a deep dive on the path of accomplishing this requirement and lessons learned in an efficient and economical way of using Databricks and related technologies like Apache Spark, Delta Lake and MLflow.