Streamlining the path to citizenship with data
Under 10 minutes to process tables with 120 million rows
24x faster query performance
The number of immigration- and citizenship-related applications has skyrocketed over the past decade across naturalizations, green cards, employment authorizations and other categories. Millions of applications and petitions are flooding U.S. Citizenship and Immigration Services (USCIS), and processing delays have reached crisis levels: overall case processing times have increased 91% since FY2014.* An on-premises, legacy architecture that was complex, slow and costly to scale was at the core of these issues. By migrating to AWS and Databricks, USCIS adopted a unified approach to data analytics. This gave the agency more processing power for big data and let it federate data across dozens of disparate sources. The result is greater operational efficiency and new opportunities for the entire data organization to drive business intelligence and fuel ML innovations designed to streamline application and petition processing.
Processing delays fueled by on-premises, legacy architecture
Charged with administering the nation’s immigration system, USCIS is focused on enabling people to obtain work authorizations, apply for immigration benefits and seek asylum, as well as helping U.S. employers fill critical workforce gaps. Meeting this mission requires the efficient processing of millions of immigration-related applications.
The USCIS engineering team saw an opportunity to use data and analytics to automate certain processes and accelerate processing times. However, their on-premises technology stack, made up of legacy systems such as Oracle, SAS and Informatica, was highly complex to manage and could not keep up with the scale of their data.
“We were using Informatica for ETL, and the pipeline was fairly brittle,” explained Shawn Benjamin, chief of data and business intelligence at USCIS. “As a result, we had a lengthy development cycle and longer workflows. It made it impossible to deliver relevant data in real time.” This level of performance was unacceptable for a team supporting more than 2,300 data analysts and data scientists, all trying to access data across dozens of different sources.
The data opportunities at USCIS were boundless. On the data science side, the team wanted to use predictive analytics to answer difficult questions such as: Which applications are submitted most often? How many applications should the agency expect in future years? What is the probability that someone will not show up for an appointment? They were also looking for ways to streamline processes by digitizing applications and using NLP to better evaluate interviews. Without a scalable data science platform, however, their data scientists could not deliver on these use cases.
USCIS realized they needed to modernize their data infrastructure by migrating to the cloud and adopting a unified platform that could harness all of their data for easier access and ingestion, while enabling downstream data analytics and ML.
Removing complexities with a fully managed cloud platform
Databricks made a significant impact where it mattered most for USCIS. Faster processing enabled data analysts to deliver timely reports to decision makers and freed up data scientists to build ML models that help improve operations. Using the efficiencies of the cloud and Delta Lake, the team could provision a 26-node cluster within minutes and ingest tables with 120 million rows into S3 in under 10 minutes. Before Databricks, the same processes on Informatica would have taken two to three hours.
Databricks has also transformed their data warehousing strategy. Using Delta Lake, USCIS built a lakehouse that federates all their data, regardless of where it is stored, for downstream consumption. The team migrated 2,000 tables from Oracle to Databricks in less than a week.
With interactive notebooks, data scientists can easily collaborate with each other and with other data teams across the organization. “Notebooks allow multiple groups to work together from a single point. It eliminates having that swivel chair activity as people can work directly in the interface together.” And with MLflow, they can build and manage multiple ML projects and experiments with ease.
A new era of data-driven innovation improves operations
Since migrating to the cloud and integrating Databricks into their data analytics workflows, USCIS has been able to make smarter decisions that streamline processes and to use ML to reduce application processing times. These new efficiencies and capabilities have allowed the agency to scale its data footprint from about 30 to 75 data sources without issue.
USCIS can now understand its data more quickly, which has unlocked new opportunities for innovation. Benjamin notes, for example, that a very complex query used to take a full business day to run. With Databricks, the same query runs in 19 minutes, a 24x performance gain. As a result, the team spends far less time troubleshooting and more time creating value.
Tableau dashboards also became markedly faster, which matters because Databricks powers more than 6,000 of them. Benjamin’s team noticed that some dashboards used to take around 15 minutes to run, and the underlying queries sometimes failed altogether under high data load. When they ran the same queries on Databricks, the dashboards returned in under 15 seconds. Faster access to data insights means smarter decision-making in near real time.
Finally, the data science team is now able to leverage all of their data to help USCIS make more informed decisions and streamline operations. For example, they have implemented eProcessing, which has replaced paper applications with electronic applications, greatly improving operations and speeding up processes. Benjamin said, “Whether we are trying to predict the probability of a no-show to an appointment, streamline a manual process or perform sentiment analysis of survey data, the opportunities in front of us are now endless.”
By liberating its data and making it easy for anyone to use, the agency has tripled its user base. With more data, more resources and more performance, USCIS has since implemented many new programs, including the Electronic Immigration System (ELIS), eProcessing of applications, operational and case status reporting, fraud detection, Refugee, Asylum and International Operations (RAIO), and forecasting. With Databricks enabling USCIS to extract data from anywhere and deliver it to whoever needs it, whenever they need it, the agency is well positioned to keep driving innovation and operational efficiency in order to facilitate lawful immigration to the United States.
*Source: AILA Policy Brief: USCIS Processing Delays Have Reached Crisis Levels Under the Trump Administration
The U.S. Citizenship and Immigration Services (USCIS) gains actionable insights from dashboards via Tableau to better understand how to streamline operations and process immigration and employment applications as well as petitions faster. Today, their data analyst team has over 6,000 Tableau dashboards running — all powered by Databricks.