Analyzing Algorand Blockchain Data With Databricks Delta (Part 2)
This post was written in collaboration between Eric Gieseke, principal software engineer at Algorand, and Anindita Mahapatra, solutions architect at Databricks. Algorand is a public, decentralized blockchain system that uses a proof-of-stake consensus protocol. It is fast and energy efficient, with a transaction commit time under five seconds and a throughput of one...
Accelerating ML Experimentation in MLflow
This fall, I interned with the ML team, which is responsible for building the tools and services that make it easy to do machine learning on Databricks. During my internship, I implemented several ease-of-use features in MLflow, an open-source machine learning lifecycle management project, and made enhancements to the Reproduce Run capability on the Databricks...
Automatically Evolve Your Nested Column Schema, Stream From a Delta Table Version, and Check Your Constraints
We recently announced the release of Delta Lake 0.8.0, which introduces schema evolution and performance improvements in merge, and operational metrics in table history. The key features in this release are: Unlimited MATCHED and NOT MATCHED clauses for merge operations in Scala, Java, and Python. Merge operations now support any number of whenMatched and whenNotMatched...
Ray & MLflow: Taking Distributed Machine Learning Applications to Production
This is a guest blog from Amog Kamsetty and Archit Kulkarni, software engineers at Anyscale and contributors to Ray.io. In this blog post, we're announcing two new integrations between Ray and MLflow: Ray Tune+MLflow Tracking and Ray Serve+MLflow Models, which together make it much easier to build machine learning (ML) models and take them to...
Combining Rules-based and AI Models to Combat Financial Fraud
The financial services industry (FSI) is rushing toward transformational change, delivering transactional features and facilitating payments through new digital channels to remain competitive. Unfortunately, the speed and convenience that these capabilities afford also benefit fraudsters. Fraud remains the number one threat to financial services organizations’ bottom line, given the record-high increase in overall...
Python Autocomplete Improvements for Databricks Notebooks
At Databricks, we strive to provide a world-class development experience for data scientists and engineers, and new features are constantly getting added to our notebooks to improve our users’ productivity. We are especially excited about the latest of these features, a new autocomplete experience for Python notebooks (powered by the Jedi library) and new...
ACID Transactions on Data Lakes Tech Talks: Getting Started with Delta Lake
As part of our Data + AI Online Meetup, we’ve explored topics ranging from genomics (with guests from Regeneron) to machine learning pipelines and GPU-accelerated ML to Tableau performance optimization. One key topic area has been an exploration of the Lakehouse. The rise of the Lakehouse architectural pattern is built upon tech innovations enabling the...
Leveraging ESG Data to Operationalize Sustainability
The benefits of Environmental, Social and Governance (ESG) are well understood across the financial services industry. In our previous blog post, we demonstrated how asset managers can leverage data and AI to better optimize their portfolios and identify organizations that not only look good from an ESG perspective, but also do good — companies that...
Reputation Risk: Improving Business Competency and Nurturing Happy Customers by Building a Risk Analysis Engine
Why does reputation risk matter? When it comes to the term "risk management," financial services institutions (FSIs) have long followed guidance and frameworks around capital requirements, such as the Basel standards. But none of these guidelines address reputation risk, and for years organizations have lacked a clear way to manage and measure non-financial risks such as reputation risk. Given...
Announcing Single-Node Clusters on Databricks
Databricks is used by data teams to solve the world's toughest problems. This can involve running large-scale data processing jobs to extract, transform, and analyze data. However, it often also involves data analysis, data science, and machine learning at the scale of a single machine, for instance using libraries like scikit-learn. To streamline these single...