Power to the SQL People: Introducing Python UDFs in Databricks SQL
July 22, 2022 by Martin Grund, Herman van Hövell, Stefania Leone and Jakob Mund in Platform Blog
We were thrilled to announce the preview for Python User-Defined Functions (UDFs) in Databricks SQL (DBSQL) at last month's Data and AI Summit...

Parallel ML: How Compass Built a Framework for Training Many Machine Learning Models on Databricks
July 20, 2022 by Marshall Carter and Sujoy Dutta in Engineering Blog
This is a collaborative post from Databricks and Compass. We thank Sujoy Dutta, Senior Machine Learning Engineer at Compass, for his contributions...

Databricks SQL Highlights From Data & AI Summit
July 20, 2022 by Shant Hovsepian, Miranda Luna, Cyrielle Simeone and Alex Lichen in Platform Blog
Data warehouses are not keeping up with today's world: the explosion of languages other than SQL, unstructured data, machine learning, IoT and streaming...

Hunting for IOCs Without Knowing Table Names or Field Labels
July 15, 2022 by Monzy Merza and Lipyeow Lim in Security and Trust
There is a breach! You are an infosec incident responder and you get called in to investigate. You show up and start asking...

Using Spark Structured Streaming to Scale Your Analytics
July 14, 2022 by Spencer Elkington and Ben Tallman in Data Streaming
This is a guest post from the M Science Data Science & Engineering Team.
Modern data doesn't stop growing
"Engineers are taught by...

Introducing Spark Connect - The Power of Apache Spark, Everywhere
July 7, 2022 by Stefania Leone, Martin Grund, Herman van Hövell and Reynold Xin in Engineering Blog
At last week's Data and AI Summit, we highlighted a new project called Spark Connect in the opening keynote. This blog post walks...

Designing a Java Connector for Delta Sharing Recipient
June 29, 2022 by Milos Colic and Vuong Nguyen in Engineering Blog
Making an open data marketplace
Stepping into this brave new digital world, we are certain that data will be a central product for...

Introducing MLflow Pipelines with MLflow 2.0
June 29, 2022 by Ahmed Bilal, Jin Zhang, Corey Zumar and Xiangrui Meng in Engineering Blog
Since we launched MLflow in 2018, MLflow has become the most popular MLOps framework, with over 11M monthly downloads! Today, teams of all...

Connect From Anywhere to Databricks SQL
June 29, 2022 by Reynold Xin, Shant Hovsepian, Bilal Aslam, Tao Tao, Arik Fraimovich, Moe Derakhshani and Cyrielle Simeone in Engineering Blog
Today we are thrilled to announce a full lineup of open source connectors for Go, Node.js, Python, as well as...

Project Lightspeed: Faster and Simpler Stream Processing With Apache Spark
June 28, 2022 by Karthik Ramasamy, Matei Zaharia, Reynold Xin, Michael Armbrust, Awez Syed, Ray Zhu, Alexander Balikov, Jerry Peng, Shrikanth Shankar and Sameer Paranjpye in Engineering Blog
Streaming data is a critical area of computing today. It is the basis for making quick decisions on the enormous amounts of incoming...