Thousands of data engineers, scientists, analysts, and leaders joined this year’s Data + AI Summit 2023 by Databricks to learn more about data, analytics, and AI. They also got to hear about the latest open-source technologies, real-world case studies, and best practices that deliver mission-critical data, analytics, and AI inside transformative organizations.
Our Principal Consultant, Martin Dideriksen, joined the virtual experience and has gathered some of his favorite insights you can read about in this article:
“I streamed every possible session from the Data + AI Summit, where I immersed myself in many enlightening sessions.
I am truly impressed with the innovations unveiled by both vendors.”
Let’s start with Databricks. The introduction of Delta Lake 3.0 with UniForm opens up a world of possibilities. It lets us keep working with the Delta format while supporting tools that rely on the Iceberg format, so both can read the same underlying files.
But that’s not all. Databricks has made huge advancements in Workflows. Adding serverless computing and enhanced control flow for jobs empowers us to create fully parameterized, dynamically executed, and modular DAGs for heightened efficiency.
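To make the idea of a parameterized, modular DAG concrete, here is a minimal sketch in plain Python. It does not use the Databricks Workflows or Jobs API; the task names, the `run_dag` helper, and the `rows` and `factor` parameters are all hypothetical and only illustrate how tasks, dependencies, and run-time parameters fit together.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, dependencies, params):
    """Run tasks in dependency order and return each task's result.

    tasks:        task name -> callable(params, results)
    dependencies: task name -> set of upstream task names
    params:       run-time parameters shared by all tasks
    """
    results = {}
    # static_order() yields every task after all of its upstream tasks.
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = tasks[name](params, results)
    return results


# Hypothetical three-step pipeline; each task is a small, reusable module.
tasks = {
    "extract": lambda p, r: list(range(p["rows"])),
    "transform": lambda p, r: [x * p["factor"] for x in r["extract"]],
    "load": lambda p, r: sum(r["transform"]),
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

if __name__ == "__main__":
    # The same DAG can be re-run with different parameters.
    print(run_dag(tasks, dependencies, {"rows": 4, "factor": 10})["load"])
```

Because the parameters are passed in at run time, the same DAG definition can serve several jobs without being rewritten.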
And there’s exciting news in the Databricks Marketplace as well. Upcoming features include AI models in the Marketplace and the introduction of a private exchange.
For a long time, Snowflake has been championing Data Cleanrooms, and now Databricks has embraced the concept too. With Cleanrooms, participants can securely share and join existing data and run complex workloads in any language while maintaining data privacy. With Delta Sharing, there is no need to replicate the data. Cleanrooms on Databricks scale to multiple participants across clouds and regions.
On data sharing, Databricks now supports sharing notebooks, so recipients can consume and clone them easily. View sharing exposes logical views without replicating the data, which helps with data curation and access control. Schemas can also be shared as a whole, so there is no need to share individual objects, and future objects in the schema are shared automatically.
Lakehouse Federation is a game-changer, especially for larger companies. With Lakehouse Federation in Unity Catalog, users can seamlessly discover, query, and govern data from various sources such as BigQuery, MySQL, Postgres, Redshift, Snowflake, SQL Server, Synapse, and more. This unified and simplified experience significantly speeds up ad-hoc analysis and prototyping for data, analytics, and AI use cases.
Materialized Views for Data Warehouses are another major leap forward. Materialized views reduce costs and improve query latency by pre-computing slow queries and frequently used computations.
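The pre-computation idea can be sketched with Python’s built-in `sqlite3` module. This is an illustration of the general technique, not Databricks SQL syntax: SQLite has no native materialized views, so the sketch stores the aggregate in an ordinary table, and the `sales` table, the `sales_by_region` table, and the helper functions are all hypothetical.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10.0), ("north", 5.0), ("south", 7.5)],
    )
    return conn


def refresh_sales_by_region(conn):
    """Recompute the aggregate once and store it, like refreshing a materialized view."""
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )
    conn.commit()


def total_for(conn, region):
    """Read the stored result instead of re-aggregating the base table."""
    row = conn.execute(
        "SELECT total FROM sales_by_region WHERE region = ?", (region,)
    ).fetchone()
    return row[0] if row else 0.0


if __name__ == "__main__":
    conn = build_demo()
    refresh_sales_by_region(conn)
    print(total_for(conn, "north"))
```

The trade-off is freshness: rows added to `sales` after a refresh do not appear in the stored result until the next refresh.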
I am thrilled about the extensive security and governance announcements as well. Enhanced access control for rows and columns, along with “Tags and Classification,” provides valuable context about the data. This helps users jumpstart their work and accelerate analytics and AI initiatives. Data assets can easily be described and tagged, which improves understanding, gives insight into asset popularity, helps identify domain experts, and facilitates data enrichment.
“There are numerous other remarkable announcements I haven’t covered here. Still, I highly recommend watching the keynotes from the Databricks Data + AI Summit for an in-depth introduction to some of the major highlights.”
Martin Kjær Dideriksen, Principal Consultant at Intellishore