Two weeks to Snowflake Summit, Sloth DB, dlthub pro launch #135 w/e 22 May 2026
Join the 8,500-strong data herd getting all you need to know about Data for your Friday roundup
Another mega week in data with dlthub launching a paid offering. We’re also continuing to demonstrate how easy it is to build pipelines, with Gary speedrunning an ELT pipeline in <5 minutes using Orchestra below.
We also had over 150 people come to the webinar the other day on AI Agents building pipelines. Unreal scenes.
Announcement: Hugo Lu is in San Francisco
If you’re heading to OpenHouse or Snowflake Summit, drop Hugo a message; he would love to grab coffee!
Or come meet us at events:
Lightdash demo night on 3 June —> https://luma.com/gl94l89s
Low Key Happy Hour 26 May —> https://luma.com/gq4bcwrl
The Session on 2 June —> https://luma.com/sj56fmk1
Announcement: Orchestra SAO is GA
SAO is now enabled for all accounts. This means you can use it whether you are an enterprise, scale-up, or even a free tier customer.
dbt just got easier.
SAO (State-Aware Orchestration) is an Orchestra-managed Python package built for dbt Core™ that allows users to store and use the state of models. This helps to orchestrate dbt Core™ more intelligently and only run models when certain conditions are met.
With Orchestra SAO, Orchestra users can:
Save cost and speed up runs: the state of dbt models is stored using source_freshness checks, which means models are skipped if there is no new data and no code has changed
Stop wasting time tagging models: adding build_after configurations allows for declarative scheduling. There is no need for manual tagging
Move to real-time: dbt state-aware can run every 10 or 5 minutes, enabling close-to-realtime patterns with leading ELT tools like Estuary
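The skip logic in the list above can be pictured roughly as follows. This is a minimal sketch for intuition only, not Orchestra's actual implementation: `ModelState`, `should_run`, and every field name here are hypothetical, and the `build_after` argument is assumed to behave as a minimum interval between builds.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ModelState:
    """Hypothetical record of what was true the last time a model was built."""
    code_hash: str               # hash of the model's SQL at the last build
    source_loaded_at: datetime   # newest source data seen at the last build
    built_at: datetime           # when the last build finished


def should_run(
    state: Optional[ModelState],
    code_hash: str,
    source_loaded_at: datetime,
    now: datetime,
    build_after: timedelta = timedelta(0),
) -> bool:
    """Return True if the model should be rebuilt on this run."""
    if state is None:
        return True   # never built before, so build it
    if code_hash != state.code_hash:
        return True   # the model's code changed
    if source_loaded_at <= state.source_loaded_at:
        return False  # no new data and no code change: skip
    # New data has arrived; rebuild only once the (assumed) build_after
    # interval has elapsed since the last build.
    return now - state.built_at >= build_after
```

With a stored state and unchanged inputs, `should_run` returns `False` and the model is skipped; a changed code hash forces a rebuild, and fresher source data triggers one once the interval has passed.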
Users of dbt do not need to upgrade to a proprietary engine. Orchestra leverages the Sao Paolo repo. Simply enable Orchestra SAO with one line of code:
tasks:
  execute_dbt:
    integration: DBT_CORE
    integration_job: DBT_CORE_EXECUTE
    parameters:
      commands: dbt build
      package_manager: PIP
      python_version: '3.12'
      use_state_orchestration: true  # <-- this line
.
.
.
Some of the biggest and most advanced data teams are already leveraging State-Aware Orchestration in Orchestra. Experian uses Orchestra State-Aware Orchestration so that dozens of analysts across multiple domains can easily build SQL models powering marketing analytics.
Orchestra Product Features
NEW We now run dbt debug when validating an Orchestra dbt connection, which is pretty great!!
NEW Custom Webhook Alerts (send custom webhook payloads to any monitoring tool e.g. Datadog)
NEW Postgres Sensor enabled (trigger a pipeline whenever something happens in Postgres)
NEW Trigger on any condition (link)
Trigger an Orchestra Pipeline as a task (link)
Scaling sequential Tasks in the MetaEngine - one config, infinite possibilities. Metadata-driven frameworks in ADF are a thing of the past!
Large Task Outputs - you can now pass many megabytes of data between tasks
Force Cancel - you can force cancel any task, pushing it into a cancelled state
Omni Integration
Pipeline run concurrency is live! Read more here
Monitor your Estuary Syncs from Orchestra (read more here)
MCP Video is live. This is the only MCP or API you need to reliably fix your data pipelines automatically with AI.
You can now run Claude agents in Orchestra. These can automate anything, especially things you can design as code submission tasks.
AI Assistant is in Orchestra too - ask our assistant questions without having to leave for the docs!
Released our EKS operator - you can now trigger and monitor flows on Kubernetes in AWS
UI RESKIN! DARK MODE AND LIGHT MODE! TELL YOUR FRIENDS!
Improved Docs Site — CHECK OUT OUR DOCS AND ASK IT QUESTIONS
Workspaces - say goodbye to multiple Airflow instances
Improved Lineage Filtering
Medium 🧠
🧠 Build ML Models Anywhere, Run Inference in Snowflake (link)
🧠 Real-Time Streaming from Amazon Kinesis to Snowflake with Openflow (link)
🧠 The Evolution of Cassandra Data Movement at Netflix (link)
🧠 Hybrid AI: Combining Deterministic Analytics with LLM Reasoning (link)
🧠 Escaping the Valley of Choice in BI | How Agentic BI will kill Data Analysts (link)
🧠 Why Kubernetes Became Inevitable (link)
🧠 Building an AI-Native ELT Pipeline with MotherDuck, Orchestra and Claude (link)
🧠 Which Fields Actually Need to Move? A Field-Level Migration Strategy for Cloud Data Mesh (link)
🧠 The Hidden Bottleneck in Quantum Machine Learning: Getting Data into a Quantum Computer (link)
LinkedIn 🕴
🕴 The Future of Work Belongs to Builders, Not Bureaucrats (link)
🕴 Why Snowflake Cortex Code Is a Massive Productivity Upgrade for Engineers (link)
🕴 Parallel AI Research Is Here: Inside Snowflake Intelligence’s Agent Swarms (link)
🕴 Everything You Need to Know About Dataform in 15 Minutes (link)
🕴 How Data Engineers Use Dataform Hooks to Automate Smarter Pipelines (link)
🕴 Snowflake Summit Buzz: Autonomous & Self-Healing Data Systems Are Here (link)
🕴 200+ Engineers Signed Up to Watch AI Agents Deploy Data Pipelines Live (link)
🕴 From Idea to Production in Minutes: Streamlit + Cortex Code Inside Snowflake (link)
🕴 Why the Best Networking at Snowflake Summit Happens After Hours (link)
News 📰
📰 Snowflake Offers Agencies Discounts for Data Tools Under OneGov Agreement With GSA… Read More
📰 Databricks elevates Adastra to Gold level in its Partner Program… Read More
📰 AI and Data Center Startups Receive Billions in Funding… Read More
📰 dlthub pro launch (link)
📰 v4c raises Series A with backing from Databricks (link)
YouTube and Podcast 🎥
Editor’s Pick
🎥 Most Teams Choose the Wrong Data Architecture in Microsoft Fabric (link)
🎥 Governance made 100x easier with the Fabric Core MCP Server (Installation guide) (link)
🎥 Claude codes build entire ELT Pipeline on Motherduck and Orchestra | Agentic Data Engineering (link)
🎥 Bun Drops Zig, Claude Raises Prices & Is Vibe Coding Killing Open Source? (link)
🎥 Your Obsidian Vault Can Now Run SQL (and Your Agent Can Read It) (link)
🎥 Querying Data in Dremio using OpenCode Coding Agent (link)
🎥 Using the Google Antigravity IDE for Agentic Analytics with Dremio (link)
Special 💫
💫 Data Engineering Weekly #270 (link)
💫 Introducing Dimster, a performance benchmarking tool for Apache Kafka (link)
💫 Benchmarking Apache Kafka Consumer Groups vs Share Groups (overhead test) (link)
💫 Check out SlothDB (link), an experimental SQL engine built on DuckDB
Jobs 💼
💼 Senior Data Engineer (dbt) at DriveTime (link)
💼 Lead Data Engineer at iAdeptive (link)
💼 Power BI Engineering Lead at Entegrata (link)
Run dbt models cheaply and easily with state?
If you’re looking for an easy way to run your dbt Core models, look no further than Orchestra. Orchestra gives you state-aware orchestration out of the box, which can reduce your dbt Core costs and make scheduling much easier!
dbt, dbt Core and dbt Labs are all trademarks of dbt Labs, Inc.

