AWS' Iceberg announcement 1 week on, Pyramid raises $50m, Orchestra RBAC 💪 Roundup #61 w/e 06 Dec 2024
Join the 4,700-strong data herd getting everything you need to know about data in your Friday roundup
If you found this update useful, don’t forget to subscribe! If not, let us know how we can improve – we’re all about keeping it relevant and avoiding the generic.
This was a very big week: Pyramid Analytics raised $50m and AWS announced Iceberg support for S3, prompting a huge range of discussion across the data community. Check out more below.
Or read on Medium https://medium.com/@hugolu87
Orchestra Product Updates - Granular Role-based access control and Azure SSO
It has never been easier for enterprises that live in Azure to try Orchestra
Role-based access control (!!!!!!!!)
One consistent thing we hear from data teams is that they struggle to manage the vast number of data pipelines the business demands from them.
At scale, interfaces like Airflow's that lack granular role-based access control become intimidating, and no one can find what they need to see.
Orchestra now supports USERS, GROUPS, and ROLES across PRODUCTS, PIPELINES, INTEGRATIONS and a few other things.
This means you can minimise the number of things people see, which makes maintaining everything a lot more manageable.
It means you can set up DOMAIN or MAINTAINER roles that just look after “their little bit” of the pipeline.
It means you can MONITOR everything from one place while ensuring people can’t access stuff they’re not meant to. All to give YOU back time to focus on building instead of worrying about who has access to what.
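To make the users, groups and roles idea concrete, here is a minimal sketch of how such a model could resolve what one person is allowed to see. It is not Orchestra's API: every class, function and resource name below is hypothetical and exists only to illustrate the concept.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    kind: str  # e.g. "product", "pipeline" or "integration"
    name: str


@dataclass
class Role:
    name: str
    resources: set = field(default_factory=set)  # resources this role grants


@dataclass
class Group:
    name: str
    roles: list = field(default_factory=list)


@dataclass
class User:
    name: str
    groups: list = field(default_factory=list)


def visible_resources(user):
    """Union of every resource granted by the roles of the user's groups."""
    seen = set()
    for group in user.groups:
        for role in group.roles:
            seen |= role.resources
    return seen


def can_access(user, resource):
    return resource in visible_resources(user)


# Hypothetical setup: a domain maintainer sees only "their little bit".
finance_pipeline = Resource("pipeline", "finance_daily")
marketing_pipeline = Resource("pipeline", "marketing_hourly")
warehouse = Resource("integration", "warehouse")

finance_maintainer = Role("finance_maintainer", {finance_pipeline, warehouse})
admin = Role("admin", {finance_pipeline, marketing_pipeline, warehouse})

alice = User("alice", [Group("finance", [finance_maintainer])])
bob = User("bob", [Group("platform", [admin])])
```

In this sketch `alice` only ever sees the finance pipeline and the warehouse integration, while `bob` can monitor everything from one place.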
Winter Data Conference
Excited to share that anyone using our special code HUGO50 can get a 50% discount on the Winter Data Conference in Zell am See - check it out here.
Meme Drop
When you forget to do your job and instead spend your quarter maintaining boilerplate infra
Medium 🧠
🧠 The Future of Data Pipelines: Trends in Scalable Data Architecture (link)
🧠 Let’s never use the phrase Data Observability Ever Again (link)
🧠 Bridging the Data Literacy Gap (link)
🧠Content Drive — How we organize and share billions of files in Netflix studio (link)
🧠 Catching an Edge — December (link)
🧠 dbt Cloud vs. Orchestra (link)
LinkedIn🕴
🕴 Need to extract data from documents (PDF, JPG, PNG, etc.)? (link)
🕴 Data Day Texas is coming up. Get 20% off your MF’ing tickets using the discount code MFJOEREIS (link)
News 📰
**Editor's Pick**
📰 Largest Mergers and Acquisitions (M&A) Deals Data (link)
📰 Intel’s CEO steps down (link)
📰 Pyramid raises $50m from BlackRock (link)
**Editor's Pick**
S3 announcements and hot takes
So I really enjoyed Daniel Beach’s take (link)
Official announcement from AWS (link)
Roy did some great hands-on examples (link)
YouTube and Podcast 🎥
**Editor's Pick**
🎥 Whiteboard Overview - Views, Materialized Views and Dremio's Reflections (link)
🎥 An introduction to symmetry in TLA+ (link)
🎥 Prevent Incomplete data in your datasets using dbt
🎥 How to leverage dlt and Orchestra to reduce ELT costs by 90% (link)
Special 💫
💫 Someone made a game about data engineering, and it’s actually quite cool (link)
💫 Staff vs engineering manager (link)
💫 Follow Data Jesus on Bluesky: https://bsky.app/profile/datajesus.bsky.social
Jobs 💼
💼 Analytics Engineer at Loop returns (link)
💼 Nestle are hiring for Data Engineers (link)
💼 Senior Product Analyst at ProductBoard
💼 Very exciting data engineering role at Landmarc (recommended) (link)
💼 Interested in Data Engineering for one of the best charitable orgs in the UK? Follow Enthuse or get in touch (link)
💼 Interested in building the future of Data in VC at Dawn? Get in touch to learn more about this one.
💼 Some great data roles around platform and architecture at Lundbeck Pharma (link)
Want to get visibility into dbt Core™️? You’ll love this
💡 Read more about the Orchestra dbt Core™️ integration here
dbt Core obviously needs to run in an orchestrator - if you’re not doing this already, what are you doing? Many data teams are realising that 99% uptime isn’t actually enough to get stakeholders to trust the data in BI use cases. Uptime needs to be much higher, which is why you need orchestration and visibility of pipelines.
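A rough back-of-the-envelope sketch shows why 99% can feel low. It rests on assumptions that are ours, not measured figures: "uptime" is read as the share of daily runs that succeed, each pipeline runs once a day, and failures are independent. The function names are made up for illustration.

```python
def expected_bad_days(success_rate, days=365):
    """Expected number of days per year on which a daily pipeline fails."""
    return days * (1 - success_rate)


def all_succeed_probability(success_rate, n_pipelines):
    """Chance that every one of n independent pipelines succeeds on a given day."""
    return success_rate ** n_pipelines


# One daily pipeline at 99%: roughly 3.65 bad days a year.
bad_days = expected_bad_days(0.99)

# A dashboard fed by 10 such pipelines: all ten succeed only about 90% of days.
all_ok = all_succeed_probability(0.99, 10)
```

Under these assumptions, a stakeholder looking at a dashboard fed by ten pipelines would see something stale or broken on roughly one day in ten, which is where trust starts to erode.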
Orchestra supports running dbt, with some great features:
Enhanced debugging! Identify dbt model/test cost bottlenecks easily
Simplification! One less platform to manage; let Orchestra be your dbt™️ HQ
Price! Simple and lightweight usage-based pricing where the unit costs decrease as your models increase
Worth talking? Chat here.