Over the past decade, the tools, processes, and communication styles of DevOps have improved the ability of software teams to create and deliver meaningful work. In the meantime, data teams have begun struggling with similar problems. I’m here to discuss how to reliably deliver data projects.
Along with the rise of Data Science as the “Sexiest Job Of The 21st Century” came the realization that roughly 80% of a data scientist’s time was being spent collecting, cleaning, and storing the information they needed to do their job. The role of the Data Engineer emerged to move that work onto a separate team, reducing duplicated effort and wasted time. This separation between managing an organization’s data reliably and exploring and interpreting that information has led to the same tensions that exist between developers and technical operations, tensions that we have been working to ease for the past decade.
As practitioners and teams concerned with the engineering and science of data have gained experience and maturity, they have begun re-learning the lessons that the DevOps transformation has been teaching the teams that create and deliver software. Along the way, data engineers have begun building more processes and tools around automation, testing, monitoring, and alerting for the systems that they are responsible for.
There is a lot to be learned in both directions as data becomes increasingly critical to any successful software system and more complex systems are required to manage all of the moving parts. I’m here to discuss the areas where data engineers and operations teams overlap at the technical and social level, the types of tools that can be adopted to improve effectiveness in both directions, and how we can extend the impact of DevOps transformations to more units in the business organization.
By the time you leave you will have a better appreciation of what data engineers do, of the many lessons we can teach them, and of the many lessons they can teach us. This talk will also help data engineers identify their blind spots and learn how to address them.
I am the host of the Data Engineering Podcast and Podcast.__init__, and the manager of technical operations at MIT Open Learning. These roles have allowed me to discuss and understand the challenges facing data engineers, how they interact with data scientists, and how DevOps practices can be incorporated in the data domain.
Previous conference presentations that I have given are:
- DevOps Days Boston 2017 - Open Sourcing Your Infrastructure: https://www.youtube.com/watch?v=VRSQV29VHuA&t=19s
- Open edX Conference 2018 - Openly Deploying Open edX at MIT Open Learning: https://youtu.be/1ickRXh_WPg?t=1h26m27s