Big Data at the Intersection of Typed FP and Category Theory

By Long Cao

Elevator Pitch

Many features of modern big data frameworks like Apache Spark are closely related to concepts from typed functional programming and even category theory. In this talk, we’ll discuss practical ways to apply these foundational ideas to build better data pipelines at scale.


Big data, functional programming, and category theory aren’t just three trendy topics smashed into a talk title as bait!

Foundational ideas from typed functional programming and category theory have real, practical applications in big data work, and they can be used to write more principled pipelines at scale. Whether it’s aggregating with monoids or writing more type-safe Spark jobs, we’ll try to bridge these topics in a way that is immediately useful. Some knowledge of Scala and a big data framework like Apache Hadoop, Spark, or Beam is suggested but not necessary.
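As a rough illustration of what "aggregating with monoids" can mean, here is a minimal sketch in plain Scala with no Spark dependency. The `Monoid` trait, the `combineAll` helper, and the sample word counts are all hypothetical names and data made up for this example. They are not taken from the talk or from any particular library, although libraries such as Cats and Algebird provide similar abstractions.

```scala
// A minimal Monoid type class: an associative combine plus an identity element.
// (Hypothetical definition for illustration only.)
trait Monoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

object Monoid {
  // Integer addition forms a monoid with identity 0.
  implicit val intSum: Monoid[Int] = new Monoid[Int] {
    val empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }

  // Maps merge key by key whenever their values form a monoid.
  implicit def mapMonoid[K, V](implicit V: Monoid[V]): Monoid[Map[K, V]] =
    new Monoid[Map[K, V]] {
      val empty: Map[K, V] = Map.empty
      def combine(x: Map[K, V], y: Map[K, V]): Map[K, V] =
        y.foldLeft(x) { case (acc, (k, v)) =>
          acc.updated(k, V.combine(acc.getOrElse(k, V.empty), v))
        }
    }
}

object MonoidAggregation {
  // Fold a collection down to a single value using the monoid in scope.
  def combineAll[A](xs: Seq[A])(implicit M: Monoid[A]): A =
    xs.foldLeft(M.empty)(M.combine)

  // Made-up word counts, pre-split into two "partitions".
  val partitions: Seq[Seq[Map[String, Int]]] = Seq(
    Seq(Map("spark" -> 1), Map("scala" -> 1)),
    Seq(Map("spark" -> 1), Map("beam" -> 1))
  )

  // Aggregate each partition independently, then merge the partial results.
  val perPartition: Seq[Map[String, Int]] = partitions.map(p => combineAll(p))
  val total: Map[String, Int] = combineAll(perPartition)
}
```

Because `combine` is associative and `empty` is an identity, aggregating each partition separately and then merging the partial results gives the same answer as folding all the records in one pass. That property is what lets a distributed framework split an aggregation across workers.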


I do this stuff at work with FP/Scala/Spark.

I’d like to give a talk that connects the dots between big data, typed functional programming, and a little bit of category theory, to show attendees that these fields are related and that some of these concepts have been there all along. The talk is intended for advanced beginners through intermediate practitioners, and is especially geared towards data engineers or those who work with Scala (maybe via Spark), as a bridge to typed FP and the like. Some knowledge of Scala and a big data framework is suggested but not required.