9:15 – 9:45
Invited Talk
Pedro Pedreira
Software Engineer at Meta
Scaling AI Training: Storage Layout Challenges
Abstract. Modern AI training pipelines demand efficient data loading and preparation, and the storage layout layer plays an important role in the performance of both the read and write paths. This talk explores the challenges of designing and evolving storage formats optimized for training workloads, covering feature storage and processing, the implications of wide tables, and the challenges of normalization. It also presents Nimble, Meta's columnar file format, along with some of its main design decisions and features, and discusses future challenges and areas of exploration.
About the speaker. Pedro Pedreira is a Software Engineer at Meta, where he has spent over 13 years working on large-scale compute, storage, and query processing systems. Pedro leads Velox, a cross-organizational effort involving 20+ companies that aims to unify execution engines through an open-source library, along with a variety of related efforts to modernize compute engines, more recently with a focus on AI training. Prior to Velox, Pedro led the creation of Cubrick and has contributed to a series of open-source data infrastructure projects, including Presto, Spark, Gluten, Nimble, Arrow, and others. His work focuses on building high-performance, composable execution libraries that serve as the foundation for analytical and AI workloads at scale.