Maximilian Böther

Hey, nice to meet you! 👋

I am pursuing a Ph.D. in Computer Science in the Systems Group and the Efficient Architectures and Systems Lab (EASL) at ETH Zurich, supervised by Ana Klimovic and Gustavo Alonso. My interests include data management for machine learning, machine learning pipelines and deployments, and data selection techniques.

I am currently working on Mixtera (GitHub), a lightweight data plane for LLM/VLM training, and on Modyn (GitHub), a platform for training machine learning models on datasets that grow over time. In all projects, we emphasize the importance of data for machine learning and explore how to build systems that support data-centric ML.

I obtained B.Sc. and M.Sc. degrees in IT-Systems Engineering from Hasso Plattner Institute, Potsdam, Germany, in 2020 and 2022. I have published several papers at renowned venues, e.g., SIGMOD, VLDB, MLSys, and ICLR, received a Best Paper Award at GECCO’21, and interned at Google on the CoreML team. Please find my CV here.

news

Mar 19, 2025	I have received the ML and Systems Rising Star Award 2025. Thank you so much!
Mar 4, 2025	I presented Mixtera and Modyn at BTW’25. Thank you for the great discussions!
Mar 1, 2025	We just released a preprint on Mixtera, our data plane for foundation model training. If you are training LLMs or VLMs, and are looking for infrastructure for data loading and mixing, please feel free to reach out!
Feb 10, 2025	Our paper on distributed submodular subset selection, a result of my Google internship, has been accepted to MLSys’25. See you soon in Santa Clara!
Oct 31, 2024	Our paper on Modyn has been accepted to SIGMOD’25 in Berlin!
Oct 4, 2024	We organized the Systems for Cost-Efficient AI Track at the AI+X Summit in Zurich.
Sep 30, 2024	Our vision paper on Mixtera, our lightweight data lake for LLM training, has been accepted to HotInfra’24 at SOSP. See you in Austin, TX!
Aug 2, 2024	Happy to have attended the Dagstuhl Seminar 24311: Resource-Efficient Machine Learning.
Jun 17, 2024	I will talk about Modyn at the Data-centric Machine Learning (DML) workshop at ICLR’24. See you in Vienna!
Feb 26, 2024	We just released a preprint of our paper on scaling out practical subset selection using submodular functions. This paper is a result of my internship at Google.
Jun 17, 2023	Our paper on analyzing vectorized hash tables across CPU architectures just got accepted at VLDB’23 in Vancouver!
Jun 5, 2023	I joined Google for a summer research internship in Sunnyvale, California, USA! I am working on scaling out submodular data subset selection.
Apr 11, 2023	Our work-in-progress workshop paper on Modyn, our research platform for model training on dynamic datasets, has been accepted at EuroMLSys’23 in Rome!
Nov 1, 2022	I joined the ETH Zurich Systems Group and the Efficient Architectures and Systems Lab (EASL) to do a Ph.D. in Machine Learning Systems, supervised by Professor Ana Klimovic. Looking forward to the new adventures in Switzerland!
Oct 15, 2022	Our paper on efficiently computing directed minimum spanning trees (arborescences) has been accepted for publication at ALENEX 2023. Check out the final version here.
Jun 6, 2022	Our Law Smells paper, which applies concepts of software engineering to the law, has been published in AI&Law. Check out the final version here.
Jan 24, 2022	Our paper on deep learning for combinatorial optimization just got accepted at ICLR! Check out the final version here.
Dec 27, 2021	This website just went online!

selected publications

arXiv

Mixtera: A Data Plane for Foundation Model Training

Böther, Maximilian, Yao, Xiaozhe, Kerimoglu, Tolga and 3 more authors

2025

@article{Bother2025Mixtera,
  author = {B{\"{o}}ther, Maximilian and Yao, Xiaozhe and Kerimoglu, Tolga and Graur, Dan and Gsteiger, Viktor and Klimovic, Ana},
  title = {Mixtera: A Data Plane for Foundation Model Training},
  year = {2025},
  eprinttype = {arXiv},
  eprint = {2502.19790},
  preprint = {true}
}

MLSys

On Distributed Larger-Than-Memory Subset Selection With Pairwise Submodular Functions

Böther, Maximilian, Sebastian, Abraham, Awasthi, Pranjal and 2 more authors

In Proceedings of the Conference on Machine Learning and Systems (MLSys) 2025

@inproceedings{Bother2025Submod,
  author = {B{\"{o}}ther, Maximilian and Sebastian, Abraham and Awasthi, Pranjal and Klimovic, Ana and Ramalingam, Srikumar},
  title = {On Distributed Larger-Than-Memory Subset Selection With Pairwise Submodular Functions},
  booktitle = {Proceedings of the Conference on Machine Learning and Systems ({MLSys})},
  year = {2025},
}

SIGMOD

Modyn: Data-Centric Machine Learning Pipeline Orchestration

Böther, Maximilian, Robroek, Ties, Gsteiger, Viktor and 3 more authors

In Proceedings of the Conference on Management of Data (SIGMOD) 2025

@inproceedings{Bother2025Modyn,
  author = {B{\"{o}}ther, Maximilian and Robroek, Ties and Gsteiger, Viktor and Ma, Xianzhe and T{\"{o}}z{\"{u}}n, P{\i}nar and Klimovic, Ana},
  title = {Modyn: Data-Centric Machine Learning Pipeline Orchestration},
  booktitle = {Proceedings of the Conference on Management of Data ({SIGMOD})},
  year = {2025},
  doi = {10.1145/3709705},
  url = {https://dl.acm.org/doi/10.1145/3709705},
}

VLDB

Analyzing Vectorized Hash Tables Across CPU Architectures

Böther, Maximilian, Benson, Lawrence, Klimovic, Ana and 1 more author

Proceedings of the VLDB Endowment 2023

@article{Bother2023Hashmaps,
  author = {B{\"{o}}ther, Maximilian and Benson, Lawrence and Klimovic, Ana and Rabl, Tilmann},
  title = {Analyzing Vectorized Hash Tables Across CPU Architectures},
  journal = {Proceedings of the {VLDB} Endowment},
  pages = {2755--2768},
  number = {11},
  volume = {16},
  year = {2023},
  doi = {10.14778/3611479.3611485},
  url = {https://dl.acm.org/doi/10.14778/3611479.3611485},
}