Check out ÜberWeb, where we present our insights on curating multilingual data at the 20-trillion-token scale.