OLMo: The Truly Open Source Large Language Model

OLMo (Open Language Model) is a large language model open-sourced by the Allen Institute for AI (AI2). Unlike most open releases, it offers complete openness, sharing not just model weights but also the training data, training code, and evaluation tools. Its pretraining foundation is the Dolma dataset, a corpus of three trillion tokens. With these artifacts, researchers can replicate the training process end to end or train their own language models. AI2 plans to continue its open-source initiatives beyond OLMo, setting a precedent for future releases and marking a significant step toward full transparency in the field of large language models.