I am very proud to be one of the top contributors* to Aya, the world’s largest open-source multilingual dataset collection** and a SOTA, multilingual-first LLM (Cohere For AI).

🙏 Special thanks to İrem Ergün for inspiring us at Bilkent Yapay Zeka Topluluğu with the Aya project. My deepest appreciation also goes to project head Sara Hooker and to Ahmet Üstün, Shivalika Singh, Sarah J., and Madeline S., along with all other team members and the 3,000+ contributors for their outstanding efforts.

🤩 It’s incredible to be part of a project with over 3,000 contributors worldwide, all thanks to the Cohere For AI team and @’s leadership. Truly inspiring!

(*) #1 contributor in Turkish and #16 globally out of 3,000+ contributors to the Aya dataset.

⭐ The Aya project comprises a SOTA open-source multilingual model and the world’s largest multilingual dataset collection.

📑 Aya is intentionally multilingual-first – Most generative AI models are designed to be English-first, and many global communities have been left unsupported due to the language limitations of existing models.

🤖 Large Language Model: The Aya model is state-of-the-art and outperforms existing massively multilingual open-source models – The Aya model covers 101 languages, about twice (2x) the coverage of open-source baselines (mT0, BLOOMZ). It outperforms the best multilingual models in benchmark tests while extending language coverage. (Licence: Apache 2.0)

(**) 🗃️ Dataset: In addition to the Aya model, we are releasing the Aya collection – the largest multilingual instruction fine-tuning dataset collection to date, with 513 million prompts and completions covering 114 languages. We are fully open-sourcing the collection, which includes rare human-curated annotations from fluent speakers worldwide.

🚀 Largest participatory research initiative to date, changing how breakthroughs happen – The research involved 3,000 independent collaborators across 119 countries, making this the largest open science project to date in the field of machine learning.

Aya will be made available to the research community to further advance this critical effort. The work provides AI researchers with a first-of-its-kind foundation to support multilingual open-source AI research projects.