# Large Multi-Language Models for News Translation
* This repo contains examples of __how to fine-tune Large Language Models__ (LLMs) and apply them to the real-world task of __news translation__.
* It also provides a __news parser__, so you can parse any news web page you want (for example, CNN or BBC News) and test how a pre-trained LLM __translates real parsed news__.
# __1. Facebook: M2M100__
__Facebook: M2M100 (1.2B parameters)__ is a multilingual encoder-decoder (seq-to-seq) model primarily intended for translation tasks, covering 100 languages.
__All available languages:__ Afrikaans, Amharic, Arabic, Asturian, Azerbaijani, Bashkir, Belarusian, Bulgarian, Bengali, Breton, Bosnian, Catalan; Valencian, Cebuano, Czech, Welsh, Danish, German, Greek, English, Spanish, Estonian, Persian, Fulah, Finnish, French, Western Frisian, Irish, Gaelic; Scottish Gaelic, Galician, Gujarati, Hausa, Hebrew, Hindi, Croatian, Haitian; Haitian Creole, Hungarian, Armenian, Indonesian, Igbo, Iloko, Icelandic, Italian, Japanese, Javanese, Georgian, Kazakh, Central Khmer, Kannada, Korean, Luxembourgish; Letzeburgesch, Ganda, Lingala, Lao, Lithuanian, Latvian, Malagasy, Macedonian, Malayalam, Mongolian, Marathi, Malay, Burmese, Nepali, Dutch; Flemish, Norwegian, Northern Sotho, Occitan (post 1500), Oriya, Panjabi; Punjabi, Polish, Pushto; Pashto, Portuguese, Romanian; Moldavian; Moldovan, Russian, Sindhi, Sinhala; Sinhalese, Slovak, Slovenian, Somali, Albanian, Serbian, Swati, Sundanese, Swedish, Swahili, Tamil, Thai, Tagalog, Tswana, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Wolof, Xhosa, Yiddish, Yoruba, Chinese, Zulu
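
As a minimal sketch of how M2M100 could be applied to a single news sentence, the snippet below uses the Hugging Face `transformers` library. The checkpoint name `facebook/m2m100_1.2B`, the helper names `lang_code` and `translate`, and the small `M2M100_CODES` table are assumptions made for illustration and are not taken from this repo's code.

```python
# Illustrative subset of M2M100 language codes; extend it for other languages.
M2M100_CODES = {
    "English": "en",
    "French": "fr",
    "German": "de",
    "Spanish": "es",
    "Russian": "ru",
    "Chinese": "zh",
}


def lang_code(name):
    """Map a language name from the list above to its M2M100 code."""
    try:
        return M2M100_CODES[name]
    except KeyError:
        raise ValueError(f"No code registered for language: {name!r}")


def translate(text, src, tgt, model_name="facebook/m2m100_1.2B"):
    """Translate `text` from `src` to `tgt` (language names, e.g. "English")."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)

    # The tokenizer needs to know the source language before encoding.
    tokenizer.src_lang = lang_code(src)
    encoded = tokenizer(text, return_tensors="pt")

    # The target language is selected by forcing its token as the first output token.
    generated = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.get_lang_id(lang_code(tgt)),
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

Calling `translate("The government announced new measures today.", "English", "French")` downloads the checkpoint on first use, which takes several gigabytes of disk space; a smaller checkpoint may be more practical for quick experiments.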
# __2. Google: mT5__
__Google: mT5 (1.2B parameters)__ is a multilingual encoder-decoder model pretrained on the mC4 corpus, covering 101 languages.
__All available languages:__ Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu.
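
Because mT5 is only pretrained, it has to be fine-tuned before it can translate. The sketch below shows one way a single training loss could be computed with Hugging Face `transformers`. The checkpoint name `google/mt5-large`, the `translate X to Y:` prompt format, and the helper names `build_example` and `training_loss` are assumptions for illustration, not this repo's actual fine-tuning code.

```python
def build_example(text, src, tgt):
    """Prefix the source text with a task description (an assumed convention)."""
    return f"translate {src} to {tgt}: {text}"


def training_loss(pairs, src, tgt, model_name="google/mt5-large"):
    """Compute the seq-to-seq loss for a list of (source, target) sentence pairs."""
    # Imported lazily so build_example works without transformers installed.
    from transformers import AutoTokenizer, MT5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = MT5ForConditionalGeneration.from_pretrained(model_name)

    inputs = [build_example(source, src, tgt) for source, _ in pairs]
    targets = [target for _, target in pairs]

    # text_target tokenizes the references and stores them under "labels".
    batch = tokenizer(
        inputs,
        text_target=targets,
        padding=True,
        truncation=True,
        max_length=128,
        return_tensors="pt",
    )

    # Padding positions in the labels are set to -100 so the loss ignores them.
    batch["labels"][batch["labels"] == tokenizer.pad_token_id] = -100

    return model(**batch).loss
```

In a real training loop the returned loss would be backpropagated with an optimizer over many batches of parallel news sentences; this sketch stops at the loss to keep the example short.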