Building ASEAN’s Voice in AI: A Regional LLM for a Shared Future
Date: 2 April 2025
Category: Opinions
Topics: Technology
By Satria Mahesya Muhammad, Assistant, Research Activities
OpenAI’s launch of ChatGPT in 2022 ignited global interest in artificial intelligence (AI), driving rapid advancements in the development of Large Language Models (LLMs). Both private companies and governments are now investing in LLMs to boost national competitiveness, ensure data sovereignty, and reduce reliance on foreign AI systems.
In Southeast Asia, AI adoption is gaining momentum. The region’s AI market is expected to grow at a compound annual growth rate (CAGR) of 27.71% between 2025 and 2030, reaching a projected market volume of US$30.3 billion. Despite this growth, a truly collaborative LLM that reflects Southeast Asia’s rich linguistic and cultural diversity has yet to be realised. AI Singapore’s Southeast Asian Languages in One Network (SEA-LION) represents a significant step towards filling this gap.
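The growth figures above can be sanity-checked with the standard compound-growth formula. The sketch below is illustrative only: it assumes five annual compounding periods between 2025 and 2030, and the function names are hypothetical. Under that assumption, the cited rate and 2030 volume imply a 2025 starting market of roughly US$8.9 billion, a derived figure rather than one stated in the source.

```python
def implied_base(final_value: float, cagr: float, periods: int) -> float:
    """Back out the starting value implied by a final value and a CAGR."""
    return final_value / (1.0 + cagr) ** periods


def project(base: float, cagr: float, periods: int) -> float:
    """Compound a starting value forward at a constant annual rate."""
    return base * (1.0 + cagr) ** periods


FINAL_2030_BN = 30.3  # projected market volume in US$ billion (from the article)
CAGR = 0.2771         # 27.71% compound annual growth rate (from the article)
PERIODS = 5           # ASSUMPTION: five annual steps from 2025 to 2030

base_2025 = implied_base(FINAL_2030_BN, CAGR, PERIODS)
print(f"Implied 2025 market size: US${base_2025:.2f} billion")
print(f"Growth multiple over the period: {(1.0 + CAGR) ** PERIODS:.2f}x")
```

At this rate the market would more than triple over the period, which gives a sense of how quickly regional demand for localised models could grow.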
While SEA-LION benefits from open-source contributions, its funding and technical development are primarily centralised in Singapore. The ERIA One ASEAN Start-up White Paper 2024 highlights the country’s substantial investment in developing localised LLMs, reportedly totalling S$70 million. In contrast, European multilingual LLMs are typically developed through a collective approach involving universities, private-sector partnerships, and funding from the European Union (EU). A truly regional multilingual LLM should be the product of region-wide collaboration – an LLM for ASEAN, by ASEAN – ensuring inclusive participation from all member states while reflecting shared linguistic and cultural diversity.
The Need for a Regional LLM
LLMs are advanced AI systems designed to understand and generate human-like responses. Built on neural network architectures, particularly transformers, they are trained using deep learning techniques on vast datasets that include publicly available websites, books, and code. This enables LLMs to understand natural language, recognise patterns, and generate coherent responses from simple sentences to more complex multimodal outputs, such as images, sound, and videos.
Many popular LLMs – such as OpenAI’s ChatGPT and Meta’s Llama – are primarily trained on internet data, where English accounts for 49.2% of the content. This linguistic imbalance disadvantages regions whose languages and cultures are underrepresented, such as ASEAN, home to 700 million people and 1,200 regional languages. Research has shown that ChatGPT, for instance, struggles with low-resource Southeast Asian languages, resulting in translation inaccuracies and difficulties distinguishing nuances between similar languages like Bahasa Indonesia and Bahasa Melayu. Moreover, these models often fail to generate accurate code-mixed text, which is common in countries like Indonesia, where English, Bahasa Indonesia, and local dialects are frequently blended in informal conversations.
The influence of language data on AI biases is also a pressing concern. A recent AI Safety report by Singapore’s Infocomm Media Development Authority (IMDA) notes that most AI applications are developed primarily for English speakers, despite being deployed globally. As a result, AI systems may struggle to accurately reflect the linguistic and contextual diversity of non-English-speaking regions. Of the 5,313 AI-generated responses analysed in the report, nearly half of those in English exhibited bias. Bias was even more prevalent in regional languages, with 2 out of 3 responses showing bias.
ASEAN's Path to AI Self-Reliance
To address these challenges, both multilingual and monolingual LLMs have been developed in ASEAN to better capture the region’s linguistic and cultural diversity. Notably, the multilingual SEA-LION model has been trained on 980 billion tokens, covering multiple Southeast Asian languages including English. Meanwhile, monolingual models such as Viet Nam’s PhoGPT and Indonesia’s Sahabat-AI were trained on 102 billion and 50 billion tokens, respectively. These models have been trained and fine-tuned on regional and local datasets, incorporating local linguistic nuances, slang, and cultural references that mainstream LLMs often overlook. While these LLMs are a significant step in localising AI technologies, the development remains fragmented due to overlapping data regulations, varying levels of AI preparedness amongst member states, and the absence of a unified regional AI strategy.
For ASEAN to create a more inclusive AI development landscape, a more collective approach is essential.
ASEAN AI Research Hub and Language Repository
While AI Singapore (AISG) has laid the groundwork for multilingual Southeast Asian LLMs, a broader ASEAN-driven effort is needed to expand data resources and develop truly representative LLMs. Member states could form a consortium, pooling human capital and financial resources to establish a shared AI research hub that facilitates the creation of high-quality multilingual data and cross-border collaboration.
A key component of this initiative would be the creation of an ASEAN Language Repository – a centralised, open-access platform where member states can contribute structured linguistic, domain-specific, and multimodal datasets, both to support LLM development and to preserve their languages. The research hub would also serve as a knowledge-sharing platform, fostering collaboration between governments, academia, and local communities.
Fostering Public–Private Partnerships Beyond ASEAN
Since much of the advanced AI expertise and technology originates from the private sector in developed economies, ASEAN must cultivate strong public–private partnerships. While collaborations with prominent American companies like Nvidia, Microsoft, Google, and Amazon remain valuable, ASEAN should also engage with leading Asian IT firms such as Alibaba, Baidu, NEC, SoftBank, and Naver. Strengthening ties with countries like the United States, China, the Republic of Korea, and Japan – while integrating further with Asia’s startup ecosystem – will allow ASEAN to tap into diverse technological expertise, innovation, and substantial investments.
LLM development will continue to be a cornerstone of the region’s AI future. To protect its digital sovereignty and preserve its linguistic and cultural heritage, ASEAN must adopt a collective and collaborative approach to AI development. By pooling resources through a regional AI research hub and strengthening public–private partnerships with global tech leaders, ASEAN can build inclusive, high-quality AI products that accurately reflect its diverse languages and cultures. This unified effort will not only enhance technological self-reliance but also drive digital inclusion, ensuring that all member states equitably benefit from AI advancements in a sustainable manner.
This opinion piece was written by Satria Mahesya Muhammad, Assistant, Research Activities, ERIA, and has been published in Borneo Bulletin, The Manila Times, and The Jakarta Post.
Disclaimer: The views expressed are purely those of the authors and may not in any circumstances be regarded as stating an official position of the Economic Research Institute for ASEAN and East Asia.