Study Finds Korean AI Models Trail Overseas Competitors by Wide Margin | Be Korea-savvy

Domestic AI Models Score Far Lower Than U.S. and Global Peers in Key Benchmark (Yonhap)

SEOUL, Dec. 15 (Korea Bizwire) — South Korean artificial intelligence teams competing to build a “national AI” model are lagging far behind global competitors in solving advanced mathematics problems, according to a new study released on Monday.

A research team led by Kim Jong-rak, a mathematics professor at Sogang University, evaluated five domestic large language models (LLMs) against five leading foreign systems — including GPT-5.1, Gemini 3 Pro Preview, Claude Opus 4.5, Grok 4.1 Fast and DeepSeek V3.2 — using 50 questions drawn from South Korea’s college entrance exam, university essay tests and graduate-level mathematics problems in Korea, India and Japan.

Foreign models scored between 76 and 92 points. Only one Korean model, Upstage’s Solar Pro-2, surpassed 50 points, earning 58. The remaining domestic systems — including LG’s Exaone 4.0.1, Naver’s HCX-007, SK Telecom’s A.X 4.0 (72B) and NCSoft’s lightweight Llama Barco 8B Instruct — posted scores in the 20-point range or lower. Llama Barco produced the weakest performance, scoring just 2 points.

The researchers allowed all domestic models to use Python-based tools to boost accuracy, after finding that most struggled to carry out the reasoning steps required to solve the mathematics questions. Even with this support, the gap remained wide.

A second test using the team’s newly developed EntropyMath dataset — designed to evaluate reasoning ability from undergraduate to faculty-level difficulty — produced similar results. Foreign models scored between 82.8 and 90 points, while Korean models ranged from 7.1 to 53.3.

When models were given up to three attempts per question, Grok earned a perfect score and other foreign models scored 90. Solar Pro-2 reached 70, Exaone 60, HCX-007 40, A.X 4.0 30 and Llama Barco 20.
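The best-of-three protocol described above follows a common evaluation convention: a question counts as solved if any of a model's k attempts matches the reference answer. A minimal sketch of that scoring rule, with a hypothetical grader and made-up answers (not the study's actual harness):

```python
# Minimal best-of-k scoring sketch (hypothetical data, exact-match grading).
# A question is "solved" if ANY of the first k attempts equals the reference.

def best_of_k_score(attempts_per_question, reference_answers, k=3):
    """Return the percentage of questions solved within k attempts."""
    solved = 0
    for attempts, answer in zip(attempts_per_question, reference_answers):
        if any(a == answer for a in attempts[:k]):
            solved += 1
    return 100.0 * solved / len(reference_answers)

# Toy example: 5 questions, up to 3 attempts each (made-up values).
refs = [4, 9, 16, 25, 36]
attempts = [
    [4],            # right on the first try
    [8, 9],         # right on the second try
    [15, 17, 16],   # right on the third try
    [24, 26, 27],   # never right
    [36],           # right on the first try
]
print(best_of_k_score(attempts, refs))  # 80.0
```

Real benchmark harnesses typically use more tolerant grading (numeric tolerance or symbolic equivalence) rather than exact match, but the best-of-k aggregation is the same.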

Professor Kim said the experiment was prompted by repeated inquiries about the performance of domestic “sovereign AI” candidates on national exam problems. “The results clearly show how far Korean models lag behind frontier AI systems,” he said.

Industry officials cautioned that the comparison may not fully reflect each model’s capability, noting that several Korean LLMs do not yet include formal reasoning modes essential for mathematical tasks. They added that forthcoming sovereign AI foundation models — still under development — are expected to show improved performance.

The research team said it will retest newly released national AI models and expand its evaluation framework to create domain-specific datasets in science, manufacturing and culture. The project was supported by Sogang University’s Institute of Mathematical Data Science and DeepFountain.

Kevin Lee (kevinlee@koreabizwire.com)
