State Institute Unveils New Korean Language Datasets for AI Learning

(image: Korea Bizwire)

SEOUL, March 31 (Korea Bizwire) — The National Institute of the Korean Language said Tuesday that it had released eight new Korean language datasets through its website, providing essential training data for artificial intelligence (AI) to improve its Korean language processing capability.

In August 2020, the institute released a large-scale Korean language corpus for AI learning (13 sets totaling 1.8 billion words) through its website.

The newly disclosed datasets include five new datasets and three revised ones, which were created by adding to and modifying previously released data.

The newly disclosed datasets comprise data from 4 million daily conversations and newspaper articles, as well as 9 million phrases.

The institute said that the newly disclosed datasets would contribute to the development of a wide variety of AI services, including voice-based conversation systems, and could also be used widely for research on spoken Korean.

The newly disclosed datasets also include revised versions, created by adding to and modifying a previously released corpus, that analyze the grammatical relations between words in newspaper articles and sentences.

J. S. Shin (js_shin@koreabizwire.com)
