Upstage's NLP Research Recognized at Prestigious EMNLP 2023 Conference
2023/10/23
UPSTAGE HAS TWO PAPERS ACCEPTED AT EMNLP 2023, THE MOST PRESTIGIOUS CONFERENCE IN NATURAL LANGUAGE PROCESSING
IN JUNE, ICML 2023-DMLR ACCEPTED 7 UPSTAGE PAPERS, THE LARGEST NUMBER AMONG DOMESTIC COMPANIES, CONTINUING A RUN OF SUCCESSES AT WORLD-CLASS CONFERENCES.
IN JUST 3 YEARS SINCE ITS ESTABLISHMENT, UPSTAGE HAS PUBLISHED 100 PAPERS IN THE FIELD OF ARTIFICIAL INTELLIGENCE AT HOME AND ABROAD AND HAD PAPERS ACCEPTED AT THE TOP 7 NLP CONFERENCES.
(Upstage=2023/10/23) Upstage has demonstrated its world-class research capabilities in natural language processing.
Upstage demonstrated its global leadership in AI technology by having two papers accepted at EMNLP (Empirical Methods in Natural Language Processing) 2023, the most esteemed academic conference in the field of natural language processing.
EMNLP is a premier academic conference on data-driven approaches to natural language processing, covering areas such as AI translation, chatbots, and machine reading comprehension.
Last year's conference received a total of 3,242 papers, with only 715 accepted, resulting in an acceptance rate of 22%. EMNLP 2023, to be held in Singapore from December 6 to December 10, will feature participation from leading AI companies such as Google, Apple, Amazon, and Baidu.
The two accepted papers are outcomes of NLP research related to the Korean language, conducted in collaboration with Professor Lim Hee-seok's research team at Korea University, under the leadership of Chanjun Park, Upstage's Tech Lead.
The first paper, 'KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing,' introduces a new benchmark dataset for post-processing in Korean speech recognition. It proposes a methodology to evaluate and identify weaknesses in speech recognition models comprehensively.
The paper points out that traditional evaluation methods do not give precise information about where speech recognition models are weak, and it improves model explainability by considering voice-level and text-level errors together. The proposed evaluation method, which covers 37 voice-level and 13 text-level error types, was applied to commercial speech recognition systems such as Google Cloud and CLOVA.
The second paper, 'CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients,' suggests a novel data augmentation technique that capitalizes on the unique characteristics of the Korean language.
Unlike English, Korean builds words by combining small units called morphemes, and the meaning of a sentence changes with the combination of morphemes. The paper addresses blind spots in data collection while ensuring that augmentation does not alter the meaning of a sentence or make it unnatural.
Through CHEF, a data augmentation methodology grounded in Korean language characteristics, the paper proposes a technique to generate natural sentences by modifying Korean morpheme combinations using a generative language model.
Upstage's performance at EMNLP 2023 marks another success at global academic conferences. In June, Upstage led domestic companies by presenting seven papers at ICML 2023-DMLR, the most prestigious workshop in Data-Centric AI, achieving significant research results within three years of its founding.
Furthermore, Upstage accomplished the remarkable feat of publishing 100 AI papers domestically and internationally and having papers accepted at all of the top seven conferences in the NLP field, according to Google Scholar rankings.
The Google Scholar ranking, an authoritative indicator, measures the influence of academic venues based on the citations their papers receive. The top seven venues in the NLP field are ACL, EMNLP, NAACL, TACL, COLING, LREC, and WMT, and Upstage has had papers accepted at all of them except TACL, which is classified as a journal.
Sung Kim, CEO of Upstage, said, "We are very pleased to contribute research results at various global academic conferences, including EMNLP 2023." He added, "Upstage will continue investing in R&D to make the highest-performing AI more accessible to everyone. We are committed to making it happen."