Upstage's NLP Research Recognized at Prestigious EMNLP 2023 Conference
2023/10/23
TWO UPSTAGE PAPERS ACCEPTED AT EMNLP 2023, THE MOST AUTHORITATIVE CONFERENCE IN NATURAL LANGUAGE PROCESSING
THIS FOLLOWS SEVEN PAPERS ACCEPTED LAST JUNE AT ICML 2023-DMLR, THE LARGEST NUMBER AMONG DOMESTIC COMPANIES, CONTINUING A RUN OF ACCEPTANCES AT WORLD-CLASS CONFERENCES.
IN JUST 3 YEARS SINCE ITS FOUNDING, UPSTAGE HAS PUBLISHED 100 PAPERS IN THE FIELD OF ARTIFICIAL INTELLIGENCE AT HOME AND ABROAD AND HAD PAPERS ACCEPTED AT THE TOP 7 NLP CONFERENCES.
(Upstage=2023/10/23) Upstage has demonstrated its world-class research capabilities in natural language processing.
Upstage demonstrated its global leadership in AI technology by having two papers accepted at EMNLP (Empirical Methods in Natural Language Processing) 2023, the most esteemed academic conference in the field of natural language processing.
EMNLP is a premier academic conference focusing on research related to natural language processing approaches, including AI translation, chatbots, and machine reading comprehension, all based on linguistic data.
Last year's conference received a total of 3,242 papers, with only 715 accepted, resulting in an acceptance rate of 22%. EMNLP 2023, to be held in Singapore from December 6 to December 10, will feature participation from leading AI companies such as Google, Apple, Amazon, and Baidu.
The two accepted papers are outcomes of NLP research related to the Korean language, conducted in collaboration with Professor Lim Hee-seok's research team at Korea University, under the leadership of Chanjun Park, Upstage's Tech Lead.
The first paper, 'KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing,' introduces a new benchmark dataset for post-processing in Korean speech recognition. It proposes a methodology to evaluate and identify weaknesses in speech recognition models comprehensively.
Pointing out that traditional evaluation methods do not give precise information about the weaknesses of speech recognition models, this research improves model explainability by considering voice-level and text-level errors comprehensively. The proposed evaluation method, which covers 37 voice-level and 13 text-level error types, was applied to commercial speech recognition systems such as Google Cloud and CLOVA.
The second paper, 'CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients,' suggests a novel data augmentation technique that capitalizes on the unique characteristics of the Korean language.
Unlike English, Korean relies heavily on combining small units called morphemes, and the meaning of a sentence varies with the morpheme combination. This paper addresses blind spots in data collection, ensuring that the meaning of a sentence is not altered or rendered unnatural during augmentation.
Through CHEF, a data augmentation methodology grounded in Korean language characteristics, the paper proposes a technique to generate natural sentences by modifying Korean morpheme combinations using a generative language model.
Upstage's performance at EMNLP 2023 marks another success at global academic conferences. In June, Upstage led domestic companies by presenting seven papers at ICML 2023-DMLR, the most prestigious workshop in Data-Centric AI, achieving significant research results within three years of its founding.
Furthermore, Upstage accomplished the remarkable feat of publishing 100 AI papers domestically and internationally and having papers accepted at all of the top seven conferences in the NLP field, according to Google Scholar rankings.
The Google Scholar ranking, an authoritative indicator, is based on paper citations and measures the influence of academic venues. The top seven venues in the NLP field are ACL, EMNLP, NAACL, TACL, COLING, LREC, and WMT, and Upstage has had papers accepted at all of them except TACL, which is classified as a journal.
Sung Kim, CEO of Upstage, said, "We are very pleased to contribute research results at various global academic conferences, including EMNLP 2023." He added, "Upstage will continue investing in R&D to make the highest-performing AI more accessible to everyone. We are committed to making it happen."
-
Geun-Kyo Kim | Brand Communication General Director | keunkyo@upstage.ai
Seongbeom Bae | Brand Communication Manager | sungbae@upstage.ai
-
Upstage is a leading domestic AI startup established in October 2020. Upstage stands out in the large language model (LLM) industry for taking first place on the Hugging Face Open LLM Leaderboard with a score exceeding ChatGPT's benchmark score for the first time in the leaderboard's history. Based on these technologies, we present a reliable private LLM standard that maximizes data security and solves hallucination, helping companies conveniently use cutting-edge technology. In addition, Upstage's chat AI 'AskUp' has over 1.4 million users, establishing itself as the largest AI service in Korea. Document AI Pack, another representative Upstage solution, uses AI OCR technology that won the world's most prestigious OCR competition to automate document processing with greater efficiency and accuracy. Because its pre-trained model can be optimized for document processing with minimal data, cost and time are dramatically reduced compared to manual methods. Lastly, through the education program 'EduStage', we are also actively engaged in the educational content business, fostering differentiated professional talent who can be put to work in AI businesses immediately through hands-on training that combines AI business experience with a solid grounding in AI fundamentals.
Upstage is made up of members from global big tech companies such as Google, Apple, Amazon, NVIDIA, Meta, and Naver, and has participated in many world-renowned AI academic conferences such as NeurIPS, ICLR, CVPR, ECCV, WWW, CHI, WSDM, and DMLR. We are solidifying our unrivaled leadership in AI technology by publishing excellent papers and becoming the only domestic company to win double-digit gold medals in the online AI competition Kaggle. While working as a professor at the Hong Kong University of Science and Technology, Upstage CEO Kim Seong-hoon won the ACM SIGSOFT Distinguished Paper Award, a best paper award, four times for his research on bug prediction and automatic source code generation combining software engineering and machine learning, and won 10 awards at the International Conference on Software Maintenance. He is considered a world-class AI guru who received the most influential paper award in 2018, and is also widely known as the instructor of 'Deep Learning for Everyone', which has more than 7 million total views. Additionally, Upstage's co-founders include CTO Lee Tal-seok, who led Naver Visual AI/OCR and achieved world-class results, and CSO Park Eun-jung, who led the model team of Papago, the world's best translator.