Hear in more detail about the research accepted as a paper at the NeurIPS 2022 HCAI Workshop. NeurIPS is one of the world's top AI conferences!
-
Inha Cha, Upstage AI Product UX Designer
Juhyun Oh, Upstage Data Manager
Cheolyoung Park, Upstage Data Manager
-
- People who understand the importance of data in building AI and want to make that data better
- People who are curious about the pipeline for building data
-
- We will walk you through the data construction pipeline and the steps needed to create a dataset.
- We will share the practical difficulties we ran into while creating data.
- You will get a chance to think about what creating good data requires and which parts of the process matter most.
(The data covered in this Upstage Talk is unstructured data, so the pipelines discussed are unstructured data construction pipelines.)
Attention to data quality has been growing recently. The importance of data is often overlooked compared to the model, but as the saying "garbage in, garbage out" goes, an ML system cannot produce good output unless it is given good data. And yet, I dare say, the process of creating such important data is truly arduous and loosely structured.
Data construction is not a chore to be endured as a prerequisite for model development. It is a process that brings together, in complex ways, the expertise of the many people involved (annotators, data managers, PMs, ML engineers), and it is a research topic in its own right. It is a field well worth pursuing.
Upstage's Data Managers Juhyun and Cheolyoung, and UX Designer Inha, put a lot of thought into building the OCR Pack and the Annotation Tool included with it. You might think of data as simple, objective chunks of information, but data for AI involves countless human interventions along the way. Should systems be designed to reduce or to enhance the influence of people in creating data? The research that began with this question was accepted as a paper at a NeurIPS 2022 workshop. NeurIPS is one of the top AI conferences.
Who creates good data, and how can we create it well?
To answer the question from the introduction: good data comes from understanding the roles and influence that various people have on data, and from knowing how to build that understanding into the systems that create AI.
In this Upstage Talk, we discuss the various roles and influences of people in the process of creating data, and how data pipelines are structured.
🙋🏻‍♀️ Can't attend the live talk? Don't worry!
We will send a replay link to everyone who registers, so register now!