Human-centered Proposition for Structuring Data Construction
- Those who understand the importance of data in building AI and want to make it better
- Those who are curious about the pipeline for building data
Attention to data quality has been growing recently. Although the importance of data is often overlooked compared to that of the model, an ML system cannot produce good output unless it is fed good data, as the saying goes: garbage in, garbage out. Yet I dare say that the process of creating such important data is truly arduous and loosely structured.
Data construction is not a chore to be endured as a preliminary step to model development. It is a complex process that brings together the expertise of the many people involved (annotators, data managers, PMs, ML engineers), and it is a research area in its own right. It is a field well worth working in.
Upstage's Data Managers Joo Hyun and Cheol Young, and UX Designer Inha, put a lot of thought into building the OCR Pack and its included Annotation Tool. You might think of data as simple, objective chunks of information, but data for AI involves countless human interventions along the way. Should systems be designed to reduce or to enhance the influence of people in creating data? The research that began with this question was accepted as a workshop paper at NeurIPS 2022, one of the top AI conferences.