Human-centered Proposition for Structuring Data Construction
Attention to data quality has been growing recently. Data is often overlooked in favor of the model, but as the saying goes, “garbage in, garbage out”: if the input data is poor, the output of an ML system cannot be good. Yet I can boldly say that the process of creating such important data is still outdated and loosely organized.
Data construction is not a chore to be endured before model development can begin. It is a complex process that integrates the expertise of the many people involved in creating the data (annotators, data managers, PMs, ML engineers), and it is a research field in its own right, one well worth pursuing.
Upstage's Data Managers Joo Hyun and Cheol Young and UX Designer Inha put a great deal of thought into building the OCR Pack and the Annotation Tool included with it. You might think of data as simple, objective chunks of information, but creating data for AI involves countless human interventions. Should systems be designed to reduce the influence of people on data creation, or to enhance it? The research that grew out of this question was accepted as a paper at a NeurIPS 2022 Workshop, NeurIPS being one of the leading AI conferences.