Developing service-oriented AI models Ep.4 Efficient AI teaming for AI model development
2022/03/04
⏱ 10mins
EP.4 BEGINS
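Hello. You have finally reached the last episode of the four-part series <Developing Service-Oriented AI Models>. In this final episode, we look at how the people who handle the many individual tasks involved in developing a single AI model, as introduced earlier, are organized. 💡 We will take a close look at what kind of team composition creates efficient synergy for successful AI model development, and at how tasks are divided.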
EFFICIENT ORGANIZATION: AI MODEL TEAM
The AI model team should be familiar by now, since it came up repeatedly in the previous three episodes. Its main role is to develop an AI model that meets the customer's service requirements and the restrictions received from the service planning team. So, who makes up the AI model team?
We will describe the AI model team in terms of four roles: modeler, data curator, IDE developer, and model quality manager. Note that the exact organization differs from company to company.
1. Modeler
Modelers are responsible for developing AI models that meet service requirements. They propose and develop an AI model structure that meets the customer's service requirements. The modeler is therefore the member who analyzes data and models and constantly thinks about ways to optimize model performance.
2. Data Curator
Data curators are responsible for a variety of tasks within the team. Their work can be grouped into three main tasks.
✔️ First: Prepare the dataset needed to train and evaluate the model.
Preparing the training dataset is a step that requires careful decision-making because, as explained earlier, it takes a lot of time and resources (refer to EP.2). Because the task demands communication skills and considerable know-how, a team works more efficiently and performs better when it has a dedicated person for dataset preparation, such as a data curator. The data curator communicates with the service planning team and the modeler, defines the type, quantity, and correct answers (ground truth) of the dataset, and secures an appropriate dataset.
✔️ Second: Create Data Annotation Guideline
As explained earlier, once the training dataset is defined, many companies commission outsourcing companies to produce the data (refer to EP.2). In that case, the data curator writes the guidelines on how to create the data and handles communication with the outsourcing companies, such as progress checks and Q&A responses.
✔️ Third: Plan and execute model and service evaluation
Data curators have probably looked at the data more than anyone else during AI model development. Because they are also in charge of ongoing communication with the service planning team, they are among the people with the best overall understanding of the service. They are therefore responsible for planning and executing the offline tests (quantitative and qualitative) that select the service release version from among the AI model candidates. Beyond offline testing, they may also be responsible for continuously tracking feedback on model performance after the AI model is released into service. Based on these online and offline test results, they work with the other members to derive and apply improvements to the AI model and to the training and evaluation datasets. (Refer to EP.3)
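As a rough illustration of the quantitative part of that offline selection, the sketch below picks a release candidate from a list of test results. The candidate names, the accuracy and latency fields, and the latency limit are all hypothetical, and a real selection would also weigh the qualitative review.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Offline test result for one model candidate (all fields are hypothetical)."""
    name: str
    accuracy: float    # quantitative score on the evaluation dataset
    latency_ms: float  # measured inference time per request


def select_release(candidates: List[Candidate], max_latency_ms: float) -> Optional[Candidate]:
    """Return the most accurate candidate that satisfies the latency restriction."""
    eligible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    if not eligible:
        return None  # no candidate meets the service restriction
    return max(eligible, key=lambda c: c.accuracy)


candidates = [
    Candidate("model_a", accuracy=0.91, latency_ms=120.0),
    Candidate("model_b", accuracy=0.94, latency_ms=310.0),
    Candidate("model_c", accuracy=0.89, latency_ms=80.0),
]
best = select_release(candidates, max_latency_ms=200.0)
print(best.name if best else "no eligible candidate")
```

Here the most accurate candidate is excluded because it violates the assumed latency restriction, which mirrors the idea that a release version must satisfy the service restrictions and not only score well.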
3. IDE Developer (Integrated Development Environment Developer)
The term IDE developer may be unfamiliar, but put simply, an IDE developer is a person who builds the various tools needed for model development and data creation. Countless repetitive tasks come up during development, and it is very important to handle and manage them quickly and conveniently. By using the tools that IDE developers create, the team can improve its performance and efficiency. So what tools do they develop?
✔️ Annotation Tool
One way to improve model performance is to change the definition of the model output. When that definition changes, the annotations must change accordingly. The annotation tool therefore inevitably has to change as well, and developing an in-house annotation tool is necessary to respond to this situation effectively.
✔️ Model and data analysis tools
Model performance is determined by the model structure, training data, and training methodology. Therefore, data analysis tools are needed to improve model performance. This is because multi-layered data analysis is required to capture better data.
In addition, analyzing the weights learned by the model, the logs left by the training process, and the model's response to each data sample is another way to improve model performance. Tools that help with such repetitive analysis tasks therefore also contribute greatly to efficiency. When multiple models are used at the same time, detailed analysis and debugging can be difficult with only a console window and a Jupyter notebook. For efficient model analysis and debugging, it is desirable to develop a separate analysis tool and use it where appropriate.
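One small example of the kind of repetitive analysis such a tool might automate is breaking per-sample results down by data slice. The sketch below is only an illustration: the slice names, labels, and predictions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def error_rate_by_slice(samples: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute the fraction of wrong predictions per data slice.

    Each sample is (slice_name, label, prediction); slice names are hypothetical tags.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for slice_name, label, prediction in samples:
        totals[slice_name] += 1
        if label != prediction:
            errors[slice_name] += 1
    return {name: errors[name] / totals[name] for name in totals}


samples = [
    ("daylight", "cat", "cat"),
    ("daylight", "dog", "dog"),
    ("low_light", "cat", "dog"),
    ("low_light", "dog", "dog"),
]
rates = error_rate_by_slice(samples)
# List the weakest slices first so they can be inspected or re-collected.
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {rate:.0%} errors")
```

A breakdown like this can point to the slices where more or better data may be needed.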
✔️ Model development automation pipeline
A number of backend tasks are required outside of model development, such as linking between models and preparatory work before model deployment. Since these tasks are usually repetitive, the time from model experimentation to service application can be shortened through automation of the relevant pipeline.
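A minimal sketch of such an automation pipeline is shown below, assuming the repetitive work can be expressed as an ordered list of steps that pass a shared context along. The step names and their bodies are hypothetical placeholders, not a description of any particular team's pipeline.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict], Dict]]


def export_model(ctx: Dict) -> Dict:
    # Placeholder: a real step would serialize the trained model to a file.
    return {**ctx, "artifact": f"{ctx['model_name']}.bin"}


def run_smoke_test(ctx: Dict) -> Dict:
    # Placeholder: a real step would run the exported model on a few samples.
    if "artifact" not in ctx:
        raise RuntimeError("nothing to test: the export step has not run")
    return {**ctx, "smoke_test_passed": True}


def register_for_deployment(ctx: Dict) -> Dict:
    # Only models that passed the smoke test are marked as registered.
    return {**ctx, "registered": ctx["smoke_test_passed"]}


def run_pipeline(steps: List[Step], ctx: Dict) -> Dict:
    """Run each step in order and record the names of the completed steps."""
    completed = []
    for name, step in steps:
        ctx = step(ctx)
        completed.append(name)
    return {**ctx, "completed": completed}


steps: List[Step] = [
    ("export", export_model),
    ("smoke_test", run_smoke_test),
    ("register", register_for_deployment),
]
result = run_pipeline(steps, {"model_name": "demo_model"})
print(result["completed"], result["registered"])
```

Keeping each step as a plain function makes it easy to rerun the same sequence for every new experiment, which is where the time savings come from.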
4. Model Quality Manager
What is the purpose of the tasks performed by the AI model team? It is to develop a high-performance AI model that meets the customer's requirements. Achieving this goal always requires someone to manage the project, and that person is the model quality manager. The model quality manager understands and coordinates the progress of the various tasks within the organization and holds the key to a successful voyage to the final destination: successful AI model development.
EFFICIENT ORGANIZATION: AI MODEL SERVING TEAM
Some customers require not only AI model development but also model serving. In that case, the AI model serving team receives the model created by the AI model team and carries out additional work that depends on the serving device. Let's take a look at the two roles that carry out this work: model engineer and app developer.
1. Model Engineer
Model engineers run model optimizations based on the device on which the AI model is served.
For an AI service that runs on mobile, a series of steps is needed to convert a model built in PyTorch to TensorFlow and then to TFLite. If the model uses an operation that cannot be converted to TFLite, the model engineer may change the model structure and retrain it, or implement the operation as a custom layer. Model engineers may also work on making the model lighter, using techniques such as distillation and quantization, and on CUDA programming for high-speed GPU processing.
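To give a feel for what quantization does, here is a plain-Python sketch of affine 8-bit quantization applied to a list of weights. It only illustrates the arithmetic (a scale and a zero point mapping floats to integers in 0..255). It is not the procedure used by TFLite or any other toolkit, and the example weights are made up.

```python
from typing import List, Tuple


def quantize(weights: List[float]) -> Tuple[List[int], float, int]:
    """Map floats to unsigned 8-bit integers with an affine (scale, zero-point) scheme."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # all weights are zero; any positive scale works
    zero_point = round(-lo / scale)
    quantized = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return quantized, scale, zero_point


def dequantize(quantized: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float values from the 8-bit representation."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, -0.25, 0.0, 0.4, 1.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, f"max reconstruction error: {max_error:.5f}")
```

Each weight is stored in one byte instead of a 32-bit float, at the cost of a small rounding error bounded by roughly half the scale.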
2. Application Developer and BE Engineer
Once the development stage has produced a model optimized for the target serving device, separate engineering work is needed to actually serve it. For mobile services, the required work depends on the mobile environment, such as Android or iOS. For API serving, on the other hand, the backend work depends on the serving environment, such as CPU or GPU serving and public or private cloud. The app developer or backend (BE) engineer is in charge of this work.
EP.4 WRAPS UP
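So far, we have looked at how to organize an efficient AI team, the topic of the final episode of <Developing Service-Oriented AI Models>. Developing a single AI model takes the effort of many members like those described above. Upstage's model development team is also building great AI models (AI Pack) with top-tier AI expertise and excellent communication skills 😊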
We have introduced how to develop an AI model for a service over a total of four episodes. Was it helpful? We look forward to seeing you again with more useful and fun tech knowledge. Thanks for reading this far.