Upstage


Developing service-oriented AI models EP.3: Deriving test datasets, test methods, and model requirements

2022/03/04

⏱ 15mins 

EP.3 BEGINS

Hello! In the last episode, EP.2, we looked at the 'training dataset preparation' process for service-oriented AI model development, following the journey of building a training dataset optimized for the service by reflecting the customer's service requirements and constraints.

[Figure 1] Training dataset construction process

In this EP.3, we will talk about how to derive the 💡AI model's test dataset from the service requirements and how to test with it. We will also look at which model requirements should be checked before the AI model is applied to the actual service.

Test dataset and test method

1. Test dataset

To validate AI model performance, a ✔️ test dataset and a ✔️ test method must be prepared. The test dataset is usually a portion of the training dataset. Of course, this requires the assumption that the distribution of the test dataset matches the distribution of the training dataset. (The case where the two distributions do not match will be covered next time.)
Let's look at how to test a model by selecting some data from the training dataset and using it as a test dataset; a simple split sketch is shown below.

[Figure 2] Building a test dataset with service requirements
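As a minimal illustration, here is a Python sketch of holding out part of a training dataset as a test dataset. The `split_train_test` helper and the toy `dataset` are hypothetical and not part of any actual pipeline; a random split is used so that the test distribution roughly matches the training distribution, as assumed above.

```python
import random

def split_train_test(samples, test_ratio=0.1, seed=42):
    """Hold out a fraction of the training samples as a test set.

    `samples` is assumed to be a list of (input, label) pairs; the split is
    random, so both subsets should follow roughly the same distribution.
    """
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the original order is preserved
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

# Example: 1,000 labeled samples, 10% held out for testing
dataset = [(f"image_{i}.jpg", i % 5) for i in range(1000)]
train_set, test_set = split_train_test(dataset, test_ratio=0.1)
print(len(train_set), len(test_set))   # 900 100
```

Depending on the service, a stratified split that preserves the label distribution, or a split by collection time, may be more appropriate than a purely random one.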

2. Test method

There are two test methods.

✔️ Offline Test : Evaluation in the development environment, before the model is applied to the actual service

✔️ Online Test : Evaluation in the live environment, after the model is applied to the actual service

Here is something to keep in mind: the environment at the development stage and the environment when the model is applied to the actual service may differ. In that case, it is very likely that the offline test and the online test will yield different results. Since a service-oriented AI model is one that delivers excellent performance when applied to a real service, the more similar the offline and online test results are, the better the model. Therefore, quality assurance of the service is only possible with careful design of the offline test.


📍 Why do offline and online test results differ?

👉 There are usually two causes. The first is 'dataset incompleteness': the dataset used for model development and offline evaluation does not sufficiently reflect the real world. Among the data coming into the AI model in the real service, it is very likely that inputs not considered during development will appear.

The second is 'model drift'. For some services, the data changes frequently over time. A model trained on a fixed dataset will then inevitably tend to degrade as new data accumulates.

The two test methods and their types

Now let's look at the types of offline testing and online testing.

[Figure 3] Evaluation methods and types of offline and online tests

1. Offline testing

The quantitative evaluation stage of offline testing, carried out before actual service application, cannot guarantee a perfect performance assessment. Instead, the quantitative evaluation of offline testing is intended to select the best models from a diverse set of AI model candidates. The selected candidates are then carefully analyzed through qualitative evaluation, and one final release version is chosen.
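As a rough sketch of this shortlisting step (the `candidates`, `metric_fn`, and `top_k` names are hypothetical), quantitative evaluation can be reduced to scoring each candidate on the test dataset and passing only the best few on to qualitative review:

```python
def shortlist_candidates(candidates, test_set, metric_fn, top_k=3):
    """Score each candidate model on the test set and keep the best few.

    `candidates` maps a model name to a callable that returns a prediction
    for one input; `metric_fn` compares a prediction with the ground truth
    and returns a score (e.g. 1.0 for a correct answer, 0.0 otherwise).
    """
    scores = {}
    for name, predict in candidates.items():
        total = sum(metric_fn(predict(x), y) for x, y in test_set)
        scores[name] = total / len(test_set)
    # The highest-scoring candidates move on to qualitative (human) review.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Toy usage: two trivial "models" evaluated on three labeled samples
test_set = [(1, 1), (2, 4), (3, 9)]
candidates = {"square": lambda x: x * x, "double": lambda x: x * 2}
print(shortlist_candidates(candidates, test_set, lambda p, y: float(p == y), top_k=1))
# [('square', 1.0)]
```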

2. Online test

The AI model chosen through offline testing is then applied to the actual service. At this point another evaluation, the online test, is conducted. For online quantitative evaluation, the environment must be set up in advance so that evaluation indicators are automatically produced in service scenarios. These indicators are what later drive the discussion of when and how to update the model through performance monitoring.
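For instance, the per-request indicators could be written to a simple structured log that monitoring jobs aggregate later. This is only a sketch under that assumption; in a real service the indicators would more likely be sent to a metrics or monitoring system, and the field names here (`latency_ms`, `user_corrected`) are hypothetical.

```python
import json, time

def log_online_indicator(log_path, request_id, latency_ms, user_corrected):
    """Append one request's evaluation indicators to a newline-delimited JSON log.

    Each record captures the model's latency and whether the user had to
    correct the result; aggregating these records over time provides the
    quantitative signal for deciding when the model should be updated.
    """
    record = {
        "request_id": request_id,
        "timestamp": time.time(),
        "latency_ms": latency_ms,
        "user_corrected": user_corrected,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Called once per service request, e.g.:
# log_online_indicator("online_eval.log", "req-001", latency_ms=180, user_corrected=False)
```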

Once the basic data required for model renewal has been secured through the quantitative evaluation of the online test, the qualitative evaluation of the online test is the step where specific improvement plans are found, based on the voices of the customers who use the actual service.

If we had to pick the most important of the four evaluation items, it would be the qualitative evaluation of the online test. This is because it is grounded in the inconveniences and errors experienced by users of the actual service (Voice of Customer). The diverse opinions of users can be a big hint for improving the AI model. Modelers and dataset developers collect and discuss these hints periodically, derive improvements for the model and for the training and evaluation datasets, and reflect them to ensure better service quality.

Deriving model requirements

So far, we have covered the process of deriving training datasets, test datasets, and test methods from service requirements. There is one last thing to check: specifying ✔️ model requirements from the service requirements. AI model requirements can be divided into five main categories, although the items may change depending on the service and the modeling environment.

[Figure 4] Five model requirements

1. Processing time

You need to consider the total processing time from the moment one input comes in until the result comes out as output. The definition of processing time may differ between offline testing and online testing. It is important to develop a model that achieves optimized turnaround times, taking the user experience and quality of service into account. The service planning team is responsible for communicating processing-time requirements, and the AI technical team is responsible for developing models that meet them.

📍 How does the definition of processing time differ between offline and online tests?

👉 Let's take the example of an AI service that automatically detects the formula region when a user takes a picture of a formula. In offline testing, the processing time can be relatively short, because you don't have to consider the time the user spends taking a picture with their phone camera and waiting for the result. Online testing, however, takes all of this into account.

If it takes more than 5 minutes for the information in the recognized formula image to be returned, the user will feel uncomfortable. Fast turnaround times are therefore necessary to remove the friction that would cause users to churn from the service. The two definitions are compared below, followed by a small measurement sketch.

✔️ Offline test : Total time from when an image is input to the model until the formula-region information is output

✔️ Online test : Total time from when the user takes the picture until the formula region is detected and the derived information is displayed on the service screen

[Figure 5] Example of processing time test
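The offline definition can be measured with a simple timing loop around the model call, while the end-to-end online time must come from client-side timestamps instead. A minimal sketch, assuming a placeholder `model_fn` standing in for the formula-region detector:

```python
import time

def measure_offline_latency(model_fn, inputs):
    """Average model-only processing time (the offline definition).

    Counts only the time from feeding an image into the model to receiving
    the detected formula region; capture, upload, and rendering time on the
    user's device are deliberately excluded.
    """
    latencies = []
    for image in inputs:
        start = time.perf_counter()
        _ = model_fn(image)                      # e.g. formula-region detection
        latencies.append(time.perf_counter() - start)
    return sum(latencies) / len(latencies)

# Dummy stand-in for the detector, just to show the call shape
fake_detector = lambda image: {"formula_box": (0, 0, 10, 10)}
print(measure_offline_latency(fake_detector, inputs=[b"img1", b"img2", b"img3"]))

# The online definition would instead use client-side timestamps:
#   total = t_result_displayed_on_screen - t_photo_taken
# which also includes network transfer and UI rendering time.
```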

2. Target Accuracy

Criteria and requirements for quantitative accuracy also exist for AI technology modules. One of the requirements is to clarify what level of accuracy must be achieved before the service can be launched. Even for the same AI model, the target accuracy may vary depending on the individual service scenario, so precise communication with the service planning team is required.

📍 Can you explain target accuracy with an example?

👉 Let's say you are developing a service that takes a credit card image, automatically recognizes the card number and expiration date, and fills them in. The model's accuracy at the development stage can be expressed as the edit distance between the model's output and the correct card number and expiration date strings.

From the point of view of the actual service, on the other hand, a correct result means that the user does not have to modify it: incorrect values should not come out of the recognition step, and the user should not be burdened with fixing them manually. Therefore, the accuracy of the AI engine at the service stage can be defined as 'the probability that the service user modifies the result of the AI model'.

The AI technical team should pay attention to this difference between offline and online testing, design a performance measurement method optimized for the service scenario, and make sure the AI engine reaches the target accuracy. A sketch of both metrics follows the definitions below.

✔️ Offline test : Edit distance between the recognized card number and expiration date and the correct strings from the input image

✔️ Online test : Probability that the user modifies the result of the AI model
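A small sketch of both metrics (the function names are illustrative): the offline metric is a plain Levenshtein edit distance between the recognized string and the ground truth, and the online metric is the fraction of requests in which the user edited the result.

```python
def edit_distance(pred: str, truth: str) -> int:
    """Levenshtein distance between the predicted and reference strings."""
    # prev[j] holds the distance between the processed prefix of pred and truth[:j]
    prev = list(range(len(truth) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, t in enumerate(truth, start=1):
            cost = 0 if p == t else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Offline metric: distance between the OCR output and the true card number
print(edit_distance("1234 5678 9012 3456", "1234 5678 9012 3455"))  # 1

def online_correction_rate(correction_flags):
    """Online metric: fraction of requests where the user edited the result."""
    return sum(correction_flags) / len(correction_flags)

print(online_correction_rate([False, True, False, False]))  # 0.25
```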

3. Target QPS (Queries Per Second)

To provide a high quality of service, it is necessary to increase the number of requests the service can process per second. For example, if 100 users upload 100 images per second, 100 QPS is required. If this capacity is not provided, errors such as lag and stuttering will constantly occur, degrading the customer experience. So how can QPS be improved? Three approaches are listed below, followed by a rough capacity estimate.

✔️ Increase the equipment : Assuming the number of requests that can be processed per GPU is x, increasing the number of GPUs to n gives a QPS of n * x. Adding equipment naturally increases request throughput, but it cannot be guaranteed that throughput grows exactly n-fold with the number of machines. Although adding equipment is the easiest way to improve QPS, the decision must take into account the cost and the bottlenecks created by the load balancer that has to distribute the n-fold increase in requests.

✔️ Reduce processing time : Reducing processing time means the AI model runs faster. If the model's processing speed increases n-fold, QPS also increases n-fold.

✔️ Reduce model size : If the number of model instances that fit on one GPU increases n-fold, QPS also increases n-fold; in other words, loading more models increases the number of requests that can be processed. Note, however, that QPS does not increase linearly as the model size shrinks. With 10 GB of GPU memory and an 8 GB model, at most one model fits on a GPU, and that maximum does not change until the model is reduced to 5 GB or less; only then can two models fit into the 10 GB of GPU memory.
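Putting the three levers together, a rough back-of-the-envelope QPS estimate can be computed as below. The function and its assumptions (whole models only fit in GPU memory, and each loaded model handles one request per latency interval) are illustrative only; real throughput is usually lower because of load-balancing overhead and other bottlenecks.

```python
import math

def estimate_qps(num_gpus, gpu_memory_gb, model_size_gb, latency_sec):
    """Rough upper-bound QPS estimate under the assumptions described above."""
    models_per_gpu = math.floor(gpu_memory_gb / model_size_gb)  # whole models only
    requests_per_model = 1.0 / latency_sec                      # one request at a time
    return num_gpus * models_per_gpu * requests_per_model

# 10 GB GPUs: an 8 GB model fits once per GPU, a 5 GB model fits twice.
print(estimate_qps(num_gpus=4, gpu_memory_gb=10, model_size_gb=8, latency_sec=0.2))  # 20.0
print(estimate_qps(num_gpus=4, gpu_memory_gb=10, model_size_gb=5, latency_sec=0.2))  # 40.0
```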

4. Serving method

The model's requirements also change depending on how the AI model is served. For example, the model design should differ depending on the type of hardware device: whether the technology module runs in a mobile environment or on a web server changes the model's processing-speed and memory constraints. Conditions such as CPU versus GPU serving, and local servers versus cloud services, must also be reflected in the requirements, so you need to look closely at the service requirements.

5. Equipment Specifications

This happens fairly often: some customers do not have their own serving equipment and ask for the equipment to be built as well. In that case, the equipment specifications for running the model in the service are determined by considering conditions such as the service's QPS and the available budget. Processing speed and memory specifications vary with the equipment, so it is important to develop a model that reflects these conditions.

[Figure 6] Summary of model requirements

EP.3 WRAPS UP

So far, we have looked at the 'test dataset' and 'test method' for AI model validation based on the service requirements. In addition, we discussed in detail throughout EP.3 that model development should take the five model requirements into account.

[Figure 7] Summary of the service-oriented AI model development method

In EP.4, the final installment of <Developing service-oriented AI models>, we will cover how a team should be organized for efficient AI model development. We hope this episode was useful as well. Thank you.


Episode composition

📌 EP.1 Differences from the AI model development environment

📌 EP.2 Preparing the training dataset for AI model development

📌 EP.3 Test dataset, test method, and model requirements derivation

📌 EP.4 How to Form an Efficient AI Team for AI Model Development