Why data matters in service-oriented AI models

September 14, 2022 Hailey (Park Seong-min)

2022/09/15



At a conference this year, world-renowned artificial intelligence expert Andrew Ng emphasized the importance of data-centric AI, saying that data is becoming more important than models in AI development. Why does AI ultimately boil down to data?

In a previous post, we pointed out the things to consider when introducing AI technology to your business. This time, we look at the process required to develop an AI model for a service and at how important a role data plays in it.

SERVICE-ORIENTED AI MODEL DEVELOPMENT PROCESS

In practice, there are four major steps to developing a service-oriented AI model that can be provided to customers.

1. Project Setup

First, you need to establish the requirements of the model. This is the project setup stage, where you fix detailed conditions such as processing time, target accuracy, target QPS (queries per second), serving method, and equipment specifications. Because these conditions set the direction of development, they become the foundation for everything that follows.
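As a rough illustration, the requirements from this stage can be written down as a small, explicit structure that later stages check against. The class name `ModelRequirements`, its field names, and the example values are all hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelRequirements:
    """Hypothetical container for the conditions fixed at project setup."""
    max_latency_ms: float   # processing time allowed per request
    min_accuracy: float     # target accuracy, between 0 and 1
    min_qps: float          # target queries per second
    serving_method: str     # e.g. "online-api" or "batch" (illustrative values)
    hardware: str           # equipment specification

    def unmet(self, latency_ms: float, accuracy: float, qps: float) -> List[str]:
        """Return the names of the requirements that the measurements fail."""
        failures = []
        if latency_ms > self.max_latency_ms:
            failures.append("latency")
        if accuracy < self.min_accuracy:
            failures.append("accuracy")
        if qps < self.min_qps:
            failures.append("qps")
        return failures


# Example values are assumptions, not figures from the article.
requirements = ModelRequirements(
    max_latency_ms=100.0,
    min_accuracy=0.95,
    min_qps=50.0,
    serving_method="online-api",
    hardware="1x GPU",
)

# A candidate model that is fast enough but not accurate enough.
print(requirements.unmet(latency_ms=80.0, accuracy=0.93, qps=60.0))
```

Writing the requirements down this way makes "meets the requirements" a concrete check rather than a judgment call.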

2. Data Preparation

Next, you need to prepare the data required to achieve your goals. Methodologies for training a model without data have appeared recently, but most AI models used in services are still based on supervised learning. In other words, you need a dataset with correct answers (labels) to train your model.

So, the second step is to decide what kind of data you need, how much of it, and how it should be labeled, and then to prepare a dataset accordingly.

3. Model Training

The third stage is model training, where the modeling work begins. Here you decide on the model structure, work out how to optimize learning, and create a model that achieves the requirements set in the first step.

4. Deploying

Once the model meets its requirements, the final step is to launch the service that uses it. As with any other software, unexpected issues may occur after an AI model is deployed, so you need to monitor its performance and resolve issues as they arise.
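One simple way to picture post-deployment monitoring is a rolling accuracy check over recent predictions whose correct answers later become known. This is a minimal sketch under that assumption; the `AccuracyMonitor` class, its window size, and the threshold are hypothetical, not part of any particular monitoring tool.

```python
from collections import deque


class AccuracyMonitor:
    """Hypothetical rolling check of accuracy against a target."""

    def __init__(self, min_accuracy, window=100):
        self.min_accuracy = min_accuracy
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, label):
        """Store whether a served prediction matched the true label."""
        self.outcomes.append(prediction == label)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        """True when the windowed accuracy has fallen below the target."""
        acc = self.accuracy()
        return acc is not None and acc < self.min_accuracy


monitor = AccuracyMonitor(min_accuracy=0.9, window=4)
for prediction, label in [("cat", "cat"), ("dog", "dog"), ("cat", "cat"), ("dog", "dog")]:
    monitor.record(prediction, label)
healthy = monitor.needs_attention()   # all four recent predictions were correct

for prediction, label in [("cat", "dog"), ("dog", "cat")]:
    monitor.record(prediction, label)
degraded = monitor.needs_attention()  # window now holds two correct, two wrong
```

In a real service the labels usually arrive late or only for a sample of requests, so the window would be filled from whatever feedback is available.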

These four steps make up the process of developing a service-oriented AI model. In one sentence, the process is ultimately about creating a model that meets the requirements. However, as mentioned above, unforeseen variables arise even after the AI model has been deployed to the service. In the end, developing an AI model for a service means continuously meeting the model's requirements in order to maintain its performance.

Two Ways to Maintain AI Model Performance (Source: https://www.deeplearning.ai/wp-content/uploads/2021/06/MLOps-From-Model-centric-to-Data-centric-AI.pdf)

TWO APPROACHES TO MAINTAINING SERVICE-ORIENTED AI MODEL PERFORMANCE

How do you maintain the performance of these critical AI models? There are two main approaches.

Model-Centric: Improving model performance by optimizing the model structure while keeping the data fixed
Data-Centric: Improving model performance by modifying or adding data while leaving the code untouched
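The data-centric approach can be illustrated with a toy example in which the model code stays exactly the same and only the training labels are corrected. The nearest-centroid classifier and the tiny one-dimensional dataset below are assumptions made up for this sketch; they are not taken from the article or from any real service.

```python
from statistics import mean


def train_centroids(samples):
    """Fixed 'model code': one centroid (the mean value) per class label."""
    by_label = {}
    for x, label in samples:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(centroids, x) == y for x, y in samples) / len(samples)


# Training set in which the points 8 and 9 were mislabeled as class 0.
train_noisy = [(1, 0), (2, 0), (3, 0), (8, 0), (9, 0), (7, 1), (10, 1)]
# Data-centric fix: correct those two labels and change nothing else.
train_clean = [(1, 0), (2, 0), (3, 0), (8, 1), (9, 1), (7, 1), (10, 1)]

test_set = [(0, 0), (2, 0), (4, 0), (6, 1), (7, 1), (9, 1)]

acc_noisy = accuracy(train_centroids(train_noisy), test_set)
acc_clean = accuracy(train_centroids(train_clean), test_set)
print(acc_noisy, acc_clean)
```

A model-centric change would instead keep `train_noisy` as it is and replace `train_centroids` or `predict` with a different algorithm.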

How to achieve model performance for service launch

So how much weight do data and models each carry in reaching the target performance for the first release? Before the AI model is released into service, the Data-Centric and Model-Centric approaches are considered roughly equally important, at about 50% each.

This is because many of the requirements set at project setup (processing time, target QPS, serving method, equipment specifications) depend on the model itself, while accuracy depends on both the data and the model. As a result, the two approaches usually carry about the same weight until the model is released into service.

How to improve model performance after service launch

On the other hand, when you want to improve a model that is already in use after the service has been released, the Data-Centric approach accounts for more than 80% of the effort.

This is because accuracy is the performance improvement most in demand after service launch. At this point, changing the model structure to improve accuracy is expensive, because the requirements for processing speed, QPS, memory size, and so on must all be re-validated. Therefore, after release, the model structure is left unchanged where possible, and performance is improved through the data alone or by slightly changing how the model is trained. That is why data matters so much at this stage.

The importance of data cannot be overemphasized when creating AI models that can be used in services like this. What problems will you face in solving real business problems, and how will you solve them with data?

If you are curious about real business cases, you can get more insights from the Upstage Talk, which shares digital innovation experiences in the financial sector, or from the AI Tech lecture video that will be released on the Upstage website in the future.