Upstage


Introducing ‘Layout Analyzer’.

2023/12/14 | 3 mins

To learn more about Layout Analyzer, sign up for update notifications.


When we read a document, we naturally recognize many things without any effort: which part is the title, how to read a table, and that the small text under an image is its caption. Existing OCR technology, however, simply recognizes characters. It cannot automatically understand the document or the context in which each character appears.

Here we introduce the features of Upstage Layout Analyzer, a powerful API designed to automatically understand a document's structure and extract it easily, going beyond simple character recognition. Layout Analyzer detects elements such as paragraphs, tables, pictures, comments, formulas, headers, and footers to determine the structure of the document, then arranges those elements in their contextual reading order and converts the document into HTML. By comparing it with existing OCR, we will explain how Document AI technology that understands document structure can help your work.

Layout Analyzer's strengths fall into three broad categories: detecting the elements in a document, reading them in the order that fits the document's context, and recognizing relationships between elements (image/caption and table/caption pairs).


Element detection in a document

Layout Analyzer recognizes headers (text repeated at the top of each page), footers (text repeated at the bottom), paragraphs, captions (for images, tables, and so on), tables, and images (including graphs), and can save each type separately. Because the type of every element is known when its text is detected, data can be extracted cleanly. This matters most when tables and charts sit side by side in multiple columns: reading such a page line by line makes it hard to extract the correct data, whereas Layout Analyzer makes extraction much easier.
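To make "save them separately" concrete, here is a minimal sketch that groups detected elements by category and drops repeated page furniture. The list-of-dicts shape with `category` and `text` keys, and the helper `group_by_category`, are hypothetical illustrations, not Layout Analyzer's actual response format.

```python
from collections import defaultdict

# Hypothetical output shape: a list of detected elements, each with a
# category label and its recognized text. The real API response may differ.
elements = [
    {"category": "header", "text": "ACME Corp. Annual Report"},
    {"category": "paragraph", "text": "Revenue grew in every region."},
    {"category": "table", "text": "Region | Q1 | Q2"},
    {"category": "caption", "text": "Table 1. Quarterly revenue by region"},
    {"category": "footer", "text": "Page 3"},
]

def group_by_category(elements, skip=("header", "footer")):
    """Save each element's text under its category, skipping page furniture."""
    groups = defaultdict(list)
    for el in elements:
        if el["category"] in skip:
            continue  # headers/footers repeat on every page, so leave them out
        groups[el["category"]].append(el["text"])
    return dict(groups)

grouped = group_by_category(elements)
print(grouped)
```

Because each piece of text already carries its category, filtering out headers and footers is a one-line check rather than a text-cleaning heuristic.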

You can also download the results of document structure recognition as HTML code; any document can be converted to HTML by passing it through Layout Analyzer. Because the HTML is returned element by element, it can also be edited one unit at a time. And because different font sizes are recognized, all elements of a given size can be tagged and edited in a batch.
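A batch edit by font size might look like the sketch below. It assumes, purely for illustration, that each element arrives as one flat HTML tag with its size in an inline `style="font-size:NNpx"`; the real markup may differ. The `ElementCollector` class and `retag_by_size` helper are hypothetical and use only Python's standard `html.parser`.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical HTML shape: one flat tag per element, with the recognized font
# size in an inline style. The markup Layout Analyzer returns may differ.
SAMPLE_HTML = (
    '<p style="font-size:24px">Quarterly Report</p>'
    '<p style="font-size:11px">Sales rose 12% year over year.</p>'
    '<p style="font-size:24px">Outlook</p>'
)

class ElementCollector(HTMLParser):
    """Collect (tag, font_size, text) triples for each flat element."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        m = re.search(r"font-size:\s*(\d+(?:\.\d+)?)px", style)
        size = float(m.group(1)) if m else None
        self._current = [tag, size, ""]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2] += data  # text may arrive in several chunks

    def handle_endtag(self, tag):
        if self._current is not None and self._current[0] == tag:
            self.elements.append(tuple(self._current))
            self._current = None

def retag_by_size(markup, min_size, new_tag="h2"):
    """Batch edit: re-emit every element at or above min_size as new_tag.

    Inline styles are dropped in the output for brevity.
    """
    parser = ElementCollector()
    parser.feed(markup)
    parser.close()
    out = []
    for tag, size, text in parser.elements:
        if size is not None and size >= min_size:
            tag = new_tag
        out.append(f"<{tag}>{escape(text)}</{tag}>")
    return "".join(out)

print(retag_by_size(SAMPLE_HTML, min_size=20))
```

Working per element means the threshold rule touches only the large-text units and leaves body paragraphs untouched.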

Since each element's font size is also reported (as font-size), you can store not only the large and small text of a document in a database but also the font sizes themselves as numbers. Turn your documents into a database that preserves their visual hierarchy, rather than one that holds only extracted plain text.
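As a sketch of what "font sizes stored as numbers" enables, the example below loads hypothetical (text, font size) pairs into an in-memory SQLite table and queries the visual hierarchy directly. The table name `element` and the sample values are illustrative assumptions.

```python
import sqlite3

# Hypothetical rows: (text, font size in px) pairs as they might be pulled out
# of the HTML returned for one document. Values are illustrative only.
rows = [
    ("Quarterly Report", 24.0),
    ("Sales rose 12% year over year.", 11.0),
    ("Outlook", 18.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE element (text TEXT NOT NULL, font_size REAL)")
conn.executemany("INSERT INTO element (text, font_size) VALUES (?, ?)", rows)

# Because font size is numeric, the visual hierarchy becomes queryable:
# fetch everything rendered larger than body text, biggest first.
headings = conn.execute(
    "SELECT text, font_size FROM element WHERE font_size > ? ORDER BY font_size DESC",
    (12,),
).fetchall()
print(headings)
conn.close()
```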

Context-aware serialization

(Invoice image source: invoicehome.com)

Upstage Layout Analyzer can extract data in the order in which the text is meant to be read in context, just as a person perceives a document. Existing OCR recognizes text without regard to document structure, so text that belongs to separate units of information is still read as a single line. Because Layout Analyzer analyzes document structure and recognizes chunks of information, it can eliminate the complex preprocessing that previously had to be applied after text extraction.
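The difference between line-by-line reading and context-aware reading can be illustrated with a toy two-column page. This is a simple column-split heuristic with made-up coordinates and a hypothetical `page_width`, not how Layout Analyzer actually determines reading order.

```python
# Hypothetical text blocks from a two-column page: (x0, y0, text), with the
# origin at the top-left and y increasing downward. Coordinates are made up.
blocks = [
    (50, 100, "Left col, line 1"),
    (320, 100, "Right col, line 1"),
    (50, 120, "Left col, line 2"),
    (320, 120, "Right col, line 2"),
]

def naive_order(blocks):
    """Line-by-line order (top to bottom, then left to right), like plain OCR."""
    return [t for _, _, t in sorted(blocks, key=lambda b: (b[1], b[0]))]

def column_order(blocks, page_width=600):
    """Toy heuristic: read the whole left column, then the whole right column."""
    mid = page_width / 2
    return [t for _, _, t in sorted(blocks, key=lambda b: (b[0] >= mid, b[1]))]

print(naive_order(blocks))
print(column_order(blocks))
```

The naive ordering interleaves the two columns into nonsense lines; the column-aware ordering keeps each column's text together, which is the kind of cleanup that otherwise has to be done after extraction.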


Relation extraction between elements

Layout Analyzer also extracts relationships between elements, specifically between tables and their captions and between figures and their captions. Detecting these relationships means that each caption is cross-referenced with its table or figure: when a table is recognized, its description is labeled as that table's caption, and when you select a caption, you can immediately render and view the corresponding table. As a result, the context of the whole document remains easy to understand even when only the text is extracted.
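One way to picture a two-way caption link is the sketch below, which pairs each caption with the vertically nearest table or figure. The nearest-box rule, the element shape, and the helpers `gap` and `link_captions` are hypothetical stand-ins, not Layout Analyzer's relation-extraction method.

```python
# Hypothetical elements with vertical extents (top, bottom) on one page.
elements = [
    {"id": "t1", "category": "table", "top": 100, "bottom": 300},
    {"id": "c1", "category": "caption", "top": 305, "bottom": 320, "text": "Table 1. Revenue"},
    {"id": "f1", "category": "figure", "top": 400, "bottom": 600},
    {"id": "c2", "category": "caption", "top": 610, "bottom": 625, "text": "Figure 1. Growth"},
]

def gap(a, b):
    """Vertical distance between two boxes (0 if they overlap)."""
    return max(0, max(a["top"], b["top"]) - min(a["bottom"], b["bottom"]))

def link_captions(elements):
    """Map each caption id to the id of the nearest table or figure."""
    targets = [e for e in elements if e["category"] in ("table", "figure")]
    links = {}
    for cap in (e for e in elements if e["category"] == "caption"):
        if targets:
            nearest = min(targets, key=lambda t: gap(cap, t))
            links[cap["id"]] = nearest["id"]
    return links

links = link_captions(elements)
# Reverse index: from a table or figure back to its caption.
by_target = {target: cap for cap, target in links.items()}
print(links, by_target)
```

Keeping both directions is what lets a caption lead to its table and a table lead back to its description.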

How to use Layout Analyzer

Beyond the limitations of OCR, here are some ways to make this technology even more useful.



Used for corporate LLM development

When building a corporate LLM (large language model), it is important to create a good knowledge base from the data the company owns. That data is not just plain text: more than 80% of it consists of documents in various formats, such as reports, tables, and emails accumulated within the company. When converting these documents into digital assets, Layout Analyzer lets you capture richer information. Turning documents into digital assets for LLM development and use follows this process: document structure analysis → markdown → vectorization → query embedding and LLM inference. Learn more about the digital assetization process for developing and leveraging your LLM here.
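The pipeline above can be sketched end to end in a few lines. This toy version assumes structured output as hypothetical (category, text) pairs, renders them to markdown chunks, and uses a bag-of-words count vector with cosine similarity as a stand-in for a learned embedding model; the LLM call itself is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical structured output for one document: (category, text) pairs.
elements = [
    ("heading", "Refund policy"),
    ("paragraph", "Customers may request a refund within 30 days of purchase."),
    ("heading", "Shipping"),
    ("paragraph", "Orders ship within two business days."),
]

def to_markdown_chunks(elements):
    """Markdown step: one chunk per heading plus the paragraphs under it."""
    chunks, current = [], []
    for category, text in elements:
        if category == "heading":
            if current:
                chunks.append("\n".join(current))
            current = [f"## {text}"]
        else:
            current.append(text)
    if current:
        chunks.append("\n".join(current))
    return chunks

def vectorize(text):
    """Vectorization stand-in: bag-of-words counts instead of a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks):
    """Query embedding step: vectorize the query and return the closest chunk."""
    q = vectorize(query)
    return max(chunks, key=lambda c: cosine(q, vectorize(c)))

question = "How many days do I have to get a refund?"
chunks = to_markdown_chunks(elements)
context = retrieve(question, chunks)
# Inference step: the retrieved chunk would go into the LLM prompt (model call not shown).
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

Chunking on the recognized headings keeps each answerable unit intact, which is where structure analysis pays off compared with chunking raw OCR text by length.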



Utilized in combination with generative AI

In 2023, with generative AI a hot topic, everyone is thinking about how to use it more efficiently. Generative AI answers text questions quite systematically and creatively. It is much harder, however, to get a proper answer about a document containing complex visual information such as tables, graphs, and dependencies between paragraphs. Even combining OCR with generative AI has meant manual work, or answers that miss what the document actually says; Layout Analyzer changes that experience. Easily turn your documents into other forms of knowledge, through document summarization, reorganization, report creation, and answering questions about more complex data, with Layout Analyzer.



Work automation

Automatically recognizing document elements with Layout Analyzer can save you a great deal of repetitive work. You can extract just the recipient's address from invoices in many different formats in one pass, or automatically extract main topics and subtopics from documents of varied formats and organize them easily. Important economic news published in newspapers every day can also be crawled, converted into data, and turned into reports more easily.
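For the invoice case, once elements come back in reading order, pulling the recipient's address can reduce to finding a label and taking what follows it. The sketch below assumes hypothetical reading-order output for two differently laid-out invoices; the "Bill To" / "Billed to" label variants and the `recipient_address` helper are illustrative assumptions.

```python
import re

# Hypothetical reading-order output for two invoices with different layouts.
# The label variants handled below are assumptions, not Layout Analyzer output.
invoices = {
    "inv-001": ["INVOICE", "Bill To", "Jane Doe, 12 Main St, Springfield", "Total: $120.00"],
    "inv-002": ["Invoice #88", "Billed to: ACME Ltd., 5 Harbor Rd, Portsmouth", "Due: 2024-01-15"],
}

# Matches "Bill To" or "Billed to", optionally followed by ':' and inline text.
LABEL = re.compile(r"^\s*bill(?:ed)?\s+to\b\s*:?\s*(.*)$", re.IGNORECASE)

def recipient_address(elements):
    """Return the text after a 'Bill To' label, inline or in the next element."""
    for i, text in enumerate(elements):
        m = LABEL.match(text)
        if m:
            inline = m.group(1).strip()
            if inline:
                return inline
            if i + 1 < len(elements):
                return elements[i + 1].strip()
    return None

addresses = {name: recipient_address(els) for name, els in invoices.items()}
print(addresses)
```

The "next element" rule only works because the label and its value are serialized in context order; with line-by-line OCR on a multi-column invoice, the next line could belong to an unrelated block.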




Sign up for update notifications

If you're curious about how to build your own knowledge base to streamline your work and increase business value, sign up for Layout Analyzer update notifications. We will introduce various use cases showing how new technologies can innovatively improve your products and services.


