Advances in artificial intelligence have invigorated research on communication between humans and computers, producing tangible results. Work on how a computer understands and translates human speech and, based on analysis and reasoning, generates sentences in response is still being actively conducted. However, one prominent feature that computers have not yet fully extracted from human language is sentiment. The sentiments contained in human language are expressed in every area of society and used in diverse ways. For example, the author of an academic thesis expresses positive or negative sentiment in the text toward his own theories or toward existing ones. We actively express our sentiments on the social media we encounter and participate in every day, and we influence economic activity through positive or negative evaluations of purchased products. In this respect, automatically extracting sentiment from human text is a very important research field; it is a key technology that can create economic value as a new business model and is attracting attention as a new industrial field.
In general, the sentiment contained in a sentence in human speech acts can be divided into positive, negative, and neutral. In this paper, we study the triplet extraction method, newly introduced in the course of developing technology to automatically extract these three speaker sentiments from a sentence, together with BIO tagging as a basic labeling technique for it. To this end, we analyze in depth the triplet-related research results published in 2020 and 2021, examine their strengths and weaknesses, and apply the findings to the new triplet method we are developing.

Our company has researched and developed a triplet extractor that automatically identifies the sentiment of a sentence from Korean data. As shown in <Figure 11>, we built and tested the triplets for Korean (Hangul) data predicted by a GTS model trained with the Multilingual BERT model. The accuracies for aspect and opinion terms are 0.7 and 0.6, respectively. These are not very high scores, but we judge that accuracy will increase with additional data acquisition and model updates. The triplet accuracy is 0.3, which is still low. We speculate that this is because training was conducted with the multilingual model rather than a Korean-specific model. We therefore plan to build a Korean training data set and train a Korean-specific model, which we expect will soon yield higher accuracy.
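To make the labeling scheme concrete, the sketch below shows how BIO tags mark aspect and opinion spans in a sentence and how those spans combine into an (aspect, opinion, sentiment) triplet. It is a minimal illustration only: the example sentence, the tag names `B-ASP`/`B-OPN`, and the `extract_spans` helper are our own assumptions for exposition, not the actual input or output of the GTS model discussed above, which predicts span pairing and polarity jointly rather than from fixed tag sequences.

```python
# Minimal sketch of BIO tagging for aspect sentiment triplet extraction.
# Tag names (B-ASP/I-ASP, B-OPN/I-OPN) and the example are hypothetical.

def extract_spans(tokens, tags):
    """Collect token spans marked with B-/I- prefixes (BIO scheme)."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new span begins
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag.startswith("I-") and current:
            current.append(token)         # continue the open span
        else:                             # "O" tag closes any open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens       = ["The", "battery", "life", "is", "really", "great"]
aspect_tags  = ["O", "B-ASP", "I-ASP", "O", "O", "O"]
opinion_tags = ["O", "O", "O", "O", "B-OPN", "I-OPN"]

aspects  = extract_spans(tokens, aspect_tags)    # ['battery life']
opinions = extract_spans(tokens, opinion_tags)   # ['really great']

# Pair each aspect with each opinion and attach a polarity; a trained
# model predicts which pairs belong together and with which sentiment.
triplets = [(a, o, "positive") for a in aspects for o in opinions]
print(triplets)  # [('battery life', 'really great', 'positive')]
```

In this scheme the aspect term names what is being evaluated, the opinion term carries the evaluative expression, and the polarity is one of the three classes (positive, negative, neutral) described above.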