Query Suggestion on Drugs e-Dictionary Using the Levenshtein Distance Algorithm



Introduction
The decision to use a drug (medication) always raises concerns about its benefits and risks, so a pharmacist needs a drug dictionary to look up previously unknown medical terms [1]. The drug dictionary is also one of the learning tools that pharmacists, students, and the Indonesian public use to learn about medicines and foreign medical terms. The drug dictionaries in use today are thick printed books, which are too heavy to carry and therefore impractical. This is one of the reasons Indonesian developers have competed to create electronic drug dictionaries, known as drug e-dictionaries. Most of the drug e-dictionaries developed so far are still letter-index based, so users have to look for words or terms one by one, sequentially. This is inefficient and ineffective, so a search function needs to be added to the drug e-dictionary. The search function is important because it gives users a shortcut to the words or terms they need, allowing them to search effectively and efficiently [2][3].
The search function of the drug e-dictionary needs to be optimized by adding a Query Suggestion facility. Query Suggestion is an interface between a user and a search engine [4]. This facility is an effective and efficient way to help users find information, because it offers suggestions when a query is mistyped in the search form [2][3][5][6]. This feature is important because it can improve the usability of searching [7][8]. It works by looking for the similarity between a correct query and a false query in a database [9]. The feature can help prevent users from typing the wrong name of a drug. Query Suggestion can be implemented in a search application using the Levenshtein Distance algorithm.
Research on query suggestion has been conducted by Jiang et al. (2008) in "Query suggestion by query search: a new approach to user support in web search" [3]. Meanwhile, research on the Levenshtein Distance algorithm was conducted by Ngafidin and Wibawanto (2015) in "Implementation of the Autocomplete Feature and the Levenshtein Distance Algorithm to Increase the Effectiveness of Word Search in the Indonesian Big Dictionary (KBBI)" [10]. This study aims to build a query suggestion facility for a drugs e-dictionary using the Levenshtein Distance algorithm, so that pharmacists, students, and the public can easily search for drug terms.

Research Methods
The research method used in this study consists of the stages shown in Figure 1, which are described below.

Development of Web-based Drugs e-Dictionary
The web-based drugs e-dictionary is developed using the SDLC (System Development Life Cycle) method, adapted to the needs of a web-based drugs e-dictionary [11][12]. The stages are Plan, Analysis, Design, Code, and Testing.

Implementation of the Levenshtein Distance Algorithm
The Levenshtein Distance algorithm is implemented in the PHP programming language and added to the search function of the drugs e-dictionary.

Testing Query Suggestion
Testing is done by entering 100 drug terms into the search form: 50 correct terms and 50 incorrect (misspelled) terms.

Usage
After testing, the drugs e-dictionary is hosted so that it can be used by users.

Plan
In the planning stage, data is collected from the ISO (Informasi Spesialite Obat) Indonesia book [13]. The collected data consist of drug categories, drug names, indications, contraindications, side effects, drug interactions, dosages, packaging, and drug warnings.

Analysis
In this stage, the functional and non-functional requirements of the system are collected. There are 28 functional requirements: 10 for the front end and 18 for the back end. The analysis produced 7 non-functional requirements.

Design
Next, in the design stage, the search system flow is developed, as depicted in Figure 2:
1. The user accesses the drugs e-dictionary website.
2. The user searches for a drug name or term in the search form.
   - If the query is empty, the system shows the empty-query notification "query has not been inserted".
   - If the query is in the database, the system shows the search results.
   - If the query is not in the database, the system runs the Levenshtein Distance algorithm and then shows a query suggestion.
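The three branches of this flow can be sketched in Python as below. This is a minimal sketch, not the system's PHP code: the in-memory list `TERMS`, the functions `search` and `levenshtein`, and the returned strings other than the empty-query notification are hypothetical, and a real deployment would query the MySQL drug table instead. Matching is case-insensitive, as the system's query checking is.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with two rolling rows of the DP matrix."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def search(query: str, terms: list[str]) -> str:
    q = query.strip().lower()
    # Branch 1: empty query
    if not q:
        return "query has not been inserted"
    # Branch 2: exact (case-insensitive) match in the database
    lowered = {t.lower(): t for t in terms}
    if q in lowered:
        return f"result: {lowered[q]}"
    # Branch 3: fall back to the term with the smallest edit distance
    best = min(terms, key=lambda t: levenshtein(q, t.lower()))
    return f"did you mean: {best}"


TERMS = ["Dapyrin", "Zalona", "Paracetamol"]  # hypothetical sample terms
print(search("", TERMS))
print(search("paracetamol", TERMS))
print(search("diparin", TERMS))
```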

Code
In this implementation stage, the system is developed in PHP, using MySQLi for the database connection. The result is a web-based drugs e-dictionary. It offers three ways to search: the main search page, which searches by drug term (Figure 3); search by disease indication (Figure 4); and A-Z index-based browsing (Figure 5).

Testing
Black-box testing, usually called system functional testing, is used in the testing stage [11]. Based on the test, all 28 system functions run as expected.

Levenshtein Distance Algorithm Implementation
The Levenshtein Distance algorithm was created by Vladimir Levenshtein in 1965 [14]. The algorithm measures the distance between the word entered by the user and the words stored in the system database by counting the differences between the two strings in the form of a matrix [15][16]. It computes the minimum number of edit operations needed to change string A into string B. The calculation is represented as a Levenshtein distance table, in which the value in the lower-right corner is the final distance between the two strings. The algorithm uses three operations: substituting a character, inserting a character, and deleting a character [17][18][19]. Figure 6 shows the pseudocode of the Levenshtein distance algorithm, which can be computed manually as shown in Figure 7 and Figure 8.
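The matrix computation described above can be written as a short Python sketch. It follows the standard dynamic-programming formulation; the names `levenshtein_matrix` and `levenshtein` are illustrative, not the system's PHP functions. The first row and column are initialized to 0..n and 0..m, each remaining cell takes the minimum of the top, side, and diagonal candidates, and the distance is read from the lower-right corner.

```python
def levenshtein_matrix(source: str, target: str) -> list[list[int]]:
    """Fill the (len(source)+1) x (len(target)+1) Levenshtein table."""
    m, n = len(source), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters of source
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters of target
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # top: delete source[i-1]
                d[i][j - 1] + 1,         # side: insert target[j-1]
                d[i - 1][j - 1] + cost,  # diagonal: match or substitute
            )
    return d


def levenshtein(source: str, target: str) -> int:
    # The distance is the value in the lower-right corner of the table.
    return levenshtein_matrix(source, target)[-1][-1]


print(levenshtein("diparin", "dapyrin"))  # 2
```

For the query "diparin" from the validation test, the distance to "dapyrin" is 2, because two characters (i to a, a to y) must be substituted. The full table needs O(mn) time and memory, which is negligible for drug names of a few dozen characters.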

Testing the Drugs e-Dictionary with the Added Query Suggestion Facility
The tests are validation tests, carried out by entering 100 test queries into the search form. Table 1 summarizes the results of the validation test on the drugs e-dictionary, and Figure 9 shows an example of query suggestion testing. Whenever an entered query is not in the database, the system shows the notification "The inputted query does not exist in the database" and then offers query suggestions by retrieving the closest terms in the database. For example, for the query "diparin" (a term not in the database), the system suggests "dapyrin" (Table 1).
The system checks queries case-insensitively, so the output is the same whether the query is entered in uppercase or lowercase. If the entered query is a meaningless word such as "ZZZZ", the system still shows a suggestion, in this case "Zalona", a term beginning with the letter Z. In every case, the system searches for the terms that require the fewest Levenshtein Distance operations.
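The case-insensitive matching and minimum-distance ranking described above can be sketched as follows. This is an illustration under stated assumptions rather than the system's PHP code: the function `suggest`, the parameter `k` (number of suggestions returned), and the three-term list are hypothetical. Both the query and each stored term are lowercased before the distance is computed.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def suggest(query: str, terms: list[str], k: int = 3) -> list[str]:
    """Return up to k terms closest to the query, ignoring letter case."""
    q = query.lower()
    # sorted() is stable, so equally distant terms keep their stored order
    return sorted(terms, key=lambda t: levenshtein(q, t.lower()))[:k]


TERMS = ["Dapyrin", "Zalona", "Paracetamol"]  # hypothetical sample terms
print(suggest("DIPARIN", TERMS, k=1))
print(suggest("ZZZZ", TERMS, k=1))
```

With this toy list, "DIPARIN" returns "Dapyrin" (distance 2), and "ZZZZ" returns "Zalona", which needs 5 edits against 7 for "Dapyrin" and 11 for "Paracetamol".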
Accuracy is evaluated with a confusion matrix (Table 2), which has four classification outcomes: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) [20].
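Accuracy, precision, and recall follow from the four counts in the usual way: accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), and recall = TP / (TP + FN). The Python sketch below encodes these definitions; the class name is illustrative, and the counts in the usage line are hypothetical, not the values in Table 2.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives

    @staticmethod
    def _ratio(num: int, den: int) -> float:
        # Return 0.0 for an empty denominator instead of raising ZeroDivisionError
        return num / den if den else 0.0

    def accuracy(self) -> float:
        return self._ratio(self.tp + self.tn, self.tp + self.tn + self.fp + self.fn)

    def precision(self) -> float:
        return self._ratio(self.tp, self.tp + self.fp)

    def recall(self) -> float:
        return self._ratio(self.tp, self.tp + self.fn)


cm = ConfusionMatrix(tp=8, tn=6, fp=2, fn=4)  # hypothetical counts
print(cm.accuracy(), cm.precision(), cm.recall())
```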

Conclusion
The search function of the drug e-dictionary needs to be optimized by adding a Query Suggestion facility, which this study developed using the Levenshtein Distance algorithm. In the implementation, the algorithm fills a two-dimensional array starting from the top-left corner; the first row and column are initialized for the characters of the source and target strings, and each cell is assigned a cost value. The cost value in the lower-right corner is the edit distance, which represents the number of operations needed to turn one string into the other. Based on the test results, the system can handle words that are not in the database by suggesting the closest terms in the database. It reaches 90% accuracy, 90% precision, and 90% recall in the confusion matrix. Future work includes implementing n-grams in the drugs e-dictionary and comparing the Levenshtein distance algorithm with n-grams.

LONTAR

Figure 1. Research Method

Figure 2. Developed Search System Flow

Figure 7. The first row and column initialization
- For each character, compare the character of the inputted word with the character of the word in the database. If they match, the cost is 0; otherwise, the cost is 1.
- Take the minimum for d[i,j] of: Top = d[i-1,j] + 1, Side = d[i,j-1] + 1, and Diagonal = d[i-1,j-1] + cost.
- For example, when comparing character P with P, the cost is 1 if they differ and 0 otherwise.
- Repeat until all values of d[i,j] have been computed.

Figure 8. Manual computation process of the Levenshtein Distance algorithm
In Figure 8, the distance is the value in the lower-right corner of the matrix, which is 1. A value of 1 means that one operation is performed. This value is obtained by adding the cost to the minimum diagonal value. A distance value obtained from the diagonal means that the operation performed is a substitution.
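Which neighbour produced each cell is what identifies the operation. The following Python sketch (the function `edit_operations` is hypothetical, not part of the described system) rebuilds the table and walks back from the lower-right corner. It reports a substitution when a diagonal step adds a cost of 1, a deletion for a step from the top, and an insertion for a step from the side. When several edit paths are equally short, it returns only one of them, preferring the diagonal.

```python
def edit_operations(source: str, target: str) -> list[str]:
    """Recover one minimal edit sequence by tracing the table backwards."""
    m, n = len(source), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)

    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if source[i - 1] == target[j - 1] else 1
            if d[i][j] == d[i - 1][j - 1] + cost:  # came from the diagonal
                if cost:
                    ops.append(f"substitute {source[i - 1]}->{target[j - 1]}")
                i, j = i - 1, j - 1
                continue
        if i > 0 and d[i][j] == d[i - 1][j] + 1:  # came from the top
            ops.append(f"delete {source[i - 1]}")
            i -= 1
        else:  # came from the side
            ops.append(f"insert {target[j - 1]}")
            j -= 1
    return list(reversed(ops))


print(edit_operations("diparin", "dapyrin"))
```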

Figure 9. Result of Levenshtein Distance algorithm implementation on drugs e-dictionary

Table 1. Examples of words in the query suggestion validation test

In Table 1, the validation tests are categorized into insert, delete, and substitution operations.