Proposing a novel community detection approach to identify co-interacting genomic regions

Modern next-generation sequencing technologies produce huge amounts of genome-wide data that allow researchers to gain a deeper understanding of the genomics of organisms. Despite these huge amounts of data, our understanding of transcriptional regulatory networks is still incomplete.


Introduction
Nowadays, in medical bioinformatics, the complexity of biomedical data [1] makes the identification of disease-related factors challenging, especially for genomic factors that work together to perform a biological function [2]. Detecting these interacting genomic elements is therefore important for a better understanding of disease factors. Chromosome Conformation Capture (3C) assays are now the method of choice for studying the role of DNA looping in transcriptional regulation. These assays directly identify genomic loci that are brought close enough to each other in living cells to be cross-linked. This technology now allows chromatin interactions to be mapped at the whole-genome level. Cabreros et al. applied a community detection algorithm to Hi-C data to detect communities of interacting genomic regions in mice and humans. Their algorithm detected a variety of communities, including communities of neighboring DNA locations [3]. In 2016, Fotuhi et al. presented a multivariate clustering algorithm for analyzing chromosome conformation data to identify patterns of chromosomal interactions [4]. In 2016, Li et al. presented a multi-objective optimization algorithm based on Particle Swarm Optimization (PSO) to detect communities in social networks [5]. This algorithm was able to detect communities of nodes in each run. To test its effectiveness, the authors performed extensive experiments on artificial and real data, which showed that their method worked better than previous methods in the literature [5]. In 2019, Zhou et al. proposed a graph-based clustering approach called "AR-Cluster" to identify communities in complex networks [6]. In this method, the nodes of the graph are grouped together within a K-medoids framework.
As mentioned earlier, community detection in complex networks is challenging in many research fields, such as computer science, social networks, biology, physics and medicine. Many of the proposed methods rely on network topology, similarities between node attributes, or the in-degree and out-degree of each vertex [7][8][9][10][11][12][13][14][15]. However, when the graph is large and complex, identifying communities can be inefficient or time-consuming [16][17][18][19][20]. Therefore, community detection in complex graphs has always been challenging [21][22][23][24][25][26]. To address this challenge, in this study we propose a Pareto-based multi-objective optimization genetic algorithm to identify communities in complex networks. We applied the proposed method to Hi-C interactions in the mouse genome to identify interacting genomic regions. Our benchmarking results demonstrate that the proposed method works better than existing methods in the literature for identifying interacting genomic regions.

Material and methods
In the following sections, our proposed genetic-based multi-objective optimization algorithm for identifying communities in Hi-C interactions of genomic regions is explained. In the network, genomic regions are represented as nodes and the interactions between them as edges, and each edge is weighted by the number of interactions between its two vertices. In this study, the Hi-C data were obtained from the NCBI database (GSE35156 and GSE69600) and analyzed with the HiC-Pro package [27].
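As a minimal sketch of how such a weighted network might be assembled, the Python function below reads contact records of the form `bin_i bin_j count` and stores an undirected adjacency map whose edge weights are the contact counts. The input layout, the function name `build_weighted_graph`, the assumption that each pair is listed once (for example, upper triangle only), and the choice to skip self-contacts are illustrative assumptions, not details of the HiC-Pro processing used in this study.

```python
from collections import defaultdict


def build_weighted_graph(lines):
    """Build an undirected weighted graph from 'bin_i bin_j count' records.

    Assumptions (hypothetical input format): each record holds two integer
    bin ids and a contact count, and each pair appears only once.
    Self-contacts (bin_i == bin_j) are skipped because they do not link two
    distinct genomic regions.
    Returns {node: {neighbour: weight}}.
    """
    adj = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed records
        i, j, w = int(fields[0]), int(fields[1]), float(fields[2])
        if i == j:
            continue
        # Undirected edge: store the weight in both directions.
        adj[i][j] = adj[i].get(j, 0.0) + w
        adj[j][i] = adj[j].get(i, 0.0) + w
    return dict(adj)
```

For example, `build_weighted_graph(["1 2 5", "2 3 1.5"])` yields vertex 2 linked to vertex 1 with weight 5.0 and to vertex 3 with weight 1.5.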

Problem statement
An undirected weighted graph represents a network of nodes and edges and can be written as $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. The graph $G$ consists of $|V| = N$ nodes, $V = \{v_1, v_2, \dots, v_N\}$, and $|E| = M$ edges, $E = \{e_1, e_2, \dots, e_M\}$, with edge weights $W = \{w_1, w_2, \dots, w_M\}$. The set of communities of the graph is represented as $C = \{c_1, c_2, \dots, c_{n_c}\}$, in which each $c_i \in C$ represents a community of $G$.

Detection of communities in the graph
In this section, our proposed method for exploring and extracting communities in the genomic network is described. The proposed approach is a multi-objective genetic optimization algorithm based on Pareto optimization. To explain our model, assume that the graph $G = (V, E)$ defined in the previous section is the input of the algorithm. The objectives of our proposed method are explained one by one below.

First objective: Modularity function
The modularity function $f_1$ of $G$ is defined as follows:

$$f_1 = \sum_{i=1}^{n_c} \left[ \frac{n_{e_i}}{M} - \left( \frac{d_{v_i}}{2M} \right)^2 \right] \qquad (1)$$

In this function, $n_c$ is the total number of communities, $n_{e_i}$ the number of edges within community $i$, $d_{v_i}$ the sum of the degrees of the vertices in community $i$, and $M$ the total number of edges in the graph [28][29][30][31]. In the proposed algorithm, the value of $f_1$ lies in the interval [0,1], and higher values are better, i.e., $f_1$ is to be maximized [19].
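The modularity objective can be computed directly from the quantities defined above: the number of edges inside each community, the summed vertex degree of each community, and the total number of edges M. The sketch below assumes an unweighted edge list with 0-based vertex ids and a label per vertex; the function name `modularity` is hypothetical.

```python
def modularity(edges, labels):
    """Modularity: sum over communities of n_e_i / M - (d_v_i / (2M))**2.

    edges  : list of (u, v) pairs of an undirected, unweighted graph
    labels : labels[v] is the community label of vertex v
    """
    m = len(edges)
    if m == 0:
        return 0.0
    inner = {}   # n_e_i: number of edges with both endpoints in community i
    degree = {}  # d_v_i: total degree of the vertices in community i
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inner[cu] = inner.get(cu, 0) + 1
    # Communities made only of isolated vertices contribute 0 and are skipped.
    return sum(inner.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())
```

For two triangles joined by a single edge, labeling each triangle as its own community gives 5/14 (about 0.357), while putting all six vertices in one community gives 0.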

Second objective: Average weight vertex function
The average vertex weight function $f_2$ of the communities is defined as follows (Eq 2). Here, $N_i$ is the number of vertices in the $i$-th community and $W_{jk}$ is the weight of the edge between vertices $j$ and $k$. $f_2$ is the weighted average over the communities obtained by the proposed algorithm, and it is best when maximal. Because $f_1$ reflects only the topology of a partition while $f_2$ reflects the interaction weights, both objectives ($f_1$ and $f_2$) must be considered when detecting communities in genomic graphs. Therefore, in each run, both objectives are maximized simultaneously.
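Because the exact form of the average vertex weight function is not reproduced here, the sketch below assumes one plausible reading: for each community, the total weight $W_{jk}$ of its internal edges divided by its size $N_i$, averaged over all communities. Treat the function `average_vertex_weight` and this normalization as a hypothetical illustration, not as the authors' definition.

```python
def average_vertex_weight(weighted_edges, labels):
    """Hypothetical f2: mean over communities of (internal edge weight / N_i).

    weighted_edges : list of (j, k, W_jk) for an undirected graph
    labels         : labels[v] is the community label of vertex v
    """
    size = {}  # N_i: number of vertices in community i
    for c in labels:
        size[c] = size.get(c, 0) + 1
    internal = dict.fromkeys(size, 0.0)  # summed W_jk with j and k in community i
    for j, k, w in weighted_edges:
        if labels[j] == labels[k]:
            internal[labels[j]] += w
    return sum(internal[c] / size[c] for c in size) / len(size)
```

Under this assumed form, edges (0, 1, 4.0), (1, 2, 2.0), (2, 3, 6.0) with labels [0, 0, 1, 1] give (4/2 + 6/2) / 2 = 2.5; the edge between the two communities does not contribute.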

Multi-Objective Optimization (MOO) concept
A multi-objective optimization problem with $m$ decision variables and $n$ objectives is defined in Eq (3) [32]:

$$\max\; y = F(x) = \left(f_1(x), f_2(x), \dots, f_n(x)\right), \quad x \in X,\; y \in Y \qquad (3)$$

where $x = (x_1, \dots, x_m) \in X$ is the $m$-dimensional decision vector, $X$ is the search space, $y = (y_1, \dots, y_n) \in Y$ is the objective vector and $Y$ is the objective space. In general, in MOO there is no single solution that is optimal for all objectives at once. Instead, the optimum is a set of trade-off solutions, in which no objective can be improved without worsening another [25,33,34,35,36]. This set is known as the Pareto optimal set. Some of the Pareto concepts used in multi-objective optimization are explained below.

Concept of Pareto dominance
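To compare the quality of two solutions, we use the concept of dominance. For two decision vectors $x_1$ and $x_2$, dominance (denoted by $\prec$, where $x_1 \prec x_2$ reads "$x_1$ dominates $x_2$") is defined in Eq (4):

$$x_1 \prec x_2 \iff \forall i \in \{1, \dots, n\}: f_i(x_1) \ge f_i(x_2) \;\wedge\; \exists j \in \{1, \dots, n\}: f_j(x_1) > f_j(x_2) \qquad (4)$$

That is, the decision vector $x_1$ dominates $x_2$ if and only if $x_1$ is no worse than $x_2$ in all objectives and strictly better than $x_2$ in at least one objective [34].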
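For maximized objectives such as $f_1$ and $f_2$, the dominance test reduces to a short predicate. The sketch below is generic (the name `dominates` is ours) and works on objective vectors of any length.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a Pareto-dominates b, all objectives maximized."""
    no_worse = all(x >= y for x, y in zip(a, b))        # no worse in every objective
    strictly_better = any(x > y for x, y in zip(a, b))  # better in at least one
    return no_worse and strictly_better
```

Two solutions that trade off against each other, such as (0.6, 1.0) and (0.4, 2.0), do not dominate each other in either direction.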

Pareto optimal set
A decision vector $x_1$ is called Pareto optimal when it is not dominated by any other decision vector $x_2$ in the set.
The set of all Pareto optimal decision vectors is referred to as the Pareto optimal set (PS).

Pareto optimal front
The Pareto optimal front (PF) is the image of the Pareto optimal set in the objective space.
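For a finite set of candidate solutions, the Pareto optimal front can be illustrated by keeping only the objective vectors that no other vector dominates. The quadratic-time filter below is a simple illustrative sketch; the name `pareto_front` is ours, and it is not the procedure used in the proposed algorithm, which relies on non-dominated sorting.

```python
def pareto_front(points):
    """Return the objective vectors not dominated by any other (all maximized)."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    # A point never dominates itself, so comparing against the whole list is safe.
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

With the points (0.3, 5.0), (0.5, 4.0), (0.4, 3.0), (0.6, 1.0) and (0.2, 5.0), the front is (0.3, 5.0), (0.5, 4.0) and (0.6, 1.0): (0.4, 3.0) is dominated by (0.5, 4.0) and (0.2, 5.0) by (0.3, 5.0).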

Crowding distance
The next concept used in Pareto-based multi-objective optimization is the crowding distance, which is calculated separately for each objective function. For example, with two objective functions $f_1$ and $f_2$, for each solution $i$ we compute, for each objective, the distance between the two solutions adjacent to $i$ on the same front, and then take the sum of these two distances as the crowding distance of solution $i$. The crowding distance of solution $i$ is calculated as:

$$CD_i = \sum_{k=1}^{n} d_i^k \qquad (7)$$

and the distance of solution $i$ for each objective function $f_k$ is calculated as:

$$d_i^k = \frac{\left| f_k^{\,i-1} - f_k^{\,i+1} \right|}{f_k^{\max} - f_k^{\min}} \qquad (8)$$

where $f_k^{\max}$ and $f_k^{\min}$ are the maximum and minimum values of objective $f_k$, respectively, and $f_k^{\,i-1}$ and $f_k^{\,i+1}$ are the values of $f_k$ for the solutions immediately before and after solution $i$. In other words, for each objective function the solutions are first sorted in descending order, so the first value is $f_k^{\max}$ and the last is $f_k^{\min}$, and the previous and next solutions of each solution can be read directly from the sorted order. Eq (8) is computed for each $i$, and summing $d_i^k$ over all objective functions gives the crowding distance [33,34,37,38].
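The crowding distance described above can be sketched as follows for a single front. Following the common NSGA-II convention, which the text does not state and is therefore an assumption here, the two boundary solutions of each objective receive an infinite distance so that they are always preferred.

```python
import math


def crowding_distance(front):
    """Crowding distance of each objective vector in one Pareto front.

    front: list of objective vectors (tuples) belonging to the same front.
    Returns a list of distances aligned with `front`.
    """
    n = len(front)
    if n == 0:
        return []
    distance = [0.0] * n
    for k in range(len(front[0])):
        # Sort solutions by objective k in descending order, as in the text.
        order = sorted(range(n), key=lambda i: front[i][k], reverse=True)
        f_max, f_min = front[order[0]][k], front[order[-1]][k]
        # Assumed NSGA-II convention: boundary solutions get infinite distance.
        distance[order[0]] = distance[order[-1]] = math.inf
        if f_max == f_min:
            continue  # every solution ties on this objective: no contribution
        for pos in range(1, n - 1):
            prev_val = front[order[pos - 1]][k]
            next_val = front[order[pos + 1]][k]
            # Normalized gap between the two neighbours of this solution.
            distance[order[pos]] += abs(prev_val - next_val) / (f_max - f_min)
    return distance
```

On the front (1, 8), (2, 6), (4, 4), (5, 2), the two interior solutions each receive 3/4 + 4/6 (about 1.417), and the two extreme solutions receive infinity.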

Non-dominated Sorting (NS) algorithm
We use this algorithm to sort the solutions and determine the Pareto fronts. It works as follows:
1) For each member $p$ of the population, define a set $S_p$, initially empty, and a counter $n_p$, initially zero, where $S_p$ is the set of solutions dominated by $p$ and $n_p$ is the number of solutions that dominate $p$.
2) For each pair $p$ and $q$ of population members: if $p$ dominates $q$, add $q$ to $S_p$; if $q$ dominates $p$, increase $n_p$ by one.
3) Add all members of the population with $n_p = 0$ to $F_1$ (the first Pareto front). Each subsequent front $F_{k+1}$ is built from the current front $F_k$: once the effect of the members of $F_k$ is removed, the members that are no longer dominated form $F_{k+1}$.
4) Set the front counter to 1, i.e., $k = 1$.
5) Let $Q$ be an empty draft of $F_{k+1}$.
6) For each member $p$ of $F_k$ and each member $q$ of $S_p$ (all solutions dominated by $p$), decrease $n_q$ by one, i.e., remove the effect of $p$ on $q$.
7) If $n_q$ reaches 0 during this decrement, add $q$ to $Q$.
8) If $Q$ is empty (nothing is left to add), the sorting process is over. Otherwise, set $F_{k+1} = Q$, increase $k$ by one and go to step 5.
In this way, the Pareto fronts are completed gradually [32,39].
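The steps above correspond to the fast non-dominated sorting procedure known from NSGA-II. A compact Python sketch, assuming maximized objectives and returning each front as a list of population indices, is:

```python
def non_dominated_sort(objs):
    """Split solutions into Pareto fronts F1, F2, ... (all objectives maximized).

    objs: list of objective vectors. Returns a list of fronts (lists of indices).
    """
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    n = len(objs)
    s = [[] for _ in range(n)]  # S_p: solutions dominated by p
    count = [0] * n             # n_p: number of solutions dominating p
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                s[p].append(q)
            elif dominates(objs[q], objs[p]):
                count[p] += 1
    fronts = [[p for p in range(n) if count[p] == 0]]  # step 3: first front F1
    while fronts[-1]:
        draft = []  # Q, the draft of the next front
        for p in fronts[-1]:
            for q in s[p]:
                count[q] -= 1  # remove the effect of p on q
                if count[q] == 0:
                    draft.append(q)
        fronts.append(draft)
    return fronts[:-1]  # drop the trailing empty front (stopping condition)
```

For the objective vectors (1, 1), (2, 2), (3, 1), (1, 3), (0, 0), the fronts are [1, 2, 3], then [0], then [4].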

Multi-objective genetic optimization algorithm based on Pareto
As mentioned before, the proposed community detection algorithm for the genomic network is a multi-objective optimization algorithm based on Pareto optimization. The evaluation function is optimal when it is maximized, that is, when both $f_1$ and $f_2$ are maximal:

$$\max f_1, \quad \max f_2 \qquad (9)$$

In this research, we have converted the GA into a multi-objective algorithm for community detection by adding the following steps:
1) Assessing the quality of the solutions based on the concept of dominance, using the Non-dominated Sorting (NS) algorithm.
2) Arranging the solutions based on the crowding distance.
In practice, the multi-objective behavior of the algorithm is obtained by adding the following steps to the selection stage: a) non-dominated sorting, b) calculating the crowding distance, and c) sorting the solutions. In multi-objective optimization, two criteria matter, the quality of the solutions and their distribution: i) we look for a good approximation of the Pareto front, meaning that the solutions obtained are non-dominated; ii) these solutions cover virtually the whole Pareto front.
The goal in solving a multi-objective optimization problem is to find a set of solutions that has both quality and a good distribution. An algorithm is suitable only when it first achieves good quality and second provides a good distribution. Here, our primary criterion for comparing solutions is dominance (i.e., which solution dominates the other).
If dominance does not allow us to choose between two solutions, the second criterion is their distribution, measured by the crowding distance. The proposed method is described in Figure 1.
The highlights of the proposed algorithm are: i) A solution that no other solution is better than receives the highest score; solutions are ranked according to how many solutions are better than them.
ii) The fitness of each solution is based on its rank and on whether it is dominated by other solutions.
iii) Fitness-based crossover is applied to neighboring solutions so that the distribution of solutions is well adjusted and the solutions are spread uniformly across the search space.
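Selection and replacement compare solutions by rank first and crowding distance second. The sketch below expresses that rule as a comparison function and uses the same ordering to keep the best members of a merged parent and offspring population; the function names and the convention that rank 1 denotes the first Pareto front are assumptions for illustration.

```python
def crowded_better(rank_p, dist_p, rank_q, dist_q):
    """Crowded comparison: lower rank (better front) wins; ties go to larger distance."""
    if rank_p != rank_q:
        return rank_p < rank_q
    return dist_p > dist_q


def select_survivors(ranks, distances, size):
    """Keep the `size` best members of a merged population.

    ranks[i]     : front index of member i (1 = first Pareto front)
    distances[i] : crowding distance of member i within its front
    Returns the indices of the survivors, best first.
    """
    # Sort by rank ascending, then by crowding distance descending.
    order = sorted(range(len(ranks)), key=lambda i: (ranks[i], -distances[i]))
    return order[:size]
```

For ranks [2, 1, 1, 3, 2] and crowding distances [0.5, inf, 0.2, 9.0, 1.5], keeping three members selects indices 1, 2 and 4: both first-front members, then the second-front member with the larger crowding distance.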

Main components
In this section, we introduce the main components of the proposed algorithm and describe each of them.
1) The process of deleting and re-selecting
Generally, parent members for the crossover operator are selected probabilistically; that is, each member of the population may take part in creating a child with a specific probability $p_c$. The following should also be considered when choosing the parents. a) Because parents are selected probabilistically, the same member may be selected twice, i.e., play the role of both parents at once. In this case, the child is identical to that parent, so a check should be applied to avoid such unnecessary crossovers. b) A member may also be selected as a parent many times, which is problematic when selection is based on fitness.
Before introducing this component, we first describe the comprehensive random selection method, from which its ideas are taken.

2) Comprehensive random selection method
Using comprehensive random selection, members of the population can be selected based on their objective function; the probability of selecting a chromosome is proportional to its objective function value. This method can reduce the time needed to find optimal solutions, but it has its own disadvantages. In the early generations, the fitness values usually differ widely, so chromosomes with greater fitness are far more likely to be selected and a few superior chromosomes tend to dominate the selection process. In the late generations, when the population has largely converged, the fitness values of the chromosomes are closely matched, so competition between them is weak, selection is almost random, and most chromosomes have roughly equal chances of being chosen.

The steps of the proposed algorithm shown in Figure 1 are:
1. Create an initial population.
2. Calculate the fitness criteria.
3. Sort the population based on the dominance conditions.
4. Calculate the crowding distance.
5. Selection: once the initial population has been sorted according to the dominance conditions and the crowding distances have been calculated, selection from the initial population starts. It is based on two elements:
5.1. Rank: members are selected from the lower (better) ranks first.
5.2. Crowding distance: if p and q are two members of the same rank, the member with the greater crowding distance is selected. The priority of selection is thus first rank and then crowding distance.
6. Perform crossover and mutation to produce new offspring.
7. Merge the primary population with the population obtained from crossover and mutation.
8. Replace the parent population with the best members of the merged population. The primary population and the population produced by crossover and mutation are first categorized by rank; lower-ranking (better) members replace older ones, and the worst-ranked members are eliminated.
9. Arrange the remaining population according to crowding distance; here the sorting is done within one front.
10. Repeat all stages until the desired generation (or optimal conditions) is reached.

In the deleting and re-selecting process, the proposed algorithm first selects two parents for crossover, as in the general genetic algorithm, using binary tournament selection. The goal is for high-quality chromosomes to take part in crossover; however, the two selected parents may not be the best in the population. The idea of comprehensive random selection is used here through a control parameter for replacing the worse chromosome, so that the best chromosomes are selected for crossover. The value of this parameter in our tests was 0.005. If the difference between the two selected parents exceeds the control parameter, the worse chromosome is removed from the crossover cycle and another parent is selected. This removal and re-selection continues until the difference between the parents is less than the control parameter. Parents are compared using a fitness function equal to the sum of $f_1$ and $f_2$, and the parent with the lower fitness value is the one removed. Figure 2 shows the pseudocode of the deleting and re-selecting process.
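The deleting and re-selecting process can be sketched as below, assuming that fitness is $f_1 + f_2$ and that the control parameter is 0.005, as stated above. The binary tournament helper, the hypothetical `max_tries` safeguard, and re-selecting when the same member is drawn as both parents (point a of the parent-selection considerations) are illustrative additions rather than details confirmed by the source.

```python
import random


def tournament(fitness, rng):
    """Binary tournament: draw two distinct members and return the fitter one's index."""
    a, b = rng.sample(range(len(fitness)), 2)
    return a if fitness[a] >= fitness[b] else b


def select_parents(fitness, rng, threshold=0.005, max_tries=100):
    """Sketch of deleting and re-selecting parents.

    fitness[i] is f1 + f2 for member i. While the two parents are the same
    member, or their fitness gap exceeds `threshold`, the weaker parent is
    removed and re-selected. `max_tries` is a hypothetical safeguard that
    bounds the loop; it is not part of the described method.
    """
    p1 = tournament(fitness, rng)
    p2 = tournament(fitness, rng)
    tries = 0
    while (p1 == p2 or abs(fitness[p1] - fitness[p2]) > threshold) and tries < max_tries:
        if fitness[p1] < fitness[p2]:
            p1 = tournament(fitness, rng)  # drop the weaker parent p1
        else:
            p2 = tournament(fitness, rng)  # drop p2 (weaker, tied, or a duplicate)
        tries += 1
    return p1, p2
```

Passing a seeded `random.Random` instance makes the selection reproducible, which is convenient when comparing repeated runs.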

Participation of the best chromosome in different generations in the crossover process
The crossover process in the genetic algorithm creates child chromosomes from parent chromosomes. A crossover operator is applied to one or more parent chromosomes at a time and creates one or more children. In practice, operators are designed for the type of problem at hand and depend heavily on the skill of the analyst, and their efficiency in reaching the optimal solution varies from problem to problem. Some operators consider only one chromosome and create a new chromosome from its information, whereas others operate on some or even all of the chromosomes in the population. In addition to choosing the parent chromosomes and performing crossover, the crossover operator applies a replacement policy: after a child is created, it can replace the worse parent. This type of replacement imposes the restriction that a child must be better than the parent it replaces. Accordingly, the crossover operator is executed so that the worst member of the population is replaced by the child.
In this process, the proposed algorithm uses the best chromosome of the current generation to carry out crossover. The purpose of this component is to generate new candidate solutions near the global optimum. We use the one-point crossover method, with the difference that the best chromosome takes part in the process. First, one crossover point is selected along the parent chromosomes; the first child then takes the genes of the first parent before that point and the genes of the best chromosome after it. The second child is produced in the same way, except that the genes of the second parent come first, followed by those of the best chromosome. Figure 3 shows the crossover method with the participation of the best chromosome.
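A minimal sketch of this best-chromosome crossover, assuming chromosomes are Python lists of community labels and that the crossover point is drawn uniformly from the interior positions (the function name is hypothetical):

```python
import random


def best_guided_crossover(parent1, parent2, best, rng):
    """One-point crossover in which the best chromosome supplies the tail.

    child1 = parent1[:cut] + best[cut:]
    child2 = parent2[:cut] + best[cut:]
    Chromosomes must be lists of equal length (at least 2).
    """
    cut = rng.randint(1, len(parent1) - 1)  # crossover point strictly inside
    child1 = parent1[:cut] + best[cut:]
    child2 = parent2[:cut] + best[cut:]
    return child1, child2
```

Because both children share the same cut and the same tail from the best chromosome, the offspring are pulled toward the current best solution, which matches the stated aim of searching near the global optimum.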

The three-point mutation process
The goal of mutation is to increase the diversity of the solutions in the population. In the three-point mutation method, as in the usual methods of the general genetic algorithm, a member of the population is randomly selected for mutation, and three points (genes) are chosen randomly along its chromosome (Figure 4). These three genes are then modified using a uniform random mutation operator, with the difference that the best current chromosome is involved in the process: two of the three selected genes are modified according to the pattern of the best chromosome, and the third is changed at random.
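The three-point mutation can be sketched as follows for label-encoded chromosomes. How the randomly changed gene is redrawn is not specified, so the sketch assumes a uniform draw from community labels 1 to a hypothetical `max_label`.

```python
import random


def three_point_mutation(chromosome, best, max_label, rng):
    """Mutate three random genes: two copy the best chromosome, one is random.

    max_label is a hypothetical upper bound on community labels (1..max_label).
    The input chromosome is left unchanged; a mutated copy is returned.
    """
    child = list(chromosome)
    positions = rng.sample(range(len(child)), 3)  # three distinct gene positions
    for pos in positions[:2]:
        child[pos] = best[pos]  # guided by the best current chromosome
    child[positions[2]] = rng.randint(1, max_label)  # purely random change
    return child
```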

Structure of chromosome in the proposed algorithm to detect community
The structure of a chromosome in the proposed algorithm is a 1 × N vector containing N genes, which represent the N vertices of the graph (Figure 5). The content of each gene is the number of the community to which its vertex belongs. The number of communities encoded by a chromosome, $n_c$, therefore varies from one chromosome to another. In other words, the chromosome is an N-element array in which each element corresponds to a vertex of the graph and its content denotes that vertex's community number. For example, suppose there is a graph with 5 vertices and 5 edges as shown in Figure 6.
A chromosome for this graph can then be defined as shown in Figure 7.
In this case, the chromosome is a solution to the community detection problem for the graph with 5 vertices: vertices 1 and 2 are in the first community and vertices 3 to 5 are in the second.
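Decoding a chromosome into its communities takes a single pass over the genes. The sketch below assumes 1-based vertex numbering as in the example; the chromosome [1, 1, 2, 2, 2] is one labeling consistent with the partition described for Figure 7 (the figure's exact labels are not reproduced here), and the function name `decode` is hypothetical.

```python
def decode(chromosome):
    """Map a label vector (gene v is the community of vertex v + 1) to {label: [vertices]}."""
    communities = {}
    for vertex, label in enumerate(chromosome, start=1):
        communities.setdefault(label, []).append(vertex)
    return communities


print(decode([1, 1, 2, 2, 2]))  # → {1: [1, 2], 2: [3, 4, 5]}
```

The number of communities $n_c$ encoded by a chromosome is then simply the number of keys in the decoded dictionary.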

Parameters
The parameters of the proposed algorithm were tuned through experiments with different values and by analyzing the research conducted in [21,33,39,40,41,42]. Figure 8 shows the results of 100 executions of the algorithm on the 5 kb graph of the ESC data set for different values of the crossover percentage, mutation percentage and initial population size. The diagram shows the values of the evaluation function, which is the sum of $f_1$ and $f_2$, over the 100 executions. Four different values were examined for each of the three parameters. The results show that the best values for the crossover percentage, mutation percentage and initial population are 0.8, 0.3 and 50, respectively. As observed, the algorithm gives similar results as the number of iterations approaches 100. Hence, as listed in Table 1, the maximum number of iterations is 100, the number of sub-iterations is 30, the population size is 50, the crossover rate is 80%, and the mutation rate is 30%. The algorithm uses roulette-wheel selection to choose individuals for crossover and mutation. After crossover and mutation, the resulting children are evaluated using the Pareto optimal front and merged with the previous population, and the 50 best members are chosen as the new population. After 100 iterations, the best member of the population according to Pareto optimization is taken as the answer to the problem; it contains all communities detected in the input graph (Figure 8).
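The parameter values listed above can be collected in one configuration object, and roulette-wheel (fitness-proportionate) selection can be sketched as below. The class and function names are hypothetical, and the sketch assumes non-negative fitness values such as $f_1 + f_2$.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GAParams:
    """Parameter values reported for the proposed algorithm (Table 1)."""
    max_iterations: int = 100
    sub_iterations: int = 30
    population_size: int = 50
    crossover_rate: float = 0.8
    mutation_rate: float = 0.3


def roulette_wheel(fitness, rng):
    """Pick index i with probability fitness[i] / sum(fitness); fitness must be >= 0."""
    total = sum(fitness)
    pick = rng.uniform(0, total)
    running = 0.0
    for i, f in enumerate(fitness):
        running += f
        if pick <= running:
            return i
    return len(fitness) - 1  # guard against floating-point round-off
```

A member with zero fitness is practically never chosen, so with fitness values [0.0, 1.0, 0.0] the wheel returns index 1.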

Results and discussion
We evaluate the proposed method on three new benchmarks, namely the genomic interaction graphs of GM12878, CD34+ and ESCs. In this section, the multi-objective optimization algorithm is used to find communities in the 10 kb, 100 kb, 500 kb and 1 Mbp graphs built from interactions in GM12878 and CD34+ blood cells, and in the 5 kb graph built from interactions in mouse Embryonic Stem Cells (ESCs). The efficiency of the proposed algorithm is also compared with that of the multi-objective particle swarm optimization algorithm for community detection [5,32,40]. Table 2 gives detailed information for each of these graphs.

Computational complexity
In this section, our proposed algorithm, Multi-Objective Genetic Algorithm Optimization Community Detection (MOGAOCD), is compared with the Multi-Objective Particle Swarm Optimization Community Detection (MOPSOCD) algorithm [32] in terms of CPU usage, RAM usage and execution time. The graphs are sorted in increasing order of the number of nodes as 5 kb, 10 kb, 100 kb, 500 kb and 1 Mbp. Both algorithms are run on the same HP server under the same conditions; its specifications are listed in Table 3.

CPU usage
In this part, the CPU usage of both algorithms for community detection in the five graphs is compared. Figure 9 shows the CPU usage of MOGAOCD and MOPSOCD relative to execution time. As shown in this figure, the larger the graph, the lower the CPU usage, owing to the longer execution time. Overall, both CPU usage and execution time are greater for MOPSOCD than for MOGAOCD.
To provide more insight, the CPU usage of both algorithms for all graphs is shown in Figure 10. As shown in this figure, the average CPU usage of our proposed algorithm is also lower than that of MOPSOCD.
Figure 10. Average number of CPUs used (3 CPUs) by MOGAOCD and MOPSOCD for community detection in five genomic graphs.

RAM usage
In this part, the RAM usage of both algorithms for community detection in the five graphs is compared. According to Figure 11, more RAM is generally used when the graph is bigger, and RAM usage is greater for MOPSOCD than for MOGAOCD.
To provide more insight, the RAM usage of both algorithms for all graphs is shown in Figure 12. As shown in this figure, the average RAM usage of our proposed algorithm is also lower than that of MOPSOCD.

Execution time
Figure 13 shows the execution times of MOGAOCD and MOPSOCD for the five graphs. As observed, the execution times for all graphs are shorter with MOGAOCD than with MOPSOCD, indicating an advantage of MOGAOCD over the other algorithm. This advantage in execution time is more apparent in the bigger graphs of 10, 100 and 500 kb, which are computationally more complex for community detection.
We next compare the number of CPU cores used by both algorithms on the 5 kb graph, shown in Figure 14. According to this figure, both the number of CPU cores used and the execution time are greater for MOPSOCD than for MOGAOCD. Therefore, MOGAOCD performs better than MOPSOCD in terms of CPU core consumption and execution time.
Figure 15 shows the scalability of the proposed algorithm on each of the five graphs. In this experiment, each graph is given to the system individually in five stages and the execution time is recorded at each stage: at each stage, 20% of the graph enters the system and the resulting execution time is recorded. As illustrated in Figure 15, MOGAOCD achieves better execution times as the number of nodes in each graph increases.
Figure 15. Scalability of MOGAOCD for community detection in five genomic graphs.
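One way to reproduce the staged scalability measurement is to time the detection routine on growing induced subgraphs. The sketch below assumes that a stage means the first 20%, 40%, ..., 100% of the nodes in a fixed order (for example, genomic position); `detect` is a placeholder for any community detection routine, and `scalability_profile` is a hypothetical helper, not part of the authors' setup.

```python
import time


def scalability_profile(nodes, edges, detect, stages=5):
    """Time `detect` on induced subgraphs holding 1/stages, 2/stages, ... of the nodes.

    nodes  : ordered list of node ids
    edges  : list of (u, v, w) tuples
    detect : any callable detect(sub_nodes, sub_edges)
    Returns a list of (number_of_nodes, seconds) pairs, one per stage.
    """
    profile = []
    for stage in range(1, stages + 1):
        k = round(len(nodes) * stage / stages)
        subset = set(nodes[:k])
        # Keep only edges whose two endpoints are both in the current subset.
        sub_edges = [e for e in edges if e[0] in subset and e[1] in subset]
        start = time.perf_counter()
        detect(nodes[:k], sub_edges)
        profile.append((k, time.perf_counter() - start))
    return profile
```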

The performance of the proposed method in community detection for GM12878 and CD34+ genomic graphs
In this section, the performance of the proposed algorithm is investigated on the GM12878 and CD34+ graphs (in both graphs, the interactions are found at the same points of the genome) at fragment sizes of 10 kb, 100 kb, 500 kb and 1 Mb. Our proposed algorithm (MOGAOCD) is compared with the MOPSOCD algorithm [32] on three criteria: the number of communities detected, the modularity value, and the mean vertex weight. The aim in each benchmark is to maximize these three criteria. Both algorithms are implemented in MATLAB. For evolutionary algorithms, the result of a single run is usually not enough to draw general conclusions; hence the algorithm is executed 100 times and the results are averaged. In each run, the values of the two objectives (modularity and mean vertex weight) are calculated for the archived solutions. The 10 kb graph contains 3715 vertices and 2117 edges. The solutions of the proposed algorithm and of the Pareto-based comparison algorithm for the two objectives are presented below, together with the average of the two objective values in each run. The results are displayed in a two-dimensional diagram in which each axis represents one objective.
Figure 16 shows the solutions produced by MOGAOCD and MOPSOCD on the 10 kb graph with respect to the Pareto front. The red points represent the Pareto front solutions generated by MOGAOCD, and the blue points represent the Pareto front solutions of MOPSOCD. As shown in Figure 16, the red points include the best solutions, since they have the highest modularity and average vertex weight. The position of each point, red or blue, gives the modularity value and the average vertex weight of the corresponding solution. Figure 16 illustrates the advantage of MOGAOCD over MOPSOCD in community detection on the 10 kb graph. Figure 17 shows the corresponding solutions for the 100 kb graph. Again, the red points include the best solutions, since they have the highest modularity and average vertex weight, and the modularity value and average vertex weight of each solution can be read from its position. Figure 17 illustrates the promising performance of MOGAOCD in community detection on the 100 kb graph.
Figure 18 shows the solutions generated by MOGAOCD and MOPSOCD on the 500 kb graph. As in the previous figures, the red points represent the Pareto front solutions of MOGAOCD and the blue points those of MOPSOCD. The red points include the best solutions, since they have the highest modularity and average vertex weight, and the modularity value and average vertex weight of each solution can be read from its position. Note that some points (solutions) overlap, which indicates that their values are close. Figure 19 shows the solutions generated by MOGAOCD and MOPSOCD on the 1 Mbp graph; the red points again indicate the Pareto front of MOGAOCD and the blue points that of MOPSOCD. This figure depicts the optimized Pareto fronts of both algorithms. According to Figure 19, MOGAOCD performs better than MOPSOCD: the two algorithms perform similarly for $f_1 \in [0, 0.6]$, whereas for $f_1 \in [0.6, 1]$ the proposed algorithm finds better solutions than MOPSOCD. Therefore, overall, MOGAOCD performs better than MOPSOCD in community detection on the 1 Mbp graph.
Figure 19. The Pareto front diagram of MOGAOCD and MOPSOCD on the 1 Mbp graph for community detection.

The performance of the proposed method in community detection on the ESCs genomic graph
Figure 20 shows the community detection solutions found by MOGAOCD and MOPSOCD on the 5 kb graph, which has 333 vertices and 202 edges. The red points represent the Pareto front solutions produced by MOGAOCD and the blue points the solutions of MOPSOCD. According to Figure 20, the proposed algorithm performs better than MOPSOCD: the two algorithms perform similarly for $f_1 \in [0, 0.3]$, whereas for $f_1 \in [0.3, 1]$ MOGAOCD finds better solutions than MOPSOCD. Therefore, MOGAOCD performs better than MOPSOCD in community detection on the 5 kb graph.

Analysis of results in community detection
In this section, we analyze the average results of 100 runs of the MOGAOCD and MOPSOCD algorithms on five benchmarks (the 5 kb, 10 kb, 100 kb, 500 kb, and 1 Mb graphs) according to three criteria: f1, f2, and the number of detected communities. According to Table 4, MOGAOCD outperforms MOPSOCD on the 5 kb and 10 kb graphs in all three criteria.
On the 100 kb, 500 kb, and 1 Mb graphs, MOGAOCD performs slightly better than MOPSOCD on the three evaluation criteria. The results presented in the previous sections also show that MOGAOCD outperforms MOPSOCD in many runs. We therefore conclude that the advantage of MOGAOCD in community detection is greater on genomic graphs with smaller fragment sizes than on genomic graphs with larger fragment sizes.
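The averages in Table 4 come from 100 independent runs per algorithm and benchmark. A minimal sketch of that aggregation, assuming each run is logged as a hypothetical `(benchmark, algorithm, f1, f2, n_communities)` record; the record layout and the example values are illustrative, not the authors' data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical log format: one record per independent run.
Run = Tuple[str, str, float, float, int]  # (benchmark, algorithm, f1, f2, n_communities)


def summarize(runs: List[Run]) -> Dict[Tuple[str, str], Tuple[float, ...]]:
    """Average f1, f2 and the community count over all runs of each (benchmark, algorithm)."""
    groups: Dict[Tuple[str, str], List[Tuple[float, float, int]]] = defaultdict(list)
    for bench, algo, f1, f2, k in runs:
        groups[(bench, algo)].append((f1, f2, k))
    # zip(*vals) turns the list of per-run triples into one column per criterion.
    return {key: tuple(mean(col) for col in zip(*vals)) for key, vals in groups.items()}


if __name__ == "__main__":
    runs = [
        ("5kb", "MOGAOCD", 0.50, 0.25, 4),
        ("5kb", "MOGAOCD", 0.75, 0.75, 6),
        ("5kb", "MOPSOCD", 0.50, 0.50, 3),
    ]
    for (bench, algo), (f1, f2, k) in sorted(summarize(runs).items()):
        print(bench, algo, f1, f2, k)
```

Keeping one record per run, rather than only the means, also allows the spread across the 100 runs to be reported alongside each average.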

Innovation of the research
This research addresses an open challenge in genetics: community detection in genomic graphs arising from genome-wide chromatin interactions. To this end, we presented a Pareto-based multi-objective genetic algorithm. In a genomic graph, nodes are genomic regions, edges are interactions between regions, and edge weights are the numbers of interactions. The difficulty is that the number of communities is not known in advance and the graph has no definite topology. In addition, the detected communities should contain the graph regions with the maximum weights, that is, the most interactions, so community detection hinges on the edge weights. Under these conditions, an algorithm is required that can detect communities whose nodes carry the greatest weights; therefore, the weights of the edges between nodes are taken into account when forming communities. This article presents a bi-objective heuristic algorithm based on a genetic algorithm that solves the problem by detecting communities in five genomic graphs using two objective functions, f1 and f2. The benefits and drawbacks of the proposed algorithm are described below.
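The problem formulation above can be made concrete with a weighted graph whose edges carry interaction counts. The sketch below scores a partition with two objectives. The article's own definitions of f1 and f2 are not restated in this section, so here we assume f1 is weighted Newman modularity, Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the internal edge weight of community c, d_c the sum of weighted degrees in c, and m the total edge weight, and that f2 is the average, over communities, of internal edge weight per vertex. Both are illustrative stand-ins, not necessarily the authors' formulas.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float]  # (region_a, region_b, interaction count)


def objectives(edges: List[Edge], communities: List[Set[str]]) -> Tuple[float, float]:
    """Return (f1, f2) for a partition of a weighted, undirected genomic graph.

    ASSUMED definitions (illustrative only):
      f1 = weighted Newman modularity, sum_c [L_c / m - (d_c / 2m)^2]
      f2 = mean over communities of (internal edge weight / community size)
    Every region that appears in an edge must belong to exactly one community.
    """
    label: Dict[str, int] = {v: i for i, c in enumerate(communities) for v in c}
    m = sum(w for _, _, w in edges)          # total edge weight
    internal: Dict[int, float] = defaultdict(float)  # L_c
    degree: Dict[int, float] = defaultdict(float)    # d_c
    for u, v, w in edges:
        degree[label[u]] += w
        degree[label[v]] += w
        if label[u] == label[v]:
            internal[label[u]] += w
    f1 = sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in range(len(communities)))
    f2 = sum(internal[c] / len(comm) for c, comm in enumerate(communities)) / len(communities)
    return f1, f2


if __name__ == "__main__":
    # Toy graph: two strongly interacting triangles joined by one weak contact.
    edges = [("a", "b", 3), ("b", "c", 3), ("a", "c", 3),
             ("d", "e", 2), ("e", "f", 2), ("d", "f", 2), ("c", "d", 1)]
    print(objectives(edges, [{"a", "b", "c"}, {"d", "e", "f"}]))
```

For the toy graph, splitting at the weak contact gives f1 = 0.419921875 and f2 = 2.5, while merging all six regions into one community gives f1 = 0, which is the behavior expected of a modularity-style objective.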
Benefits of the proposed algorithm: (1) multiple objectives are considered when evaluating a solution; (2) optimization operators are used to select the best solution; (3) communities are detected without knowing their number in advance, while taking into account the sum of the weights of the edges between nodes; (4) by detecting strongly interacting genomic communities, the method can help genetics research to detect and treat diseases. We believe the main drawback of our method is that, despite its better performance, it still has high computational and time complexity, and further improvement is required.
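Benefit (3), detecting communities without fixing their number in advance, is commonly achieved in genetic community-detection algorithms with a locus-based adjacency encoding: gene v stores one neighbor of v, and the communities are the connected components of these links. This section does not state whether MOGAOCD uses this exact encoding, so the sketch below is an assumption that only illustrates how the number of communities can emerge from decoding rather than being an input parameter; `decode_locus` is a hypothetical name.

```python
from typing import Dict, List, Set


def decode_locus(genotype: Dict[str, str]) -> List[Set[str]]:
    """Decode a locus-based genotype into communities with union-find.

    genotype[v] = u means "region v is linked to region u" (u may equal v).
    Connected components of these links are the communities, so their number
    is a result of decoding, not a parameter.
    """
    parent = {v: v for v in genotype}

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for v, u in genotype.items():
        parent[find(v)] = find(u)  # merge the components of v and u

    groups: Dict[str, Set[str]] = {}
    for v in genotype:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical genotype over six genomic regions.
    genotype = {"a": "b", "b": "c", "c": "a", "d": "e", "e": "d", "f": "f"}
    print(decode_locus(genotype))  # three communities: {a, b, c}, {d, e}, {f}
```

In such schemes, crossover and mutation act on the genotype entries (usually restricting each entry to a graph neighbor of v), so offspring can decode to different numbers of communities without any explicit cluster-count setting.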

Conclusion
Transcriptional regulatory elements can target protein-coding and non-coding genes at different genomic distances through chromatin interactions. The Hi-C chromosome conformation capture technique enables researchers to study the three-dimensional (3D) conformation of chromosomes in the cell nucleus and to identify such interacting regulatory regions. Here, we proposed MOGAOCD, a new algorithm for community detection in chromosome conformation capture (Hi-C) data. MOGAOCD identifies sets of interacting genomic regions from Hi-C data, which act as co-interacting regions. This makes it possible to study spatially colocalized genomic regions that are functionally relevant. Clusters identified by MOGAOCD share transcription factors and are enriched for transcriptional machinery, suggesting that chromosome intermingling regions play a key role in genome regulation. Our method provides a unique quantitative framework that can be broadly applied to chromosome conformation capture data from different cells and tissues.