Active collaboration in relative observation for multi-agent visual simultaneous localization and mapping based on Deep Q Network

This article proposes a novel active relative-localization mechanism for multi-agent simultaneous localization and mapping, in which an agent to be observed is treated as a task, and the agents willing to assist it perform that task through relative observation. A task allocation algorithm based on deep reinforcement learning is proposed for this mechanism: each agent decides on its own initiative whether to localize other agents or to continue independent simultaneous localization and mapping, so that the SLAM process of each agent is shaped by the collaboration. First, a unique observation function that models the whole multi-agent system is built on ORBSLAM. Second, a novel Deep Q Network called the multi-agent-system Deep Q Network (MAS-DQN) is deployed to learn the correspondence between Q values and state-action pairs; an abstract representation of the agents in the multi-agent system is learned in the process of collaboration among them. Finally, each agent acts with a certain degree of freedom according to MAS-DQN. Simulation results of comparative experiments show that this mechanism improves the efficiency of cooperation in multi-agent simultaneous localization and mapping.


Introduction
A critical task of a mobile robot is to determine its pose (position and orientation) given an environment map, which is also the basis of other types of tasks. However, the environment map does not exist at the beginning. When mobile robots enter an unknown part of the environment, they need to construct a map (a 3-D point cloud map, topological map, or 2-D map) suited to their tasks from their own sensors while simultaneously determining their own location in that map; this is the simultaneous localization and mapping (SLAM) problem. Mapping and localization are mutually complementary: to build accurate maps, the robot needs an accurate estimate of its pose, and, vice versa, localization requires established high-quality maps; this is the difficulty of SLAM. Visual SLAM refers to SLAM in which a camera acts as the only sensor. Compared with traditional lasers, visual sensors obtain richer information and can be deployed on mobile robots at very low cost. A multi-agent system (MAS) is one in which agents collaborate and negotiate with each other to accomplish goals that cannot be accomplished by a single agent, or to increase the efficiency of task execution. Each agent's abilities, attributes, and structure differ, which provides great scope for collaboration. MAS has attracted many researchers to investigate its applicability in areas closely related to human livelihood and industry, such as security systems, 1 exploration and rescue, 2 surveillance, 3 humanitarian assistance, 4 environmental protection, 5,6 and health care. 7,8 Traditional SLAM has been considered an independent behavior; with the continuous application of MAS, multi-agent SLAM has become a focus of scholarly research.
Multi-agent SLAM can be divided into two types: offline mode and online mode. Offline mode means that each agent's SLAM process is independent and the results are fused after all agents stop. Online mode means that agents cooperate during the SLAM process, for example through cooperative loop closure detection, cooperative localization, and cooperative map fusion. Relative orientations in a MAS are important because they are the basis for combining maps built from the data of two agents. The difficulty of online multi-agent SLAM is how to cooperate in the best way when each agent has different attributes and different states. This issue is particularly prominent in relative localization between robots. For instance, the localization ability of a wheeled robot is stronger than that of an unmanned aerial vehicle (UAV) because of its odometer, but the poses a UAV can reach are richer, which gives it strong mapping ability. If both kinds of robots are in the system at the same time, an appropriate collaboration model will make full use of the advantages of both. 9 That type of relative localization is controlled manually. If a MAS is to optimize the efficiency of automatic cooperative positioning among agents, each agent needs to perceive all the states and attributes related to SLAM to determine whether it is in the best condition to observe the agent in need of assistance. But in most cases, the task-related attributes and states of the robots are not quantifiable, which makes collaboration difficult.
To solve the above problem for agents in a MAS, this article proposes a cooperative mechanism for multi-agent SLAM that learns the main attributes of each agent and extracts the state feature vectors of all agents in the MAS based on ORBSLAM; 10 each agent then executes the specific behavior output by MAS-DQN. In the "Experiment" section, we use models of isomorphic robots with different attributes to simulate various robots, and the validity of our algorithm is demonstrated.
The article is organized as follows: in the second section, related work about coalition formation for MAS and multi-agent SLAM is discussed. In the third section, the DQN and "Rainbow" which we applied to drive MAS-DQN are introduced. In the fourth section, the structure of MAS-DQN including the novel observation function and reward function is described. In the fifth section, we introduce relative observation between agents based on the proposed mechanism. In the sixth section, the effectiveness of the algorithm is discussed. In the seventh section, the complete algorithm of the approach is shown.

Related work
Relative localization among agents in a MAS refers to locating an agent whose pose is unknown using the limited knowledge and abilities of the others. This process consists of two main steps. First, the agents that can provide location-based services are selected; they can be determined by a general control center or through free negotiation among the agents according to their own demands, interests, and attributes. This step mainly involves multi-robot task allocation (MRTA). In the second step, the selected agent uses its own pose and different types of map, such as graph-based maps, 11,12 landmarks with covariance matrices, 13 and grid-based maps, 14,15 to measure or observe the pose of the agent to be localized. Next, we introduce the related work on these two points separately.
MRTA has always been a hot topic in MAS research. It can be formulated as an optimal assignment problem whose goal is to assign a set of agents to a set of tasks so as to optimize overall system performance while satisfying a set of constraints. According to the team organizational paradigm, MRTA approaches can be divided into two categories: centralized and distributed.
In a centralized MAS, a central agent monitors and assigns tasks to the other agents. Agents that are assigned tasks need to send all the information they obtain to that central agent, and they must keep communicating at all times. Center-based algorithms are very general for solving the MRTA problem because the monitor can directly control each agent. 16 Coltin and Veloso 17 proposed centralized approaches for mobile robot task allocation in hybrid wireless sensor networks. In the literature, 18 a centralized multi-robot task allocation for industrial plant inspection using A* and a genetic algorithm is introduced. Higuera and Dudek 19 proposed a fair-division-based MRTA approach, another centralized algorithm, to handle the problem of allocating a single global task among heterogeneous robots. To address the curse of dimensionality caused by multi-agent reinforcement learning, Wadhwania et al. drew inspiration from distillation and value matching and proposed a new actor-critic algorithm and a method for combining knowledge from agents with the same structure. 20 In distributed approaches, agents do not need to report to a central agent and do not need to be online at all times; they can freely communicate with other agents and assign tasks through negotiation over their different demands. This type of algorithm is also widely used to solve the MRTA problem. Farinelli et al. proposed a distributed online dynamic task assignment for multi-robot patrolling. 21 Luo et al. 22 proposed a distributed algorithm for constrained multi-robot task assignment for grouped tasks. Various ambient assisted living technologies have been proposed for MRTA in healthcare facilities. 23 Emotional robot models use artificial emotions to endow the robot with emotional factors, which makes the robot more intelligent and lets it adjust its behavior choices through the emotional mechanism.
The introduction of emotional and personal factors improves the diversity and autonomy of robots. 24 Jonathan How's team focused on consensus-based auction algorithms for decentralized task assignment 25,26 and used game theory to prove that the consensus-based bundle algorithm converges to a pure-strategy Nash equilibrium. 27 Johnson et al. used implicit coordination to satisfy allocation constraints. 28 Some relative localization algorithms for MAS use prior information such as the initial positions of the agents and the exact positions of landmarks. 29 Shames et al. 30 presented the theoretical basis of relative localization between agents using prior knowledge in maps. Zhou and Roumeliotis 31 proposed a mechanism for relative localization in which agents try to observe each other in time when they have the chance, with unknown initial correspondence. Zkucur and Levent 32 establish the relationship between relative observation and map fusion. However, in most cases agents cannot obtain enough prior knowledge for relative localization; statistical estimation algorithms such as Kalman filtering, 33 particle filtering, 34 and Markov localization 35 perform better in this case. Most relative observation approaches serve map fusion. In the mechanism proposed in this article, the data from relative observations between agents are also applied to optimize the pose graphs of the observed agents, which improves the efficiency of data utilization.

DQN and rainbow
Q-learning is an off-policy reinforcement learning method. It uses a Q-table to store the cumulative value of each state-action pair. According to the Bellman equation, when the strategy of maximizing the Q value is adopted at every step, Q(s_t, a_t) can be calculated as equation (1)

Q(s_t, a_t) = r_t + γ · max_{a′} Q(s_{t+1}, a′)   (1)

where Q(s_t, a_t) is the cumulative discounted reward when the agent takes action a_t in state s_t; s_t and s_{t+1} are the states at times t and t+1; the action a_t taken by the agent transforms the system state from s_t to s_{t+1}; r_t is the immediate reward for that transition; and γ is the discount factor. When the state and action spaces are high-dimensional or continuous, maintaining the Q-table Q(s, a) becomes unrealistic. Therefore, the problem of updating the Q-table is transformed into one of fitting a function: the basic idea of DQN is to use a neural network Q_θ(s, a) to approach the optimal Q value by updating the parameter θ, which refers to all weights of the deep neural network. The specific sequence of DQN 36 is shown in Algorithm 1, where x is the raw system state before it is processed by the observation function ϕ, and M and t respectively denote the number of iterations and the number of training samples. However, combining deep learning (DL) and reinforcement learning (RL) inevitably leads to some problems: 1. DL needs a large number of labeled samples for supervised learning, whereas RL only has the reward of state-action pairs as a return value. 2. DL samples are independent, while RL states are correlated. 3. The distribution of DL targets is fixed, while the distribution in RL is always changing; that is, it is difficult to reproduce situations that were trained before. 4. Previous studies have shown that using nonlinear networks to represent value functions is unstable.
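As background, the tabular update behind equation (1) can be sketched in a few lines. The state and action counts, learning rate, and reward below are illustrative, not taken from the article:

```python
import numpy as np

# Minimal tabular Q-learning, illustrating equation (1).
# The sizes, learning rate, and reward are hypothetical.
n_states, n_actions = 5, 2
gamma = 0.9   # discount factor
alpha = 0.5   # learning rate (a full Q-learning step blends old and new estimates)

Q = np.zeros((n_states, n_actions))

def q_update(Q, s, a, r, s_next):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])  # 0.5 * (1.0 + 0.9 * 0 - 0) = 0.5
```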
This article uses "Rainbow" to build the training model of MAS-DQN. Rainbow 37 is an integrated DQN that combines the following algorithms to address the problems above: 1. Double Q-learning constructs two neural networks with the same structure but different parameters: a behavior network and a target network. When updating the model, the best action at time t+1 is selected by the behavior network, and the target value of that action is then estimated by the target network. Decoupling these two steps reduces Q-learning's overestimation of values. 2. The prioritized replay buffer makes the model preferentially choose samples that improve it, using the replay memory more efficiently. 3. Dueling DQN decomposes the value function into two parts, making the model easier to train and able to express more valuable information. 4. Distributional DQN changes the output of the model from a value expectation to a value distribution, so that the model can learn more valuable information. 5. Noisy DQN adds exploratory ability by adding noise to the parameters, which is more controllable. 6. To overcome the slow learning of Q-learning in its early stage, multi-step learning uses rewards over several steps, so the target value can be estimated more accurately early in training, speeding it up.
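The Double Q-learning target described in item 1 can be sketched as follows, with stand-in arrays in place of the two networks' outputs (the reward and Q-values are illustrative):

```python
import numpy as np

# Sketch of the Double Q-learning target: the behavior network selects the
# best next action, and the target network evaluates it. Both "networks"
# are represented here by precomputed Q-value vectors over actions.
gamma = 0.99

def double_dqn_target(r, q_behavior_next, q_target_next):
    """r: immediate reward; q_*_next: Q-values over actions at s_{t+1}."""
    best_action = int(np.argmax(q_behavior_next))   # selection: behavior net
    return r + gamma * q_target_next[best_action]   # evaluation: target net

target = double_dqn_target(1.0, np.array([0.2, 0.8]), np.array([0.5, 0.4]))
print(target)  # 1.0 + 0.99 * 0.4 = 1.396
```

Evaluating the selected action with a separate, slowly updated network is what damps the overestimation bias mentioned above.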
The Rainbow model integrates the advantages of the above methods and demonstrates that these features can be combined successfully.

Observation function for MAS-DQN based on ORBSLAM
In deep RL, the local information obtained by an agent is called an observation. Unlike previous deep RL algorithms, the observation function of the proposed mechanism models all agents and determines the behavior of each agent at the same time, rather than serving only one agent. In this section, the ORBSLAM feature vector and the output of the observation function are introduced.
ORBSLAM feature vector. ORBSLAM 10 is one of the most commonly used systems among modern visual SLAM frameworks. It runs in three threads: the tracking thread tracks the motion of the platform using tracked feature points; the local mapping thread constructs the global map and optimizes the local map; and the loop closure thread is responsible for detecting loops when the agent revisits an already mapped area, which prevents growth of the accumulated error.
ORBSLAM is deployed on each agent independently to construct its map. In our system, the state of the environment consists of the pose estimation states of all agents, which are mainly determined by three items: map points, key frames, and loop closure detection. Variables related to these items compose the state vector of the agent, called the ORBSLAM feature vector, whose members are as follows: 1. The first component of the ORBSLAM feature vector relates to map points. A map point is constructed from matched point features; we include in the feature vector the number of map points observed from the current frame. 2. The second component relates to key frames, which are frames selected according to strict requirements to ensure that they contain sufficient map information. Key frames later found to contain insufficiently accurate map information are gradually removed. To estimate the amount of map information at nearby locations, which plays a decisive role in pose estimation, the numbers of new key frames generated and of old key frames eliminated since the last sampling time are included in the state vector. 3. The third component relates to the result of loop closure. Loop closure detection is the problem of a mobile agent determining whether it has returned to a previously visited location. If a closed loop is detected, it imposes a strong constraint on global optimization. We add to the feature vector the time interval since the last detected loop closure, to measure the impact of loop closure on the current frame. 4. Another critical factor is the collection of distances from the agent to the other agents: the longer the distance from the agent to a target agent, the harder it is for the agent to assist that target. The distance is calculated by the D* algorithm, whose characteristic is that when the terrain between the source and target points is fully known it returns the shortest path and distance, and when it is not fully known it outputs the shortest path within the known terrain and an estimated distance. An ORBSLAM feature vector contains m − 1 such distances (m is the number of agents), representing the distances between the agent and the other m − 1 agents.
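The four items above can be collected into one vector; a minimal sketch, in which all field names and sample values are assumptions rather than the article's actual data layout:

```python
# Sketch of assembling one ORBSLAM feature vector from the four items
# listed above. In the real system these values are read from the
# running ORBSLAM threads and the D* planner.
def orbslam_feature_vector(n_map_points, new_keyframes, removed_keyframes,
                           frames_since_loop, distances_to_others):
    """distances_to_others: D*-estimated distances to the other m - 1 agents."""
    return [n_map_points,          # map points seen from the current frame
            new_keyframes,         # key frames created since last sampling
            removed_keyframes,     # key frames culled since last sampling
            frames_since_loop,     # time since the last detected loop closure
            *distances_to_others]  # one entry per other agent

v = orbslam_feature_vector(412, 3, 1, 57, [4.2, 9.7])
print(len(v))  # 4 fixed entries + (m - 1) distances = 6
```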
An ordered arrangement of all the above variables is an ORBSLAM feature vector. In the next subsection, the observation function based on the ORBSLAM feature vector is introduced.
The observation function. The output of the observation function represents the perception of the overall SLAM state of the whole MAS. The ORBSLAM feature vectors of all agents at the same time form an observation frame. Considering the continuity of the SLAM process, the previous states should be taken into account when the system makes a macro-decision for each agent, so the output of the observation function includes the current frame and a fixed number of preceding frames. These frames constitute an observation frame set, which is eventually flattened into a vector as the output of the observation function. The specific steps of the observation function are shown in Algorithm 2, where the parameter n is the number of observation frames, m is the total number of agents in the system, and O(i, j) is the ORBSLAM feature vector of agent j at time i.
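The flattening step can be sketched as follows; the frame shapes and the use of a bounded queue are assumptions about one reasonable implementation, not a transcription of Algorithm 2:

```python
from collections import deque

# Sketch: the ORBSLAM feature vectors of all m agents over the last n
# sampling times form an observation frame set, flattened into one vector.
def make_observation(frame_history, n, m, feat_len):
    """frame_history: the n most recent frames, each a list of m feature
    vectors of length feat_len. Returns a flat list of n*m*feat_len values."""
    flat = []
    for frame in frame_history:   # oldest to newest
        for vec in frame:         # one vector per agent
            flat.extend(vec)
    assert len(flat) == n * m * feat_len
    return flat

history = deque(maxlen=3)         # keeps only the n = 3 most recent frames
for t in range(3):
    history.append([[float(t)] * 4 for _ in range(2)])  # m = 2, feat_len = 4
obs = make_observation(history, n=3, m=2, feat_len=4)
print(len(obs))  # 3 * 2 * 4 = 24
```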

Reward function
The reward function generally reflects the immediate benefit of an agent reaching a new state through an action. In our proposed algorithm, the translation error is used to measure the reward, as shown in equation (3)

loss = √((x′ − x)² + (y′ − y)² + (z′ − z)²)   (3)

To embody the collaboration between agents, the reward function also adds the influence of an agent's target on its decision-making. The final reward function of agent i is equation (4)

r(i) = −(loss(i) + e (1 − η) (loss_k(t) − loss_{k−1}(t)))   (4)

where loss_k(i) represents the loss of agent i at time k, and (loss_k(t) − loss_{k−1}(t)) denotes the contribution of agent i to the target t it assists. The parameter η, representing the selfishness of the agent, is used to control the frequency of agent reorganization; e depends on the result of the assistance, and its value, given in equation (5), is discussed in detail in the next subsection

e = 1 when the target agent is successfully assisted by this agent; e = 0 when the target is itself or the position for collaboration is not reached; e = −1 when the collaboration fails.   (5)

The output formation of the proposed DQN
There are m × (m + 1) neurons in the output layer, where m is the number of agents, and the output of the ((i − 1) × (m + 1) + j)-th neuron represents the cumulative reward value, learned by the DQN, of agent i assisting agent j. j = 0 means that no action will be taken and the agent waits for the assistance of other agents; agent i performs SLAM independently when j = i. Each agent is given, with a certain probability, the order with the maximum cumulative reward argmax_a Q(s, a). How agents assist other agents is described in the next section.
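The translation loss and reward of equations (3) to (5) can be sketched as follows. The poses, the assistance outcome, and the selfishness weight (written here as `eta`, since the article's symbol for it is a parameter in [0, 1]) are illustrative values:

```python
import math

# Sketch of the reward in equations (3)-(5). All numeric values are toy data.
def translation_loss(est, true):
    """Equation (3): Euclidean distance between estimated and true position."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(est, true)))

def reward(own_loss, assist_delta, e, eta):
    """Equation (4): r(i) = -(loss(i) + e * (1 - eta) * assist_delta),
    where assist_delta = loss_k(t) - loss_{k-1}(t) and e is in {-1, 0, 1}."""
    return -(own_loss + e * (1.0 - eta) * assist_delta)

loss = translation_loss((1.0, 2.0, 2.0), (0.0, 0.0, 0.0))
print(loss)                               # sqrt(1 + 4 + 4) = 3.0
print(reward(loss, -0.5, e=1, eta=0.2))   # -(3.0 + 0.8 * -0.5) = -2.6
```

Note that a successful assistance (e = 1) with a negative loss difference (the target's loss decreased) makes the reward less negative, which is the intended incentive.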
The structure of the whole neural network is shown in Figure 1.

Relative observation between agents in the proposed mechanism
In the proposed mechanism, agents execute the order given by MAS-DQN through mutual observation with each other. As shown in Figure 2, red agents are assigned to assist blue agents. First, they must reach the predetermined observation position; if none of the predetermined positions can be reached, the observation is judged a failure. Otherwise, the agent obtains the pose of the target through relative localization of the target agent. The obtained position and orientation provide a strong constraint for the target agent to optimize its pose graph; in other words, they improve the accuracy of pose correction. This observation has the same effect as loop closure detection. The way of calculating the pose of the target agent is introduced next. Assume that agent i observes a feature point p_ij of target agent j whose coordinate in agent i's camera coordinate system is p_c^ij; then the coordinate p_w^ij in the world coordinate frame based on agent i's pose can be obtained by equation (6)

p_w^ij = R_i p_c^ij + t_i   (6)

where R_i is agent i's rotation matrix, which converts camera coordinates into world coordinates, and t_i is the translation vector; R and t together constitute the pose of the robot camera. The world coordinates of a feature point are consistent, so the relationship between the world coordinates of point p and its coordinates in the camera coordinate system of agent j is shown in equation (7)

p_w = R_j p′_c + t_j   (7)

where R_j and t_j form the pose of agent j and p′_c is the coordinate of feature point p in agent j's 3-D model. Because several agents may observe agent j at the same time, p_w may have multiple values, which makes it impossible to obtain the pose of agent j directly. A nonlinear optimization method can be used to continuously approximate the true value of the pose. First, the error of pose estimation is defined as equation (8)

e_k = p_w^{ij,k} − (R′ p′_{c,k} + t′)   (8)

where p_w^{ij,k} is the coordinate of the kth point of agent j observed by agent i, and R′, t′ are the estimated values of agent j's pose at this moment. R′, t′ can be expressed by the Lie algebra exp(ξ), and the optimization of the target function can then be expressed as

ξ* = argmin_ξ Σ_k ‖e_k‖²   (9)

The derivative of the error term with respect to the pose can be obtained using the Lie algebraic perturbation model, and the values of R′, t′ that minimize the error can be obtained iteratively by nonlinear optimization methods; there are different solvers for this problem, such as gradient descent, Gauss-Newton, and Levenberg-Marquardt. In this article, the Gauss-Newton algorithm is adopted. Its time complexity is O(TM), where T is the total number of iterations and M is the total number of error terms.
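A minimal numeric sketch of the coordinate transform and residual in equations (6) to (8), with toy poses and points (the optimization loop itself is omitted):

```python
import numpy as np

# Sketch of equations (6)-(8): agent i maps an observed feature point of
# agent j into world coordinates, then a residual is formed against agent
# j's model point under an estimated pose (R', t'). Values are toy data.
def to_world(p_cam, R, t):
    """Equation (6): p_w = R @ p_cam + t."""
    return R @ p_cam + t

def residual(p_w, p_model, R_est, t_est):
    """Equation (8): e = p_w - (R' @ p_model + t')."""
    return p_w - (R_est @ p_model + t_est)

R_i = np.eye(3)
t_i = np.array([1.0, 0.0, 0.0])
p_w = to_world(np.array([0.0, 1.0, 2.0]), R_i, t_i)

# Initial guess for agent j's pose: identity. The residual then exposes
# the unmodeled translation, which Gauss-Newton would iteratively remove.
e = residual(p_w, np.array([0.0, 1.0, 2.0]), np.eye(3), np.zeros(3))
print(e)  # [1. 0. 0.]
```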

Effectiveness proof of the proposed algorithm
In this part, we prove the effectiveness of our algorithm: that is, we demonstrate that after experience replay and training, the proposed DQN can make macro-decisions for each agent that improve the efficiency of the whole MAS. We use utility theory to model the performance of the whole MAS. As shown in the previous section, the effect of SLAM of a single agent x can be regarded as the immediate utility value of x, which is associated with loss(x)

U(x) = −loss(x)   (10)

The expectation of the immediate utility value of a non-cooperative MAS is

E(U(X)) = Σ_{x∈X} U(x) / |X|   (11)

The calculation for a collaborative MAS is relatively complex, since the effect of collaboration between agents on the overall system utility must be considered; we use a graph to represent the cooperation of the whole system. An agent is regarded as a node in the graph, and a directed edge from an agent to its target is established, as shown in Figure 3. If the collaboration is not executed at that time, the edge is a solid line; if the collaboration has just completed, the edge is a double line; if the collaboration fails, the edge is a dashed line; and if there is no collaboration object, the arrow points to the agent itself.
The whole MAS can be divided into several maximally connected subgraphs, as shown in Figure 3, where each subgraph is a group representing the collaboration relationship between agents. The total utility value of a group U(g) is represented as equation (12)

U(g) = Σ_{x∈g} (U(x) + C(x, x′))   (12)

where x′ is the target of x and C(x, x′) is the effect of assistance, (loss_k(t) − loss_{k−1}(t)). If x belongs to g, then x does not belong to any other subgraph of G. The utility of the MAS is represented as equation (13)

E(U(X)) = Σ_{g∈G} U(g) / |X| = Σ_{x∈X} r(x) / |X|   (13)

In summary, calculating the cumulative reward of all agents amounts to calculating the total cumulative utility value. MAS-DQN gradually learns the cumulative utility function U(S, A(X)), where S and A(X) respectively refer to the state space and action space of the MAS.
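Equations (12) and (13) can be sketched on a toy collaboration graph. The agents, losses, and assistance contribution below are illustrative, with the sign of the assistance term chosen so that reducing the target's loss increases utility, consistent with the reward in equation (4):

```python
# Sketch of equations (12)-(13): total utility of a collaboration group
# and the system-wide expectation. All values are toy data.
def group_utility(group, loss, target, assist_gain):
    """U(g) = sum over x in g of (U(x) + C(x, x')), with U(x) = -loss(x)
    and C(x, x') the signed assistance contribution of x toward target x'."""
    return sum(-loss[x] + assist_gain.get((x, target[x]), 0.0) for x in group)

loss = {"a": 1.0, "b": 2.0, "c": 0.5}
target = {"a": "b", "b": "b", "c": "c"}   # b and c act alone
assist_gain = {("a", "b"): 0.3}           # a reduced b's loss, so positive
groups = [{"a", "b"}, {"c"}]              # maximally connected subgraphs

total = sum(group_utility(g, loss, target, assist_gain) for g in groups)
print(total)  # (-1.0 + 0.3) + (-2.0) + (-0.5) = -3.2
```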
The last layer of MAS-DQN is the decision-making layer; all layers before it are called the feature extraction layer. The feature extraction layer is effectively a high-level abstraction of the ORBSLAM feature vector. The weights connected to the decision-making layer determine the discrepancy between different decisions of the agents: faced with the same extracted high-level features, different actions produce different cumulative rewards. This situation is caused by the differing attributes of the agents, which the DQN learns through their cooperation.

Sequences of the proposed mechanism
The implementation steps of the proposed overall cooperation mechanism for multi-agent SLAM are shown in Algorithm 3. The system consists of mobile agents and an organizer on which MAS-DQN is deployed. The following is a detailed explanation of this algorithm. The whole algorithm and all variables are initialized (01). The organizer broadcasts the request to synchronize agent information and waits for all agents to reply (03-04). Each agent responds to the organizer with its world coordinates, built local maps, ORBSLAM feature vectors, and assistance status (05-07). The local maps are not the final form of the maps; they are only 2-D maps that provide information for global navigation. After the organizer obtains the information returned by all agents (08), it first merges the local maps of all agents and feeds the global map back to the agents (09-10). Second, it computes the output of the observation function from the set of ORBSLAM feature vectors of all agents (11). Next, the immediate reward of each agent is calculated (12).
Finally, the output is fed into MAS-DQN to obtain the action of each agent, which is then broadcast (13-14). The transition (ϕ_{t−1}, a_{t−1}, r_t, ϕ_t) is added to the experience replay buffer (15-16). After obtaining enough samples, MAS-DQN is trained (17-21). Agent x decodes its own goal from the organizer's instruction a (23). Agent x's target is the agent it is to assist; if its target is itself, x performs SLAM independently (24-25). If x is not assigned a target, it stops and waits for assistance (26-27). If x is given a target, it follows the calculated path to approach its intended location and updates its assistance status. Life is a constant, set up to prevent assistance from going on indefinitely (30-36). This loop continues until time runs out (02-57).

Experiment
The simulation experiments introduced in this section evaluate the performance of the proposed mechanism and of MAS-DQN with different parameters. The experiments were implemented on the Robot Operating System (ROS) platform, which can simulate the real physical environment. We build the simulated mobile agents with the "telobot" model, equipped with a "Kinect" as the vision sensor. The initial pose of each robot is known, and ORBSLAM is deployed independently on each robot. The experimental world is shown in Figure 4.
The main indicators for measuring the effect of SLAM in a MAS are the translation root mean square error (RMSE) and orientation RMSE of the agents' trajectories. Although the agents in the experimental environment share the same model, we embody the differences in their attributes by changing their ability values related to motion and vision sampling. Their maximum angular acceleration, linear acceleration, maximum angular velocity, and maximum linear velocity obey a normal distribution N(μ, σ₁), in which σ₁ controls the discreteness of the agents' ability values, so that the performance of the algorithms can be tested when the agents' attributes vary to different degrees. For the visual sensors, we add noise obeying the normal distribution N(μ, σ₂) to the images captured by each sensor, and the variance σ₂ of the camera noise of different agents itself obeys the normal distribution N(1, σ₁).
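The attribute sampling and camera-noise injection described above can be sketched as follows; the mean, spread, and image size are assumptions, and the noise model is simplified to additive Gaussian noise:

```python
import numpy as np

# Sketch of simulating heterogeneous agents: motion limits drawn from a
# normal distribution, and per-image additive camera noise. mu/sigma
# values and shapes are illustrative.
rng = np.random.default_rng(0)

def sample_agent_attributes(mu, sigma1, n_agents):
    """Max angular/linear velocity and acceleration per agent ~ N(mu, sigma1)."""
    return rng.normal(mu, sigma1, size=(n_agents, 4))

def add_camera_noise(image, sigma2):
    """Additive zero-mean Gaussian noise on a captured image."""
    return image + rng.normal(0.0, sigma2, size=image.shape)

attrs = sample_agent_attributes(mu=1.0, sigma1=0.1, n_agents=3)
noisy = add_camera_noise(np.zeros((4, 4)), sigma2=0.05)
print(attrs.shape, noisy.shape)  # (3, 4) (4, 4)
```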
The proposed algorithm reduces translation RMSE (m) by 35.68% and orientation RMSE (deg) by 32.02% compared with the mechanism without coordination. We use these two mechanisms to generate octomaps 38 of the same environment, which are shown in Figures 6 and 7. Experiment 2: The main purpose of this experiment is to compare against a mechanism in which agents collaborate with each other without considering attributes and states. Experiment 5: This experiment compares our algorithm with the multi-agent SLAM algorithm called MR-vSLAM, which only uses the prior information of the map to collaborate in localization. 40 The experimental results, shown in Figure 10, illustrate that the proposed algorithm reduces translation RMSE by 7.81% and orientation RMSE by 13.80%. Table 3 compares the computational costs of several algorithms mentioned in this article, where n is the number of agents, k is the number of natural landmarks, L is the number of hidden layers of MAS-DQN, and C_l is the number of neurons in the lth hidden layer.

Conclusion
This article proposes a novel centralized multi-agent cooperative SLAM mechanism that considers the status and attributes of agents. The algorithm applies a new observation function based on ORBSLAM to perceive the SLAM state of the whole MAS; a Rainbow-based deep RL framework called MAS-DQN is designed to learn the overall utility function U(S, A(X)) of the system, and the effectiveness of the framework is proved. The simulation results show that MAS-DQN reduces translation RMSE (m) by 13.00%, 13.06%, and 7.81% compared with the emotional recruitment model, the auction model, and MR-vSLAM, respectively; the orientation RMSE (deg) is also reduced by 24.42%, 29.64%, and 13.80%, a considerable improvement on recently researched approaches. The algorithm inevitably brings a heavy computational cost to the organizer, so the real-time performance of the system cannot be guaranteed. In the future, we will focus on distributed learning systems to reduce this burden.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a special fund project of Harbin Science and Technology Innovation Talents Research (grant no. RC2013XK010002) and the National Natural Science Foundation of China (grant no. 61375081).