Mask R-CNN with data augmentation for food detection and recognition

—In this paper, we present a simple data-driven approach to deep-learning-based food detection, implementing the Mask R-CNN module together with a deeper analysis of dataset manipulation. We first apply affine and projective transformations for data augmentation, enlarging the dataset in a controlled manner following the state of the art in computer vision. We then evaluate our method concretely by visualizing the datasets and testing several configurations on more than 5000 images of visually similar objects, in order to better understand intelligent data analysis for object detection and segmentation. The results illustrate the efficiency of the approach for small applications such as food recognition and for grasping and manipulation in robotics.


I. INTRODUCTION
The fruit and food industry [3] currently offers many potential applications as people pay growing attention to health, for example food delivery systems, logistics, food health and safety, and agricultural robotics [7] built on dexterous grasping and robust manipulation [8], [9]. Food processing is increasingly concentrated among organized and non-organized actors in marketing systems. As deep learning for computer vision now provides robust solutions for image processing and segmentation, research activity on food recognition through object detection and segmentation is growing rapidly. However, most of this research does not address concrete food applications.
Several food datasets exist as open source, such as Food-101 (Mining Discriminative Components with Random Forests) [12], UEC FOOD 256 [13], and UPMC Food-101 [14]. However, they are impractical for many engineers and researchers to implement, and they are not easy to use during training for food recognition [11]. Such huge datasets are not ideal for a specific domain; it is therefore necessary to handle private, domain-specific data.
To avoid spending excessive time preparing datasets, transfer learning has become standard for many deep learning applications: the hierarchical representation of a supervised model is reused and fine-tuned to learn the new targets.
One of the challenges in data science and machine learning is convergence, which is difficult to understand and interpret in deep learning [10]. The visualization of neural networks and deep learning remains a challenge for most scientists, who must deal with convergence and divergence problems.
1 The author is with the University of Bordeaux, Bordeaux, France (than.ld@ieee.org).
In this paper, we present data augmentation for deep learning approaches, together with visualizations of the resulting data.

II. RELATED WORK
In this section, we review the underlying concepts of points, lines, and conics, and then consider both transformations and their invariants in a hybrid treatment.
To remove projective distortion, we select four points in a plane with known coordinates. Under a homography H = [h_ij], a point (x, y) maps to (x', y') with

x' = (h11 x + h12 y + h13) / (h31 x + h32 y + h33)   (1)

and similarly for y':

y' = (h21 x + h22 y + h23) / (h31 x + h32 y + h33)   (2)

From equations 1 and 2, each point correspondence yields two equations linear in the entries of H:

x' (h31 x + h32 y + h33) = h11 x + h12 y + h13
y' (h31 x + h32 y + h33) = h21 x + h22 y + h23

so four correspondences determine H up to scale.

Fig. 1: Mask R-CNN architecture [21].
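The two linear equations per correspondence can be stacked and solved by the standard direct linear transformation (DLT). The following sketch, with hypothetical helper names, estimates H from the four selected points and removes the distortion of a quadrilateral:

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst via DLT (4+ points)."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        # Two linear constraints per correspondence, as in equations 1 and 2
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    A = np.asarray(rows, dtype=float)
    # h is the right singular vector with the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]              # normalize so H[2,2] = 1

# Map the corners of a distorted quadrilateral back to a unit square
src = [(0.1, 0.0), (1.2, 0.1), (1.0, 1.1), (-0.1, 0.9)]
dst = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
H = homography_from_points(src, dst)
p = H @ np.array([0.1, 0.0, 1.0])   # first source corner, homogeneous
print(p[:2] / p[2])                 # maps to approximately (0, 0)
```

With exactly four correspondences in general position the 8 x 9 system has a one-dimensional null space, so the solution is exact up to numerical precision.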

A. Homogeneous Coordinates
When defining datasets it is necessary to understand the principles of points, lines, and planes in geometry, since we support arbitrary polygons during dataset pre-processing. This section reviews the fundamental concepts of 2D geometry. In homogeneous coordinates a line is represented by the vector l = (a, b, c)^T. A point x = (x, y)^T is represented by the homogeneous vector (x, y, 1)^T, and the constraint that the point lies on the line is ax + by + c = 0, i.e. x^T l = 0. The most useful consequence is the intersection of two lines, given by the cross product x = l x l'.
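These homogeneous-coordinate identities can be checked directly with numpy; the cross product gives both the line through two points and the intersection of two lines:

```python
import numpy as np

# Homogeneous 2D geometry: a point x = (x, y, 1)^T lies on a line
# l = (a, b, c)^T exactly when x . l = 0. The line through two points and
# the intersection of two lines are both given by cross products.
def line_through(p, q):
    return np.cross(p, q)

def intersection(l, m):
    x = np.cross(l, m)
    return x / x[2]     # back to inhomogeneous form (x[2] = 0 for parallel lines)

p1 = np.array([0.0, 0.0, 1.0])
p2 = np.array([2.0, 2.0, 1.0])   # together: the line y = x
p3 = np.array([0.0, 2.0, 1.0])
p4 = np.array([2.0, 0.0, 1.0])   # together: the line y = 2 - x
l1 = line_through(p1, p2)
l2 = line_through(p3, p4)
print(intersection(l1, l2))      # the crossing point (1, 1, 1)
```

For parallel lines the third coordinate of the cross product vanishes, which is exactly the ideal point at infinity discussed in Section III-D.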
A conic is defined by an equation of degree 2:

a x^2 + b xy + c y^2 + d x + e y + f = 0

The parameters of this function give five degrees of freedom (six coefficients up to scale). It can be simplified to the basic homogeneous form

x^T C x = 0   (6)

where

C = [  a   b/2  d/2
      b/2   c   e/2
      d/2  e/2   f  ]

To define a conic, five points must be given as constraints; each point (x_i, y_i) contributes one row (x_i^2, x_i y_i, y_i^2, x_i, y_i, 1) of the 5 x 6 constraint matrix, e.g. (x_4^2, x_4 y_4, y_4^2, x_4, y_4, 1) and (x_5^2, x_5 y_5, y_5^2, x_5, y_5, 1) for the fourth and fifth points.

III. THE HIERARCHY OF PROJECTIVE GEOMETRY AND TRANSFORMATIONS OF 2D
The hierarchy of transformations runs from the projective linear group down through the affine group; in the affine case, for instance, the last row of the matrix is fixed to [0, 0, 1]. For the isometries, the matrices are defined as follows:

H_E = [ e cos t   -sin t   t_x
        e sin t    cos t   t_y
          0          0      1  ]   (10)

where e = +-1; the transformation is orientation-preserving when e = 1, and orientation-reversing otherwise.
From equation 10, the orientation-preserving case can be written in short form as x' = [R t; 0^T 1] x, where the rotation matrix R satisfies R^T R = I. Specifically, there are 3 degrees of freedom: one rotation and two translations; special cases contain a pure rotation or a pure translation. Length, angle, and area are the required invariants.
The second class of transformations is the similarities, which combine an isometry with isotropic scaling. We can formalize this as

H_S = [ s cos t   -s sin t   t_x
        s sin t    s cos t   t_y
          0           0       1  ]

or, more generally,

H_S = [ sR   t
        0^T  1 ]   (13)

A similarity has four degrees of freedom, one more than an isometry, the additional parameter being the scale s; it is also known as a shape-preserving transformation. Importantly, its invariants are the fixed quantities of the parameterization: ratios of lengths, angles, ratios of areas, and parallelism of lines.
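The shape-preserving property can be checked numerically: applying a similarity to a triangle leaves its ratio of side lengths unchanged. A small illustration with an arbitrarily chosen scale, angle, and translation:

```python
import numpy as np

# A similarity H_S with scale s, rotation t, and translation (tx, ty)
s, t, tx, ty = 2.0, 0.7, 3.0, -1.0
H = np.array([[s * np.cos(t), -s * np.sin(t), tx],
              [s * np.sin(t),  s * np.cos(t), ty],
              [0.0,            0.0,           1.0]])

tri = np.array([[0, 0, 1], [1, 0, 1], [0, 2, 1]], float).T  # triangle, homogeneous columns
out = H @ tri

def side(P, i, j):
    """Euclidean length of the side from vertex i to vertex j."""
    return np.linalg.norm(P[:2, i] / P[2, i] - P[:2, j] / P[2, j])

r_before = side(tri, 0, 1) / side(tri, 0, 2)
r_after = side(out, 0, 1) / side(out, 0, 2)
print(round(r_before, 6), round(r_after, 6))   # equal ratios: 0.5 0.5
```

Individual lengths are multiplied by s = 2, but their ratio is invariant, exactly as stated above.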

A. Projective Transformations
A projectivity is an invertible mapping h from P^2 to itself such that three points x_1, x_2, x_3 lie on the same line if and only if h(x_1), h(x_2), h(x_3) do.

B. Representations of Affine Transformations
Next, we explore the affine transformations, which allow a shape to rotate and deform. An affinity therefore requires six degrees of freedom, representing two rotations, two translations, and two scalings.
Similarly, it can be written as

x' = H_A x,   H_A = [ A    t
                      0^T  1 ]

where the 2 x 2 non-singular matrix A decomposes as a chain of matrix products:

A = R(t) R(-p) D R(p),   where   D = [ l1  0
                                        0  l2 ]
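This decomposition falls directly out of the singular value decomposition A = U S V^T: taking R(t) = U V^T, R(p) = V^T, and D = S gives a rotation followed by an anisotropic scaling along axes rotated by p. A minimal numerical check:

```python
import numpy as np

# Decompose the affine 2x2 block: A = R(theta) * (R(-phi) D R(phi)),
# i.e. a rotation followed by anisotropic scaling along rotated axes.
A = np.array([[2.0, 0.5],
              [0.3, 1.5]])       # det(A) > 0, so U V^T is a proper rotation
U, S, Vt = np.linalg.svd(A)
R_theta = U @ Vt                  # the pure rotation part R(theta)
deform = Vt.T @ np.diag(S) @ Vt   # R(-phi) D R(phi): symmetric, positive-definite
print(np.allclose(R_theta @ deform, A))   # True: the chain reconstructs A
```

The symmetric factor carries the two scales l1, l2 (the singular values), and the rotation factor carries the angle t, matching the six-DOF count when the two translations are added back.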

C. Projective Transformation
In some cases during pre-processing for the training phase, it is also essential to handle points at infinity, which lie outside the indexed pixel range of the images.
Projective transformations have 8 degrees of freedom, two more than affinities; the two extra parameters describe the position of the line at infinity. The action is non-homogeneous over the plane. The transformation can be written as

H_P = [  A    t
        v^T   v  ]

where the new parameter is the vector v = (v_1, v_2)^T. The invariant that remains is the cross-ratio of four points on a line.
Under an affinity the line at infinity stays at infinity, although points may move along it. Under a projectivity the line at infinity can become finite, which is what allows vanishing points and the horizon to be observed, and this is easy to compute.
We can decompose a projective transformation as

H = H_S H_A H_P = [ sR   t ] [ K    0 ] [ I    0 ]
                  [ 0^T  1 ] [ 0^T  1 ] [ v^T  v ]

where A can be generalized as A = sRK + t v^T, with K an upper-triangular matrix satisfying det K = 1.
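The surviving invariant, the cross-ratio, can be verified numerically: map four collinear points through an arbitrary 8-DOF homography and compare cross-ratios before and after. A small sketch:

```python
import numpy as np

def cross_ratio(t):
    """Cross ratio (t1, t2; t3, t4) of four 1-D affine coordinates."""
    t1, t2, t3, t4 = t
    return ((t1 - t3) * (t2 - t4)) / ((t1 - t4) * (t2 - t3))

# Four collinear points on the x-axis, homogeneous coordinates
pts = np.array([[0, 0, 1], [1, 0, 1], [2, 0, 1], [4, 0, 1]], float).T

# An arbitrary projective transformation (last row is not (0, 0, 1))
H = np.array([[1.0,  0.2,  0.3],
              [0.1,  0.9, -0.4],
              [0.05, 0.01, 1.0]])
img = H @ pts
img = img[:2] / img[2]                 # dehomogenize; the images are still collinear

d = img[:, 3] - img[:, 0]              # direction along the image line
t = d @ (img - img[:, :1]) / (d @ d)   # affine parameter of each image point
print(round(cross_ratio([0, 1, 2, 4]), 6), round(cross_ratio(t), 6))  # 1.5 1.5
```

Lengths and ratios of lengths are destroyed by H, but the cross-ratio is preserved exactly, up to floating-point error.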

D. Infinity Geometry Transformation
It is also necessary to define the line at infinity, which is useful in practice whenever out-of-index exceptions occur during image preprocessing.
As in the previous subsection, an ideal point has the form (x_1, x_2, 0)^T. A projective transformation of a line has three degrees of freedom, given by a 2 x 2 matrix up to scale acting on (x_1, x_2)^T. When x_2 = 0 the point lies at infinity, and the cross ratio of four points x_i = (x_i1, x_i2)^T on the line is defined by

Cross(x_1, x_2, x_3, x_4) = (|x_1 x_2| |x_3 x_4|) / (|x_1 x_3| |x_2 x_4|),   |x_i x_j| = x_i1 x_j2 - x_j1 x_i2

From the definition of l_inf = (0, 0, 1)^T as a fixed line under a transformation H, we have H^(-T) l_inf = l_inf, which characterizes the affine transformations, and a projective metric can then be built from the cross ratio.

IV. MASK R-CNN
Mask R-CNN is generalized from Faster R-CNN.
• Network Architecture (Convolutional Backbone): The backbone uses the depth features of a standardized convolutional neural network, namely ResNet50 or ResNet101 (described in more detail in [5]), to provide feature extraction. At the beginning, the purpose is to detect low-level features such as corners and edges; afterwards the network targets all kinds of food as high-level features. Figure 5 shows the ResNet architecture. Let a^l be the activation of layer l, with the next output a linear function of it. A deep network is more capable of learning complex features than a shallow one. However, adding more hidden layers to a sufficiently deep network may degrade model accuracy due to the vanishing gradient problem, a well-known issue in training neural networks where the weights of the first layers cannot be updated correctly through backpropagation of the error gradient. ResNet avoids this issue by preserving the gradient during backpropagation. The basic idea behind gradient preservation is to backpropagate the error through an identity function so that the gradient is simply multiplied by 1 (i.e., preserved). In detail, the transformation y = F(x) + x is used instead of y = F(x), where x and y are the input and output of the stacked layers and F is the nonlinear residual function.
Figure 5 (Top-Left) shows the structure of a residual block, beginning with the activation a^l as input and passing through the linear operation

z^(l+1) = W^(l+1) a^l + b^(l+1)

where b^(l+1) is the added bias vector and W^(l+1) is the weight matrix used at layer l + 1.
We then apply the nonlinear layer, the ReLU function, to govern the output:

a^(l+1) = ReLU(z^(l+1))

For the next layer, z^(l+2) = W^(l+2) a^(l+1) + b^(l+2) is computed, and finally another ReLU is applied. Following Figure 5 (Top-Right), the output with the short-cut connection of ResNet is rewritten as

a^(l+2) = ReLU(z^(l+2) + a^l)   (30)

To access both lower- and higher-level features, a Feature Pyramid Network extends and improves the standard feature extraction.
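The forward pass of these equations is only a few lines of numpy; this sketch (illustrative dimensions, not the paper's actual ResNet weights) makes the role of the identity skip explicit:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(a_l, W1, b1, W2, b2):
    """a^(l+2) = ReLU(W2 ReLU(W1 a^l + b1) + b2 + a^l), identity shortcut."""
    z1 = W1 @ a_l + b1        # linear operation at layer l+1
    a1 = relu(z1)             # first ReLU
    z2 = W2 @ a1 + b2         # linear operation at layer l+2
    return relu(z2 + a_l)     # shortcut adds the block input before the final ReLU

rng = np.random.default_rng(0)
a_l = rng.standard_normal(4)
W1, W2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
b1, b2 = np.zeros(4), np.zeros(4)
out = residual_block(a_l, W1, b1, W2, b2)

# If the residual branch outputs zero, the block reduces to ReLU(a^l):
# the identity path is what keeps the gradient flowing in deep networks.
print(np.allclose(residual_block(a_l, W1 * 0, b1, W2 * 0, b2), relu(a_l)))  # True
```

Because the gradient of the identity branch is exactly 1, the error signal reaching a^l can never be annihilated by the stacked layers, which is the preservation argument made above.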

A. Data Configurations
The training configuration is based on the food datasets.
The execution time obeys

TimeOfPerformance = t_0 + 34 x n_i,   i = 1, 2, 3, ...

with two cases: if t = 0, the initial state, TimeOfPerformance = t_0 = 75; if t = i, the result grows as n_i times the constant 34. In our experience, the most important issue when training datasets with data augmentation and transfer learning, and one of the main weak points of a data-driven approach, is the range of variance of the datasets.
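To widen that variance, each labeled sample can be replicated under the random geometric warps of Sections II and III. The sketch below is a hypothetical helper, not the paper's exact pipeline: it applies random similarity transforms to the polygon mask annotations used by Mask R-CNN.

```python
import numpy as np

# Hypothetical augmentation helpers (illustrative, not the paper's pipeline):
# each sample is replicated under random warps so the detector sees more
# pose and scale variation than the raw dataset contains.
def random_affine(rng, max_rot=0.3, max_scale=0.2, max_shift=10.0):
    t = rng.uniform(-max_rot, max_rot)
    s = 1.0 + rng.uniform(-max_scale, max_scale)
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    return np.array([[s * np.cos(t), -s * np.sin(t), tx],
                     [s * np.sin(t),  s * np.cos(t), ty],
                     [0.0, 0.0, 1.0]])

def warp_polygon(H, poly):
    """Apply a 3x3 transform to an (N, 2) polygon of mask vertices."""
    hom = np.hstack([poly, np.ones((len(poly), 1))])  # to homogeneous coordinates
    out = hom @ H.T
    return out[:, :2] / out[:, 2:3]                   # back to pixel coordinates

rng = np.random.default_rng(42)
poly = np.array([[10.0, 10.0], [50.0, 12.0], [48.0, 60.0], [12.0, 55.0]])
augmented = [warp_polygon(random_affine(rng), poly) for _ in range(5)]
print(len(augmented), augmented[0].shape)   # 5 warped copies, each (4, 2)
```

Replacing `random_affine` with a full 8-DOF homography (non-zero last-row vector v) yields the projective variant discussed earlier; the same `warp_polygon` routine applies unchanged.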

VI. CONCLUSION
In this paper, we focused on visualization and detection analysis based on deep learning approaches, in order to gain a deeper understanding of food data recognition.
In future work, we will first investigate increasing the variety of the datasets; secondly, it is necessary to improve the current execution time.

Fig. 5: ResNet backbone architecture. Top-Left: residual block; Top-Right: design of the short-cut (skip) connection; Bottom: plain network and residual network variants.

Fig. 15: Example of the histogram distributions for both the training and modelling datasets; dx and dy represent the coordinates along the x and y axes.
Fig. 7: Top: the activation layer described in the text.
The purpose of region proposal networks (RPN) [20] is to share computation: the feature map is taken as input and fed into fully connected layers, namely a box regression layer and a box classification layer. The main components are the feature mapping (obtained by sliding a small network over the convolutional feature map), RoIAlign, the fixed-size feature mapping, the mask branch, the fully connected layer, box regression, and classification.

TABLE I: The first row shows the number of objects trained on before data augmentation; the second row gives the number of objects labeled. The last two rows give, respectively, the maximum and minimum detection accuracy. TimeOfPerformance = t_0 + 34 x n_i.