Multi-Layered Basis Pursuit Algorithms for Classification of MR Images of Knee ACL Tear

Deep learning architectures have been used extensively in recent years for the classification of biomedical images, assisting clinicians in the diagnosis and treatment management of patients with different health conditions. These architectures have demonstrated expert-level diagnosis and, in some cases, have surpassed human experts in diagnosing health conditions. Automation tools based on deep learning frameworks have the potential to transform all stages of the medical imaging pipeline, from image acquisition to interpretation and analysis. One of the most common areas where these techniques are applied is knee MR image classification for different types of Anterior Cruciate Ligament (ACL) tears. If managed properly and in time, the diagnosis and treatment of an ACL tear can prevent further degradation of the patient's knee joint and can also help slow the process of subsequent knee arthritis. In this work, we implement a novel classification framework based on multi-layered basis pursuit algorithms, inspired by recent research on the theoretical foundations of deep learning built on the celebrated sparse coding theory. We implement an optimal multi-layered Convolutional Sparse Coding (ML-CSC) framework for classification of a labelled dataset of coronal-view knee MR images and compare the results with traditional convolutional neural network (CNN) based classifiers. Empirical results demonstrate the effectiveness of the ML-CSC framework and show that it can learn distinct features on a small dataset and achieve an average accuracy of more than 92% without employing regularization techniques or extensive training on large datasets. In addition to 95% average accuracy on the presence and absence of ACL tears, the framework also performs well on the imbalanced and challenging classification of partial ACL tears, with 85% accuracy.


I. INTRODUCTION
One of the most common sports injuries in young adults is the anterior cruciate ligament (ACL) tear. A study spanning 21 years found an incidence of 68.8 per 100,000 person-years in the general population [1]. Treatment requires surgical intervention such as reconstruction or enhanced primary repair to avoid further damage and degeneration of the injury into osteoarthritis and subsequent chronic instability [2]-[4]. The frequent occurrence of ACL tears in the sports community and the general public requires accurate diagnosis of complete and partial ACL tears. This is also important for therapeutic decision making and avoidance of further damage. In addition to examinations by experienced sports medicine specialists, such as pivot shift tests, magnetic resonance (MR) imaging is routinely used to complement and confirm the clinical diagnosis and to assess the status of associated injuries. MR imaging plays a crucial part in diagnosis, treatment planning, treatment delivery and follow-ups. Consensus is building among researchers on the need for automated tools to reduce costs, increase efficiency and provide higher diagnostic and prognostic accuracy for clinical decision making. Although MR imaging is specific and accurate in diagnosing ACL tears when read by an experienced musculoskeletal (MSK)-trained radiologist [5], the diagnosis becomes challenging for non-MSK radiologists and clinicians without access to sub-specialty radiology. People who do not have access to specialists for diagnosing ACL tear injuries remain at risk of further deterioration of their injuries without timely and proper diagnosis. Deep learning (DL) has emerged as a powerful tool for image processing tasks in recent years, complemented by the development of graphical processing hardware.
The associate editor coordinating the review of this manuscript and approving it for publication was Chintan Amrit.
The DL algorithms developed so far have been used successfully in tasks such as object detection [6], MR image reconstruction [7], [8] and classification of biomedical images [9], [10]. The primary advantage of learning representations through deep neural networks is their ability to learn semantically meaningful patterns and features in the underlying data without explicit human intervention. Once trained successfully on training datasets, these models can be used effectively for a range of problems such as image recognition and image classification on (unseen) test data. Despite the tremendous success of DL architectures such as Convolutional Neural Networks (CNNs), their design has until recently remained largely heuristic, and a deeper understanding is required to model their working and improve their performance. Sparse coding theory [11], developed over the last decade, has been applied to a range of problems in image processing [12], [13]. The theory is based on constructing models that represent signals as linear combinations of a few columns, called atoms, from a given redundant matrix termed a dictionary. Applied successfully to an array of image processing tasks, this theory has recently been extended to explain the theoretical foundations of DL. Convolutional sparse coding (CSC) and its multilayered version, ML-CSC, have been introduced to explain these foundations and their association with sparse coding theory. Specifically, CNNs are interpreted as approximations of a multi-layer basis pursuit problem [14], [15].
The wider availability of datasets and the learning ability of DL architectures demonstrate their capacity to become part of the biomedical imaging workflow and to help revolutionize the healthcare industry, especially for communities with limited access to specialized facilities. The deep learning (DL) frameworks currently applied nevertheless have limited applicability for the following reasons:
• The frameworks are mostly applied to general imaging datasets for classification, which require a large number of labeled images. For biomedical images, the availability of large datasets labeled by specialist radiologists is a challenging issue.
• Due to limited availability of datasets, the problem of imbalanced classification becomes even more challenging which is crucial for decision making for post diagnosis treatment of patients. Furthermore, recent works mostly address the binary classification problem of presence or absence of ACL tears, wherein the classification task of partial tear presents additional challenge for classification algorithms.
• The DL architectures are mainly trained with heuristic techniques, which require theoretical analysis in order to improve feature learning for accurate classification, especially for biomedical images, where the error margin should be as small as possible.
• Lastly medical imaging plays a crucial role in diagnosis, treatment planning, treatment delivery and follow-ups. To increase efficiency, there is an urgent need to integrate DL based automation tools in the form of machine learning, into all stages of the medical imaging pipeline ranging from image acquisition and reconstruction to analysis and interpretation.
In this work we address the above-mentioned gaps by implementing a DL classification framework optimally designed and tuned for grayscale MR images obtained at Hospital Kuala Lumpur (HKL) and labeled by an expert radiologist as normal, complete ACL tear or partial ACL tear. Specifically, this work makes the following contributions.
• The framework has been optimized successfully on labelled knee MR dataset and an average test set accuracy of 92% has been achieved without adding regularization techniques.
• The unrolled ML-CSC framework with the multi-layered iterative thresholding algorithm (ML-ISTA), its fast version (ML-FISTA) and multilayered Basis Pursuit (ML-BP) is demonstrated to perform better than the CNN based classification framework taken as baseline, without increasing network depth.
• The solution to challenging partial ACL tear classification problem, where the classifiers generally do not give good accuracies, is optimized with data augmentation techniques and accuracy of more than 85% on this specific class is achieved for multilayer iterative thresholding algorithm (ML-ISTA) framework, outperforming traditional CNN with same number of parameters.
• A data augmentation technique especially suited to training a DL framework on biomedical images is applied, and it performs best among the transforms commonly used in image classification algorithms.
• Lastly, a classification framework based on a recurrent architecture with the same depth as the generative models described above is trained and analyzed for comparison. The accuracies of all models are compared with CNN, demonstrating the viability and effectiveness of the MR image classification framework.

II. PRIOR AND RELATED WORK
With improvements in hardware for processing of large number of images, it became feasible to train large neural networks for classification tasks on datasets of different sizes.
VOLUME 8, 2020
The seminal work of [16] significantly improved the state of the art in classification of general images using graphical processing units. The work achieved an error rate of more than 15% on the ImageNet dataset. This error rate has been improved significantly since then on the general datasets available for research purposes. In addition to general datasets, P. Rajpurkar et al. in [17] implemented a 121-layer deep neural network for radiologist-level pneumonia detection on chest X-rays. The algorithm was trained on the large ChestX-ray14 dataset [18]. P. Rajpurkar et al. in [19] developed a 33-layer CNN for detecting a wide range of heart arrhythmias from single-lead ECG records. F. Liu et al. in [20] compared the classification performance of DL networks with clinical reports for binary classification (tear or no tear) and concluded that there is no significant difference between the two. The study by P. D. Chang et al. in [21] demonstrated the feasibility of a high-performing CNN tool for detecting complete ACL injury, with over 96% test accuracy on the binary classification problem. The study, which excluded cases with partial tears and mucoid pathologies, used a customized CNN architecture and dynamic patch-based sampling with five-slice 3-D input. The results of the study in [22] suggested the usefulness of preoperative MRI-detected lateral meniscal extrusion (LME) for estimating lateral meniscus posterior root tear (LMPRT) in injured knees with an ACL tear. Although there has been significant improvement in the application of DL architectures to classification and inverse problems, the theoretical foundations of DL largely remain heuristic. One very useful heuristic technique, widely applied in DL architectures as regularization to avoid overfitting the model to the training data, is dropout.
This technique randomly discards activations to improve classification accuracy on test sets and to avoid overfitting by the learning model. These regularization techniques have recently been improved with the proposal of stochastic techniques that further reduce overfitting by DL networks [23]. Recent research addressing the limitations of DL architectures has focused on the theoretical explanation of the working of deep learning frameworks. In [14], [15], the authors elaborated on the significance of a theoretical understanding of deep learning and proved a connection between widely used CNN architectures and the celebrated sparse coding theory. Sparse coding theory, which has been used successfully in inverse problems in imaging and in classification tasks, was shown in [14] to be tightly connected to CNNs, and the work further gave insights into the multilayered version of sparse coding. A further work [24] pointed out the suboptimal performance of the model presented in [14] and analyzed the proposed multilayered basis pursuit in the context of a combination of synthesis and analysis. Further extending the work on multilayered basis pursuit, its use in explaining CNNs and its performance on applied classification problems, J. Sulam et al. in [25] introduced a multilayered basis pursuit framework in which an $\ell_1$ norm penalty is imposed on the intermediate representations of the multilayered framework. Reference [25] showed that iterative thresholding algorithms can be used for multilayer basis pursuit and demonstrated the framework's effectiveness on classification of the general datasets MNIST, SVHN and CIFAR-10, with improved performance of thresholding algorithms compared to CNNs.
In this work we implement an optimal framework for multi-layered basis pursuit algorithms and demonstrate through experiments its applicability to classification of biomedical images. The novel architecture, which is optimized for classification of biomedical images and trained on an original dataset of knee MR images, achieves an average test accuracy of more than 92% and a class-wise test accuracy of 95%, outperforming a traditional CNN without adding regularization parameters or computational complexity. The rest of the paper is organized as follows. Section III gives a brief introduction to the clinical background of the ACL and its tears. Section IV gives a brief overview of CNNs, the multilayered sparse coding model and image classification in the context of CNN and ML-CSC. Section V gives an overview of iterative thresholding algorithms for single-layer basis pursuit and their extension to multilayer settings, along with multilayered basis pursuit. Section VI gives the ML-CSC model, its implementation for classification of the biomedical image dataset and the experimental results of image classification of knee ACL tears.

III. ANTERIOR CRUCIATE LIGAMENT TEAR - BACKGROUND
The anterior cruciate ligament (ACL) is one of the key ligaments that help stabilize the knee joint. These ligaments connect the thighbone (femur) to the shinbone (tibia) (Figure 1). Injuries of the ACL are most often a result of low-velocity, noncontact, deceleration injuries and contact injuries with a rotational component. A complete tear is characterized by rupture of the ligament, and a partial tear by stretching of the ligament so that it becomes loose and damaged. MR images of a partial tear, a normal knee and a complete ACL tear, taken from the dataset used in this work, are shown in Figure 2. The diagnosis process involves an emphasis on the history and physical examination of affected patients. The Lachman, pivot shift and anterior drawer tests are three types of physical examination performed on ACL tear patients for assessment of the injury. Of these three tests, the anterior drawer test has the highest sensitivity, at 94% [26]. MRI examination coupled with physical examination helps clinicians identify ACL tear types in addition to identifying bone bruising, which is present in most patients with an ACL tear. Once the ACL tear is diagnosed, clinicians devise a treatment plan for rehabilitation or surgical intervention according to the patient's condition and medical profile. Studies have reported that, in some cases, an average radiologist must interpret one image every 3-4 seconds in an 8-hour workday to meet the workload demand. Under these conditions, errors are inevitable in radiology tasks that involve visual perception and decision making [27]. An AI system integrated into the imaging pipeline, which enables the trained radiologist to receive pre-screened images, would support better decision making, especially under heavy workloads, in addition to helping diagnose ACL tears in knee injury patients in regions where trained MSK radiologists are difficult to access.
Besides, the ability of machines to scan large amounts of data enables them to generalize the classification algorithms for better decision making.

IV. REPRESENTATION LEARNING AND CLASSIFICATION
The ability of machine learning algorithms to learn and improve from experience, and to adapt to the specific problem at hand, opens up tremendous opportunities in an array of applications. CNNs are a fundamental part of representation learning and are briefly explained below.

A. CONVOLUTIONAL NEURAL NETWORKS
The forward pass is the fundamental part of CNNs, in which an input signal $X$ is convolved with a set of learned filters of chosen size, giving feature maps as output. In matrix-vector form, this can be written as $W_1^T X$, where $W_1$ is a convolutional matrix that contains the learned filters, with all their shifts, as columns. After convolution, a bias term $b_1$ is added to the resulting vector and a nonlinear operation (here the Rectified Linear Unit, ReLU) is applied. The output of the first stage/layer is then treated as input to another stage with convolutional matrix $W_2^T$ and bias term $b_2$. For a two-layer forward pass of a CNN, the operation is given by

$$Z_2 = \mathrm{ReLU}\left(W_2^T\,\mathrm{ReLU}\left(W_1^T X + b_1\right) + b_2\right). \tag{1}$$

The operation is extended up to the desired number of layers, and the feature maps are then used for classification or inverse problems. For the problem of classification, the output of the last layer is fed to a classifier, which is trained to predict the label $h(X)$ associated with a given image $X$. For a given dataset of images $\{X_j\}_j$, the task of learning the CNN parameters, namely the filters $\{W_i\}$, the biases $\{b_i\}$ and the classifier parameters $U$, can be written as

$$\min_{\{W_i\},\{b_i\},U}\ \frac{1}{N}\sum_{j=1}^{N} \ell\big(h(X_j),\,U,\,f(X_j,\{W_i\},\{b_i\})\big), \tag{2}$$

where $f(X_j,\{W_i\},\{b_i\})$ denotes the output of the last layer for image $X_j$ and $N$ is the number of training images. The task of the optimization algorithm is to minimize the mean of the loss function $\ell$.
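The two-layer forward pass described above can be illustrated with a minimal NumPy sketch. Small dense random matrices stand in for the convolutional matrices, and the dimensions, the zero biases and the function names are assumptions made for illustration only; they are not the configuration used in this work.

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit applied elementwise."""
    return np.maximum(z, 0.0)

def forward_two_layers(x, W1, b1, W2, b2):
    """Two-layer forward pass: ReLU(W2^T ReLU(W1^T x + b1) + b2)."""
    z1 = relu(W1.T @ x + b1)   # first-layer feature maps
    z2 = relu(W2.T @ z1 + b2)  # second-layer feature maps
    return z2

# Hypothetical sizes: 8-dimensional input, 12 and 6 feature dimensions.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W1, b1 = rng.standard_normal((8, 12)), np.zeros(12)
W2, b2 = rng.standard_normal((12, 6)), np.zeros(6)
features = forward_two_layers(x, W1, b1, W2, b2)
```

In a classification setting, `features` would be passed to a classifier with parameters $U$; that step is omitted here.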

B. CONVOLUTIONAL SPARSE CODING MODEL-THE MULTILAYERED BASIS PURSUIT
Sparse coding theory works on the premise of first learning filters (weights/dictionaries) from given data and then finding sparse representations of the given images over those dictionaries. Once the underlying structure is successfully modeled, problems such as reconstructing images from noisy measurements, recovering a signal in the compressive sensing domain and classifying test sets using already trained dictionaries and sparse maps can be solved with the help of different algorithms developed over the years and applied successfully in different domains. Formally, in sparse coding theory, a given signal $y$ admits a sparse representation in terms of a dictionary $D$ if $y = Dx$ and $x$ is sparse. Given the dictionary $D$, the celebrated basis pursuit problem with an $\ell_1$ norm penalty is formulated as

$$\min_{x}\ \frac{1}{2}\|y - Dx\|_2^2 + \lambda\|x\|_1. \tag{3}$$

This modeling theory was extended in [14] to multilayer settings, providing a connection between sparse coding theory and state-of-the-art DL architectures. The traditional sparse coding model assumes dictionaries without any structure, whereas in CSC, which is a special form of sparse coding [11], a special structure is imposed on the learned dictionaries, with filters banded together and concatenated in circulant form. In the multi-layered version of CSC, which extends CSC, the sparse feature maps obtained from one layer are treated as input to the second layer, and the dictionary learning and sparse coding steps are repeated for subsequent layers. The CSC model represents a signal of interest as a multiplication of dictionaries $D$ and sparse vectors $x$. The deep learning problem in the context of sparse coding theory, which has been shown to be a theoretical explanation of CNNs [14], can be formulated as follows. For a global signal $X$, convolutional dictionaries $D_i$, sparse vectors $x_i$, $k$ layers and cardinality $s_i$, the deep pursuit problem is defined as [24]:

$$\text{find } \{x_i\}_{i=1}^{k}\ \ \text{s.t.}\ \ x_{i-1} = D_i x_i,\ \ \|x_i\|_0 \le s_i,\ \ 1 \le i \le k,\ \ x_0 = X. \tag{4}$$

A convex relaxation of the deep pursuit problem, proposed in [25], results in multilayered basis pursuit. For a two-layer model and a signal $y$, the problem can be formulated as

$$\min_{x_2}\ \frac{1}{2}\|y - D_1 D_2 x_2\|_2^2 + \lambda_1\|D_2 x_2\|_1 + \lambda_2\|x_2\|_1. \tag{5}$$

In the case $\lambda_1 = 0$ and $\lambda_2 > 0$, the above formulation is equivalent to traditional basis pursuit with the global dictionary $D_1 D_2$. With $\lambda_1, \lambda_2 > 0$, analysis priors are imposed on the set of sought-after representations $x$, resulting in regularized solutions.

C. ML-CSC FOR CLASSIFICATION
Given sparse vectors $x^*$ and dictionaries $D_i$, the classification problem can be formulated in the deep sparse coding context as

$$\min_{\{D_i\},U}\ \frac{1}{N}\sum_{j=1}^{N} \ell\big(h(X_j),\,U,\,x^*(X_j,\{D_i\})\big), \tag{6}$$

where the sparse representations $x^*$ are fed to the classifier after dictionary learning, multilayer basis pursuit and training of the classifier.

V. ALGORITHMS
DL architectures and algorithms traditionally deal with high-dimensional settings, where second-order methods result in prohibitive computational complexity and slow convergence rates. Proximal gradient descent, which uses first-order approximations for its update steps, is therefore a suitable choice for multilayer basis pursuit, since it handles the sparse prior terms separately from the smooth convex term [28]. The algorithm only needs the gradient of the smooth convex term, while the proximal mapping associated with the update depends on the sparse prior. The convergence analysis is done in terms of the number of iterations of the algorithm.

A. LAYERED BASIS PURSUIT
The layered basis pursuit given in [14] addresses a sequence of pursuits of the form

$$\hat{x}_i = \arg\min_{x_i}\ \frac{1}{2}\|\hat{x}_{i-1} - D_i x_i\|_2^2 + \lambda_i\|x_i\|_1, \tag{7}$$

where $\hat{x}_0 = y$ and $i = 1, \dots, k$. These algorithms [29], [30], which present a heuristic approximation, do not minimize Equation (5), and each layer is required to explain only the next layer, so they cannot be used to generate a signal according to the multilayer sparse model. Algorithm 1 gives the layered basis pursuit, which seeks sparse maps $x$ subject to the constraints given in Equation (4), using the $P_1$ term and a thresholding operator $H$ at each layer of the neural network.

Algorithm 1 ML-BP
Input: signal $y$, dictionaries $D_i$
Init: set $\hat{x}_0 = y$
for $i = 1 : k$ do
    $\hat{x}_i \leftarrow \arg\min_{x_i} \frac{1}{2}\|\hat{x}_{i-1} - D_i x_i\|_2^2 + \lambda_i\|x_i\|_1$
end for
Output: set of representations $\{\hat{x}_i\}$
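The layer-by-layer pursuit described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: each per-layer $\ell_1$ problem is solved here with a fixed number of plain ISTA iterations (an assumed inner solver), dense random matrices stand in for convolutional dictionaries, and all sizes and penalty values are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def basis_pursuit_ista(y, D, lam, n_iter=200):
    """Approximately solve min_x 0.5*||y - D x||^2 + lam*||x||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1/L with L = ||D||_2^2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - step * D.T @ (D @ x - y), step * lam)
    return x

def layered_basis_pursuit(y, dictionaries, lambdas, n_iter=200):
    """Each layer sparsely codes the representation of the previous layer."""
    representations, previous = [], y
    for D, lam in zip(dictionaries, lambdas):
        previous = basis_pursuit_ista(previous, D, lam, n_iter)
        representations.append(previous)
    return representations

# Hypothetical two-layer example with dense stand-in dictionaries.
rng = np.random.default_rng(0)
D1 = rng.standard_normal((20, 30))
D2 = rng.standard_normal((30, 40))
y = rng.standard_normal(20)
reps = layered_basis_pursuit(y, [D1, D2], [0.1, 0.1])
```

Note that each layer only has to explain the output of the previous pursuit, which is exactly why this scheme does not minimize the joint multilayer objective.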

B. ITERATIVE THRESHOLDING ALGORITHMS (ISTA)
The Iterative Shrinkage Thresholding Algorithm (ISTA), originally proposed in [31], is a first-order method for optimizing functions comprising composite terms. A faster version of this algorithm, FISTA, proposed in [32], introduces a momentum term, resulting in improved convergence rates. These algorithms require only matrix-vector multiplications and are therefore appealing due to their low complexity. ISTA converges in function value at a rate of $O(1/k)$, and its fast version FISTA provides a better convergence rate of $O(1/k^2)$. The proximal gradient method ISTA works by iterating the updates given by the proximal operator. As $g(\cdot)$ in Equation (5) is a sum of composite $\ell_1$ terms, direct application of the ISTA algorithm is not feasible. Another alternative, the generalized LASSO [33], can also be computationally expensive because it requires inversions of linear operators during optimization. The iterative algorithms employing re-weighted $\ell_2$ norm approaches, proposed in [34] for compressive sensing, also require iterative matrix inversions and are thus computationally expensive.
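The single-layer ISTA and FISTA iterations discussed above can be sketched in a few lines for the $\ell_1$-penalized least-squares problem. The step size $1/L$ with $L = \|D\|_2^2$, the random dense dictionary and the value of the penalty are assumptions chosen for illustration; the momentum schedule is the standard one from the FISTA literature.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def objective(x, y, D, lam):
    """F(x) = 0.5*||y - D x||^2 + lam*||x||_1."""
    return 0.5 * np.sum((y - D @ x) ** 2) + lam * np.sum(np.abs(x))

def ista(y, D, lam, n_iter=100):
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - D.T @ (D @ x - y) / L, lam / L)
    return x

def fista(y, D, lam, n_iter=100):
    L = np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    z, t = x.copy(), 1.0
    for _ in range(n_iter):
        x_new = soft_threshold(z - D.T @ (D @ z - y) / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum step
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
y = rng.standard_normal(20)
lam = 0.1
x_ista = ista(y, D, lam)
x_fista = fista(y, D, lam)
```

Both routines use only matrix-vector products with `D` and `D.T`, which is the low per-iteration cost referred to in the text.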

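Algorithm 2 ISTA
Init: $x^0$ in the domain of $f$
for $k = 0, 1, 2, \dots$ do
    $x^{k+1} = \mathrm{prox}_{\frac{1}{L} g}\left(x^k - \frac{1}{L}\nabla f(x^k)\right)$
end for

C. MULTI-LAYER ISTA AND FISTA
For a composite model comprising a smooth and convex term $f(x)$ and a convex but not necessarily smooth term $g(x)$, the objective function is given by

$$\min_x\ F(x) = f(x) + g(x). \tag{8}$$

The gradient mapping is the operator given by

$$G_L^{f,g}(x) = L\left(x - \mathrm{prox}_{\frac{1}{L} g}\left(x - \tfrac{1}{L}\nabla f(x)\right)\right),$$

where $L$ is the Lipschitz constant of $\nabla f$. The ISTA update step for Equation (8) is given by

$$x^{k+1} = x^k - \tfrac{1}{L}\,G_L^{f,g}(x^k) = \mathrm{prox}_{\frac{1}{L} g}\left(x^k - \tfrac{1}{L}\nabla f(x^k)\right).$$

The optimization problem of multilayer basis pursuit for the second-layer sparse representations $x_2$ is given by

$$\min_{x_2}\ F(x_2) = f(D_2 x_2) + g_1(D_2 x_2) + g_2(x_2), \tag{10}$$

with $f(x_1) = \frac{1}{2}\|y - D_1 x_1\|_2^2$, $g_1(x_1) = \lambda_1\|x_1\|_1$ and $g_2(x_2) = \lambda_2\|x_2\|_1$. The update for the gradient mapping method is given by:

$$x_2^{k+1} = \mathrm{prox}_{t g_2}\left(x_2^k - t\,D_2^T\,G_{1/c}^{f,g_1}(D_2 x_2^k)\right).$$

Here $c$ and $t$ are constants with specific bounds for convergence of the subject algorithms. As $g_1(D_2\,\cdot)$ is a composite term in Equation (10), in order to avoid calculating its proximal mapping, the term $D_2 x_2$ is replaced with $x_1$, that is, $x_1 = D_2 x_2$. The proximal mapping is then calculated with respect to $x_1$, where it reduces to soft thresholding of $x_1$ with the operator $T_{c\lambda_1}$. The update step for ML-ISTA after the above approximation becomes:

$$x_2^{k+1} = T_{t\lambda_2}\left(x_2^k - \frac{t}{c}\,D_2^T\left(x_1^k - T_{c\lambda_1}\left(x_1^k - c\,D_1^T\left(D_1 x_1^k - y\right)\right)\right)\right), \qquad x_1^k = D_2 x_2^k.$$

Algorithms for ISTA and FISTA in multilayer settings are described in Algorithm 3 and Algorithm 4 respectively.

Algorithm 3 (ML-ISTA) and Algorithm 4 (ML-FISTA): an outer loop over the unfoldings and an inner loop over the layers, for $i = 1 : L$, returning the set of representations.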
The FISTA algorithm incorporates a momentum term, which improves the convergence rate. The framework for classification with iterative thresholding algorithms is given in Figure 3, and pseudocode is presented in Appendix-A. The ISTA module described in Figure 3 computes representations according to Algorithm 3. First, the encoded feature maps are computed backward for the three-layer framework, and the iterations and unfoldings then progress according to the ML-ISTA algorithm. The (-) sign in Figure 3 depicts subtraction of the resulting representations after the convolution and transposed convolution operations are carried out with the dictionaries D. The number of unfoldings inside the ISTA module enables the shallow network to increase its depth without any impact on the number of parameters.
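A minimal sketch of a two-layer ML-ISTA iteration with a fixed number of unfoldings is given below. It assumes the update $x_2 \leftarrow T_{t\lambda_2}\big(x_2 - \tfrac{t}{c} D_2^T (x_1 - T_{c\lambda_1}(x_1 - c D_1^T (D_1 x_1 - y)))\big)$ with $x_1 = D_2 x_2$, uses dense random matrices in place of convolutional dictionaries, and picks the step sizes `c` and `t` heuristically; these choices are illustrative assumptions and not the trained framework of Figure 3.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft thresholding T_tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ml_ista_two_layer(y, D1, D2, lam1, lam2, c, t, n_unfold):
    """Run n_unfold ML-ISTA updates; the dictionaries are reused each time."""
    x2 = np.zeros(D2.shape[1])
    for _ in range(n_unfold):
        x1 = D2 @ x2  # first-layer representation
        # Thresholded gradient step on the first layer.
        x1_new = soft_threshold(x1 - c * D1.T @ (D1 @ x1 - y), c * lam1)
        # Map the first-layer change back to the deepest layer and threshold.
        x2 = soft_threshold(x2 - (t / c) * D2.T @ (x1 - x1_new), t * lam2)
    return D2 @ x2, x2

rng = np.random.default_rng(0)
D1 = rng.standard_normal((20, 30))
D2 = rng.standard_normal((30, 40))
y = rng.standard_normal(20)
c = 1.0 / np.linalg.norm(D1, 2) ** 2   # assumed step size
t = c / np.linalg.norm(D2, 2) ** 2     # assumed step size
x1_hat, x2_hat = ml_ista_two_layer(y, D1, D2, 0.1, 0.1, c, t, n_unfold=2)
```

Because `D1` and `D2` are shared across unfoldings, increasing `n_unfold` deepens the computation without adding parameters, which is the property highlighted in the text.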

D. UNFOLDED ITERATIVE ALGORITHMS AS NEURAL NETWORKS
Unfolded iterative algorithms have been used successfully in recent research works [35]-[38] for solving sparse recovery problems. To reduce the computational cost associated with approximation algorithms, the work in [36] combined optimization and neural networks to produce deterministic functions that approximate parsimonious/sparse models, resulting in a significant reduction in computational time for applications requiring real-time performance such as image modeling, robust face modeling, audio source separation and robust speaker recognition. The work in [37] proposed a partial weight coupling structure for learned iterative thresholding algorithms (LISTA), together with support selection, for improved convergence rate with experimental demonstrations. A two-layer ISTA network is given in Figure 4. The classification framework implements a multilayer ISTA and FISTA framework with two unfoldings.

FIGURE 4. A two layer ISTA model as illustrated in [25].

VI. EXPERIMENTS AND RESULTS ON KNEE MR DATASET
We use a dataset of 623 coronal-view MR images comprising 205 complete tear, 205 normal and 213 partial tear images. Data collected in the study include adult patients (male and female) aged between 18 and 40 years, imaged with proton density (PD)-weighted sequences with fat saturation. The images were labeled by a certified MSK radiologist at Hospital Kuala Lumpur (HKL) and have been used in [39] for classification of MR images with a CNN. An 80-20 split is applied for training and testing. This work does not employ the regularization techniques of dropout and batch normalization, in order to provide a clear experimental setup and demonstrate the effectiveness of the framework for biomedical image classification. All algorithms use three convolutional layers with a filter size of 5 in each layer and 16, 32 and 32 feature maps for layers one, two and three respectively. These parameters were selected empirically for optimal performance on this dataset. Similarly, all algorithms use a learning rate of 0.001 and a batch size of 3. The optimizer parameters of weight decay (an $\ell_2$ weight regularization) and the learning rate scheduler values have also been kept the same for all algorithms. All models have been trained with stochastic gradient descent. Table-1 gives precision, recall, average accuracies and F-1 scores for the baseline CNN, the All-Free learning framework and the proposed ML-ISTA, ML-FISTA and ML-BP with network unfoldings. The classification metrics of the framework with the highest average accuracy have been highlighted to emphasize the effectiveness and better accuracy of the proposed frameworks compared to the baseline. Class-wise accuracies, average accuracies and test losses of the CNN and the proposed algorithms are given in Table-2, with emphasis on the framework with better classification accuracies on complete ACL tear and partial ACL tear.
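For reference, the per-class precision, recall and F-1 score and the overall accuracy reported in the tables can be computed from a confusion matrix as sketched below. The confusion matrix in the example is a made-up toy matrix for three classes, not a result from this work, and the function name is hypothetical.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    metrics = []
    for i in range(n):
        tp = confusion[i][i]
        fp = sum(confusion[r][i] for r in range(n)) - tp  # predicted i, true other
        fn = sum(confusion[i]) - tp                       # true i, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append((precision, recall, f1))
    return metrics, correct / total

# Toy, hypothetical confusion matrix (rows: true class, columns: predicted class).
toy_confusion = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]
metrics, accuracy = per_class_metrics(toy_confusion)
```

Here `metrics[i]` holds the `(precision, recall, f1)` triple of class `i`, and `accuracy` is the fraction of correctly classified samples over all classes.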

A. MULTI-LAYERED ITERATIVE THRESHOLDING ALGORITHMS WITH UNFOLDINGS
The results of the classifier based on features extracted by the multi-layered iterative thresholding algorithms are given in Figure 6 and Figure 7. The classifier performance is given for two unfolding values (1 and 2); a further increase in the number of unfoldings results in divergence of the algorithms. The training accuracy of the implemented frameworks is given in Figure 9 for unfoldings 1 and 2. Train losses and validation losses for the different numbers of unfoldings are depicted in Figure 10 and Figure 11 respectively. The empirical results show an improvement in the learning performance of the proposed classification framework, a sharp decrease in the loss curves and better classification accuracy as the number of unfoldings is increased. The ML-ISTA framework with two unfoldings outperforms CNN and ML-FISTA by reasonable margins, as shown in Table-1.

TABLE 1. Precision, Recall, Accuracy and F-1 scores of iterative thresholding algorithms with unfoldings.

B. LAYERED BASIS PURSUIT
The layered basis pursuit algorithm is implemented with the same architecture and hyperparameters as the baseline CNN, and its results are added for comparison. The algorithm, as proposed in [14], is implemented with the iterative shrinkage iterations unrolled at each individual layer, whereas in the case of ML-ISTA and ML-FISTA the iterations are unrolled for the entire multi-layer basis pursuit problem. The experimental framework uses two iterations, and the results are given for comparison with CNN in Figure 8. The ML-BP framework with a single unfolding gives results comparable to the ML-ISTA framework and outperforms the baseline by reasonable margins in terms of average accuracy.

C. AN ADAPTIVE LEARNING FRAMEWORK
In addition to the three generative models described above, an all-free learning framework consisting of three layers with the same number of feature maps as the CNN is also implemented. In this framework, the dictionaries and corresponding representations are adaptively learned for the subject dataset. The all-free model is trained with the same number of layers and relevant parameters in a recurrent architecture. The frameworks for ML-ISTA, ML-FISTA and layered BP have the same number of parameters as the CNN, whereas the all-free framework has O(LK) parameters, where L is the number of layers (here 3 layers are used) and K is the number of unfoldings. The classification accuracies for the baseline CNN and the all-free recurrent architecture are given in Figure 5, Table-1 and Table-2. Increasing the number of iterations of the framework slightly improves the accuracies on the given classes, as observed in the ISTA and FISTA architectures.
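The difference in parameter count can be made concrete with a small back-of-the-envelope calculation. The sketch below assumes single-channel (grayscale) input, square 5 × 5 filters, 16, 32 and 32 feature maps, no bias terms, and an all-free model that learns one independent set of dictionaries per unfolding; these are simplifying assumptions for illustration, not the exact parameterization of the implemented frameworks.

```python
def conv_weight_count(channels, kernel_size):
    """Number of convolutional weights for a chain of layers.

    channels: [input channels, maps in layer 1, maps in layer 2, ...]
    """
    return sum(c_in * c_out * kernel_size * kernel_size
               for c_in, c_out in zip(channels[:-1], channels[1:]))

def all_free_weight_count(channels, kernel_size, n_unfoldings):
    """Assumed all-free model: independent dictionaries for every unfolding."""
    return n_unfoldings * conv_weight_count(channels, kernel_size)

channels = [1, 16, 32, 32]  # assumed grayscale input and three layers
tied = conv_weight_count(channels, kernel_size=5)
untied = all_free_weight_count(channels, kernel_size=5, n_unfoldings=2)
```

Under these assumptions the tied models (CNN, ML-ISTA, ML-FISTA, layered BP) share 400 + 12,800 + 25,600 = 38,800 convolutional weights regardless of the number of unfoldings, while the untied count grows linearly with K.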

D. DATA AUGMENTATION
The MR data used in the architecture are first center cropped to an image size of 320 × 320 and then normalized with the mean and standard deviation of the dataset. Of the many transforms available in PyTorch, the center crop proves very effective in this architecture, and the learning curve for the training dataset decreases steadily as the number of epochs increases.
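The preprocessing described above can be sketched with NumPy as follows. The input image is random stand-in data with a hypothetical size, and the mean and standard deviation are computed from that single image purely for illustration; in the described pipeline they are dataset-level statistics.

```python
import numpy as np

def center_crop(image, size):
    """Crop a size x size window from the center of a 2-D image."""
    height, width = image.shape
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]

def normalize(image, mean, std):
    """Standardize pixel intensities with the given mean and std."""
    return (image - mean) / std

rng = np.random.default_rng(0)
raw = rng.random((400, 384))          # stand-in for one grayscale MR slice
cropped = center_crop(raw, 320)
# Per-image statistics used here only as a placeholder for dataset statistics.
prepared = normalize(cropped, cropped.mean(), cropped.std())
```

Cropping around the center keeps the region where the ligament is expected to appear while discarding peripheral background.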

E. CLASS IMBALANCE
When there is an imbalance in classes, DL frameworks give poor classification accuracy for certain classes. For the knee ACL tear classification problem, the classifiers give poor accuracy for the challenging problem of partial ACL tear classification as compared to the other two classes. To circumvent this issue, the oversampling of the partial tear category is done during the training phase. This technique significantly increases test accuracy for partial ACL tear class, when equally applied for CNN as well as the proposed framework, resulting in improved results of the proposed framework with two unfoldings in comparison to other classification algorithms. The accuracies and effectiveness of biomedical classification algorithms can be further improved by incorporating the learning framework with different datasets comprising different age profiles (young and aged population) and gender profiles for a more specific and accurate diagnosis.
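Oversampling of one class during training can be sketched as random duplication of its training samples. The helper below, its name, the duplication factor and the toy class counts are all illustrative assumptions; they do not reproduce the sampling scheme or class sizes used in the experiments.

```python
import random
from collections import Counter

def oversample(samples, labels, target_label, factor, seed=0):
    """Return shuffled training data in which target_label appears factor times as often."""
    rng = random.Random(seed)
    target_idx = [i for i, lab in enumerate(labels) if lab == target_label]
    # Draw (factor - 1) extra copies per original target sample, with replacement.
    extra = [rng.choice(target_idx) for _ in range((factor - 1) * len(target_idx))]
    order = list(range(len(labels))) + extra
    rng.shuffle(order)
    return [samples[i] for i in order], [labels[i] for i in order]

# Toy, hypothetical training set: file names stand in for images.
labels = ["normal"] * 6 + ["complete"] * 6 + ["partial"] * 3
samples = [f"img_{i}.png" for i in range(len(labels))]
new_samples, new_labels = oversample(samples, labels, "partial", factor=2)
counts = Counter(new_labels)
```

Oversampling should be applied to the training split only, so that test accuracy is still measured on untouched data.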

F. DEEPER LEARNING ARCHITECTURE AND THE CHALLENGE OF OVER FITTING
The DL architecture chosen for this classification problem, including the size of the filters and the number of feature maps, was observed to be optimal for this dataset of knee MR images. Deeper CNN architectures with a higher number of feature maps resulted in underfitting of the learning model. The classifier performance is reported to demonstrate the effectiveness of ISTA, FISTA and layered basis pursuit with unfoldings, which use the algorithm iterations to extend the depth of the network without increasing the number of parameters.

G. DISCUSSION
The challenging problem of identification of partial ACL tear, which is characterized by stretching and weakening of the knee ligaments is diagnosed by clinicians with clinical tests along with MR imaging and arthroscopic examinations. In our work, the MR images with coronal PD were used for training and testing the framework, as the coronal imaging plane is mainly used by radiologists to trace ACL fibers from origin to insertion. In the proposed framework, the partial tear class is successfully identified with 85% accuracy by ML-ISTA and 82% accuracy by ML-BP. The cases of complete ACL tear class which are characterized by the rupture of the knee ligaments are identified with 98% accuracy by ML-FISTA followed by ML-BP framework which has an accuracy of 97%. Overall, the ML-BP algorithm results in the highest average classification accuracy on all classes as shown in Table 1 and Table 2.
Generally, the presence of notch origin tears makes the diagnosis of a complete ACL tear difficult for radiologists to detect in clinical settings. Another possible reason for misclassification of complete ACL tear class is the mild focal intrasubstance degeneration rather than a complete tear.
As MRI-based pathology is localized to small regions of interest, the image crop operation applied in our work significantly improves the training accuracy of the learning network. This insight can be used to further improve the framework by incorporating training images from the sagittal and axial planes, which are part of the standard knee imaging protocol used in clinical applications. The evaluation and interpretation of three-dimensional (3D) data is another unique feature associated with cross-sectional imaging. For musculoskeletal injuries, incorporating 3D contextual information about the ligaments into the imaging pipeline is especially useful for diagnosing ACL tears. Furthermore, the performance and generalizability of the framework may be improved by incorporating different magnetic field strengths, scanning protocols and vendors of MRI scanners.

VII. CONCLUSION
We implemented a multi-layered convolutional sparse coding (ML-CSC) framework employing iterative thresholding pursuit algorithms and demonstrated their effectiveness in terms of classification accuracy in comparison to traditional CNN based frameworks. Algorithms of gradient mapping schemes like iterative thresholding algorithm (ISTA), fast iterative thresholding algorithm (FISTA) along with multilayered basis pursuit were implemented for feature extraction and training of the classifier. The framework was applied to a labeled dataset of knee MR images for classification and accuracies were given for different types of ACL tears. In absence of larger labeled datasets, this work demonstrated the effectiveness of the classification framework's learning capability with the same number of features as the baseline CNN, and without adding regularization hyperparameters and computational complexity to the neural network architecture. The framework also demonstrated the effectiveness of unfolding on neural networks' performance, improving classification accuracies on imbalanced classification problem of partial ACL tear. In future work, the research can be extended to design more generalized classifiers using transfer learning, which can adapt to different datasets without requiring training from scratch and use them for improvement in performance of classifiers of biomedical images.