Detection and Classification of Lung Nodule in Diagnostic CT: A TsDN Method Based on Improved 3D-Faster R-CNN and Multi-Scale Multi-Crop Convolutional Neural Network

Lung nodule classification has been one of the major problems relevant to Computer-Aided Diagnosis (CAD) systems. Lung cancer is one of the leading causes of cancer-related death for both men and women. Deep learning models have produced promising performance in recent years, outperforming traditional methods in different fields. Researchers have attempted numerous deep learning approaches to enhance the efficiency of CAD systems for lung cancer screening with Computed Tomography (CT). In this paper, we present a fully automatic lung CT cancer diagnosis system named Two-step Deep Network (TsDN), which contains two parts: nodule detection and nodule classification. First, an improved 3D-Faster R-CNN with a U-Net-like encoder and decoder is used for nodule detection; then a Multi-scale Multi-crop Convolutional Neural Network (MsMc-CNN) is proposed for pulmonary nodule classification. The multi-scale approach uses filters of various sizes to extract nodule features more effectively from local regions, and the multi-crop pooling technique extracts salient nodule information by cropping different regions from the convolutional feature map and applying max pooling to them a different number of times. The proposed TsDN is trained and evaluated on the public LIDC-IDRI dataset and achieves a sensitivity of 0.885 and a specificity of 0.922, with an AUC of 0.946.


Introduction
Lung cancer is now the world's leading cause of cancer mortality [1]. Early detection and effective treatment of lung cancer are important for reducing mortality. Screening Computed Tomography (CT) [2] tests indicate that non-invasive early detection of lung cancer in at-risk patients is growing significantly [3]. The National Lung Screening Trial clearly showed that low-dose CT screening can decrease patient deaths by 20% compared with conventional chest X-rays [4]. Annual lung cancer screening with CT for high-risk patients is recommended by the US Preventive Services Task Force (USPSTF). Lung cancer causes the highest number of cancer deaths (1.8 million deaths), with a low five-year survival rate (18%) [5]. This is primarily attributed to the fact that in more than half of the cases lung cancer is diagnosed at an advanced stage. In recent years, the detection rate of lung tumors has been improved by substantial technological advances in medical imaging, particularly computed tomography (CT), which is also more effective in decreasing mortality [6]. However, studying and interpreting these images takes time [7] and is subject to variability between observers. In fact, tumor variation can contribute greatly to cancer diagnosis, yet it is not necessarily noticeable to the human eye [8].
The abovementioned deficiencies of human-driven cancer diagnosis have encouraged the creation of a fairly recent area of medical imaging known as radiomics [9][10][11][12], which has shown promising diagnostic results for cancer. Radiomics is the machine extraction of thousands of quantitative features from radiographic images, relying on the premise that such features provide more detail than is available to the unaided eye [13]. The purpose of radiomic analysis in cancer is to improve detection and prediction for every case. Radiomics research has recently been divided into two major categories: handcrafted radiomics and deep learning radiomics. In the former, a set of predefined, engineered features is extracted from medical images [14][15]. These features are then passed to an analytical tool that determines the desired output, such as cancer probability. Despite the significant association with clinical findings that handcrafted features have demonstrated [16], some drawbacks limit their relevance and performance. First, although handcrafted features can be efficient and reliable in certain tasks, they are mainly derived from a segmented region of interest (ROI), which only experts can provide. Thus, the calculation not only takes considerable time and effort, but its consistency is also highly dependent on data preprocessing and segmentation [17]. Furthermore, handcrafted radiomics is predefined and requires prior knowledge of the kinds of features to extract, which are sometimes not easy to obtain. In addition, the extracted features are not guaranteed to be useful.
CAD systems were designed to help thoracic radiologists distinguish between malignant and benign nodules. These CAD systems typically map the features of a nodule to a benign or malignant decision through a classifier. Since CAD increases radiologists' diagnostic precision, it is a reasonable option for diagnosis and treatment. Over the past decade, deep learning has made considerable progress in image processing, object detection, image classification and segmentation, as well as natural language processing. Deep neural networks (DNNs) can even come close to human-level performance in most of these areas [18]. The CNN, the most common DNN model, uses supervised learning to extract high-level features from raw images and has been used to improve the efficiency of lung nodule classification [19]. A recent study adopts a CNN with a single convolutional layer for the classification of pulmonary nodules and demonstrates higher accuracy than conventional feature extraction techniques. By fine-tuning a CNN model [20], the transfer learning approach also demonstrates positive results for nodule classification [21]. Nodule detection typically requires two phases: region proposal generation and false-positive reduction. Conventional solutions typically involve handcrafted features such as morphological properties, voxel clustering and pixel thresholding [22]. Deep convolutional networks such as Faster R-CNN [23] have recently been used to generate candidate bounding boxes [24]. However, more elaborate or complex features, such as carefully designed texture features, are still used to eliminate false-positive nodules. To learn intrinsic features efficiently, we propose a modified Faster R-CNN nodule detector with 3D convolutional kernels and an encoder-decoder structure similar to U-Net [25].
The U-Net is essentially an auto-encoder enhanced by skip connections between the encoder and decoder layers. Although it was introduced for semantic image segmentation, it can also be very helpful for nodule detection because it captures both spatial and local features. Before the deep learning era, the usual nodule classification pipeline was handcrafted feature design followed by a classifier. Since the large LIDC-IDRI dataset [26] became publicly available, deep learning approaches [27] have become the prevailing framework for nodule classification research. CNNs are among the most successful deep networks in radiomics. These networks share trained weights across the input, which decreases the training time considerably and allows the model to derive local features from the entire image.
In this paper, a new Two-step Deep Network (TsDN) for lung nodule detection and classification is proposed. First, we propose an improved 3D-Faster R-CNN for lung nodule detection, based on a U-Net-like encoder and decoder. Then, for the classification task, we propose a Multi-scale Multi-crop Convolutional Neural Network (MsMc-CNN). Finally, by combining the nodule detection network with the nodule classification network, we create a fully automated system for lung CT cancer diagnosis, as shown in [Figure 1].

Method
The proposed fully automated Two-step Deep Network (TsDN) method for lung CT cancer diagnosis contains two major stages: (1) detection of lung nodules using an improved 3D-Faster R-CNN; (2) classification of the detected nodules using a Multi-scale Multi-crop Convolutional Neural Network (MsMc-CNN).

Improved 3D-Faster R-CNN
Detecting small pulmonary nodules with a size of 1 cm³ remains an important and challenging task. Deep learning models have therefore been used for detecting and classifying small lung nodules in CT scans. Because of memory and processing constraints, the reconstructed volumetric CT image is split into patches (96 × 96 × 96). The outputs for all patches are then combined for nodule detection. The improved 3D-Faster R-CNN with a U-Net-like encoder and decoder was selected for lung nodule detection. Two convolutional layers are used to generate features before the first max pooling. In the decoder network, the features are processed by deconvolutional layers and concatenated with the corresponding layers of the encoder network. The second-to-last layer is a convolutional layer with a dropout probability of 0.5. In the final layer, three anchors of sizes 5, 10 and 20 are designed according to the distribution of nodule sizes.
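The patch-based processing described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text gives only the patch size (96 × 96 × 96), so the non-overlapping stride, the shifted last patch at the border, and the function names `patch_starts` and `patch_grid` are all assumptions made for the example.

```python
from typing import List, Tuple


def patch_starts(length: int, patch: int = 96, stride: int = 96) -> List[int]:
    """Start indices of patches that together cover one axis of a volume."""
    if length <= patch:
        # Assumption: a short axis would be padded up to the patch size.
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:
        # Assumption: shift a final patch back so the border is covered.
        starts.append(length - patch)
    return starts


def patch_grid(shape: Tuple[int, int, int], patch: int = 96,
               stride: int = 96) -> List[Tuple[int, int, int]]:
    """Corner coordinates (z, y, x) of all patches covering a 3D volume."""
    zs, ys, xs = (patch_starts(n, patch, stride) for n in shape)
    return [(z, y, x) for z in zs for y in ys for x in xs]


# Hypothetical CT volume of 300 slices with 512 x 512 pixels per slice.
corners = patch_grid((300, 512, 512))
print(len(corners))           # 144
print(patch_starts(300))      # [0, 96, 192, 204]
```

Each corner identifies one 96 × 96 × 96 block that would be passed through the detector; the per-patch detections are then merged in the coordinate frame of the full volume.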
Each anchor's loss function has several components. The classification loss indicates whether the box contains a nodule, whereas the regression loss determines the nodule's diameter (d) and coordinates. Anchor boxes are assigned to ground-truth boxes (GTB) according to their intersection over union. When the overlap between an anchor and a GTB reaches 0.5, the anchor is labeled positive (p* = 1); an anchor with lower overlap is labeled negative (p* = 0). The loss function of an anchor i is defined in terms of p_i, the predicted probability that anchor i contains a nodule, and u_i, the predicted relative coordinates of the nodule position. The relative coordinates u_i are computed from the nodule's coordinates and diameter (x, y, z, d) and the coordinates and diameter of anchor i (x_a, y_a, z_a, d_a). The ground-truth nodule position u_i* is defined in the same way from (x*, y*, z*, d*), the coordinates and diameter of the ground-truth nodule.
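The anchor loss and the relative coordinates discussed in this paragraph can be written out as below. This is a sketch under stated assumptions rather than a confirmed reproduction of the paper's equations: it follows the usual Faster R-CNN parameterization extended to 3D. The weighting factor \lambda, the choice of a binary cross-entropy for L_{cls} and a smooth L1 loss for L_{reg}, and the symbol names (x, y, z, d), (x_a, y_a, z_a, d_a) and (x^{*}, y^{*}, z^{*}, d^{*}) are assumptions.

```latex
\begin{align}
L(p_i, u_i) &= \lambda \, L_{cls}(p_i, p_i^{*}) + p_i^{*} \, L_{reg}(u_i, u_i^{*}) \\
u_i &= \left( \frac{x - x_a}{d_a},\; \frac{y - y_a}{d_a},\; \frac{z - z_a}{d_a},\; \log\frac{d}{d_a} \right) \\
u_i^{*} &= \left( \frac{x^{*} - x_a}{d_a},\; \frac{y^{*} - y_a}{d_a},\; \frac{z^{*} - z_a}{d_a},\; \log\frac{d^{*}}{d_a} \right)
\end{align}
```

Because the regression term is multiplied by p_i^{*}, it contributes only for positive anchors, so negative anchors are trained on the classification term alone.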

Multi-scale Multi-crop Convolutional Neural Network (MsMc-CNN)
Distinguishing between malignant and benign nodules based on small morphological features is a very challenging task that calls for new deep learning methods. A Multi-scale Multi-crop Convolutional Neural Network (MsMc-CNN) is proposed for pulmonary nodule classification. The proposed architecture relies on two main techniques: the multi-scale convolutional layer and the multi-crop pooling operation. To obtain multi-scale features, a standard convolutional layer is replaced with the new multi-scale convolutional layer. Instead of a traditional convolutional layer, in which features are extracted with fixed-scale filters, the multi-scale convolutional layer applies several filters of different sizes to capture more local structure. We also introduce multi-crop pooling, a customized pooling technique that produces multi-scale features and replaces the traditional max pooling operation. Because it does not require multiple networks, applying the proposed technique within a single network is advantageous in terms of computational complexity. The proposed MsMc-CNN architecture is shown in [Figure 2]. It has a main feed-forward path that performs multi-scale feature extraction. The basic structure of the main path is based on a conventional neural network consisting of four single-scale convolutional layers followed by three fully connected (fc) layers. The multi-crop network describes a sequence of feature extraction steps composed of stacked convolutional and pooling layers. To capture multi-scale features, however, the proposed multi-crop pooling layer replaces the max pooling layer, and multi-scale feature extraction is carried out on two of the single-scale convolutional layers. The first single-scale convolutional layer performs a 7×7 convolution to produce 12 output feature maps.
Likewise, the second single-scale convolutional layer uses the same filter size to produce 24 output feature maps. Two multi-scale convolutional layers follow and are then connected to three fc layers. The final output layer classifies the lung nodule as benign or malignant. The first fc layer contains 256 neurons, the second fc layer has 128 neurons and the final output fc layer has just 2 neurons. After the last fully connected layer, the predicted results are normalized through a softmax operation [28]. Dense fc layers are prone to overfitting because of their full connectivity. To prevent overfitting, dropout with a ratio of 0.5 [29] is applied to the first two fc layers. The ReLU activation function is applied to every convolutional layer because it yields well-behaved gradients.
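As a small illustration of the final step, the softmax over the two output neurons can be sketched as follows. The logit values and the class ordering (index 0 for benign, index 1 for malignant) are hypothetical and chosen only for the example.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical outputs of the last fc layer for one nodule.
# Assumption: index 0 is "benign", index 1 is "malignant".
probs = softmax([0.3, 1.7])
label = "malignant" if probs[1] > probs[0] else "benign"
print(label)
```

The two probabilities sum to one, so either of them can be thresholded to trade sensitivity against specificity.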

Convolutional Multi-scale Layer
One way to improve the performance of a convolutional neural network is to increase its depth with extra convolutional layers. However, this approach tends to cause overfitting when the dataset is small. The multi-scale approach instead helps extract local sparse structures from multiple receptive fields. The detailed structure of the convolutional multi-scale layer is shown in [Figure 3]. Rather than a single-scale filter, three different filter sizes are used in the multi-scale layer. The layer consists of three branches. The first branch performs a 3×3 convolution. The second branch extracts local structures with a 5×5 convolutional layer. Likewise, the third branch has a single convolutional layer whose filter size is further extended to 7×7 to capture more local features. The feature maps produced by the three branches are concatenated. To make the concatenation possible, the input to the 3×3 convolution is zero-padded with a 1-pixel border, and wider zero padding is used for the larger filters so that all three branches produce feature maps of the same size. The aim of the convolutional multi-scale layer is to extract features with various filter sizes. When a single-scale filter cannot characterize the local features, a larger filter can capture that structure thanks to its larger receptive field. Different filter sizes are therefore used in our method to obtain multiple local features concurrently.
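The idea of the multi-scale layer can be sketched in plain Python as follows. This is an illustrative simplification, not the trained layer: it uses a single input channel, one filter per branch, stride 1, and hypothetical averaging kernels (`box_kernel`) in place of learned weights. The padding of (k - 1) // 2 per branch (1, 2 and 3 pixels) is the value that keeps the output size equal to the input size at stride 1, and is stated here as an assumption about the layer.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(image: Matrix, kernel: Matrix) -> Matrix:
    """Stride-1 2D cross-correlation with zero padding of (k - 1) // 2."""
    k = len(kernel)
    pad = (k - 1) // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    y, x = i + a - pad, j + b - pad
                    if 0 <= y < h and 0 <= x < w:  # outside pixels count as zero
                        acc += image[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def multi_scale_layer(image: Matrix, kernels: List[Matrix]) -> List[Matrix]:
    """Run every branch, apply ReLU, and concatenate along the channel axis."""
    maps = []
    for kernel in kernels:
        fmap = conv2d_same(image, kernel)
        maps.append([[max(0.0, v) for v in row] for row in fmap])
    return maps


def box_kernel(k: int) -> Matrix:
    """Hypothetical k x k averaging filter standing in for learned weights."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
features = multi_scale_layer(image, [box_kernel(3), box_kernel(5), box_kernel(7)])
print(len(features), len(features[0]), len(features[0][0]))  # 3 8 8
```

All three branches return 8 × 8 maps, which is what allows them to be stacked into a single multi-channel output.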

Multi crop pooling operation
We extend the standard max pooling layer to the proposed multi-crop pooling layer, which captures nodule-centered visual features. In a standard network, max pooling layers select a subset of features and reduce the size of the feature maps. However, max pooling is applied uniformly to every feature map, so the max pooling layer is a single-level feature reduction operation. To replace standard max pooling, we propose a multi-crop pooling layer. It applies pooling a repeated number of times and is designed to capture the center features of the nodule, as shown in [Figure 4]. The concatenated nodule-centered feature fc = (fc1, fc2, fc3) is built from three patches (P1, P2, P3). Let the size of P1 be d×d×n, where d×d is the feature-map dimension and n is the number of feature maps.
P2 and P3 are center regions. In the multi-crop process, P1 is the input convolutional feature map, which may itself be derived from a preceding pooling layer; P2 is the region centered in P1, and P3 is the region centered in P2. The multi-crop pooling layer is designed as an equivalent of a max pooling layer, so it can replace some max pooling layers in order to extract multi-scale features. The pooling rate of multi-crop pooling depends on a feature's location in the feature map, whereas spatial pyramid pooling pools identically at all locations.
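A minimal sketch of the multi-crop pooling idea on a single feature map is given below. Several details are assumptions, since the defining equation is not available here: P2 is taken as the central d/2 × d/2 region of P1, P3 as the central d/4 × d/4 region of P2, and 2 × 2 max pooling is applied twice to P1, once to P2 and not at all to P3, so that all three outputs have the same size. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def max_pool2x2(fmap: Matrix) -> Matrix:
    """Non-overlapping 2 x 2 max pooling."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]


def center_crop(fmap: Matrix, size: int) -> Matrix:
    """Square region of the given size taken from the center of the map."""
    h, w = len(fmap), len(fmap[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in fmap[top:top + size]]


def multi_crop_pool(p1: Matrix) -> List[Matrix]:
    """Return [fc1, fc2, fc3], each of size d/4 x d/4, for a d x d input."""
    d = len(p1)
    if d % 4 != 0 or len(p1[0]) != d:
        raise ValueError("this sketch assumes a square map with d divisible by 4")
    p2 = center_crop(p1, d // 2)          # region centered in P1
    p3 = center_crop(p2, d // 4)          # region centered in P2
    fc1 = max_pool2x2(max_pool2x2(p1))    # pooled twice
    fc2 = max_pool2x2(p2)                 # pooled once
    fc3 = p3                              # kept at full resolution
    return [fc1, fc2, fc3]


p1 = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
fc1, fc2, fc3 = multi_crop_pool(p1)
print(fc1)  # [[27.0, 31.0], [59.0, 63.0]]
print(fc2)  # [[27.0, 29.0], [43.0, 45.0]]
print(fc3)  # [[27.0, 28.0], [35.0, 36.0]]
```

The center of the map is therefore represented at three effective resolutions, which illustrates why the pooling rate depends on a feature's location.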

Experiment
To test the TsDN method, we perform comprehensive experiments. For the detector, we perform 10-fold cross-validation on the LUNA16 dataset. For nodule classification, we use LIDC-IDRI with the patient-level data split of LUNA16. We then further validate the complete network on the detected nodules, at both the nodule level and the patient level.

Dataset
The LUNA16 dataset is a subset of LIDC-IDRI, the largest publicly available dataset for pulmonary nodules. LUNA16 has only detection annotations, whereas LIDC-IDRI includes the other relevant information for low-dose lung CTs, including annotations from multiple doctors about nodule sizes, positions and other details. From the LIDC-IDRI dataset, LUNA16 excludes CTs with a slice thickness greater than 3 mm, inconsistent slice spacing or missing slices, and it explicitly provides a patient-level 10-fold cross-validation split of the dataset. LUNA16 includes 888 low-dose lung CTs, whereas LIDC-IDRI includes 1,018. LUNA16 removes annotated nodules smaller than 3 mm; for nodule classification, the nodule annotations are extracted from the LIDC-IDRI dataset. We remove a nodule if its average malignancy score is equal to 3. If the average score is greater than 3, we label the nodule as positive; otherwise, we label it as negative. To keep our experiments reproducible, we retain only the CTs in the LUNA16 dataset and use the same cross-validation split as LUNA16 for classification.
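The labeling rule described above can be sketched as a small function. It assumes that each nodule carries a list of integer malignancy ratings from the annotating doctors (on the 1 to 5 scale used by LIDC-IDRI) and that the average is the plain arithmetic mean; the function name and the example ratings are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def malignancy_label(ratings: List[int]) -> Optional[int]:
    """Map per-doctor ratings to 1 (positive), 0 (negative) or None (removed)."""
    avg = mean(ratings)
    if avg == 3:
        return None          # ambiguous nodule, removed from the dataset
    return 1 if avg > 3 else 0


print(malignancy_label([4, 5, 3, 4]))  # 1
print(malignancy_label([1, 2, 2]))     # 0
print(malignancy_label([2, 4]))        # None
```

Nodules that map to `None` are excluded before the cross-validation folds are built, so every remaining nodule has an unambiguous binary label.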

Lung nodule detection
In this section, we train our detection model on the LUNA16 dataset. During training, we augment the dataset by randomly flipping the images and rescaling crops by a factor between 0.75 and 1.25. At test time, we use a detection threshold of -2 on the score before the sigmoid function. To verify the effectiveness of the proposed detection network, we use a deep 3D ResNet as a baseline. The encoder of this reference network is an 18-layer 3D ResNet18, an extension of the 2D ResNet18 [30]. The FROC performance on LUNA16 is shown in [Figure 5]. The dark line is the FROC curve based on true predictions. Using two-stage training, the 3D-Faster R-CNN attains a FROC score of 85.3% without a false-positive reduction stage. The 3D ResNet18-based 3D-Faster R-CNN has a recall rate of 95.7% over all nodules, whereas the improved 3D-Faster R-CNN has a recall rate of 96.9%.
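To make the test-time threshold concrete, the short sketch below converts the pre-sigmoid threshold of -2 into the equivalent probability. Whether the comparison is inclusive (`>=`) is an assumption, and `keep_detection` is a hypothetical helper name.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


LOGIT_THRESHOLD = -2.0  # threshold on the score before the sigmoid


def keep_detection(logit: float) -> bool:
    """Keep a candidate whose raw score reaches the threshold (assumed inclusive)."""
    return logit >= LOGIT_THRESHOLD


prob_threshold = sigmoid(LOGIT_THRESHOLD)
print(round(prob_threshold, 4))  # 0.1192
```

A raw-score threshold of -2 therefore corresponds to keeping candidates with a predicted nodule probability of roughly 0.12 or more, a permissive setting that favors recall.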

Lung nodule classification
In this section, we evaluate our classification model on the LIDC-IDRI dataset using the LUNA16 split. The proposed MsMc-CNN is used for lung nodule classification. In this model, the multi-scale architecture has three branches with convolutional filter sizes 3 × 3, 5 × 5 and 7 × 7, which provides better classification performance than the two-branch configurations. We also examine the effectiveness of the multi-crop pooling strategy. Since multi-crop pooling on feature maps helps to obtain higher classification accuracy, we also applied the multi-crop pooling strategy directly in image space. In this case, instead of max pooling within the multi-crop pooling layer, we applied average pooling, which simulates image downsampling. When multi-crop pooling is applied to the input image patches, the reduced image resolution results in a loss of detail and lower classification accuracy. The performance of the TsDN model was assessed by sensitivity, specificity, accuracy and AUC; the achieved values are 0.885, 0.922, 89.66% and 0.946, respectively.
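For reference, sensitivity, specificity and accuracy can be computed from binary predictions as sketched below. The label convention (1 for malignant, 0 for benign) and the toy label lists are assumptions for illustration and are not taken from the experiments reported here.

```python
from typing import Dict, List


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy for binary labels (1 = malignant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }


# Toy example: 4 malignant and 6 benign nodules.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics["sensitivity"], metrics["accuracy"])  # 0.75 0.8
```

AUC is not covered by this sketch because it requires the continuous class probabilities rather than thresholded labels.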

Comparison with previous approaches
To assess the performance of the proposed model, it is compared with other successful recent approaches, as shown in [Table 1]. Liu et al. [31] introduced a hybrid CNN of LeNet and AlexNet, designed by merging LeNet layer settings with AlexNet parameter settings. Zhao et al. [32] designed a hierarchical learning system in which a multi-scale VGG16 network derives discriminative features from alternately stacked layers. Tafti et al. [33] produced a new deep 3D multi-scale CNN method for classifying benign and malignant nodules in pulmonary CT images. Yu Gu et al. [34] proposed a 3D deep CNN with multi-scale prediction, in which lung nodules are detected after the lungs are segmented from chest CT scans by a systematic process. Compared with a 2D CNN, a 3D CNN can use rich 3D spatial contextual information and, when trained on 3D samples, produce more discriminative features that better represent lung nodules. In addition, a multi-scale prediction technique for lung nodules, involving multi-scale cube clustering, is proposed to detect extremely small nodules. Shen et al. [34] used a multi-scale CNN to capture nodule heterogeneity by extracting discriminative features from alternately stacked layers. To characterize nodules adequately, their system uses multi-scale nodule patches to learn a set of class-specific features simultaneously by combining the neuron response activations obtained at the last layer from each input scale. The ROC curves of the proposed TsDN and the other methods are shown in [Figure 6].

Conclusion
Deep learning is a growing analytical framework for building disease prediction architectures. In this paper, we presented the TsDN deep learning model to address the challenging problem of lung nodule malignancy. Our TsDN contains two parts: lung nodule detection and classification. We designed an improved 3D-Faster R-CNN with a U-Net-like encoder-decoder framework for lung nodule detection. The detected nodules are then fed into the classification part. We use a Multi-scale Multi-crop Convolutional Neural Network (MsMc-CNN) to extract features for classification. To learn local structures more effectively, the proposed MsMc-CNN uses convolutional multi-scale layers to obtain features at various scales. We also demonstrate that, with the multi-crop pooling approach, the learned deep features are capable of capturing salient nodule details. Finally, our model is trained to classify lung nodules as benign or malignant. The experimental results on LUNA16 and LIDC-IDRI show the enhanced performance of the proposed TsDN system.