Identifying the Age of the Tiger Using Data Mining Techniques

Image mining extends data mining concepts to image data, so an understanding of data mining is a prerequisite for image mining. Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in the images. It is the process of analyzing large sets of domain-specific data and subsequently extracting information and knowledge in the form of new relationships, patterns, or clusters for the decision-making process. The tiger has become a protected animal, and its conservation remains a challenging task. Several researchers have studied tiger reserve conservation, and this work aims to make a small contribution to that herculean effort. It proposes a method to find the age of a tiger using color as a parameter. Color pixel-based image classification and clustering techniques are used to identify the age of the tiger. The work focuses on the RGB color space and is implemented on real-time tiger images. The objectives are to assess the age of the tiger using color pixel-based image classification and clustering, to optimize the image filtering and enhancement methods used to remove noise and improve the quality of the pixels or images, and to assess the processing time, retrieval time, accuracy, and error rate on a real-time tiger image database.


Introduction
Data mining is defined as the nontrivial extraction of implicit, previously unknown, and potentially useful knowledge from large volumes of data stored in databases. There are several data mining techniques, such as clustering, classification, association, regression, and evaluation, along with other useful methods such as pattern recognition, time-series analysis, OLAP, and visualization. Image mining is an interdisciplinary endeavor that draws upon expertise from computer vision, image processing, image acquisition, image retrieval, data mining, machine learning, databases, and artificial intelligence. Advances in image acquisition and storage technology have led to tremendous growth in very large and detailed image databases, and analysis of these images can reveal useful information to human users. Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in the images [1] [2] [3] [4]. It is the process of analyzing large sets of domain-specific data and subsequently extracting information and knowledge in the form of new relationships, patterns, or clusters for the decision-making process. This paper focuses only on the Fuzzy Based Mountain clustering algorithm, which forms the last phase of this research work and is a moderately simple yet effective method. The Mountain clustering algorithm was proposed by Yager and Filev in 1994. It is a simple and effective method for approximate estimation of cluster centers based on the concept of a density function.
This research work modifies the Mountain clustering algorithm; the modified version is called the Fuzzy Based Mountain clustering algorithm. It uses a grid-based technique that operates on the pixels of an image and can be applied to images with a large number of pixels [5] [6]. The technique determines the cluster centers by iterative destruction of the mountain function [7]. Destruction of the mountain function means reducing the potential value of each data point that lies nearer to the cluster center than a threshold value. This iterative reduction in the potential of all the data points with respect to the cluster center can cause certain potential clusters to be lost: the potential of some data points that could have become cluster centers is reduced to such an extent that they no longer qualify, and the corresponding cluster is missed. A useful feature of the Modified Mountain Clustering is that its computational complexity is independent of the dimension of the data and there is no need to specify a grid resolution [6].
The tiger is one of the largest animals in the cat family. It belongs to the family Felidae and is classified in the genus Panthera. The tiger is the national animal of India, Bangladesh, Malaysia, and South Korea. Nine subspecies are recognized: Bengal, Siberian, Indochinese, South China, Sumatran, Malayan, Caspian, Javan, and Bali. The last three are extinct, the South China tiger is believed to be extinct in the wild, and the other subspecies are endangered. The tiger has become a protected animal, and its conservation remains a challenging task. This work aims to make a small contribution to the herculean task of conserving the species by proposing an algorithm from which the age of a tiger can be inferred. It combines image processing with data mining to infer the age of the tiger, and the resulting age estimates could help anticipate threats to the conservation of the species. Image processing techniques such as image enhancement and segmentation play a vital role in mining the tiger images. Image processing is complemented by data mining, which analyzes the statistical results used to confirm the age of the tiger.
Several researchers have studied tiger reserve conservation. This research work proposes a method to find the age of the tiger using color as a parameter. Color pixel-based image classification and clustering techniques are used to identify the age of the tiger. The work focuses mainly on the RGB color space and is implemented on real-time tiger images [8]. Several clustering techniques are applied to infer the age of the tiger using a tiger image database. In this thesis work, the most popular methods, namely K-Means clustering, ISODATA clustering, DBSCAN clustering, and Mountain clustering, are considered.
Enhancements of these four methods are proposed for the tiger image database: Fuzzy Modified K-Means clustering, Fuzzy Based ISODATA clustering, Fuzzy Based DBSCAN clustering, and Fuzzy Based Mountain clustering. The primary objective of this thesis is to predict the age of the tiger, supported by camera trap images and images collected from various other real-world sources. The objectives of the research work are to assess the age of the tiger using color pixel-based image classification and clustering, to optimize the image filtering and enhancement methods used to remove noise and improve the quality of the pixels or images, and to assess the processing time, retrieval time, accuracy, and error rate on a real-time tiger image database [9].
An image may be corrupted by random variations in intensity, variations in illumination, or poor contrast, and these must be dealt with in the early stages of image processing. The real-time tiger images are analyzed after applying salt-and-pepper noise, and the methods are compared using the PSNR, MSE, MNCC, AD, SC, and MD values to obtain an effective result. The same parameters are used for the AHE and CLAHE methods.
The research work is based on color pixel-based classification and clustering, in which clustering separates the pixels of the same color and thereby supports inferring the age of the tiger. The k-means algorithm is used for clustering. The Enhanced K-Means clustering algorithm and the Fuzzy Based DBSCAN, ISODATA, and Mountain clustering methods are used to carry out the proposed work, and their clustering results are compared to obtain the best result [10] [11].
The Wiener filter was proposed by Norbert Wiener during the 1940s and published in 1949. Its main goal is to reduce the amount of noise present in the image. A Wiener filter is not an adaptive filter, because the theory behind it assumes that the inputs are stationary. Unwanted noise is removed from the images or pixels using Wiener filter techniques. The Wiener filter is based on the Mean Square Error (MSE). It is an optimal stationary linear filter for images degraded by additive noise and blurring. Wiener filters are often applied in the frequency domain and achieve an optimal trade-off between inverse filtering and noise smoothing: they remove the additive noise and invert the blurring at the same time.
Image enhancement comprises a group of methods used to improve the visual appearance of an image.
It is a process that improves the visual quality and overall appearance of an image so that its spatial information can be extracted. Its main purpose is to improve the interpretability or perception of image information for human viewers. It produces an output image that is subjectively better than the original by altering the pixel intensities of the input image. These techniques are used in real-world applications such as remote sensing, high-definition television, X-ray imaging, and other pattern recognition systems. In this research work, image enhancement is used to improve the quality of the images or pixels. The enhancement method used is CLAHE (Contrast Limited Adaptive Histogram Equalization), which is mainly used for enhancing low-contrast tiger images.
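The two simplest quality measures named above, MSE and PSNR, can be illustrated with a minimal pure-Python sketch. The helper names (`add_salt_and_pepper`, `mse`, `psnr`), the noise density, and the tiny synthetic gray image are assumptions made for illustration; they are not taken from the original MATLAB implementation or from the tiger image database.

```python
import math
import random


def add_salt_and_pepper(image, density, seed=0):
    """Return a copy of a grayscale image (list of rows, values 0..255)
    in which roughly `density` of the pixels are forced to 0 or 255."""
    rng = random.Random(seed)
    noisy = [row[:] for row in image]
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < density:
                row[x] = 0 if rng.random() < 0.5 else 255
    return noisy


def mse(reference, test):
    """Mean squared error between two equally sized grayscale images."""
    total = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    return total / count


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(reference, test)
    if error == 0:
        return float("inf")
    return 10.0 * math.log10(peak ** 2 / error)


# Hypothetical 8x8 mid-gray test image (not from the tiger database).
clean = [[128] * 8 for _ in range(8)]
noisy = add_salt_and_pepper(clean, density=0.1)
print("MSE :", mse(clean, noisy))
print("PSNR:", psnr(clean, noisy))
```

A filter or enhancement method would be scored the same way, by comparing its output against the clean reference: a lower MSE and a higher PSNR indicate that more of the noise was removed.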

Review of literature
Many works have been published in the area of image segmentation using a wide range of methods, and much of this literature is organized around the different applications of image segmentation. The K-Means clustering algorithm is one of the simplest clustering algorithms, and many variants exist that differ in how the centers are initialized. A lot of research has been done in this domain, yielding better segmentation results. Some of the recently proposed methodologies are as follows. Nameirakpam Dhanachandra et al. (2015) presented image segmentation using different clustering methods, with the popular K-Means clustering algorithm used in the segmentation process [7]. The technique was applied to color image segmentation in the RGB color space using the K-Means clustering algorithm, with the pixel values of the images transformed using a cumulative distribution function [4]. K.A. Abdul Nazeer et al. (2014) described a modified algorithm to improve the accuracy and effectiveness of the K-Means clustering algorithm [9]. Namrata et al. (2013) showed that contour analysis is a technique to describe, store, compare, and find objects represented by their outer outlines [1], and that it addresses the main problems of pattern recognition: translation, rotation, and rescaling of the image object. S.M. Aqil Burney et al. (2014) applied K-Means cluster analysis to image segmentation and evaluated how the number of clusters affects the segmentation [9]. Clustering was performed with K > 3 in both the RGB and L*a*b* color spaces, and the results were validated against Ground Truth (GT). The L*a*b* space performed better than the RGB color space for clustering in terms of precision and recall [12]. The work relied mainly on the Ground Truth to estimate the performance of K-Means.
The ISODATA method was developed by Ball & Hall and others in the 1960s. Md. Sohrab Mahmud et al. (2012) proposed a heuristic algorithm to compute good initial centroids. The proposed clustering algorithm produces highly accurate clusters with reduced computation time. The algorithm first computes the average score of each data point, which consists of multiple attributes and weight factors [8]. Merge sort is then applied to sort the scores generated in the previous step [13]. The data points are divided into k subsets, where k is the number of desired clusters. Finally, the data point nearest to the mean of each subset is taken as an initial centroid. Experimental results showed that the algorithm reduces the number of iterations needed to assign the data to clusters [14]. However, the algorithm still requires the number of desired clusters as input. M. Merzougui et al. (2013) proposed an image segmentation method based on pixel classification using the ISODATA clustering algorithm, with parameter estimation by evolution strategies, for quality control applications [13] [14] [15]. This algorithm is also used with unsupervised data classification methods [15]. Oka Sudana et al. (2018) presented image clustering of complex Balinese characters with the DBSCAN algorithm; these characters are distinctive in that many have almost identical forms, and some are distinguished only by a single line stroke [14]. The CBIS (cluster-based image segmentation) method uses multi-dimensional data to group image pixels into multiple clusters. Generally, the pixels are clustered based on pixel distance proximity [11]. Kamran Khan et al. (2012) presented a summary of the various enhancements of the density-based clustering algorithm DBSCAN [10] [11]. The purpose of these variations is to improve DBSCAN so that it produces efficient clustering results from the underlying datasets. Nishchal K. Verma et al. (2009) proposed an Enhanced Mountain Clustering (EMC) algorithm for medical image segmentation. The enhanced method is a more powerful technique for X-ray image-based diagnosis of diseases such as lung cancer and tuberculosis [5].
The performance of all these segmentation methods is compared in terms of cluster entropy as a measure of information, and the segments obtained from the methods have been verified visually. Junnian Wang et al. (2011) proposed an improved mountain clustering algorithm based on the hill valley function. The mountain function is constructed on the data space, with its parameter estimated by a correlation self-comparison method, and the mountain function values of the data are computed [14]. The data point with the maximum mountain function value is selected as a cluster center [5]. The proposed algorithm can obtain the number of cluster centers, the cluster centers themselves, and the data patterns belonging to each cluster center automatically and accurately.

Methodology
The clustering performance of the original mountain method strongly depends on the grid resolution, with finer grids giving better performance. As the grid resolution is increased, however, the method becomes computationally expensive. Moreover, the original mountain method becomes computationally inefficient when applied to high dimensional data because the number of grid points required increases exponentially with the dimension of data. The Fuzzy Based Mountain clustering algorithm is shown in [Figure 1].
Figure 1. Fuzzy based mountain clustering algorithm

Fuzzy based mountain clustering algorithm
The modified mountain clustering algorithm is used to update the AdaBoost classifier algorithm. The Fuzzy Mountain clustering method approximates the cluster centers based on a density measure of the data. The mountain clustering method can be used either as a stand-alone algorithm or for obtaining the initial cluster centers required by more sophisticated clustering algorithms, such as FMC. The mountain clustering technique is a grid-based method for determining the approximate locations of cluster centers in data sets with clustering tendencies. It is an efficient approach to the approximate estimation of cluster centers on the basis of a density measure called the mountain function. The peaks of the mountain function with the highest values determine the associated rules. The cluster centers obtained from the mountain function on the grid provide the initial estimates of the parameters of the antecedent and consequent fuzzy sets of the rules. The method is based on what a human does when visually forming clusters in a data set.
Here $\sigma$ is the mountain peak radius, an application-specific constant, and $x_i$ is the i-th data point. Each data point $x_i$ contributes to the height of the mountain function at $v$, and the contribution is inversely proportional to the distance between $x_i$ and $v$. The mountain function can therefore be viewed as a measure of data density. The constants determine the height as well as the smoothness of the resulting mountain function. The procedure of updating the mountain function and selecting the next cluster center continues until a sufficient number of cluster centers has been obtained, where m is the number of clusters. The potential value of each data point and its distance to the other data points are calculated with equation (3).
Equation (4) assigns to the first cluster those data points whose distance from the first cluster center is less than the threshold value.
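Remove from the total dataset all the data points that are assigned to the cluster just formed, and repeat all the steps on the remaining data to form successive clusters. Similarly, to select the next cluster center, the potential values are revised for the reduced dataset, and the data point with the highest revised potential is selected as the next cluster center. The mountain function is then re-estimated, where $\sigma$ is the mountain radius. Cluster center selection stops when the ratio of the highest potential in the current iteration to the highest potential in the first iteration falls below the threshold, as given in equation (5).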
Form the required number of clusters using Steps 2 to 5, and then separate these clusters from the whole dataset. The remaining data points are distributed among the formed clusters according to their Euclidean distances, i.e., their nearness to the respective cluster centers. The threshold value for the color pixels is used to stop the cluster center selection.
The above figure shows the architecture of the proposed clustering method, FBMCA. The images under consideration come from the TIGER image database, which includes a querying facility, namely CBIR (color-based image retrieval). After an image has been retrieved, pre-processing begins, in which the filtering techniques or feature extraction methods are applied. This step improves the image or pixel quality of the real-time TIGER image database. The clustering method then separates the pixels of the same color. The proposed FBMC algorithm is used for clustering; it classifies the color pixels in order to infer the age of the tiger. When this process is complete, the same image or relevant images are retrieved from the image database using a distance-based similarity measure, which helps to retrieve the same group of images as clustered. The color feature is an essential component of image retrieval for huge image databases, and retrieval based on the color feature is both effective and efficient.
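To make the peak-selection and destruction loop described in this section concrete, the following is a minimal sketch of the standard grid-based mountain method, not of the fuzzy-based modification proposed here. The Gaussian form of the mountain function, the function name `mountain_clustering`, the parameters `sigma`, `beta`, and `grid_steps`, and the sample points are all assumptions chosen for illustration.

```python
import math
from itertools import product


def mountain_clustering(points, grid_steps=11, sigma=0.1, beta=0.15, n_clusters=2):
    """Standard grid-based mountain method on points scaled to [0, 1]^d.

    sigma controls the width of each data point's contribution,
    beta the width of the destruction applied around a chosen center."""
    dim = len(points[0])
    axis = [i / (grid_steps - 1) for i in range(grid_steps)]
    grid = list(product(axis, repeat=dim))

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Mountain function: every data point adds a Gaussian bump at each grid node.
    mountain = [
        sum(math.exp(-sq_dist(v, x) / (2 * sigma ** 2)) for x in points)
        for v in grid
    ]

    centers = []
    for _ in range(n_clusters):
        # The highest remaining peak becomes the next cluster center.
        peak_index = max(range(len(grid)), key=lambda i: mountain[i])
        peak_height = mountain[peak_index]
        center = grid[peak_index]
        centers.append(center)
        # Destruction step: subtract a scaled Gaussian centered on the new center.
        mountain = [
            m - peak_height * math.exp(-sq_dist(v, center) / (2 * beta ** 2))
            for m, v in zip(mountain, grid)
        ]
    return centers


# Hypothetical 2-D feature points forming two groups (not real pixel data).
data = [
    (0.10, 0.10), (0.12, 0.08), (0.09, 0.13), (0.11, 0.10),
    (0.90, 0.90), (0.88, 0.92), (0.91, 0.87),
]
print(mountain_clustering(data))
```

The number of grid nodes is `grid_steps ** dim`, which is why the original method becomes expensive for fine grids and high-dimensional data. Here the number of centers is fixed in advance; a threshold-based stopping rule like the one described above would replace the fixed loop count.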
The tiger images are converted into a fuzzy representation based on their color characteristics, and an arbitrary fuzzy partition is generated. The color histogram of each image is computed and the cluster density is determined. The strong uniform fuzzy partition concept is then applied. The mountain function is computed from the density measures of the clusters. The mountain function is then adjusted, the cluster density is updated, and the mountain function is recomputed until the entire tiger image dataset falls under a specific cluster.
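One possible reading of the fuzzy representation step is sketched below: each RGB channel value is mapped to memberships in evenly spaced triangular fuzzy sets whose memberships always sum to 1 (a strong uniform fuzzy partition), and the memberships are accumulated into a fuzzy color histogram. The function names, the choice of three fuzzy sets per channel, and the sample pixels are assumptions for illustration, not the partition actually used in this work.

```python
def triangular_partition(value, n_sets=3, low=0.0, high=255.0):
    """Membership of `value` in `n_sets` evenly spaced triangular fuzzy sets.

    Neighbouring triangles overlap so that the memberships of any value
    in [low, high] sum to 1. Assumes n_sets >= 2."""
    step = (high - low) / (n_sets - 1)
    memberships = []
    for k in range(n_sets):
        peak = low + k * step
        memberships.append(max(0.0, 1.0 - abs(value - peak) / step))
    return memberships


def fuzzy_color_histogram(pixels, n_sets=3):
    """Accumulate fuzzy memberships per RGB channel over a list of pixels.

    Returns three lists (R, G, B), each with one bin per fuzzy set."""
    histogram = [[0.0] * n_sets for _ in range(3)]
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            for k, mu in enumerate(triangular_partition(value, n_sets)):
                histogram[channel][k] += mu
    return histogram


# Hypothetical RGB pixels (orange-like fur tones and white), not database values.
sample = [(230, 140, 40), (200, 120, 30), (255, 255, 255)]
print(triangular_partition(140))
print(fuzzy_color_histogram(sample))
```

Because every pixel distributes a total membership of 1 per channel, each channel of the histogram sums to the number of pixels, so histograms of images with different sizes can be compared after dividing by the pixel count.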

Results of FBMCA on the tiger image database
The proposed model is implemented in MATLAB on the tiger image database. This database contains more than 500 camera trap images in various formats and sizes. It consists of a single class that includes tiger images of different age groups. The retrieval effectiveness of the proposed method is measured across the different age groups within this single class. The proposed clustering methods use color feature extraction to obtain the RGB feature vector of a base image, and the difference between images is computed as a distance using a similarity metric.
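The distance computation between RGB feature vectors can be sketched as follows for the four similarity functions reported in the results (city block, Chebychev, Minkowski, and Euclidean). The function names and the two example vectors are hypothetical; the definitions themselves are the standard ones.

```python
import math


def city_block(a, b):
    """City block (Manhattan) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a, b):
    """Chebychev (Chebyshev) distance: largest absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p):
    """Minkowski distance of order p (p=1 is city block, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Hypothetical mean RGB vectors of a query image and a database image.
query = (200, 120, 40)
candidate = (203, 116, 40)
print("city block:", city_block(query, candidate))
print("euclidean :", euclidean(query, candidate))
print("chebyshev :", chebyshev(query, candidate))
print("minkowski :", minkowski(query, candidate, p=3))
```

A smaller distance means the candidate image is more similar to the query, so retrieval amounts to ranking the database images by increasing distance.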
Precision, recall, and F-measure are used to measure the performance of age-wise image retrieval on the image database. Here N represents the number of retrieved images, B the number of irrelevant images, and C the number of relevant images that are not retrieved.
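A minimal sketch of these three measures is given below, under the assumption that N counts all retrieved images and B counts the irrelevant ones among them, so that N - B images are both relevant and retrieved. The function name, parameter names, and example counts are hypothetical.

```python
def retrieval_scores(n_retrieved, b_irrelevant, c_missed):
    """Precision, recall, and F-measure from retrieval counts.

    n_retrieved  -- N, number of retrieved images
    b_irrelevant -- B, retrieved images that are irrelevant (assumed <= N)
    c_missed     -- C, relevant images that were not retrieved
    """
    if b_irrelevant > n_retrieved:
        raise ValueError("irrelevant count cannot exceed retrieved count")
    relevant_retrieved = n_retrieved - b_irrelevant
    relevant_total = relevant_retrieved + c_missed
    precision = relevant_retrieved / n_retrieved if n_retrieved else 0.0
    recall = relevant_retrieved / relevant_total if relevant_total else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        # F-measure is the harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts for one age group (not results from the paper).
precision, recall, f_measure = retrieval_scores(n_retrieved=10, b_irrelevant=2, c_missed=2)
print(precision, recall, f_measure)
```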

Computational complexity
Computational complexity refers to the number of steps involved in the algorithm of a clustering method. The computational complexities of the various clustering methods are compared to obtain their relative efficiency in terms of time complexity, i.e., O(·). The number of steps involved in Fuzzy Based Mountain clustering is lower than that of the other clustering techniques.
Here N represents the total number of color pixels, m the number of clusters, and r the number of iterations on t. The improvement in the computational complexity of modified mountain clustering arises largely because, as clusters are formed, their points are removed from the original dataset, which reduces the computation time for each successive cluster. A comparison table for the computational complexity is given. This can also be seen from the equation. [Figure 3] shows the plot of each record based on its distance value and the formation of k = 2, k = 4, and k = 6 clusters by the proposed algorithm on the tiger image database. Each cluster is plotted in a different color.

Calculating the pixels in an image
A pixel is the smallest element of an image. Each pixel holds exactly one value. In an 8-bit grayscale image, the value of a pixel lies between 0 and 255. Each pixel stores a value proportional to the light intensity at its location (x, y). The total number of pixels is calculated as: total number of pixels = number of rows × number of columns, where x denotes the rows and y the columns. The value of a color pixel at any point denotes the intensity of the color image at that location; the storage required for the image depends on the number of pixels and the number of bits per pixel. Each pixel can have only one value, and each value denotes the intensity of light at that point of the image.
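The pixel arithmetic can be checked with a short sketch, assuming the total pixel count is the number of rows times the number of columns and that storage is the pixel count times the bits per pixel. The function names and the 480 by 640 image size are hypothetical examples.

```python
def pixel_count(rows, columns):
    """Total number of pixels in an image with the given dimensions."""
    return rows * columns


def storage_bits(rows, columns, bits_per_pixel):
    """Uncompressed storage size in bits for the given bit depth."""
    return pixel_count(rows, columns) * bits_per_pixel


def intensity_levels(bits_per_pixel):
    """Number of distinct values one pixel can take at this bit depth."""
    return 2 ** bits_per_pixel


# Hypothetical 480 x 640 camera trap frame.
rows, columns = 480, 640
print("pixels        :", pixel_count(rows, columns))
print("8-bit gray    :", storage_bits(rows, columns, 8), "bits")
print("24-bit RGB    :", storage_bits(rows, columns, 24), "bits")
print("levels (8-bit):", intensity_levels(8))
```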

Accuracy calculation and prediction of the age of the tiger
The accuracy is the difference between the true value and the mean value of the underlying process that produces the data. The cells located on the main diagonal of the error matrix contain the numbers of correctly classified pixels ($n_{kk}$). A measure of the overall classification accuracy can be derived by counting how many pixels were classified as the same age in the tiger image database and in the ground truth ($\sum_k n_{kk}$) and dividing this count by the total number of pixels ($N = \sum_j n_{j+} = \sum_i n_{+i}$):

$$\text{Overall accuracy} = \frac{\sum_k n_{kk}}{N}$$

where $\sum_k n_{kk}$ represents the total number of correctly classified pixels and N represents the total number of pixels in the error matrix. The producer's accuracy is a reference-based accuracy that is computed by reviewing the predictions produced for a class and establishing the percentage of correct predictions:

$$\text{Producer's accuracy}_j = \frac{n_{jj}}{n_{j+}} \qquad (12)$$

where $n_{jj}$ represents the number of correctly classified pixels in row j and $n_{j+}$ the total number of pixels in row j. The user's accuracy is a map-based accuracy that is computed by reviewing the reference data for a class and establishing the percentage of correct predictions for these samples:

$$\text{User's accuracy}_i = \frac{n_{ii}}{n_{+i}} \qquad (13)$$

where $n_{ii}$ represents the number of correctly classified pixels in column i and $n_{+i}$ the total number of pixels in column i.

In equation (14), the summation uses an indicator that takes the value 1 when i = j = k and 0 when i ≠ j ≠ k, with i, j, k > 0. To predict the age of the tiger, threshold values for color pixel classification with clustering are used. These values are derived from the training dataset of the tiger image database and compared with real-time camera trap images of tigers captured in the wild.
In the age prediction equation, d represents the Euclidean distance, N the total number of pixels in an image, m the number of RGB classes, and $\sum_k n_{kk}$ the total number of correctly classified pixels in a tiger image. In addition, a threshold value for each color pixel is set for an individual tiger when selecting the particular age of the tiger image. The remaining two terms represent the number of pixels in the row and in the column.
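The accuracy measures derived from the error matrix can be sketched as follows. The sketch assumes that rows hold the reference classes and columns the predicted classes, which follows the row and column roles described above; the function names and the 3 by 3 matrix of pixel counts are hypothetical and are not results from this work.

```python
def overall_accuracy(matrix):
    """Correctly classified pixels (main diagonal) divided by all pixels."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total


def row_accuracy(matrix):
    """Per-class accuracy over row totals (reference-based, producer's accuracy).

    Assumes every row total is non-zero."""
    return [matrix[j][j] / sum(matrix[j]) for j in range(len(matrix))]


def column_accuracy(matrix):
    """Per-class accuracy over column totals (map-based, user's accuracy).

    Assumes every column total is non-zero."""
    size = len(matrix)
    return [
        matrix[i][i] / sum(matrix[r][i] for r in range(size))
        for i in range(size)
    ]


# Hypothetical error matrix for three age classes (counts of pixels).
error_matrix = [
    [45, 3, 2],
    [4, 40, 6],
    [1, 2, 47],
]
print("overall:", overall_accuracy(error_matrix))
print("rows   :", row_accuracy(error_matrix))
print("columns:", column_accuracy(error_matrix))
```

If the matrix were stored with predictions in rows and reference classes in columns, the two per-class functions would simply swap roles.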

Root-Mean-Square Error (RMSE) calculation
The root-mean-square error (RMSE) is a frequently used measure of the differences between the values predicted by a model or an estimator and the values observed.
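As a minimal sketch, RMSE can be computed as the square root of the mean squared difference between observed and predicted values. The function name and the example ages are hypothetical and do not come from the tiger image database.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical true and predicted tiger ages in years.
true_ages = [2, 5, 15]
predicted_ages = [2, 6, 13]
print(rmse(true_ages, predicted_ages))
```

An RMSE of zero means every prediction matches the observed value; larger values indicate larger typical errors, expressed in the same unit as the data (here, years).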
RMSE measures the difference between the actual values $y$ and the predicted values $\hat{y}$.
[Table 1] shows the prediction of the age of the tiger using similarity-based clustering accuracy, the clustering metrics precision, recall, and F-measure, and the similarity functions city block, Chebychev, Minkowski, and Euclidean distance. The experimental results are shown in [Figure 4].
In [Table 2], the data is classified year-wise, and the number of clusters is uniformly taken as 3. For the 2nd year, the highest precision is 0.96 and the lowest is 0.94; the highest recall is 0.97 and the lowest is 0.94; the highest F-measure is 0.965 and the lowest is 0.945. When the similarity measures are compared, the values in the table differ only slightly. The highest precision (0.96), recall (0.97), and F-measure (0.965) are obtained with the Euclidean distance, and the lowest values with the city block distance. [Table 2] reports these results using the same clustering metrics and similarity functions as [Table 1], and the experimental results are shown in [Figure 5].
In [Table 3], the data is again classified year-wise, and the number of clusters is uniformly taken as 3. For the 15th year, the highest precision is 0.96 and the lowest is 0.91; the highest recall is 0.97 and the lowest is 0.90; the highest F-measure is 0.96 and the lowest is 0.92. When the similarity measures are compared, the values in the table differ only slightly. The highest precision (0.96) is obtained with the Chebychev distance, the highest recall (0.97) and F-measure (0.96) with the Euclidean distance, and the lowest values with the city block and Minkowski distances. [Table 3] reports these results using the same clustering metrics and similarity functions, and the experimental results are shown in [Figure 6].
[Table 4]
The accuracies of the existing and proposed clustering algorithms are tabulated in [Table 5], together with the RMSE value, processing time, and image retrieval time. The proposed algorithms deliver high performance when executed; the clustering results are displayed in plot format and are effective and efficient. Among the proposed methods, FBMC has the highest accuracy rate.

Performance Comparison
[Figure 8] shows the overall performance in terms of accuracy, RMSE, processing time, and image retrieval time on the tiger image database, comparing the proposed and existing methods. The corresponding values for the proposed techniques are given in [Table 5].

Conclusion
The ultimate goal of this work is to predict the age of the tiger from image databases. More than 500 real-time tiger images were collected from wildlife forests, and different kinds of images of different adult tigers were tested. The images are differentiated by color. Clustering is performed on the different age groups of tigers with their different skin colors and stripes, so that the images are segmented by age and color and each image is grouped by its difference in age and color. The fuzzy clustering models discussed in the preceding sections are used to detect the age of tigers based on the color of their images. The proposed algorithms are also more robust. Experiments on real images show the efficiency of the proposed algorithm in terms of both accuracy and computation time when compared with recent methods. The clustering results are effective and efficient, as discussed in the results section.