Saturday, 08 October 2011

Final Project proposal DESIGN OF LUNG CANCER DETECTION SYSTEMS USING X-RAY IMAGE SEGMENTATION AND ADAPTIVE NEURO FUZZY INFERENCE SYSTEMS (3)


5.    Research Methodology
5.1 Literature Review
       In lung cancer diagnosis, imaging tests are performed to determine whether a lung tumor is present. Some imaging examinations provide information that helps determine whether a lung tumor is likely to be benign or malignant. The final determination as to whether a tumor is cancerous can only be made by examining a tissue sample under a microscope. Imaging tests are also useful for detecting enlargement of regional lymph nodes, which could indicate cancerous spread. A chest x-ray may also show enlarged lymph nodes, pneumonia, or blocked airways that restrict air from reaching part of the lung. A lung tumor can be missed on a chest x-ray if it is small or hidden behind a rib, the collarbone, or the breastbone.
       Image segmentation can be applied to lung images obtained from x-ray scans. Image segmentation divides an image into its constituent regions. The level to which the subdivision is carried depends on the problem being solved, so segmentation should stop when the objects of interest have been isolated. Methods for segmenting an image include edge detection, line detection using the Hough transform, thresholding, region-based segmentation, and the watershed transform.
To automatically detect whether lung cancer is present, ANFIS is used as the artificial intelligence software. The Adaptive-Network-based Fuzzy Inference System (ANFIS) is a Sugeno-like fuzzy system in a five-layered network structure. A back-propagation strategy is used to train the membership functions, while the least mean square algorithm determines the coefficients of the linear combinations in the consequent part of the model. Takagi and Sugeno type fuzzy if-then rules (TSK) are used in the ANFIS model.



5.2 Data Collection
       In the proposed research, lung images from X-ray scans are collected as input data. An image processing stage is needed to convert each image before it is used in the ANFIS software for lung cancer prediction in the imaging test phase.

5.3  Image Processing Phase
a.    Scanning
The purpose of scanning is to convert the original data into digital data. During scanning, the X-ray image of the lungs is separated into the left and right lung. This separation makes it possible to examine the average detail of each side of the lungs.
b.  Resizing
The scanned X-ray lung image should be resized. The original image data will be resized to 640 x 480 pixels. The objective of image resizing is to reduce the picture size and the processing time.
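As an illustration of the resizing step, the sketch below resamples an image stored as a list of pixel rows using nearest-neighbour sampling. The function name `resize_nearest` and the choice of nearest-neighbour interpolation are assumptions made for this example; the proposal does not say which interpolation method the software will use.

```python
def resize_nearest(image, new_width, new_height):
    """Resize a 2-D list of pixel values with nearest-neighbour sampling.

    Hypothetical helper for illustration; the proposal does not name
    an interpolation method.
    """
    old_height = len(image)
    old_width = len(image[0])
    resized = []
    for row in range(new_height):
        # Map each output row back to the source row it samples from.
        src_row = row * old_height // new_height
        resized.append([
            image[src_row][col * old_width // new_width]
            for col in range(new_width)
        ])
    return resized


# Usage: bring a tiny 2 x 2 test image to the 640 x 480 target size.
tiny = [[1, 2], [3, 4]]
target = resize_nearest(tiny, 640, 480)
print(len(target[0]), len(target))  # width and height of the result
```

Nearest-neighbour sampling is the simplest option; a smoother method such as bilinear interpolation may preserve fine detail better when shrinking an image.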

c. Greyscaling
       The output from the scanner can be loaded into the software and is read as a matrix. It appears in the software as a matrix of the colour values of the x-ray image. At this stage, greyscaling is needed to simplify computation in the software: the R, G, and B values of each pixel are summed and divided by 3.

e. Segmentation
The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.
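The idea of assigning a label to every pixel can be sketched with global thresholding, one of the segmentation methods listed in the literature review. The function name `threshold_segment` and the threshold value are hypothetical; the proposal does not fix which segmentation method or threshold will be used.

```python
def threshold_segment(grey_image, threshold):
    """Label each pixel 1 (at or above the threshold) or 0 (below it).

    Illustrative only: global thresholding is one of several possible
    segmentation methods, and the threshold here is an assumed parameter.
    """
    return [[1 if pixel >= threshold else 0 for pixel in row]
            for row in grey_image]


# Usage: pixels sharing a label form one region of the segmented image.
grey = [[10, 200, 220],
        [15, 30, 210]]
print(threshold_segment(grey, 128))  # [[0, 1, 1], [0, 0, 1]]
```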

f. Normalization
Normalization is the process of dividing every value in the greyscale matrix by the largest value in the matrix, so that all input images share the same scale even though their brightness levels differ and the mean results can be applied to all images. All values in the normalized matrix range from 0 to 1.
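A minimal sketch of this normalization, assuming the greyscale image is held as a list of rows of non-negative numbers. The function name `normalize` and the handling of an all-black image are assumptions added for the example.

```python
def normalize(grey_image):
    """Divide every grey level by the largest grey level in the image."""
    peak = max(max(row) for row in grey_image)
    if peak == 0:
        # Assumed behaviour for an all-black image: avoid dividing by zero.
        return [[0.0 for _ in row] for row in grey_image]
    return [[pixel / peak for pixel in row] for row in grey_image]


# Usage: the brightest pixel maps to 1, and all other values fall in [0, 1].
print(normalize([[0, 50], [100, 25]]))  # [[0.0, 0.5], [1.0, 0.25]]
```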

Figure 3 Image Processing Flowchart



5.4 ANFIS Design and Validation
The objective of the ANFIS design is to obtain the most suitable premise and consequent parameters to apply in the software. The normalized lung image is used as input to the software. Each image input is partitioned using 3 membership functions. To check whether the designed model is valid, a training process is carried out to find the smallest error; the premise and consequent parameters can then be determined in the FIS editor.


5.5 Software Design and Validation
After the smallest error has been reached, the validated model parameters are implemented in the lung cancer detection software, which is validated by comparing the doctor's diagnosis with the predictions of the software. The objective of software validation is to measure the accuracy of the ANFIS software in predicting lung cancer based on imaging tests.
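One simple way to express this comparison is the fraction of cases in which the software prediction agrees with the doctor's diagnosis. The sketch below assumes both are encoded as lists of labels (1 for cancer, 0 for no cancer); the function name `accuracy` and the label encoding are hypothetical, and the data are made up for the example.

```python
def accuracy(doctor_labels, predicted_labels):
    """Return the fraction of cases where prediction and diagnosis agree."""
    if len(doctor_labels) != len(predicted_labels):
        raise ValueError("both label lists must have the same length")
    if not doctor_labels:
        raise ValueError("at least one case is required")
    matches = sum(1 for d, p in zip(doctor_labels, predicted_labels) if d == p)
    return matches / len(doctor_labels)


# Usage with made-up labels: 3 of 4 predictions agree with the doctor.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Accuracy alone can hide missed cancers when most images are normal, so sensitivity and specificity may be worth reporting alongside it.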



Figure 4 Flowchart of ANFIS

6.  Schedule of Research
       The research timeline is shown in Table 1 below.

Table 1 Research Schedule
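No | Activities                      | Month 1 | Month 2 | Month 3 | Month 4
1  | Literature Review               |         |         |         |
2  | Data collection                 |         |         |         |
3  | Image processing software design|         |         |         |
4  | ANFIS software design           |         |         |         |
5  | Training                        |         |         |         |
6  | Software validation             |         |         |         |
7  | Data Analysis                   |         |         |         |
8  | Report Writing                  |         |         |         |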


















7.  References
1.        American Society of lung cancer. Lung Cancer Non-Small Cell Overview. American Cancer Society. 2011: 1.
2.        Dhillon D Paul, Snead David RJ. Advanced Diagnosis of Early Lung Cancer. 2007 : 57
3.        Floche. Background information Non-small Lung Cancer.[pdf]
(URL:http://www.roche.co.id/fmfiles/re7229001/Indonesian/media/background.library/oncology/lc/Lung.Cancer.Backgrounder.pdf accessed on July 30, 2011)
4.        Le Kim. Automated Detection of Early Lung Cancer and Tuberculosis Based on X Ray Image Analysis. International Conference on signal, speech, and Image Processing WSEAS. 2006. 110
5.        Al Daoud Essam. Cancer Diagnosis Using Modified Fuzzy Network. Universal Journal of Computer Science and Engineering Technology. 2010, 1(2) : 73
6.        Paryono Petrus. Citra Digital. [pdf]
(URL http://www2.ukdw.ac.id/kuliah/si/erickblog/...10E92/CitraDigital.pdf accessed on September 5, 2011)
7.        Feng Ding. Segmentation of Bone Structures in X-ray Images. School of Computing, National University of Singapore. 2006 : 4
8.        Emy. Peningkatan Mutu Citra (Image Enhancement) pada Domain Spatial. 2007
9.        Kundra Haris, Verma Monika, Aashima. Filter for Removal of Impulse Noise by Using Fuzzy Logic. International Journal of Image Processing. 2005, 3(5) : 195-196
10.    Hanbury Allan. A Short Introduction to Digital Image Processing. [html]
(URL http://cmm.ensmp.fr/~hanbury/intro_ip/ accessed on September 5, 2011)
11.    Gonzalez Rafael C, Woods Richard E, Eddins Steven L. Digital Image Processing Using MATLAB. Upper Saddle River. Pearson Prentice Hall
12.    Andhi Yudha M. Restorasi Citra Bintang Ganda Visual dengan Metode Blind Deconvolution Seddara. Institut Teknologi Sepuluh Nopember : Engineering Physics Department. 2010
13.    American Cancer Society. Lung Cancer. URL:http://cancer.org accessed on July 30, 2011
14.    American Society of Clinical Oncology. Guide to Lung Cancer. Alexandria. Conquer Cancer Foundation. 2011: 2.
15.    Anonymous. Kanker Paru Pedoman Diagnosis dan Penatalaksanaan di Indonesia. Perhimpunan Dokter Paru Indonesia. 2003
16.    Ayu Pradanawati Sylvia. Pengembangan Sistem Kecerdasan Buatan Berbasis Adaptive Neuro Fuzzy Inference System untuk Diagnosa Penyakit Kanker Paru. Institut Teknologi Sepuluh Nopember : Jurusan Teknik Fisika : 2010
17.    Reeve Dana. NCCN Guide Line for Patient. National Comprehensive Cancer Network. Fort Washington. 2010 : 9-11
18.    Kakar M, et al. Respiratory Motion Prediction by Using Adaptive Neuro Fuzzy Inference Systems (ANFIS). Institute of Physics Publishing. 2005, 50 : 4722.
19.    Tahmasebi Pejman, Hezarkhani Ardeshir. Application of Adaptive Neuro-Fuzzy Inference System for Grade Estimation; Case Study, Sarcheshmeh Porphyry Copper Deposit, Kerman, Iran. Australian Journal of Basic and Applied Sciences. 2010, 4(3) : 411
20.    Cruz Adriano. ANFIS : Adaptive Neuro Fuzzy Inference Systems. Mestrado NCE. 2006 : 6.








Final Project proposal DESIGN OF LUNG CANCER DETECTION SYSTEMS USING X-RAY IMAGE SEGMENTATION AND ADAPTIVE NEURO FUZZY INFERENCE SYSTEMS (2)


4.      Fundamental Theories

4.1 Image Definition
An image may be defined as a two-dimensional function, f(x,y), where x and y are spatial coordinates, and the amplitude of f at any pair of coordinates (x,y) is called the intensity or gray level of the image at that point. When x, y, and the amplitude values of f are all finite, discrete quantities, the image is called a digital image. The field of digital image processing refers to processing digital images by means of a digital computer. Note that a digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are referred to as picture elements, image elements, pels, and pixels. Pixel is the term most widely used to denote the elements of a digital image [6].
Vision is the most advanced of the human senses, so images play the most important role in human perception. However, unlike humans, who are limited to the visual band of the electromagnetic (EM) spectrum, imaging machines cover almost the entire EM spectrum, ranging from gamma rays to radio waves. They can also operate on images generated by sources that humans are not accustomed to associating with images. These include ultrasound, electron microscopy, and computer-generated images. Thus, digital image processing encompasses a wide and varied field of applications [7].

4.1.2 Digital Image Processing
Digital image processing is electronic data processing on a 2-D array of numbers. The array is a numeric representation of an image. A real image is formed on a sensor when an energy emission strikes the sensor with sufficient intensity to create a sensor output. The energy emission can have numerous possible sources (e.g., acoustic, optic, etc.). When the energy emission is in the form of electromagnetic radiation within the band limits of the human eye, it is called visible light. Some objects will reflect only electromagnetic radiation. Others produce their own, using a phenomenon called radiancy. Radiancy occurs in an object that has been heated sufficiently to cause it to glow visibly. Visible light images are a special case, yet they appear with great frequency in the image processing literature. Another source of images includes the synthetic images of computer graphics. These images can provide controls on the illumination and material properties that are generally unavailable in the real image domain [8].

4.1.3 Image Processing Operations
An image is digitized to convert analog data into digital data that can be stored in a computer's memory or on some form of storage media such as a hard disk or USB flash drive. This digitization procedure can be done by a scanner, or by a video camera connected to a frame grabber board in a computer. Once the image has been digitized, it can be operated upon by various image processing operations [9]. Image processing operations can be roughly divided into three major categories: (1) Image Compression, (2) Image Enhancement and Restoration, and (3) Measurement Extraction. Image compression is familiar to most people. It involves reducing the amount of memory needed to store a digital image.
Image defects which could be caused by the digitization process or by faults in the imaging set-up (for example, bad lighting) can be corrected using Image Enhancement techniques. Once the image is in good condition, the Measurement Extraction operations can be used to obtain useful information from the image. Some examples of Image Enhancement and Measurement Extraction are given below. The examples shown all operate on 256 grey-scale images. This means that each pixel in the image is stored as a number between 0 and 255, where 0 represents a black pixel, 255 represents a white pixel and values in-between represent shades of grey. These operations can be extended to operate on colour images [10]. Some basic operations in digital image processing are described below [11,12]:

1.    Image enhancement
Image enhancement improves an image by manipulating its parameters. With this operation, particular characteristics of an image can be highlighted. Some examples of image enhancement operations are:
a. Contrast enhancement
b. Edge enhancement
c. Sharpening
d. Pseudocoloring
e. Noise filtering
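As one concrete case of contrast enhancement, the sketch below applies a linear contrast stretch that maps the darkest pixel to 0 and the brightest to 255. This is a common textbook technique chosen for illustration, not a method prescribed by the proposal; the function name `stretch_contrast` is hypothetical.

```python
def stretch_contrast(grey_image):
    """Linearly rescale grey levels so they span the full 0..255 range."""
    flat = [pixel for row in grey_image for pixel in row]
    low, high = min(flat), max(flat)
    if high == low:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in grey_image]
    return [[round((pixel - low) * 255 / (high - low)) for pixel in row]
            for row in grey_image]


# Usage: a low-contrast image occupying only levels 50..200.
print(stretch_contrast([[50, 100], [150, 200]]))  # [[0, 85], [170, 255]]
```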

2.    Image restoration
The objective of image restoration is to improve an image in some predefined sense. Although image enhancement and image restoration overlap, image restoration is largely an objective process. Restoration attempts to reconstruct or recover a degraded image by using prior knowledge of the degradation phenomenon. Restoration techniques therefore model the degradation and apply the inverse process in order to recover the original image. Some examples of image restoration are:
a. Deblurring
b. Noise removing

3.    Image compression
The purpose of image compression is to reduce the amount of data required to represent a digital image. Compression is achieved by removing one or more data redundancies: (1) coding redundancy, which is present when less than optimal code words are used; (2) interpixel redundancy, which results from correlations between the pixels of an image; and (3) psychovisual redundancy, which is due to data that is ignored by the human visual system.
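Interpixel redundancy can be illustrated with run-length encoding, which stores each run of identical neighbouring pixels as a (value, count) pair. This is a standard example chosen for illustration; the function names are hypothetical and the proposal does not rely on any particular compression scheme.

```python
def run_length_encode(row):
    """Encode a row of pixels as (value, run_length) pairs."""
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([pixel, 1])   # start a new run
    return [tuple(run) for run in runs]


def run_length_decode(runs):
    """Rebuild the original row from (value, run_length) pairs."""
    return [value for value, count in runs for _ in range(count)]


# Usage: six pixels are described by three runs, and decoding is lossless.
row = [0, 0, 0, 255, 255, 0]
encoded = run_length_encode(row)
print(encoded)  # [(0, 3), (255, 2), (0, 1)]
print(run_length_decode(encoded) == row)  # True
```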

4.    Image segmentation
Segmentation divides an image into its constituent regions. The level to which the subdivision is carried depends on the problem being solved, so segmentation should stop when the objects of interest have been isolated. For example, in the automated inspection of electronic assemblies, interest lies in analyzing images of the products to determine the presence or absence of specific anomalies, such as missing components or broken connection paths. There is no point in carrying segmentation past the level of detail required to identify those elements.

5.    Image Analysis
The objective of image analysis is to measure an image quantitatively and produce a description of it. These techniques extract characteristics that help identify objects. Sometimes a segmentation process is necessary to localize the object. Examples of image analysis:
a. Edge detection
b. Boundary extraction
c. Region representations
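Edge detection, the first example above, is often sketched with the Sobel operator, which estimates the horizontal and vertical intensity gradients with two 3 x 3 kernels. The choice of Sobel and the function name `sobel_magnitude` are assumptions for illustration; border pixels are simply left at zero here.

```python
import math


def sobel_magnitude(grey_image):
    """Return the Sobel gradient magnitude for each interior pixel."""
    height, width = len(grey_image), len(grey_image[0])
    gx_kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal change
    gy_kernel = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical change
    result = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0
            for dr in range(3):
                for dc in range(3):
                    pixel = grey_image[r + dr - 1][c + dc - 1]
                    gx += gx_kernel[dr][dc] * pixel
                    gy += gy_kernel[dr][dc] * pixel
            result[r][c] = math.hypot(gx, gy)
    return result


# Usage: a vertical step edge between dark (0) and bright (10) columns.
step = [[0, 0, 10, 10],
        [0, 0, 10, 10],
        [0, 0, 10, 10]]
print(sobel_magnitude(step)[1])  # [0.0, 40.0, 40.0, 0.0]
```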

4.2  Lung Cancer
Lung cancer is a disease characterised by uncontrolled cell growth in tissues of the lung. It is also the most preventable cancer. Cure rate and prognosis depend on early detection and diagnosis of the disease. Lung cancer symptoms usually do not appear until the disease has progressed, so early detection is not easy. Many early lung cancers are diagnosed incidentally, after a doctor finds symptoms as a result of tests performed for an unrelated medical condition [13].
There are two major types of lung cancer: non-small cell and small cell. Non-small cell lung cancer (NSCLC) comes from epithelial cells and is the most common type. Small cell lung cancer begins in the nerve cells or hormone-producing cells of the lung. The term “small cell” refers to the size and shape of the cancer cells as seen under a microscope. It is important for doctors to distinguish NSCLC from small cell lung cancer because the two types of cancer are usually treated in different ways. Lung cancer begins when cells in the lung change and grow uncontrollably to form a mass called a tumor (or a lesion or nodule). A tumor can be benign (noncancerous) or malignant (cancerous). A cancerous tumor is a collection of a large number of cancer cells that have the ability to spread to other parts of the body. A lung tumor can begin anywhere in the lung [13].

Figure 1 X-Ray image of (a) normal lungs and (b) lung cancer

Once a cancerous lung tumor grows, it may or may not shed cancer cells. These cells can be carried away in blood or float away in the natural fluid, called lymph, that surrounds lung tissue. Lymph flows through tubes called lymphatic vessels that drain into collecting stations called lymph nodes, the tiny, bean-shaped organs that help fight infection. Lymph nodes are located in the lungs, the center of the chest, and elsewhere in the body. The natural flow of lymph out of the lungs is toward the center of the chest, which explains why lung cancer often spreads there. When a cancer cell leaves its site of origin and moves into a lymph node or to a faraway part of the body through the bloodstream, it is called metastasis [14].       
       The stage of lung cancer is determined by the location and size of the initial lung tumor and whether it has spread to lymph nodes or more distant sites. The type of lung cancer (NSCLC versus small cell) and stage of the disease determine what type of treatment is needed.
      
4.2.1 Lung Cancer Classification

1.    Non-small cell lung cancer 
       About 85% to 90% of lung cancers are non-small cell lung cancer (NSCLC). There are 3 main subtypes of NSCLC. The cells in these subtypes differ in size, shape, and chemical make-up when looked at under a microscope. But they are grouped together because the approach to treatment and prognosis (outlook) are very similar [15].

a.    Adenocarcinoma
       About 40% of lung cancers are adenocarcinomas. These cancers start in early versions of the cells that would normally secrete substances such as mucus. This type of lung cancer occurs mainly in people who smoke (or have smoked), but it is also the most common type of lung cancer seen in non-smokers. It is more common in women than in men, and it is more likely to occur in younger people than other types of lung cancer. Adenocarcinoma is usually found in the outer region of the lung. It tends to grow slower than other types of lung cancer, and is more likely to be found before it has spread outside of the lung. People with one type of adenocarcinoma, sometimes called bronchioloalveolar carcinoma, tend to have a better outlook (prognosis) than those with other types of lung cancer.

b.    Large cell (undifferentiated) carcinoma
       This type of cancer accounts for about 10% to 15% of lung cancers. It may appear in any part of the lung. It tends to grow and spread quickly, which can make it harder to treat. A subtype of large cell carcinoma, known as large cell neuroendocrine carcinoma, is a fast-growing cancer that is very similar to small cell lung cancer.

c.    Other subtypes
       There are also a few other subtypes of non-small cell lung cancer, such as adenosquamous carcinoma and sarcomatoid carcinoma. These are much less common.

2.    Small cell lung cancer 
       About 10% to 15% of all lung cancers are small cell lung cancer (SCLC), named for the size of the cancer cells when seen under a microscope. Other names for SCLC are oat cell cancer, oat cell carcinoma, and small cell undifferentiated carcinoma. It is very rare for someone who has never smoked to have small cell lung cancer. SCLC often starts in the bronchi near the center of the chest, and it tends to spread widely through the body fairly early in the course of the disease

4.2.2 Lung Cancer Risk Factor

1.    Tobacco smoke
       Smoking is by far the leading risk factor for lung cancer. Tobacco smoke causes nearly 9 out of 10 cases of lung cancer. The longer a person has been smoking and the more packs a day smoked, the greater the risk. If a person stops smoking before lung cancer starts, the lung tissue slowly repairs itself. Stopping smoking at any age may lower the risk of lung cancer. Cigar and pipe smoking are almost as likely to cause lung cancer as is cigarette smoking.  Smoking low tar or "light" cigarettes increases lung cancer risk as much as regular cigarettes. There is concern that menthol cigarettes may increase the risk even more since the menthol allows smokers to inhale more deeply. 
       Secondhand smoke: People who don't smoke but breathe the smoke of others may also be at a higher risk for lung cancer. Non-smokers who live with a smoker, for instance, have about a 20% to 30% greater risk of developing lung cancer. Non-smokers exposed to tobacco smoke in the workplace are also more likely to get lung cancer [16].

2.    Radon
       Radon is a radioactive gas made by the normal breakdown of uranium in soil and rocks. Uranium is found at higher levels in the soil in some parts of the United States. Radon can't be seen, tasted, or smelled. It can build up indoors and create a possible risk for cancer. The lung cancer risk from radon is much lower than that from tobacco smoke. But the risk from radon is much higher in people who smoke than in those who don't [16].

3.    Asbestos
       Asbestos exposure is another risk factor for lung cancer. People who work with asbestos have a higher risk of getting lung cancer. If they also smoke, the risk is greatly increased. Both smokers and non-smokers exposed to asbestos also have a greater risk of developing a type of cancer that starts in the lining of the lungs (it is called mesothelioma). Although asbestos was used for many years, many countries have now nearly stopped its use in the workplace and in home products. While it is still present in many buildings, it is not thought to be harmful as long as it is not released into the air [16].

4.2.3 Lung Cancer Staging
       Lung cancer staging is the process of finding out how far a cancer has spread. A patient's treatment and prognosis depend on the cancer stage. Lung cancer staging can be described with the TNM system. The system used to describe the growth and spread of non-small cell lung cancer (NSCLC) is the American Joint Committee on Cancer (AJCC) TNM staging system. The TNM system is based on 3 key pieces of information:
·         T indicates the size of the main (primary) tumor and whether it has grown into nearby areas.
·         N describes the spread of cancer to nearby (regional) lymph nodes. Lymph nodes are small bean-shaped collections of immune system cells that help fight infections. Cancers often spread to the lymph nodes before going to other parts of the body.
·         M indicates whether the cancer has spread (metastasized) to other organs of the body. (The most common sites are the brain, bones, adrenal glands, liver, kidneys, and the other lung.)
       Numbers or letters appear after T, N, and M to provide more details about each of these factors. The numbers 0 through 4 indicate increasing severity [17].


4.3 ANFIS
       ANFIS is the abbreviation for adaptive neuro-fuzzy inference system. This method is essentially a fuzzy inference system trained with back-propagation to minimize the error. It behaves like both an Artificial Neural Network (ANN) and Fuzzy Logic (FL). In both the ANN and FL cases, the input passes through the input layer (via input membership functions) and the output appears at the output layer (via output membership functions). Because a neural network is used in this type of advanced fuzzy logic, a learning algorithm adjusts the parameters until the optimal solution is reached. In other words, the FL uses the advantages of the neural network to adjust its own parameters [18].
       Several fuzzy inference systems have been described by different researchers (Mamdani, E.H., 1974; Sugeno, M. and G.T. Kang, 1988; Sugeno, M. and K. Tanaka, 1991; Takagi, T. and M. Sugeno, 1985; Zadeh, L.A., 1965). The most commonly used systems are the Mamdani type and the Takagi–Sugeno type, also known as the Takagi–Sugeno–Kang type. In a Mamdani-type fuzzy inference system, both the premise (if) and consequent (then) parts of a fuzzy if-then rule are fuzzy propositions. In a Takagi–Sugeno-type fuzzy inference system, the premise part of a fuzzy rule is a fuzzy proposition, while the consequent part is a mathematical function, usually a zero- or first-degree polynomial. The advantage of FL for grade estimation is clear: it provides a flexible and powerful tool whose if-then rules can solve problems even when data are lacking. As discussed, one of the biggest problems in applying FL is that the shape and location of the membership function for each fuzzy variable can only be determined by trial and error. In contrast, numerical computation and learning are the advantages of neural networks; however, it is not easy to obtain the optimal structure of a neural network (the number of hidden layers, the number of neurons in each hidden layer, the momentum rate, and the size), and this kind of artificial intelligence is based more on numerical computation than on symbolic computation [19].
       Both FL and NN have their advantages; it is therefore a good idea to combine their abilities into a strong tool that compensates for their weaknesses and leads to the least error. Jang (1992, 1993) combined FL and NN to produce a powerful processing tool, neuro-fuzzy systems (NFSs), which have the advantages of both NN and FL; the most common one is ANFIS.


Figure 2 ANFIS structure

From Figure 2 above, the neuro-fuzzy system consists of five layers, each with a different function. Each layer is built from several nodes, represented by squares or circles. A square symbolizes an adaptive node, meaning that its parameter values can be changed by adaptation. A circle is a non-adaptive node with a constant value [20]. The equations for each layer are described below:

a.  First Layer :
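All nodes in the first layer are adaptive nodes (their parameters can change). The node functions of the first layer are:

$$O_{1,i} = \mu_{A_i}(x) = \frac{1}{1 + \left| \frac{x - c_i}{a_i} \right|^{2 b_i}}, \quad i = 1, 2 \tag{1}$$

$$O_{1,i} = \mu_{B_{i-2}}(y) = \frac{1}{1 + \left| \frac{y - c_i}{a_i} \right|^{2 b_i}}, \quad i = 3, 4 \tag{2}$$

where x and y are the inputs to node i, and $A_i$ and $B_{i-2}$ are the membership functions of the fuzzy sets A and B for each input. The membership function used is the generalized bell type (gbell), with premise parameters $a_i$, $b_i$, and $c_i$.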
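The generalized bell membership function named above is commonly defined as 1 / (1 + |(x - c)/a|^(2b)), where a sets the width, b the steepness of the shoulders, and c the centre. The sketch below evaluates it under that assumed definition; the function name `gbell` mirrors the usual shorthand and the parameter values are made up.

```python
def gbell(x, a, b, c):
    """Generalized bell membership value of x.

    Assumes the common definition 1 / (1 + |(x - c) / a| ** (2 * b)),
    with a != 0 (width), b (shoulder steepness) and c (centre).
    """
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


# Usage: membership is 1 at the centre and 0.5 at a distance a from it.
print(gbell(1.0, a=2.0, b=4.0, c=1.0))  # 1.0
print(gbell(3.0, a=2.0, b=4.0, c=1.0))  # 0.5
```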

b.  Layer 2
       All nodes in this layer are non-adaptive (fixed parameters). The node function of the second layer is:

$$O_{2,i} = w_i = \mu_{A_i}(x) \cdot \mu_{B_i}(y), \quad i = 1, 2 \tag{3}$$

Each output represents the firing strength of a fuzzy rule. This function can be extended when the premise consists of more than two fuzzy sets.

c.  Layer 3
All nodes in layer 3 are non-adaptive and compute the normalized firing strength, that is, the ratio of the output of node i of the previous layer to the sum of all outputs of the previous layer. The node function of layer 3 is:

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2 \tag{4}$$

If more than two rules are constructed, the function can be extended by dividing by the sum of w over all rules.

d.  Layer 4
Each node in layer 4 is an adaptive node with the following node function:

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i) \tag{5}$$

where $\bar{w}_i$ is the normalized firing strength from layer 3, and $p_i$, $q_i$, and $r_i$ are the adaptive consequent parameters.

e.  Layer 5
In this layer there is only one fixed node, which sums all of its inputs. The function of layer 5 is:

$$O_5 = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \tag{6}$$

An adaptive network with these five layers is equivalent to a Takagi-Sugeno-Kang fuzzy inference system.
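The five layers can be traced in a short forward pass for a two-input, two-rule network. This is a minimal sketch under stated assumptions: the membership function is taken to be the common generalized bell form 1 / (1 + |(x - c)/a|^(2b)), rule i pairs fuzzy sets A_i and B_i, and all names and parameter values (`anfis_forward`, `premise_a`, `premise_b`, `consequents`) are hypothetical. Training of the parameters is not shown.

```python
def gbell(x, a, b, c):
    """Generalized bell membership (assumed common definition)."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def anfis_forward(x, y, premise_a, premise_b, consequents):
    """Forward pass of a first-order Sugeno ANFIS with one rule per (A_i, B_i).

    premise_a, premise_b: lists of (a, b, c) gbell parameters per rule.
    consequents: list of (p, q, r) linear coefficients per rule.
    """
    # Layer 1: membership degrees of the two inputs.
    mu_a = [gbell(x, a, b, c) for (a, b, c) in premise_a]
    mu_b = [gbell(y, a, b, c) for (a, b, c) in premise_b]
    # Layer 2: firing strength of each rule (product of memberships).
    w = [ma * mb for ma, mb in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: each rule's weighted first-order consequent.
    weighted = [wb * (p * x + q * y + r)
                for wb, (p, q, r) in zip(w_bar, consequents)]
    # Layer 5: overall output is the sum of the weighted consequents.
    return sum(weighted)


# Usage with made-up parameters: both rules fire equally at x = y = 1,
# so the output is the average of f1 = 2 and f2 = 4.
output = anfis_forward(
    1.0, 1.0,
    premise_a=[(1.0, 1.0, 0.0), (1.0, 1.0, 2.0)],
    premise_b=[(1.0, 1.0, 0.0), (1.0, 1.0, 2.0)],
    consequents=[(1.0, 1.0, 0.0), (0.0, 0.0, 4.0)],
)
print(output)  # 3.0
```

Because the generalized bell function is strictly positive for finite inputs, the sum of firing strengths in layer 3 does not reach zero in ordinary use; an implementation using other membership shapes would need to guard that division.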