Undergraduate Thesis (CD)
Implementation of Clustering Methods to Identify Genomic Regions Distinguishing Liberica Coffee Varieties
Liberica coffee (Coffea liberica) is a coffee species that can be grown on peatland and is resistant to disease. The Ministry of Agriculture has introduced two superior varieties of Liberica coffee: Liberoid Meranti 1 (Lim 1) and Liberoid Meranti 2 (Lim 2). Currently, Lim 1 and Lim 2 can be distinguished only by fruit size, once the plants are two years old. A DNA-based identification method is therefore needed for young plants. This study aims to identify genomic regions that differentiate Lim 1 from Lim 2, as well as regions that differentiate the Liberica and Robusta species. It uses Single Nucleotide Polymorphisms (SNPs), a type of genetic mutation involving a single base change. The mutation data are stored in a VCF file containing 3,766,805 SNP records. K-Means clustering is applied to group genomic regions by their patterns of genetic variation. The number of clusters is chosen using the Silhouette Coefficient, which indicates that k=5 is optimal. Boxplot visualization shows that cluster 2 represents the genomic region distinguishing Liberica from Robusta coffee, while cluster 3 identifies the genomic region that differentiates Lim 1 from Lim 2.
Keywords: DNA, Genome, K-Means Clustering, Liberica Coffee, Mutation.
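The cluster-selection step described in the abstract (run K-Means for several values of k and keep the k with the highest Silhouette Coefficient) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's actual pipeline. The feature layout (one row per genomic window, holding a hypothetical alternate-allele frequency for Lim 1, Lim 2, and Robusta), the toy values, and all function names are invented for the example. A deterministic farthest-point initialization is swapped in for the usual random start so that the result is reproducible.

```python
import math

def farthest_point_init(points, k):
    # Assumption: deterministic start. Begin at the first point, then
    # repeatedly add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    # Lloyd's algorithm: alternate assignment and centroid update
    # until the labels stop changing.
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def silhouette(points, labels):
    # Mean of s(i) = (b - a) / max(a, b), where a is the mean distance to
    # the point's own cluster and b the mean distance to the nearest other one.
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton cluster contributes s(i) = 0
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(p, points[j]) for j in idx) / len(idx)
            for lab, idx in clusters.items() if lab != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical toy data: (freq in Lim 1, freq in Lim 2, freq in Robusta).
windows = [
    (0.10, 0.10, 0.10), (0.12, 0.08, 0.11), (0.09, 0.11, 0.10), (0.11, 0.12, 0.09),
    (0.90, 0.90, 0.10), (0.88, 0.92, 0.11), (0.91, 0.89, 0.09), (0.92, 0.91, 0.12),
    (0.90, 0.10, 0.10), (0.89, 0.12, 0.11), (0.91, 0.09, 0.08), (0.88, 0.11, 0.10),
]

# Score each candidate k and keep the one with the highest silhouette.
scores = {k: silhouette(windows, kmeans(windows, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On this toy data the three well-separated groups make k=3 the best-scoring choice. The thesis reports k=5 for its real SNP data, which this example does not reproduce.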
No other version available