HyperSNP
Hypergraph Modeling for Large-Scale SNP Data Analysis
HyperSNP developed a hypergraph-based computational analysis system for identifying disease risk factors in large-scale Single Nucleotide Polymorphism (SNP) data and cohort datasets. The project applied hypergraph modeling to genome-wide association studies, enabling multi-locus interaction analysis that goes beyond the single-locus and pairwise tests of conventional statistical approaches.
Overview
SNPs are the most common form of genetic variation in the human genome. Identifying which combinations of SNPs act as risk factors for complex diseases requires modeling higher-order interactions among many genetic variants simultaneously, and pairwise graph models cannot adequately represent such interactions.
The HyperSNP project addressed this challenge by building a hypergraph-based analysis framework that models multi-SNP relationships as hyperedges (edges that can connect more than two nodes). The system was applied both to genome-wide SNP data and to cohort study data to discover compound genetic markers associated with disease phenotypes.
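One way to picture this data structure is the minimal Python sketch below. It treats each vertex as a (SNP identifier, genotype) pair, with genotypes assumed to be coded as minor-allele counts 0/1/2, and each hyperedge as a set of such vertices that a sample supports only when it carries all of them at once. The encoding, the `Hyperedge` class, the `support` helper, and the identifiers `rs1`, `rs2`, `rs3` are illustrative assumptions, not the project's actual data model.

```python
from dataclasses import dataclass

# Hypothetical encoding: SNP id -> genotype as a minor-allele count (0, 1, 2).
Sample = dict[str, int]


@dataclass(frozen=True)
class Hyperedge:
    """A hyperedge joining any number of (SNP id, genotype) vertices."""
    vertices: frozenset[tuple[str, int]]

    def order(self) -> int:
        # Number of vertices (SNP-genotype pairs) the hyperedge connects.
        return len(self.vertices)

    def matches(self, sample: Sample) -> bool:
        # A sample supports the hyperedge only if it carries every
        # listed genotype at the same time.
        return all(sample.get(snp) == g for snp, g in self.vertices)


def support(edge: Hyperedge, samples: list[Sample]) -> int:
    """Count the samples that carry the complete multi-SNP pattern."""
    return sum(edge.matches(s) for s in samples)


if __name__ == "__main__":
    samples = [
        {"rs1": 2, "rs2": 1, "rs3": 0},
        {"rs1": 2, "rs2": 1, "rs3": 2},
        {"rs1": 0, "rs2": 1, "rs3": 2},
    ]
    edge = Hyperedge(frozenset({("rs1", 2), ("rs2", 1), ("rs3", 2)}))
    print(edge.order(), support(edge, samples))  # 3 1
```

An ordinary graph edge could link only two of these vertices; the single hyperedge above records that all three genotypes must co-occur, which is the kind of multi-locus pattern the framework is meant to capture.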
The project produced two main system components:
- A research process pipeline for systematic hypergraph-based SNP analysis (illustrated in the project’s process diagram)
- A visualization interface for exploring the hypergraph structure of discovered genetic associations
Research Objectives
- Develop an analysis system for discovering disease risk factors using hypergraph-based computational models
- Implement feature selection and classification methods leveraging hypergraph representations of multi-locus genetic data
- Provide interactive visualization of important genetic factors identified through hypergraph analysis
- Apply the hypergraph analysis framework to genome-wide association studies (GWAS) and cohort data
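As an illustration of how a hypergraph representation can drive classification, the sketch below lets every hyperedge that a sample fully matches cast a weighted vote for its class label. The weighted-voting scheme, the `classify` function, its `(hyperedge, label, weight)` model format, and the `"control"` fallback label are assumptions chosen for brevity; they are not necessarily the feature selection and classification methods HyperSNP used.

```python
from collections import defaultdict

# Hypothetical model format: (hyperedge, class label, weight), where a
# hyperedge is a frozenset of (SNP id, genotype) pairs.
Edge = frozenset[tuple[str, int]]
Model = list[tuple[Edge, str, float]]


def classify(sample: dict[str, int], model: Model,
             default: str = "control") -> str:
    """Predict a label by weighted voting of the hyperedges the sample matches."""
    votes: dict[str, float] = defaultdict(float)
    for edge, label, weight in model:
        # Only hyperedges whose genotypes are all present in the sample vote.
        if all(sample.get(snp) == g for snp, g in edge):
            votes[label] += weight
    if not votes:
        return default  # no hyperedge matched the sample
    return max(votes, key=votes.get)


if __name__ == "__main__":
    model: Model = [
        (frozenset({("rs1", 2), ("rs2", 1)}), "case", 1.5),
        (frozenset({("rs3", 0)}), "control", 1.0),
        (frozenset({("rs2", 1), ("rs3", 0)}), "control", 0.8),
    ]
    print(classify({"rs1": 2, "rs2": 1, "rs3": 0}, model))  # control
    print(classify({"rs1": 2, "rs2": 1, "rs3": 2}, model))  # case
```

In this scheme the per-hyperedge weights double as a ranking, so keeping only the highest-weighted hyperedges is one simple form of feature selection.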
Methodology
- Hypergraph modeling of genomic data: Represent multi-locus SNP relationships as hyperedges, capturing complex epistatic interactions that pairwise graph models miss
- Explorative analysis and visualization: Develop tools for visually navigating hypergraph structures to support biological interpretation
- Cohort-based SNP analysis: Apply the framework to real cohort datasets to discover statistically significant multi-SNP associations
- Comparative algorithm evaluation: Benchmark the hypergraph approach against existing genome-wide association analysis algorithms
- Compound marker discovery: Use hypergraph modeling to identify compound genetic markers composed of multiple interacting SNPs
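Compound marker discovery can be sketched as follows. Candidate hyperedges of a fixed order are sampled at random from genotype combinations observed in case samples, then ranked by the odds ratio of carrying the full pattern in cases versus controls, with 0.5 added to every cell of the 2x2 table (the Haldane-Anscombe correction) to avoid division by zero. Random candidate sampling, the odds-ratio score, and the names `sample_edge`, `odds_ratio`, and `discover_markers` are assumptions for illustration, not the project's published algorithm.

```python
import random

# Hypothetical encoding: SNP id -> minor-allele count (0, 1, 2).
Sample = dict[str, int]
# A hyperedge is a set of (SNP id, genotype) vertices.
Edge = frozenset[tuple[str, int]]


def sample_edge(sample: Sample, order: int, rng: random.Random) -> Edge:
    """Draw a candidate hyperedge from genotypes observed in one sample."""
    snps = rng.sample(sorted(sample), order)
    return frozenset((snp, sample[snp]) for snp in snps)


def odds_ratio(edge: Edge, cases: list[Sample], controls: list[Sample]) -> float:
    """Case/control odds ratio for carrying every genotype in `edge`,
    with 0.5 added to each cell of the 2x2 table."""
    def hits(group: list[Sample]) -> int:
        return sum(all(s.get(snp) == g for snp, g in edge) for s in group)

    a, c = hits(cases), hits(controls)        # carriers
    b, d = len(cases) - a, len(controls) - c  # non-carriers
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def discover_markers(cases: list[Sample], controls: list[Sample],
                     order: int = 2, n_candidates: int = 1000,
                     top_k: int = 5, seed: int = 0) -> list[tuple[float, Edge]]:
    """Rank randomly sampled hyperedges of the given order by odds ratio."""
    rng = random.Random(seed)
    # Repeated draws of the same hyperedge collapse into one candidate.
    candidates = {sample_edge(rng.choice(cases), order, rng)
                  for _ in range(n_candidates)}
    scored = [(odds_ratio(e, cases, controls), e) for e in candidates]
    # Highest score first; the sorted edge contents break ties deterministically.
    scored.sort(key=lambda pair: (-pair[0], sorted(pair[1])))
    return scored[:top_k]


if __name__ == "__main__":
    cases = [{"rs1": 2, "rs2": 2, "rs3": 0},
             {"rs1": 2, "rs2": 2, "rs3": 1},
             {"rs1": 2, "rs2": 2, "rs3": 2}]
    controls = [{"rs1": 2, "rs2": 0, "rs3": 0},
                {"rs1": 0, "rs2": 2, "rs3": 1},
                {"rs1": 1, "rs2": 1, "rs3": 2}]
    for score, edge in discover_markers(cases, controls, top_k=3):
        print(round(score, 2), sorted(edge))
```

In the toy data, neither `rs1` nor `rs2` alone separates cases from controls, but the pair `rs1 = 2` and `rs2 = 2` occurs in every case and no control, so it ranks first. Because many candidates are scored, a real analysis would also need a multiple-testing correction or permutation test before treating a top-ranked hyperedge as a marker; the sketch omits that step.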
Expected Outcomes
- Improved early detection of disease biomarkers
- Increased usability of analysis systems through integrated visualization
- Standardized analytical framework applicable to diverse large-scale genomic datasets
- Support for whole-genome analysis through efficient association algorithms
- Expanded collaboration with external research groups and international institutions
- Practical tools for large-volume bioinformatics data analysis
Research Team
| Role | Name |
|---|---|
| Principal Investigator | Prof. Byoung-Tak Zhang |
| Researcher | Je-Keun Rhee (Contact) |
| Researcher | Jung-Woo Ha |
| Researcher | Soo-Jin Kim |
| Researcher | Min-Su Lee |
Contact: Je-Keun Rhee (Phone: +82-2-880-5890 / Fax: +82-2-883-9120)