Research Projects
Deep Machine Learning for Protein Structure Prediction, Protein-Ligand Binding Scoring Function
We will cover: 1) a novel LISF scoring function for rapid and accurate prediction of protein-ligand and protein-protein binding affinity and for protein structure prediction; 2) high-throughput virtual screening for lead discovery, lead optimization and drug discovery; and 3) protein design. Students participating in this project will broaden their skills in developing algorithms and software supporting a range of scientific endeavors on classical and other advanced computing architectures. Students participating in this project must have taken a graduate-level course in computational chemistry and be proficient in Python.
Keywords: Excited-state dynamics, fragment-based quantum chemistry method, green fluorescent protein
Mentors: Kenneth M. Merz Jr, Professor of Chemistry, MSU; and Xiao He, Professor of Chemistry and Molecular Engineering, East China Normal University
Deep Representation Learning in Image Processing
Images are one of the most useful forms of data in our daily lives, and rapid progress in imaging technologies has resulted in an explosion in the number of images captured. Due to the complex structure of images, it is difficult to develop a universal mathematical theory for solving real-world imaging problems. Mathematically, a deep learning model is a nonlinear mapping with a multi-layer architecture, in which each layer consists essentially of a composition of convolution, pooling and activation operations. Despite the numerical success of deep learning, many open problems remain. The goal of this project is to incorporate the underlying physical properties of different imaging methods into the deep learning framework.
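The layered structure described above can be illustrated with a minimal sketch. It is a one-dimensional toy in plain Python, not the project's method: the function names (conv1d, relu, max_pool, layer, network), the choice of ReLU as the activation, and the ordering convolution, activation, pooling are all assumptions made for illustration.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation (the 'convolution' used in deep learning)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(values):
    """Element-wise activation: negative entries are set to zero."""
    return [max(0.0, v) for v in values]


def max_pool(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def layer(signal, kernel):
    """One layer: pooling applied to the activation of the convolution."""
    return max_pool(relu(conv1d(signal, kernel)))


def network(signal, kernels):
    """A multi-layer mapping: the layers are composed one after another."""
    for kernel in kernels:
        signal = layer(signal, kernel)
    return signal


if __name__ == "__main__":
    # Hypothetical input and kernels, chosen only to show the data flow.
    print(network([0, 1, 0, 2, 0, 3, 0, 4], [[1, 1], [1, -1]]))
```

In a trained network the kernel entries would be learned from data; here they are fixed by hand so that each stage can be followed step by step.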
Students participating in this project will gain a comprehensive view of modern imaging technology and of high performance computing in machine learning. The interdisciplinary experience will help them make deeper observations in imaging sciences, which might lead to new discoveries. Students will also gain familiarity with using high performance computing for deep learning by designing distributed and parallel algorithms for supercomputers. This experience will be valuable for their future research in data science.
Students applying to this project must have taken the following courses: Calculus, Analysis and Linear Algebra, and must be proficient in programming in C/Python. Coursework in Real Analysis, Fourier Analysis, Wavelets and Numerical Optimization is a bonus but is not required.
Keywords: Deep learning, large scale optimization, physics and data driven imaging methods
Mentors: Ming Yan, Assistant Professor in MSU's Department of Computational Mathematics, Science and Engineering and the Department of Mathematics; and Chenglong Bao, Professor in Tsinghua University's Yau Mathematical Sciences Center
Mathematical deep learning for drug discovery
A major trend of the biological sciences in the 21st century is their transition from being qualitative, phenomenological and descriptive to being quantitative, analytical and predictive. The fundamental challenges that hinder the current understanding of biomolecular structure-function relationships, the central theme of the biological sciences, are the tremendous structural complexity of biomolecules and excessively large datasets. These challenges call for innovative strategies.
Modern mathematical methods, such as those based on differential geometry, algebraic topology and graph theory, are able to provide high-level abstractions of biomolecular systems. However, these methods have rarely been properly applied to the analysis of massive and diverse biomolecular datasets. The PI has recently made paradigm-shifting progress in devising modern mathematics for biomolecular data analysis. Specifically, the PI has developed methods based on algebraic topology and graph theory that won a number of contests in two recent D3R Grand Challenges, a worldwide competition series in computer-aided drug design, which ultimately tests our understanding of the biomolecular world and brings a direct benefit to human health (https://doi.org/10.1007/s10822-018-0146-6).
In the proposed project, we will integrate mathematics (algebraic topology, differential geometry and/or graph theory) and deep learning for drug design and discovery. The objective of the project is to develop new approaches, based on mathematics such as de Rham cohomology and Hodge theory, to revolutionize the current practice in biomolecular data analysis and modeling. The proposed methods will be extensively validated on a variety of datasets, such as protein binding to protein, ligand, DNA and RNA, protein folding stability changes upon mutation, drug toxicity, solvation, solubility, and partition coefficient. User-friendly software packages and online servers will be developed using parallel and GPU architectures for researchers who are not formally trained in mathematics or machine learning.
Keywords: Mathematical/computational biophysics
Mentor: Guo-Wei Wei, Professor in MSU's Department of Mathematics; Associate Professor, Electrical & Computer Engineering; Professor, Biochemistry & Molecular Biology
Potential Methods for Improving Weather and Climate Prediction Capabilities: Combining High-Precision Simulations & Big Data Analytics
Numerical weather and climate modeling have long been key applications for supercomputers. Although increasing computing power over the years has helped to improve both the simulation resolution and the size of the ensembles used in weather and climate modeling, the accuracy of weather and climate predictions is still quite limited. Deep learning techniques, as well as other big data methods, have demonstrated their potential in various application domains. In this project, we explore the potential benefits of combining high-resolution simulation with deep learning-based data analysis. The major goal is to improve either the prediction accuracy or the validity period of the prediction. The tasks for graduate students involved in this project include: (1) performing high-precision weather or climate simulations using the existing software on Sunway TaihuLight, and resolving the performance bottlenecks when possible; (2) using the deep learning framework on Sunway TaihuLight to analyze observational and reanalysis data; (3) exploring potential methods of integrating deep learning data analysis into the simulation workflow, so as to improve the prediction capabilities in either weather or climate scenarios.
Students participating in this project should have a basic understanding of computer architecture and parallel computing. Applied domain knowledge and MPI experience are also acceptable.
Keywords: Climate modeling; weather forecasting
Mentors: Wei Xue (Tsinghua University) and Haohuan Fu (Tsinghua University)
Design and Development of a Parallel Programming System for Processing Big Genome Data on Sunway TaihuLight
Next-generation sequencing (NGS) technologies have led to the sequencing of more and more genomes, propelling related research into the era of big genome data. Our project aims to develop a new open-source C++ parallel programming system on the Sunway TaihuLight supercomputer to process big genome data generated by NGS in a highly efficient manner. To facilitate applications, our system will first be deployed in collaboration with interdisciplinary project partners in the areas of metagenomics and individualized cancer therapy.
More specifically, we will investigate the parallel construction of a number of popular full-text indexing data structures, such as the enhanced suffix array (ESA), the enhanced sparse suffix array (ESSA) and the FM-index, on Sunway TaihuLight and other emerging parallel architectures. Based on these indexing data structures, we will further investigate associated pattern search algorithms and dynamic programming-based alignment algorithms on the aforementioned architectures.
Our system will provide a consistent, high-level programming interface for the different types of processing units in order to facilitate integration with existing programs. This high-level interface will greatly enhance the productivity of developers while enabling performance portability. Furthermore, on a heterogeneous computer with accelerators, our system will be able to select the "most efficient" parallelization autonomously at runtime; this parallelization may run on any of the different types of processing units or even on a hybrid combination of them. In this fashion, developers deploying the parallel algorithms from the system do not need to care about the details of the underlying hardware configuration and can pay more attention to the development of other parts of a program. In addition, we will heavily optimize the code for speed.
Students participating in this project should have a basic understanding of computer architecture and parallel computing. Applied domain knowledge and MPI experience are also acceptable.
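To make the idea of a full-text index and its pattern search concrete, here is a minimal sequential sketch in Python. It builds a plain suffix array by naive sorting and finds pattern occurrences with two binary searches. This is an illustration only: it is neither the enhanced suffix array, the sparse variant, nor the FM-index, it is not parallel, and the function names build_suffix_array and find_occurrences are hypothetical rather than part of the planned system.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically.

    Sorting full suffixes is far too slow for real genomes; it is used
    here only because it is short and easy to check by hand.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text.

    All suffixes that begin with pattern form one contiguous block in
    the suffix array, so two binary searches locate the block's ends.
    """
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    genome = "GATTACAGATTACA"  # hypothetical toy sequence
    sa = build_suffix_array(genome)
    print(find_occurrences(genome, sa, "TTA"))
```

Once the index exists, each query costs a number of string comparisons logarithmic in the text length, which is why the construction step is the natural target for parallelization.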
Keywords: 'Omics, Big Data
Mentor: Weiguo Liu (Shandong University)
Large scale simulation of brittle crack propagation in heterogeneous material
Heterogeneous materials have two or more constituent materials with significantly different properties and exhibit much better qualities than any of the constituent materials alone. The behavior of a large composite component undergoing complex external loading is difficult to predict, since it is far too complex to consider all the microstructures at the same time with traditional simulation methods like the Finite Element Method (FEM). However, with the computational power provided by supercomputers and advanced computational methods like MultiGrid, we can take the real structure into account in numerical simulations. Previous research has shown that material damage originates from micro-defects. Hence, the purpose of this research is to investigate brittle crack propagation in a large heterogeneous component at the micro-structural scale using the MultiGrid method on the supercomputer. The major role of the graduate students involved in this project will be to investigate methods and techniques for achieving better scalability and efficiency of MultiGrid methods on supercomputers with many-core architectures, such as Sunway TaihuLight.
Students participating in this project should have a basic understanding of computer architecture and parallel computing. Applied domain knowledge and MPI experience are also acceptable.
Keywords: heterogeneous, brittle crack, large scale, supercomputer, MultiGrid
Mentor: Hanfeng Gu (NSCC-Wuxi)
Simulation of hydraulic fracturing with Lattice Boltzmann Method
The Lattice Boltzmann Method (LBM) is suitable for simulating flows in porous media, such as those arising in hydraulic fracturing. It can handle complex geometry easily and has many multiphase/multicomponent models for simulating multiphase flows. Instead of solving the Navier–Stokes equations, the method solves the discrete Boltzmann equation to simulate the flow of a Newtonian fluid, using collision models such as Bhatnagar–Gross–Krook (BGK). Because it simulates streaming and collision processes across a limited number of particles, it runs efficiently on massively parallel architectures. SWLBM is an LBM solver designed to run on the Sunway TaihuLight supercomputer system. Although still under development, it has already successfully simulated problems with meshes at the 100-billion scale. Hydraulic fracturing could be a topic for its application.
Students participating in this project should have a basic understanding of computer architecture and parallel computing. Applied domain knowledge and MPI experience are also acceptable.
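The collide-and-stream cycle mentioned above can be sketched on the smallest possible lattice. The example below is a toy, not SWLBM's implementation: it assumes a one-dimensional D1Q3 lattice (velocities 0, +1, -1 with weights 2/3, 1/6, 1/6), periodic boundaries, the standard second-order equilibrium, and a single BGK relaxation time tau. All function names are hypothetical.

```python
# D1Q3 lattice: discrete velocities and their weights (assumed toy model).
C = (0, 1, -1)
W = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)


def equilibrium(rho, u):
    """Second-order equilibrium populations for density rho and velocity u."""
    return [w * rho * (1.0 + 3.0 * c * u + 4.5 * (c * u) ** 2 - 1.5 * u * u)
            for c, w in zip(C, W)]


def initialise(rho0):
    """Start at rest: every node holds the equilibrium for its density."""
    cols = [equilibrium(r, 0.0) for r in rho0]
    return [[col[i] for col in cols] for i in range(3)]


def density(f):
    """Density at each node is the sum of its three populations."""
    return [sum(f[i][x] for i in range(3)) for x in range(len(f[0]))]


def step(f, tau):
    """One time step: BGK collision at every node, then streaming."""
    n = len(f[0])
    post = [[0.0] * n for _ in C]
    for x in range(n):
        rho = sum(f[i][x] for i in range(3))
        u = sum(C[i] * f[i][x] for i in range(3)) / rho
        feq = equilibrium(rho, u)
        for i in range(3):
            # BGK: relax each population towards equilibrium at rate 1/tau.
            post[i][x] = f[i][x] - (f[i][x] - feq[i]) / tau
    # Streaming: population i moves by C[i] nodes (periodic wrap-around).
    return [[post[i][(x - C[i]) % n] for x in range(n)] for i in range(3)]


if __name__ == "__main__":
    rho0 = [1.0] * 16
    rho0[8] = 1.1  # small density bump, a hypothetical initial condition
    f = initialise(rho0)
    for _ in range(200):
        f = step(f, tau=0.8)
    print(sum(density(f)))  # total mass is conserved up to round-off
```

Collision touches only one node and streaming only its nearest neighbours, which is the locality that makes the method map well onto massively parallel machines.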
Keywords: Hydraulic fracturing, algorithms, parallel computing
Mentor: Xuesen Chu (NSCC-Wuxi)