Professional Education
- Doctor of Philosophy, University of Iowa (2023)
- MS, University of Iowa, Computer Science (2018)
- BE, Harbin Institute of Technology, Bioinformatics (2016)
All Publications
- Provable Multi-instance Deep AUC Maximization with Stochastic Pooling. International Conference on Machine Learning (ICML) 2023: 43205-43227
- Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity. International Conference on Machine Learning (ICML) 2023: 43289-43325
- When AUC Meets DRO: Optimizing Partial AUC for Deep Learning with Non-convex Convergence Guarantee. International Conference on Machine Learning (ICML) 2022: 27548-27573
- A Robust Zero-sum Game Framework for Pool-based Active Learning. International Conference on Artificial Intelligence and Statistics (AISTATS) 2019: 517-526
- LibAUC: A Deep Learning Library for X-Risk Optimization. ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2023
- Non-Smooth Weakly-Convex Finite-sum Coupled Compositional Optimization. Conference on Neural Information Processing Systems (NeurIPS) 2023
- Deep Unsupervised Binary Coding Networks for Multivariate Time Series Retrieval. AAAI Conference on Artificial Intelligence 2020
- deBWT: parallel construction of Burrows-Wheeler Transform for large collection of genomes with de Bruijn-branch encoding. Oxford University Press. 2016: 174-182
Abstract
With the development of high-throughput sequencing, the number of assembled genomes continues to rise. It is critical to organize and index these genomes well to support future genomics studies. The Burrows-Wheeler Transform (BWT) is an important data structure for genome indexing with many fundamental applications; however, constructing the BWT for a large collection of genomes remains non-trivial, especially for highly similar or repetitive genomes. Moreover, state-of-the-art approaches do not support scalable parallel computing well owing to their incremental nature, which is a bottleneck when using modern computers to accelerate BWT construction.

We propose the de Bruijn branch-based BWT constructor (deBWT), a novel parallel BWT construction approach. deBWT represents and organizes the suffixes of the input sequence with a novel data structure, de Bruijn branch encoding. This data structure takes advantage of the de Bruijn graph to facilitate comparison between suffixes with long common prefixes, which removes the bottleneck in BWT construction for repetitive genomic sequences. deBWT also uses the structure of the de Bruijn graph to reduce unnecessary comparisons between suffixes. Benchmarking suggests that deBWT is efficient and scalable for constructing the BWT of large datasets with parallel computing. It is well suited to indexing many genomes, such as a collection of individual human genomes, on multi-core servers or clusters.

deBWT is implemented in C; the source code is available at https://github.com/hitbc/deBWT or https://github.com/DixianZhu/deBWT.

Contact: ydwang@hit.edu.cn

Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btw266
View details for Web of Science ID 000379734300020
View details for PubMedID 27307614
View details for PubMedCentralID PMC4908350
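
As background for the abstract above, the sketch below shows what a Burrows-Wheeler Transform is by building one the naive textbook way: sort every suffix of the input and read off the character that precedes each one. This is not the deBWT algorithm and uses no de Bruijn branch encoding. The function names `bwt` and `inverse_bwt` and the `$` sentinel are illustrative assumptions, not taken from the deBWT source. The direct suffix comparisons in `sorted` are the step that becomes expensive when suffixes share long common prefixes, as they do in highly similar or repetitive genomes.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all suffixes of text + sentinel, then emit the
    character that precedes each suffix in sorted order."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Sorting compares suffixes character by character, so suffixes with
    # long common prefixes make each comparison slow.
    order = sorted(range(len(s)), key=lambda i: s[i:])
    # s[i - 1] wraps around to the sentinel when i == 0.
    return "".join(s[i - 1] for i in order)


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Naive inverse: rebuild the sorted rotation table column by column."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original text is the rotation that ends with the sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


if __name__ == "__main__":
    print(bwt("banana"))                     # annb$aa
    print(inverse_bwt(bwt("ACGTACGTAC")))    # ACGTACGTAC
```

Both functions are far too slow and memory-hungry for real genomes; they are only meant to make the definition concrete on short strings.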