Tavor Zvi Baharav
Ph.D. Student in Electrical Engineering, admitted Autumn 2018
All Publications
-
Spectral Jaccard Similarity: A New Approach to Estimating Pairwise Sequence Alignments.
Patterns (New York, N.Y.)
2020; 1 (6): 100081
Abstract
Pairwise sequence alignment is often a computational bottleneck in genomic analysis pipelines, particularly in the context of third-generation sequencing technologies. To speed up this process, the pairwise k-mer Jaccard similarity is sometimes used as a proxy for alignment size in order to filter pairs of reads, and min-hashes are employed to efficiently estimate these similarities. However, when the k-mer distribution of a dataset is significantly non-uniform (e.g., due to GC biases and repeats), Jaccard similarity is no longer a good proxy for alignment size. In this work, we introduce a min-hash-based approach for estimating alignment sizes called Spectral Jaccard Similarity, which naturally accounts for uneven k-mer distributions. The Spectral Jaccard Similarity is computed by performing a singular value decomposition on a min-hash collision matrix. We empirically show that this new metric provides significantly better estimates for alignment sizes, and we provide a computationally efficient estimator for these spectral similarity scores.
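As a rough illustration of the baseline the abstract refers to, the following sketch estimates the k-mer Jaccard similarity of two reads with min-hashes. It does not implement Spectral Jaccard Similarity itself. The function names, the use of seeded BLAKE2b as the hash family, and the toy sequences are assumptions made for this example, not details taken from the paper.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def exact_jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def _hash(kmer, seed):
    # Hypothetical hash family: one 64-bit BLAKE2b digest per seed.
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(kmer_set, num_hashes):
    """For each hash function, keep the minimum hash value over the set.

    Assumes kmer_set is non-empty.
    """
    return [min(_hash(x, seed) for x in kmer_set) for seed in range(num_hashes)]


def minhash_jaccard(sig_a, sig_b):
    """Fraction of hash functions on which the two min-hashes collide."""
    collisions = sum(x == y for x, y in zip(sig_a, sig_b))
    return collisions / len(sig_a)


if __name__ == "__main__":
    read_a, read_b = "ACGTACGTTGCA", "CGTACGTTGCAT"  # toy reads
    set_a, set_b = kmers(read_a, 4), kmers(read_b, 4)
    sig_a = minhash_signature(set_a, 128)
    sig_b = minhash_signature(set_b, 128)
    print("exact:   ", exact_jaccard(set_a, set_b))
    print("estimate:", minhash_jaccard(sig_a, sig_b))
```

Each entry of the per-hash collision indicator used in `minhash_jaccard` is the kind of quantity that, per the abstract, is collected into a min-hash collision matrix before the singular value decomposition; how that matrix is built and decomposed is left to the paper.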
DOI: 10.1016/j.patter.2020.100081
PubMed ID: 33205128
-
Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence Alignment
Neural Information Processing Systems (NIPS). 2020
Web of Science ID: 000627697000040
-
Ultra Fast Medoid Identification via Correlated Sequential Halving
Neural Information Processing Systems (NIPS). 2019
Web of Science ID: 000534424303062