All Publications


  • Spectral Jaccard Similarity: A New Approach to Estimating Pairwise Sequence Alignments. Patterns (New York, N.Y.) Baharav, T. Z., Kamath, G. M., Tse, D. N., Shomorony, I. 2020; 1 (6): 100081

    Abstract

    Pairwise sequence alignment is often a computational bottleneck in genomic analysis pipelines, particularly in the context of third-generation sequencing technologies. To speed up this process, the pairwise k-mer Jaccard similarity is sometimes used as a proxy for alignment size in order to filter pairs of reads, and min-hashes are employed to efficiently estimate these similarities. However, when the k-mer distribution of a dataset is significantly non-uniform (e.g., due to GC biases and repeats), Jaccard similarity is no longer a good proxy for alignment size. In this work, we introduce a min-hash-based approach for estimating alignment sizes called Spectral Jaccard Similarity, which naturally accounts for uneven k-mer distributions. The Spectral Jaccard Similarity is computed by performing a singular value decomposition on a min-hash collision matrix. We empirically show that this new metric provides significantly better estimates for alignment sizes, and we provide a computationally efficient estimator for these spectral similarity scores.

    View details for DOI 10.1016/j.patter.2020.100081

    View details for PubMedID 33205128

  • Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence Alignment Kamath, G. M., Baharav, T. Z., Shomorony, I., Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F., Lin, H. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2020
  • Ultra Fast Medoid Identification via Correlated Sequential Halving Baharav, T. Z., Tse, D., Wallach, H., Larochelle, H., Beygelzimer, A., d'Alche-Buc, F., Fox, E., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2019