Education & Certifications


  • MS, Stanford University, Electrical Engineering (2018)
  • B.Tech., Indian Institute of Technology Bombay, Electrical Engineering (2016)

Current Research and Scholarly Interests


DNA storage, genomic data compression, information theory, machine learning

All Publications


  • Impact of lossy compression of nanopore raw signal data on basecalling and consensus accuracy Bioinformatics (Oxford, England) Chandak, S., Tatwawadi, K., Sridhar, S., Weissman, T. 2020

    Abstract

    Nanopore sequencing provides a real-time and portable solution to genomic sequencing, enabling better assembly, structural variant discovery and modified base detection than second-generation technologies. The sequencing process generates a huge amount of data in the form of raw signal contained in fast5 files, which must be compressed to enable efficient storage and transfer. Since the raw data is inherently noisy, lossy compression has the potential to significantly reduce space requirements without adversely impacting the performance of downstream applications. We explore the use of lossy compression for nanopore raw data using two state-of-the-art lossy time-series compressors, and evaluate the tradeoff between compressed size and basecalling/consensus accuracy. We test several basecallers and consensus tools on a variety of datasets at varying depths of coverage, and conclude that lossy compression can provide 35–50% further reduction in compressed size of raw data over the state-of-the-art lossless compressor with negligible impact on basecalling accuracy (≲0.2% reduction) and consensus accuracy (≲0.002% reduction). In addition, we evaluate the impact of lossy compression on methylation calling accuracy and observe that this impact is minimal for similar reductions in compressed size, although further evaluation with improved benchmark datasets is required for reaching a definite conclusion. The results suggest the possibility of using lossy compression, potentially on the nanopore sequencing device itself, to achieve significant reductions in storage and transmission costs while preserving the accuracy of downstream applications. The code is available at https://github.com/shubhamchandak94/lossy_compression_evaluation.

    DOI: 10.1093/bioinformatics/btaa1017

    PubMed ID: 33325499

  • DZip: improved general-purpose lossless compression based on novel neural network modeling Goyal, M., Tatwawadi, K., Chandak, S., Ochoa, I., Bilgin, A., Marcellin, M. W., Serra-Sagrista, J., Storer, J. A. IEEE COMPUTER SOC. 2020: 372
  • SPRING: a next-generation compressor for FASTQ data BIOINFORMATICS Chandak, S., Tatwawadi, K., Ochoa, I., Hernaez, M., Weissman, T. 2019; 35 (15): 2674–76
  • Humans are still the best lossy image compressors Bhown, A., Mukherjee, S., Yang, S., Chandak, S., Fischer-Hwang, I., Tatwawadi, K., Weissman, T. IEEE. 2019: 558
  • Improved read/write cost tradeoff in DNA-based data storage using LDPC codes Chandak, S., Tatwawadi, K., Lau, B., Mardia, J., Kubit, M., Neu, J., Griffin, P., Wootters, M., Weissman, T., Ji, H. IEEE. 2019: 147–56
  • DeepZip: Lossless Data Compression using Recurrent Neural Networks Goyal, M., Tatwawadi, K., Chandak, S., Ochoa, I. IEEE. 2019: 575
  • Compression of genomic sequencing reads via hash-based reordering: algorithm and analysis BIOINFORMATICS Chandak, S., Tatwawadi, K., Weissman, T. 2018; 34 (4): 558–67

    Abstract

    New Generation Sequencing (NGS) technologies for genome sequencing produce large amounts of short genomic reads per experiment, which are highly redundant and compressible. However, general-purpose compressors are unable to exploit this redundancy due to the special structure present in the data. We present a new algorithm for compressing reads both with and without preserving the read order. In both cases, it achieves 1.4×–2× compression gain over state-of-the-art read compression tools for datasets containing as many as 3 billion Illumina reads. Our tool is based on the idea of approximately reordering the reads according to their position in the genome using hashed substring indices. We also present a systematic analysis of the read compression problem and compute bounds on fundamental limits of read compression. This analysis sheds light on the dynamics of the proposed algorithm (and read compression algorithms in general) and helps understand its performance in practice. The algorithm compresses only the read sequence, works with unaligned FASTQ files, and does not require a reference. Contact: schandak@stanford.edu. Supplementary materials are available at Bioinformatics online. The proposed algorithm is available for download at https://github.com/shubhamchandak94/HARC.

    PubMed ID: 29444237

    PubMed Central ID: PMC5860611

  • An Actively Detuned Wireless Power Receiver With Public Key Cryptographic Authentication and Dynamic Power Allocation Desai, N., Juvekar, C., Chandak, S., Chandrakasan, A. P. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2018: 236–46
  • An Actively Detuned Wireless Power Receiver with Public Key Cryptographic Authentication and Dynamic Power Allocation Desai, N. V., Juvekar, C., Chandak, S., Chandrakasan, A. P. IEEE. 2017: 366