
Shubham Chandak
Ph.D. Student in Electrical Engineering, admitted Autumn 2016
Education & Certifications
- M.S., Stanford University, Electrical Engineering (2018)
- B.Tech., Indian Institute of Technology Bombay, Electrical Engineering (2016)
Stanford Advisors
- Tsachy Weissman, Doctoral Dissertation Advisor (AC)
- Hanlee Ji, Doctoral Dissertation Reader (AC)
Current Research and Scholarly Interests
DNA storage, genomic data compression, information theory, machine learning
All Publications
- Overcoming high nanopore basecaller error rates for DNA storage via basecaller-decoder integration and convolutional codes
IEEE. 2020: 8822–26
Web of Science ID: 000615970409020
- DZip: improved general-purpose lossless compression based on novel neural network modeling
IEEE Computer Society. 2020: 372
DOI: 10.1109/DCC47342.2020.00065
Web of Science ID: 000591183800053
- LFZip: Lossy compression of multivariate floating-point time series data via improved prediction
IEEE Computer Society. 2020: 342–51
DOI: 10.1109/DCC47342.2020.00042
Web of Science ID: 000591183800035
- Impact of lossy compression of nanopore raw signal data on basecalling and consensus accuracy
Bioinformatics. 2020
Abstract
Nanopore sequencing provides a real-time and portable solution to genomic sequencing, enabling better assembly, structural variant discovery and modified base detection than second-generation technologies. The sequencing process generates a huge amount of data in the form of raw signal contained in fast5 files, which must be compressed to enable efficient storage and transfer. Since the raw data is inherently noisy, lossy compression has potential to significantly reduce space requirements without adversely impacting performance of downstream applications.

We explore the use of lossy compression for nanopore raw data using two state-of-the-art lossy time-series compressors, and evaluate the tradeoff between compressed size and basecalling/consensus accuracy. We test several basecallers and consensus tools on a variety of datasets at varying depths of coverage, and conclude that lossy compression can provide 35–50% further reduction in compressed size of raw data over the state-of-the-art lossless compressor with negligible impact on basecalling accuracy (≲0.2% reduction) and consensus accuracy (≲0.002% reduction). In addition, we evaluate the impact of lossy compression on methylation calling accuracy and observe that this impact is minimal for similar reductions in compressed size, although further evaluation with improved benchmark datasets is required for reaching a definite conclusion. The results suggest the possibility of using lossy compression, potentially on the nanopore sequencing device itself, to achieve significant reductions in storage and transmission costs while preserving the accuracy of downstream applications.

The code is available at https://github.com/shubhamchandak94/lossy_compression_evaluation.
DOI: 10.1093/bioinformatics/btaa1017
PubMedID: 33325499
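To make the error-bounded lossy compression idea from the abstract above concrete, here is a minimal Python sketch (not code from the paper): samples are uniformly quantized so that every reconstructed value stays within a user-chosen error bound, and the resulting integer codes are compressed losslessly. The synthetic random-walk signal, the error bound of 0.5 and the zlib back end are illustrative assumptions; the actual study used dedicated lossy time-series compressors such as LFZip (listed above).

```python
# Illustrative sketch only: error-bounded uniform quantization followed by a
# generic lossless coder. The signal, error bound and zlib back end are
# stand-ins, not the pipeline evaluated in the paper.
import zlib

import numpy as np


def lossy_compress(signal: np.ndarray, max_error: float) -> bytes:
    """Quantize so every reconstructed value is within max_error of the
    original, then losslessly compress the (repetitive) integer codes."""
    step = 2 * max_error  # bin width guaranteeing |x - x_hat| <= max_error
    codes = np.round(signal / step).astype(np.int32)
    return zlib.compress(codes.tobytes())


def lossy_decompress(blob: bytes, max_error: float) -> np.ndarray:
    step = 2 * max_error
    codes = np.frombuffer(zlib.decompress(blob), dtype=np.int32)
    return codes.astype(np.float64) * step


# Synthetic stand-in for a raw nanopore current trace (random walk).
rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(0.0, 1.0, 100_000)) + 400.0

blob = lossy_compress(signal, max_error=0.5)
recon = lossy_decompress(blob, max_error=0.5)
assert np.max(np.abs(signal - recon)) <= 0.5

lossless = zlib.compress(signal.astype(np.float32).tobytes())
print(f"lossless float32 + zlib: {len(lossless)} bytes")
print(f"lossy (max error 0.5):   {len(blob)} bytes")
```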
- SPRING: a next-generation compressor for FASTQ data
Bioinformatics. 2019; 35(15): 2674–76
DOI: 10.1093/bioinformatics/bty1015
Web of Science ID: 000484378200023
- Humans are still the best lossy image compressors
IEEE. 2019: 558
DOI: 10.1109/DCC.2019.00070
Web of Science ID: 000470908200063
- DeepZip: Lossless Data Compression using Recurrent Neural Networks
IEEE. 2019: 575
DOI: 10.1109/DCC.2019.00087
Web of Science ID: 000470908200080
- Improved read/write cost tradeoff in DNA-based data storage using LDPC codes
IEEE. 2019: 147–56
Web of Science ID: 000535355700022
- Compression of genomic sequencing reads via hash-based reordering: algorithm and analysis
Bioinformatics. 2018; 34(4): 558–67
Abstract
Next-Generation Sequencing (NGS) technologies for genome sequencing produce large amounts of short genomic reads per experiment, which are highly redundant and compressible. However, general-purpose compressors are unable to exploit this redundancy due to the special structure present in the data.

We present a new algorithm for compressing reads both with and without preserving the read order. In both cases, it achieves 1.4×–2× compression gain over state-of-the-art read compression tools for datasets containing as many as 3 billion Illumina reads. Our tool is based on the idea of approximately reordering the reads according to their position in the genome using hashed substring indices. We also present a systematic analysis of the read compression problem and compute bounds on fundamental limits of read compression. This analysis sheds light on the dynamics of the proposed algorithm (and read compression algorithms in general) and helps understand its performance in practice. The algorithm compresses only the read sequence, works with unaligned FASTQ files, and does not require a reference.

Contact: schandak@stanford.edu. Supplementary materials are available at Bioinformatics online. The proposed algorithm is available for download at https://github.com/shubhamchandak94/HARC.
PubMedID: 29444237
PubMedCentralID: PMC5860611
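As a rough illustration of the hash-based reordering idea in the abstract above, here is a self-contained Python sketch. It is a simplification, not the HARC algorithm itself (which uses more careful hashing and encodes each read as a diff against its neighbors): reads are indexed by their k-mers, chains of reads sharing a k-mer, and hence likely overlapping in the genome, are emitted consecutively, and a generic compressor then exploits the resulting locality. The simulated genome, k = 16 and the zlib back end are illustrative assumptions.

```python
# Illustrative sketch of hash-based read reordering: reads sharing a k-mer
# likely overlap in the genome, so chaining reads through a k-mer index
# places overlapping reads next to each other, which a generic compressor
# can exploit. A simplification of the idea, not the HARC algorithm itself.
import random
import zlib
from collections import defaultdict


def reorder_reads(reads, k=16):
    # Index every k-mer of every read: k-mer -> set of read indices.
    index = defaultdict(set)
    for i, read in enumerate(reads):
        for pos in range(len(read) - k + 1):
            index[read[pos:pos + k]].add(i)

    ordered, used = [], set()
    for start in range(len(reads)):
        if start in used:
            continue
        cur = start
        while cur is not None:  # greedily chain reads that share a k-mer
            used.add(cur)
            ordered.append(reads[cur])
            read, nxt = reads[cur], None
            for pos in range(len(read) - k + 1):
                candidates = index[read[pos:pos + k]] - used
                if candidates:
                    nxt = min(candidates)
                    break
            cur = nxt
    return ordered


# Simulate overlapping reads from a random genome, then shuffle them.
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(20_000))
starts = [random.randrange(len(genome) - 100) for _ in range(2_000)]
reads = [genome[s:s + 100] for s in starts]
random.shuffle(reads)

shuffled_size = len(zlib.compress("\n".join(reads).encode()))
reordered_size = len(zlib.compress("\n".join(reorder_reads(reads)).encode()))
print(f"shuffled:  {shuffled_size} bytes")
print(f"reordered: {reordered_size} bytes")  # smaller: overlaps are now local
```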
- An Actively Detuned Wireless Power Receiver With Public Key Cryptographic Authentication and Dynamic Power Allocation
IEEE. 2018: 236–46
DOI: 10.1109/JSSC.2017.2737562
Web of Science ID: 000418873800021
- An Actively Detuned Wireless Power Receiver with Public Key Cryptographic Authentication and Dynamic Power Allocation
IEEE. 2017: 366
Web of Science ID: 000403393800153