Current Research and Scholarly Interests
Information Theory, Genomic data compression, DNA storage
- DeepZip: Lossless Data Compression using Recurrent Neural Networks IEEE. 2019: 575
- Humans are still the best lossy image compressors IEEE. 2019: 558
SPRING: A next-generation compressor for FASTQ data.
Bioinformatics (Oxford, England)
Motivation: High-Throughput Sequencing (HTS) technologies produce huge amounts of data in the form of short genomic reads, associated quality values, and read identifiers. Because of the significant structure present in these FASTQ datasets, general-purpose compressors are unable to completely exploit much of the inherent redundancy. Although there has been a lot of work on designing FASTQ compressors, most of them lack in support of one or more crucial properties, such as support for variable length reads, scalability to high coverage datasets, pairing-preserving compression and lossless compression.Results: In this work, we propose SPRING, a reference-free compressor for FASTQ files. SPRING supports a wide variety of compression modes and features, including lossless compression, pairingpreserving compression, lossy compression of quality values, long read compression and random access. SPRING achieves substantially better compression than existing tools, for example, SPRING compresses 195 GB of 25x whole genome human FASTQ from Illumina's NovaSeq sequencer to less than 7 GB, around 1.6x smaller than previous state-of-the-art FASTQ compressors. SPRING achieves this improvement while using comparable computational resources.Availability: SPRING can be downloaded from https://github.com/shubhamchandak94/SPRING.Supplementary Information: Supplementary data are available at Bioinformatics online.
View details for PubMedID 30535063
Compression of genomic sequencing reads via hash-based reordering: algorithm and analysis
2018; 34 (4): 558–67
New Generation Sequencing (NGS) technologies for genome sequencing produce large amounts of short genomic reads per experiment, which are highly redundant and compressible. However, general-purpose compressors are unable to exploit this redundancy due to the special structure present in the data.We present a new algorithm for compressing reads both with and without preserving the read order. In both cases, it achieves 1.4×-2× compression gain over state-of-the-art read compression tools for datasets containing as many as 3 billion Illumina reads. Our tool is based on the idea of approximately reordering the reads according to their position in the genome using hashed substring indices. We also present a systematic analysis of the read compression problem and compute bounds on fundamental limits of read compression. This analysis sheds light on the dynamics of the proposed algorithm (and read compression algorithms in general) and helps understand its performance in practice. The algorithm compresses only the read sequence, works with unaligned FASTQ files, and does not require a firstname.lastname@example.org.Supplementary material are available at Bioinformatics online. The proposed algorithm is available for download at https://github.com/shubhamchandak94/HARC.
View details for PubMedID 29444237
View details for PubMedCentralID PMC5860611
- An Actively Detuned Wireless Power Receiver With Public Key Cryptographic Authentication and Dynamic Power Allocation IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2018: 236–46
An Actively Detuned Wireless Power Receiver with Public Key Cryptographic Authentication and Dynamic Power Allocation
IEEE. 2017: 366
View details for Web of Science ID 000403393800153