Itschak Weissman

Robert and Barbara Kleist Professor in the School of Engineering

Electrical Engineering

Bio

Tsachy's research focuses on Information Theory, Data Compression and Communications, Statistical Signal Processing, Machine Learning, the interplay between them, and their applications, with recent focus on applications to genomic data compression and processing. He is inventor of several patents and involved in several companies as member of the technical board. IEEE fellow, he serves on the board of governors of the information theory society as well as the editorial boards of the Transactions on Information Theory and Foundations and Trends in Communications and Information Theory. He is founding Director of the Stanford Compression Forum.

Academic Appointments

Administrative Appointments

Founder/Director, Stanford Compression Forum (2015 - Present)
Academic Affairs Committee, Stanford University (2013 - 2018)
Graduate Admissions Committee, Stanford University (2004 - 2012)

Honors & Awards

Google Research Award, Google (2016)
Yahoo! Faculty Research Award, Yahoo! (2015)
Google Research Award, Google (2013)
Yahoo! Faculty Research And Engagement Award, Yahoo! (2013)
IEEE Fellow Grade, IEEE (2012)
Google Research Award, Google (2011)
STMicroelectronics Faculty Scholar Award, Stanford University (2011)

Boards, Advisory Committees, Professional Organizations

Guest Editor, Proceedings of IEEE (2017 - 2017)
Organizer/Technical Program Committee, IEEE (ITW) (2015 - 2015)
Technical Program Committee, IEEE (ISIT) (2015 - 2015)
Technical Program Committee, IEEE (ISIT) (2014 - 2014)
Editorial Board, Foundations and Trends in Communications and Information Theory (2012 - Present)
Fellow, IEEE (2012 - Present)
Technical Program Committee, IEEE (ISIT) (2012 - 2012)
Technical Program Committee, IEEE (ISIT) (2011 - 2011)
Editorial Board, Transactions on Information Theory Associate Editor for Shannon Theory (2010 - Present)
Organizer/Technical Program Committee, IEEE (ITW) (2010 - 2010)
Paper Award Committee, IEEE (2009 - 2009)
Technical Program Committee, Data Compression Conference (DCC) (2009 - 2009)
Technical Program Committee, IEEE International Symposium on Information Theory (ISIT) (2009 - 2009)
Technical Program Committee, Control Over Communication Channels Workshop (ConCom) (2009 - 2009)
Technical Program Committee, DCC (Data Compression Conference) (2008 - 2008)
Technical Program Committee, IAPR Workshop on Cognitive Information Processing (CIP) (2008 - 2008)
Technical Program Committee, IEEE (ISIT) (2007 - 2007)
Member, The Mathematical Association of America (MAA) (2006 - Present)
Panelist, Allerton Conference (2006 - 2006)
Technical Program Committee, IEEE (ISIT) (2005 - 2005)

Program Affiliations

Stanford SystemX Alliance

Professional Education

PhD, Technion-Israel Institute of Technology, Electrical Engineering (2001)
B.Sc., Technion-Israel Institute of Technology, Electrical Engineering (1997)

Patents

Itschak Weissman. "United States Patent US8320687 Universal Lossy Compression Methods", Nov 27, 2012
Itschak Weissman. "United States Patent US200404580 Denoising and Error Correction for Finite Input General Output Channel", May 10, 2012
Itschak Weissman. "United States Patent US7624009 Method and System for Producing Variable Length Context Models", Nov 24, 2009
Itschak Weissman. "United States Patent US7498961 Context Identification Using a Denoised Signal", Mar 3, 2009
Itschak Weissman. "United States Patent US7474793 Methods for Compression Using a Denoiser", Jan 6, 2009
Itschak Weissman. "United States Patent EP1766832 Discrete Universal Denoising with Error Correction Coding", Dec 10, 2008
Itschak Weissman. "United States Patent US7436969 Method and System for Optimizing Denoising Parameters Using Compressibility", Oct 14, 2008
Itschak Weissman. "United States Patent US7433427 Enhanced Denoising System Utilizing Incremental Parsing", Oct 7, 2008
Itschak Weissman. "United States Patent US7420487 Denoising Video", Sep 2, 2008
Itschak Weissman. "United States Patent US7271749 Context-based Denoiser That Simultaneously Updates Probabilities for Multiple Contexts", Sep 18, 2007
Itschak Weissman. "United States Patent US7269781 Discrete Universal Denoising with Reliability Information", Sep 11, 2007
Itschak Weissman. "United States Patent US7123172 Method and System for Determining an Optimal or Near Optimal Set of Contexts by Constructing a Multi-directional Context Tree", Oct 17, 2006
Itschak Weissman. "United States Patent US7047472 Method For Correcting Noise Errors in a Digital Signal", May 16, 2006

Contact

Academic
tsachy@stanford.edu

University - Faculty Department: Electrical Engineering Position: Professor

Alternate Contact Shea Goodner Assistant sgoodner@stanford.edu

Additional Info

Mail Code: 9510

2025-26 Courses

Data Compression: Theory and Applications
EE 274 (Aut)
Information Theory
EE 276 (Win)
Probability and Statistical Inference
EE 278 (Spr)
Independent Studies (7)
- Directed Studies in Applied Physics
  APPPHYS 290 (Aut, Win, Spr, Sum)
- Master's Thesis and Thesis Research
  EE 300 (Aut, Win, Spr, Sum)
- Special Studies and Reports in Electrical Engineering
  EE 191 (Aut, Win, Spr, Sum)
- Special Studies and Reports in Electrical Engineering
  EE 391 (Aut, Win, Spr, Sum)
- Special Studies and Reports in Electrical Engineering (WIM)
  EE 191W (Aut, Win, Spr, Sum)
- Special Studies or Projects in Electrical Engineering
  EE 190 (Aut, Win, Spr, Sum)
- Special Studies or Projects in Electrical Engineering
  EE 390 (Aut, Win, Spr, Sum)
Prior Year Courses
2024-25 Courses
- Information Theory
  EE 276 (Win)
- Universal Information Processing
  EE 376C (Aut)
2023-24 Courses
- Data Compression: Theory and Applications
  EE 274 (Aut)
- Information Theory
  EE 276 (Win)
2022-23 Courses
- Data Compression: Theory and Applications
  EE 274 (Aut)

Stanford Advisees

Doctoral Dissertation Reader (AC)
Andy Dong, Rajat Dwaraknath, Yiqi Jiang, Hannah Lee, Mira Partha, Dan Song, Alice Tor, Ajay Tripathi
Doctoral Dissertation Advisor (AC)
Naomi Sagan
Master's Program Advisor
Guilherme Beyruti Suranyi, Munir Bshara, Nicholas Bukovec, Wayne Chu, Caden Davis
Doctoral Dissertation Co-Advisor (AC)
Jiwon Jeong
Doctoral (Program)
Lara Arikan, Connor Ding, Yiqi Jiang, Sahasrajit Sarmasarkar

All Publications

Optimal No-Regret Learning in Repeated First-Price Auctions OPERATIONS RESEARCH Han, Y., Weissman, T., Zhou, Z. 2024

View details for DOI 10.1287/opre.2020.0282

View details for Web of Science ID 001264870900001
Mutual Information Upper Bounds for Uniform Inputs Through the Deletion Channel IEEE TRANSACTIONS ON INFORMATION THEORY Pernice, F., Isik, B., Weissman, T. 2024; 70 (7): 4599-4610

View details for DOI 10.1109/TIT.2024.3355841

View details for Web of Science ID 001253773500008
Semantic Image Compression using Textual Transforms Arikan, L., Weissman, T., IEEE IEEE. 2024

View details for DOI 10.1109/ISIT-W61686.2024.10591763

View details for Web of Science ID 001345730900010
Adaptive Compression in Federated Learning via Side Information Isik, B., Pase, F., Gunduz, D., Koyejo, S., Weissman, T., Zorzi, M. edited by Dasgupta, S., Mandt, S., Li, Y. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2024

View details for Web of Science ID 001221034001020
Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome. Cell reports methods Lee, H., Greer, S. U., Pavlichin, D. S., Zhou, B., Urban, A. E., Weissman, T., Ji, H. P. 2023; 3 (8): 100543

Abstract

The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. The first release is based on 94 high-quality haploid assemblies from individuals with diverse backgrounds. We employed a k-mer indexing strategy for comparative analysis across multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere reference assembly. Our k-mer indexing approach enabled us to identify a valuable collection of universally conserved sequences across all assemblies, referred to as "pan-conserved segment tags" (PSTs). By examining intervals between these segments, we discerned highly conserved genomic segments and those with structurally related polymorphisms. We found 60,764 polymorphic intervals with unique geo-ethnic features in the pangenome reference. In this study, we utilized ultra-conserved sequences (PSTs) to forge a link between human pangenome assemblies and reference genomes. This methodology enables the examination of any sequence of interest within the pangenome, using the reference genome as a comparative framework.

View details for DOI 10.1016/j.crmeth.2023.100543

View details for PubMedID 37671027

View details for PubMedCentralID PMC10475782
Magnetic DNA random access memory with nanopore readouts and exponentially-scaled combinatorial addressing. Scientific reports Lau, B., Chandak, S., Roy, S., Tatwawadi, K., Wootters, M., Weissman, T., Ji, H. P. 2023; 13 (1): 8514

Abstract

The storage of data in DNA typically involves encoding and synthesizing data into short oligonucleotides, followed by reading with a sequencing instrument. Major challenges include the molecular consumption of synthesized DNA, basecalling errors, and limitations with scaling up read operations for individual data elements. Addressing these challenges, we describe a DNA storage system called MDRAM (Magnetic DNA-based Random Access Memory) that enables repetitive and efficient readouts of targeted files with nanopore-based sequencing. By conjugating synthesized DNA to magnetic agarose beads, we enabled repeated data readouts while preserving the original DNA analyte and maintaining data readout quality. MDRAM utilizes an efficient convolutional coding scheme that leverages soft information in raw nanopore sequencing signals to achieve information reading costs comparable to Illumina sequencing despite higher error rates. Finally, we demonstrate a proof-of-concept DNA-based proto-filesystem that enables an exponentially-scalable data address space using only small numbers of targeting primers for assembly and readout.

View details for DOI 10.1038/s41598-023-29575-z

View details for PubMedID 37231057
Neural Network Compression for Noisy Storage Devices ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS Isik, B., Choi, K., Zheng, X., Weissman, T., Ermon, S., Wong, H., Alaghi, A. 2023; 22 (3)

View details for DOI 10.1145/3588436

View details for Web of Science ID 001020424400019
Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach. Scientific reports Meng, Q., Chandak, S., Zhu, Y., Weissman, T. 2023; 13 (1): 2082

Abstract

The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer and analysis of the data. In recent years, nanopore sequencing technologies have seen increasing adoption since they are portable, real-time and provide long reads. However, there has been limited progress on compression of nanopore sequencing reads obtained in FASTQ files since most existing tools are either general-purpose or specialized for short read data. We present NanoSpring, a reference-free compressor for nanopore sequencing reads, relying on an approximate assembly approach. We evaluate NanoSpring on a variety of datasets including bacterial, metagenomic, plant, animal, and human whole genome data. For recently basecalled high quality nanopore datasets, NanoSpring, which focuses only on the base sequences in the FASTQ file, uses just 0.35-0.65 bits per base which is 3-6[Formula: see text] lower than general purpose compressors like gzip. NanoSpring is competitive in compression ratio and compression resource usage with the state-of-the-art tool CoLoRd while being significantly faster at decompression when using multiple threads (> 4[Formula: see text] faster decompression with 20 threads). NanoSpring is available on GitHub at https://github.com/qm2/NanoSpring .

View details for DOI 10.1038/s41598-023-29267-8

View details for PubMedID 36747011
Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS Tandon, P., Chandak, S., Pataranutaporn, P., Liu, Y., Mapuranga, A. M., Maes, P., Weissman, T., Sra, M. 2023; 41 (1): 107-118

View details for DOI 10.1109/JSAC.2022.3221953

View details for Web of Science ID 000927934500007
Exact Optimality of Communication-Privacy-Utility Tradeoffs in Distributed Mean Estimation Isik, B., Chen, W., Ozgur, A., Weissman, T., No, A. edited by Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023

View details for Web of Science ID 001220818806016
Txt2Vid-Web: Web-based, Text-to-Video, Video Conferencing Pipeline Barrett, A., Gomezjurado, L., Mukherjee, S., Bshara, A., Sarmasarkar, S., Tandon, P., Weissman, T. edited by Bilgin, A., Marcellin, M. W., Serra-Sagrista, J., Storer, J. A. IEEE COMPUTER SOC. 2023: 331

View details for DOI 10.1109/DCC55655.2023.00073

View details for Web of Science ID 001008252400034
KmerKeys: a web resource for searching indexed genome assemblies and variants. Nucleic acids research Pavlichin, D. S., Lee, H., Greer, S. U., Grimes, S. M., Weissman, T., Ji, H. P. 2022

Abstract

K-mers are short DNA sequences that are used for genome sequence analysis. Applications that use k-mers include genome assembly and alignment. However, the wider bioinformatic use of these short sequences has challenges related to the massive scale of genomic sequence data. A single human genome assembly has billions of k-mers. As a result, the computational requirements for analyzing k-mer information is enormous, particularly when involving complete genome assemblies. To address these issues, we developed a new indexing data structure based on a hash table tuned for the lookup of short sequence keys. This web application, referred to as KmerKeys, provides performant, rapid query speeds for cloud computation on genome assemblies. We enable fuzzy as well as exact sequence searches of assemblies. To enable robust and speedy performance, the website implements cache-friendly hash tables, memory mapping and massive parallel processing. Our method employs a scalable and efficient data structure that can be used to jointly index and search a large collection of human genome assembly information. One can include variant databases and their associated metadata such as the gnomAD population variant catalogue. This feature enables the incorporation of future genomic information into sequencing analysis. KmerKeys is freely accessible at https://kmerkeys.dgi-stanford.org.

View details for DOI 10.1093/nar/gkac266

View details for PubMedID 35474383
The Human Pangenome Project: a global resource to map genomic diversity. Nature Wang, T., Antonacci-Fulton, L., Howe, K., Lawson, H. A., Lucas, J. K., Phillippy, A. M., Popejoy, A. B., Asri, M., Carson, C., Chaisson, M. J., Chang, X., Cook-Deegan, R., Felsenfeld, A. L., Fulton, R. S., Garrison, E. P., Garrison, N. A., Graves-Lindsay, T. A., Ji, H., Kenny, E. E., Koenig, B. A., Li, D., Marschall, T., McMichael, J. F., Novak, A. M., Purushotham, D., Schneider, V. A., Schultz, B. I., Smith, M. W., Sofia, H. J., Weissman, T., Flicek, P., Li, H., Miga, K. H., Paten, B., Jarvis, E. D., Hall, I. M., Eichler, E. E., Haussler, D., Human Pangenome Reference Consortium 2022; 604 (7906): 437-446

Abstract

The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goalof constructing the highest-possible quality human pangenome reference. Our goal is toimprove data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.

View details for DOI 10.1038/s41586-022-04601-8

View details for PubMedID 35444317
Classification and clustering of RNA crosslink-ligation data reveal complex structures and homodimers. Genome research Zhang, M., Hwang, I. T., Li, K., Bai, J., Chen, J., Weisman, T., Zou, J. Y., Lu, Z. 2022

Abstract

The recent development and application of methods based on the general principle of crosslinking and proximity ligation (crosslink-ligation) are revolutionizing RNA structure studies in living cells. However, extracting structure information from such data presents unique challenges. Here we introduce a set of computational tools for the systematic analysis of data from a wide variety of crosslink-ligation methods, specifically focusing on read mapping, alignment classification and clustering. We design a new strategy to map short reads with irregular gaps at high sensitivity and specificity. Analysis of previously published data reveals distinct properties and bias caused by the crosslinking reactions. We perform rigorous and exhaustive classification of alignments and discover 8 types of arrangements that provide distinct information on RNA structures and interactions. To deconvolve the dense and intertwined gapped alignments, we develop a net-work/graph-based tool CRSSANT (Crosslinked RNA Secondary Structure Analysis using Network Techniques), which enables clustering of gapped alignments and discovery of new alternative and dynamic conformations. We discover that multiple crosslinking and ligation events can occur on the same RNA, generating multi-segment alignments to report complex high level RNA structures and multi-RNA interactions. We find that alignments with overlapped segments are produced from potential homodimers and develop a new method for their de novo identification. Analysis of overlapping alignments revealed potential new homodimers in cellular noncoding RNAs and RNA virus genomes in the Picornaviridae family. Together, this suite of computational tools enables rapid and efficient analysis of RNA structure and interaction data in living cells.

View details for DOI 10.1101/gr.275979.121

View details for PubMedID 35332099
An Information-Theoretic Justification for Model Pruning Isik, B., Weissman, T., No, A. edited by Camps-Valls, G., Ruiz, F. J., Valera JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2022

View details for Web of Science ID 000828072703040
Optimal Communication Rates and Combinatorial Properties for Common Randomness Generation IEEE TRANSACTIONS ON INFORMATION THEORY Han, Y., Tatwawadi, K., Kurri, G. R., Zhou, Z., Prabhakaran, V. M., Weissman, T. 2021; 67 (12): 7723-7739

View details for DOI 10.1109/TIT.2021.3119976

View details for Web of Science ID 000720518300008
Geometric Lower Bounds for Distributed Parameter Estimation Under Communication Constraints IEEE TRANSACTIONS ON INFORMATION THEORY Han, Y., Ozgur, A., Weissman, T. 2021; 67 (12): 8248-8263

View details for DOI 10.1109/TIT.2021.3108952

View details for Web of Science ID 000720518300039
Reducing latency and bandwidth for video streaming using keypoint extraction and digital puppetry Prabhakar, R., Chandak, S., Chiu, C., Liang, R., Nguyen, H., Tatwawadi, K., Weissman, T. edited by Bilgin, A., Marcellin, M. W., SerraSagrista, J., Storer, J. A. IEEE COMPUTER SOC. 2021: 360

View details for DOI 10.1109/DCC50243.2021.00057

View details for Web of Science ID 000675592800059
MEOW: A Space-Efficient Nonparametric Bid Shading Algorithm Zhang, W., Kitts, B., Han, Y., Zhou, Z., Mao, T., He, H., Pan, S., Flores, A., Gultekin, S., Weissman, T., ASSOC COMP MACHINERY ASSOC COMPUTING MACHINERY. 2021: 3928-3936

View details for DOI 10.1145/3447548.3467113

View details for Web of Science ID 000749556803097
Optimal Communication Rates and Combinatorial Properties for Common Randomness Generation Han, Y., Tatwawadi, K., Kurri, G. R., Zhou, Z., Prabhakaran, V. M., Weissman, T., IEEE IEEE. 2021: 1931-1936

View details for DOI 10.1109/ISIT45174.2021.9518090

View details for Web of Science ID 000701502202004
OPTIMAL RATES OF ENTROPY ESTIMATION OVER LIPSCHITZ BALLS ANNALS OF STATISTICS Han, Y., Jiao, J., Weissman, T., Wu, Y. 2020; 48 (6): 3228–50

View details for DOI 10.1214/19-AOS1927

View details for Web of Science ID 000598369200006
Impact of lossy compression of nanopore raw signal data on basecalling and consensus accuracy Bioinformatics (Oxford, England) Chandak, S. n., Tatwawadi, T. n., Sridhar, S. n., Weissman, T. n. 2020

Abstract

Nanopore sequencing provides a real-time and portable solution to genomic sequencing, enabling better assembly, structural variant discovery and modified base detection than second generation technologies. The sequencing process generates a huge amount of data in the form of raw signal contained in fast5 files, which must be compressed to enable efficient storage and transfer. Since the raw data is inherently noisy, lossy compression has potential to significantly reduce space requirements without adversely impacting performance of downstream applications.We explore the use of lossy compression for nanopore raw data using two state-of-the-art lossy time-series compressors, and evaluate the tradeoff between compressed size and basecalling/consensus accuracy. We test several basecallers and consensus tools on a variety of datasets at varying depths of coverage, and conclude that lossy compression can provide 35–50% further reduction in compressed size of raw data over the state-of-the-art lossless compressor with negligible impact on basecalling accuracy (⁠≲0.2% reduction) and consensus accuracy (⁠≲0.002% reduction). In addition, we evaluate the impact of lossy compression on methylation calling accuracy and observe that this impact is minimal for similar reductions in compressed size, although further evaluation with improved benchmark datasets is required for reaching a definite conclusion. The results suggest the possibility of using lossy compression, potentially on the nanopore sequencing device itself, to achieve significant reductions in storage and transmission costs while preserving the accuracy of downstream applications.The code is available at https://github.com/shubhamchandak94/lossy_compression_evaluation.

View details for DOI 10.1093/bioinformatics/btaa1017

View details for PubMedID 33325499
OVERCOMING HIGH NANOPORE BASECALLER ERROR RATES FOR DNA STORAGE VIA BASECALLER-DECODER INTEGRATION AND CONVOLUTIONAL CODES Chandak, S., Neu, J., Tatwawadi, K., Mardia, J., Lau, B., Kubit, M., Hulett, R., Griffin, P., Wootters, M., Weissman, T., Ji, H., IEEE IEEE. 2020: 8822–26

View details for Web of Science ID 000615970409020
LFZip: Lossy compression of multivariate floating-point time series data via improved prediction Chandak, S., Tatwawadi, K., Wen, C., Wang, L., Ojea, J., Weissman, T. edited by Bilgin, A., Marcellin, M. W., SerraSagrista, J., Storer, J. A. IEEE COMPUTER SOC. 2020: 342–51

View details for DOI 10.1109/DCC47342.2020.00042

View details for Web of Science ID 000591183800035
Denoising of Aligned Genomic Data. Scientific reports Fischer-Hwang, I., Ochoa, I., Weissman, T., Hernaez, M. 2019; 9 (1): 15067

Abstract

Noise in genomic sequencing data is known to have effects on various stages of genomic data analysis pipelines. Variant identification is an important step of many of these pipelines, and is increasingly being used in clinical settings to aid medical practices. We propose a denoising method, dubbed SAMDUDE, which operates on aligned genomic data in order to improve variant calling performance. Denoising human data with SAMDUDE resulted in improved variant identification in both individual chromosome as well as whole genome sequencing (WGS) data sets. In the WGS data set, denoising led to identification of almost 2,000 additional true variants, and elimination of over 1,500 erroneously identified variants. In contrast, we found that denoising with other state-of-the-art denoisers significantly worsens variant calling performance. SAMDUDE is written in Python and is freely available at https://github.com/ihwang/SAMDUDE .

View details for DOI 10.1038/s41598-019-51418-z

View details for PubMedID 31636330
Estimating the Fundamental Limits Is Easier Than Achieving the Fundamental Limits IEEE TRANSACTIONS ON INFORMATION THEORY Jiao, J., Han, Y., Fischer-Hwang, I., Weissman, T. 2019; 65 (10): 6704–15

View details for DOI 10.1109/TIT.2019.2927697

View details for Web of Science ID 000487041200041
SPRING: a next-generation compressor for FASTQ data BIOINFORMATICS Chandak, S., Tatwawadi, K., Ochoa, I., Hernaez, M., Weissman, T. 2019; 35 (15): 2674–76

View details for DOI 10.1093/bioinformatics/bty1015

View details for Web of Science ID 000484378200023
Humans are still the best lossy image compressors Bhown, A., Mukherjee, S., Yang, S., Chandak, S., Fischer-Hwang, I., Tatwawadi, K., Weissman, T., IEEE IEEE. 2019: 558

View details for DOI 10.1109/DCC.2019.00070

View details for Web of Science ID 000470908200063
Genomic Data Compression ANNUAL REVIEW OF BIOMEDICAL DATA SCIENCE, VOL 2, 2019 Hernaez, M., Pavlichin, D., Weissman, T., Ochoa, I. edited by Altman, R. B., Levitt, M. 2019; 2: 19–37

View details for DOI 10.1146/annurev-biodatasci-072018-021229

View details for Web of Science ID 000484798500002
Improved read/write cost tradeoff in DNA-based data storage using LDPC codes Chandak, S., Tatwawadi, K., Lau, B., Mardia, J., Kubit, M., Neu, J., Griffin, P., Wootters, M., Weissman, T., Ji, H., IEEE IEEE. 2019: 147–56

View details for Web of Science ID 000535355700022
Approximate Profile Maximum Likelihood JOURNAL OF MACHINE LEARNING RESEARCH Pavlichin, D. S., Jiao, J., Weissman, T. 2019; 20

View details for Web of Science ID 000487068900006
SPRING: A next-generation compressor for FASTQ data. Bioinformatics (Oxford, England) Chandak, S., Tatwawadi, K., Ochoa, I., Hernaez, M., Weissman, T. 2018

Abstract

Motivation: High-Throughput Sequencing (HTS) technologies produce huge amounts of data in the form of short genomic reads, associated quality values, and read identifiers. Because of the significant structure present in these FASTQ datasets, general-purpose compressors are unable to completely exploit much of the inherent redundancy. Although there has been a lot of work on designing FASTQ compressors, most of them lack in support of one or more crucial properties, such as support for variable length reads, scalability to high coverage datasets, pairing-preserving compression and lossless compression.Results: In this work, we propose SPRING, a reference-free compressor for FASTQ files. SPRING supports a wide variety of compression modes and features, including lossless compression, pairingpreserving compression, lossy compression of quality values, long read compression and random access. SPRING achieves substantially better compression than existing tools, for example, SPRING compresses 195 GB of 25x whole genome human FASTQ from Illumina's NovaSeq sequencer to less than 7 GB, around 1.6x smaller than previous state-of-the-art FASTQ compressors. SPRING achieves this improvement while using comparable computational resources.Availability: SPRING can be downloaded from https://github.com/shubhamchandak94/SPRING.Supplementary Information: Supplementary data are available at Bioinformatics online.

View details for PubMedID 30535063
Mutual Information, Relative Entropy and Estimation Error in Semi-Martingale Channels IEEE TRANSACTIONS ON INFORMATION THEORY Jiao, J., Venkat, K., Weissman, T. 2018; 64 (10): 6662–71

View details for DOI 10.1109/TIT.2018.2829889

View details for Web of Science ID 000444800600015
Minimax Estimation of the L-1 Distance IEEE TRANSACTIONS ON INFORMATION THEORY Jiao, J., Han, Y., Weissman, T. 2018; 64 (10): 6672–6706

View details for DOI 10.1109/TIT.2018.2846245

View details for Web of Science ID 000444800600016
Generalizations of maximal inequalities to arbitrary selection rules STATISTICS & PROBABILITY LETTERS Jiao, J., Han, Y., Weissman, T. 2018; 137: 19–25

View details for DOI 10.1016/j.spl.2018.01.002

View details for Web of Science ID 000435046400004
Compression of genomic sequencing reads via hash-based reordering: algorithm and analysis BIOINFORMATICS Chandak, S., Tatwawadi, K., Weissman, T. 2018; 34 (4): 558–67

Abstract

New Generation Sequencing (NGS) technologies for genome sequencing produce large amounts of short genomic reads per experiment, which are highly redundant and compressible. However, general-purpose compressors are unable to exploit this redundancy due to the special structure present in the data.We present a new algorithm for compressing reads both with and without preserving the read order. In both cases, it achieves 1.4×-2× compression gain over state-of-the-art read compression tools for datasets containing as many as 3 billion Illumina reads. Our tool is based on the idea of approximately reordering the reads according to their position in the genome using hashed substring indices. We also present a systematic analysis of the read compression problem and compute bounds on fundamental limits of read compression. This analysis sheds light on the dynamics of the proposed algorithm (and read compression algorithms in general) and helps understand its performance in practice. The algorithm compresses only the read sequence, works with unaligned FASTQ files, and does not require a reference.schandak@stanford.edu.Supplementary material are available at Bioinformatics online. The proposed algorithm is available for download at https://github.com/shubhamchandak94/HARC.

View details for PubMedID 29444237

View details for PubMedCentralID PMC5860611
Entropy Rate Estimation for Markov Chains with Large State Space Han, Y., Jiao, J., Lee, C., Weissman, T., Wu, Y., Yu, T. edited by Bengio, S., Wallach, H., Larochelle, H., Grauman, K., CesaBianchi, N., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018

View details for Web of Science ID 000461852004035
Distributed Statistical Estimation of High-Dimensional and Nonparametric Distributions Han, Y., Mukherjee, P., Ozgur, A., Weissman, T., IEEE IEEE. 2018: 506–10

View details for Web of Science ID 000448139300102
On Universal Compression with Constant Random Access Tatwawadi, K., Bidokhti, S., Weissman, T., IEEE IEEE. 2018: 891–95

View details for Web of Science ID 000448139300179
Minimax redundancy for Markov chains with large state space Tatwawadi, K., Jiao, J., Weissman, T., IEEE IEEE. 2018: 216–20

View details for Web of Science ID 000448139300044
Maximum Likelihood Estimation of Functionals of Discrete Distributions IEEE TRANSACTIONS ON INFORMATION THEORY Jiao, J., Venkat, K., Han, Y., Weissman, T. 2017; 63 (10): 6774–98

View details for DOI 10.1109/TIT.2017.2733537

View details for Web of Science ID 000411021000043
DUDE-Seq: Fast, flexible, and robust denoising for targeted amplicon sequencing PLOS ONE Lee, B., Moon, T., Yoon, S., Weissman, T. 2017; 12 (7): e0181463

Abstract

We consider the correction of errors from nucleotide sequences produced by next-generation targeted amplicon sequencing. The next-generation sequencing (NGS) platforms can provide a great deal of sequencing data thanks to their high throughput, but the associated error rates often tend to be high. Denoising in high-throughput sequencing has thus become a crucial process for boosting the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel and effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that the proposed DUDE-Seq not only outperforms existing alternatives in terms of error-correction capability and time efficiency, but also boosts the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines by simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq.

View details for PubMedID 28749987
Relations Between Information and Estimation in Discrete-Time Levy Channels Jiao, J., Venkat, K., Weissman, T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2017: 3579–94

View details for DOI 10.1109/TIT.2017.2692211

View details for Web of Science ID 000402058900016
Effect of lossy compression of quality scores on variant calling BRIEFINGS IN BIOINFORMATICS Ochoa, I., Hernaez, M., Goldfeder, R., Weissman, T., Ashley, E. 2017; 18 (2): 183-194

Abstract

Recent advancements in sequencing technology have led to a drastic reduction in genome sequencing costs. This development has generated an unprecedented amount of data that must be stored, processed, and communicated. To facilitate this effort, compression of genomic files has been proposed. Specifically, lossy compression of quality scores is emerging as a natural candidate for reducing the growing costs of storage. A main goal of performing DNA sequencing in population studies and clinical settings is to identify genetic variation. Though the field agrees that smaller files are advantageous, the cost of lossy compression, in terms of variant discovery, is unclear.Bioinformatic algorithms to identify SNPs and INDELs use base quality score information; here, we evaluate the effect of lossy compression of quality scores on SNP and INDEL detection. Specifically, we investigate how the output of the variant caller when using the original data differs from that obtained when quality scores are replaced by those generated by a lossy compressor. Using gold standard genomic datasets and simulated data, we are able to analyze how accurate the output of the variant calling is, both for the original data and that previously lossily compressed. We show that lossy compression can significantly alleviate the storage while maintaining variant calling performance comparable to that with the original data. Further, in some cases lossy compression can lead to variant calling performance that is superior to that using the original file. We envisage our findings and framework serving as a benchmark in future development and analyses of lossy genomic data compressors.

View details for DOI 10.1093/bib/bbw011

View details for Web of Science ID 000397060200002
Effect of lossy compression of quality scores on variant calling. Briefings in bioinformatics Ochoa, I., Hernaez, M., Goldfeder, R., Weissman, T., Ashley, E. 2017; 18 (2): 183-194

Abstract

Recent advancements in sequencing technology have led to a drastic reduction in genome sequencing costs. This development has generated an unprecedented amount of data that must be stored, processed, and communicated. To facilitate this effort, compression of genomic files has been proposed. Specifically, lossy compression of quality scores is emerging as a natural candidate for reducing the growing costs of storage. A main goal of performing DNA sequencing in population studies and clinical settings is to identify genetic variation. Though the field agrees that smaller files are advantageous, the cost of lossy compression, in terms of variant discovery, is unclear.Bioinformatic algorithms to identify SNPs and INDELs use base quality score information; here, we evaluate the effect of lossy compression of quality scores on SNP and INDEL detection. Specifically, we investigate how the output of the variant caller when using the original data differs from that obtained when quality scores are replaced by those generated by a lossy compressor. Using gold standard genomic datasets and simulated data, we are able to analyze how accurate the output of the variant calling is, both for the original data and that previously lossily compressed. We show that lossy compression can significantly alleviate the storage while maintaining variant calling performance comparable to that with the original data. Further, in some cases lossy compression can lead to variant calling performance that is superior to that using the original file. We envisage our findings and framework serving as a benchmark in future development and analyses of lossy genomic data compressors.

View details for DOI 10.1093/bib/bbw011

View details for PubMedID 26966283
When is Noisy State Information at the Encoder as Useless as No Information or as Good as Noise-Free State? IEEE TRANSACTIONS ON INFORMATION THEORY Xu, R., Chen, J., Weissman, T., Zhang, J. 2017; 63 (2): 960-974

View details for DOI 10.1109/TIT.2016.2631150

View details for Web of Science ID 000394667700010
Principles and Applications of Science of Information PROCEEDINGS OF THE IEEE Courtade, T., Grama, A., Mahoney, M. W., Weissman, T. 2017; 105 (2): 183–88

View details for DOI 10.1109/JPROC.2016.2646778

View details for Web of Science ID 000394532000002
GeneComp, a new reference-based compressor for SAM files Long, R., Hernaez, M., Ochoa, I., Weissman, T. edited by Bilgin, A., Marcellin, M. W., SerraSagrista, J., Storer, J. A. IEEE COMPUTER SOC. 2017: 330–39

Abstract

The affordability of DNA sequencing has led to unprecedented volumes of genomic data. These data must be stored, processed, and analyzed. The most popular format for genomic data is the SAM format, which contains information such as alignment, quality values, etc. These files are large (on the order of terabytes), which necessitates compression. In this work we propose a new reference-based compressor for SAM files, which can accommodate different levels of compression, based on the specific needs of the user. In particular, the proposed compressor GeneComp allows the user to perform lossy compression of the quality scores, which have been proven to occupy more than half of the compressed file (when losslessly compressed). We show that the proposed compressor GeneComp overall achieves better compression ratios than previously proposed algorithms when working on lossless mode.

View details for PubMedID 29046896
Compressing Tabular Data via Pairwise Dependencies Pavlichin, D. S., Ingber, A., Weissman, T. edited by Bilgin, A., Marcellin, M. W., SerraSagrista, J., Storer, J. A. IEEE COMPUTER SOC. 2017: 455

View details for PubMedID 29046897
Dependence Measures Bounding the Exploration Bias for General Measurements Jiao, J., Han, Y., Weissman, T., IEEE IEEE. 2017: 1475–79

View details for Web of Science ID 000430345201108
Rateless Lossy Compression via the Extremes IEEE TRANSACTIONS ON INFORMATION THEORY No, A., Weissman, T. 2016; 62 (10): 5484-5495

Abstract

We begin by presenting a simple lossy compressor operating at near-zero rate: The encoder merely describes the indices of the few maximal source components, while the decoder's reconstruction is a natural estimate of the source components based on this information. This scheme turns out to be near optimal for the memoryless Gaussian source in the sense of achieving the zero-rate slope of its distortion-rate function. Motivated by this finding, we then propose a scheme comprised of iterating the above lossy compressor on an appropriately transformed version of the difference between the source and its reconstruction from the previous iteration. The proposed scheme achieves the rate distortion function of the Gaussian memoryless source (under squared error distortion) when employed on any finite-variance ergodic source. It further possesses desirable properties, and we, respectively, refer to as infinitesimal successive refinability, ratelessness, and complete separability. Its storage and computation requirements are of order no more than (n2)/(log β n) per source symbol for β > 0 at both the encoder and the decoder. Though the details of its derivation, construction, and analysis differ considerably, we discuss similarities between the proposed scheme and the recently introduced Sparse Regression Codes of Venkataramanan et al.

View details for DOI 10.1109/TIT.2016.2598148

View details for Web of Science ID 000384304600013

View details for PubMedCentralID PMC5786423
Rateless Lossy Compression via the Extremes. IEEE transactions on information theory No, A., Weissman, T. 2016; 62 (10): 5484-5495

Abstract

We begin by presenting a simple lossy compressor operating at near-zero rate: The encoder merely describes the indices of the few maximal source components, while the decoder's reconstruction is a natural estimate of the source components based on this information. This scheme turns out to be near optimal for the memoryless Gaussian source in the sense of achieving the zero-rate slope of its distortion-rate function. Motivated by this finding, we then propose a scheme comprised of iterating the above lossy compressor on an appropriately transformed version of the difference between the source and its reconstruction from the previous iteration. The proposed scheme achieves the rate distortion function of the Gaussian memoryless source (under squared error distortion) when employed on any finite-variance ergodic source. It further possesses desirable properties, and we, respectively, refer to as infinitesimal successive refinability, ratelessness, and complete separability. Its storage and computation requirements are of order no more than (n2)/(log β n) per source symbol for β > 0 at both the encoder and the decoder. Though the details of its derivation, construction, and analysis differ considerably, we discuss similarities between the proposed scheme and the recently introduced Sparse Regression Codes of Venkataramanan et al.

View details for DOI 10.1109/tit.2016.2598148

View details for PubMedID 29375154

View details for PubMedCentralID PMC5786423
GTRAC: fast retrieval from compressed collections of genomic variants BIOINFORMATICS Tatwawadi, K., Hernaez, M., Ochoa, I., Weissman, T. 2016; 32 (17): 479-486

Abstract

The dramatic decrease in the cost of sequencing has resulted in the generation of huge amounts of genomic data, as evidenced by projects such as the UK10K and the Million Veteran Project, with the number of sequenced genomes ranging in the order of 10 K to 1 M. Due to the large redundancies among genomic sequences of individuals from the same species, most of the medical research deals with the variants in the sequences as compared with a reference sequence, rather than with the complete genomic sequences. Consequently, millions of genomes represented as variants are stored in databases. These databases are constantly updated and queried to extract information such as the common variants among individuals or groups of individuals. Previous algorithms for compression of this type of databases lack efficient random access capabilities, rendering querying the database for particular variants and/or individuals extremely inefficient, to the point where compression is often relinquished altogether.We present a new algorithm for this task, called GTRAC, that achieves significant compression ratios while allowing fast random access over the compressed database. For example, GTRAC is able to compress a Homo sapiens dataset containing 1092 samples in 1.1 GB (compression ratio of 160), while allowing for decompression of specific samples in less than a second and decompression of specific variants in 17 ms. GTRAC uses and adapts techniques from information theory, such as a specialized Lempel-Ziv compressor, and tailored succinct data structures.The GTRAC algorithm is available for download at: https://github.com/kedartatwawadi/GTRAC CONTACT: : kedart@stanford.eduSupplementary data are available at Bioinformatics online.

View details for DOI 10.1093/bioinformatics/btw437

View details for Web of Science ID 000384666800012

View details for PubMedCentralID PMC5013914
GTRAC: fast retrieval from compressed collections of genomic variants. Bioinformatics Tatwawadi, K., Hernaez, M., Ochoa, I., Weissman, T. 2016; 32 (17): i479-i486

Abstract

The dramatic decrease in the cost of sequencing has resulted in the generation of huge amounts of genomic data, as evidenced by projects such as the UK10K and the Million Veteran Project, with the number of sequenced genomes ranging in the order of 10 K to 1 M. Due to the large redundancies among genomic sequences of individuals from the same species, most of the medical research deals with the variants in the sequences as compared with a reference sequence, rather than with the complete genomic sequences. Consequently, millions of genomes represented as variants are stored in databases. These databases are constantly updated and queried to extract information such as the common variants among individuals or groups of individuals. Previous algorithms for compression of this type of databases lack efficient random access capabilities, rendering querying the database for particular variants and/or individuals extremely inefficient, to the point where compression is often relinquished altogether.We present a new algorithm for this task, called GTRAC, that achieves significant compression ratios while allowing fast random access over the compressed database. For example, GTRAC is able to compress a Homo sapiens dataset containing 1092 samples in 1.1 GB (compression ratio of 160), while allowing for decompression of specific samples in less than a second and decompression of specific variants in 17 ms. GTRAC uses and adapts techniques from information theory, such as a specialized Lempel-Ziv compressor, and tailored succinct data structures.The GTRAC algorithm is available for download at: https://github.com/kedartatwawadi/GTRAC CONTACT: : kedart@stanford.eduSupplementary data are available at Bioinformatics online.

View details for DOI 10.1093/bioinformatics/btw437

View details for PubMedID 27587665

View details for PubMedCentralID PMC5013914
Information, Estimation, and Lookahead in the Gaussian Channel IEEE TRANSACTIONS ON SIGNAL PROCESSING Venkat, K., Weissman, T., Carmon, Y., Shamai, S. 2016; 64 (14): 3605-3618

View details for DOI 10.1109/TSP.2016.2544748

View details for Web of Science ID 000377370500005
Secure Source Coding With a Public Helper IEEE TRANSACTIONS ON INFORMATION THEORY Kittichokechai, K., Chia, Y., Oechtering, T. J., Skoglund, M., Weissman, T. 2016; 62 (7): 3930-3949

View details for DOI 10.1109/TIT.2016.2570011

View details for Web of Science ID 000382135600009
Strong Successive Refinability and Rate-Distortion-Complexity Tradeoff IEEE TRANSACTIONS ON INFORMATION THEORY No, A., Ingber, A., Weissman, T. 2016; 62 (6): 3618-3635

View details for DOI 10.1109/TIT.2016.2549540

View details for Web of Science ID 000380070600038
Compression for Quadratic Similarity Queries: Finite Blocklength and Practical Schemes IEEE TRANSACTIONS ON INFORMATION THEORY Steiner, F., Dempfle, S., Ingber, A., Weissman, T. 2016; 62 (5): 2737-2747

Abstract

We study the problem of compression for the purpose of similarity identification, where similarity is measured by the mean square Euclidean distance between vectors. While the asymptotical fundamental limits of the problem - the minimal compression rate and the error exponent - were found in a previous work, in this paper we focus on the nonasymptotic domain and on practical, implementable schemes. We first present a finite blocklength achievability bound based on shape-gain quantization: The gain (amplitude) of the vector is compressed via scalar quantization and the shape (the projection on the unit sphere) is quantized using a spherical code. The results are numerically evaluated and they converge to the asymptotic values as predicted by the error exponent. We then give a nonasymptotic lower bound on the performance of any compression scheme, and compare to the upper (achievability) bound. For a practical implementation of such a scheme, we use wrapped spherical codes, studied by Hamkins and Zeger, and use the Leech lattice as an example for an underlying lattice. As a side result, we obtain a bound on the covering angle of any wrapped spherical code, as a function of the covering radius of the underlying lattice.

View details for DOI 10.1109/TIT.2016.2535172

View details for Web of Science ID 000375005500026

View details for PubMedCentralID PMC5796586
Compression for Quadratic Similarity Queries: Finite Blocklength and Practical Schemes. IEEE transactions on information theory Steiner, F., Dempfle, S., Ingber, A., Weissman, T. 2016; 62 (5): 2737-2747

Abstract

We study the problem of compression for the purpose of similarity identification, where similarity is measured by the mean square Euclidean distance between vectors. While the asymptotical fundamental limits of the problem - the minimal compression rate and the error exponent - were found in a previous work, in this paper we focus on the nonasymptotic domain and on practical, implementable schemes. We first present a finite blocklength achievability bound based on shape-gain quantization: The gain (amplitude) of the vector is compressed via scalar quantization and the shape (the projection on the unit sphere) is quantized using a spherical code. The results are numerically evaluated and they converge to the asymptotic values as predicted by the error exponent. We then give a nonasymptotic lower bound on the performance of any compression scheme, and compare to the upper (achievability) bound. For a practical implementation of such a scheme, we use wrapped spherical codes, studied by Hamkins and Zeger, and use the Leech lattice as an example for an underlying lattice. As a side result, we obtain a bound on the covering angle of any wrapped spherical code, as a function of the covering radius of the underlying lattice.

View details for DOI 10.1109/TIT.2016.2535172

View details for PubMedID 29398721

View details for PubMedCentralID PMC5796586
Comment on: 'ERGC: an efficient referential genome compression algorithm' BIOINFORMATICS Deorowicz, S., Grabowski, S., Ochoa, I., Hernaez, M., Weissman, T. 2016; 32 (7): 1115–17

Abstract

Data compression is crucial in effective handling of genomic data. Among several recently published algorithms, ERGC seems to be surprisingly good, easily beating all of the competitors.We evaluated ERGC and the previously proposed algorithms GDC and iDoComp, which are the ones used in the original paper for comparison, on a wide data set including 12 assemblies of human genome (instead of only four of them in the original paper). ERGC wins only when one of the genomes (referential or target) contains mixed-cased letters (which is the case for only the two Korean genomes). In all other cases ERGC is on average an order of magnitude worse than GDC and iDoComp.sebastian.deorowicz@polsl.pl, iochoa@stanford.eduSupplementary data are available at Bioinformatics online.

View details for PubMedID 26615213

View details for PubMedCentralID PMC4907388
smallWig: parallel compression of RNA-seq WIG files BIOINFORMATICS Wang, Z., Weissman, T., Milenkovic, O. 2016; 32 (2): 173-180

Abstract

We developed a new lossless compression method for WIG data, named smallWig, offering the best known compression rates for RNA-seq data and featuring random access functionalities that enable visualization, summary statistics analysis and fast queries from the compressed files. Our approach results in order of magnitude improvements compared with bigWig and ensures compression rates only a fraction of those produced by cWig. The key features of the smallWig algorithm are statistical data analysis and a combination of source coding methods that ensure high flexibility and make the algorithm suitable for different applications. Furthermore, for general-purpose file compression, the compression rate of smallWig approaches the empirical entropy of the tested WIG data. For compression with random query features, smallWig uses a simple block-based compression scheme that introduces only a minor overhead in the compression rate. For archival or storage space-sensitive applications, the method relies on context mixing techniques that lead to further improvements of the compression rate. Implementations of smallWig can be executed in parallel on different sets of chromosomes using multiple processors, thereby enabling desirable scaling for future transcriptome Big Data platforms.The development of next-generation sequencing technologies has led to a dramatic decrease in the cost of DNA/RNA sequencing and expression profiling. RNA-seq has emerged as an important and inexpensive technology that provides information about whole transcriptomes of various species and organisms, as well as different organs and cellular communities. The vast volume of data generated by RNA-seq experiments has significantly increased data storage costs and communication bandwidth requirements. Current compression tools for RNA-seq data such as bigWig and cWig either use general-purpose compressors (gzip) or suboptimal compression schemes that leave significant room for improvement. To substantiate this claim, we performed a statistical analysis of expression data in different transform domains and developed accompanying entropy coding methods that bridge the gap between theoretical and practical WIG file compression rates.We tested different variants of the smallWig compression algorithm on a number of integer-and real- (floating point) valued RNA-seq WIG files generated by the ENCODE project. The results reveal that, on average, smallWig offers 18-fold compression rate improvements, up to 2.5-fold compression time improvements, and 1.5-fold decompression time improvements when compared with bigWig. On the tested files, the memory usage of the algorithm never exceeded 90 KB. When more elaborate context mixing compressors were used within smallWig, the obtained compression rates were as much as 23 times better than those of bigWig. For smallWig used in the random query mode, which also supports retrieval of the summary statistics, an overhead in the compression rate of roughly 3-17% was introduced depending on the chosen system parameters. An increase in encoding and decoding time of 30% and 55% represents an additional performance loss caused by enabling random data access. We also implemented smallWig using multi-processor programming. This parallelization feature decreases the encoding delay 2-3.4 times compared with that of a single-processor implementation, with the number of processors used ranging from 2 to 8; in the same parameter regime, the decoding delay decreased 2-5.2 times.The smallWig software can be downloaded from: http://stanford.edu/~zhiyingw/smallWig/smallwig.html, http://publish.illinois.edu/milenkovic/, http://web.stanford.edu/~tsachy/.zhiyingw@stanford.eduSupplementary data are available at Bioinformatics online.

View details for DOI 10.1093/bioinformatics/btv561

View details for Web of Science ID 000368360100003

View details for PubMedCentralID PMC6078172
smallWig: parallel compression of RNA-seq WIG files. Bioinformatics (Oxford, England) Wang, Z., Weissman, T., Milenkovic, O. 2016; 32 (2): 173-80

Abstract

We developed a new lossless compression method for WIG data, named smallWig, offering the best known compression rates for RNA-seq data and featuring random access functionalities that enable visualization, summary statistics analysis and fast queries from the compressed files. Our approach results in order of magnitude improvements compared with bigWig and ensures compression rates only a fraction of those produced by cWig. The key features of the smallWig algorithm are statistical data analysis and a combination of source coding methods that ensure high flexibility and make the algorithm suitable for different applications. Furthermore, for general-purpose file compression, the compression rate of smallWig approaches the empirical entropy of the tested WIG data. For compression with random query features, smallWig uses a simple block-based compression scheme that introduces only a minor overhead in the compression rate. For archival or storage space-sensitive applications, the method relies on context mixing techniques that lead to further improvements of the compression rate. Implementations of smallWig can be executed in parallel on different sets of chromosomes using multiple processors, thereby enabling desirable scaling for future transcriptome Big Data platforms.The development of next-generation sequencing technologies has led to a dramatic decrease in the cost of DNA/RNA sequencing and expression profiling. RNA-seq has emerged as an important and inexpensive technology that provides information about whole transcriptomes of various species and organisms, as well as different organs and cellular communities. The vast volume of data generated by RNA-seq experiments has significantly increased data storage costs and communication bandwidth requirements. Current compression tools for RNA-seq data such as bigWig and cWig either use general-purpose compressors (gzip) or suboptimal compression schemes that leave significant room for improvement. To substantiate this claim, we performed a statistical analysis of expression data in different transform domains and developed accompanying entropy coding methods that bridge the gap between theoretical and practical WIG file compression rates.We tested different variants of the smallWig compression algorithm on a number of integer-and real- (floating point) valued RNA-seq WIG files generated by the ENCODE project. The results reveal that, on average, smallWig offers 18-fold compression rate improvements, up to 2.5-fold compression time improvements, and 1.5-fold decompression time improvements when compared with bigWig. On the tested files, the memory usage of the algorithm never exceeded 90 KB. When more elaborate context mixing compressors were used within smallWig, the obtained compression rates were as much as 23 times better than those of bigWig. For smallWig used in the random query mode, which also supports retrieval of the summary statistics, an overhead in the compression rate of roughly 3-17% was introduced depending on the chosen system parameters. An increase in encoding and decoding time of 30% and 55% represents an additional performance loss caused by enabling random data access. We also implemented smallWig using multi-processor programming. This parallelization feature decreases the encoding delay 2-3.4 times compared with that of a single-processor implementation, with the number of processors used ranging from 2 to 8; in the same parameter regime, the decoding delay decreased 2-5.2 times.The smallWig software can be downloaded from: http://stanford.edu/~zhiyingw/smallWig/smallwig.html, http://publish.illinois.edu/milenkovic/, http://web.stanford.edu/~tsachy/.zhiyingw@stanford.eduSupplementary data are available at Bioinformatics online.

View details for DOI 10.1093/bioinformatics/btv561

View details for PubMedID 26424856

View details for PubMedCentralID PMC6078172
Distortion Rate Function of Sub-Nyquist Sampled Gaussian Sources IEEE TRANSACTIONS ON INFORMATION THEORY Kipnis, A., Goldsmith, A. J., Eldar, Y. C., Weissman, T. 2016; 62 (1): 401-429

View details for DOI 10.1109/TIT.2015.2485271

View details for Web of Science ID 000369309500027
Denoising of Quality Scores for Boosted Inference and Reduced Storage Ochoa, I., Hernaez, M., Goldfeder, R., Weissman, T., Ashley, E. edited by Bilgin, A., Marcellin, M. W., SerraSagrista, J., Storer, J. A. IEEE. 2016: 251–60

Abstract

Massive amounts of sequencing data are being generated thanks to advances in sequencing technology and a dramatic drop in the sequencing cost. Much of the raw data are comprised of nucleotides and the corresponding quality scores that indicate their reliability. The latter are more difficult to compress and are themselves noisy. Lossless and lossy compression of the quality scores has recently been proposed to alleviate the storage costs, but reducing the noise in the quality scores has remained largely unexplored. This raw data is processed in order to identify variants; these genetic variants are used in important applications, such as medical decision making. Thus improving the performance of the variant calling by reducing the noise contained in the quality scores is important. We propose a denoising scheme that reduces the noise of the quality scores and we demonstrate improved inference with this denoised data. Specifically, we show that replacing the quality scores with those generated by the proposed denoiser results in more accurate variant calling in general. Moreover, a consequence of the denoising is that the entropy of the produced quality scores is smaller, and thus significant compression can be achieved with respect to lossless compression of the original quality scores. We expect our results to provide a baseline for future research in denoising of quality scores. The code used in this work as well as a Supplement with all the results are available at http://web.stanford.edu/~iochoa/DCCdenoiser_CodeAndSupplement.zip.

View details for PubMedID 29098178
Mutual Information, Relative Entropy and Estimation Error in Semi-Martingale Channels Jiao, J., Venkat, K., Weissman, T., IEEE IEEE. 2016: 2794–98

View details for Web of Science ID 000390098702172
Chained Kullback-Leibler Divergences Pavlichin, D. S., Weissman, T., IEEE IEEE. 2016: 580–84

Abstract

We define and characterize the "chained" Kullback-Leibler divergence min w D(p‖w) + D(w‖q) minimized over all intermediate distributions w and the analogous k-fold chained K-L divergence min D(p‖wk-1) + … + D(w2‖w1) + D(w1‖q) minimized over the entire path (w1,…,wk-1). This quantity arises in a large deviations analysis of a Markov chain on the set of types - the Wright-Fisher model of neutral genetic drift: a population with allele distribution q produces offspring with allele distribution w, which then produce offspring with allele distribution p, and so on. The chained divergences enjoy some of the same properties as the K-L divergence (like joint convexity in the arguments) and appear in k-step versions of some of the same settings as the K-L divergence (like information projections and a conditional limit theorem). We further characterize the optimal k-step "path" of distributions appearing in the definition and apply our findings in a large deviations analysis of the Wright-Fisher process. We make a connection to information geometry via the previously studied continuum limit, where the number of steps tends to infinity, and the limiting path is a geodesic in the Fisher information metric. Finally, we offer a thermodynamic interpretation of the chained divergence (as the rate of operation of an appropriately defined Maxwell's demon) and we state some natural extensions and applications (a k-step mutual information and k-step maximum likelihood inference). We release code for computing the objects we study.

View details for PubMedID 29130024
A cluster-based approach to compression of Quality Scores Hernaez, M., Ochoa, I., Weissman, T. edited by Bilgin, A., Marcellin, M. W., SerraSagrista, J., Storer, J. A. IEEE. 2016: 261–70

Abstract

Massive amounts of sequencing data are being generated thanks to advances in sequencing technology and a dramatic drop in the sequencing cost. Storing and sharing this large data has become a major bottleneck in the discovery and analysis of genetic variants that are used for medical inference. As such, lossless compression of this data has been proposed. Of the compressed data, more than 70% correspond to quality scores, which indicate the sequencing machine reliability when calling a particular basepair. Thus, to further improve the compression performance, lossy compression of quality scores is emerging as the natural candidate. Since the data is used for genetic variants discovery, lossy compressors for quality scores are analyzed in terms of their rate-distortion performance, as well as their effect on the variant callers. Previously proposed algorithms do not do well under all performance metrics, and are hence unsuitable for certain applications. In this work we propose a new lossy compressor that first performs a clustering step, by assuming all the quality scores sequences come from a mixture of Markov models. Then, it performs quantization of the quality scores based on the Markov models. Each quantizer targets a specific distortion to optimize for the overall rate-distortion performance. Finally, the quantized values are compressed by an entropy encoder. We demonstrate that the proposed lossy compressor outperforms the previously proposed methods under all analyzed distortion metrics. This suggests that the effect that the proposed algorithm will have on any downstream application will likely be less noticeable than that of previously proposed lossy compressors. Moreover, we analyze how the proposed lossy compressor affects Single Nucleotide Polymorphism (SNP) calling, and show that the variability introduced on the calls is considerably smaller than the variability that exists between different methodologies for SNP calling.

View details for PubMedID 29057318
Minimax Estimation of the L-I Distance Jiao, J., Han, Y., Weissman, T., IEEE IEEE. 2016: 750–54

View details for Web of Science ID 000390098700150
When is Noisy State Information at the Eneoder as Useless as No Information or as Good as Noise-Free State? Xu, R., Chen, J., Weissman, T., Zhang, J., IEEE IEEE. 2016: 890–94

View details for Web of Science ID 000390098700178
Minimax Rate-optimal Estimation of KL Divergence between Discrete Distributions Han, Y., Jiao, J., Weissman, T., IEEE IEEE. 2016: 256–60

Abstract

We refine the general methodology in [1] for the construction and analysis of essentially minimax estimators for a wide class of functionals of finite dimensional parameters, and elaborate on the case of discrete distributions with support size S comparable with the number of observations n. Specifically, we determine the "smooth" and "non-smooth" regimes based on the confidence set and the smoothness of the functional. In the "non-smooth" regime, we apply an unbiased estimator for a "suitable" polynomial approximation of the functional. In the "smooth" regime, we construct a bias corrected version of the Maximum Likelihood Estimator (MLE) based on Taylor expansion. We apply the general methodology to the problem of estimating the KL divergence between two discrete distributions from empirical data. We construct a minimax rate-optimal estimator which is adaptive in the sense that it does not require the knowledge of the support size nor the upper bound on the likelihood ratio. Moreover, the performance of the optimal estimator with n samples is essentially that of the MLE with n ln n samples, i.e., the effective sample size enlargement phenomenon holds.

View details for PubMedID 29457152
Beyond Maximum Likelihood: Boosting the Chow-Liu Algorithm for Large Alphabets Jiao, J., Han, Y., Weissman, T. edited by Matthews, M. B. IEEE COMPUTER SOC. 2016: 321–25

View details for Web of Science ID 000406057400056
Minimax Estimation of Discrete Distributions Under l(1) Loss IEEE TRANSACTIONS ON INFORMATION THEORY Han, Y., Jiao, J., Weissman, T. 2015; 61 (11): 6343-6354

View details for DOI 10.1109/TIT.2015.2478816

View details for Web of Science ID 000363256500038
Justification of Logarithmic Loss via the Benefit of Side Information IEEE TRANSACTIONS ON INFORMATION THEORY Jiao, J., Courtade, T. A., Venkat, K., Weissman, T. 2015; 61 (10): 5357-5365

View details for DOI 10.1109/TIT.2015.2462848

View details for Web of Science ID 000361486800010
QVZ: lossy compression of quality values. Bioinformatics Malysa, G., Hernaez, M., Ochoa, I., Rao, M., Ganesan, K., Weissman, T. 2015; 31 (19): 3122-3129

Abstract

Recent advancements in sequencing technology have led to a drastic reduction in the cost of sequencing a genome. This has generated an unprecedented amount of genomic data that must be stored, processed and transmitted. To facilitate this effort, we propose a new lossy compressor for the quality values presented in genomic data files (e.g. FASTQ and SAM files), which comprise roughly half of the storage space (in the uncompressed domain). Lossy compression allows for compression of data beyond its lossless limit.The proposed algorithm QVZ exhibits better rate-distortion performance than the previously proposed algorithms, for several distortion metrics and for the lossless case. Moreover, it allows the user to define any quasi-convex distortion function to be minimized, a feature not supported by the previous algorithms. Finally, we show that QVZ-compressed data exhibit better performance in the genotyping than data compressed with previously proposed algorithms, in the sense that for a similar rate, a genotyping closer to that achieved with the original quality values is obtained.QVZ is written in C and can be downloaded from https://github.com/mikelhernaez/qvz.mhernaez@stanford.edu or gmalysa@stanford.edu or iochoa@stanford.eduSupplementary data are available at Bioinformatics online.

View details for DOI 10.1093/bioinformatics/btv330

View details for PubMedID 26026138
Network Compression: Worst Case Analysis IEEE TRANSACTIONS ON INFORMATION THEORY Asnani, H., Shomorony, I., Avestimehr, A. S., Weissman, T. 2015; 61 (7): 3980-3995

Abstract

We study the problem of communicating a distributed correlated memoryless source over a memoryless network, from source nodes to destination nodes, under quadratic distortion constraints. We establish the following two complementary results: 1) for an arbitrary memoryless network, among all distributed memoryless sources of a given correlation, Gaussian sources are least compressible, that is, they admit the smallest set of achievable distortion tuples and 2) for any memoryless source to be communicated over a memoryless additive-noise network, among all noise processes of a given correlation, Gaussian noise admits the smallest achievable set of distortion tuples. We establish these results constructively by showing how schemes for the corresponding Gaussian problems can be applied to achieve similar performance for (source or noise) distributions that are not necessarily Gaussian but have the same covariance.

View details for DOI 10.1109/TIT.2015.2434829

View details for Web of Science ID 000356301600020

View details for PubMedCentralID PMC5786424
Network Compression: Worst Case Analysis. IEEE transactions on information theory Asnani, H., Shomorony, I., Avestimehr, A. S., Weissman, T. 2015; 61 (7): 3980-3995

Abstract

We study the problem of communicating a distributed correlated memoryless source over a memoryless network, from source nodes to destination nodes, under quadratic distortion constraints. We establish the following two complementary results: 1) for an arbitrary memoryless network, among all distributed memoryless sources of a given correlation, Gaussian sources are least compressible, that is, they admit the smallest set of achievable distortion tuples and 2) for any memoryless source to be communicated over a memoryless additive-noise network, among all noise processes of a given correlation, Gaussian noise admits the smallest achievable set of distortion tuples. We establish these results constructively by showing how schemes for the corresponding Gaussian problems can be applied to achieve similar performance for (source or noise) distributions that are not necessarily Gaussian but have the same covariance.

View details for DOI 10.1109/tit.2015.2434829

View details for PubMedID 29375153

View details for PubMedCentralID PMC5786424
Compression for Quadratic Similarity Queries IEEE TRANSACTIONS ON INFORMATION THEORY Ingber, A., Courtade, T., Weissman, T. 2015; 61 (5): 2729-2747

Abstract

The problem of performing similarity queries on compressed data is considered. We focus on the quadratic similarity measure, and study the fundamental tradeoff between compression rate, sequence length, and reliability of queries performed on the compressed data. For a Gaussian source, we show that the queries can be answered reliably if and only if the compression rate exceeds a given threshold-the identification rate- which we explicitly characterize. Moreover, when compression is performed at a rate greater than the identification rate, responses to queries on the compressed data can be made exponentially reliable. We give a complete characterization of this exponent, which is analogous to the error and excess-distortion exponents in channel and source coding, respectively. For a general source, we prove that, as with classical compression, the Gaussian source requires the largest compression rate among sources with a given variance. Moreover, a robust scheme is described that attains this maximal rate for any source distribution.

View details for DOI 10.1109/TIT.2015.2402972

View details for Web of Science ID 000353515700032

View details for PubMedCentralID PMC5786438
Compression for Quadratic Similarity Queries. IEEE transactions on information theory Ingber, A., Courtade, T., Weissman, T. 2015; 61 (5): 2729-2747

Abstract

The problem of performing similarity queries on compressed data is considered. We focus on the quadratic similarity measure, and study the fundamental tradeoff between compression rate, sequence length, and reliability of queries performed on the compressed data. For a Gaussian source, we show that the queries can be answered reliably if and only if the compression rate exceeds a given threshold-the identification rate- which we explicitly characterize. Moreover, when compression is performed at a rate greater than the identification rate, responses to queries on the compressed data can be made exponentially reliable. We give a complete characterization of this exponent, which is analogous to the error and excess-distortion exponents in channel and source coding, respectively. For a general source, we prove that, as with classical compression, the Gaussian source requires the largest compression rate among sources with a given variance. Moreover, a robust scheme is described that attains this maximal rate for any source distribution.

View details for DOI 10.1109/tit.2015.2402972

View details for PubMedID 29375151

View details for PubMedCentralID PMC5786438
Minimax Estimation of Functionals of Discrete Distributions. IEEE transactions on information theory Jiao, J., Venkat, K., Han, Y., Weissman, T. 2015; 61 (5): 2835-2885

Abstract

We propose a general methodology for the construction and analysis of essentially minimax estimators for a wide class of functionals of finite dimensional parameters, and elaborate on the case of discrete distributions, where the support size S is unknown and may be comparable with or even much larger than the number of observations n. We treat the respective regions where the functional is nonsmooth and smooth separately. In the nonsmooth regime, we apply an unbiased estimator for the best polynomial approximation of the functional whereas, in the smooth regime, we apply a bias-corrected version of the maximum likelihood estimator (MLE). We illustrate the merit of this approach by thoroughly analyzing the performance of the resulting schemes for estimating two important information measures: 1) the entropy [Formula: see text] and 2) [Formula: see text], α > 0. We obtain the minimax L2 rates for estimating these functionals. In particular, we demonstrate that our estimator achieves the optimal sample complexity n ≍ S/ln S for entropy estimation. We also demonstrate that the sample complexity for estimating Fα (P), 0 < α < 1, is n ≍ S1/α /ln S, which can be achieved by our estimator but not the MLE. For 1 < α < 3/2, we show the minimax L2 rate for estimating Fα (P) is (n ln n)-2(α-1) for infinite support size, while the maximum L2 rate for the MLE is n-2(α-1). For all the above cases, the behavior of the minimax rate-optimal estimators with n samples is essentially that of the MLE (plug-in rule) with n ln n samples, which we term "effective sample size enlargement." We highlight the practical advantages of our schemes for the estimation of entropy and mutual information. We compare our performance with various existing approaches, and demonstrate that our approach reduces running time and boosts the accuracy. Moreover, we show that the minimax rate-optimal mutual information estimator yielded by our framework leads to significant performance boosts over the Chow-Liu algorithm in learning graphical models. The wide use of information measure estimation suggests that the insights and estimators obtained in this paper could be broadly applicable.

View details for DOI 10.1109/tit.2015.2412945

View details for PubMedID 29375152

View details for PubMedCentralID PMC5786426
Minimax Estimation of Functionals of Discrete Distributions IEEE TRANSACTIONS ON INFORMATION THEORY Jiao, J., Venkat, K., Han, Y., Weissman, T. 2015; 61 (5): 2835-2885

Abstract

We propose a general methodology for the construction and analysis of essentially minimax estimators for a wide class of functionals of finite dimensional parameters, and elaborate on the case of discrete distributions, where the support size S is unknown and may be comparable with or even much larger than the number of observations n. We treat the respective regions where the functional is nonsmooth and smooth separately. In the nonsmooth regime, we apply an unbiased estimator for the best polynomial approximation of the functional whereas, in the smooth regime, we apply a bias-corrected version of the maximum likelihood estimator (MLE). We illustrate the merit of this approach by thoroughly analyzing the performance of the resulting schemes for estimating two important information measures: 1) the entropy [Formula: see text] and 2) [Formula: see text], α > 0. We obtain the minimax L2 rates for estimating these functionals. In particular, we demonstrate that our estimator achieves the optimal sample complexity n ≍ S/ln S for entropy estimation. We also demonstrate that the sample complexity for estimating Fα (P), 0 < α < 1, is n ≍ S1/α /ln S, which can be achieved by our estimator but not the MLE. For 1 < α < 3/2, we show the minimax L2 rate for estimating Fα (P) is (n ln n)-2(α-1) for infinite support size, while the maximum L2 rate for the MLE is n-2(α-1). For all the above cases, the behavior of the minimax rate-optimal estimators with n samples is essentially that of the MLE (plug-in rule) with n ln n samples, which we term "effective sample size enlargement." We highlight the practical advantages of our schemes for the estimation of entropy and mutual information. We compare our performance with various existing approaches, and demonstrate that our approach reduces running time and boosts the accuracy. Moreover, we show that the minimax rate-optimal mutual information estimator yielded by our framework leads to significant performance boosts over the Chow-Liu algorithm in learning graphical models. The wide use of information measure estimation suggests that the insights and estimators obtained in this paper could be broadly applicable.

View details for DOI 10.1109/TIT.2015.2412945

View details for Web of Science ID 000353515700038

View details for PubMedCentralID PMC5786426
Comparison of the Achievable Rates in OFDM and Single Carrier Modulation with IID Inputs IEEE TRANSACTIONS ON INFORMATION THEORY Carmon, Y., Shamai, S., Weissman, T. 2015; 61 (4): 1795-1818

View details for DOI 10.1109/TIT.2015.2403354

View details for Web of Science ID 000351470800018
iDoComp: a compression scheme for assembled genomes BIOINFORMATICS Ochoa, I., Hernaez, M., Weissman, T. 2015; 31 (5): 626-633

Abstract

With the release of the latest next-generation sequencing (NGS) machine, the HiSeq X by Illumina, the cost of sequencing a Human has dropped to a mere $4000. Thus we are approaching a milestone in the sequencing history, known as the $1000 genome era, where the sequencing of individuals is affordable, opening the doors to effective personalized medicine. Massive generation of genomic data, including assembled genomes, is expected in the following years. There is crucial need for compression of genomes guaranteed of performing well simultaneously on different species, from simple bacteria to humans, which will ease their transmission, dissemination and analysis. Further, most of the new genomes to be compressed will correspond to individuals of a species from which a reference already exists on the database. Thus, it is natural to propose compression schemes that assume and exploit the availability of such references.We propose iDoComp, a compressor of assembled genomes presented in FASTA format that compresses an individual genome using a reference genome for both the compression and the decompression. In terms of compression efficiency, iDoComp outperforms previously proposed algorithms in most of the studied cases, with comparable or better running time. For example, we observe compression gains of up to 60% in several cases, including H.sapiens data, when comparing with the best compression performance among the previously proposed algorithms.iDoComp is written in C and can be downloaded from: http://www.stanford.edu/~iochoa/iDoComp.html (We also provide a full explanation on how to run the program and an example with all the necessary files to run it.).

View details for DOI 10.1093/bioinformatics/btu698

View details for Web of Science ID 000352268500002

View details for PubMedID 25344501
Macrophages eat cancer cells using their own calreticulin as a guide: Roles of TLR and Btk. Proceedings of the National Academy of Sciences of the United States of America Feng, M., Chen, J. Y., Weissman-Tsukamoto, R., Volkmer, J., Ho, P. Y., McKenna, K. M., Cheshier, S., Zhang, M., Guo, N., Gip, P., Mitra, S. S., Weissman, I. L. 2015; 112 (7): 2145-2150

Abstract

Macrophage-mediated programmed cell removal (PrCR) is an important mechanism of eliminating diseased and damaged cells before programmed cell death. The induction of PrCR by eat-me signals on tumor cells is countered by don't-eat-me signals such as CD47, which binds macrophage signal-regulatory protein α to inhibit phagocytosis. Blockade of CD47 on tumor cells leads to phagocytosis by macrophages. Here we demonstrate that the activation of Toll-like receptor (TLR) signaling pathways in macrophages synergizes with blocking CD47 on tumor cells to enhance PrCR. Bruton's tyrosine kinase (Btk) mediates TLR signaling in macrophages. Calreticulin, previously shown to be an eat-me signal on cancer cells, is activated in macrophages for secretion and cell-surface exposure by TLR and Btk to target cancer cells for phagocytosis, even if the cancer cells themselves do not express calreticulin.

View details for DOI 10.1073/pnas.1424907112

View details for PubMedID 25646432
Universality of Logarithmic Loss in Lossy Compression No, A., Weissman, T., IEEE IEEE. 2015: 2166–70

View details for Web of Science ID 000380904702044
Compression for Similarity Identification: Computing the Error Exponent Ingber, A., Weissman, T. edited by Bilgin, A., Marcellin, M. W., SerraSagrista, J., Storer, J. A. IEEE. 2015: 413–22

Abstract

We consider the problem of compressing discrete memoryless data sequences for the purpose of similarity identification, first studied by Ahlswede et al. (1997). In this setting, a source sequence is compressed, where the goal is to be able to identify whether the original source sequence is similar to another given sequence (called the query sequence). There is no requirement that the source will be reproducible from the compressed version. In the case where no false negatives are allowed, a compression scheme is said to be reliable if the probability of error (false positive) vanishes as the sequence length grows. The minimal compression rate in this sense, which is the parallel of the classical rate distortion function, is called the identification rate. The rate at which the error probability vanishes is measured by its exponent, called the identification exponent (which is the analog of the classical excess distortion exponent). While an information-theoretic expression for the identification exponent was found in past work, it is uncomputable due to a dependency on an auxiliary random variable with unbounded cardinality. The main result of this paper is a cardinality bound on the auxiliary random variable in the identification exponent, thereby making the quantity computable (solving the problem that was left open by Ahlswede et al.). The new proof technique relies on the fact that the Lagrangian in the optimization problem (in the expression for the exponent) can be decomposed by coordinate (of the auxiliary random variable). Then a standard Carathéodory - style argument completes the proof.

View details for PubMedID 29046895
Maximum Likelihood Estimation of Information Measures Jiao, J., Venkat, K., Han, Y., Weissman, T., IEEE IEEE. 2015: 839–43

View details for Web of Science ID 000380904700168
Minimax Estimation of Information Measures Jiao, J., Venkat, K., Han, Y., Weissman, T., IEEE IEEE. 2015: 2296–2300

View details for Web of Science ID 000380904702070
Minimax Estimation of Discrete Distributions Han, Y., Jiao, J., Weissman, T., IEEE IEEE. 2015: 2291–95

View details for Web of Science ID 000380904702069
Adaptive Estimation of Shannon Entropy Han, Y., Jiao, J., Weissman, T., IEEE IEEE. 2015: 1372–76

View details for Web of Science ID 000380904701085
Does Dirichlet Prior Smoothing Solve the Shannon Entropy Estimation Problem? Han, Y., Jiao, J., Weissman, T., IEEE IEEE. 2015: 1367–71

View details for Web of Science ID 000380904701084
Information Measures: The Curious Case of the Binary Alphabet IEEE TRANSACTIONS ON INFORMATION THEORY Jiao, J., Courtade, T. A., No, A., Venkat, K., Weissman, T. 2014; 60 (12): 7616-7626

View details for DOI 10.1109/TIT.2014.2360184

View details for Web of Science ID 000345511400014
Aligned genomic data compression via improved modeling JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY Ochoa, I., Hernaez, M., Weissman, T. 2014; 12 (6)

Abstract

With the release of the latest Next-Generation Sequencing (NGS) machine, the HiSeq X by Illumina, the cost of sequencing the whole genome of a human is expected to drop to a mere $1000. This milestone in sequencing history marks the era of affordable sequencing of individuals and opens the doors to personalized medicine. In accord, unprecedented volumes of genomic data will require storage for processing. There will be dire need not only of compressing aligned data, but also of generating compressed files that can be fed directly to downstream applications to facilitate the analysis of and inference on the data. Several approaches to this challenge have been proposed in the literature; however, focus thus far has been on the low coverage regime and most of the suggested compressors are not based on effective modeling of the data. We demonstrate the benefit of data modeling for compressing aligned reads. Specifically, we show that, by working with data models designed for the aligned data, we can improve considerably over the best compression ratio achieved by previously proposed algorithms. Our results indicate that the pareto-optimal barrier for compression rate and speed claimed by Bonfield and Mahoney (2013) [Bonfield JK and Mahoneys MV, Compression of FASTQ and SAM format sequencing data, PLOS ONE, 8(3):e59190, 2013.] does not apply for high coverage aligned data. Furthermore, our improved compression ratio is achieved by splitting the data in a manner conducive to operations in the compressed domain by downstream applications.

View details for DOI 10.1142/S0219720014420025

View details for PubMedID 25395305
Capacity of a POST Channel With and Without Feedback IEEE TRANSACTIONS ON INFORMATION THEORY Permuter, H. H., Asnani, H., Weissman, T. 2014; 60 (10): 6041-6057

View details for DOI 10.1109/TIT.2014.2343232

View details for Web of Science ID 000342416900017
To Feed or Not to Feedback IEEE TRANSACTIONS ON INFORMATION THEORY Asnani, H., Permuter, H. H., Weissman, T. 2014; 60 (9): 5150-5172

View details for DOI 10.1109/TIT.2014.2331957

View details for Web of Science ID 000342415600006
Minimax Filtering Regret via Relations Between Information and Estimation IEEE TRANSACTIONS ON INFORMATION THEORY No, A., Weissman, T. 2014; 60 (8): 4832-4847

View details for DOI 10.1109/TIT.2014.2323069

View details for Web of Science ID 000340273700027
The Porosity of Additive Noise Channels IEEE TRANSACTIONS ON INFORMATION THEORY Misra, V., Weissman, T. 2014; 60 (6): 3144-3162

View details for DOI 10.1109/TIT.2014.2313578

View details for Web of Science ID 000341980300006
Compression With Actions IEEE TRANSACTIONS ON INFORMATION THEORY Zhao, L., Chia, Y., Weissman, T. 2014; 60 (2): 796-807

View details for DOI 10.1109/TIT.2013.2293339

View details for Web of Science ID 000330286100002
Multiterminal Source Coding Under Logarithmic Loss IEEE TRANSACTIONS ON INFORMATION THEORY Courtade, T. A., Weissman, T. 2014; 60 (1): 740-761

View details for DOI 10.1109/TIT.2013.2288257

View details for Web of Science ID 000330282600049
Rateless Lossy Compression via the Extremes No, A., Weissman, T., IEEE IEEE. 2014: 88–92

View details for Web of Science ID 000380426900014
Compression for Similarity Identification: Fundamental Limits IEEE International Symposium on Information Theory (ISIT) Ingber, A., Weissman, T. IEEE. 2014: 1–5

View details for Web of Science ID 000346496100001
Compression for Quadratic Similarity Queries via Shape-Gain Quantizers IEEE International Symposium on Information Theory (ISIT) Dempfle, S., Steiner, F., Ingber, A., Weissman, T. IEEE. 2014: 2839–2843

View details for Web of Science ID 000346496102196
Information Divergences and the Curious Case of the Binary Alphabet IEEE International Symposium on Information Theory (ISIT) Jiao, J., Courtade, T., No, A., Venkat, K., Weissman, T. IEEE. 2014: 351–355

View details for Web of Science ID 000346496100070
Relations between Information and Estimation in Scalar Levy Channels IEEE International Symposium on Information Theory (ISIT) Jiao, J., Venkat, K., Weissman, T. IEEE. 2014: 2212–2216

View details for Web of Science ID 000346496102070
Strong Successive Refinability: Sufficient Conditions IEEE International Symposium on Information Theory (ISIT) No, A., Ingber, A., Weissman, T. IEEE. 2014: 2659–2663

View details for Web of Science ID 000346496102160
Justification of Logarithmic Loss via the Benefit of Side Information IEEE International Symposium on Information Theory (ISIT) Jiao, J., Courtade, T., Venkat, K., Weissman, T. IEEE. 2014: 946–950

View details for Web of Science ID 000346496101016
Achievable Error Exponents in the Gaussian Channel With Rate-Limited Feedback IEEE TRANSACTIONS ON INFORMATION THEORY Mirghaderi, R., Goldsmith, A., Weissman, T. 2013; 59 (12): 8144-8156

View details for DOI 10.1109/TIT.2013.2280918

View details for Web of Science ID 000327594500024
Estimation With a Helper Who Knows the Interference IEEE TRANSACTIONS ON INFORMATION THEORY Chia, Y., Soundararajan, R., Weissman, T. 2013; 59 (11): 7097-7117

View details for DOI 10.1109/TIT.2013.2273531

View details for Web of Science ID 000325981100008
Universal Estimation of Directed Information IEEE TRANSACTIONS ON INFORMATION THEORY Jiao, J., Permuter, H. H., Zhao, L., Kim, Y., Weissman, T. 2013; 59 (10): 6220-6242

View details for DOI 10.1109/TIT.2013.2267934

View details for Web of Science ID 000324573500004
The human genome contracts again BIOINFORMATICS Pavlichin, D. S., Weissman, T., Yona, G. 2013; 29 (17): 2199-2202

Abstract

The number of human genomes that have been sequenced completely for different individuals has increased rapidly in recent years. Storing and transferring complete genomes between computers for the purpose of applying various applications and analysis tools will soon become a major hurdle, hindering the analysis phase. Therefore, there is a growing need to compress these data efficiently. Here, we describe a technique to compress human genomes based on entropy coding, using a reference genome and known Single Nucleotide Polymorphisms (SNPs). Furthermore, we explore several intrinsic features of genomes and information in other genomic databases to further improve the compression attained. Using these methods, we compress James Watson's genome to 2.5 megabytes (MB), improving on recent work by 37%. Similar compression is obtained for most genomes available from the 1000 Genomes Project. Our biologically inspired techniques promise even greater gains for genomes of lower organisms and for human genomes as more genomic data become available.Code is available at sourceforge.net/projects/genomezip/

View details for DOI 10.1093/bioinformatics/btt362

View details for Web of Science ID 000323344800018

View details for PubMedID 23793748
Successive Refinement With Decoder Cooperation and Its Channel Coding Duals IEEE TRANSACTIONS ON INFORMATION THEORY Asnani, H., Permuter, H. H., Weissman, T. 2013; 59 (9): 5511-5533

View details for DOI 10.1109/TIT.2013.2266655

View details for Web of Science ID 000323455800017
QualComp: a new lossy compressor for quality scores based on rate distortion theory BMC BIOINFORMATICS Ochoa, I., Asnani, H., Bharadia, D., Chowdhury, M., Weissman, T., Yona, G. 2013; 14

Abstract

Next Generation Sequencing technologies have revolutionized many fields in biology by reducing the time and cost required for sequencing. As a result, large amounts of sequencing data are being generated. A typical sequencing data file may occupy tens or even hundreds of gigabytes of disk space, prohibitively large for many users. This data consists of both the nucleotide sequences and per-base quality scores that indicate the level of confidence in the readout of these sequences. Quality scores account for about half of the required disk space in the commonly used FASTQ format (before compression), and therefore the compression of the quality scores can significantly reduce storage requirements and speed up analysis and transmission of sequencing data.In this paper, we present a new scheme for the lossy compression of the quality scores, to address the problem of storage. Our framework allows the user to specify the rate (bits per quality score) prior to compression, independent of the data to be compressed. Our algorithm can work at any rate, unlike other lossy compression algorithms. We envisage our algorithm as being part of a more general compression scheme that works with the entire FASTQ file. Numerical experiments show that we can achieve a better mean squared error (MSE) for small rates (bits per quality score) than other lossy compression schemes. For the organism PhiX, whose assembled genome is known and assumed to be correct, we show that it is possible to achieve a significant reduction in size with little compromise in performance on downstream applications (e.g., alignment).QualComp is an open source software package, written in C and freely available for download at https://sourceforge.net/projects/qualcomp.

View details for DOI 10.1186/1471-2105-14-187

View details for Web of Science ID 000321136300001

View details for PubMedID 23758828

View details for PubMedCentralID PMC3698011
Real-Time Coding With Limited Lookahead IEEE TRANSACTIONS ON INFORMATION THEORY Asnani, H., Weissman, T. 2013; 59 (6): 3582-3606

View details for DOI 10.1109/TIT.2013.2245396

View details for Web of Science ID 000320709800024
Multiterminal Source Coding With Action-Dependent Side Information IEEE TRANSACTIONS ON INFORMATION THEORY Chia, Y., Asnani, H., Weissman, T. 2013; 59 (6): 3653-3667

View details for DOI 10.1109/TIT.2013.2245398

View details for Web of Science ID 000320709800028
Directed Information, Causal Estimation, and Communication in Continuous Time IEEE TRANSACTIONS ON INFORMATION THEORY Weissman, T., Kim, Y., Permuter, H. H. 2013; 59 (3): 1271-1287

View details for DOI 10.1109/TIT.2012.2227677

View details for Web of Science ID 000315120400004
Multiterminal Source Coding with Action Dependent Side Information IEEE Trans. Inform. Theory Chi, Y.K., Asnani, H., Weissman, T. 2013; 59 (6): 3653-3667
Secure Source Coding with a Public Helper Kittichokechai, K., Chia, Y., Oechtering, T. J., Skoglund, M., Weissman, T., IEEE IEEE. 2013: 2209-+

Abstract

We consider secure multi-terminal source coding problems in the presence of a public helper. Two main scenarios are studied: 1) source coding with a helper where the coded side information from the helper is eavesdropped by an external eavesdropper, 2) triangular source coding with a helper where the helper is considered as a public terminal. We are interested in how the helper can support the source transmission subject to a constraint on the amount of information leaked due to its public nature. We characterize the tradeoff between transmission rate, incurred distortion, and information leakage rate at the helper/eavesdropper in the form of a rate-distortion-leakage region for various classes of problems.

View details for PubMedID 29398720
QualComp: a new lossy compressor for quality scores based on rate distortion theory. BMC bioinformatics Ochoa, I., Asnani, H., Bharadia, D., Chowdhury, M., Weissman, T., Yona, G. 2013; 14: 187-?

Abstract

Next Generation Sequencing technologies have revolutionized many fields in biology by reducing the time and cost required for sequencing. As a result, large amounts of sequencing data are being generated. A typical sequencing data file may occupy tens or even hundreds of gigabytes of disk space, prohibitively large for many users. This data consists of both the nucleotide sequences and per-base quality scores that indicate the level of confidence in the readout of these sequences. Quality scores account for about half of the required disk space in the commonly used FASTQ format (before compression), and therefore the compression of the quality scores can significantly reduce storage requirements and speed up analysis and transmission of sequencing data.In this paper, we present a new scheme for the lossy compression of the quality scores, to address the problem of storage. Our framework allows the user to specify the rate (bits per quality score) prior to compression, independent of the data to be compressed. Our algorithm can work at any rate, unlike other lossy compression algorithms. We envisage our algorithm as being part of a more general compression scheme that works with the entire FASTQ file. Numerical experiments show that we can achieve a better mean squared error (MSE) for small rates (bits per quality score) than other lossy compression schemes. For the organism PhiX, whose assembled genome is known and assumed to be correct, we show that it is possible to achieve a significant reduction in size with little compromise in performance on downstream applications (e.g., alignment).QualComp is an open source software package, written in C and freely available for download at https://sourceforge.net/projects/qualcomp.

View details for DOI 10.1186/1471-2105-14-187

View details for PubMedID 23758828
Efficient Similarity Queries via Lossy Compression 51st Annual Allerton Conference on Communication, Control, and Computing Ochoa, I., Ingber, A., Weissman, T. IEEE. 2013: 883–889

View details for Web of Science ID 000350802400122
Unsupervised Learning and Universal Communication IEEE International Symposium on Information Theory Proceedings (ISIT) Misra, V., Weissman, T. IEEE. 2013: 261–265

View details for Web of Science ID 000348913400053
The Role of Lookahead in Estimation under Gaussian Noise IEEE International Symposium on Information Theory Proceedings (ISIT) Venkat, K., Weissman, T., Carmon, Y., Shamai, S. IEEE. 2013: 2850–2854

View details for Web of Science ID 000348913402197
Minimax Filtering Regret via Relations Between Information and Estimation IEEE International Symposium on Information Theory Proceedings (ISIT) No, A., Weissman, T. IEEE. 2013: 444–448

View details for Web of Science ID 000348913400090
Complexity and Rate-Distortion Tradeoff via Successive Refinement 51st Annual Allerton Conference on Communication, Control, and Computing No, A., Ingber, A., Weissman, T. IEEE. 2013: 1531–1536

View details for Web of Science ID 000350802400213
Distortion Rate Function of Sub-Nyquist Sampled Gaussian Sources Corrupted by Noise 51st Annual Allerton Conference on Communication, Control, and Computing Kipnis, A., Goldsmith, A. J., Weissman, T., Eldar, Y. C. IEEE. 2013: 901–908

View details for Web of Science ID 000350802400125
Reliable Uncoded Communication in the Underdetermined SIMO MAC with Low-Complexity Decoding 51st Annual Allerton Conference on Communication, Control, and Computing Chowdhury, M., Goldsmith, A., Weissman, T. IEEE. 2013: 1075–1081

View details for Web of Science ID 000350802400148
Compression for Exact Match Identification IEEE International Symposium on Information Theory Proceedings (ISIT) Ingber, A., Courtade, T., Weissman, T. IEEE. 2013: 654–658

View details for Web of Science ID 000348913400132
Pointwise Relations between Information and Estimation in the Poisson Channel IEEE International Symposium on Information Theory Proceedings (ISIT) Jiao, J., Venkat, K., Weissman, T. IEEE. 2013: 449–453

View details for Web of Science ID 000348913400091
Reliable Uncoded Communication in the SIMO MAC via Low-Complexity Decoding IEEE International Symposium on Information Theory Proceedings (ISIT) Chowdhury, M., Goldsmith, A., Weissman, T. IEEE. 2013: 1645–1649

View details for Web of Science ID 000348913401156
Network Compression: Worst-Case Analysis IEEE International Symposium on Information Theory Proceedings (ISIT) Asnani, H., Shomorony, I., Avestimehr, A. S., Weissman, T. IEEE. 2013: 196–200

View details for Web of Science ID 000348913400040
Capacity of a POST Channel with and without Feedback IEEE International Symposium on Information Theory Proceedings (ISIT) Asnani, H., Permuter, H. H., Weissman, T. IEEE. 2013: 2538–2542

View details for Web of Science ID 000348913402134
Relations between Information and Estimation in the presence of Feedback (invited contribution to) Information and Control in Networks (accepted) Asnani, H., Venkat, K., Weissman, T. Cambridge University Press. 2013
Operational Extremality of Gaussianity in Network Compression, Communication, and Coding IEEE Information Theory Workshop (ITW) Asnani, H., Shomorony, I., Avestimehr, A. S., Weissman, T. IEEE. 2013

View details for Web of Science ID 000330643200008
Quadratic Similarity Queries on Compressed Data Data Compression Conference (DCC) Ingber, A., Courtade, T., Weissman, T. IEEE. 2013: 441–450

View details for DOI 10.1109/DCC.2013.52

View details for Web of Science ID 000325712000045
Uncoded transmission in MAC channels achieves arbitrarily small error probability 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton) Chowdhury, M., Goldsmith, A., Weissman, T. IEEE. 2013: 1983–1990

View details for Web of Science ID 000320654000274
On Real Time Coding with Limited Lookahead IEEE Trans. Inform. Theory Asnani, H., Weissman, T. 2013; 59 (6): 3582-3606
On Information, Estimation and Lookahead 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton) Venkat, K., Weissman, T., Carmon, Y., Shamai, S. IEEE. 2013: 1292–1299

View details for Web of Science ID 000320654000176
Estimation with a Helper who Knows the Interference IEEE Trans. Inform. Theory Chi, Y.K., Soundararajan, R., Weissman, T. 2013; 59 (11): 7097-7117
The Human Genome Contracts Again Bioinformatics Pavlichin, D., Yona, G., Weissman, T. 2013
QualComp: a new lossy compressor for quality scores based on rate distortion theory BMC Bioinformatics Ochoa, I., Asnani, H., Bharadia, D., Chowdhury, M., Weissman, T. 2013: 14:187
Achievable Error Exponents in the Gaussian Channel with Rate-Limited Feedback IEEE Trans. Inform. Theory Mirghaderi, R., Goldsmith, A., Weissman, T. 2013; 59 (12): 8144-8156
Pointwise Relations Between Information and Estimation in Gaussian Noise IEEE TRANSACTIONS ON INFORMATION THEORY Venkat, K., Weissman, T. 2012; 58 (10): 6264-6281

View details for DOI 10.1109/TIT.2012.2206794

View details for Web of Science ID 000308965600002
Achievable Complexity-Performance Tradeoffs in Lossy Compression PROBLEMS OF INFORMATION TRANSMISSION Gupta, A., Verdu, S., Weissman, T. 2012; 48 (4): 352-375

View details for DOI 10.1134/S0032946012040060

View details for Web of Science ID 000314036400006
An MCMC Approach to Universal Lossy Compression of Analog Sources IEEE TRANSACTIONS ON SIGNAL PROCESSING Baron, D., Weissman, T. 2012; 60 (10): 5230-5240

View details for DOI 10.1109/TSP.2012.2206585

View details for Web of Science ID 000308963300018
Block and Sliding-Block Lossy Compression via MCMC IEEE TRANSACTIONS ON COMMUNICATIONS Jalali, S., Weissman, T. 2012; 60 (8): 2187-2198

View details for DOI 10.1109/TCOMM.2012.061412.110194

View details for Web of Science ID 000309203600017
Denoising via MCMC-Based Lossy Compression IEEE TRANSACTIONS ON SIGNAL PROCESSING Jalali, S., Weissman, T. 2012; 60 (6): 3092-3100

View details for DOI 10.1109/TSP.2012.2190597

View details for Web of Science ID 000304154500029
Cascade and Triangular Source Coding With Side Information at the First Two Nodes IEEE TRANSACTIONS ON INFORMATION THEORY Permuter, H. H., Weissman, T. 2012; 58 (6): 3339-3349

View details for DOI 10.1109/TIT.2012.2188273

View details for Web of Science ID 000304245100002
Lossy Compression of Discrete Sources via the Viterbi Algorithm IEEE TRANSACTIONS ON INFORMATION THEORY Jalali, S., Montanari, A., Weissman, T. 2012; 58 (4): 2475-2489

View details for DOI 10.1109/TIT.2011.2178059

View details for Web of Science ID 000302079800033
Mutual Information, Relative Entropy, and Estimation in the Poisson Channel IEEE TRANSACTIONS ON INFORMATION THEORY Atar, R., Weissman, T. 2012; 58 (3): 1302-1318

View details for DOI 10.1109/TIT.2011.2172572

View details for Web of Science ID 000300845900003
Cascade, Triangular and Two Way Source Coding with Degraded Side Information at the Second User IEEE Trans. Inform. Theory Chia, Y. K., Permuter, H., Weissman, T. 2012; 58 (1): 189-206
The Degraded Broadcast Channel with Action-Dependent States IEEE International Symposium on Information Theory Steinberg, Y., Weissman, T. IEEE. 2012: 596–600

View details for Web of Science ID 000312544300122
Estimation with a Helper Who Knows the Interference IEEE International Symposium on Information Theory Chia, Y., Soundararajan, R., Weissman, T. IEEE. 2012: 706–710

View details for Web of Science ID 000312544300144
Multiterminal Source Coding under Logarithmic Loss IEEE International Symposium on Information Theory Courtade, T. A., Weissman, T. IEEE. 2012: 761–765

View details for Web of Science ID 000312544300155
Reference Based Genome Compression IEEE Information Theory Workshop (ITW) Chern, B. G., Ochoa, I., Manolakos, A., No, A., Venkat, K., Weissman, T. IEEE. 2012: 427–431

View details for Web of Science ID 000313526400088
Universal Estimation of Directed Information via Sequential Probability Assignments IEEE International Symposium on Information Theory Jiao, J., Permuter, H. H., Zhao, L., Kim, Y., Weissman, T. IEEE. 2012: 523–527

View details for Web of Science ID 000312544300107
Worst-Case Source for Distributed Compression with Quadratic Distortion IEEE Information Theory Workshop (ITW) Shomorony, I., Avestimehr, A. S., Asnani, H., Weissman, T. IEEE. 2012: 187–191

View details for Web of Science ID 000313526400040
Successive Refinement with Cribbing Decoders and its Channel Coding Duals IEEE International Symposium on Information Theory Asnani, H., Permuter, H., Weissman, T. IEEE. 2012

View details for Web of Science ID 000312544301093
Cascade, Triangular, and Two-Way Source Coding With Degraded Side Information at the Second User IEEE TRANSACTIONS ON INFORMATION THEORY Chia, Y., Permuter, H. H., Weissman, T. 2012; 58 (1): 189-206

View details for DOI 10.1109/TIT.2011.2167738

View details for Web of Science ID 000298989200016
Pointwise Relations between Information and Estimation in Gaussian Noise IEEE International Symposium on Information Theory Venkat, K., Weissman, T. IEEE. 2012: 701–705

View details for Web of Science ID 000312544300143
Joint Source-Channel Coding of one Random Variable over the Poisson Channel IEEE International Symposium on Information Theory No, A., Venkat, K., Weissman, T. IEEE. 2012

View details for Web of Science ID 000312544302044
The Porosity of Additive Noise Sequences IEEE International Symposium on Information Theory Misra, V., Weissman, T. IEEE. 2012

View details for Web of Science ID 000312544302136
Achievable Complexity-Performance Tradeoffs in Lossy Compression Problems of Information Transmission Gupta, A., Verdú, S., Weissman, T. 2012; 48 (4): 352-375
An MCMC Approach to Universal Lossy Compression of Analog Sources IEEE Trans. Sig. Proc. Baron, D., Weissman, T. 2012; 60 (10): 5230-5240
Cascade and Triangular Source Coding with Side Information at the First Two Nodes IEEE Trans. Inform. Theory Permuter, H. H., Weissman, T. 2012; 58 (6): 3309-3349
Denoising via MCMC-based Lossy Compression IEEE Trans. Sig. Proc. Jalali, S., Weissman, T. 2012; 60 (6): 3092-3100
Block and Sliding-Block Lossy Compression via MCMC IEEE Trans. on Communications Jalali, S., Weissman, T. 2012; 60 (8): 2187-2198
Mutual Information, Relative Entropy, and Estimation in the Poisson Channel IEEE Trans. Inform. Theory Atar, R., Weissman, T. 2012; 58 (3): 1302-1318
Probing Capacity IEEE TRANSACTIONS ON INFORMATION THEORY Asnani, H., Permuter, H., Weissman, T. 2011; 57 (11): 7317-7332

View details for DOI 10.1109/TIT.2011.2162089

View details for Web of Science ID 000297046100008
Source Coding With a Side Information "Vending Machine" IEEE TRANSACTIONS ON INFORMATION THEORY Permuter, H. H., Weissman, T. 2011; 57 (7): 4530-4544

View details for DOI 10.1109/TIT.2011.2145510

View details for Web of Science ID 000291922500035
Interpretations of Directed Information in Portfolio Theory, Data Compression, and Hypothesis Testing IEEE TRANSACTIONS ON INFORMATION THEORY Permuter, H. H., Kim, Y., Weissman, T. 2011; 57 (6): 3248-3259

View details for DOI 10.1109/TIT.2011.2136270

View details for Web of Science ID 000291003900007
Error Exponents for the Gaussian Channel With Active Noisy Feedback IEEE TRANSACTIONS ON INFORMATION THEORY Kim, Y., Lapidoth, A., Weissman, T. 2011; 57 (3): 1223-1236

View details for DOI 10.1109/TIT.2011.2104991

View details for Web of Science ID 000287657200003
Error Exponents for the Gaussian Channel with Active Noisy Feedback IEEE Trans. Inform. Theory Kim, Y.H., Lapidoth, A., Weissman, T. 2011; 57 (3): 1223-1236
Mutual Information, Relative Entropy, and Estimation in the Poisson Channel IEEE International Symposium on Information Theory (ISIT) Atar, R., Weissman, T. IEEE. 2011: 708–712

View details for Web of Science ID 000297465100142
Discrete denoising of heterogeneous two-dimensional data IEEE International Symposium on Information Theory (ISIT) Moon, T., Weissman, T., Kim, J. IEEE. 2011: 1041–1045

View details for Web of Science ID 000297465101045
Continuous-Time Directed Information and Its Role in Communication IEEE Information Theory Workshop (ITW) Permuter, H. H., Kim, Y., Weissman, T. IEEE. 2011

View details for Web of Science ID 000299416200039
Bounds on the Entropy Rate of Binary Hidden Markov Processes Entropy of Hidden Markov Processes and Connections to Dynamical Systems Ordentlich, E., Weissman, T. edited by Marcus, B., Petersen, K., Weissman, T. Cambridge University Press. 2011: 117–171
Entropy of Hidden Markov Processes and Connections to Dynamical Systems Marcus, B., Petersen, K., Weissman, T. edited by Marcus, B., Petersen, K., Weissman, T. Cambridge University Press. 2011
Cascade and Triangular source coding with causal side information IEEE International Symposium on Information Theory (ISIT) Chia, Y., Weissman, T. IEEE. 2011: 1683–1687

View details for Web of Science ID 000297465101175
Multi-terminal Source Coding With Action Dependent Side Information IEEE International Symposium on Information Theory (ISIT) Chia, Y., Asnani, H., Weissman, T. IEEE. 2011

View details for Web of Science ID 000297465102063
To Feed or Not to Feed Back IEEE International Symposium on Information Theory (ISIT) Asnani, H., Permuter, H., Weissman, T. IEEE. 2011: 159–163

View details for Web of Science ID 000297465100033
Capacity of Channels With Action-Dependent States IEEE TRANSACTIONS ON INFORMATION THEORY Weissman, T. 2010; 56 (11): 5396-5411

View details for DOI 10.1109/TIT.2010.2068991

View details for Web of Science ID 000283449000003
The Relationship Between Causal and Noncausal Mismatched Estimation in Continuous-Time AWGN Channels IEEE TRANSACTIONS ON INFORMATION THEORY Weissman, T. 2010; 56 (9): 4256-4273

View details for DOI 10.1109/TIT.2010.2054430

View details for Web of Science ID 000283072400006
Tighter Bounds on the Capacity of Finite-State Channels Via Markov Set-Chains IEEE TRANSACTIONS ON INFORMATION THEORY Chen, J., Permuter, H., Weissman, T. 2010; 57 (8): 3660-3691

View details for DOI 10.1109/TIT.2010.2050825

View details for Web of Science ID 000282001700003
Two-Way Source Coding With a Helper IEEE TRANSACTIONS ON INFORMATION THEORY Permuter, H. H., Steinberg, Y., Weissman, T. 2010; 56 (6): 2905-2919

View details for DOI 10.1109/TIT.2010.2046238

View details for Web of Science ID 000277880200031
Universal Reinforcement Learning IEEE TRANSACTIONS ON INFORMATION THEORY Farias, V. F., Moallemi, C. C., Van Roy, B., Weissman, T. 2010; 56 (5): 2441-2454

View details for DOI 10.1109/TIT.2010.2043762

View details for Web of Science ID 000278067900026
A Universal Scheme for Wyner-Ziv Coding of Discrete Sources IEEE TRANSACTIONS ON INFORMATION THEORY Jalali, S., Verdu, S., Weissman, T. 2010; 56 (4): 1737-1750

View details for DOI 10.1109/TIT.2010.2040889

View details for Web of Science ID 000275999500023
Capacity of Channels with Action-Dependent States IEEE Trans. Inform. Theory Weissman, T. 2010; 56 (11): 5396-5411
An MCMC Approach to Lossy Compression of Continuous Sources Baron, D., Weissman, T. edited by Storer, J. A., Marcellin, M. W. IEEE COMPUTER SOC. 2010: 40–48

View details for DOI 10.1109/DCC.2010.11

View details for Web of Science ID 000397228500005
Cascade and Triangular Source Coding with Side Information at the First Two Nodes 2010 IEEE International Symposium on Information Theory Permuter, H. H., Weissman, T. IEEE. 2010: 31–35

View details for Web of Science ID 000287512700007
Universal Lossless Compression-based Denoising 2010 IEEE International Symposium on Information Theory Su, H., Weissman, T. IEEE. 2010: 1648–1652

View details for Web of Science ID 000287512700332
The Relationship Between Causal and Non-Causal Mismatched Estimation in Continuous-Time AWGN Channels IEEE Trans. Inform. Theory Weissman, T. 2010; 56 (9): 4256-4273
A Universal Scheme for Wyner-Ziv Coding of Discrete Sources IEEE Trans. Inform. Theory Jalali, S., Verdú, S., Weissman, T. 2010; 56 (4): 1737-1750
Universal Estimation of Directed Information 2010 IEEE International Symposium on Information Theory Zhao, L., Permuter, H., Kim, Y., Weissman, T. IEEE. 2010: 1433–1437

View details for Web of Science ID 000287512700289
Discrete Denoising With Shifts IEEE TRANSACTIONS ON INFORMATION THEORY Moon, T., Weissman, T. 2009; 55 (11): 5284-5301

View details for DOI 10.1109/TIT.2009.2030461

View details for Web of Science ID 000271019700036
A Context Quantization Approach to Universal Denoising IEEE TRANSACTIONS ON SIGNAL PROCESSING Sivaramakrishnan, K., Weissman, T. 2009; 57 (6): 2110-2129

View details for DOI 10.1109/TSP.2008.2011847

View details for Web of Science ID 000266333200007
Capacity Region of the Finite-State Multiple-Access Channel With and Without Feedback IEEE International Symposium on Information Theory Permuter, H. H., Weissman, T., Chen, J. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2009: 2455–77

View details for DOI 10.1109/TIT.2009.2018346

View details for Web of Science ID 000266878500003
Universal FIR MMSE Filtering IEEE TRANSACTIONS ON SIGNAL PROCESSING Moon, T., Weissman, T. 2009; 57 (3): 1068-1083

View details for DOI 10.1109/TSP.2008.2009894

View details for Web of Science ID 000263431900021
Finite State Channels With Time-Invariant Deterministic Feedback IEEE TRANSACTIONS ON INFORMATION THEORY Permuter, H. H., Weissman, T., Goldsmith, A. J. 2009; 55 (2): 644-662

View details for DOI 10.1109/TIT.2008.2009849

View details for Web of Science ID 000263375500015
A Context Quantization Approach to Universal denoising IEEE Trans. Sig. Proc. Sivaramakrishnan, K., Weissman, T. 2009; 57 (6): 2110-2129
Two-way source coding with a common helper IEEE International Symposium on Information Theory (ISIT 2009) Permuter, H., Steinberg, Y., Weissman, T. IEEE. 2009: 1473–1477

View details for Web of Science ID 000280141401006
Problems We Can Solve With a Helper IEEE Information Theory Workshop on Networking and Information Theory Permuter, H., Steinberg, Y., Weissman, T. IEEE. 2009: 266–270

View details for Web of Science ID 000273966100056
An Outer Bound for Side-Information Scalable Source Coding with Partially Cooperating Decoders IEEE International Symposium on Information Theory (ISIT 2009) Bross, S. I., Weissman, T. IEEE. 2009: 1194–1198

View details for Web of Science ID 000280141400243
Multiple Description Coding of Discrete Ergodic Sources 47th Annual Allerton Conference on Communication, Control, and Computing Jalali, S., Weissman, T. IEEE. 2009: 1256–1261

View details for Web of Science ID 000279627100172
Directed Information, Causal Estimation, and Communication in Continuous Time 7th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Kim, Y., Permuter, H. H., Weissman, T. IEEE. 2009: 632–638

View details for Web of Science ID 000275175100084
Source Coding with a Side Information 'Vending Machine' at the Decoder IEEE International Symposium on Information Theory (ISIT 2009) Permuter, H., Weissman, T. IEEE. 2009: 1030–1034

View details for Web of Science ID 000280141400210
Directed Information and Causal Estimation in Continuous Time IEEE International Symposium on Information Theory (ISIT 2009) Kim, Y., Permuter, H. H., Weissman, T. IEEE. 2009: 819–823

View details for Web of Science ID 000280141400167
An Iterative Scheme for Near Optimal and Universal Lossy Compression IEEE Information Theory Workshop on Networking and Information Theory Jalali, S., Montanari, A., Weissman, T. IEEE. 2009: 231–235

View details for Web of Science ID 000273966100049
Capacity of Channels with Action-Dependent States IEEE International Symposium on Information Theory (ISIT 2009) Weissman, T. IEEE. 2009: 1794–1798

View details for Web of Science ID 000280141401072
Discrete Denoising with Shifts IEEE Trans. Inform. Theory Moon, T., Weissman, T. 2009; 55 (11): 5284-5301
Finite-state channels with time-invariant deterministic feedback IEEE Trans. Inform. Theory Permuter, H., Weissman, T. 2009; 55 (2): 644-662
Capacity Region of the Multiple Access Channel With or Without Feedback IEEE Trans. Inform. Theory Permuter, H., Weissman, T. 2009; 55 (6): 2455-2477
Universal Denoising of Discrete-Time Continuous-Amplitude Signals IEEE TRANSACTIONS ON INFORMATION THEORY Sivaramakrishnan, K., Weissman, T. 2008; 54 (12): 5632-5660

View details for DOI 10.1109/TIT.2008.2006438

View details for Web of Science ID 000261648100022
Scanning and Sequential Decision Making for Multidimensional Data-Part II: The Noisy Case IEEE TRANSACTIONS ON INFORMATION THEORY Cohen, A., Weissman, T., Merhav, N. 2008; 54 (12): 5609-5631

View details for DOI 10.1109/TIT.2008.2006378

View details for Web of Science ID 000261648100021
The Information Lost in Erasures IEEE International Symposium on Information Theory Verdu, S., Weissman, T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2008: 5030–58

View details for DOI 10.1109/TIT.2008.929968

View details for Web of Science ID 000260426400016
Coding for additive white noise channels with feedback corrupted by quantization or bounded noise IEEE TRANSACTIONS ON INFORMATION THEORY Martins, N. C., Weissman, T. 2008; 54 (9): 4274-4282

View details for DOI 10.1109/TIT.2008.928289

View details for Web of Science ID 000258913400028
How to filter an "individual sequence with feedback" 44th Annual Allerton Conference on Communication, Control and Computing Weissman, T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2008: 3831–41

View details for DOI 10.1109/TIT.2008.926457

View details for Web of Science ID 000257861400047
Capacity of the trapdoor channel with feedback IEEE International Symposium on Information Theory Permuter, H., Cuff, P., Van Roy, B., Weissman, T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2008: 3150–65

View details for DOI 10.1109/TIT.2008.924681

View details for Web of Science ID 000257111500021
Universal filtering via hidden Markov modeling IEEE International Symposium on Information Theory and Its Applications Moon, T., Weissman, T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2008: 692–708

View details for DOI 10.1109/TIT.2007.913220

View details for Web of Science ID 000252612200012
Coding Schemes for Additive White Noise Channels with Feedback Corrupted by Quantization or Bounded Noise IEEE Trans. Inform. Theory Martins, N.C., Weissman, T. 2008; 54 (9): 4274-4282
On Successive Refinement for the Wyner-Ziv Problem with Partially Cooperating Decoders 2008 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY PROCEEDINGS, VOLS 1-6 Bross, S. I., Weissman, T. 2008: 2172-?

View details for Web of Science ID 000260364401158
Rate-Distortion in Near-Linear Time IEEE International Symposium on Information Theory Gupta, A., Verdu, S., Weissman, T. IEEE. 2008: 847–851

View details for Web of Science ID 000260364400171
Near optimal lossy source coding and compression-based denoising via Markov Chain Monte Carlo 42nd Annual Conference on Information Sciences and Systems Jalali, S., Weissman, T. IEEE. 2008: 441–446

View details for Web of Science ID 000258931600084
RATE-DISTORTION WITH COMMON RATE-LIMITED SIDE INFORMATION TO THE ENCODER AND DECODER 47th AIAA Aerospace Sciences Meeting and Exhibit including the New Horizons Forum and Aerospace Exposition Permuter, H., Steinberg, Y., Weissman, T. IEEE. 2008: 777–779

View details for Web of Science ID 000265073200166
Rate-Distortion via Markov Chain Monte Carlo IEEE International Symposium on Information Theory Jalali, S., Weissman, T. IEEE. 2008: 852–856

View details for Web of Science ID 000260364400172
Lossy source coding via Markov chain Monte Carlo International Zurich Seminar on Communications Jalali, S., Weissman, T. IEEE. 2008: 80–83

View details for Web of Science ID 000255784400021
On Directed Information and Gambling IEEE International Symposium on Information Theory Permuter, H. H., Kim, Y., Weissman, T. IEEE. 2008: 1403–1407

View details for Web of Science ID 000260364401001
On the Capacity of Finite-State Channels IEEE International Symposium on Information Theory Chen, J., Permuter, H., Weissman, T. IEEE. 2008: 1223–1227

View details for Web of Science ID 000260364400247
New Bounds for the Capacity Region of the Finite-State Multiple Access Channel IEEE International Symposium on Information Theory Permuter, H. H., Weissman, T., Chen, J. IEEE. 2008: 394–398

View details for Web of Science ID 000260364400080
Scanning and sequential decision making for multi-dimensional data, Part II: the noisy case IEEE Trans. Inform. Theory Cohen, A., Weissman, T., Merhav, N. 2008; 54 (12): 5609-5631
An Implementable Scheme for Universal Lossy Compression of Discrete Markov Sources 19th Data Compression Conference Jalali, S., Montanari, A., Weissman, T. IEEE COMPUTER SOC. 2008: 292–301

View details for DOI 10.1109/DCC.2009.72

View details for Web of Science ID 000268029900030
Universal denoising of discretetime continuous-amplitude signals IEEE Trans. Inform. Theory Sivaramakrishnan, K., Weissman, T. 2008; 54 (12): 5632-5660
The Information Lost in Erasures IEEE Trans. Inform. Theory Verdú, S., Weissman, T. 2008; 54 (11): 5030-5058
How to filter an 'individual sequence with feedback' IEEE Trans. Inform. Theory Weissman, T. 2008; 54 (8): 3831-3841
Universal FIR MMSE filtering IEEE Trans. Sig. Proc. Moon, T., Weissman, T. 2008; 57 (3): 1068-1083
Discrete Universal Filtering via Hidden Markov Modeling IEEE Trans. Inform. Theory Moon, T., Weissman, T. 2008; 54 (2): 692 - 708
Scanning and sequential decision making for multidimensional data - Part I : The noiseless case IEEE International Symposium on Information Theory Cohen, A., Merhav, N., Weissman, T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2007: 3001–20

View details for DOI 10.1109/TIT.2007.903117

View details for Web of Science ID 000249025100001
Universal filtering via prediction Data Compression Conference (DCC 2004) Weissman, T., Ordentlich, E., Weinberger, M. J., Somekh-Baruch, A., Merhav, N. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2007: 1253–64

View details for DOI 10.1109/TIT.2007.892782

View details for Web of Science ID 000245306000001
Denoising and filtering under the probability of excess loss criterion 43rd Annual Allerton Conference on Communication, Control and Computing Pereira, S., Weissman, T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2007: 1265–81

View details for DOI 10.1109/TIT.2007.892772

View details for Web of Science ID 000245306000002
Denoising and filtering under the probability of excess loss criterion IEEE Trans. Inform. Theory Matloub, S., Weissman, T. 2007; 53 (4): 1265 - 1281
Scanning and sequential decision making for multidimensional data IEEE Information Theory and Applications Workshop Cohen, A., Merhav, N., Weissman, T. IEEE. 2007: 99–108

View details for Web of Science ID 000250812500016
Scanning, filtering and prediction for random fields corrupted by Gaussian noise IEEE International Symposium on Information Theory Cohen, A., Merhav, N., Weissman, T. IEEE. 2007: 691–695

View details for Web of Science ID 000257010200139
The Gaussian channel with noisy feedback IEEE International Symposium on Information Theory Kim, Y., Lapidoth, A., Weissman, T. IEEE. 2007: 1416–1420

View details for Web of Science ID 000257010201115
Capacity and zero-error capacity of the chemical channel with feedback IEEE International Symposium on Information Theory Permuter, H., Cuff, P., Van Roy, B., Weissman, T. IEEE. 2007: 1866–1870

View details for Web of Science ID 000257010202035
Competitive on-line linear FIR MMSE filtering IEEE International Symposium on Information Theory Moon, T., Weissman, T. IEEE. 2007: 1126–1130

View details for Web of Science ID 000257010201057
A context quantization approach to universal denoising IEEE International Symposium on Information Theory Sivaramakrishnan, K., Weissman, T. IEEE. 2007: 2361–2365

View details for Web of Science ID 000257010202134
Scanning and sequential decision making for multi-dimensional data, Part I: the noiseless case IEEE Trans. Inform. Theory Cohen, A., Merhav, N., Weissman, T. 2007; 53 (9): 3001 - 3020
New bounds on the rate-distortion function of a binary Markov source IEEE International Symposium on Information Theory Jalali, S., Weissman, T. IEEE. 2007: 571–575

View details for Web of Science ID 000257010200115
A universal Wyner-Ziv scheme for discrete sources IEEE International Symposium on Information Theory Jalali, S., Verdu, S., Weissman, T. IEEE. 2007: 1951–1955

View details for Web of Science ID 000257010202052
On separation in the presence of feedback IEEE Information Theory Workshop Permuter, H., Weissman, T. IEEE. 2007: 266–270

View details for Web of Science ID 000253441300046
Reflections on the DUDE IEEE Information Theory Society Newsletter Ordentlich, E., Serouss, G., Verd_x0013_u, S., Weinberger, M., Weissman, T. 2007; 57 (2): 5-10
Source coding with limited-look-ahead side information at the decoder IEEE International Symposium on Information Theory Weissman, T., El Gamal, A. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2006: 5218–39

View details for DOI 10.1109/TIT.2006.885500

View details for Web of Science ID 000242503300003
Universal zero-delay joint source-channel coding 38th Conference on Information Sciences and Systems Matloub, S., Weissman, T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2006: 5240–50

View details for DOI 10.1109/TIT.2006.885537

View details for Web of Science ID 000242503300004
On the entropy rate of pattern processes IEEE TRANSACTIONS ON INFORMATION THEORY Gemelos, G. M., Weissman, T. 2006; 52 (9): 3994-4007

View details for DOI 10.1109/TIT.2006.880044

View details for Web of Science ID 000240076700008
Coding for the feedback Gel'fand-Pinsker channel and the feedforward Wyner-Ziv source IEEE International Symposium on Information Theory and Its Applications Merhav, N., Weissman, T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2006: 4207–11

View details for Web of Science ID 000240076700023
Universal minimax discrete denoising under channel uncertainty IEEE International Symposium on Information Theory Gemelos, G. M., Sigurjonsson, S., Weissman, T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2006: 3476–97

View details for DOI 10.1109/TIT.2006.878234

View details for Web of Science ID 000239408700006
Algorithms for discrete denoising under channel uncertainty IEEE TRANSACTIONS ON SIGNAL PROCESSING Gemelos, G. A., Sigurjonsson, S., Weissman, T. 2006; 54 (6): 2263-2276

View details for DOI 10.1109/TSP.2006.874295

View details for Web of Science ID 000237900000026
Coding for the feedback Gel'fand-Pinsker channel and the feedforward Wyner-Ziv source IEEE Trans. Inform. Theory Weissman, T., Merhav, N. 2006; 52 (9): 4207 - 4211
Universal scanning and sequential decision making for multidimensional data IEEE International Symposium on Information Theory Cohen, A., Merhav, N., Weissman, T. IEEE. 2006: 431–435

View details for Web of Science ID 000245289700089
Erasure entropy IEEE International Symposium on Information Theory Verdu, S., Weissman, T. IEEE. 2006: 98–102

View details for Web of Science ID 000245289700021
Universal scanning of mixing ra the performance of the peano ndom fields and Peano-Hilbert scan 24th IEEE Convention of Electrical and Electronics Engineers in Israel Cohen, A., Merhav, N., Weissman, T. IEEE. 2006: 62–66

View details for Web of Science ID 000246334200014
On the optimality of symbol-by-symbol filtering and denoising IEEE International Symposium on Information Theory Ordentlich, E., Weissman, T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2006: 19–40

View details for DOI 10.1109/TIT.2005.860432

View details for Web of Science ID 000234412500002
Compound sequential decisions against the well-informed antagonist IEEE Information Theory Workshop Weissman, T. IEEE. 2006: 77–81

View details for Web of Science ID 000245205900017
Universal denoising of continuous amplitude signals with applications to images IEEE International Conference on Image Processing (ICIP 2006) Sivaramakrishnan, K., Weissman, T. IEEE. 2006: 2609–2612

View details for Web of Science ID 000245768501271
Bounds on the error exponent of the AWGN channel with AWGN-corrupted feedback 24th IEEE Convention of Electrical and Electronics Engineers in Israel Kim, Y., Lapidoth, A., Weissman, T. IEEE. 2006: 184–188

View details for Web of Science ID 000246334200042
Capacity of finite-state channels with time-invariant deterministic feedback IEEE International Symposium on Information Theory Permuter, H., Weissman, T., Goldsmith, A. IEEE. 2006: 64–68

View details for Web of Science ID 000245289700014
Universal Zero-Delay Joint Source-Channel Coding IEEE Trans. Inform. Theory Matloub, S., Weissman, T. 2006; 52 (12): 5240 - 5250
Universal Minimax Discrete Denoising under Channel Uncertainty IEEE Trans. Inform. Theory Gemelos, G., Sigurjonsson, S., Weissman, T. 2006; 52 (8): 3476-3497
Universal denoising of discrete-time continuous-amplitude signals IEEE International Symposium on Information Theory Sivaramakrishnan, K., Weissman, T. IEEE. 2006: 2531–2535

View details for Web of Science ID 000245289705020
Source Coding with Limited Side Information Lookahead at the Decoder IEEE Trans. Inform. Theory Weissman, T., Gamal, A., El 2006; 52 (12): 5218 - 5239
Source coding with limited side information lookahead at the decoder IEEE International Symposium on Information Theory El Gamal, A., Weissman, T. IEEE. 2006: 2441–2445

View details for Web of Science ID 000245289705001
2006 Kailath Lecture and Colloquia Verdú, S., Weissman, T. Newsletter article in the IEEE Information Theory Society Newsletter. 2006
On the Entropy Rate of Pattern Processes IEEE Trans. Inform. Theory Gemelos, G., Weissman, T. 2006; 52 (9): 3994 - 4007
Algorithms for Discrete Denoising under Channel Uncertainty IEEE Trans. Inform. Theory Gemelos, G., Sigurjonsson, S., Weissman, T. 2006; 54 (6): 2263-2276
On the Optimality of Symbol by Symbol Fitering and Denoising IEEE Trans. Inform. Theory Ordentlich, E., Weissman, T. 2006; 52 (1): 19-40
The empirical distribution of rate-constrained source codes IEEE TRANSACTIONS ON INFORMATION THEORY Weissman, T., Ordentlich, E. 2005; 51 (11): 3718-3733

View details for DOI 10.1109/TIT.2005.856982

View details for Web of Science ID 000233047800002
On causal source codes with side information 40th Annual Allerton Conference on Communication, Control and Computing Weissman, T., Merhav, N. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2005: 4003–13

View details for DOI 10.1109/TIT.2005.856978

View details for Web of Science ID 000233047800025
Universal denoising for the finite-input general-output channel IEEE International Symposium on Information Theory Dembo, A., Weissman, T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2005: 1507–17

View details for DOI 10.1109/TIT.2005.844104

View details for Web of Science ID 000228050800020
Universal Denoising for the Finite-Input- General-Output Channel IEEE Trans. Inform. Theory Dembo, A., Weissman, T. 2005; 51 (4): 1507-1517
DISCRETE DENOISING FOR CHANNELS WITH MEMORY COMMUNICATIONS IN INFORMATION AND SYSTEMS Zhang, R., Weissman, T. 2005; 5 (2): 257–88

View details for Web of Science ID 000213658900005
On the entropy rate of pattern processes IEEE Data Compression Conference Gemelos, G. M., Weissman, T. IEEE COMPUTER SOC. 2005: 233–242

View details for Web of Science ID 000229070000024
Universal discrete denoising: Known channel IEEE International Symposium on Information Theory Weissman, T., Ordentlich, E., Seroussi, G., Verdu, S., Weinberger, M. J. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2005: 5–28

View details for DOI 10.1109/TIT.2004.839518

View details for Web of Science ID 000226179300001
On the relationship between process and pattern entropy rate IEEE International Symposium on Information Theory and Its Applications Gemelos, G. M., Weissman, T. IEEE. 2005: 2208–2212

View details for Web of Science ID 000234713801208
Coding for the feedback Gel'fand-Pinsker channel and the feedforward Wyner-Ziv source IEEE International Symposium on Information Theory and Its Applications Merhav, N., Weissman, T. IEEE. 2005: 1506–1510

View details for Web of Science ID 000234713801064
Approximations for the entropy rate of a hidden Markov process IEEE International Symposium on Information Theory and Its Applications Ordentlich, E., Weissman, T. IEEE. 2005: 2198–2202

View details for Web of Science ID 000234713801206
Multi-directional context sets with applications to universal denoising and compression IEEE International Symposium on Information Theory and Its Applications Ordentlich, E., Weinberger, M. J., Weissman, T. IEEE. 2005: 1270–1274

View details for Web of Science ID 000234713801015
Discrete universal filtering via hidden Markov modelling IEEE International Symposium on Information Theory and Its Applications Moon, T., Weissman, T. IEEE. 2005: 1285–1289

View details for Web of Science ID 000234713801018
A universal scheme for learning IEEE International Symposium on Information Theory and Its Applications Farias, V. F., Moallemi, C. C., Van Roy, B., Weissman, T. IEEE. 2005: 1158–1162

View details for Web of Science ID 000234713800244
Asymptotic filtering and entropy rate of a hidden Markov process in the rare transitions regime IEEE International Symposium on Information Theory and Its Applications Nair, C., Ordentlich, E., Weissman, T. IEEE. 2005: 1838–1842

View details for Web of Science ID 000234713801132
On Causal Source Codes with Side Information IEEE Trans. Inform. Theory Weissman, T., Merhav, N. 2005; 51 (11): 4003-4013
Discrete Denoising for Channels with Memory Comm. in Information and Systems Zhang, R., Weissman, T. 2005; 5 (2): 257-288
The empirical distribution of rate-constrained codes IEEE Trans. Inform. Theory Weissman, T., Ordentlich, E. 2005; 51 (11): 3718-3733
Universal Discrete Denoising: Known Channel IEEE Trans. Inform. Theory Weissman, T., Ordentlich, E., Seroussi, G., Verdú, S., Weinberger, M. 2005; 51 (1): 5-28
Universally attainable error exponents for rate-distortion coding of noisy sources IEEE TRANSACTIONS ON INFORMATION THEORY Weissman, T. 2004; 50 (6): 1229-1246

View details for DOI 10.1109/TIT.2004.828061

View details for Web of Science ID 000221803300019
Universal prediction of random binary sequences in a noisy environment ANNALS OF APPLIED PROBABILITY Weissman, T., Merhav, N. 2004; 14 (1): 54-89

View details for Web of Science ID 000188878000002
Universal prediction of random binary sequences in a noisy environment Annals of Applied Probability Weissman, T., Merhav, N. 2004; 14 (1): 54-89
Universal denoising for the finite-input-general-output channel IEEE International Symposium on Information Theory Dembo, A., Weissman, T. IEEE. 2004: 201–201

View details for Web of Science ID 000223202600201
The empirical distribution of rate - Constrained source codes IEEE International Symposium on Information Theory Weissman, T., Ordentlich, E. IEEE. 2004: 464–464

View details for Web of Science ID 000223202600464
New bounds on the entropy rate of hidden Markov processes IEEE Information Theory Workshop Ordentlich, E., Weissman, T. IEEE. 2004: 117–122

View details for Web of Science ID 000226087200022
Channel decoding of systematically encoded unknown redundant sources IEEE International Symposium on Information Theory Ordentlich, E., Seroussi, G., Verdu, S., Viswanathan, K., Weinberger, M. J., Weissman, T. IEEE. 2004: 165–165

View details for Web of Science ID 000223202600165
Efficient pruning of bi-directional context trees with applications to universal denoising and compression IEEE Information Theory Workshop Ordentlich, E., Weinberger, M. J., Weissman, T. IEEE. 2004: 94–98

View details for Web of Science ID 000226087200018
On the optimality of symbol by symbol filtering and denoising IEEE International Symposium on Information Theory Ordentlich, E., Weissman, T. IEEE. 2004: 200–200

View details for Web of Science ID 000223202600200
Universal minimax binary image denoising under channel uncertainty International Conference on Image Processing (ICIP 2004) Gemelos, G., Sigurjonsson, S., Weissman, T. IEEE. 2004: 997–1000

View details for Web of Science ID 000228043501072
Discrete universal filtering through incremental parsing Data Compression Conference (DCC 2004) Ordentlich, E., Weissman, T., Weinberger, M. J., Somekh-Baruch, A., Merhav, N. IEEE COMPUTER SOC. 2004: 352–361

View details for Web of Science ID 000189502800036
Universal minimax discrete denoising under channel uncertainty IEEE International Symposium on Information Theory Gemelos, G., Sigurjonsson, S., Weissman, T. IEEE. 2004: 199–199

View details for Web of Science ID 000223202600199
On the Entropy Rate of Pattern Sequences HP Laboratories Technical Report Gemelos, G., Weissman, T. 2004
Universally Attainable Error-Exponents for Rate-Distortion Coding of Noisy Sources IEEE Trans. Inform. Theory Weissman, T. 2004; 50 (6): 1229-1246
On competitive prediction and its relation to rate-distortion theory IEEE TRANSACTIONS ON INFORMATION THEORY Weissman, T., Merhav, N. 2003; 49 (12): 3185-3194

View details for DOI 10.1109/TIT.2003.820014

View details for Web of Science ID 000187958600005
The minimax distortion redundancy in noisy source coding IEEE TRANSACTIONS ON INFORMATION THEORY Dembo, A., Weissman, T. 2003; 49 (11): 3020-3030

View details for DOI 10.1109/TIT.2003.819336

View details for Web of Science ID 000186618500022
Scanning and Prediction in Multi-Dimensional Data Arrays IEEE Trans. Inform. Theory Merhav, N., Weissman, T. 2003; IT-49 (1): 65-82
Scanning and prediction in multidimensional data arrays IEEE International Symposium on Information Theory Merhav, N., Weissman, T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2003: 65–82

View details for DOI 10.1109/TIT.2002.806134

View details for Web of Science ID 000180370400006
A discrete universal denoiser and its application to binary images IEEE International Conference on Image Processing Ordentlich, E., Seroussi, G., Verdu, S., Weinberger, M., Weissman, T. IEEE. 2003: 117–120

View details for Web of Science ID 000187010000030
Universally attainable error-exponents for rate-distortion coding of noisy sources IEEE International Symposium on Information Theory Weissman, T. IEEE. 2003: 194–194

View details for Web of Science ID 000186112600194
On competitive prediction and its relation to rate-distortion theory and to channel capacity theory IEEE International Symposium on Information Theory Weissman, T., Merhav, N. IEEE. 2003: 81–81

View details for Web of Science ID 000186112600081
On Competitive Prediction and its Relation to Rate-Distortion Theory IEEE Trans. Inform. Theory Weissman, T., Merhav, N. 2003; 49 (12): 3185-3194
On limited-delay lossy coding and filtering of individual sequences IEEE TRANSACTIONS ON INFORMATION THEORY Weissman, T., Merhav, N. 2002; 48 (3): 721-733

View details for Web of Science ID 000173968200012
Tradeoffs between the excess-code-length exponent and the excess-distortion exponent in lossy source coding IEEE TRANSACTIONS ON INFORMATION THEORY Weissman, T., Merhav, N. 2002; 48 (2): 396-415

View details for Web of Science ID 000173485600004
Universally Attainable Error-Exponents for Rate-Constrained Denoising of Noisy Sources HP Laboratories Technical Report Weissman, T. 2002
The minimax distortion redundancy in noisy source coding IEEE International Symposium on Information Theory Dembo, A., Weissman, T. IEEE. 2002: 318–318

View details for Web of Science ID 000177912700317
On Competitive Prediction and its Relation to Rate-Distortion Theory and to Channel Capacity Theory Technion – I.I.T., CC Pub., EE Pub. Weissman, T., Merhav, N. 2002
On limited-delay lossy coding and filtering of individual sequences IEEE Trans. Inform. Theory Weissman, T., Merhav, N. 2002; IT-48 (3): 721-733
Tradeoffs between the excess-code-length exponent and the excess-distortion exponent in lossy source coding IEEE Trans. Inform. Theory Weissman, T., Merhav, N. 2002; IT-48 (2): 396-415
Universal prediction of individual binary sequences in the presence of noise IEEE TRANSACTIONS ON INFORMATION THEORY Weissman, T., Merhav, N. 2001; 47 (6): 2151-2173

View details for Web of Science ID 000170717700003
Twofold universal prediction schemes for achieving the finite-state predictability of a noisy individual binary sequence IEEE International Symposium on Information Theory Weissman, T., Merhav, N., Somekh-Baruch, A. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2001: 1849–66

View details for Web of Science ID 000169487900013
Twofold universal prediction schemes for achieving the finite-state predictability of a noisy individual binary sequence IEEE Trans. Inform. Theory Weissman, T., Merhav, N., Somekh-Baruch, A. 2001; IT-47 (5): 1849-1866
Tradeoffs between the excess-code-length exponent and the excess-distortion exponent in lossy source coding Technical Report, CCIT Pub., EE Pub. Weissman, T., Merhav, N. 2001
Universal prediction of binary individual sequences in the presence of noise IEEE Trans. Inform. Theory Weissman, T., Merhav, N. 2001; IT-47 (6): 2151-2173
Universal prediction of individual binary sequences in the presence of noise Technion – I.I.T., CC Pub., EE Pub. Weissman, T., Merhav, N. 1999
Not all universal codes are pointwise universal Weissman, T. unpublished manuscript available upon request and at http://www.stanford.edu/~tsachy/papers.html.

Itschak Weissman

Robert and Barbara Kleist Professor in the School of Engineering

Electrical Engineering

Bio

Academic Appointments

Administrative Appointments

Honors & Awards

Boards, Advisory Committees, Professional Organizations

Program Affiliations

Professional Education

Patents

Contact

Additional Info

Links

2025-26 Courses

2024-25 Courses

2023-24 Courses

2022-23 Courses

Stanford Advisees

All Publications

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract

Abstract