All Publications

  • Data Sanitization to Reduce Private Information Leakage from Functional Genomics. Cell Gursoy, G., Emani, P., Brannon, C. M., Jolanki, O. A., Harmanci, A., Strattan, J. S., Cherry, J. M., Miranker, A. D., Gerstein, M. 2020; 183 (4): 905


    The generation of functional genomics datasets is surging, because they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intent behind functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. Moreover, there is a great incentive to broadly share raw reads for better statistical power and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Our protocol works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA sequencing. It involves quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples.

    View details for DOI 10.1016/j.cell.2020.09.036

    View details for PubMedID 33186529

  • Using Ethereum blockchain to store and query pharmacogenomics data via smart contracts BMC MEDICAL GENOMICS Gursoy, G., Brannon, C. M., Gerstein, M. 2020; 13 (1): 74


    As pharmacogenomics data becomes increasingly integral to clinical treatment decisions, appropriate data storage and sharing protocols need to be adopted. One promising option for secure, high-integrity storage and sharing is Ethereum smart contracts. Ethereum is a blockchain platform, and smart contracts are immutable pieces of code running on virtual machines in this platform that can be invoked by a user or another contract (in the blockchain network). The 2019 iDASH (Integrating Data for Analysis, Anonymization, and Sharing) competition for Secure Genome Analysis challenged participants to develop time- and space-efficient Ethereum smart contracts for gene-drug relationship data.Here we design a specific smart contract to store and query gene-drug interactions in Ethereum using an index-based, multi-mapping approach. Our contract stores each pharmacogenomics observation, a gene-variant-drug triplet with outcome, in a mapping searchable by a unique identifier, allowing for time and space efficient storage and query. This solution ranked in the top three at the 2019 IDASH competition. We further improve our "challenge solution" and develop an alternate "fastQuery" smart contract, which combines together identical gene-variant-drug combinations into a single storage entry, leading to significantly better scalability and query efficiency.On a private, proof-of-authority network, both our challenge and fastQuery solutions exhibit approximately linear memory and time usage for inserting into and querying small databases (<1,000 entries). For larger databases (1000 to 10,000 entries), fastQuery maintains this scaling. Furthermore, both solutions can query by a single field ("0-AND") or a combination of fields ("1- or 2-AND"). Specifically, the challenge solution can complete a 2-AND query from a small database (100 entries) in 35ms using 0.1 MB of memory. For the same query, fastQuery has a 2-fold improvement in time and a 10-fold improvement in memory.We show that pharmacogenomics data can be stored and queried efficiently using Ethereum blockchain. Our solutions could potentially be used to store a range of clinical data and extended to other fields requiring high-integrity data storage and efficient access.

    View details for DOI 10.1186/s12920-020-00732-x

    View details for Web of Science ID 000538117800002

    View details for PubMedID 32487214

    View details for PubMedCentralID PMC7268467