• Big data analytics for quality improvement and clinical effectiveness
• Disease biomarker discovery through multi-omics based analyses
• Assistant Professor, Surgery
• B.S., Biochemistry, Fudan University, China (1990)
• M.A., Molecular and Developmental Biology, UCLA, US (1994)
• Ph.D., Biological Chemistry, UCLA, US (1996)
• Postdoctoral training, medicine/oncology/Computer science, Stanford University, US (1996-1998)
• Business administration, Leavey School of Business, Santa Clara University, US (2000-2001)
Prediction of Incident Hypertension Within the Next Year: Prospective Study Using Statewide Electronic Health Records and Machine Learning.
Journal of medical Internet research
2018; 20 (1): e22
As a high-prevalence health condition, hypertension is clinically costly, difficult to manage, and often leads to severe and life-threatening diseases such as cardiovascular disease (CVD) and stroke. The aim of this study was to develop and validate prospectively a risk prediction model of incident essential hypertension within the following year. Data from individual patient electronic health records (EHRs) were extracted from the Maine Health Information Exchange network. Retrospective (N=823,627, calendar year 2013) and prospective (N=680,810, calendar year 2014) cohorts were formed. A machine learning algorithm, XGBoost, was adopted in the process of feature selection and model building. It generated an ensemble of classification trees and assigned a final predictive risk score to each individual. The 1-year incident hypertension risk model attained areas under the curve (AUCs) of 0.917 and 0.870 in the retrospective and prospective cohorts, respectively. Risk scores were calculated and stratified into five risk categories, with 4526 out of 381,544 patients (1.19%) in the lowest risk category (score 0-0.05) and 21,050 out of 41,329 patients (50.93%) in the highest risk category (score 0.4-1) receiving a diagnosis of incident hypertension in the following 1 year. Type 2 diabetes, lipid disorders, CVDs, mental illness, clinical utilization indicators, and socioeconomic determinants were recognized as driving or associated features of incident essential hypertension. The very high risk population mainly comprised elderly (age>50 years) individuals with multiple chronic conditions, especially those receiving medications for mental disorders. Disparities were also found in social determinants, including some community-level factors associated with higher risk and others that were protective against hypertension. With statewide EHR datasets, our study prospectively validated an accurate 1-year risk prediction model for incident essential hypertension.
Our real-time predictive analytic model has been deployed in the state of Maine, providing implications in interventions for hypertension and related diseases and hopefully enhancing hypertension care.
View details for DOI 10.2196/jmir.9268
View details for PubMedID 29382633
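The per-category incidence figures quoted in the abstract above follow directly from the reported counts. The short Python sketch below recomputes them. Only the four counts come from the abstract; the dictionary keys and the function name are illustrative assumptions and are not part of the published model.

```python
# Counts reported in the abstract:
# (patients diagnosed within 1 year, patients in the risk category).
# The keys and the function name are illustrative, not from the study's code.
RISK_CATEGORIES = {
    "lowest (score 0-0.05)": (4526, 381_544),
    "highest (score 0.4-1)": (21_050, 41_329),
}


def incidence_percent(cases: int, total: int) -> float:
    """Return the observed incidence as a percentage, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * cases / total, 2)


for name, (cases, total) in RISK_CATEGORIES.items():
    print(f"{name}: {incidence_percent(cases, total)}%")
```

Both values round to the percentages stated in the abstract (1.19% and 50.93%).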
Assessing Statewide All-Cause Future One-Year Mortality: Prospective Study With Implications for Quality of Life, Resource Utilization, and Medical Futility.
Journal of medical Internet research
2018; 20 (6): e10311
For many elderly patients, a disproportionate amount of health care resources and expenditures is spent during the last year of life, despite the discomfort and reduced quality of life associated with many aggressive medical approaches. However, few prognostic tools have focused on predicting all-cause 1-year mortality among elderly patients at a statewide level, an issue that has implications for improving quality of life while distributing scarce resources fairly. Using data from a statewide elderly population (aged ≥65 years), we sought to prospectively validate an algorithm to identify patients at risk for dying in the next year for the purpose of minimizing decision uncertainty, improving quality of life, and reducing futile treatment. Analysis was performed using electronic medical records from the Health Information Exchange in the state of Maine, which covered records of nearly 95% of the statewide population. The model was developed from 125,896 patients aged at least 65 years who were discharged from any care facility in the Health Information Exchange network from September 5, 2013, to September 4, 2015. Validation was conducted using 153,199 patients with same inclusion and exclusion criteria from September 5, 2014, to September 4, 2016. Patients were stratified into risk groups. The association between all-cause 1-year mortality and risk factors was screened by chi-squared test and manually reviewed by 2 clinicians. We calculated risk scores for individual patients using a gradient tree-based boost algorithm, which measured the probability of mortality within the next year based on the preceding 1-year clinical profile. The development sample included 125,896 patients (72,572 women, 57.64%; mean 74.2 [SD 7.7] years). The final validation cohort included 153,199 patients (88,177 women, 57.56%; mean 74.3 [SD 7.8] years).
The c-statistic for discrimination was 0.96 (95% CI 0.93-0.98) in the development group and 0.91 (95% CI 0.90-0.94) in the validation cohort. The mortality was 0.99% in the low-risk group, 16.75% in the intermediate-risk group, and 72.12% in the high-risk group. A total of 99 independent risk factors (n=99) for mortality were identified (reported as odds ratios; 95% CI). Age was on the top of list (1.41; 1.06-1.48); congestive heart failure (20.90; 15.41-28.08) and different tumor sites were also recognized as driving risk factors, such as cancer of the ovaries (14.42; 2.24-53.04), colon (14.07; 10.08-19.08), and stomach (13.64; 3.26-86.57). Disparities were also found in patients' social determinants like respiratory hazard index (1.24; 0.92-1.40) and unemployment rate (1.18; 0.98-1.24). Among high-risk patients who expired in our dataset, cerebrovascular accident, amputation, and type 1 diabetes were the top 3 diseases in terms of average cost in the last year of life. Our study prospectively validated an accurate 1-year risk prediction model and stratification for the elderly population (≥65 years) at risk of mortality with statewide electronic medical record datasets. It should be a valuable adjunct for helping patients to make better quality-of-life choices and alerting care givers to target high-risk elderly for appropriate care and discussions, thus cutting back on futile treatment.
View details for DOI 10.2196/10311
View details for PubMedID 29866643
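Several abstracts in this list report a c-statistic as the measure of discrimination. As a reminder of what that number means, the sketch below computes it the direct way: the fraction of (event, non-event) patient pairs in which the patient with the event received the higher risk score, with ties counted as one half. The scores and outcomes are made-up toy values, not study data, and the function name is an assumption for illustration.

```python
from typing import Sequence


def c_statistic(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance statistic (equivalent to ROC AUC for a binary outcome).

    scores   -- predicted risk for each patient
    outcomes -- 1 if the event occurred, 0 otherwise
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event patient ranked higher: concordant pair
            elif e == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data: five patients, two of whom had the event.
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.6]
toy_outcomes = [1, 0, 0, 0, 1]
print(c_statistic(toy_scores, toy_outcomes))
```

In the toy data, five of the six event/non-event pairs are ranked correctly, so the function returns 5/6. The pairwise loop is quadratic and is meant only to make the definition concrete; cohorts of the size described above would call for a rank-based computation.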
Estimating One-Year Risk of Incident Chronic Kidney Disease: Retrospective Development and Validation Study Using Electronic Medical Record Data From the State of Maine.
JMIR medical informatics
2017; 5 (3): e21
Chronic kidney disease (CKD) is a major public health concern in the United States with high prevalence, growing incidence, and serious adverse outcomes. We aimed to develop and validate a model to identify patients at risk of receiving a new diagnosis of CKD (incident CKD) during the next 1 year in a general population. The study population consisted of patients who had visited any care facility in the Maine Health Information Exchange network any time between January 1, 2013, and December 31, 2015, and had no history of CKD diagnosis. Two retrospective cohorts of electronic medical records (EMRs) were constructed for model derivation (N=1,310,363) and validation (N=1,430,772). The model was derived using a gradient tree-based boost algorithm to assign a score to each individual that measured the probability of receiving a new diagnosis of CKD from January 1, 2014, to December 31, 2014, based on the preceding 1-year clinical profile. A feature selection process was conducted to reduce the dimension of the data from 14,680 EMR features to 146 as predictors in the final model. Relative risk was calculated by the model to gauge the risk ratio of the individual to population mean of receiving a CKD diagnosis in next 1 year. The model was tested on the validation cohort to predict risk of CKD diagnosis in the period from January 1, 2015, to December 31, 2015, using the preceding 1-year clinical profile. The final model had a c-statistic of 0.871 in the validation cohort. It stratified patients into low-risk (score 0-0.005), intermediate-risk (score 0.005-0.05), and high-risk (score ≥ 0.05) levels. The incidence of CKD in the high-risk patient group was 7.94%, 13.7 times higher than the incidence in the overall cohort (0.58%).
Survival analysis showed that patients in the 3 risk categories had significantly different CKD outcomes as a function of time (P<.001), indicating an effective classification of patients by the model. We developed and validated a model that is able to identify patients at high risk of having CKD in the next 1 year by statistically learning from the EMR-based clinical history in the preceding 1 year. Identification of these patients indicates care opportunities such as monitoring and adopting intervention plans that may benefit the quality of care and outcomes in the long term.
View details for DOI 10.2196/medinform.7954
View details for PubMedID 28747298
View details for PubMedCentralID PMC5550735
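The score cutoffs and the relative-risk figure in the CKD abstract above can be expressed in a few lines. This is a minimal sketch, not the authors' implementation: the function names are assumptions, and because the abstract does not say which level a score of exactly 0.005 belongs to, the sketch assigns it to the intermediate level.

```python
# Cutoffs taken from the abstract; boundary handling at 0.005 is an assumption.
LOW_CUTOFF = 0.005
HIGH_CUTOFF = 0.05


def risk_level(score: float) -> str:
    """Map a model score in [0, 1] to the three risk levels named in the abstract."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score >= HIGH_CUTOFF:
        return "high"
    if score >= LOW_CUTOFF:
        return "intermediate"
    return "low"


def relative_risk(group_incidence: float, overall_incidence: float) -> float:
    """Ratio of a group's incidence to the overall cohort incidence."""
    if overall_incidence <= 0:
        raise ValueError("overall incidence must be positive")
    return group_incidence / overall_incidence


# Incidences (in percent) reported in the abstract.
print(risk_level(0.07), round(relative_risk(7.94, 0.58), 1))
```

With the reported incidences (7.94% in the high-risk group against 0.58% overall), the ratio rounds to the 13.7 stated in the abstract.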
Defining and characterizing the critical transition state prior to the type 2 diabetes disease.
2017; 12 (7): e0180937
Type 2 diabetes mellitus (T2DM), with increased risk of serious long-term complications, currently represents 8.3% of the adult population. We hypothesized that a critical transition state prior to the new onset T2DM can be revealed through the longitudinal electronic medical record (EMR) analysis. We applied the transition-based network entropy methodology which previously identified a dynamic driver network (DDN) underlying the critical T2DM transition at the tissue molecular biological level. To profile pre-disease phenotypical changes that indicated a critical transition state, a cohort of 7,334 patients was assembled from the Maine State Health Information Exchange (HIE). These patients all had their first confirmative diagnosis of T2DM between January 1, 2013 and June 30, 2013. The cohort's EMRs from the 24 months preceding their date of first T2DM diagnosis were extracted. Analysis of these patients' pre-disease clinical history identified a dynamic driver network (DDN) and an associated critical transition state six months prior to their first confirmative T2DM state. This 6-month window before the disease state provides an early warning of the impending T2DM, warranting an opportunity to apply proactive interventions to prevent or delay the new onset of T2DM.
View details for DOI 10.1371/journal.pone.0180937
View details for PubMedID 28686739
View details for PubMedCentralID PMC5501620
Unique Molecular Patterns Uncovered in Kawasaki Disease Patients with Elevated Serum Gamma Glutamyl Transferase Levels: Implications for Intravenous Immunoglobulin Responsiveness
2016; 11 (12)
Resistance to intravenous immunoglobulin (IVIG) occurs in 10-20% of patients with Kawasaki disease (KD). The risk of resistance is about two-fold higher in patients with elevated gamma glutamyl transferase (GGT) levels. We sought to understand the biological mechanisms underlying IVIG resistance in patients with elevated GGT levels. We explored the association between elevated GGT levels and IVIG-resistance with a cohort of 686 KD patients (Cohort I). Gene expression data from 130 children with acute KD (Cohort II) were analyzed using the R square statistic and false discovery analysis to identify genes that were differentially represented in patients with elevated GGT levels with regard to IVIG responsiveness. Two additional KD cohorts (Cohort III and IV) were used to test the hypothesis that sialylation and GGT may be involved in IVIG resistance through neutrophil apoptosis. Thirty-six genes were identified that significantly explained the variations of both GGT levels and IVIG responsiveness in KD patients. After Bonferroni correction, significant associations with IVIG resistance persisted for 12 out of 36 genes among patients with elevated GGT levels and none among patients with normal GGT levels. With the discovery of ST6GALNAC3, a sialyltransferase, as the most differentially expressed gene, we hypothesized that sialylation and GGT are involved in IVIG resistance through neutrophil apoptosis. We then confirmed that in Cohort III and IV there was significantly less reduction in neutrophil count in IVIG non-responders. Gene expression analyses combining molecular and clinical datasets support the hypotheses that: (1) neutrophil apoptosis induced by IVIG may be a mechanism of action of IVIG in KD; (2) changes in sialylation and GGT level in KD patients may contribute synergistically to IVIG resistance through blocking IVIG-induced neutrophil apoptosis.
These findings have implications for understanding the mechanism of action in IVIG resistance, and possibly for development of novel therapeutics.
View details for DOI 10.1371/journal.pone.0167434
View details for Web of Science ID 000392853100008
View details for PubMedID 28002448
View details for PubMedCentralID PMC5176264
Web-based Real-Time Case Finding for the Population Health Management of Patients With Diabetes Mellitus: A Prospective Validation of the Natural Language Processing-Based Algorithm With Statewide Electronic Medical Records.
JMIR medical informatics
2016; 4 (4)
Diabetes case finding based on structured medical records does not fully identify diabetic patients whose medical histories related to diabetes are available in the form of free text. Manual chart reviews have been used but involve high labor costs and long latency. This study developed and tested a Web-based diabetes case finding algorithm using both structured and unstructured electronic medical records (EMRs). This study was based on the health information exchange (HIE) EMR database that covers almost all health facilities in the state of Maine, United States. Using narrative clinical notes, a Web-based natural language processing (NLP) case finding algorithm was retrospectively (July 1, 2012, to June 30, 2013) developed with a random subset of HIE-associated facilities, which was then blind tested with the remaining facilities. The NLP-based algorithm was subsequently integrated into the HIE database and validated prospectively (July 1, 2013, to June 30, 2014). Of the 935,891 patients in the prospective cohort, 64,168 diabetes cases were identified using diagnosis codes alone. Our NLP-based case finding algorithm prospectively found an additional 5756 uncodified cases (5756/64,168, 8.97% increase) with a positive predictive value of .90. Of the 21,720 diabetic patients identified by both methods, 6616 patients (6616/21,720, 30.46%) were identified by the NLP-based algorithm before a diabetes diagnosis was noted in the structured EMR (mean time difference = 48 days). The online NLP algorithm was effective in identifying uncodified diabetes cases in real time, leading to a significant improvement in diabetes case finding. The successful integration of the NLP-based case finding algorithm into the Maine HIE database indicates a strong potential for application of this novel method to achieve a more complete ascertainment of diagnoses of diabetes mellitus.
View details for PubMedID 27836816
View details for PubMedCentralID PMC5124114
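The case-finding yields in the diabetes abstract above are simple ratios of the reported counts, and the positive predictive value (PPV) is the share of flagged cases that turn out to be true cases. The sketch below recomputes the two percentages and shows the PPV formula. The helper names are assumptions, and the chart-review counts passed to `positive_predictive_value` are hypothetical, since the abstract reports only the resulting value of .90.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 2)


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP): the fraction of flagged cases that are real cases."""
    flagged = true_positives + false_positives
    if flagged <= 0:
        raise ValueError("no flagged cases")
    return true_positives / flagged


# Counts reported in the abstract.
coded_cases = 64_168      # identified from diagnosis codes alone
nlp_only_cases = 5756     # additional uncodified cases found by NLP
found_by_both = 21_720    # identified by both methods
nlp_first = 6616          # of those, flagged by NLP before the structured code appeared

print(percent(nlp_only_cases, coded_cases))   # increase over coded cases
print(percent(nlp_first, found_by_both))      # share found earlier by NLP

# Hypothetical review sample, for illustration only: 90 confirmed, 10 rejected.
print(positive_predictive_value(90, 10))
```

The two percentages reproduce the 8.97% and 30.46% given in the abstract.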
A Classification Tool for Differentiation of Kawasaki Disease from Other Febrile Illnesses.
Journal of pediatrics
2016; 176: 114-120 e8
To develop and validate a novel decision tree-based clinical algorithm to differentiate Kawasaki disease (KD) from other pediatric febrile illnesses that share common clinical characteristics. Using clinical and laboratory data from 801 subjects with acute KD (533 for development, and 268 for validation) and 479 febrile control subjects (318 for development, and 161 for validation), we developed a stepwise KD diagnostic algorithm combining our previously developed linear discriminant analysis (LDA)-based model with a newly developed tree-based algorithm. The primary model (LDA) stratified the 1280 subjects into febrile controls (n = 276), indeterminate (n = 247), and KD (n = 757) subgroups. The subsequent model (decision trees) further classified the indeterminate group into febrile controls (n = 103) and KD (n = 58) subgroups, leaving only 29 of 801 KD (3.6%) and 57 of 479 febrile control (11.9%) subjects indeterminate. The 2-step algorithm had a sensitivity of 96.0% and a specificity of 78.5%, and correctly classified all subjects with KD who later developed coronary artery aneurysms. The addition of a decision tree step increased sensitivity and specificity in the classification of subjects with KD and febrile controls over our previously described LDA model. A multicenter trial is needed to prospectively determine its utility as a point of care diagnostic test for KD.
View details for DOI 10.1016/j.jpeds.2016.05.060
View details for PubMedID 27344221
View details for PubMedCentralID PMC5003696
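The subject counts in the two-step KD abstract above can be cross-checked against each other: the three first-step subgroups should add up to all enrolled subjects, and the subjects still indeterminate after the second step should equal the first-step indeterminate group minus those the decision trees resolved. The sketch below performs that bookkeeping. All counts come from the abstract; the variable names are assumptions.

```python
# Enrolment reported in the abstract.
kd_subjects = 801
febrile_controls = 479

# Step 1 (LDA) subgroup sizes.
step1 = {"febrile control": 276, "indeterminate": 247, "KD": 757}

# Step 2 (decision trees) resolves part of the indeterminate group.
step2_resolved = {"febrile control": 103, "KD": 58}

# Subjects still indeterminate after both steps, by true diagnosis.
still_indeterminate = {"KD": 29, "febrile control": 57}

total_subjects = kd_subjects + febrile_controls
remaining = step1["indeterminate"] - sum(step2_resolved.values())

print(total_subjects == sum(step1.values()))            # step 1 covers everyone
print(remaining == sum(still_indeterminate.values()))   # step 2 bookkeeping agrees

kd_indeterminate_pct = round(100.0 * still_indeterminate["KD"] / kd_subjects, 1)
fc_indeterminate_pct = round(
    100.0 * still_indeterminate["febrile control"] / febrile_controls, 1
)
print(kd_indeterminate_pct, fc_indeterminate_pct)
```

Both checks hold, and the two percentages match the 3.6% and 11.9% quoted in the abstract.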
Prehypertension During Normotensive Pregnancy and Postpartum Clustering of Cardiometabolic Risk Factors: A Prospective Cohort Study.
2016; 68 (2): 455-463
The nonstratification of blood pressure (BP) levels may underestimate future cardiovascular risk in pregnant women who present with BP levels in the range of prehypertension (120-139/80-89 mm Hg). We prospectively evaluated the relationship between multiple antepartum BP measurements (from 11(+0) to 13(+6) weeks' gestation to term) and the occurrence of postpartum metabolic syndrome in 507 normotensive pregnant women after a live birth. By using latent class growth modeling, we identified the following 3 distinctive diastolic BP (DBP) trajectory groups: the low-J-shaped group (34.2%; DBP from 62.5±5.8 to 65.0±6.8 mm Hg), the moderate-U-shaped group (52.6%; DBP from 71.0±5.9 to 69.8±6.2 mm Hg), and the elevated-J-shaped group (13.2%; DBP from 76.2±6.7 to 81.8±4.8 mm Hg). Notably, the elevated-J-shaped trajectory group had mean DBP and systolic BP levels within the range of prehypertension from 37(+0) and 26(+0) weeks of pregnancy, respectively. Among the 309 women who completed the ≈1.6 years of postpartum follow-up, the women in the elevated-J-shaped group had greater odds of developing postpartum metabolic syndrome (adjusted odds ratio, 6.55; 95% confidence interval, 1.79-23.92; P=0.004) than the low-J-shaped group. Moreover, a parsimonious model incorporating DBP (membership in the elevated-J-shaped group but not in the DBP prehypertension group as identified by a single measurement) and elevated levels of fasting glucose (>4.99 mmol/L) and triglycerides (>3.14 mmol/L) at term was developed, with good discrimination and calibration for postpartum metabolic syndrome (c-statistic, 0.764; 95% confidence interval, 0.674-0.855; P<0.001). Therefore, prehypertension identified by DBP trajectories throughout pregnancy is an independent risk factor for predicting postpartum metabolic syndrome in normotensive pregnant women.
View details for DOI 10.1161/HYPERTENSIONAHA.116.07261
View details for PubMedID 27354425
A Novel Truncated Form of Serum Amyloid A in Kawasaki Disease
2016; 11 (6)
Kawasaki disease (KD) is an acute vasculitis in children that can cause coronary artery abnormalities. Its diagnosis is challenging, and many cytokines, chemokines, acute phase reactants, and growth factors have failed evaluation as specific biomarkers to distinguish KD from other febrile illnesses. We performed protein profiling, comparing plasma from children with KD with febrile control (FC) subjects to determine if there were specific proteins or peptides that could distinguish the two clinical states. Plasma from three independent cohorts from the blood of 68 KD and 61 FC subjects was fractionated by anion exchange chromatography, followed by surface-enhanced laser desorption ionization (SELDI) mass spectrometry of the fractions. The mass spectra of KD and FC plasma samples were analyzed for peaks that were statistically significantly different. A mass spectrometry peak with a mass of 7,860 Da had high intensity in acute KD subjects compared to subacute KD (p = 0.0003) and FC (p = 7.9 x 10^-10) subjects. We identified this peak as a novel truncated form of serum amyloid A with N-terminal at Lys-34 of the circulating form and validated its identity using a hybrid mass spectrum immunoassay technique. The truncated form of serum amyloid A was present in plasma of KD subjects when blood was collected in tubes containing protease inhibitors. This peak disappeared when the patients were examined after their symptoms resolved. Intensities of this peptide did not correlate with KD-associated laboratory values or with other mass spectrum peaks from the plasma of these KD subjects. Using SELDI mass spectrometry, we have discovered a novel truncated form of serum amyloid A that is elevated in the plasma of KD when compared with FC subjects. Future studies will evaluate its relevance as a diagnostic biomarker and its potential role in the pathophysiology of KD.
View details for DOI 10.1371/journal.pone.0157024
View details for Web of Science ID 000377560200029
View details for PubMedID 27271757
View details for PubMedCentralID PMC4894573
Exploring the Role of Polycythemia in Patients With Cyanosis After Palliative Congenital Heart Surgery.
Pediatric critical care medicine
2016; 17 (3): 216-222
To understand the relationship between polycythemia and clinical outcome in patients with hypoplastic left heart syndrome following the Norwood operation. A retrospective, single-center cohort study. Pediatric cardiovascular ICU, university-affiliated children's hospital. Infants with hypoplastic left heart syndrome admitted to our medical center from September 2009 to December 2012 undergoing stage 1/Norwood operation. None. Baseline demographic and clinical information including first recorded postoperative hematocrit and subsequent mean, median, and nadir hematocrits during the first 72 hours postoperatively were recorded. The primary outcomes were in-hospital mortality and length of hospitalization. Thirty-two patients were included in the analysis. Patients did not differ by operative factors (cardiopulmonary bypass time and cross-clamp time) or traditional markers of severity of illness (vasoactive inotrope score, lactate, saturation, and PaO2/FIO2 ratio). Early polycythemia (hematocrit value > 49%) was associated with longer cardiovascular ICU stay (51.0 [± 38.6] vs 21.4 [± 16.2] d; p < 0.01) and total hospital length of stay (65.0 [± 46.5] vs 36.1 [± 20.0] d; p = 0.03). In a multivariable analysis, polycythemia remained independently associated with the length of hospitalization after controlling for the amount of RBC transfusion (weight, 4.36 [95% CI, 1.35-7.37]; p < 0.01). No difference in in-hospital mortality rates was detected between the two groups (17.6% vs 20%). Early polycythemia following the Norwood operation is associated with longer length of hospitalization even after controlling for blood cell transfusion practices. We hypothesize that polycythemia may be caused by hemoconcentration and used as an early marker of capillary leak syndrome.
View details for DOI 10.1097/PCC.0000000000000654
View details for PubMedID 26825044
Urinary Colorimetric Sensor Array and Algorithm to Distinguish Kawasaki Disease from Other Febrile Illnesses
2016; 11 (2)
Kawasaki disease (KD) is an acute pediatric vasculitis of infants and young children with unknown etiology and no specific laboratory-based test to identify it. A specific molecular diagnostic test is urgently needed to support the clinical decision of proper medical intervention, preventing subsequent complications of coronary artery aneurysms. We used a simple and low-cost colorimetric sensor array to address the lack of a specific diagnostic test to differentiate KD from febrile control (FC) patients with similar rash/fever illnesses. Demographic and clinical data were prospectively collected for subjects with KD and FCs under standard protocol. After screening using a genetic algorithm, eleven compounds, drawn from the metalloporphyrin, pH indicator, redox indicator, and solvatochromic dye categories, were selected from our chromatic compound library (n = 190) to construct a colorimetric sensor array for diagnosing KD. Quantitative color difference analysis led to a decision-tree-based KD diagnostic algorithm. This KD sensing array allowed the identification of 94% of KD subjects (receiver operating characteristic [ROC] area under the curve [AUC] 0.981) in the training set (33 KD, 33 FC) and 94% of KD subjects (ROC AUC: 0.873) in the testing set (16 KD, 17 FC). Color difference maps reconstructed from the digital images of the sensing compounds demonstrated distinctive patterns differentiating KD from FC patients. The colorimetric sensor array, composed of commonly used chemical compounds, is an easily accessible, low-cost method to discriminate subjects with KD from those with other febrile illnesses.
View details for DOI 10.1371/journal.pone.0146733
View details for Web of Science ID 000370038400003
View details for PubMedID 26859297
View details for PubMedCentralID PMC4747548
Precision test for precision medicine: opportunities, challenges and perspectives regarding pre-eclampsia as an intervention window for future cardiovascular disease.
American journal of translational research
2016; 8 (5): 1920–34
Hypertensive disorders of pregnancy (HDP) comprise a spectrum of syndromes that range in severity from gestational hypertension and pre-eclampsia (PE) to eclampsia, as well as chronic hypertension and chronic hypertension with superimposed PE. HDP occur in 2% to 10% of pregnant women worldwide, and impose a substantial burden on maternal and fetal/infant health. Cardiovascular disease (CVD) is the leading cause of death in women. The high prevalence of non-obstructive coronary artery disease and the lack of an efficient diagnostic workup make the identification of CVD in women challenging. Accumulating evidence suggests that a previous history of PE is consistently associated with future CVD risk. Moreover, PE as a maladaptation to pregnancy-induced hemodynamic and metabolic stress may also be regarded as a "precision" testing result that predicts future cardiovascular risk. Therefore, the development of PE provides a tremendous, early opportunity that may lead to changes in maternal and infant future well-being. However, the underlying pathogenesis of PE is not precise, which warrants precision medicine-based approaches to establish a more precise definition and reclassification. In this review, we proposed a stage-specific, PE-targeted algorithm, which may provide novel hypotheses that bridge the gap between Big Data-generating approaches and clinical translational research in terms of PE prediction and prevention, clinical treatment, and long-term CVD management.
View details for PubMedID 27347303
Prospective stratification of patients at risk for emergency department revisit: resource utilization and population management strategy implications.
BMC emergency medicine
2016; 16 (1): 10-?
Estimating patient risk of future emergency department (ED) revisits can guide the allocation of resources, e.g. local primary care and/or specialty, to better manage ED high utilization patient populations and thereby improve patient quality of life. We set out to develop and validate a method to estimate patient ED revisit risk in the subsequent 6 months from an ED discharge date. An ensemble decision-tree-based model with Electronic Medical Record (EMR) encounter data from HealthInfoNet (HIN), Maine's Health Information Exchange (HIE), was developed and validated, assessing patient risk for a subsequent 6 month return ED visit based on the ED encounter-associated demographic and EMR clinical history data. A retrospective cohort of 293,461 ED encounters that occurred between January 1, 2012 and December 31, 2012, was assembled with the associated patients' 1-year clinical histories before the ED discharge date, for model training and calibration purposes. To validate, a prospective cohort of 193,886 ED encounters that occurred between January 1, 2013 and June 30, 2013 was constructed. Statistical learning that was utilized to construct the prediction model identified 152 variables that included the following data domains: demographics groups (12), different encounter history (104), care facilities (12), primary and secondary diagnoses (10), primary and secondary procedures (2), chronic disease condition (1), laboratory test results (2), and outpatient prescription medications (9). The c-statistics for the retrospective and prospective cohorts were 0.742 and 0.730 respectively. Total medical expense and ED utilization by risk score 6 months after the discharge were analyzed.
Cluster analysis identified discrete subpopulations of high-risk patients with distinctive resource utilization patterns, suggesting the need for diversified care management strategies. Integration of our method into the HIN secure statewide data system in real time prospectively validated its performance. It promises to provide increased opportunity for high ED utilization identification, and optimized resource and population management.
View details for DOI 10.1186/s12873-016-0074-5
View details for PubMedID 26842066
View details for PubMedCentralID PMC4739399
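The ED revisit abstract above lists how the model's 152 variables are distributed across data domains. A quick tally, sketched below, confirms that the per-domain counts add up to the stated total. The domain labels are copied from the abstract; the dictionary and variable names are illustrative assumptions.

```python
# Per-domain variable counts as listed in the abstract.
VARIABLES_BY_DOMAIN = {
    "demographics groups": 12,
    "different encounter history": 104,
    "care facilities": 12,
    "primary and secondary diagnoses": 10,
    "primary and secondary procedures": 2,
    "chronic disease condition": 1,
    "laboratory test results": 2,
    "outpatient prescription medications": 9,
}

REPORTED_TOTAL = 152  # total number of variables stated in the abstract

computed_total = sum(VARIABLES_BY_DOMAIN.values())
largest_domain = max(VARIABLES_BY_DOMAIN, key=VARIABLES_BY_DOMAIN.get)

print(computed_total == REPORTED_TOTAL)
print(largest_domain, VARIABLES_BY_DOMAIN[largest_domain])
```

The tally matches the reported total, and encounter history accounts for the majority of the selected variables (104 of 152).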
NLP based congestive heart failure case finding: A prospective analysis on statewide electronic medical records.
International journal of medical informatics
2015; 84 (12): 1039-1047
In order to proactively manage congestive heart failure (CHF) patients, an effective CHF case finding algorithm is required to process both structured and unstructured electronic medical records (EMR) to allow complementary and cost-efficient identification of CHF patients. We set out to identify CHF cases from both EMR codified and natural language processing (NLP) found cases. Using narrative clinical notes from all Maine Health Information Exchange (HIE) patients, the NLP case finding algorithm was retrospectively (July 1, 2012-June 30, 2013) developed with a random subset of HIE associated facilities, and blind-tested with the remaining facilities. The NLP based method was integrated into a live HIE population exploration system and validated prospectively (July 1, 2013-June 30, 2014). A total of 18,295 codified CHF patients were included in Maine HIE. Among the 253,803 subjects without CHF codings, our case finding algorithm prospectively identified 2411 uncodified CHF cases. The positive predictive value (PPV) is 0.914, and 70.1% of these 2411 cases were found to have CHF histories in the clinical notes. A CHF case finding algorithm was developed, tested and prospectively validated. The successful integration of the CHF case finding algorithm into the Maine HIE live system is expected to improve CHF care in Maine.
View details for DOI 10.1016/j.ijmedinf.2015.06.007
View details for PubMedID 26254876
Exploring Value in Congenital Heart Disease: An Evaluation of Inpatient Admissions.
Congenital heart disease
2015; 10 (6): E278-87
Understanding value provides an important context for improvement. However, most health care models fail to measure value. Our objective was to categorize inpatient encounters within an academic congenital heart program based on clinical outcome and the cost to achieve the outcome (value). We aimed to describe clinical and nonclinical features associated with value. We defined hospital encounters based on outcome per resource utilized. We performed principal component and cluster analysis to classify encounters based on mortality, length of stay, hospital cost and revenue into six classes. We used nearest shrunken centroid to identify discriminant features associated with the cluster-derived classes. These features underwent hierarchical clustering and multivariate analysis to identify features associated with each class. We analyzed all patients admitted to an academic congenital heart program between September 1, 2009, and December 31, 2012. A total of 2658 encounters occurred during the study period. Six classes were categorized by value. Low-performing value classes were associated with greater institutional reward; however, encounters with higher-performing value were associated with a loss in profitability. Encounters that included insertion of a pediatric ventricular assist device (log OR 2.5 [95% CI, 1.78 to 3.43]) and acquisition of a hospital-acquired infection (log OR 1.42 [95% CI, 0.99 to 1.87]) were risk factors for inferior health care value. Among the patients in our study, institutional reward was not associated with value. We describe a framework to target quality improvement and resource management efforts that can benefit patients, institutions, and payers alike.
View details for DOI 10.1111/chd.12290
View details for PubMedID 26219731
Novel data-mining approach identifies biomarkers for diagnosis of Kawasaki disease
2015; 78 (5): 547-553
Because Kawasaki disease (KD) shares many clinical features with other, more common febrile illnesses, and because misdiagnosis delays treatment and thereby increases the risk of coronary artery damage, a diagnostic test for KD is urgently needed. We sought to develop a panel of biomarkers that could distinguish between acute KD patients and febrile controls (FC) with sufficient accuracy to be clinically useful. Plasma samples were collected from three independent cohorts of FC and acute KD patients who met the American Heart Association definition for KD and presented within the first 10 days of fever. The levels of 88 biomarkers associated with inflammation were assessed by Luminex bead technology. Unsupervised clustering followed by supervised clustering using a Random Forest model was used to find a panel of candidate biomarkers. A panel of biomarkers commonly available in the hospital laboratory (absolute neutrophil count, erythrocyte sedimentation rate, alanine aminotransferase, γ-glutamyl transferase, concentrations of α-1-antitrypsin, C-reactive protein, and fibrinogen, and platelet count) accurately diagnosed 81-96% of KD patients in a series of three independent cohorts. After prospective validation, this eight-biomarker panel may improve the recognition of KD.
View details for DOI 10.1038/pr.2015.137
View details for Web of Science ID 000363601700011
View details for PubMedID 26237629
Serological Targeted Analysis of an ITIH4 Peptide Isoform: A Preterm Birth Biomarker and Its Associated SNP Implications
JOURNAL OF GENETICS AND GENOMICS
2015; 42 (9): 507-510
Online Prediction of Health Care Utilization in the Next Six Months Based on Electronic Health Record Information: A Cohort and Validation Study
JOURNAL OF MEDICAL INTERNET RESEARCH
2015; 17 (9)
The increasing rate of health care expenditures in the United States has placed a significant burden on the nation's economy. Predicting future health care utilization of patients can provide useful information to better understand and manage overall health care deliveries and clinical resource allocation. This study developed an electronic medical record (EMR)-based online risk model predictive of resource utilization for patients in Maine in the next 6 months across all payers, all diseases, and all demographic groups. In HealthInfoNet, Maine's health information exchange (HIE), a retrospective cohort of 1,273,114 patients was constructed with the preceding 12-month EMR. Each patient's next 6-month (between January 1, 2013 and June 30, 2013) health care resource utilization was retrospectively scored ranging from 0 to 100 and a decision tree-based predictive model was developed. Our model was later integrated in the Maine HIE population exploration system to allow a prospective validation analysis of 1,358,153 patients by forecasting their next 6-month risk of resource utilization between July 1, 2013 and December 31, 2013. Prospectively predicted risks, on either an individual level or a population (per 1000 patients) level, were consistent with the next 6-month resource utilization distributions and the clinical patterns at the population level. Results demonstrated a strong correlation between care resource utilization and our risk scores, supporting the effectiveness of our model. With the online population risk monitoring enterprise dashboards, the effectiveness of the predictive algorithm has been validated by clinicians and caregivers in the State of Maine. The model and associated online applications were designed for tracking the evolving nature of total population risk, in a longitudinal manner, for health care resource utilization. It will enable more effective care management strategies driving improved patient outcomes.
View details for DOI 10.2196/jmir.4976
View details for Web of Science ID 000361809800005
View details for PubMedID 26395541
Cerebrospinal fluid protein dynamic driver network: At the crossroads of brain tumorigenesis
2015; 83: 36-43
To get a better understanding of the ongoing in situ environmental changes preceding brain tumorigenesis, we assessed cerebrospinal fluid (CSF) proteome profile changes in a glioma rat model in which brain tumor invariably developed after a single in utero exposure to the neurocarcinogen ethylnitrosourea (ENU). Computationally, the CSF proteome profile dynamics during the tumorigenesis can be modeled as non-smooth or even abrupt state changes. Such brain tumor environment transition analysis, correlating the CSF composition changes with the development of early cellular hyperplasia, can reveal the pathogenesis process at network level during a time before the image detection of the tumors. In our controlled rat model study, matched ENU- and saline-exposed rats' CSF proteomics changes were quantified at approximately 30, 60, 90, 120, 150 days of age (P30, P60, P90, P120, P150). We applied our transition-based network entropy (TNE) method to compute the CSF proteome changes in the ENU rat model and test the hypothesis of the critical transition state prior to impending hyperplasia. Our analysis identified a dynamic driver network (DDN) of CSF proteins related to the emerging tumorigenesis progressing from the non-hyperplasia state. The DDN associated leading network CSF proteins can allow the early detection of such dynamics before the catastrophic shift to the clear clinical landmarks in gliomas. Future characterization of the critical transition state (P60) during the brain tumor progression may reveal the underlying pathophysiology to devise novel therapeutics preventing tumor formation. More detailed method and information are accessible through our website at http://translationalmedicine.stanford.edu.
View details for DOI 10.1016/j.ymeth.2015.05.004
View details for Web of Science ID 000358755100005
Utility of Clinical Biomarkers to Predict Central Line-associated Bloodstream Infections After Congenital Heart Surgery.
Pediatric infectious disease journal
2015; 34 (3): 251-254
Central line-associated bloodstream infections (CLABSI) are an important contributor to morbidity and mortality in children recovering from congenital heart surgery. The reliability of commonly used biomarkers to differentiate these patients has not been specifically studied. This was a retrospective cohort study in a university-affiliated children's hospital examining all patients with congenital or acquired heart disease admitted to the cardiovascular intensive care unit following cardiac surgery who underwent evaluation for a catheter-associated bloodstream infection. Among 1260 cardiac surgeries performed, 451 encounters underwent an infection evaluation post-operatively. Twenty-five instances of CLABSI and 227 instances of a negative infection evaluation were the subject of analysis. Patients with CLABSI tended to be younger (1.34 vs 4.56 years, p = 0.011) and underwent more complex surgery (RACHS-1 score 3.79 vs 3.04, p = 0.039). The two groups were indistinguishable in WBC, PMNs and band count at the time of their presentation. On multivariate analysis, CLABSI was associated with fever (adjusted OR 4.78; 95% CI, 1.6 to 5.8) and elevated CRP (adjusted OR 1.28; 95% CI, 1.09 to 1.68) after adjusting for differences between the two groups. Receiver operating characteristic analysis demonstrated the discriminatory power of both fever and CRP (area under curve 0.7247, 95% CI, 0.42 to 0.74 and 0.58, 95% CI 0.4208 to 0.7408). We calculated multilevel likelihood ratios for a spectrum of temperature and CRP values. We found commonly used serum biomarkers such as fever and CRP not to be helpful discriminators in patients following congenital heart surgery.
View details for DOI 10.1097/INF.0000000000000553
View details for PubMedID 25232780
Real-time web-based assessment of total population risk of future emergency department utilization: statewide prospective active case finding study.
Interactive journal of medical research
2015; 4 (1)
An easily accessible real-time Web-based utility to assess patient risks of future emergency department (ED) visits can help the health care provider guide the allocation of resources to better manage higher-risk patient populations and thereby reduce unnecessary use of EDs. Our main objective was to develop a Health Information Exchange-based, next 6-month ED risk surveillance system in the state of Maine. Data on electronic medical record (EMR) encounters integrated by HealthInfoNet (HIN), Maine's Health Information Exchange, were used to develop the Web-based surveillance system for a population ED future 6-month risk prediction. To model, a retrospective cohort of 829,641 patients with comprehensive clinical histories from January 1 to December 31, 2012 was used for training and then tested with a prospective cohort of 875,979 patients from July 1, 2012, to June 30, 2013. The multivariate statistical analysis identified 101 variables predictive of future defined 6-month risk of ED visit: 4 age groups, history of 8 different encounter types, history of 17 primary and 8 secondary diagnoses, 8 specific chronic diseases, 28 laboratory test results, history of 3 radiographic tests, and history of 25 outpatient prescription medications. The c-statistics for the retrospective and prospective cohorts were 0.739 and 0.732 respectively. Integration of our method into the HIN secure statewide data system in real time prospectively validated its performance. Cluster analysis in both the retrospective and prospective analyses revealed discrete subpopulations of high-risk patients, grouped around multiple "anchoring" demographics and chronic conditions.
With the Web-based population risk-monitoring enterprise dashboards, the effectiveness of the active case finding algorithm has been validated by clinicians and caregivers in Maine. The active case finding model and associated real-time Web-based app were designed to track the evolving nature of total population risk, in a longitudinal manner, for ED visits across all payers, all diseases, and all age groups. Therefore, providers can implement targeted care management strategies to the patient subgroups with similar patterns of clinical histories, driving the delivery of more efficient and effective health care interventions. To the best of our knowledge, this prospectively validated EMR-based, Web-based tool is the first one to allow real-time total population risk assessment for statewide ED visits.
View details for DOI 10.2196/ijmr.4022
View details for PubMedID 25586600
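Several abstracts in this list report a c-statistic (0.739 and 0.732 in the entry above). As a rough illustration of what that number measures, the sketch below computes it as the fraction of (event, non-event) patient pairs in which the patient with the event received the higher risk score. The function name and the toy scores are hypothetical and are not taken from the study; this is not the authors' implementation.

```python
from itertools import product


def c_statistic(scores, labels):
    """Concordance statistic: share of (event, non-event) pairs where the
    event case has the higher score; tied scores count as half a pair."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data for illustration only (1 = ED visit within 6 months).
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
toy_c = c_statistic(toy_scores, toy_labels)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values should be read.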
Pilot Application of Magnetic Nanoparticle-Based Biosensor for Necrotizing Enterocolitis.
Journal of proteomics & bioinformatics
Necrotizing Enterocolitis (NEC) is a major source of neonatal morbidity and mortality. There is an ongoing need for a sensitive diagnostic instrument to discriminate NEC from neonatal sepsis. We hypothesized that magnetic nanoparticle-based biosensor analysis of gut injury-associated biomarkers would provide such an instrument. We designed a magnetic multiplexed biosensor platform, allowing the parallel plasma analysis of C-reactive protein (CRP), matrix metalloproteinase-7 (MMp7), and epithelial cell adhesion molecule (EpCAM). Neonatal subjects with sepsis (n=5) or NEC (n=10) were compared to control (n=5) subjects to perform a proof of concept pilot study for the diagnosis of NEC using our ultra-sensitive biosensor platform. Our multiplexed NEC magnetic nanoparticle-based biosensor platform was robust, ultrasensitive (Limit of detection LOD: CRP 0.6 pg/ml; MMp7 20 pg/ml; and EpCAM 20 pg/ml), and displayed no cross-reactivity among analyte reporting reagents. To gauge the diagnostic performance, a bootstrapping procedure (500 runs) was applied: MMp7 and EpCAM collectively differentiated infants with NEC from control infants with ROC AUC of 0.96, and infants with NEC from those with sepsis with ROC AUC of 1.00. The 3-marker panel comprising EpCAM, MMp7 and CRP had a corresponding ROC AUC of 0.956 and 0.975, respectively. The exploration of the multiplexed nano-biosensor platform shows promise to deliver an ultrasensitive instrument for the diagnosis of NEC in the clinical setting.
View details for PubMedID 26798207
Virtual Pharmacist: A Platform for Pharmacogenomics.
PloS one
2015; 10 (10)
Development, Validation and Deployment of a Real Time 30 Day Hospital Readmission Risk Assessment Tool in the Maine Healthcare Information Exchange.
2015; 10 (10)
Identifying patients at risk of a 30-day readmission can help providers design interventions, and provide targeted care to improve clinical effectiveness. This study developed a risk model to predict a 30-day inpatient hospital readmission for patients in Maine, across all payers, all diseases and all demographic groups. Our objective was to develop a model to determine the risk for inpatient hospital readmission within 30 days post discharge. All patients within the Maine Health Information Exchange (HIE) system were included. The model was retrospectively developed on inpatient encounters between January 1, 2012 and December 31, 2012 from 24 randomly chosen hospitals, and then prospectively validated on inpatient encounters from January 1, 2013 to December 31, 2013 using all HIE patients. A risk assessment tool partitioned the entire HIE population into subgroups that corresponded to probability of hospital readmission as determined by a corresponding positive predictive value (PPV). An overall model c-statistic of 0.72 was achieved. The total 30-day readmission rates in low (score of 0-30), intermediate (score of 30-70) and high (score of 70-100) risk groupings were 8.67%, 24.10% and 74.10%, respectively. A time to event analysis revealed that patients in the higher risk groups were readmitted to a hospital earlier than those in the lower risk groups. Six high-risk patient subgroup patterns were revealed through unsupervised clustering. Our model was successfully integrated into the statewide HIE to identify patient readmission risk upon admission and daily during hospitalization or for 30 days subsequently, providing daily risk score updates. The risk model was validated as an effective tool for predicting 30-day readmissions for patients across all payer, disease and demographic groups within the Maine HIE.
Exposing the key clinical, demographic and utilization profiles driving each patient's risk of readmission score may be useful to providers in developing individualized post discharge care plans.
View details for DOI 10.1371/journal.pone.0140271
View details for PubMedID 26448562
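The three score bands reported in the abstract above can be read as a simple lookup from a 0-100 risk score to a risk group. The sketch below is one way to express that mapping. The abstract lists the bands as 0-30, 30-70 and 70-100 without saying which band owns the boundary scores, so the half-open intervals used here are an assumption, and the function and constant names are hypothetical.

```python
# Observed 30-day readmission rates per risk group, as reported in the abstract.
REPORTED_READMISSION_RATE = {
    "low": 0.0867,
    "intermediate": 0.2410,
    "high": 0.7410,
}


def risk_group(score: float) -> str:
    """Map a 0-100 readmission risk score to a reported risk group.

    Assumption: bands are [0, 30), [30, 70) and [70, 100]; the source does
    not state how scores of exactly 30 or 70 are assigned.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    if score < 30:
        return "low"
    if score < 70:
        return "intermediate"
    return "high"


# Example: look up the reported readmission rate for a patient scored 82.
example_rate = REPORTED_READMISSION_RATE[risk_group(82)]
```

Under these assumptions a score of 82 falls in the high-risk group, for which the abstract reports a 74.10% readmission rate.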
A novel urine peptide biomarker-based algorithm for the prognosis of necrotising enterocolitis in human infants.
2014; 63 (8): 1284-1292
Necrotising enterocolitis (NEC) is a major source of neonatal morbidity and mortality. The management of infants with NEC is currently complicated by our inability to accurately identify those at risk for progression of disease prior to the development of irreversible intestinal necrosis. We hypothesised that integrated analysis of clinical parameters in combination with urine peptide biomarkers would lead to improved prognostic accuracy in the NEC population. Infants under suspicion of having NEC (n=550) were prospectively enrolled from a consortium consisting of eight university-based paediatric teaching hospitals. Twenty-seven clinical parameters were used to construct a multivariate predictor of NEC progression. Liquid chromatography/mass spectrometry was used to profile the urine peptidomes from a subset of this population (n=65) to discover novel biomarkers of NEC progression. An ensemble model for the prediction of disease progression was then created using clinical and biomarker data. The use of clinical parameters alone resulted in a receiver-operator characteristic curve with an area under the curve of 0.817 and left 40.1% of all patients in an 'indeterminate' risk group. Three validated urine peptide biomarkers (fibrinogen peptides: FGA1826, FGA1883 and FGA2659) produced a receiver-operator characteristic area under the curve of 0.856. The integration of clinical parameters with urine biomarkers in an ensemble model resulted in the correct prediction of NEC outcomes in all cases tested. Ensemble modelling combining clinical parameters with biomarker analysis dramatically improves our ability to identify the population at risk for developing progressive NEC.
View details for DOI 10.1136/gutjnl-2013-305130
View details for PubMedID 24048736
Investigation of maternal environmental exposures in association with self-reported preterm birth.
2014; 45: 1-7
Identification of maternal environmental factors influencing preterm birth risks is important to understand the reasons for the increase in prematurity since 1990. Here, we utilized a health survey, the US National Health and Nutrition Examination Survey (NHANES) to search for personal environmental factors associated with preterm birth. 201 urine and blood markers of environmental factors, such as allergens, pollutants, and nutrients were assayed in mothers (range of N: 49-724) who answered questions about any children born preterm (delivery <37 weeks). We screened each of the 201 factors for association with any child born preterm adjusting by age, race/ethnicity, education, and household income. We attempted to verify the top finding, urinary bisphenol A, in an independent study of pregnant women attending Lucile Packard Children's Hospital. We conclude that the association between maternal urinary levels of bisphenol A and preterm birth should be evaluated in a larger epidemiological investigation.
View details for DOI 10.1016/j.reprotox.2013.12.005
View details for PubMedID 24373932
Urine protein biomarkers for the diagnosis and prognosis of necrotizing enterocolitis in infants.
Journal of pediatrics
2014; 164 (3): 607-12 e1 7
To test the hypothesis that an exploratory proteomics analysis of urine proteins with subsequent development of validated urine biomarker panels would produce molecular classifiers for both the diagnosis and prognosis of infants with necrotizing enterocolitis (NEC). Urine samples were collected from 119 premature infants (85 NEC, 17 sepsis, 17 control) at the time of initial clinical concern for disease. The urine from 59 infants was used for candidate biomarker discovery by liquid chromatography/mass spectrometry. The remaining 60 samples were subjected to enzyme-linked immunosorbent assay for quantitative biomarker validation. A panel of 7 biomarkers (alpha-2-macroglobulin-like protein 1, cluster of differentiation protein 14, cystatin 3, fibrinogen alpha chain, pigment epithelium-derived factor, retinol binding protein 4, and vasolin) was identified by liquid chromatography/mass spectrometry and subsequently validated by enzyme-linked immunosorbent assay. These proteins were consistently found to be either up- or down-regulated depending on the presence, absence, or severity of disease. Biomarker panel validation resulted in a receiver-operator characteristic area under the curve of 98.2% for NEC vs sepsis and an area under the curve of 98.4% for medical NEC vs surgical NEC. We identified 7 urine proteins capable of providing highly accurate diagnostic and prognostic information for infants with suspected NEC. This work represents a novel approach to improving the efficiency with which we diagnose early NEC and identify those at risk for developing severe, or surgical, disease.
View details for DOI 10.1016/j.jpeds.2013.10.091
View details for PubMedID 24433829
Risk prediction of emergency department revisit 30 days post discharge: a prospective study.
2014; 9 (11)
Among patients who are discharged from the Emergency Department (ED), about 3% return within 30 days. Revisits can be related to the nature of the disease, medical errors, and/or inadequate diagnoses and treatment during their initial ED visit. Identification of the high-risk patient population can help devise new strategies for improved ED care with reduced ED utilization. A decision tree based model with discriminant Electronic Medical Record (EMR) features was developed and validated, estimating patient ED 30 day revisit risk. A retrospective cohort of 293,461 ED encounters from HealthInfoNet (HIN), Maine's Health Information Exchange (HIE), between January 1, 2012 and December 31, 2012, was assembled with the associated patients' demographic information and one-year clinical histories before the discharge date as the inputs. To validate, a prospective cohort of 193,886 encounters between January 1, 2013 and June 30, 2013 was constructed. The c-statistics for the retrospective and prospective predictions were 0.710 and 0.704 respectively. Clinical resource utilization, including ED use, was analyzed as a function of the ED risk score. Cluster analysis of high-risk patients identified discrete sub-populations with distinctive demographic, clinical and resource utilization patterns. Our ED 30-day revisit model was prospectively validated on the Maine State HIN secure statewide data system. Future integration of our ED predictive analytics into the ED care work flow may lead to increased opportunities for targeted care intervention to reduce ED resource burden and overall healthcare expense, and improve outcomes.
View details for DOI 10.1371/journal.pone.0112944
View details for PubMedID 25393305
A data-driven algorithm integrating clinical and laboratory features for the diagnosis and prognosis of necrotizing enterocolitis.
2014; 9 (2)
Necrotizing enterocolitis (NEC) is a major source of neonatal morbidity and mortality. Since there is no specific diagnostic test or risk of progression model available for NEC, the diagnosis and outcome prediction of NEC is made on clinical grounds. The objective in this study was to develop and validate new NEC scoring systems for automated staging and prognostic forecasting. A six-center consortium of university based pediatric teaching hospitals prospectively collected data on infants under suspicion of having NEC over a 7-year period. A database comprising 520 infants was utilized to develop the NEC diagnostic and prognostic models by dividing the entire dataset into training and testing cohorts of demographically matched subjects. Developed on the training cohort and validated on the blind testing cohort, our multivariate analyses led to NEC scoring metrics integrating clinical data. Machine learning using clinical and laboratory results at the time of clinical presentation led to two NEC models: (1) an automated diagnostic classification scheme; (2) a dynamic prognostic method for risk-stratifying patients into low, intermediate and high NEC scores to determine the risk for disease progression. We submit that dynamic risk stratification of infants with NEC will assist clinicians in determining the need for additional diagnostic testing and guide potential therapies in a dynamic manner. The models are available at http://translationalmedicine.stanford.edu/cgi-bin/NEC/index.pl and as a smartphone application upon request.
View details for DOI 10.1371/journal.pone.0089860
View details for PubMedID 24587080
View details for PubMedCentralID PMC3938509
AKI in Hospitalized Children: Epidemiology and Clinical Associations in a National Cohort.
Clinical journal of the American Society of Nephrology
2013; 8 (10): 1661-1669
Although AKI is common among hospitalized children, comprehensive epidemiologic data are lacking. This study characterizes pediatric AKI across the United States and identifies AKI risk factors using high-content/high-throughput analytic techniques. For the cross-sectional analysis of the 2009 Kids Inpatient Database, AKI events were identified using International Classification of Diseases, Ninth Revision, Clinical Modification codes. Demographics, incidence rates, and outcome data were analyzed and reported for the entire AKI cohort as well as AKI subsets. Statistical learning methods were applied to the highly imbalanced dataset to derive AKI-related risk factors. Of 2,644,263 children, 10,322 children developed AKI (3.9/1000 admissions). Although 19% of the AKI cohort was ≤1 month old, the highest incidence was seen in children 15-18 years old (6.6/1000 admissions); 49% of the AKI cohort was white, but AKI incidence was higher among African Americans (4.5 versus 3.8/1000 admissions). In-hospital mortality among patients with AKI was 15.3% but higher among children ≤1 month old (31.3% versus 10.1%, P<0.001) and children requiring critical care (32.8% versus 9.4%, P<0.001) or dialysis (27.1% versus 14.2%, P<0.001). Shock (odds ratio, 2.15; 95% confidence interval, 1.95 to 2.36), septicemia (odds ratio, 1.37; 95% confidence interval, 1.32 to 1.43), intubation/mechanical ventilation (odds ratio, 1.2; 95% confidence interval, 1.16 to 1.25), circulatory disease (odds ratio, 1.47; 95% confidence interval, 1.32 to 1.65), cardiac congenital anomalies (odds ratio, 1.2; 95% confidence interval, 1.13 to 1.23), and extracorporeal support (odds ratio, 2.58; 95% confidence interval, 2.04 to 3.26) were associated with AKI. AKI occurs in 3.9/1000 at-risk US pediatric hospitalizations. Mortality is highest among neonates and children requiring critical care or dialysis.
Identified risk factors suggest that AKI occurs in association with systemic/multiorgan disease more commonly than primary renal disease.
View details for DOI 10.2215/CJN.00270113
View details for PubMedID 23833312
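The headline incidence in the abstract above (3.9 AKI cases per 1000 admissions) follows directly from the two counts it reports. The short sketch below reproduces that arithmetic; the function name is hypothetical, and only the two counts come from the abstract.

```python
def rate_per_1000(events: int, population: int) -> float:
    """Events per 1000 members of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * events / population


# Counts reported in the abstract: 10,322 AKI cases among 2,644,263 children.
aki_rate = rate_per_1000(10_322, 2_644_263)
aki_rate_rounded = round(aki_rate, 1)
```

Rounded to one decimal place, the result agrees with the 3.9/1000 figure quoted in the abstract.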
Integrating multiple 'omics' analyses identifies serological protein biomarkers for preeclampsia.
2013; 11: 236-?
Preeclampsia (PE) is a pregnancy-related vascular disorder which is the leading cause of maternal morbidity and mortality. We sought to identify novel serological protein markers to diagnose PE with a multi-'omics' based discovery approach. Seven previous placental expression studies were combined for a multiplex analysis, and in parallel, two-dimensional gel electrophoresis was performed to compare serum proteomes in PE and control subjects. The combined biomarker candidates were validated with available ELISA assays using gestational age-matched PE (n=32) and control (n=32) samples. With the validated biomarkers, a genetic algorithm was then used to construct and optimize biomarker panels in PE assessment. In addition to the previously identified biomarkers, the angiogenic and antiangiogenic factors (soluble fms-like tyrosine kinase (sFlt-1) and placental growth factor (PIGF)), we found 3 up-regulated and 6 down-regulated biomarkers in PE sera. Two optimal biomarker panels were developed for early and late onset PE assessment, respectively. Both early and late onset PE diagnostic panels, constructed with our PE biomarkers, were superior to the sFlt-1/PIGF ratio in PE discrimination. The functional significance of these PE biomarkers and their associated pathways was analyzed, which may provide new insights into the pathogenesis of PE.
View details for DOI 10.1186/1741-7015-11-236
View details for PubMedID 24195779
View details for PubMedCentralID PMC4226208
Cloud-based solution to identify statistically significant MS peaks differentiating sample categories.
BMC research notes
2013; 6: 109-?
Mass spectrometry (MS) has evolved to become the primary high throughput tool for proteomics based biomarker discovery. Multiple challenges in protein MS data analysis remain: large-scale and complex data set management; MS peak identification and indexing; and high dimensional peak differential analysis with concurrent statistical test-based false discovery rate (FDR) control. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets to identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The presented web application supports online uploading and analysis of large scale MS data with a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
View details for DOI 10.1186/1756-0500-6-109
View details for PubMedID 23522030
View details for PubMedCentralID PMC3621609
Point-of-Care Differentiation of Kawasaki Disease from Other Febrile Illnesses
JOURNAL OF PEDIATRICS
2013; 162 (1): 183-U219
To test whether statistical learning on clinical and laboratory test patterns would lead to an algorithm for Kawasaki disease (KD) diagnosis that could aid clinicians. Demographic, clinical, and laboratory data were prospectively collected for subjects with KD and febrile controls (FCs) using a standardized data collection form. Our multivariate models were trained with a cohort of 276 patients with KD and 243 FCs (who shared some features of KD) and validated with a cohort of 136 patients with KD and 121 FCs using either clinical data, laboratory test results, or their combination. Our KD scoring method stratified the subjects into subgroups with low (FC diagnosis, negative predictive value >95%), intermediate, and high (KD diagnosis, positive predictive value >95%) scores. Combining both clinical and laboratory test results, the algorithm diagnosed 81.2% of all training and 74.3% of all testing patients with KD in the high score group and 67.5% of all training and 62.8% of all testing FCs in the low score group. Our KD scoring metric and the associated data system with online (http://translationalmedicine.stanford.edu/cgi-bin/KD/kd.pl) and smartphone applications are easily accessible, inexpensive tools to improve the differentiation of most children with KD from FCs with other pediatric illnesses.
View details for DOI 10.1016/j.jpeds.2012.06.012
View details for Web of Science ID 000312915900040
View details for PubMedID 22819274
Peptidomic Identification of Serum Peptides Diagnosing Preeclampsia.
2013; 8 (6): e65571
We sought to identify serological markers capable of diagnosing preeclampsia (PE). We performed serum peptide analysis (liquid chromatography mass spectrometry) of 62 unique samples from 31 PE patients and 31 healthy pregnant controls, with two-thirds used as a training set and the other third as a testing set. Differential serum peptide profiling identified 52 significant serum peptides, and a 19-peptide panel collectively discriminating PE in training sets (n = 21 PE, n = 21 control; specificity = 85.7% and sensitivity = 100%) and testing sets (n = 10 PE, n = 10 control; specificity = 80% and sensitivity = 100%). The panel peptides were derived from 6 different protein precursors: 13 from fibrinogen alpha (FGA), 1 from alpha-1-antitrypsin (A1AT), 1 from apolipoprotein L1 (APO-L1), 1 from inter-alpha-trypsin inhibitor heavy chain H4 (ITIH4), 2 from kininogen-1 (KNG1), and 1 from thymosin beta-4 (TMSB4). We concluded that serum peptides can accurately discriminate active PE. Measurement of a 19-peptide panel could be performed quickly and in a quantitative mass spectrometric platform available in clinical laboratories. This serum peptide panel quantification could provide clinical utility in predicting PE or differential diagnosis of PE from confounding chronic hypertension.
View details for DOI 10.1371/journal.pone.0065571
View details for PubMedID 23840341
View details for PubMedCentralID PMC3686758
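The sensitivity and specificity figures in the abstract above can be reproduced from simple confusion-matrix counts. The sketch below does so for the 19-peptide panel. The abstract reports only group sizes and percentages, so the true/false positive and negative counts used here (for example 18 of 21 training controls correctly classified) are inferred from those percentages and should be read as an assumption; the function names are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of diseased subjects correctly identified."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of control subjects correctly identified."""
    return tn / (tn + fp)


# Counts inferred from the reported percentages (assumption, not source data).
training = {"tp": 21, "fn": 0, "tn": 18, "fp": 3}  # n = 21 PE, n = 21 control
testing = {"tp": 10, "fn": 0, "tn": 8, "fp": 2}    # n = 10 PE, n = 10 control

train_sens = 100 * sensitivity(training["tp"], training["fn"])
train_spec = 100 * specificity(training["tn"], training["fp"])
test_sens = 100 * sensitivity(testing["tp"], testing["fn"])
test_spec = 100 * specificity(testing["tn"], testing["fp"])
```

With these inferred counts the training specificity rounds to 85.7% and the testing specificity to 80%, matching the abstract, which also illustrates how coarse the estimates are with 10 to 21 subjects per group.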
Correlation analyses of clinical and molecular findings identify candidate biological pathways in systemic juvenile idiopathic arthritis
Clinicians have long appreciated the distinct phenotype of systemic juvenile idiopathic arthritis (SJIA) compared to polyarticular juvenile idiopathic arthritis (POLY). We hypothesized that gene expression profiles of peripheral blood mononuclear cells (PBMC) from children with each disease would reveal distinct biological pathways when analyzed for significant associations with elevations in two markers of JIA activity, erythrocyte sedimentation rate (ESR) and number of affected joints (joint count, JC). PBMC RNA from SJIA and POLY patients was profiled by kinetic PCR to analyze expression of 181 genes, selected for relevance to immune response pathways. Pearson correlation and Student's t-test analyses were performed to identify transcripts significantly associated with clinical parameters (ESR and JC) in SJIA or POLY samples. These transcripts were used to find related biological pathways. Combining Pearson and t-test analyses, we found 91 ESR-related and 92 JC-related genes in SJIA. For POLY, 20 ESR-related and 0 JC-related genes were found. Using Ingenuity Systems Pathways Analysis, we identified SJIA ESR-related and JC-related pathways. The two sets of pathways are strongly correlated. In contrast, there is a weaker correlation between SJIA and POLY ESR-related pathways. Notably, distinct biological processes were found to correlate with JC in samples from the earlier systemic plus arthritic phase (SAF) of SJIA compared to samples from the later arthritis-predominant phase (AF). Within the SJIA SAF group, IL-10 expression was related to JC, whereas lack of IL-4 appeared to characterize the chronic arthritis (AF) subgroup. The strong correlation between pathways implicated in elevations of both ESR and JC in SJIA argues that the systemic and arthritic components of the disease are related mechanistically. Inflammatory pathways in SJIA are distinct from those in POLY course JIA, consistent with differences in clinically appreciated target organs.
The limited number of ESR-related SJIA genes that also are associated with elevations of ESR in POLY implies that the SJIA associations are specific for SJIA, at least to some degree. The distinct pathways associated with arthritis in early and late SJIA raise the possibility that different immunobiology underlies arthritis over the course of SJIA.
View details for DOI 10.1186/1741-7015-10-125
View details for Web of Science ID 000312394300001
View details for PubMedID 23092393
View details for PubMedCentralID PMC3523070
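The Pearson correlation step named in the abstract above can be illustrated with a minimal sketch. The function name `pearson_r` and the five expression and ESR values are hypothetical, made up only to show the arithmetic; they are not data from the study, and the study's own software is not described here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: one transcript's expression level and the ESR
# measured in the same five samples (illustrative only).
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
esr = [12.0, 15.0, 21.0, 24.0, 33.0]

r = pearson_r(expression, esr)
print(f"Pearson r = {r:.3f}")
```

In an analysis of this kind, one such coefficient would be computed per transcript and clinical parameter, and the resulting significance values would then need correction for multiple testing.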
Proteomic studies in breast cancer (Review).
2012; 3 (4): 735–43
Breast cancer is one of the most common types of invasive cancer in females worldwide. Despite major advances in early cancer detection and emerging therapeutic strategies, further improvement has to be achieved for precise diagnosis to reduce the chance of metastasis and relapses. Recent proteomic technologies have offered a promising opportunity for the identification of new breast cancer biomarkers. Matrix-assisted laser desorption/ionization, time-of-flight mass spectrometry (MALDI-TOF MS) and the derived surface-enhanced laser desorption/ionization mass spectrometry (SELDI-TOF MS) enable the development of high-throughput proteome analysis based on comprehensive reliable biomarkers. In this review, we examined proteomic technologies and their applications, and provided focus on the proteomics-based profiling analyses of tumor tissues/cells in order to identify and confirm novel biomarkers of breast cancer.
View details for DOI 10.3892/ol.2012.573
View details for PubMedID 22740985
View details for PubMedCentralID PMC3362396
A diagnostic algorithm combining clinical and molecular data distinguishes Kawasaki disease from other febrile illnesses
Kawasaki disease is an acute vasculitis of infants and young children that is recognized through a constellation of clinical signs that can mimic other benign conditions of childhood. The etiology remains unknown and there is no specific laboratory-based test to identify patients with Kawasaki disease. Treatment to prevent the complication of coronary artery aneurysms is most effective if administered early in the course of the illness. We sought to develop a diagnostic algorithm to help clinicians distinguish Kawasaki disease patients from febrile controls to allow timely initiation of treatment. Urine peptidome profiling and whole blood cell type-specific gene expression analyses were integrated with clinical multivariate analysis to improve differentiation of Kawasaki disease subjects from febrile controls. Comparative analyses of multidimensional protein identification using 23 pooled Kawasaki disease and 23 pooled febrile control urine peptide samples revealed 139 candidate markers, of which 13 were confirmed (area under the receiver operating characteristic curve, ROC AUC, 0.919) in an independent cohort of 30 Kawasaki disease and 30 febrile control urine peptidomes. Cell type-specific analysis of microarrays (csSAM) on 26 Kawasaki disease and 13 febrile control whole blood samples revealed a 32-lymphocyte-specific-gene panel (ROC AUC 0.969). The integration of the urine/blood based biomarker panels and a multivariate analysis of 7 clinical parameters (ROC AUC 0.803) effectively stratified 441 Kawasaki disease and 342 febrile control subjects to diagnose Kawasaki disease. A hybrid approach using a multi-step diagnostic algorithm integrating both clinical and molecular findings was successful in differentiating children with acute Kawasaki disease from febrile controls.
View details for DOI 10.1186/1741-7015-9-130
View details for Web of Science ID 000298862200001
View details for PubMedID 22145762
View details for PubMedCentralID PMC3251532
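The ROC AUC values reported in the abstract above summarize how well a score separates cases from controls. As a minimal sketch, the AUC can be computed as the fraction of case/control pairs in which the case scores higher, with ties counting one half. The function name `roc_auc` and the scores below are hypothetical illustrations, not values from the study.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a positive outranks a negative.

    Every positive score is compared with every negative score;
    a tie contributes half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical classifier scores (illustrative only).
case_scores = [0.9, 0.8, 0.4]      # e.g. disease samples
control_scores = [0.7, 0.3, 0.2]   # e.g. febrile controls

auc = roc_auc(case_scores, control_scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of tens or hundreds; a rank-based formulation gives the same value more efficiently for large cohorts.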
- Proteomics and Biomarkers in Neonatology NeoReviews 2011; 12: 585-91
Plasma profiles in active systemic juvenile idiopathic arthritis: Biomarkers and biological implications
2010; 10 (24): 4415-4430
Systemic juvenile idiopathic arthritis (SJIA) is a chronic arthritis of children characterized by a combination of arthritis and systemic inflammation. There is usually non-specific laboratory evidence of inflammation at diagnosis but no diagnostic test. Normalized volumes from 89/889 2-D protein spots representing 26 proteins revealed a plasma pattern that distinguishes SJIA flare from quiescence. Highly discriminating spots derived from 15 proteins constitute a robust SJIA flare signature and show specificity for SJIA flare in comparison to active polyarticular juvenile idiopathic arthritis or acute febrile illness. We used 7 available ELISA assays, including one to the complex of S100A8/S100A9, to measure levels of 8 of the 15 proteins. Validating our DIGE results, this ELISA panel correctly classified independent SJIA flare samples, and distinguished them from acute febrile illness. Notably, data using the panel suggest its ability to improve on erythrocyte sedimentation rate or C-reactive protein or S100A8/S100A9, either alone or in combination, in SJIA flare/quiescence (F/Q) discriminations. Our results also support the panel's potential clinical utility as a predictor of incipient flare (within 9 wk) in SJIA subjects with clinically inactive disease. Pathway analyses of the 15 proteins in the SJIA flare versus quiescence signature corroborate growing evidence for a key role for IL-1 at disease flare.
View details for DOI 10.1002/pmic.201000298
View details for Web of Science ID 000285882200008
View details for PubMedID 21136595
View details for PubMedCentralID PMC3517169
Integrative Urinary Peptidomics in Renal Transplantation Identifies Biomarkers for Acute Rejection
JOURNAL OF THE AMERICAN SOCIETY OF NEPHROLOGY
2010; 21 (4): 646-653
Noninvasive methods to diagnose rejection of renal allografts are unavailable. Mass spectrometry followed by multiple-reaction monitoring provides a unique approach to identify disease-specific urine peptide biomarkers. Here, we performed urine peptidomic analysis of 70 unique samples from 50 renal transplant patients and 20 controls (n = 20), identifying a specific panel of 40 peptides for acute rejection (AR). Peptide sequencing revealed suggestive mechanisms of graft injury with roles for proteolytic degradation of uromodulin (UMOD) and several collagens, including COL1A2 and COL3A1. The 40-peptide panel discriminated AR in training (n = 46) and test (n = 24) sets (area under ROC curve >0.96). Integrative analysis of transcriptional signals from paired renal transplant biopsies, matched with the urine samples, revealed coordinated transcriptional changes for the corresponding genes in addition to dysregulation of extracellular matrix proteins in AR (MMP-7, SERPING1, and TIMP1). Quantitative PCR on an independent set of 34 transplant biopsies with and without AR validated coordinated changes in expression for the corresponding genes in rejection tissue. A six-gene biomarker panel (COL1A2, COL3A1, UMOD, MMP-7, SERPING1, TIMP1) classified AR with high specificity and sensitivity (area under ROC curve = 0.98). These data suggest that changes in collagen remodeling characterize AR and that detection of the corresponding proteolytic degradation products in urine provides a noninvasive diagnostic approach.
View details for DOI 10.1681/ASN.2009080876
View details for Web of Science ID 000276784800017
View details for PubMedID 20150539
- Urine peptidomics for clinical biomarker discovery. Advances in clinical chemistry. 2010: 51:181-213
Urine Peptidomic and Targeted Plasma Protein Analyses in the Diagnosis and Monitoring of Systemic Juvenile Idiopathic Arthritis.
2010; 6 (4): 175–93
PURPOSE: Systemic juvenile idiopathic arthritis is a chronic pediatric disease. The initial clinical presentation can mimic other pediatric inflammatory conditions, which often leads to significant delays in diagnosis and appropriate therapy. SJIA biomarker development is an unmet diagnostic/prognostic need to prevent disease complications. EXPERIMENTAL DESIGN: We profiled the urine peptidome to analyze a set of 102 urine samples, from patients with SJIA, Kawasaki disease (KD), febrile illnesses (FI), and healthy controls. A set of 91 plasma samples, from SJIA flare and quiescent patients, were profiled using a customized antibody array against 43 proteins known to be involved in inflammatory and protein catabolic processes. RESULTS: We identified a 17-urine-peptide biomarker panel that could effectively discriminate SJIA patients at active, quiescent, and remission disease states, and patients with active SJIA from confounding conditions including KD and FI. Targeted sequencing of these peptides revealed that they fall into several tight clusters from seven different proteins, suggesting disease-specific proteolytic activities. The antibody array plasma profiling identified an SJIA plasma flare signature consisting of tissue inhibitor of metalloproteinase-1 (TIMP1), interleukin (IL)-18, regulated upon activation, normal T cell expressed and secreted (RANTES), P-Selectin, MMP9, and L-Selectin. CONCLUSIONS AND CLINICAL RELEVANCE: The urine peptidomic and plasma protein analyses have the potential to improve SJIA care and suggest that SJIA urine peptide biomarkers may be an outcome of inflammation-driven effects on catabolic pathways operating at multiple sites. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s12014-010-9058-8) contains supplementary material, which is available to authorized users.
View details for DOI 10.1007/s12014-010-9058-8
View details for PubMedID 21124648
View details for PubMedCentralID PMC2970804
Effects of moderate versus deep hypothermic circulatory arrest and selective cerebral perfusion on cerebrospinal fluid proteomic profiles in a piglet model of cardiopulmonary bypass
JOURNAL OF THORACIC AND CARDIOVASCULAR SURGERY
2009; 138 (6): 1290-1296
Our objective was to compare protein profiles of cerebrospinal fluid between control animals and those subjected to cardiopulmonary bypass after moderate versus deep hypothermic circulatory arrest with selective cerebral perfusion. Immature Yorkshire piglets were assigned to one of four study groups: (1) deep hypothermic circulatory arrest at 18 degrees C, (2) deep hypothermic circulatory arrest at 18 degrees C with selective cerebral perfusion, (3) moderate hypothermic circulatory arrest at 25 degrees C with selective cerebral perfusion, or (4) age-matched control animals without surgery. Animals undergoing cardiopulmonary bypass were cooled to their assigned group temperature and exposed to 1 hour of hypothermic circulatory arrest. After arrest, animals were rewarmed, weaned off bypass, and allowed to recover for 4 hours. Cerebrospinal fluid collected from surgical animals after the recovery period was compared with cerebrospinal fluid from controls by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry. Protein spectra were analyzed for differences between groups by Mann-Whitney U test and false discovery rate analysis. Baseline and postbypass physiologic parameters were similar in all surgical groups. A total of 194 protein peaks were detected. Compared with controls, groups 1, 2, and 3 had 64, 100, and 13 peaks that were significantly different, respectively (P < .05). Three of these peaks were present in all three groups. Cerebrospinal fluid protein profiles in animals undergoing cardiopulmonary bypass with moderate hypothermic circulatory arrest (group 3) were more similar to controls than either of the groups subjected to deep hypothermia. The mass spectra of cerebrospinal fluid proteins are altered in piglets exposed to cardiopulmonary bypass and hypothermic circulatory arrest. Moderate hypothermic circulatory arrest (25 degrees C) with selective cerebral perfusion compared with deep hypothermic circulatory arrest (18 degrees C) is associated with fewer changes in cerebrospinal fluid proteins, when compared with nonbypass controls.
View details for DOI 10.1016/j.jtcvs.2009.06.001
View details for Web of Science ID 000272029800004
View details for PubMedID 19660276
Plasma Biomarkers in a Mouse Model of Preterm Labor
2009; 66 (1): 11-16
Preterm labor (PTL) is frequently associated with inflammation. We hypothesized that biomarkers during pregnancy can identify pregnancies most at risk for development of PTL. An inflammation-induced mouse model of PTL was used. Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry was used to analyze and compare the plasma protein (PP) profile between CD-1 mice injected intrauterine with either lipopolysaccharide (LPS) or PBS on d 14.5 of gestation. The median differences of normalized PP peaks between the two groups were determined using the Mann-Whitney U test and the false discovery rate. In a second series of experiments, both groups of mice were injected with a lower dose of LPS. A total of 1665 peaks were detected. Thirty peaks were highly differentially expressed (p < 0.0001) between the groups. Two 11 kDa protein peaks were identified by MALDI-TOF/TOF-MS and confirmed to be mouse serum amyloid A (SAA) 1 and 2. Plasma SAA2 levels were increased in LPS-treated animals compared with controls and in LPS-treated animals that delivered preterm vs. those that delivered at term. SAA2 has the potential to be a plasma biomarker that can identify pregnancies at risk for development of PTL.
View details for Web of Science ID 000267249300004
View details for PubMedID 19287348
FDR made easy in differential feature discovery and correlation analyses
2009; 25 (11): 1461-1462
Rapid progress in technology, particularly in high-throughput biology, allows the analysis of thousands of genes or proteins simultaneously, where the multiple comparison problem occurs. Global false discovery rate (gFDR) analysis statistically controls this error, computing the ratio of the number of false positives over the total number of rejections. The local FDR (lFDR) method can associate a corrected significance measure with each hypothesis test for feature-by-feature interpretation. Given the large feature number and sample size in any genomics or proteomics analysis, FDR computation, albeit critical, is both beyond the regular biologist's specialty and computationally expensive, easily exceeding the capacity of desktop computers. To overcome this digital divide, a web portal has been developed that provides bench-side biologists easy access to server-side computing capabilities to analyze for FDR, differentially expressed genes or proteins, and the correlation between molecular data and clinical measurements (http://translationalmedicine.stanford.edu/Mass-Conductor/FDR.html).
View details for DOI 10.1093/bioinformatics/btp176
View details for Web of Science ID 000266109500030
View details for PubMedID 19376824
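To make the global FDR idea in the abstract above concrete, here is a minimal sketch of the Benjamini-Hochberg step-up adjustment, one widely used way to control the FDR. The abstract does not say which procedure the web portal implements, so the choice of Benjamini-Hochberg is an assumption, and the function name and the five p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here for illustration only; it is not
    confirmed to be the method behind the portal described above.
    """
    m = len(p_values)
    # Feature indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for five features (illustrative only).
p = [0.001, 0.008, 0.039, 0.041, 0.9]
q = benjamini_hochberg(p)

# Features called significant at an FDR threshold of 0.05.
significant = [i for i, value in enumerate(q) if value <= 0.05]
print(q, significant)
```

With these toy inputs, the third and fourth features have raw p-values below 0.05 but adjusted values slightly above it, which shows how the correction changes the set of reported features.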
- Cancer Biomarker Discovery via Targeted Profiling of Multiclass Tumor Tissue-Derived Proteomes Clinical Proteomics. 2009; 5 (3-4): 163-9.
- Urinary peptidomic analysis identifies potential biomarkers for acute rejection of renal transplantation. Clinical Proteomics. 2009; 5: 103-13.
Optimizing protein recovery for urinary proteomics, a tool to monitor renal transplantation
2008; 22 (5): 617-623
Despite the attractiveness of urine for biomarker discovery for systemic and renal diseases, the confounding effect of the high abundance plasma proteins in urine and a lack of optimization of urine protein recovery methods are bottlenecks for urine proteomics. Three methods were performed and compared for percentage protein yield, yield consistency, ease and cost of analysis: (i) organic solvent precipitation, (ii) dialysis/lyophilization, and (iii) centrifugal filtration. Urine samples were subjected to an immunoaffinity column to deplete high abundance proteins. Difference gel electrophoresis was performed to assess the use of a depletion strategy for detection of low abundance proteins. Urine from healthy volunteers (n = 10) and kidney transplant recipients with proteinuria (n = 11) was used. Centrifugal filtration performed best for analysis ease and yield consistency. The highest percentage yield was obtained from dialysis/lyophilization, but the method was laborious and residual salt interfered with subsequent gel electrophoresis. Organic solvent precipitation was inexpensive but suffered from varying yield consistency. Increased spot intensity for some low abundance and previously undetected proteins was noted after depletion of high abundance proteins. In conclusion, we compare the pros and cons of different protein recovery methods and reveal an increase in the dynamic range of protein detection after a depletion strategy that could be critical for biomarker discovery, particularly with reference to processing human study samples from clinical trials.
View details for DOI 10.1111/j.1399-0012.2008.00833.x
View details for Web of Science ID 000259341800014
View details for PubMedID 18459997
Novel urinary peptidomic analysis for acute rejection monitoring
8th American Transplant Congress
WILEY-BLACKWELL. 2008: 636–637
View details for Web of Science ID 000255763202548
High throughput screening informatics
COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING
2008; 11 (3): 249-257
High throughput screening (HTS), an industrial effort to leverage developments in the areas of modern robotics, data analysis and control software, liquid handling devices, and sensitive detectors, has played a pivotal role in the drug discovery process, allowing researchers to efficiently screen millions of compounds to identify tractable small molecule modulators of a given biological process or disease state and advance them into high quality leads. As HTS throughput has significantly increased the volume, complexity, and information content of datasets, lead discovery research demands a clear corporate strategy for scientific computing and subsequent establishment of robust enterprise-wide (usually global) informatics platforms, which enable complicated HTS work flows, facilitate HTS data mining, and drive effective decision-making. The purpose of this review is, from the data analysis and handling perspective, to examine key elements in HTS operations and some essential data-related activities supporting or interfacing the screening process, and outline properties that various enabling software should have. Additionally, some general advice for corporate managers with system procurement responsibilities is offered.
View details for Web of Science ID 000254653500008
View details for PubMedID 18336217
Significance analysis and multiple pharmacophore models for differentiating P-glycoprotein substrates
JOURNAL OF CHEMICAL INFORMATION AND MODELING
2007; 47 (6): 2429-2438
P-glycoprotein (Pgp) mediated drug efflux affects the absorption, distribution, and clearance of a broad structural variety of drugs. Early assessment of the potential of compounds to interact with Pgp can aid in the selection and optimization of drug candidates. To differentiate nonsubstrates from substrates of Pgp, a robust predictive pharmacophore model was targeted in a supervised analysis of three-dimensional (3D) pharmacophores from 163 published compounds. A comprehensive set of pharmacophores has been generated from conformers of whole molecules of both substrates and nonsubstrates of P-glycoprotein. Four-point 3D pharmacophores were employed to increase the amount of shape information and resolution, including the ability to distinguish chirality. A novel algorithm of the pharmacophore-specific t-statistic was applied to the actual structure-activity data and 400 sets of artificial data (sampled by decorrelating the structure and Pgp efflux activity). The optimal size of the significant pharmacophore set was determined through this analysis. A simple classification tree using nine distinct pharmacophores was constructed to distinguish nonsubstrates from substrates of Pgp. An overall accuracy of 87.7% was achieved for the training set and 87.6% for the external independent test set. Furthermore, each of nine pharmacophores can be independently utilized as an accurate marker for potential Pgp substrates.
View details for DOI 10.1021/ci700284p
View details for Web of Science ID 000251216500041
View details for PubMedID 17956085
GO-Diff: Mining functional differentiation between EST-based transcriptomes
Large-scale sequencing efforts produced millions of Expressed Sequence Tags (ESTs) collectively representing differentiated biochemical and functional states. Analysis of these EST libraries reveals differential gene expressions, and therefore EST data sets constitute valuable resources for comparative transcriptomics. To translate differentially expressed genes into a better understanding of the underlying biological phenomena, existing microarray analysis approaches usually involve the integration of gene expression with Gene Ontology (GO) databases to derive comparable functional profiles. However, methods are not available yet to process EST-derived transcription maps to enable GO-based global functional profiling for comparative transcriptomics in a high throughput manner. Here we present GO-Diff, a GO-based functional profiling approach towards high throughput EST-based gene expression analysis and comparative transcriptomics. Utilizing holistic gene expression information, the software converts EST frequencies into EST Coverage Ratios of GO Terms. The ratios are then tested for statistical significance to uncover differentially represented GO terms between the compared transcriptomes, and functional differences are thus inferred. We demonstrated the validity and the utility of this software by identifying differentially represented GO terms in three application cases: intra-species comparison; meta-analysis to test a specific hypothesis; inter-species comparison. GO-Diff findings were consistent with previous knowledge and provided new clues for further discoveries. A comprehensive test on the GO-Diff results using series of comparisons between EST libraries of human and mouse tissues showed acceptable levels of consistency: 61% for human-human; 69% for mouse-mouse; 47% for human-mouse. GO-Diff is the first software integrating EST profiles with GO knowledge databases to mine functional differentiation between biological systems, e.g. tissues of the same species or the same tissue across species. With rapid accumulation of EST resources in the public domain and expanding sequencing effort in individual laboratories, GO-Diff is useful as a screening tool before undertaking serious expression studies.
View details for DOI 10.1186/1471-2105-7-72
View details for Web of Science ID 000235825600001
View details for PubMedID 16480524
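The idea in the abstract above of turning EST frequencies into per-term ratios and testing them can be sketched as follows. This is an illustration under stated assumptions, not GO-Diff's actual method: the coverage ratio is assumed to be the fraction of a library's ESTs annotated to a GO term, and a pooled two-proportion z-test is swapped in as a stand-in for whatever test the software uses. All names and counts are hypothetical.

```python
import math

def coverage_ratio(term_est_count, total_est_count):
    """Fraction of a library's ESTs annotated to one GO term.

    Assumption: this simple fraction stands in for the 'EST Coverage
    Ratio' named in the abstract; the exact definition may differ.
    """
    return term_est_count / total_est_count

def two_proportion_z(count_a, total_a, count_b, total_b):
    """Pooled two-proportion z statistic comparing libraries A and B."""
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    # Standard error of the difference under the null of equal proportions.
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    return (p_a - p_b) / se

# Hypothetical counts for one GO term in two EST libraries (illustrative only).
ratio_a = coverage_ratio(120, 10000)
ratio_b = coverage_ratio(60, 10000)
z = two_proportion_z(120, 10000, 60, 10000)
print(f"ratio A = {ratio_a:.4f}, ratio B = {ratio_b:.4f}, z = {z:.2f}")
```

Because one such test would be run for every GO term, the resulting significance values would normally be corrected for multiple comparisons before terms are reported as differentially represented.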
Multiclass cancer classification and biomarker discovery using GA-based algorithms
2005; 21 (11): 2691-2697
The development of microarray-based high-throughput gene profiling has led to the hope that this technology could provide an efficient and accurate means of diagnosing and classifying tumors, as well as predicting prognoses and effective treatments. However, the large amount of data generated by microarrays requires effective reduction of discriminant gene features into reliable sets of tumor biomarkers for such multiclass tumor discrimination. The availability of reliable sets of biomarkers, especially serum biomarkers, should have a major impact on our understanding and treatment of cancer. We have combined genetic algorithm (GA) and all paired (AP) support vector machine (SVM) methods for multiclass cancer categorization. Predictive features can be automatically determined through iterative GA/SVM, leading to very compact sets of non-redundant cancer-relevant genes with the best classification performance reported to date. Interestingly, these different classifier sets harbor only modest overlapping gene features but have similar levels of accuracy in leave-one-out cross-validations (LOOCV). Further characterization of these optimal tumor discriminant features, including the use of nearest shrunken centroids (NSC), analysis of annotations and literature text mining, reveals previously unappreciated tumor subclasses and a series of genes that could be used as cancer biomarkers. With this approach, we believe that microarray-based multiclass molecular analysis can be an effective tool for cancer biomarker discovery and subsequent molecular cancer diagnosis.
View details for DOI 10.1093/bioinformatics/bti419
View details for Web of Science ID 000229441500017
View details for PubMedID 15814557
- A machine to make a future - Biotech chronicles. J Clin Invest. 2005; 115: 2303-4.
- Genomic resources for cancer biology researchers. Oncogenomics Handbook. Humana Press. 2004
A comparative analysis of HGSC and Celera human genome assemblies and gene sets
2003; 19 (13): 1597-1605
Since the simultaneous publication of the human genome assembly by the International Human Genome Sequencing Consortium (HGSC) and Celera Genomics, several comparisons have been made of various aspects of these two assemblies. In this work, we set out to provide a more comprehensive comparative analysis of the two assemblies and their associated gene sets. The local sequence content for both draft genome assemblies has been similar since the early releases; however, it took a year for the quality of the Celera assembly to approach that of HGSC, suggesting an advantage of HGSC's hierarchical shotgun (HS) sequencing strategy over Celera's whole genome shotgun (WGS) approach. While similar numbers of ab initio predicted genes can be derived from both assemblies, Celera's Otto approach consistently generated larger, more varied gene sets than the Ensembl gene build system. The presence of a non-overlapping gene set has persisted with successive data releases from both groups. Since most of the unique genes from either genome assembly could be mapped back to the other assembly, we conclude that the gene set discrepancies do not reflect differences in local sequence content but rather in the assemblies and especially the different gene-prediction methodologies.
View details for DOI 10.1093/bioinformatics/btg219
View details for Web of Science ID 000185310600001
View details for PubMedID 12967954
PRC17, a novel oncogene encoding a Rab GTPase-activating protein, is amplified in prostate cancer
2002; 62 (19): 5420-5424
We used cDNA-based genomic microarrays to examine DNA copy number changes in a panel of prostate tumors and found a previously undescribed amplicon on chromosome 17 containing a novel overexpressed gene that we termed prostate cancer gene 17 (PRC17). When overexpressed in 3T3 mouse fibroblast cells, PRC17 induced growth in low serum, loss of contact inhibition, and tumor formation in nude mice. The PRC17 gene product contains a GTPase-activating protein (GAP) catalytic core motif found in various Rab/Ypt GAPs, including RN-Tre. Similar to RN-Tre, we found that PRC17 protein interacts directly with Rab5 and stimulates its GTP hydrolysis. Point mutations that alter conserved amino acid residues within the PRC17 GAP domain abolished its transforming abilities, suggesting that GAP activity is essential for its oncogenic function. Whereas PRC17 is amplified in 15% of prostate cancers, it is highly overexpressed in approximately one-half of metastatic prostate tumors. The potent oncogenic activity of PRC17 is likely to influence the tumorigenic phenotype of these prostate cancers.
View details for Web of Science ID 000178378200008
View details for PubMedID 12359748
Comparative analysis of human genome assemblies reveals genome-level differences
2002; 80 (2): 138-139
Previous comparative analysis has revealed a significant disparity between the predicted gene sets produced by the International Human Genome Sequencing Consortium (HGSC) and Celera Genomics. To determine whether the source of this discrepancy was due to underlying differences in the genomic sequences or different gene prediction methodologies, we analyzed both genome assemblies in parallel. Using the GENSCAN gene prediction algorithm, we generated predicted transcriptomes that could be directly compared. BLAST-based comparisons revealed a 20-30% difference between the transcriptomes. Further differences between the two genomes were revealed with protein domain PFAM analyses. These results suggest that fundamental differences between the two genome assemblies are likely responsible for a significant portion of the discrepancy between the transcript sets predicted by the two groups.
View details for DOI 10.1006/geno.2002.6824
View details for Web of Science ID 000177393500005
View details for PubMedID 12160725
DQ 65-79, a peptide derived from HLA class II, induces I kappa B expression
JOURNAL OF IMMUNOLOGY
2002; 168 (7): 3323-3328
A synthetic peptide corresponding to residues 65-79 of the alpha helix of the alpha-chain of the class II HLA molecule DQA03011 (DQ 65-79) inhibits the proliferation of human T lymphocytes in an allele nonrestricted manner. By using microarray technology, we found that expression of 29 genes was increased or decreased in a human CTL cell line after treatment with DQ 65-79. This study focuses on one of these genes, IkappaB-alpha, whose expression is increased by DQ 65-79. IkappaB proteins, including IkappaB-alpha and IkappaB-beta, are increased in T cells treated with DQ 65-79. Nuclear translocation of the NF-kappaB subunits p65 and p50 is decreased in T cells after treatment with DQ 65-79, while elevated levels of p65 and p50 are present in cytosol. DQ 65-79 inhibits the degradation of IkappaB-alpha mRNA and inhibits the activity of IkappaB kinase. These findings indicate that the DQ 65-79 peptide increases the level of IkappaB proteins, thereby preventing nuclear translocation of the transcription factor, NF-kappaB, and inhibiting T cell proliferation.
View details for Web of Science ID 000174566400029
View details for PubMedID 11907089
DIAN: A novel algorithm for genome ontological classification
2001; 11 (10): 1766-1779
Faced with the determination of many completely sequenced genomes, computational biology is now faced with the challenge of interpreting the significance of these data sets. A multiplicity of data-related problems impedes this goal: Biological annotations associated with raw data are often not normalized, and the data themselves are often poorly interrelated and their interpretation unclear. All of these problems make interpretation of genomic databases increasingly difficult. With the current explosion of sequences now available from the human genome as well as from model organisms, the importance of sorting this vast amount of conceptually unstructured source data into a limited universe of genes, proteins, functions, structures, and pathways has become a bottleneck for the field. To address this problem, we have developed a method of interrelating data sources by applying a novel method of associating biological objects to ontologies. We have developed an intelligent knowledge-based algorithm, to support biological knowledge mapping, and, in particular, to facilitate the interpretation of genomic data. In this respect, the method makes it possible to inventory genomes by collapsing multiple types of annotations and normalizing them to various ontologies. By relying on a conceptual view of the genome, researchers can now easily navigate the human genome in a biologically intuitive, scientifically accurate manner.
View details for Web of Science ID 000171456000019
View details for PubMedID 11591654
An immunosuppressive and anti-inflammatory HLA class I-derived peptide binds vascular cell adhesion molecule-1
2000; 70 (4): 662-667
A synthetic peptide corresponding to residues 75-84 of HLA-B2702 modulates immune responses in rodents and humans both in vitro and in vivo. We used a yeast two-hybrid screening, an in vitro biochemical method, and an in vivo animal model. Two cellular receptors for this novel immunomodulatory peptide were identified using a yeast two-hybrid screen: immunoglobulin binding protein (BiP), a member of the heat shock protein 70 family, and vascular cell adhesion molecule (VCAM)-1. Identification of BiP as a ligand for this peptide confirms earlier biochemical findings, while the interaction with VCAM-1 suggests an alternative mechanism of action. Binding to the B2702 peptide but not to closely related variants was confirmed by ligand Western blot analysis and correlated with immunomodulatory activity of each peptide. In mice, an ovalbumin-induced allergic pulmonary response was blocked by in vivo administration of either the B2702 peptide or anti-VLA-4 antibody. We propose that the immunomodulatory effect of the B2702 peptide is caused, in part, by binding to VCAM-1, which then prevents the normal interaction of VCAM-1 with VLA-4.
View details for Web of Science ID 000088986100021
View details for PubMedID 10972226
Proliferating cell nuclear antigen as the cell cycle sensor for an HLA-derived peptide blocking T cell proliferation
Journal of Immunology
2000; 164 (12): 6188-6192
Synthetic peptides corresponding to structural regions of HLA molecules are novel immunosuppressive agents. A peptide corresponding to residues 65-79 of the alpha-chain of HLA-DQA03011 (DQ65-79) blocks cell cycle progression from early G1 to the G1 restriction point, which inhibits cyclin-dependent kinase-2 activity and phosphorylation of the retinoblastoma protein. A yeast two-hybrid screen identified proliferating cell nuclear Ag (PCNA) as a cellular ligand for this peptide, whose interaction with PCNA was further confirmed by in vitro biochemistry. Electron microscopy demonstrates that the DQ65-79 peptide enters the cell and colocalizes with PCNA in the T cell nucleus in vivo. Binding of the DQ65-79 peptide to PCNA did not block polymerase delta (pol delta)-dependent DNA replication in vitro. These findings support a key role for PCNA as a sensor of cell cycle progression and reveal an unanticipated function for conserved regions of HLA molecules.
View details for Web of Science ID 000087508500014
View details for PubMedID 10843669
All four core histone N-termini contain sequences required for the repression of basal transcription in yeast
1996; 15 (15): 3974-3985
Nucleosomes prevent the recognition of TATA promoter elements by the basal transcriptional machinery in the absence of induction. However, while Saccharomyces cerevisiae histones H3 and H4 contain N-terminal regions involved in the activation and repression of GAL1 and in the expression of heterochromatin-like regions, the sequences involved in repressing basal transcription have not yet been identified. Here, we describe the mapping of new N-terminal domains, in all four core histones (H2A, H2B, H3 and H4), required for the repression of basal, uninduced transcription. Basal transcription was monitored by the use of a GAL1 promoter-URA3 reporter construct whose uninduced activity can be detected through cellular sensitivity to the drug, 5-fluoroorotic acid. We have found for each histone that the N-terminal sequences repressing basal activity are in a short region adjacent to the structured alpha-helical core. Analysis of minichromosome DNA topology demonstrates that the basal domains are required for the proper folding of DNA around the chromosomal particle. Deletion of the basal domain at each histone significantly decreases plasmid superhelical density, which probably reflects a release of DNA from the constraints of the nucleosome into the linker region. This provides a means by which basal factors may recognize otherwise repressed regulatory elements.
View details for Web of Science ID A1996VC66700022
View details for PubMedID 8670902
Yeast histone H3 and H4 amino termini are important for nucleosome assembly in vivo and in vitro: Redundant and position-independent functions in assembly but not in gene regulation
Genes & Development
1996; 10 (6): 686-699
The hydrophilic amino-terminal sequences of histones H3 and H4 extend from the highly structured nucleosome core. Here we examine the importance of the amino termini and their position in the nucleosome with regard to both nucleosome assembly and gene regulation. Despite previous conclusions based on nonphysiological nucleosome reconstitution experiments, we find that the histone amino termini are important for nucleosome assembly in vivo and in vitro. Deletion of both tails, a lethal event, alters micrococcal nuclease-generated nucleosomal ladders, plasmid superhelicity in whole cells, and nucleosome assembly in cell extracts. The H3 and H4 amino-terminal tails have redundant functions in this regard because the presence of either tail allows assembly and cellular viability. Moreover, the tails need not be attached to their native carboxy-terminal core. Their exchange re-establishes both cellular viability and nucleosome assembly. In contrast, the regulation of GAL1 and the silent mating loci by the H3 and H4 tails is highly disrupted by exchange of the histone amino termini.
View details for Web of Science ID A1996UB72900004
View details for PubMedID 8598296
Histone H3 amino-terminus is required for telomeric and silent mating locus repression in yeast
1994; 369 (6477): 245-247
Heterochromatin is a cytologically visible form of condensed chromatin capable of repressing genes in eukaryotic cells. For the yeast Saccharomyces cerevisiae, despite the absence of observable heterochromatin, genetic and chromatin structure data indicate that there are heterochromatin-like repressive structures. Genes experience position effects at the silent mating loci and the telomeres, resulting in a repressed state that is inherited in an epigenetic manner. The histone H4 amino terminus is required for repression at these loci. Additional studies have indicated that the histone H3 N terminus is not important for silent mating locus repression, but redundancy of repressive elements at the silent mating loci may be responsible for masking its role. Here we report that histone H3 is required for full repression at yeast telomeres and at partially disabled silent mating loci, and that the acetylatable lysine residues of H3 play an important role in silencing.
View details for Web of Science ID A1994NM06700060
View details for PubMedID 8183346