

Featured Studies
Featured Study for Pharma Client
Introduction
- ~1 million datapoints per sample analyzed
- No pre-filtering or feature selection
- ~180 features identified separating pre-treatment vs post-response patients
Key Observations
- Signals mapped to disease-relevant biology
- Analysis performed blinded to indication and mechanism
- Results not attributable to random variation
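A claim that a group difference is not due to random variation is often checked with a permutation test: the group labels are shuffled many times, and the test counts how often a difference at least as large as the observed one appears by chance. The sketch below is illustrative only and is not the study's actual method. The function name, the mean-difference statistic, and the intensity values are all assumptions made up for this example.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values simulates "labels carry no information".
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up intensities for one hypothetical feature (not study data).
pre_treatment = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 0.8, 1.2]
post_response = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.0]

p_value = permutation_p_value(pre_treatment, post_response)
print(f"permutation p-value: {p_value:.4f}")
```

When many features are tested at once, as with ~180 features drawn from ~1 million datapoints, a real analysis would also need a multiple-comparison correction, which this sketch leaves out.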
Implication
Full-data analysis can identify meaningful biological signals that conventional approaches miss.
Summary
Relevant biological signals can be extracted from the full dataset—without prior assumptions.
Featured Publication
Title: Differential Serum Peptidomics Reveal Multi-Marker Models That Predict Breast Cancer Progression
Summary
- Serum samples from early-stage (Stage I) and late-stage (Stage III) breast cancer patients were analyzed using mass spectrometry
- 10,000 molecular features per sample evaluated in initial discovery
- 65 statistically significant biomarker candidates identified
- Multi-marker models constructed and validated on independent blinded samples
- Best-performing models achieved:
  - AUC ~0.80–0.84
  - High specificity (up to ~88%)
- Parallel machine learning analysis evaluated hundreds of thousands to millions of datapoints per sample, identifying high-information signals across the full dataset
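For readers less familiar with the two metrics quoted above, the following minimal sketch shows how AUC and specificity can be computed from a model's scores on held-out samples. It does not reproduce the publication's models or data. The labels (0 for Stage I, 1 for Stage III), the scores, the 0.5 threshold, and the function names are hypothetical values chosen for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive sample outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def specificity(labels, scores, threshold):
    """Fraction of true negatives that score below the decision threshold."""
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return sum(s < threshold for s in neg) / len(neg)


# Hypothetical held-out samples: 0 = Stage I, 1 = Stage III.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical multi-marker model scores for those samples.
scores = [0.1, 0.3, 0.4, 0.7, 0.35, 0.6, 0.8, 0.9]

auc = roc_auc(labels, scores)
spec = specificity(labels, scores, threshold=0.5)
print(f"AUC = {auc:.4f}, specificity = {spec:.2f}")
```

AUC summarizes ranking quality across all thresholds, while specificity depends on the single threshold chosen, so the two are usually reported together, as in the summary above.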
Key Observations
- Biological signal exists across many features, not single biomarkers
  - Multi-marker models outperformed individual markers in predicting disease stage
- Low-abundance serum peptides contain clinically relevant information
  - These signals are typically masked or lost in conventional workflows
- Full-data machine learning approaches significantly expand detectable signal
  - Analysis across hundreds of thousands to millions of datapoints improved classification performance
- Blinded validation confirmed predictive capability
  - Models retained performance when applied to independent test samples
Implication
Clinically relevant patient differences are distributed across many molecular signals—and are best captured when the full dataset is analyzed rather than reduced to a small subset.
