
Casting a Wider Net
Feature-first mass spec is an experimental design in which the instrument is used to generate expression data on the maximum number of molecular species. The goal is to maximize the amount of data available for analysis: a larger data scale allows more individual data points with meaningful biological connections to be identified.
Feature-first mass spec sacrifices molecular identity information for larger data scale and dataset reproducibility. It repeatedly measures the maximum number of species and their abundance, a step sometimes termed the MS1 analysis. As a result, MS1 data quality increases and more individual measurements can be made as complex samples pass through the mass spec.
In contrast to feature-first mass spec, the alternative approach sacrifices data scale for detailed molecular identity information in the reported data. A single MS1 analysis is subjected to one of several algorithms, by which the mass spec software picks a small subset of the detected molecules for secondary analysis. In this secondary analysis, termed MS2, the individual species are fragmented and the fragments are measured to reconstruct the identity of the original molecule.
While MS2 data provides molecular identity, the selection of molecules for MS2 fragmentation can reduce data scale by orders of magnitude. Besides a reduction in data scale, the selection of molecules for MS2 analysis is not reproducible; the same sample can be analyzed twice and yield abundance information for different species each time. The result is significant gaps in the resulting dataset that make data analysis and cross-sample comparison difficult.
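The reproducibility gap described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not real instrument data or any vendor's selection algorithm: it assumes a simple "top N by intensity" selection rule, log-normally distributed abundances, and a made-up noise level. The names `top_n_selection` and `simulate_run` are hypothetical.

```python
import random


def top_n_selection(intensities, n):
    """Return the indices of the n most intense species (an assumed top-N rule)."""
    ranked = sorted(range(len(intensities)), key=lambda i: intensities[i], reverse=True)
    return set(ranked[:n])


def simulate_run(true_abundance, noise, rng):
    """One simulated MS1 scan: true abundance times multiplicative measurement noise."""
    return [a * rng.lognormvariate(0.0, noise) for a in true_abundance]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumption: 10,000 detectable species, of which 100 are picked for MS2 per run.
true_abundance = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]
run_a = top_n_selection(simulate_run(true_abundance, 0.3, rng), 100)
run_b = top_n_selection(simulate_run(true_abundance, 0.3, rng), 100)

shared = run_a & run_b   # species selected in both runs
only_one = run_a ^ run_b  # species selected in exactly one run (gaps in the dataset)

print(f"selected in both runs: {len(shared)} of 100")
print(f"selected in only one run: {len(only_one)}")
```

Any species that lands in `only_one` has a value in one run and a gap in the other, which is the kind of missing data that complicates cross-sample comparison.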
In contrast, feature-first mass spec considers entire mass spec datasets. Once biological connections of species are established, MS2 analysis can be targeted to molecules of definite biological interest and reveal their identity.
The limitation of feature-first mass spec has been the availability of tools that work at very large data scales. High-resolution mass spec instruments are capable of reporting abundance information for a million molecular species or more.
The obstacle to feature-first mass spec is not one of instrumentation: it is a data science problem. Comparing expression patterns runs afoul of the n² problem: computers must consider as many values as the square of the number of data points. A million mass spec data points means considering a trillion values.
The n² problem is exacerbated when the comparison must be made across large numbers of samples. If expression differences are to be assessed across 1,000 samples, the number of computed values increases a thousandfold. For 1 million data points per sample, a quadrillion values must be computed. This data scale exceeds the computational power of any supercomputer. Feature-first mass spec suffers from a data overload problem.
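The arithmetic behind these figures can be checked with a short sketch. It counts values as the square of the number of data points, as the text does, and multiplies by the number of samples. The function names and the 8-bytes-per-value storage figure are assumptions added for illustration only.

```python
def pairwise_values(n_features: int) -> int:
    """Values in an all-against-all comparison, counted as n squared."""
    return n_features ** 2


def total_values(n_features: int, n_samples: int) -> int:
    """Values to compute when the comparison is repeated for every sample."""
    return pairwise_values(n_features) * n_samples


n_features = 1_000_000  # data points per sample, from the text
n_samples = 1_000       # samples, from the text

per_sample = pairwise_values(n_features)
total = total_values(n_features, n_samples)

print(f"per sample: {per_sample:,}")   # one trillion
print(f"all samples: {total:,}")       # one quadrillion

# Assumption: each value stored as an 8-byte float; 1 PB taken as 10**15 bytes.
petabytes = total * 8 / 10**15
print(f"storage if every value were kept: {petabytes:.0f} PB")
```

Counting each unordered pair only once would roughly halve these totals, which does not change the order of magnitude.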
The result is that feature-first mass spec is attractive in theory but difficult to realize in practice. Feature-first mass spec studies are often limited in scope, considering differences detected in relatively small numbers of samples. Even high-impact publications in top journals seldom involve more than a few hundred samples.
Magellan Bioanalytics’ software is purpose-built to handle feature-first mass spec datasets that contain a million or more individual data points and allows comparison of data profiles across thousands of samples or more. With a user-friendly interface, results from large and complex experiments can be extracted and visualized in hours. These tools make feature-first proteomics accessible to any life sciences research team and dramatically enhance the productivity of research groups dedicated to mass spec techniques.