What does it mean to conduct research in this supercomputing era, where a tremendous number of variables can be analyzed simultaneously and stored on a book-sized hard drive? With this accelerating surge of data generation, the limiting factor in today’s scientific progress is no longer the ability to measure but rather the ability to ask the right questions. How can we use big data to formulate and test hypotheses? Moreover, should our future research be increasingly data-driven rather than hypothesis-driven?
Big data in molecular biology is the product of our desire to understand how complex systems crosstalk and integrate to mediate life itself. In its naïve beauty, big data captures a schematic of the interconnecting relationships between the genome, transcriptome, proteome, and metabolome of a cell and integrates this information with incredible spatial and temporal resolution.
Analysis of cytome data allows us to draw conclusions that expand our knowledge of complex cellular organization. For instance, correlating genomic data with disease incidence may lead to the discovery of single nucleotide polymorphisms (SNPs) associated with a certain disease, allowing us to predict disease susceptibility and response to certain drugs. However, these big data correlations do not by themselves reveal the biological mechanism of action; rather, they serve as a platform from which new targets can be identified and further explored by other means. How much we rely on these data sets, and how we choose to analyze and interpret them, will dictate the value of producing so much information.
The advancement of science requires a systematic method in which conclusions are based on empirical evidence. Hypotheses that withstand repeated testing and yield reproducible results become scientific theories, which future experiments may continue to support or refute. Such hypothesis-driven procedures allow scientists to test relationships in our natural world directly and draw more objective conclusions about underlying biological and physical mechanisms. Data-driven research, on the other hand, despite being incredibly informative, can be misleading. Large data sets can identify rare correlations, but with enough samples even a weak correlation can reach statistical significance without being biologically relevant. The risk grows when many variables are tested and only those that cross the significance threshold are reported, a practice termed “p-hacking”. As we employ more of these techniques in our studies, a thorough understanding of big data sets and statistics is imperative for fair and objective interpretation of the results at hand.
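To make the statistical point concrete, here is a minimal Python sketch (standard library only) with two parts. The first computes an approximate two-sided p-value for a Pearson correlation r over n samples, converting r to t = r * sqrt((n - 2) / (1 - r^2)) and using a normal approximation to the t distribution, which is only reasonable for large n. The second tests many "markers" made of pure random noise against a random outcome and counts how many pass p < 0.05, with and without a Bonferroni correction. The function names, marker counts, sample sizes and seed are arbitrary illustrative choices, not values from any study.

```python
import math
import random


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a Pearson correlation r over n samples.

    Converts r to a t statistic, t = r * sqrt((n - 2) / (1 - r**2)), then uses a
    standard normal approximation to the t distribution (adequate only for large n).
    """
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation: t is unbounded
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_hits(n_markers: int, n_samples: int, alpha: float, seed: int = 0) -> int:
    """Count noise 'markers' whose correlation with a noise outcome has p < alpha.

    Every marker is independent random noise, so every hit is a false positive.
    The same seed regenerates the same data, so thresholds can be compared fairly.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_markers):
        marker = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        if correlation_p_value(pearson_r(marker, outcome), n_samples) < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    # A tiny effect (r = 0.02, i.e. 0.04% of variance) in a huge versus a small sample.
    print(f"n=100000: p = {correlation_p_value(0.02, 100_000):.2e}")
    print(f"n=100:    p = {correlation_p_value(0.02, 100):.2f}")

    # Test 1,000 pure-noise markers; about 5% are expected to pass p < 0.05 by chance.
    m, alpha = 1000, 0.05
    print("uncorrected hits:", count_hits(m, 200, alpha))
    # Bonferroni correction: divide alpha by the number of tests performed.
    print("Bonferroni hits: ", count_hits(m, 200, alpha / m))
```

With 100,000 samples, a correlation of r = 0.02 is overwhelmingly "significant", while the same r over 100 samples is not; the effect size is identical and equally negligible in both cases. In the simulation, the uncorrected threshold lets through a few dozen noise markers, which is exactly the pool that selective reporting draws from, whereas the Bonferroni-corrected threshold is expected to admit almost none.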
In immunology, we often take our ability to measure multiple parameters for granted. Multi-colour flow cytometry panels, for example, are great for identifying cell population dynamics under experimental conditions, but we do not always consider how these “markers” actually function in determining cell behaviour. Each cluster of differentiation (CD) marker has its own function, ranging from cell adhesion to signal transduction, and its expression can change transiently or over the long term depending on the environmental and inflammatory context. Up-regulation of a certain protein is only scientifically meaningful if we understand its functional implications. Therefore, it is insufficient to simply distill large volumes of data into trend lines on a graph; supplementing that information with phenotypic and functional studies will provide a more complete story of the biological systems studied.
The future of science may lie in the successful integration of hypothesis- and data-driven research. Big data allows us to cast a wide net, but simply relying on the “catch” may compromise the validity of our findings. The acquisition of massive amounts of information is a powerful tool, but it should complement, not replace, existing techniques and procedures for scientific research. Big data has opened the floodgates to the number of parameters and variables that can be explored, and technology has brought exciting times. After all, the most scintillating aspect of research is making discoveries where they are least expected.