Microarray data stands up to scrutiny

The power and promise of microarrays are vast. Offering the ability to run tens of thousands of experiments in parallel on a small glass slide, gene chips have transformed functional genomics, whether through expression analysis or genotyping, from fantasy to reality with dizzying speed. And yet, longstanding doubts persist as to the worth of the torrent of data that microarrays produce. Experiments have proved difficult to reproduce, and the lists of genes found in similar studies often have only limited overlap. Myriad protocols and platforms, and the proliferation of homemade arrays, mean that there is no one standardized set of procedures to follow when designing or replicating an experiment. Some analyses have found cross-platform reproducibility to be poor.

But microarray technology isn’t as flawed as it was feared to be. At least, those are the findings of a trio of papers published in the May issue of Nature Methods, which together represent the most systematic effort yet to assess the reliability and reproducibility of microarray data. The authors made large-scale efforts to compare data across commercial and in-house platforms, to tease out the “platform effect” from the “lab effect,” and to discover how much biological signal really exists in the massive amounts of discordant data that microarrays often yield. “Three independent studies came up with complementary conclusions, saying microarrays are pretty close to being ready for prime time,” says John Quackenbush, who is a professor of biostatistics at Harvard’s Dana-Farber Cancer Institute, a member of the advisory board of the Microarray Gene Expression Data Society (MGED), and an author on two of the papers.

The recent studies confirmed three persistent criticisms: that the bewildering array of platforms and research protocols available can make results from different studies hard to compare; that, in the hands of less-experienced labs, homemade arrays are less dependable than commercial chips; and that different labs doing the same study can often get very different results. But the studies also had some surprisingly good news for the microarray community. First, they found that the various platforms, both commercial and homemade, could all deliver good results in experienced hands. They also found that standardized research protocols go a long way toward increasing reproducibility. And perhaps most importantly, they determined that the vast bulk of microarray data is driven by underlying biology, even in cases where platforms differ over a particular gene’s expression. Despite efforts in the three papers and other studies to crown a victor, no one microarray platform or set of experimental protocols has yet emerged as the best. Still, there are lessons to be learned.

If you’re just getting into the microarray research game, and you don’t have access to a core facility, consider using a commercial chip. The experimental protocols will be clearer and more standardized. Labs can achieve good results with in-house spotted arrays, but the recent studies indicate that success with homemade arrays depends largely on expertise and experience. And price is no longer a deal-breaker: costs have come down in recent years to the point where buying commercial chips is competitive with the cost of equipping a lab to produce its own spotted arrays.

“If you’re good at tinkering and making something work and you have a lot of time to do that, you might want to go with one of these homegrown platforms. You’ll save money and maybe get better results. But if you just want to stick something in some hole and press a button, you’re probably better off with an industrial product,” says Rafael Irizarry, an author on one of the Nature Methods papers and a biostatistician at the Johns Hopkins Bloomberg School of Public Health.

Don’t expect the magnitudes of individual gene-expression measurements to be comparable across platforms; they usually aren’t. You can make more valid comparisons by looking at relative, rather than absolute, gene expression. For a more biologically meaningful and statistically robust approach to data analysis, look at the level of the biological process (i.e., the Gene Ontology (GO) annotation) rather than the gene. “Our paper showed that the biological representations of the genes are more important than the genes themselves. Even if on a gene-by-gene basis we found inconsistency, when we looked at the processes themselves, those conserved processes are there,” says Weis.
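As a rough illustration of the relative-versus-absolute point, the sketch below uses invented intensity values (not data from the studies) for the same five genes on two hypothetical platforms. The raw numbers differ wildly in scale, but log-ratios against a per-platform reference put both on comparable footing:

```python
import numpy as np

# Hypothetical raw intensities for the same five genes on two platforms;
# absolute magnitudes differ by an arbitrary scale factor.
platform_a = np.array([120.0, 340.0, 80.0, 560.0, 210.0])
platform_b = np.array([1.9, 5.1, 1.2, 8.8, 3.3])

# Relative expression: log2 ratio of each gene to a common reference
# (here, the per-platform median) removes the platform-specific scale.
rel_a = np.log2(platform_a / np.median(platform_a))
rel_b = np.log2(platform_b / np.median(platform_b))

# Cross-platform agreement is then assessed on the relative values,
# not the raw intensities.
r = np.corrcoef(rel_a, rel_b)[0, 1]
print(round(r, 2))
```

In this toy example the raw intensities are incomparable, but the relative values correlate strongly, which is the kind of agreement the studies found once absolute magnitudes were set aside.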

Looking at the process level allows researchers to make sense of inherent variation, says Aviv Regev, a research fellow at Harvard’s Bauer Center for Genomics Research and coauthor of a recent Nature Genetics paper that proposed a process-level method of looking at microarray data. “When you analyze things at the level of gene sets, there is noise at the level of individual sets. This might not only be a result of methodological problems with your microarrays, they might be inherent biological variability,” she says.
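A minimal sketch of what process-level analysis means in practice, with invented gene names, fold changes, and GO groupings: each gene set is scored by aggregating over its member genes, so noise in any single gene is damped.

```python
import numpy as np

# Hypothetical per-gene log2 fold changes; genes and GO sets are
# invented for illustration only.
fold_change = {
    "geneA": 1.8, "geneB": 1.2, "geneC": 2.1,    # e.g., a cell-cycle set
    "geneD": -0.1, "geneE": 0.2, "geneF": 0.05,  # e.g., housekeeping genes
}
go_sets = {
    "GO:cell_cycle": ["geneA", "geneB", "geneC"],
    "GO:housekeeping": ["geneD", "geneE", "geneF"],
}

# Score each process as the mean change of its member genes: individual
# genes may disagree between platforms, but the set-level signal tends
# to be more stable.
set_scores = {
    name: float(np.mean([fold_change[g] for g in genes]))
    for name, genes in go_sets.items()
}
for name, score in sorted(set_scores.items()):
    print(name, round(score, 2))
```

Here the cell-cycle set scores well above the housekeeping set even though individual member genes vary; real methods replace the simple mean with a proper enrichment statistic, but the aggregation idea is the same.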

One of the major barriers to the reproducibility of microarray data is a basic one: Given the plethora of platforms and protocols, it’s often hard to precisely reconstruct the experimental methods used in a study from the published paper. Several years ago, MGED developed its standard for data reporting, called Minimal Information about a Microarray Experiment (MIAME), to cut down on the widespread methodological confusion. The guidelines require researchers to go above and beyond what would ordinarily be required in a methods section, including detailed descriptions of the protocols used in RNA extraction, labeling, and hybridization; the data normalization algorithms used in preprocessing; and the design of the arrays themselves. MGED has also called for researchers to submit their raw data to public databases such as the Gene Expression Omnibus and ArrayExpress.
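The spirit of a MIAME-style check can be sketched as a simple completeness test over the categories the guidelines cover. The field names below are hypothetical shorthand for those categories, not the actual MIAME checklist items:

```python
# Hypothetical field names loosely based on the categories MIAME covers;
# the real checklist is far more detailed.
REQUIRED = [
    "rna_extraction_protocol",
    "labeling_protocol",
    "hybridization_protocol",
    "normalization_algorithm",
    "array_design",
    "raw_data_accession",
]

def missing_fields(record):
    """Return the MIAME-style fields that are absent or empty."""
    return [f for f in REQUIRED if not record.get(f)]

# An example submission with one gap: raw data not yet deposited.
submission = {
    "rna_extraction_protocol": "TRIzol, per manufacturer",
    "labeling_protocol": "Cy3/Cy5 direct labeling",
    "hybridization_protocol": "42 C overnight",
    "normalization_algorithm": "lowess",
    "array_design": "in-house spotted cDNA, v2",
    "raw_data_accession": "",
}
print(missing_fields(submission))  # → ['raw_data_accession']
```

A journal or database running such a check would flag this submission until the raw data were deposited in a repository like GEO or ArrayExpress.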

Although many specialized journals have yet to adopt MIAME, the standard is increasingly becoming the law of the land. Nature, Cell, and The Lancet adopted MIAME compliance requirements for their authors in 2002, and many other journals have followed suit. Several commercial array manufacturers and software developers have written protocols and software to help their users follow MIAME guidelines. (See, for instance, [] for a list.) “Most of the requirements are difficult to comply with if you don’t have the tools,” says Regev.

But while standards for data reporting are becoming well established, the same is not true for data collection, says Chris Stoeckert, associate professor at the University of Pennsylvania’s Center for Bioinformatics and a member of MGED’s advisory board. As it seeks to develop best practices for experimentation, MGED will be looking at studies like the ones recently published in Nature Methods to shed light on what remains a murky issue. “We all want to be able to look at a microarray experiment and assess whether it was done correctly or not. To a large degree we are unable to do that, because there are no standards yet as to how to assess it,” Stoeckert says. “We need standards on the informatics side – how do you evaluate the data that’s there – and also standards in terms of how to experimentally control for quality.”

The intense spotlight currently being cast on microarray research may shed light on other research as well. All researchers can learn a few lessons on the importance of the lab effect, says Irizarry. “It’s been known forever, not just in microarrays, that there is a lab effect. There’s a paper I cite where the speed of light is plotted from different labs across time, and the difference between labs is statistically significant,” he says. “I think microarrays are getting a bum rap because of one of their strengths, which is that they produce a lot of data. I think if other technology produced this much data, we would start seeing similar problems.” Irizarry sees the beginnings of hopeful trends toward standardization in the research community. For instance, many universities are moving towards using core facilities rather than processing microarrays in individual labs. “In academia, you are seeing a movement towards standardization,” he says. “Given what I’ve seen in terms of the lab effect, this is a move in the right direction.”

The Scientist
August 16, 2005