William & Mary

One of a thousand

Big study: Scientific studies with hundreds, and even thousands, of participants are becoming more and more common. One advantage of the trend is that it creates more opportunities for undergraduates to make real contributions to research.

A professor (and his class) contribute to a 1,000-author evolutionary study

You have to look pretty closely to find Matthew Wawersik’s name on this paper. The list of authors and their affiliations goes on for most of four pages.

But there he is, near the bottom of the first page, one of more than 1,000 researchers who made significant contributions to “Drosophila Muller F elements maintain a distinct set of genomic properties over 40 million years of evolution,” published in the journal G3: Genes, Genomes, Genetics.

Wawersik, an associate professor in William & Mary’s Department of Biology, says that the actual number of contributors to the project might be as many as four times greater.

Thousands of people working together on a single project? How can this happen?

These mega-author collaborations are becoming more common in science. A group of physicists associated with CERN recently broke the record for multi-author papers with an effort that bore the names of more than 5,000 authors.

Such extensive collaborations have come about for a number of interlocking reasons. First, many of the scientific questions of today are beyond the scope of a single individual or even a small group of scientists. Second, the amount of data generated by complex investigations such as genomics studies or high-energy physics experiments requires a large number of scientists for processing. Finally, the data can be digitized and shared remotely.

Much genetic research today is conducted digitally—in silico—but Matt Wawersik also runs a wet lab where working with live Drosophila is the order of the day.

The Internet and the World Wide Web were both created so that scientists could share data. The ability to bring many minds to bear on a single project is one of the advantages brought about by the new age of in silico research, Wawersik explained. In silico means that the experimenters work with data on computers. The term is an Internet-age extension of the concepts of research conducted in vitro (under glass, such as in a petri dish) and in vivo (in the body). In silico methods are especially suited for genomics research, such as the G3 paper, which looks at a particularly persistent chromosome that’s present in certain species of fruit flies.

A look at the genetic structure of a chromosome

The paper is a comparative examination, looking at the genetic structure of the fourth chromosome in four different species of fruit flies. Wawersik noted that the Muller F chromosome differs from the other three chromosomes in the Drosophila genome by being heterochromatic, or highly condensed.

“It’s called a dot chromosome, and it contains about 80 genes,” he said. That’s a small fraction of Drosophila’s total of 15,000 genes. Wawersik added that one interesting aspect of the Muller F is that despite its “non-standard” status, the 80 genes inside the dot chromosome stay very much on the job, producing proteins — gene expression, in other words.

“Many of these genes have what is called ‘housekeeping function,’ doing things that keep the fly alive,” he said.

The other interesting thing about the Muller F is its persistence. The dot chromosome has remained virtually intact in the genomes of a number of Drosophila species over 40 million years, Wawersik said.

“This paper is basically the work of many, many people delving into these genomes to see what they look like,” he explained. “You could do this with a sample of one — one genome. But when you’re talking about the Muller F, something that’s highly conserved evolutionarily over 40 million years, looking at multiple species allows you to examine common themes that control gene regulation, as well as uncover subtle differences amongst species.”

The “many, many people” Wawersik mentioned included a large number of undergraduates, among them members of his Genomics and Functional Proteomics class from the spring semester of 2009.

BIOL 404’s enrollment is around 10 or 12 students each time it’s offered, and Wawersik said each class does work that contributes to a larger study, such as the Muller F paper. All of these large studies are organized and coordinated by the Genomics Education Partnership, based at Washington University in St. Louis.

“It’s really Sarah Elgin who runs this project. She is a force of nature,” he said.

The Wash U.-based multi-institutional collaboration has received significant endorsement in terms of funding from the Howard Hughes Medical Institute (HHMI) Precollege and Undergraduate Science Education Professors Program, the National Institutes of Health and the National Science Foundation.

Bridging classroom instruction and research

Wawersik’s BIOL 404 course is one of many classes at William & Mary that serve as a bridge between classroom instruction and research participation. Similar classes exist at other schools, of course, and Wawersik and other Genomics Education Partnership participants have collaborated on papers that emphasize the educational benefits of the projects.

Such an in silico collaboration is especially attractive for community colleges or schools with small bio departments that don’t have wet-lab facilities to maintain a Drosophila program: “All you need is a student with a laptop,” he said.

“Not everyone has the lab I have here at William & Mary. I have the luxury of being able to bring my students into the lab and do directed research projects here under my supervision,” Wawersik explained. “This gives institutions that don’t have the facilities the opportunity to offer their students a real research project.”

Each participating lab takes on a portion of the study. For the Muller F project, Wawersik’s students in Genomics and Functional Proteomics worked on annotating sections of genomes of three species of fruit flies: Drosophila erecta, D. mojavensis and D. grimshawi.

“Pairs of students get 40,000 to 60,000 base pairs of genomic DNA sequence. They look into it and see if there are any genes there,” Wawersik explained. “There are gene-predictor programs. They are a starting point, but they’re only so good — they’re a computer algorithm, right?”

The gene-predictor programs flag possible gene sites in the strings of the base pairs, he said. The student teams examine each flagged area and compare it to the corresponding section of the well-mapped genome of Drosophila melanogaster to see if it matches.

“And if it does,” he said, “that’s evidence that it may be a gene.”
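The comparison step the students perform can be illustrated with a toy sketch. This is not the software used in the project; real annotation relies on dedicated alignment tools that handle gaps and much longer sequences. The function names, the sample sequences and the 80 percent threshold below are all hypothetical, chosen only to show the idea of checking a flagged region against a well-mapped reference.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions that match between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_match(candidate: str, reference: str) -> tuple[int, float]:
    """Slide the candidate along the reference (no gaps allowed).

    Returns (offset, identity) for the best-scoring window,
    or (-1, 0.0) if no window scores above zero.
    """
    best = (-1, 0.0)
    for offset in range(len(reference) - len(candidate) + 1):
        window = reference[offset:offset + len(candidate)]
        score = percent_identity(candidate, window)
        if score > best[1]:
            best = (offset, score)
    return best


def supports_gene_call(candidate: str, reference: str,
                       threshold: float = 80.0) -> bool:
    """True if the flagged region resembles some stretch of the reference.

    The threshold is an arbitrary illustrative value, not a project standard.
    """
    _, score = best_match(candidate, reference)
    return score >= threshold


# Made-up sequences: a short "reference" and two flagged regions.
reference = "TTTTATGGCCAAGTAGTTTT"
print(best_match("ATGGCCAAGTAG", reference))          # exact copy of a stretch
print(supports_gene_call("ATGGCTAAGTAG", reference))  # one base differs
print(supports_gene_call("CCCCCCCCCCCC", reference))  # unrelated sequence
```

A high-scoring match is only evidence, as Wawersik puts it, that the flagged region may be a gene; the students still have to confirm where the gene begins and ends.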

From in silico research to good, old-fashioned fly pushing

The annotation continues as the student teams work to find the beginning and the end of the gene, as well as the location in the gene where it expresses protein. Such work is done in silico, using data provided by the central project at Wash U. But as Wawersik has the luxury of a wet lab, his students also push flies.

“Fly pushing” is a term that embraces all the activities that are part of conducting experiments with living Drosophila specimens. Fly pushing is a collection of valuable and necessary lab skills for students headed toward careers in research. Wawersik stresses that fly pushing is not about flies; it’s about understanding genetics. Drosophila are ideal lab animals for genetic study. Fruit flies have short life spans, which means that even in a one-semester class, a student can study multiple generations. They have enormous chromosomes in certain tissues, easy to see under a lens. They are small, easy to care for and they breed like, well, flies.

But most importantly, Wawersik says, Drosophila’s genome is remarkably similar to that of a human. By base-pair count, the fruit fly DNA is around 1/20 the size of ours, but about 75 percent of a human’s genes have a homologue in Drosophila. The fruit fly, he points out, has told us much about human genetics and the mechanics of evolution, and Drosophila has a lot more to tell.

When many thousands of people contribute to a study, determining who among them deserves to be listed as an author on the paper is, as they say in the lab, a non-trivial task. Wawersik and his colleagues in the Genomics Education Partnership have developed a set of guidelines for the selection process. First, he draws up a list of potential authors, based on their contributions to the project.

“Then, everyone on the list of people who was going to be an author had to review the draft version of the article and submit comments,” he explained. The Muller F paper had another William & Mary author, Elizabeth McMurray Kiernan, who worked with Wawersik as a graduate student.

But there could have been more William & Mary authors. Wawersik said he put two undergraduates from the 2009 Genomics and Functional Proteomics class on his potential-authors list, along with Kiernan. He did his best to track the students down, to offer them the opportunity to read the paper and submit comments, in order to be listed among the paper’s authors.

“I tried really hard,” he said. He had a bit of trouble finding Kiernan, as she had since married and changed her surname.

“But I just couldn’t find them.”