For the last several years, the world of noncoding regulatory RNAs has been firmly in the hands of small RNAs. In virtually every eukaryotic model organism microRNAs have been shown to mediate post-transcriptional gene silencing, and in yeast small interfering (si)RNAs have also been shown to play a prominent role in transcriptional gene silencing.

Motivated by this finding, Ramin Shiekhattar from The Wistar Institute in Philadelphia and his team set out about seven years ago to find small RNAs in mammalian systems that regulate transcription. Shiekhattar sums up several years of work: “The more we looked, the less evidence we found of either small RNAs or the machinery that may mediate their actions [in transcriptional gene silencing].” Prompted by some examples of longer RNAs that are involved in imprinting and X-chromosome inactivation, they decided to look for long noncoding RNAs with a role in transcriptional regulation.

Their first question was which noncoding RNAs to include in the study. A collaboration with Roderic Guigo from the Center for Genomic Regulation in Spain helped the team set the selection criteria. Guigo was one of the investigators in the Encyclopedia of DNA Elements (ENCODE) consortium, working on GENCODE, a project that aims to annotate all gene features in the human genome, including protein-coding and noncoding loci.

The GENCODE annotation uses a combination of bioinformatic pipelines and the Havana project's manual curation, which is based on alignment with expressed sequence tag cDNAs and proteins.

Shiekhattar's team included long noncoding RNAs that did not overlap with protein-coding genes or their promoters, and they excluded all known noncoding RNAs. In 2007, the GENCODE annotation covered about one-third of the human genome, so their search left them with a list of 3,000 transcripts from roughly 2,000 loci. Today GENCODE annotations cover around 70% of the genome and contain about 7,000 long noncoding RNAs in roughly 6,000 loci; that number is expected to grow to about 12,000 noncoding transcripts by the time the annotation is completed.
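The selection criteria described above amount to an interval filter, and a minimal Python sketch can make the logic concrete. This is an illustration under stated assumptions, not the team's actual pipeline: the `Interval` type, the function names, the example coordinates and the 1,000-base promoter window are all hypothetical, since the article does not say how promoters were defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def extend_with_promoter(gene: Interval, flank: int) -> Interval:
    """Extend a gene upstream by `flank` bases (assumed promoter window)."""
    if gene.strand == "+":
        return Interval(gene.chrom, max(0, gene.start - flank), gene.end,
                        gene.strand, gene.name)
    return Interval(gene.chrom, gene.start, gene.end + flank,
                    gene.strand, gene.name)


def select_candidates(noncoding, coding, known_names, flank=1000):
    """Keep noncoding transcripts that are not already known and that do not
    overlap any protein-coding gene or its assumed promoter window."""
    blocked = [extend_with_promoter(g, flank) for g in coding]
    return [
        t for t in noncoding
        if t.name not in known_names
        and not any(overlaps(t, b) for b in blocked)
    ]


# Made-up coordinates, for illustration only.
coding = [
    Interval("chr1", 5000, 8000, "+", "gene_plus"),
    Interval("chr1", 20000, 22000, "-", "gene_minus"),
]
noncoding = [
    Interval("chr1", 1000, 2000, "+", "lnc_A"),    # far from any gene: kept
    Interval("chr1", 4500, 4900, "+", "lnc_B"),    # inside promoter window: dropped
    Interval("chr1", 9000, 9500, "+", "lnc_C"),    # already known: dropped
    Interval("chr2", 5000, 6000, "+", "lnc_D"),    # different chromosome: kept
    Interval("chr1", 22500, 22800, "-", "lnc_E"),  # minus-strand promoter: dropped
]
selected = select_candidates(noncoding, coding, known_names={"lnc_C"})
print([t.name for t in selected])
```

The pairwise comparison is quadratic and only suitable for small examples; on a genome-wide annotation, a sorted sweep or an interval tree would be the usual choice.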

Partial screenshot of noncoding gene transcript RP11-400N13.3-001 on chromosome 1 in the Ensembl genome browser with GENCODE and Havana annotations.

With the list of 3,000 putative noncoding transcripts in hand, the scientists first verified that these transcripts were expressed in different tissues. Then the team began to tackle the question of function. “I expected that these noncoding RNAs would have some kind of silencing effect on the neighboring genes,” Shiekhattar recalls; “that was in line with what was already understood about the function of imprinting loci where noncoding RNAs repress their neighboring genes.”

His team used siRNAs to knock down selected noncoding RNAs. In seven of twelve cases they saw evidence contrary to what they expected. Reduction of the noncoding RNA also reduced expression of its coding neighbor, pointing to a role for the noncoding RNA as an enhancer rather than a repressor. One interesting example was a noncoding RNA adjacent to the snail family of transcription factors, which play a key role in the epithelial-mesenchymal transition. On a genome-wide level, the genes regulated by the noncoding RNA overlapped with those regulated by snai1, indicating that the noncoding RNA exerts its function through this snail transcription factor.

Exactly how it does this is still an open question, one that Shiekhattar and his team plan to tackle in the near future. Whether the noncoding RNA acts as a scaffold for other factors and what those factors are will be the subject of future studies.

In general, to be certain of the functionality of any noncoding transcript, Shiekhattar advocates adding more biological proof, for example, in the form of knockout mice, to show that these molecules are important in the context of the whole animal.

Though evidence for deregulation of any of the noncoding transcripts in disease remains to be found, Shiekhattar is convinced that it exists. A first step toward testing this would be to take a closer look at the genomic regions in which these transcripts are encoded to see whether they harbor chromosomal breakpoints, disease-associated single-nucleotide polymorphisms or copy-number variations.

This work clearly shows that microRNAs, which cover only about 0.1% of the genome, will have to share the spotlight with longer noncoding transcripts that are expressed on the same order as protein-coding genes.