Background
* PhD in Math from Oxford, undergrad at Princeton, started as a professor in business school at 24
* Met David Botstein; began integrating genetics and mathematics
* Sydney Brenner on “omics”: it created the idea that if you get a lot of data it will all work out. ‘Low input, high throughput, no output.’
Introduction
- Map: will focus on lncRNAs, plus some general introduction
- Guttman and Engreitz (colleagues)
Human Genome Project
- genetic map, physical map, sequence, gene list, built up incrementally; freely available without restriction.
- did not include centromeres, telomeres, etc.
Back to business as usual? No: too many maps?
- Genetic maps
- physical map
- 3D folding maps
- changed the way we think about biology: completeness matters
- e.g., can uniquely identify proteins by mass spec, since you know all possible options.
What have we learned?
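The mass-spec point can be made concrete with a toy lookup. This is a minimal sketch, assuming a hypothetical five-peptide catalog that we treat as complete, standard monoisotopic residue masses, and an arbitrary 0.01 Da tolerance; the function names are illustrative. An observed mass counts as an identification only because every alternative is known to be in the list.

```python
# Standard monoisotopic residue masses (Da) for the 20 amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide (N-terminal H + C-terminal OH)


def peptide_mass(seq: str) -> float:
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def identify(observed: float, catalog: list[str], tol: float = 0.01) -> list[str]:
    """All catalog peptides whose mass lies within tol of the observed mass."""
    return [p for p in catalog if abs(peptide_mass(p) - observed) <= tol]


# Hypothetical catalog, assumed complete for this toy example.
catalog = ["GASP", "LK", "IK", "WYF", "MEK"]

print(identify(peptide_mass("WYF"), catalog))  # ['WYF']: a unique identification
print(identify(peptide_mass("LK"), catalog))   # ['LK', 'IK']: L and I are isobaric
```

Leucine and isoleucine have identical residue masses, so even a complete catalog cannot separate LK from IK by mass alone; what completeness buys is knowing that the ambiguity is exactly two-way.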
View from 2001
- 35-120K protein-coding genes
- lots of transposons, parasites, junk
- few regulatory sequences
- few examples of non-coding RNA
- none of this turned out to be true.
Conservation
- 50 vertebrate genomes.
- the draft genome couldn't find more than 30,000 genes; debate over maybe 40,000 (the earlier 100,000 estimate came from 3 billion bases at 30 kb each: 3×10^9 / 3×10^4 = 10^5).
- protein-coding genes have a very clear conservation signature (codons).
- ~6% of nucleotides are conserved; only 1.2% is protein coding.
- 29 mammals: 3 million conserved elements of ~10 bp (4.7% of the genome), many in gene-poor regions. Those regions do contain a gene: a developmentally important one.
- some sequences are conserved across placental mammals but not in marsupials: little protein evolution, but substantial non-coding regulatory evolution.
- genome shuffling of regulatory elements by transposons? (symbiotic, not parasitic: they help reuse regulatory elements) e.g., L1 LINE elements, Kanga2 transposons.
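The “codon” signature can be illustrated by comparing two aligned coding sequences position by position: in coding regions, differences tend to pile up at the third codon position, where many changes are synonymous. This is a minimal sketch, assuming gap-free, in-frame alignments; the sequences are hypothetical and chosen so every difference is synonymous. Real analyses use much richer codon-substitution statistics across many species.

```python
def substitution_rate_by_codon_position(a: str, b: str) -> list[float]:
    """Fraction of differing sites at codon positions 1, 2 and 3.

    Assumes a and b are gap-free, equal-length, in-frame coding sequences.
    """
    if len(a) != len(b) or len(a) == 0 or len(a) % 3 != 0:
        raise ValueError("expected non-empty, equal-length, in-frame coding sequences")
    diffs = [0, 0, 0]
    sites = [0, 0, 0]
    for i, (x, y) in enumerate(zip(a, b)):
        pos = i % 3  # 0, 1, 2 correspond to codon positions 1, 2, 3
        sites[pos] += 1
        diffs[pos] += int(x != y)
    return [d / s for d, s in zip(diffs, sites)]


# Hypothetical orthologous fragments; both encode Met-Ala-Glu-Lys-Leu.
seq_a = "ATGGCTGAAAAACTG"
seq_b = "ATGGCCGAGAAGCTA"
print(substitution_rate_by_codon_position(seq_a, seq_b))  # [0.0, 0.0, 0.8]
```

A conserved non-coding element would not show this period-3 pattern, which is one reason the coding signature is so easy to spot.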
Mapping interactions
- Mikkelsen, Bernstein, Meissner, Guttman, Rinn, Lieberman-Aiden
- ‘3D structure of the human genome’
- Q: can we barcode sequence regions and map interactions at larger, whole-genome scale?
- Dekker 3C 2002.
- “Turns out the genome is Scottish”: the contact map is plaid (chr 14), reflecting 2 compartments, ‘open and closed chromatin’.
- equilibrium globule: random fold; contact probability falls with linear separation s as s^-1.5.
- fractal globule: exponent ~-1 (observed -1.08)
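The exponents above are the slope of contact probability against genomic separation on a log-log plot. This is a minimal sketch, assuming noise-free synthetic curves P(s) = c·s^α over separations of 0.5 to 7 Mb (both the prefactor and the range are chosen for illustration); with real Hi-C data you would first average observed contacts at each separation.

```python
import math


def contact_exponent(distances, contact_probs):
    """Least-squares slope of log P(s) against log s."""
    xs = [math.log(s) for s in distances]
    ys = [math.log(p) for p in contact_probs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


# Synthetic, noise-free curves over hypothetical separations (bp).
separations = [500_000 * k for k in range(1, 15)]
fractal = [1e-2 * s ** -1.08 for s in separations]
equilibrium = [1e-2 * s ** -1.5 for s in separations]

print(round(contact_exponent(separations, fractal), 2))      # -1.08
print(round(contact_exponent(separations, equilibrium), 2))  # -1.5
```

The prefactor only shifts the intercept, so the fitted slope depends on how contacts decay with distance, not on overall sequencing depth.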
Chromatin state
- methyl states
What have we learned?
- lincRNAs (with Guttman, Engreitz and Rinn)
- low-level transcription in lots of places (everywhere? just noise?)
- not worth evolution’s trouble to stop it? [maybe some genes in some tissues need to be better silenced]
- K4me3 and K36me3 regions that are not currently annotated genes. Transcripts? Conserved? Promoters? Splice junctions? Potentially protein coding?
- no codon conservation, but nucleotides are conserved, more patchily.
- ribosome profiling ‘confusion’: ribosomes occupy non-coding RNAs, which upset people. The occupancy looks more like protein-coding occupancy than 3′ UTR occupancy, though it looks more like 5′ UTRs, which are also occupied by ribosomes (scanning?). The presence of ribosomes does not indicate that a protein is being made.
- the drop in ribosome occupancy at the 3′ UTR, indicating ribosome release, may be a better score of protein coding.
- who are lincRNAs coexpressed with?
- 200 lincRNAs in mESCs, knocked down with shRNA: 90% affect gene expression, ~26 are needed to maintain pluripotency, ~30 are needed to repress different lineages.
- they bind different chromatin proteins involved in gene regulation. Proposal: lincRNAs organize proteins into complexes.
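One way to turn the 3′ UTR drop into a number is to compare ribosome footprints in a candidate ORF with those in the downstream 3′ UTR, normalized by the same ratio from RNA-seq so that differences in transcript coverage cancel. This is a sketch in the spirit of a “ribosome release score”; the read counts, the pseudocount, and the function name are assumptions for illustration.

```python
def ribosome_release_score(ribo_orf: int, ribo_utr3: int,
                           rna_orf: int, rna_utr3: int,
                           pseudocount: float = 1.0) -> float:
    """(Ribo-seq ORF / 3' UTR reads) divided by (RNA-seq ORF / 3' UTR reads).

    The pseudocount avoids division by zero for regions with no reads.
    """
    ribo_ratio = (ribo_orf + pseudocount) / (ribo_utr3 + pseudocount)
    rna_ratio = (rna_orf + pseudocount) / (rna_utr3 + pseudocount)
    return ribo_ratio / rna_ratio


# Hypothetical read counts.
# Coding gene: ribosomes fill the ORF and drop off after the stop codon.
print(ribosome_release_score(999, 9, 499, 499))  # 100.0
# Ribosome-bound non-coding RNA: no drop after the putative stop codon.
print(ribosome_release_score(99, 99, 199, 199))  # 1.0
```

A high score means ribosomes are released at the end of the ORF, as in translation; a score near 1 means ribosome occupancy simply tracks RNA abundance across the transcript.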
Where are lincRNAs bound in the genome?
- model system: Xist (student Engreitz). 120-mer antisense probes against the RNA cover the entire transcript (120-mers tolerate more stringent washes than shorter probes). Oligos carry biotin tags. 70% of the pulldown is X chromosome.
- Xist binds very broadly, not at focal sites. Some variation though, which correlates with K27me3.
- escape genes have lower coverage (e.g., immediately upstream of Xist).
- how does it spread?
- sequence over a 0-6 hr time course and watch Xist spread from its transcription site.
- peaked spreading
- is it jumping, or spreading through 3D space rather than along the 1D sequence?
- 3D: these peaks seem to be close in space to the Xist locus.
- Xist takes longer to coat active genes than inactive genes.
- if you mutate its ability to silence, it never spreads to the active gene regions.
- Model: Polycomb (Pc) then gets recruited and packs the gene into a biological black hole (illustrated with compaction cartoons).
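The probe design described above (long antisense oligos covering the whole transcript) can be sketched as reverse-complementing overlapping windows of the RNA. This is a minimal sketch, assuming the 120-nt probe length from the notes and a hypothetical 60-nt step; the function name is illustrative, and a real design would also filter repetitive sequence and add the biotin tag during synthesis, which is not modeled here.

```python
# DNA complement of each RNA base (U pairs with A).
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def antisense_probes(rna: str, length: int = 120, step: int = 60) -> list[str]:
    """Antisense DNA oligos (written 5'->3') tiling an RNA in overlapping windows."""
    rna = rna.upper().replace("T", "U")  # also accept DNA-style input
    probes = []
    for start in range(0, len(rna) - length + 1, step):
        window = rna[start:start + length]
        # Complement each base, then reverse so the probe reads 5'->3'.
        probes.append(window.translate(RNA_TO_DNA_COMPLEMENT)[::-1])
    return probes


# Toy example with 3-nt "probes" so the output is easy to check by eye.
print(antisense_probes("AUGGCU", length=3, step=3))  # ['CAT', 'AGC']
```

If the transcript length does not fit the step exactly, the last few bases are left uncovered; a practical design would add one final probe anchored at the 3′ end.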
lincRNA functions
- modular scaffold: explains patchy conservation (only the interacting regions are conserved).
- catalysis (rRNA)
- template mediated catalysis
The road ahead
- complete catalogs of interactions?
- grammar of regulatory interactions: be able to tell what they do by looking at them.
- exploit synthetic biology to build thousands of regulatory reporters
Questions
- lincRNA vs. lncRNA: stress the “in”, for “intergenic”, to separate them from those that overlap promoters of known genes
- Xist silences autosomes if its sequence is embedded in them
- Bill Gelbart Q: oskar-like (both lncRNA and protein)?