Friday 05/15/15

9:30 am – 11:55 pm

New Data analysis tools

  • generalized Hi-C plotter with Bogdan
  • now does triangles too
  • uploaded new Hi-C data. Looks much smoother; replicates with different enzymes look very similar.

HiC_normalization

newDataHigherRes

Chromatin paper

  • working on manuscript revisions
  • discussed plans and progress with LM and team (6pm-8:30pm)

To do still

  • comment on cell cycle and pairing in intro section (may need additional supplemental discussion)
  • add p-values to all comparisons in Fig 2, 3, and Extended Data Fig 8 (and 6 and 7), with N cells per comparison.
    • XZ recommends combining samples of same type for the fig 2/3/6/7 comparisons.
  • check entanglement scores (rerun?)
  • fix connectors
  • combined fasta file

Thursday 05/14/15

9:00 am – 8:30 pm

morning 9 – 10:45am

  • chat with HC
  • email Geoff, congrats and follow up
  • email DB about potential collaboration
  • create 1 page pdf of recent publication info for K99 update

Chromatin paper methods revision

  • working on description of domain selection, 10:45 am – ?

AllDomainsIn_SuppTable1

EnrichmentAtflanksOfDomains

FactorEnrichment

NewDomainScreening


Wednesday 05/13/15

9:00 am – 10:15 pm

Geoff’s Defense (9:30 am – 11:00 am)

  • see post
  • short follow-up discussion with BB and GN

To Do

  • Travel reimbursements
  • write to XZ about LM travel schedule (done)

Chromatin Paper

  • Built new composite data structure (allData.matb) to avoid mixed data sets and crazy indexing.
    • some manual field alignment: print elements to Excel, ID missing fields, annotate back in by hand.
    • built new domainData2 structure which has the library number and positions (just positions was getting very confusing)
  • Completed assembly of Supplemental Figure on Intact Domain Histograms

Extended data Figure revisions (3pm – 5pm)

  • renumbering extended data figures in Main text, figures, and figure captions.
  • Writing new extended data figure captions.

New Extended Data Figure 8:

  • wrote generic bootstrapping function to compute confidence intervals.
  • set confidence intervals to 78% for error bars, so the probability that two data points whose confidence intervals just overlap are not different is ~0.22^2 ≈ 0.05.
  • alternatively could do the traditional SEM and the error bars would be smaller still.
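
A minimal sketch of what a generic bootstrapping function for confidence intervals could look like, assuming a simple percentile bootstrap (the notes do not say which variant was used). It is written in standard-library Python for illustration; the function name `bootstrap_ci`, its parameters, and the synthetic sample are hypothetical.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, level=0.78, n_boot=2000, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    boot = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = 1.0 - level
    lo = boot[int((alpha / 2) * (n_boot - 1))]
    hi = boot[int((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi

# Synthetic measurements (made up for illustration only).
data_rng = random.Random(1)
sample = [data_rng.gauss(10.0, 2.0) for _ in range(200)]

lo, hi = bootstrap_ci(sample)            # 78% interval, as in the notes
overlap_prob = 0.22 ** 2                 # the rough argument from the notes
print(f"78% CI for the mean: [{lo:.2f}, {hi:.2f}]")
print(f"0.22^2 = {overlap_prob:.4f}")
```

Passing a different `stat` (e.g. `statistics.median`) reuses the same resampling loop, which is the main reason to keep the function generic.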

Still to do

  • Write cover letter
  • Write description of domain selection
  • color connecting bars in Fig E6
  • print multiple versions of green/magenta/white contrast for Fig 2 and 3
  • combine probe fasta tables and update probe names with new conventions

MERFISH-lunch

  • Jeff will send script on new probe design for human genome
  • Try building a matlab class to handle library design?

Geoff Fudenberg Thesis Defense

Thesis Defense in Biophysics 05/13/15

Intro

  • many different textbook pictures of mitotic chromosome
    • middle scale: rosettes? spirals of spirals? Accordion coils?
  • many functional characterizations of the 1D genome sequence
  • Hi-C method explained.
    • had to develop appropriate normalization pipeline
    • now have maps for many species

Observations

  • human contact maps (2009) — 100 fold difference in lengths, see scaling exponent ~1 across this range.
    • conclusion chromosomes are polymers
    • two compartments.
    • year 1 of Geoff’s PhD
  • TADs
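
As an illustration of the scaling exponent mentioned above, here is a small sketch of how such an exponent could be estimated: fit a straight line to log contact probability vs. log genomic separation. The data are synthetic and constructed to follow P(s) ~ 1/s over a 100-fold range; the function name and units are assumptions, not taken from the talk.

```python
import math

def fit_scaling_exponent(distances, contact_probs):
    """Least-squares slope of log P(s) vs. log s; returns alpha for P(s) ~ s^-alpha."""
    xs = [math.log(s) for s in distances]
    ys = [math.log(p) for p in contact_probs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx

# Synthetic separations spanning a 100-fold range (arbitrary units).
distances = [10 ** (k / 10) for k in range(21)]
contact_probs = [1.0 / s for s in distances]   # idealized exponent of 1

alpha = fit_scaling_exponent(distances, contact_probs)
print(f"fitted scaling exponent: {alpha:.3f}")
```

With real Hi-C data the fit would be done on normalized, distance-binned contact frequencies, and the fitted range matters since the slope changes between regimes.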

Cell cycle dependent differences

  • metaphase more gentle slope (longer range contacts)
  • more homogeneous
  • (3 cell types examined)
  • locus specific organization lost in metaphase
  • cell type specific contacts (there certainly are some) are largely lost
    • not evident in the contact range map, but didn’t expect that.
  • distinguish between hierarchical models (e.g. coils of coils) and loop array models for metazoan chromosomes

Polymer models

  • basic polymer linked monomers
  • add model features to test.
    • arrays of loops
    • confinement
    • locus specific interactions
  • loops of loops model (hierarchical distinguishable fiber): contact probability vs. distance decays rapidly (more so than observed in metaphase)
  • can find scaffold maps that produce this, IF positions of loops are variable from cell-to-cell.
  • loops not formed by dimerization. Formed by linking consecutive marks
    • random looping model not consistent with HiC data
  • Cohesin complexes have the ability to form loops and extrude the DNA through the cohesin loop.
    • see citations.
    • Q: Comment more what is known about the mechanism of loop extrusion.
  • add multiAT hook protein to Xenopus chromatin and this condenses into a long shape.

Interphase organization

  • compartments vs. domains
    • domains are linked to regulation (guide regulatory elements to genes)
    • domains inhibit contacts across domains
    • High-resolution Hi-C: loops / peaks-at-corners, diversity in domain structure, complex domains / loops within loops
  • could loop extrusion and boundaries to loop extrusion give rise to interphase domain organization?
  • Model
    • factors can bind, exchange with solvent, pump DNA
    • location bound is stochastic (?) — would it be averaged out across cells. ??
  • model results
    • can give rise to complex domains,
    • can you walk us through the intuition / loop combinations that give rise to one of these complex domains?
    • does multiloop / loops-within loops require
  • boundary deletion and spreading (cite Nora 2012) – re-examine this.
  • loop extrusion more likely to care about direction of sites (in-pointing CTCF sites).

Questions

  • boundaries block extrusion at certain points along chromosome.
  • allow nested loops. (loops stacked completely together).
  • Kleckner — what if loops tend to form by collision, in a way limited by persistence length of fiber.
  • extrusion is an interesting explanation for how a 1D boundary becomes a 3D boundary.
    • well, really all this gives you is nearest boundary element finding.
    • not sure this works. Internal interaction within a domain?
  • is extrusion

Tuesday 05/12/15

Chromatin Paper

  • updated PlotGenes to fix issues with arrowheads. Much nicer now.
    • applied in Extended Data Fig 3. Others still need updating.
  • Working on Extended Data Fig 6: illustrating differences between active domains of similar size.

Paper To Do list

Text

  • For the abstract and mostly intro: don’t over-emphasize epigenetic structure. We study structure at the relevant length scales for gene regulation / genome functions. Epigenetic domains are one of the indicators that this length scale is important and, it turns out, a strong predictor of structure.
  • With respect to Blue internal scaling, can’t say most models — invites question of other models. Move to discussion of models, not in data. Discuss the equilibrium globule model. Say no model fits all the data for Blue.
  • Need to discuss off-trend points in model section
  • be sure word “loop” does not appear in text
  • need to write Cover Letter.

Main Figures

  • add scalebars to Fig 1a
  • need multiple examples of different contrast /saturation options for Fig 2a and b
    • contrast / saturation for Repressed domains is a bit strong.
    • images are a little blurry; plot with different Gaussian widths

Extended Data Figure

  • New Fig
    • histograms of volume and radius of gyration distributions for all domains.
    • similar histograms for FG model data?
  • ED2 ‘internal scaling of repressed regions’
    • reorder panels. BX-C vol, ANTC chip, ANTC rg, ANTC vol
    • All int data plots: solid lines for int-data, dashed for old data
  • Tracing fig
    • no backbone
    • 15 spots
    • green-green junctions linked by green lines. magenta by magenta lines. Black lines between interjunctions
    • also in data follow junction code.
  • New example stats
    • select regions similar length
    • show existing stats
    • also show cell-cell variation

Monday 05/11/15

9:00 am – 9:50 pm

Chromatin Manuscript

  • revising figures and color balance
  • helping XZ images and slides

Other stuff

  • discussed projects with BB
  • discussed L12 progress with JM and HC
  • helping GN troubleshoot dax readers

Biology of Genomes 2015: Sat Morning (open talks)

Translational Genomics

Ben Hayes – Genomics for livestock breeding

  • ref population with known genotypes and phenotypes.
  • generate genomic breeding value equation (linear weighted sum of SNPs)
    • selection population from Marker genotypes (with SNPs). Chose which to use for breeding
  • SNPs don’t work across population or breeds
    • not dense enough
    • accuracy erodes rapidly
    • genome seq data instead of 50K SNP data?
  • 27 breeds, 1000 – 2K individual cows sequenced. 35 million SNPs, 2 million INDEL
  • BayesRC
    • allows different proportion of variants in each class
    • allows tissue specific differences in expression
    • ‘class’ e.g. synonymous vs. non-synonymous variants
  • two traits selected
    • lactation genes
    • temperament (farmers don’t like being kicked)
  • experiment
    • 20,000 individuals with SNP chips, from Holstein (B&W) and Jersey
    • apply results in a different breed not in the original data (Red cows)
    • BayesRC gives a much clearer distinction of the SNP in the PARE(?) lactase-associated gene
    • .1% of genes explain 1% of variance. 95% of genes explain 0%
  • temperament
    • TMEM113D – linked in mouse and humans to anxiety and panic behaviors.
  • conclusions
    • sequence data + improved method -> more precise mapping
    • need greater information about classes (e.g. ENCODE)
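
The genomic breeding value described above as a linear weighted sum of SNPs can be sketched as follows. The SNP weights, genotypes (coded as 0/1/2 allele counts, a common convention assumed here), and animal names are all made up for illustration; in practice the weights are estimated from the reference population.

```python
def breeding_value(genotype, snp_effects, intercept=0.0):
    """Linear weighted sum of SNP allele counts (0, 1, or 2 copies per SNP)."""
    return intercept + sum(w * g for w, g in zip(snp_effects, genotype))

# Hypothetical per-SNP effects, as if learned from a reference population.
snp_effects = [0.5, -0.2, 0.0, 1.1]

# Hypothetical selection candidates with marker genotypes only.
candidates = {
    "cow_A": [2, 0, 1, 1],
    "cow_B": [0, 2, 2, 0],
    "cow_C": [1, 1, 0, 2],
}

scores = {name: breeding_value(g, snp_effects) for name, g in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # candidates ordered from highest to lowest predicted value
```

The point made in the talk is that weights estimated this way do not transfer well across breeds when the markers are sparse, which is what motivates using sequence-level variants.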

Eli Rodgers-Melnick: Open chromatin

  • intro to maize: Diploid, 2.3 Gb genome, 10 chromosomes, high diversity (human-to-chimp scale). rapid LD decay
  • light and heavy MNase Digest in whole root and whole shoot. Call MNase hyper-sensitive
  • gene Tb1 regulates growth of lateral branches.
    • regulated by two transposable elements 65 Kb upstream.
    • MNase hyper sensitive sites ID enhancers and promoters
  • compare 2 lines, with 50 SNPs, test algorithm for assigning explanatory variance
  • MNase HS sites are a small subset of the low methylation regions
  • over half of the HS sites in root are shared in shoot.
  • most MNase sites are near genes (though 95% are outside of genes). Minor peaks in frequency around 100 kb out
  • GWAS hits are enriched in and around open chromatin.
  • MNase HS distribution explains nearly as much of the variance in GWAS as CDS (despite representing less total part of the genome than CDS). Larger enrichment if normalized by representation
  • shoot specific data explains most traits (but most traits are shoot related).

Questions

  • why MNase instead of FAIRE-seq or DNH?

De Groot – Cancer application

  • genomic instability, genomic heterogeneity, complex, heterogeneous spatial context
  • use tissue engineering and microfluidics to understand
  • why microfluidics: more assays on same sample volume, especially
  • use passive pumping: differences in pressure, operated with pipettes, no pumps
  • exploit low mixing in microfluidic volumes to create diffusion gradients/migration experiments.
  • Application to multiple Myeloma
    • a blood cancer that affects the elderly
    • manageable (with difficulty), terminal condition, strong interaction with microenvironment
    • myeloma resides in bone marrow — contributes to drug resistance
  • setup:
    • two culture chambers, connected by central well with small channels which allow for diffusion in uniform gradients
  • findings
    • monoculture not as predictive (separated but overlapping distributions)
    • in context of whole explant in outside chambers, response is much better separated
  • new setup: hanging drop culture: two well (allows adding food, treatments etc (and removing?) without disrupting 3D)
    • developed porous polymeric scaffold.
    • compare direct co-culture (physical interaction) and indirect coculture (share extracellular fluid)
    • stromal cells on scaffold added to myeloma cells in bottom of droplet

Diane Dickel: Large scale in vivo enhancer deletion with CRISPR/Cas9 (Berkeley)

  • excellent talk, not open

Dennis Lo – non-invasive molecular diagnostics

  • non-invasive prenatal testing
  • Plasma DNA can be isolated from blood
    • can sequence these and separate
  • fetal DNA fragments tend to be small.
    • also see different distribution of peaks
    • fetal DNA depleted in linker after fragmentation (less heterochromatin)
  • mtDNA is very short, no peaks (no histones)
  • the greater the fraction of the DNA coming from the fetus the shorter the size distribution
  • can ID trisomies based on this size diagnostic (not based on counting)
  • can combine counting and size for more robust
  • can we detect cancer in this way? — cancer has CNV can be detected
  • now extending this approach to cohort of 200 patients, 32 healthy, 67 HBV carriers, 90 HCC patients (tumors) 84% detected.
    • this is all traditional sequencing measure
  • more tumor DNA the more short sequences detected.
  • biggest size difference corresponds with more tumors
  • antibodies bind circulating DNA? use anti-DNA antibodies
  • patients with active SLE have a larger peak at small levels, related to the amount of anti-DNA in the plasma
    • enriched in hypomethylated DNA.
  • increased tumor cell death, plasma DNA goes up more.

Questions

  • classify cancer types? – not yet, but maybe in combinations with other data
  • histones still associated in plasma DNA? – yes

Boris Rebolledo-Jaramillo on mtDNA

  • D-loop, relatively recent part of sequence, also most frequently mutated. involved in replication and transcription (which are interdependent in mtDNA)
  • most mutations which cause disease are heteroplasmic
    • severity of disease corresponds with fraction of aberrant genomes
    • heteroplasmy goes through stochastic bottleneck
  • experiment:
    • blood and cheek swab samples from mother and child. Isolate mtDNA
    • developed robust pipeline to ID artifacts from sample contamination, vial swapping
  • Focus on sites with >1000 seq depth and MAF > 1% -> 172 sites, significant by R. Nielsen method
  • validate sites using illumina vs Sanger
  • ID heteroplasmy only in child or only in mother
    • evidence of bottleneck. Can calculate quantitative effect
  • Germ-line mutation rates.
    • D-loop 0.08, full mtDNA 0.013 (lower limit due to filtering for high confidence)
  • maternal age dependence on bottleneck frequency?
    • yes, older mothers more heteroplasmies
    • older mothers, more heteroplasmies in the child.
  • disease associated alleles – 1 in 8 mothers are carriers (but all are healthy in this study)
    • in 1 child levels of heteroplasmy are comparable to those in patients with the mtDNA disease.
  • checked inference also looking at hair: frequencies are similar.

Siim Sober (U. Tartu, Estonia): RNA-seq of placental transcriptional landscape in normal and complicated pregnancies

  • Placenta: an important organ.
  • observed in Dutch hunger that starvation in mothers during pregnancy leads to lasting effects on metabolism of children (e.g. diabetes)
  • placenta disorder: preeclampsia – leads to premature delivery, 5% of pregnancies, linked to proteinuria (excess protein in urine)
  • Experimental design
    • 8 normal births
    • 8 each of 4 issues: small gestational age, large gestational age, preeclampsia, and maternal diabetes.
    • RNA seq, average 17x coverage.
  • enriched in placental genes.
  • ID genes associated with differential expression as a function of delivery (vaginal vs. c-section). Fetal gender differences are all sex-chromosome genes.
  • birth length and maternal age and number of births did not lead to any differential gene expression in the placenta.
  • preeclampsia gene expression is clearly distinct for numerous genes from other pregnancies.
  • correlated expression in baby size.

Ana Vinuela

  • regulatory landscape of Islet
  • 90% of variants associated with traits and diseases by GWAS are non-coding
  • effect of non-coding variants largely studied by eQTL — (expected effect on levels of gene expression). But this must be done on the right tissue.
  • type 2 diabetes.
    • islets of Langerhans produce insulin
  • setup
    • difficult to get pancreatic samples
    • combined multiple studies data: 259 samples (whole Islets). 26 samples for Beta cells
    • need approach to remove batch effects by lab
  • normalizing
    • PC1 and PC2 cluster the different labs. Correct these effects in the first 10 PCs so the difference between labs disappears.
  • now doing eQTL discovery
  • ID islet eQTLs (mostly in promoters, as typically observed).
  • have an IRX3 eQTL
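
A rough sketch of the normalization idea above: if the leading principal components mostly separate samples by lab, projecting them out removes the lab difference. The talk did not say exactly how the correction was done, so this is only one simple possibility, written with a hand-rolled power iteration to stay dependency-free (a real analysis would use an SVD routine). All data and names are synthetic and hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(rows):
    """Subtract each gene's (column's) mean across samples."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def top_component(rows, n_iter=200, seed=0):
    """Leading principal axis of centered rows, via power iteration on X^T X."""
    rng = random.Random(seed)
    d = len(rows[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(n_iter):
        scores = [dot(row, v) for row in rows]
        v = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(d)]
        norm = dot(v, v) ** 0.5
        if norm < 1e-12:          # no variance left to explain
            return [0.0] * d
        v = [w / norm for w in v]
    return v

def remove_top_pcs(rows, k):
    """Center the data, then project out the k leading principal components."""
    resid = center(rows)
    for _ in range(k):
        v = top_component(resid)
        new_resid = []
        for row in resid:
            score = dot(row, v)
            new_resid.append([x - score * w for x, w in zip(row, v)])
        resid = new_resid
    return resid

def lab_gap(rows, labs):
    """Largest per-gene difference between the two labs' mean expression."""
    a = [r for r, lab in zip(rows, labs) if lab == "A"]
    b = [r for r, lab in zip(rows, labs) if lab == "B"]
    mean_a = [sum(c) / len(a) for c in zip(*a)]
    mean_b = [sum(c) / len(b) for c in zip(*b)]
    return max(abs(x - y) for x, y in zip(mean_a, mean_b))

# Synthetic expression for 8 samples x 3 genes; lab B carries a batch shift.
rng = random.Random(42)
batch_shift = [3.0, -2.0, 1.0]
labs = ["A"] * 4 + ["B"] * 4
samples = []
for lab in labs:
    shift = batch_shift if lab == "B" else [0.0, 0.0, 0.0]
    samples.append([5.0 + rng.gauss(0.0, 0.05) + s for s in shift])

gap_before = lab_gap(samples, labs)
gap_after = lab_gap(remove_top_pcs(samples, k=1), labs)
print(f"lab gap before: {gap_before:.2f}, after removing PC1: {gap_after:.2f}")
```

One caveat with this kind of correction: if a biological variable is confounded with lab, removing the leading PCs removes that signal too.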

Biology of Genomes Key Notes

Dr. George Davey Smith (Bristol): Key Note 1

  • Epidemiologist
  • was claimed that vitamin E reduced heart disease (observational studies), studied carefully in RCTs NO EFFECT
    • lesson of confounding factors
  • Mendelian randomization
    • no reverse causation in genetics, instrumental variable.
    • Mendelian mutation affects something which affects something else.
  • examples: C-reactive protein and interleukin 6 associate
  • IL-6 and fibrinogen interact, and C-reactive protein and fibrinogen interact
    • all affect heart disease
    • can’t get stat effect
    • Ln C-reactive protein robustly linked (explains > 1% of the variance)
    • genotype raised concentration of CRP – no effect. Ditto for fibrinogen
    • for IL-6, high IL-6 is actually linked to heart disease.
  • Mendelian randomization as analogous to a randomized control trial
  • Mendelian case – meiosis randomizes SNP linked to Se
    • SNP corresponds to different genetic levels of selenium
    • both true RCT and Mendelian approach give same results on 20,000 + case/control study
  • another example – low body fat linked to lung cancer (because smoking reduces BMI). Mendelian randomization study clearly removes this confounding.
  • Multiphenotype Mendelian Randomization (MR)
    • lipids and CHD as an example. lipid from MR not significant
    • adjusting HDL lowering risk of heart disease looks good, true effect
  • Limitations: reintroducing confounding via pleiotropy
  • Egger regression: regress the SNP effect on the outcome against the SNP effect on the phenotype (exposure); can test existence of pleiotropy (from the intercept) and still measure the effect from the slope. Addresses the limitation of pleiotropic effects.
  • interact instrument with a second variable:

alcohol consumption

  • ALDH2 mutants: males homozygous WT drink more than hets and homo don’t drink. women don’t drink.
  • drinking alcohol actually increases blood pressure — males who don’t drink have lower blood pressure (no effect in women, allele)
  • with genetic variant that mimics drug effect, can efficiently / cheaply conduct randomized trial.
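
A small sketch of the Egger regression idea from the talk, assuming the usual setup: per-SNP effects on the outcome are regressed on per-SNP effects on the exposure, weighted by the inverse variance of the outcome estimates. An intercept away from zero suggests directional pleiotropy, and the slope is the effect estimate. The numbers are synthetic (built with a known intercept and slope) and the function name is hypothetical.

```python
def egger_regression(beta_exposure, beta_outcome, se_outcome):
    """Weighted least-squares line through per-SNP effects; returns (intercept, slope)."""
    w = [1.0 / s ** 2 for s in se_outcome]        # inverse-variance weights
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, beta_exposure)) / sw
    my = sum(wi * y for wi, y in zip(w, beta_outcome)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, beta_exposure))
    sxy = sum(wi * (x - mx) * (y - my)
              for wi, x, y in zip(w, beta_exposure, beta_outcome))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Synthetic per-SNP summary statistics (SNPs oriented so exposure effects are positive).
beta_exposure = [0.10, 0.15, 0.22, 0.30, 0.41]
true_intercept, true_slope = 0.1, 0.5             # made-up pleiotropy and causal effect
beta_outcome = [true_intercept + true_slope * bx for bx in beta_exposure]
se_outcome = [0.05, 0.04, 0.06, 0.05, 0.03]

intercept, slope = egger_regression(beta_exposure, beta_outcome, se_outcome)
print(f"intercept (pleiotropy test): {intercept:.3f}, slope (effect): {slope:.3f}")
```

Forcing the line through the origin instead would absorb the pleiotropic offset into the slope, which is the bias this approach is meant to expose.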

Questions

  • non-single gene loci are a problem, but with enough data with independent combinations of these, it can be addressed

Francis Collins

  • (PhD Yale, MD NC)
  • what you may not know:
    • man who led the Human Genome mapping / sequencing
  • NIH director since 2009

Reflection from HGP to Precision Medicine

  • first time back since 2011
  • Human genome 1990-2003
    • challenge to public project from private industry
  • not in the post-genome era. We’re in the genome era.

Major advances

  • tumor cancer genomics
  • explosion of human microbiome
  • chromatin open or closed
  • GTEx (3 papers today in Science) + 3 other elsewhere:
  • the Big data problem: BD2K
    • big data to knowledge project (100 million per year) BD2K
    • NIH’s 6-year initiative
    • NCBI 10 Tb/day, 40 Tb/day downloads, 3Tb/day interactive (exponential growth)
  • future of National Library of Medicine
    • active working group

The Case for Precision Medicine: ‘Timing is Everything’

  • form some large scale prospective cohort
    • cost per human genome 1-5K in < 1 day
    • number of smart phones
  • announced in State of Union Address ‘precision medicine initiative’
    • supposed to start this October
  • what is precision medicine?
    • fit the patient (not fully new, e.g. glasses)
    • most medical things are given for the ‘average patient’ (if for any scientific reason at all)
  • why now?
    • electronic health records
    • wearable medical sensors
    • genomics
    • metabolomics
  • what’s needed now?
    • rigorous research program (need to recruit people)!

Vision

  • personal / precision medicine advanced the furthest in cancer
  • patient partnerships, Elect. Health. Rec (EHR).
  • president proposes budget increase of 215 million (mostly through NIH, 70 for cancer, 130 for cohort).
  • reasonable chance of being passed by congress.
  • ‘liquid biopsies’ (circulating tumor DNA)
  • other new technologies? Multi-therapy approaches?
  • ID mass with liquid biopsy showing tumor risk mutation.
  • Longer term: pilots to build up cohort to 1 million+ volunteers
    • already millions involved in existing NIH funded longitudinal studies

Cohort

  • data driven cohorts – psychiatric diseases for example clearly lack molecular based clustering / appropriate for
  • human knockout ID – nature’s solution for disease resilience. ID protective genetic factors (and other factors)
  • Pharmacogenomics – over 100 drugs list information about genetic influences on the label (largely being ignored now)
  • Annual physical exam (not so much evidence this is useful).
  • “Make no little plans, they have no magic to stir men’s blood and probably themselves will not be realized. Make big plans; aim high in hope and work” – Daniel Burnham

Questions

  • 23 and me approaching 1 million
  • what’s the role of basic research in this initiative?
    • 53% of NIH to basic science
    • this will be more of a clinical / applied bent
  • provision for training doctors?
    • that’s a challenge