Wednesday 11/18/15

10:30 am – 7:00 pm


  • formatting figures for Ph Polymerization paper.
  • sent final version of figures to team.
  • discussion with XZ about proposal.
  • Phone interview (3:00 pm – 3:30 pm)
  • discuss model fitting with GF, reply to team with conclusions.

Recently completed

  • flipped fly stocks yesterday
  • sent abstract to Cornell

Still to do

  • formatting of supp figures
    • adjust page sizes
    • make sure headings are uniform
    • combine into single PDF with text portion


  • practice talk Friday morning (10 am)
  • practice chalk talk Mon evening (5 pm)
  • need to revise research proposal and send to XZ
Posted in Summaries | Comments Off on Wednesday 11/18/15

Tuesday 11/10/15

10:00 am



  • Revise response to reviewer 1
  • Figure updates
    • change blue color
    • fix symbols
  • discussed changes with XZ again

Fastq files

  • run 1: 150902_NS500422_0189_AHHNKFBGXX Illumina HiSeq 2500, 100 bp single read, rapid flowcell
  • run 2: 151030_SN343_0533_AC8CW1ACXX Illumina HiSeq 2000, 50 bp single read, standard cell

Run 1 checksums

5_S5.R1.fastq.gz 35edcf8de86c0121ed0299e3026c3996
3_S3.R1.fastq.gz 225a5a5c2318e07deb044fd2bb5f5d06
2_S2.R1.fastq.gz 07203cda1dde852c01c3eb1014e97574
6_S6.R1.fastq.gz d07b0b67730adf45d5e3abf9926bf2d1
1_S1.R1.fastq.gz 6a61879b157d3710e4e81c07ef81c920
4_S4.R1.fastq.gz 63cdc9bc347d26f43e2d0e72fba469c5

The Windows checksum tool doesn’t give a checksum report; it just runs for a while and then stops.
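
A minimal Python sketch for checking the files against the list above without relying on the Windows tool. It assumes the listed values are MD5 digests (they are 32 hex characters long, but the notes do not name the algorithm); `md5_of_file` and `verify_checksums` are hypothetical helper names, not part of any existing script.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(expected, folder="."):
    """Map each file name to True/False (match) or None (file missing)."""
    results = {}
    for name, expected_md5 in expected.items():
        path = Path(folder) / name
        if not path.exists():
            results[name] = None
        else:
            results[name] = md5_of_file(path) == expected_md5.lower()
    return results
```

Usage would look like `verify_checksums({"5_S5.R1.fastq.gz": "35edcf8de86c0121ed0299e3026c3996"}, folder)`, with one entry per file. Reading in chunks keeps memory use flat even for multi-GB fastq.gz files.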

Identified issues with Fastq files

  • some reads have no read data
  • writing a script to remove reads (all 4 lines of the record) that have no sequence (just an index primer).

More failed programs

  • Odyssey claims to have a fastqc installed but it doesn’t run.
    • typing exactly module load fastqc/0.11.3-fasrc01 gives an error.
  • tried virtualbox. Can’t access local files. This is unfortunate.
  • tried bioawk. Installed on linux virtualbox. can’t get on Odyssey. Can’t access local files to run.
  • tried fastqValidate. Can’t install on windows. install fails on linux virtualbox.
  • removing blanks by hand with custom matlab script
  • accidentally added spaces when printing the new fastq files. Fixed this.
  • fixed files do run in bowtie, but it still complains about a bunch of reads having insufficient length
  • wrote new matlab file to cut on read lengths less than 20
    • had some errors with this script.
    • fixed a bug where the length check was not applied to the read line (it first checked all lines, then only the 1st line of each record rather than the read line).
  • lots of issues with disk copy speed.
  • was cutting reads shorter than 20 bp; I think this might cut too much, so the remaining reads would no longer resemble the data. Dropped the cutoff to 10 bp.
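
The filtering described above (drop whole 4-line records whose read is empty or too short, checking only the read line) could be sketched in Python as follows. This is not the MATLAB script from these notes; `filter_fastq` and `open_maybe_gzip` are hypothetical names, and the sketch assumes strictly 4-line FASTQ records.

```python
import gzip


def open_maybe_gzip(path, mode):
    """Open a plain or gzip-compressed text file based on its extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def filter_fastq(in_path, out_path, min_length=10):
    """Copy 4-line FASTQ records whose read is at least min_length long.

    Returns (kept, dropped) record counts.
    """
    kept = dropped = 0
    with open_maybe_gzip(in_path, "r") as src, open_maybe_gzip(out_path, "w") as dst:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            # Only the second line of each record holds the read itself.
            read = record[1].strip()
            if len(read) >= min_length:
                dst.writelines(record)  # lines are written back unchanged
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Records are written back exactly as read, which avoids introducing stray spaces, and the default `min_length=10` matches the final cutoff chosen above.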

Monday 11/09/15

12:30 pm – 11:30 pm

(HHS appointment in morning for crushed foot).

Making RNA probes through old method as a control for RNA staining quality

  • found sna and hb probe templates
  • hb tube is empty, attempted to dissolve DNA from evaporated sample with added H2O (10 uL).
  • ran PCR reactions with m13F + m13R primers using Phusion
  • 30 cycles (reset annealing temp?)
  • run test gel (0.7% in TAE with gene ruler express)
  • sna amplified, hb didn’t. Probably ran a few too many cycles / not enough primer: some snail concatemers. Shouldn’t really cause problems.

in vitro transcription reaction (old style as control)

  • following old protocol
  • Per 20 uL reaction
    • 4 uL 5x buffer
    • 2 uL 10x dig-mix
    • 1 uL T7
    • 1 uL RNasin
    • 1 uL DTT
    • 5 uL DNA template
    • 6 uL ddH2O
  • Run as a 40 uL reaction
    • 8 uL 5x buffer
    • 5 uL 10x dig-mix (expired)
    • 3 uL T7 (expired)
    • 2 uL RNasin
    • 2 uL DTT
    • 10 uL DNA template
    • 10 uL ddH2O
  • ran carb treatment, STOP treatment, and precipitated with LiCl overnight at -20C.
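
As a quick arithmetic check on the two recipes above, here is a small Python sketch that scales the 20 uL recipe to 40 uL and lists where the reaction actually run differs. The volumes are copied from the lists above; `scale_recipe` and `deviations` are hypothetical helper names.

```python
# Volumes in uL, copied from the per-20-uL protocol and the 40 uL reaction as run.
RECIPE_20UL = {
    "5x buffer": 4, "10x dig-mix": 2, "T7": 1, "RNasin": 1,
    "DTT": 1, "DNA template": 5, "ddH2O": 6,
}
ACTUAL_40UL = {
    "5x buffer": 8, "10x dig-mix": 5, "T7": 3, "RNasin": 2,
    "DTT": 2, "DNA template": 10, "ddH2O": 10,
}


def scale_recipe(recipe, target_volume):
    """Scale every component so the volumes sum to target_volume."""
    factor = target_volume / sum(recipe.values())
    return {name: volume * factor for name, volume in recipe.items()}


def deviations(expected, actual):
    """Return actual minus expected for each component that differs."""
    return {
        name: actual[name] - expected[name]
        for name in expected
        if actual[name] != expected[name]
    }


print(deviations(scale_recipe(RECIPE_20UL, 40), ACTUAL_40UL))
# → {'10x dig-mix': 1.0, 'T7': 1.0, 'ddH2O': -2.0}
```

Both recipes sum to their stated volumes. Relative to a straight 2x scale-up, the reaction as run has 1 uL more dig-mix and 1 uL more T7, offset by 2 uL less water; the notes do not say why.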

Manuscript edits

  • discuss shortening with XZ
  • revise text to shorten
  • update calculator
  • revise figures

Friday 11/06/15

(Wed and Thur in NYC for Dale Frey interview)

9:30 am – 7:00 pm

sequencing data processing

  • bowtie finished running on all the datasets (20)
  • Need to run cufflinks.
  • first need to sort sam data
  • using samtools (should have set this up to run overnight, it’s quite slow)
  • also need to copy the data
  • (this RNA-seq pipeline I have sucks: 6 hours to download the data, 4 hours to unzip it, overnight to run bowtie (probably finished in less than 1 hour since I ran it multicore with 20 threads), 2 hours to re-sort the data with samtools (longer because I was messing around with trying to multicore this), then XX hours to copy the data to RC and XX hours to run cufflinks.)

Sample organization:

Sample Name  Tube  Tindex  IndexSeq  NEB  Description
Ph1          A     1       ATCACG    1    original Ph KD sample
M1           A     2       CGATGT    2    original WT control sample
2-4P         A     3       GCCAAT    6    KD performed on day 0 and day 2, extracted on day 4
4W           A     4       CAGATC    7    latest WT sample, extracted on day 4
4P           A     5       ACTTGA    8    latest Ph-KD sample, extracted on day 4
10-22#1      B     1       ATCACG    1    (WT) sample extracted on day 4 (need to check ID)
10-22#2      B     2       CGATGT    2    (Ph-KD) sample extracted on day 4 (need to check ID)
2W           B     3       TTAGGC    3    WT sample, extracted on day 2
2P           B     4       TGACCA    4    Ph-KD sample, extracted on day 2
2-4W         B     5       ACAGTG    5    WT for KD performed on day 0 and day 2, extracted on day 4
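
A small Python sketch for sanity-checking the index assignments listed above. It assumes tubes A and B are sequenced separately, so an index sequence only has to be unique within a tube (the notes do not state this); `index_collisions` is a hypothetical helper name.

```python
from collections import defaultdict

# (sample name, tube, index sequence), copied from the sample list above.
SAMPLES = [
    ("Ph1", "A", "ATCACG"), ("M1", "A", "CGATGT"), ("2-4P", "A", "GCCAAT"),
    ("4W", "A", "CAGATC"), ("4P", "A", "ACTTGA"),
    ("10-22#1", "B", "ATCACG"), ("10-22#2", "B", "CGATGT"),
    ("2W", "B", "TTAGGC"), ("2P", "B", "TGACCA"), ("2-4W", "B", "ACAGTG"),
]


def index_collisions(samples):
    """Return {(tube, index): [names]} for indices reused within one tube."""
    by_key = defaultdict(list)
    for name, tube, index in samples:
        by_key[(tube, index)].append(name)
    return {key: names for key, names in by_key.items() if len(names) > 1}


print(index_collisions(SAMPLES))
# → {}
```

ATCACG and CGATGT appear in both tubes, so the two tubes could not be told apart if they were pooled into the same lane.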

bash CufflinksArray1.bash

repo issues

  • somehow my bt2 files got put in my repo and were accidentally committed last night
  • BFG is an excellent tool for quickly and easily scrubbing these from the repo, which has previously been a horrible pain.

Tuesday 11/03/15

9:00 am – 7:30 pm

Library building code

  • fixed bugs in library design — can’t call TRDesigner with parallel pool with large off-target libraries.
    • it takes a ton of time and a ton of memory to replicate these large OTTables
    • this is blazingly fast in non-parallel mode.
  • No need to use parallelization in building OTTables — launching the parallel matlab cores is slow, uses more memory, and doesn’t speed things up for this code (at least on the scale of my current library design).
  • worked out primer building
  • don’t actually need lots of index primers, part of the beauty of the new design is to move away from having tons of individually indexed regions.
  • requested readout probes sequences from SW, stick with 30mers, these are working well.

Updated scripts

  • AssembleChrLib7_151103
  • S151103_PlotCutSitesAndChIPdata
    • (finished creating coverage vectors for all ChIP-seq data).


  • seminar on interviews (see notes)
  • practicing talk for DF interview tomorrow.

Monday 11/02/15

9:15 am – 6:00 pm, 8:00 pm – 8:55 pm

Library building

  • goal: building 10 kb tiling regions of BX-C probes against genome
  • approach: adapt new OligoArray-free pipeline from transcriptome libraries to design genome libraries


  • crashed Morgan when all 256 GB of RAM got paged during penalty calculations
  • this may have happened because the target sequences are all included in the off-target table (the whole genome)
  • tested this hypothesis by running against chr2L only (currently constructing for BX-C on 2R)
    • this runs quickly without any sign of crashing.
    • (might have been a misinterpretation: the code is written to load an existing trDesigner rather than rerun. May not have attempted to use this off-target library at all.)
    • indeed this appears to have been the case, now rerunning.
  • wrote new function to remove regions probed from the genome sequence and create a new offTarget fasta file database of the genome for this library construction
    • writing whole-genome fasta files is not fast, even for Drosophila (nor is loading them, though the fly fasta loads pretty quickly)
    • building the OTtable against the whole genome, duplicated to plus and minus strands (with BX-C removed), goes pretty fast: less than 400 seconds
  • this goes MUCH faster with local databases, read-write speed pretty important, doesn’t work well over network.
    • moved to new folder on MorganData (ChromatinLibraries/Library7)
    • (originally had data on Monet in Chromatin/OligoLibraries/Library7)
  • calculating penalties is pretty slow
    • this is the step that maxed out the memory and crashed the computer.
    • currently have 29 GB paged, now calculating penalties with the chr2L_plus strand only. We’ll see how much it eats and whether this destroys the system again. (started 8:45 pm).
    • even without BX-C this does indeed use tons of memory. aside from that it’s not killer slow.
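
The step of removing the probed regions from the genome and writing a plus/minus-strand off-target FASTA can be sketched in Python as below. This is not the MATLAB function written here; the function names, record naming, and 0-based half-open coordinates are all assumptions for illustration. Flanking pieces are written as separate records rather than joined, so that no artificial junction sequence ends up in the off-target database.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_around_regions(seq, regions):
    """Return the pieces of seq left after cutting out (start, end) regions."""
    pieces, cursor = [], 0
    for start, end in sorted(regions):
        if start > cursor:
            pieces.append(seq[cursor:start])
        cursor = max(cursor, end)  # also handles overlapping regions
    if cursor < len(seq):
        pieces.append(seq[cursor:])
    return pieces


def write_offtarget_fasta(path, name, seq, regions, line_width=60):
    """Write plus and minus strands of each remaining piece; return record count."""
    count = 0
    with open(path, "w") as out:
        for i, piece in enumerate(split_around_regions(seq, regions)):
            strands = (("plus", piece), ("minus", reverse_complement(piece)))
            for strand, strand_seq in strands:
                out.write(f">{name}_part{i}_{strand}\n")
                for j in range(0, len(strand_seq), line_width):
                    out.write(strand_seq[j:j + line_width] + "\n")
                count += 1
    return count
```

For a whole chromosome this holds the full sequence plus its reverse complement in memory, which is fine at Drosophila scale but is one more reason to keep the files on a local disk.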

Protected: Sunday 11/01/15


Saturday 10/31/15

12:00 pm – 6:30 pm, working on applications
Submitted all apps with Nov 1st deadlines.


Protected: lab meeting 10/30/15


Friday 10/30/15

9:30 am – 5:00 pm


  • working on Harvard MCB application
  • applications won’t submit requests to letter writers until application is complete.
  • Princeton application also won’t submit requests to letter writers
  • given that a motivated candidate would like to send each school a tailored, carefully edited application with the most recent publication updates, it is quite convenient to submit the final application at the last moment before it will be read. This is, however, highly inconvenient for letter writers, who get a minimum of advance notice for no especially good reason.