- Revise response to reviewer 1
- Figure updates
- change blue color
- fix symbols
- discussed changes with XZ again
- run 1: 150902_NS500422_0189_AHHNKFBGXX, Illumina HiSeq 2500, 100 bp single read, rapid flowcell
- run 2: 151030_SN343_0533_AC8CW1ACXX, Illumina HiSeq 2000, 50 bp single read, standard flowcell
Run 1 checksums
The Windows checksum tool doesn’t produce a checksum report; it just runs for a while and then stops.
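A possible workaround for the missing report, as a minimal sketch: compute the checksums directly in Python and compare them against whatever list the sequencing facility supplied. MD5 is an assumption here (the notes don't say which hash was provided), and the function names and the `*.fastq.gz` pattern are hypothetical.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that large FASTQ files never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_report(folder, pattern="*.fastq.gz"):
    """Map each matching file name in `folder` to its MD5 digest.
    The default pattern is an assumption about how the run files are named."""
    return {p.name: file_md5(p) for p in sorted(Path(folder).glob(pattern))}
```

Calling `checksum_report("path/to/run1")` returns a dict that can be printed or diffed against the provided checksums.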
Identified issues with Fastq files
- some reads have no read data
- writing a script to remove reads (all 4 lines of the record) that have no sequence (just an index primer).
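A minimal sketch of that kind of filter, assuming plain (uncompressed) FASTQ with exactly 4 lines per record and no wrapped sequences. The function name is hypothetical and this is not the script actually used.

```python
def drop_empty_reads(in_path, out_path):
    """Copy a FASTQ file, skipping every record whose sequence line is empty.

    Assumes 4 lines per record: header, sequence, '+', quality.
    Returns (kept, dropped) record counts.
    """
    kept = dropped = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            if record[1].strip():  # line 2 of the record holds the read
                dst.writelines(record)  # all 4 lines written unchanged
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```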
More failed programs
- Odyssey claims to have FastQC installed, but it doesn’t run.
- typing exactly module load fastqc/0.11.3-fasrc01 gives an error.
- tried VirtualBox. Can’t access local files. This is unfortunate.
- tried bioawk. Installed on the Linux VirtualBox VM. Can’t get it on Odyssey. Can’t access local files to run.
- tried fastqValidate. Can’t install on Windows. Install fails on the Linux VirtualBox VM.
- removing blanks by hand with a custom MATLAB script
- accidentally added spaces when printing the new fastq files. Fixed this.
- fixed files do run in bowtie, but it still complains about a bunch of reads having insufficient length
- wrote a new MATLAB script to cut reads shorter than 20 bp
- had some errors with this script.
- fixed an issue with not checking the read lines (was checking all lines, then was checking only the 1st line, not the read line).
- lots of issues with disk copy speed.
- was cutting reads shorter than 20 bp; I think this might cut too much, so the result may not resemble the data. Dropped the cutoff to 10 bp.
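For reference, the length cutoff can be sketched in Python as below. This is an illustration under assumptions (uncompressed FASTQ, 4 lines per record, hypothetical function names), not the MATLAB script itself. It makes the two points that caused trouble explicit: the length test is applied to the second line of each record (the read), and records are written back unchanged so no stray spaces are introduced.

```python
from itertools import islice


def read_records(handle):
    """Yield FASTQ records as lists of 4 raw lines (header, read, '+', quality)."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:  # end of file (or a truncated final record)
            return
        yield record


def filter_by_length(in_path, out_path, min_length=10):
    """Keep only records whose read is at least `min_length` bases long.

    Returns (kept, dropped) so different cutoffs can be compared.
    """
    kept = dropped = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for record in read_records(src):
            read = record[1].rstrip("\r\n")  # the read line, not the header
            if len(read) >= min_length:
                dst.writelines(record)  # written exactly as read
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Running it once with `min_length=20` and once with `min_length=10` and comparing the returned counts shows how much of the data each cutoff removes.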
12:30 pm – 11:30 pm
(HHS appointment in morning for crushed foot).
Making RNA probes through old method as a control for RNA staining quality
- found sna and hb probe templates
- hb tube is empty, attempted to dissolve DNA from evaporated sample with added H2O (10 uL).
- ran PCR reactions with m13F + m13R primers using Phusion
- 30 cycles (reset annealing temp?)
- ran test gel (0.7% agarose in TAE with GeneRuler Express)
- sna amplified, hb didn’t. Probably ran a bit too many cycles / not enough primer: some snail concatemers. Shouldn’t really cause problems.
in vitro transcription reaction (old style as control)
- following old protocol
- Per 20 uL reaction
  - 4 uL 5x buffer
  - 2 uL 10x dig-mix
  - 1 uL T7
  - 1 uL RNasin
  - 1 uL DTT
  - 5 uL DNA template
  - 6 uL ddH2O
- Run as a 40 uL reaction
  - 8 uL 5x buffer
  - 5 uL 10x dig-mix (expired)
  - 3 uL T7 (expired)
  - 2 uL RNasin
  - 2 uL DTT
  - 10 uL DNA template
  - 10 uL ddH2O
- ran carb treatment, STOP treatment, and precipitated with LiCl overnight at -20 °C.
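The 40 uL volumes can be checked with a small scaling sketch. A straight 2x scale-up of the 20 uL recipe would give 4 uL dig-mix, 2 uL T7 and 12 uL ddH2O; the recorded reaction used 5 uL and 3 uL of the expired reagents, with water making up the difference. The function name and the "water absorbs the remainder" rule are assumptions made for illustration.

```python
# Volumes in uL for the 20 uL reaction, copied from the protocol above.
BASE_20UL = {
    "5x buffer": 4,
    "10x dig-mix": 2,
    "T7": 1,
    "RNasin": 1,
    "DTT": 1,
    "DNA template": 5,
    "ddH2O": 6,
}


def scale_recipe(recipe, target_volume, overrides=None, water="ddH2O"):
    """Scale every component to `target_volume`, apply manual overrides,
    then set the water volume so the total still equals `target_volume`."""
    factor = target_volume / sum(recipe.values())
    scaled = {name: volume * factor for name, volume in recipe.items()}
    scaled.update(overrides or {})
    others = sum(v for name, v in scaled.items() if name != water)
    scaled[water] = target_volume - others
    if scaled[water] < 0:
        raise ValueError("components exceed the target volume")
    return scaled


# Straight 2x scale-up, then the version with extra expired dig-mix and T7.
plain_40 = scale_recipe(BASE_20UL, 40)
used_40 = scale_recipe(BASE_20UL, 40, overrides={"10x dig-mix": 5, "T7": 3})
```

With the two overrides, the water volume comes out at 10 uL, matching the 40 uL reaction as run.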
- discuss shortening with XZ
- revise text to shorten
- update calculator
- revise figures
(Wed and Thur in NYC for Dale Frey interview)
9:30 am – 7:00 pm
sequencing data processing
- bowtie finished running on all 20 datasets
- Need to run cufflinks.
- first need to sort the SAM data
- using samtools (should have set this up to run overnight; it’s quite slow)
- also need to copy the data
- (this RNA-seq pipeline I have sucks: 6 hours to download the data, 4 hours to unzip it, overnight to run bowtie (it probably finished in less than 1 hour, since I ran it multicore with 20 threads), 2 hours to re-sort the data with samtools (longer because I was messing around with trying to multicore this), then XX hours to copy the data to RC and XX hours to run cufflinks.)
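To queue the align, sort and assemble steps so they can run unattended, the per-sample commands could be generated up front. This is a rough sketch: it assumes bowtie2 (suggested by the bt2 index files), a samtools version that accepts `sort -o` (1.3 or later), and hypothetical file and index names. The actual options used for these runs are not recorded here.

```python
import shlex


def pipeline_commands(sample, index, fastq, threads=20):
    """Return the three command lines (as argument lists) for one sample:
    align with bowtie2, coordinate-sort with samtools, assemble with cufflinks.
    Output names are derived from the sample name and are placeholders."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    return [
        ["bowtie2", "-p", str(threads), "-x", index, "-U", fastq, "-S", sam],
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["cufflinks", "-p", str(threads), "-o", f"{sample}_cufflinks", bam],
    ]


def as_shell_lines(commands):
    """Render argument lists as shell-safe strings, e.g. for a batch script."""
    return [shlex.join(cmd) for cmd in commands]
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` in order, or the rendered lines can be written to a job script for the cluster.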
Sample Name  Tube  Tindex  IndexSeq  NEB  Description
Ph1          A     1       ATCACG    1    original Ph KD sample
M1           A     2       CGATGT    2    original WT control sample
2-4P         A     3       GCCAAT    6    KD performed on day 0 and day 2, extracted on day 4
4W           A     4       CAGATC    7    latest WT sample, extracted on day 4
4P           A     5       ACTTGA    8    latest Ph-KD sample, extracted on day 4
10-22#1      B     1       ATCACG    1    (WT) sample extracted on day 4 (need to check ID)
10-22#2      B     2       CGATGT    2    (Ph-KD) sample extracted on day 4 (need to check ID)
2W           B     3       TTAGGC    3    WT sample, extracted on day 2
2P           B     4       TGACCA    4    Ph-KD sample, extracted on day 2
2-4W         B     5       ACAGTG    5    WT for KD performed on day 0 and day 2, extracted on day 4
- somehow my bt2 files got put in my repo and were accidentally committed last night
- BFG is an excellent tool for quickly and easily scrubbing these from the repo (https://rtyley.github.io/bfg-repo-cleaner/), which was previously a horrible pain.
9:15 am – 6:00 pm, 8:00 pm – 8:55 pm
- goal: building 10 kb tiling regions of BX-C probes against genome
- approach: adapt new OligoArray-free pipeline from transcriptome libraries to design genome libraries
- crashed Morgan when all 256 GB of RAM got paged during penalty calculations
- this may have happened because the target sequences are all included in the off-target table (the whole genome)
- tested this hypothesis by running against chr2L only (currently constructing for BX-C on 2R)
- this runs quickly without any sign of crashing.
- (might have been a misinterpretation: the code is written to load an existing trDesigner rather than rerun, so it may not have attempted to use this off-target library at all.)
- indeed this appears to have been the case, now rerunning.
- wrote new function to remove regions probed from the genome sequence and create a new offTarget fasta file database of the genome for this library construction
- writing whole-genome fasta files is not fast, even for Drosophila (nor is loading them, though the fly fasta loads pretty quickly)
- building the OTtable for the whole genome, duplicated to plus and minus strands (with BX-C removed), goes pretty fast: less than 400 seconds
- this goes MUCH faster with local databases; read-write speed is pretty important, and it doesn’t work well over the network.
- moved to new folder on MorganData (ChromatinLibraries/Library7)
- (originally had data on Monet in Chromatin/OligoLibraries/Library7)
- calculating penalties is pretty slow
- this is the step that maxed out the memory and crashed the computer.
- currently have 29 GB paged, now calculating penalties with the chr2L_plus strand only. We’ll see how much it eats and whether this destroys the system again. (started 8:45 pm).
- even without BX-C this does indeed use tons of memory. Aside from that, it’s not killer slow.
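The idea of excluding the probed region and then building an off-target table over both strands can be sketched as below. This is not the trDesigner/OTtable code. It is a toy k-mer counter with hypothetical names, and it masks the target region with N rather than deleting it (a deliberate swap, to avoid creating artificial k-mers across the deletion junction). Processing one chromosome at a time keeps only the running counts in memory, but a real whole-genome table may still be large.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the minus-strand sequence; N and other symbols pass through."""
    return seq.translate(COMPLEMENT)[::-1]


def mask_region(seq, start, end, mask_char="N"):
    """Replace the half-open interval [start, end) with mask characters."""
    return seq[:start] + mask_char * (end - start) + seq[end:]


def count_kmers(seq, k, counts):
    """Add every k-mer of `seq` to `counts`, skipping k-mers that contain N."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            counts[kmer] += 1
    return counts


def off_target_table(chromosomes, k, target=None):
    """Count k-mers on plus and minus strands of every chromosome.

    chromosomes: dict of name -> sequence.
    target: optional (name, start, end) region to exclude from the table.
    """
    counts = Counter()
    for name, seq in chromosomes.items():
        if target and target[0] == name:
            seq = mask_region(seq, target[1], target[2])
        count_kmers(seq, k, counts)
        count_kmers(reverse_complement(seq), k, counts)
    return counts
```

For example, `off_target_table({"c": "AAAACCCC"}, 4, target=("c", 0, 4))` counts only `CCCC` and its reverse complement `GGGG`, since every other k-mer overlaps the masked region.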
12:00 pm – 6:30 pm, working on applications
Submitted all apps with Nov 1st deadlines.