10:00 am
Tasks
Manuscript
- Revise response to reviewer 1
- Figure updates
- change blue color
- fix symbols
- discussed changes with XZ again
Fastq files
- run 1: 150902_NS500422_0189_AHHNKFBGXX Illumina HiSeq 2500, 100 bp single read, rapid flowcell
- run 2: 151030_SN343_0533_AC8CW1ACXX Illumina HiSeq 2000, 50 bp single read, standard cell
Run 1 checksums
5_S5.R1.fastq.gz 35edcf8de86c0121ed0299e3026c3996
3_S3.R1.fastq.gz 225a5a5c2318e07deb044fd2bb5f5d06
2_S2.R1.fastq.gz 07203cda1dde852c01c3eb1014e97574
6_S6.R1.fastq.gz d07b0b67730adf45d5e3abf9926bf2d1
1_S1.R1.fastq.gz 6a61879b157d3710e4e81c07ef81c920
4_S4.R1.fastq.gz 63cdc9bc347d26f43e2d0e72fba469c5
The Windows checksum tool doesn’t produce a checksum report; it just runs for a while and then stops.
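A minimal sketch of checking the run 1 files against the list above in Python instead of the Windows tool. It assumes the 32-character hex values are MD5 sums (the notes don't say which algorithm was used) and that the file names are valid paths from the working directory. md5_of_file and verify_checksums are hypothetical helper names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(expected):
    """Map each path to True/False depending on whether its MD5 matches.

    `expected` is a dict of {path: hex digest}; assumption: digests are MD5.
    """
    return {
        path: md5_of_file(path) == md5.strip().lower()
        for path, md5 in expected.items()
    }
```

Calling verify_checksums({"5_S5.R1.fastq.gz": "35edcf8de86c0121ed0299e3026c3996"}) from the directory holding the file would return a dict mapping that name to True or False.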
Identified issues with Fastq files
- some reads have no read data
- writing a script to remove reads (all 4 lines) that have no sequence (just an index primer).
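A rough sketch of the removal step, in Python rather than the script actually being written. It assumes standard 4-line FASTQ records and that "no read data" means the sequence line (2nd line of the record) is empty or whitespace. read_fastq, drop_empty_reads and filter_file are hypothetical names.

```python
import gzip
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\r\n") for line in islice(it, 4))
        if not record:
            return  # clean end of input
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def drop_empty_reads(records):
    """Keep only records whose sequence line contains at least one base."""
    for record in records:
        if record[1].strip():
            yield record


def filter_file(in_path, out_path):
    """Copy a gzipped FASTQ file, dropping whole records with no sequence."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for record in drop_empty_reads(read_fastq(src)):
            dst.write("\n".join(record) + "\n")
```

Dropping the whole 4-line record, not just the blank line, keeps the header, sequence, separator and quality lines in step for downstream tools.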
More failed programs
- Odyssey claims to have fastqc installed, but it doesn’t run.
- https://portal.rc.fas.harvard.edu/apps/modules
- typing exactly module load fastqc/0.11.3-fasrc01 gives an error.
- tried virtualbox. Can’t access local files. This is unfortunate.
- tried bioawk. Installed on linux virtualbox. Can’t get it on Odyssey. Can’t access local files to run.
- tried fastqValidate. Can’t install on windows. Install fails on linux virtualbox.
- removing blanks by hand with custom matlab script
- accidentally added spaces when printing new fastq files. Fixed this.
- fixed files do run in bowtie, but it still complains about a bunch of reads having insufficient length
- wrote new matlab file to cut reads shorter than 20 bp
- had some errors with this script.
- fixed an issue where the script wasn’t checking the read lines (it was checking all lines, then only the 1st line, not the read line).
- lots of issues with disk copy speed.
- was cutting reads shorter than 20 bp; I think this might cut too much and no longer resemble the data. Dropped the cutoff to 10 bp.
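A small sketch of the length cut, in Python rather than the matlab file used here, mainly to show the check landing on the read line (2nd line of each 4-line record) and to compare how much each cutoff keeps. It assumes standard 4-line FASTQ records. fastq_records, filter_by_length and fraction_kept are hypothetical names.

```python
from itertools import islice


def fastq_records(lines):
    """Yield (header, sequence, plus, quality) tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\r\n") for line in islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def filter_by_length(records, min_len):
    """Keep records whose sequence line has at least `min_len` bases."""
    for header, seq, plus, qual in records:
        # Only the sequence line is measured, never the header or quality line.
        if len(seq) >= min_len:
            yield header, seq, plus, qual


def fraction_kept(lines, cutoffs=(10, 20)):
    """Return {cutoff: fraction of reads with length >= cutoff}."""
    lengths = [len(seq) for _, seq, _, _ in fastq_records(lines)]
    total = len(lengths)
    return {
        cutoff: (sum(n >= cutoff for n in lengths) / total if total else 0.0)
        for cutoff in cutoffs
    }
```

Running fraction_kept on the lines of a file before filtering gives a quick read on whether the 20 bp cutoff discards much more than the 10 bp one.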