Protected: project 2 03/19/14

This content is password protected. To view it please enter your password below:

Posted in Genomics | Tagged , | Comments Off on Protected: project 2 03/19/14

Monday 03/17/14

9:45a – 8:00p

Computer Maintenance

  • security updates on Cajal + reboot
  • security updates on Tuck – reboot postponed due to users still analyzing data
  • write to CDW about
    • is this software or just a number: 2012 server
    • why 2 versions of remote desktop licenses. What’s the difference?

Data processing

  • not all F12F11 data finished analyzing before files were split and deleted? Analyzing splitdax files on Tuck. — remember to reset region of interest for 256×256 on split files.
  • still splitting multi-color data (will be running another 4 days or so probably)
  • 140316_L3E7 data successfully split and averaged during acquistion, (yay!), now fitting data on Cajal
  • run all basic analysis / plots on 140316_L3E7 data
  • splitting 140313_L3E8 data
  • run TimeAverageMovie on ‘L:\140313_L3E8\splitdax\’
  • run dot fitting analysis on ‘L:\140313_L3E8\splitdax\’
  • run CopyAndAnalyzeOnArrival on 140317_L3E7 data

Chromatin Project

Data analysis

  • ChromatinCroppin’ the F12F11 data

croppingF12F11data

OligoPaints Project

  • cy5 on DNA stats
  • from movie recorded by Hao last year.
  • re-ChromatinCropped S3 (405 + secondary), S1 (secondary), and S4 (2x cy5) data

Cy5stats

AnalyzingCy5onDNA

Posted in Summaries | Tagged , | Comments Off on Monday 03/17/14

Protected: Overview of 140310 L4E4 data

This content is password protected. To view it please enter your password below:

Posted in Genomics | Tagged , , | Comments Off on Protected: Overview of 140310 L4E4 data

Sunday 03/16/14

10:30a – 6:00p

Goals

  • flip fly stocks (done)
  • passage cells (done)
  • scheduling
    • christia (May 17-20)
    • dad’s conference (April 25th)
    • dentist (May 7th)

More data processing!

  • running splitdax on 140310_L4E4 data
  • running bead averaging on 140310_L4E4 data
  • analyzing averaged beads from 140310_L4E4
  • Not all 140310_L4E4 data copied from STORM4! copying remaining data. Need to relaunch all analysis 🙁
  • running splitdax on dual color data — currently saves the 647 channel from the 750 movies because we haven’t told it to distinguish. This makes all the running a bit slower.

Issues with data

  • 2014-01-14_F03-6_F04-7 beads alignment is clearly off — systematic shift in 750 647 observed
  • 2014-01-15_F04-6_F03-7. bead alignment is okay, substantial shift (8-15 um?) in acquiring conventional images to STORM. Can’t use conventional masks! 🙁
  • 2014-01-17_F03-6_F05-7. 750 data not analyzed (only beads are analyzed in 750, hopefully using the correct parameters).

Project 2

  • looking at beads and setting up data run with Hao

Ph Project

  • Ajaz found bug in k_d equation. Rerunning simulations. I think this actually expands the parameter space that gives the decreasing Ph behavior (as suits the logic). Previous dissociation rate had wrong dependence on mutant PH concentration in the cluster.
  • more explorations.
Posted in Summaries | Tagged , , | Comments Off on Sunday 03/16/14

Rethinking Project Organization

How do you organize your research projects?

For the last three years or so, I have adopted a project organization structure along the lines recommended by William Noble in PLoS CB: A Quick Guide to Organizing Computational Biology Projects.

The basic organizational idea here is that each project get’s its own directory. The top level of that directory is organized categorically. In my case I have Doc, Data, Results, Code and sometimes some other things. Code is version managed, and split into Functions (code that takes arguments and will be reused repeatedly) and Scripts code I intend to run only once. Since Scripts rapidly gets very large, and is run only once, maybe it shouldn’t be version managed, but I do occasionally re-run or write over scripts. Also version management acts as a backup if I actually push the versions somewhere (which I don’t at present), and a fail-safe against accidental deletion and overwriting of files. I recently added a third folder to Code for a number of projects called FigureCode. These are Scripts that start off in the Scripts folder, but once they are running and producing final versions of figures for that day which I am planning to present somewhere, these functional Figure producing scripts are saved in a new folder. All Scripts and FigureCode have the date (yymmdd) appended to the end of the filename.

I’ve encountered several problems having this system work with my needs. I take the time here to reflect on these problems and postulate some ways forward.

Big data projects

My projects folders live on in My Documents on C:\, whereas my data is stretched across (at present) 19 different external hard-drives (the most recent of which have 32Tb capacity). The data goes on these hard-drives as some function of the date on which it was collected, but get shuffled around to fill dead space (e.g. drive 10 has 500 GB free but tonight’s imaging session requires 600 GB but a few weeks later I have a 500 GB set that will just fit. Or I develop a cool compression algorithm for some data and decide some other data was a failed run and delete it, also opening up space on older disks which then gets filled). Consequently these data addresses aren’t permanent upon creation.

Ideally I would like the directory for the project to have links to the relevant data folders. Maybe this just requires more manual overhead to make sure I link and re-link files to the Data folder of the project — but that still sounds like a recipe for trouble.

It would be awesome instead to just tag data with a project name and have that automatically hook the data into the correct project folder. Maybe if my data folders include categorical tags (e.g. in the filename), I could write a little Daemon that looks through the data hard-drives and the data folders and updates all the links in the data-folders. This would be great to improve the tracking of datasets. One substantial challenge would be that some of the hard-drives go offline (to free up USB ports etc) and the Daemon would need to be able to tell did this move somewhere else or is it just disconnected. I think I could do it though — any file for which it can’t find a new link it just leaves the old link intact. Also need to deal with Windows assigning different drive letters to disks as you plug them into different ports (even though the folder has a common name). I’m sure I can write something smart enough to ignore the drive letter (search all) and just use the hard-drive disk name (e.g. Alistair2) though.

Folder names — dates vs. labels

William Noble uses categorical labels at top directory level, with date labels at the next level (e.g. Results\2014-02-14\). The date labeling is great, I like the alpha-numeric sorting by date, (year-month-day is a great idea), especially since date modified indexing reflects recent modifications note date results / data were taken, which is what I want. But a big folder full of dates is actually not too transparent — I stare at these and say which date did I take that data? and then flip through a bunch of them. Sometimes I get frustrated and go look up the date in my electronic lab notebook, but it’s nicer to have an at-a-glance system.

As a work around I have adopted a mixed date / categorical structure for this second level of organization, where some short tag is appended to the date to help trigger my memory about what it was: Results\2014-02-14_PHmutPSC.

Work not related to a specific project

I also spend a decent amount of time working on code that is not for a specific project (maybe that says something about my scientific efficiency). The idea though is that these additional coding projects make multiple aspects of my research faster more accurate and better documented. For example, several of my projects use STORM as an imaging method, and all my projects use Matlab to analyze data, so in collaboration with Jeff Moffitt (a fellow post-doc in the lab) I write code for a git repository we call matlab-storm. I also have a bunch of Matlab functions from my PhD work which I called Image-Analysis. These work alright as projects in their own right, since the scope is well defined. Though maybe they shouldn’t be since they aren’t really leading towards separate publications, though I suppose they could if we found the time to write it up.

A bigger challenge has been all the little functions I write to help with one project, which I then find myself re-using for other projects. My current (trial) solution for this is taking me further away from the basic projects organization I started with.

Master Toolbox Approach: matlab-functions

I recently started a new git repository called matlab-functions. As the name suggests, the goal of this project is to collect a large family of functions in one place. This way I don’t have to remember when I want to use that cool new function I wrote last week, if it was part of the Chromatin project Functions or the Ph Polymerization project Functions or whatever. The second and more important motivation was it makes it easier to share these functions. When it’s just me it’s not hard to just have all folders that contain custom functions added to my filepath via Matlab startup.m.

I’m also experimenting with more subfolders to organize these functions. These are reasonably sepecific — for instance, qPCR, BLAST and DeepSequencing are separate subfolders instead of having Genome Tools. More subfolders has the advantage of making it easier to understand what the project contains, since files are now grouped and organized. I guess ideally I would just tag files rather than put them in different subfolders and Windows Explorer window (or Nautilus / GNOME files on my Linuxbox) could display a list of tags instead of a list of folders, but operating systems seem to be behind the curve on this one.

This repository is now a private repo on Git with my research collaborators added as collaborators.

Old vs. new projects

As I accumulate projects in my Projects directory, it would be nice to see just the active ones (sort by date does okay here). Some of these projects are all published, pretty much wrapped up, and I haven’t done further work on them in years. Some are projects I started and abandoned. And some of course are the active ones that I hop between all the time.

Posted in Research Planning | Comments Off on Rethinking Project Organization

Protected: Overview of 140308 L4E3 data

This content is password protected. To view it please enter your password below:

Posted in Genomics | Tagged , , | Comments Off on Protected: Overview of 140308 L4E3 data

Friday 03/14/14

9:45a – 9:00p

Meetings

  • meetings with G0 students, present lab research, discuss lab life

Computing resources

  • Troubleshooting Tuck
  • high flucutating memory despite little use
  • memory drops precipitously when Hao closes matlab
  • TSTORMdata drive is still flashes in use, despite no read-write activity
  • crazy / slow behavior on tuck seems to be an incompatability of tuck with the 32TB tower.

tuckRandomMemOscillations

PH project

  • working on writeup of model details (see sharelatex project)
Posted in Summaries | Comments Off on Friday 03/14/14

Thursday 03/13/14

9:00a — 11:30p

communication

  • Help Jaeger student with Unsupervised_Dotfinding.m (see email)

meetings

  • planning meeting for G0 presentation
    • jeff will embed movies
    • got slide from Bryan
    • got movies from Colenso
    • put in example images of chromatin
    • need to rehearse talk!
  • meeting for project2 — see notes

PH project

  • primary focus today.
  • see post

Project 2

  • running more splitdax (I think this is never going to end)
  • acquiring more data tonight
  • move process-on-arrival scripts to matlab-functions
Posted in Summaries | Tagged , , | Comments Off on Thursday 03/13/14

Protected: PH project update

This content is password protected. To view it please enter your password below:

Posted in Chromatin | Tagged | Comments Off on Protected: PH project update

Protected: project 2 update

This content is password protected. To view it please enter your password below:

Posted in Genomics | Tagged , , | Comments Off on Protected: project 2 update