Protected: project 2 03/19/14

Posted on March 18, 2014 by admin

Posted in Genomics | Tagged planning, project 2 | Comments Off

Monday 03/17/14

Posted on March 17, 2014 by admin

9:45a – 8:00p

Computer Maintenance

security updates on Cajal + reboot
security updates on Tuck – reboot postponed due to users still analyzing data
write to CDW about
- is this software or just a number: 2012 server
- why 2 versions of remote desktop licenses. What’s the difference?

Data processing

not all F12F11 data finished analyzing before files were split and deleted? Analyzing splitdax files on Tuck. — remember to reset region of interest for 256×256 on split files.
still splitting multi-color data (will be running another 4 days or so probably)
140316_L3E7 data successfully split and averaged during acquistion, (yay!), now fitting data on Cajal
run all basic analysis / plots on 140316_L3E7 data
splitting 140313_L3E8 data
run TimeAverageMovie on ‘L:\140313_L3E8\splitdax\’
run dot fitting analysis on ‘L:\140313_L3E8\splitdax\’
run CopyAndAnalyzeOnArrival on 140317_L3E7 data

Chromatin Project

Data analysis

ChromatinCroppin’ the F12F11 data

OligoPaints Project

cy5 on DNA stats
from movie recorded by Hao last year.
re-ChromatinCropped S3 (405 + secondary), S1 (secondary), and S4 (2x cy5) data

Posted in Summaries | Tagged chromatin, OligoPaints | Comments Off

Protected: Overview of 140310 L4E4 data

Posted on March 17, 2014 by admin

Posted in Genomics | Tagged images, project 2, troubleshooting | Comments Off

Sunday 03/16/14

Posted on March 16, 2014 by admin

10:30a – 6:00p

Goals

flip fly stocks (done)
passage cells (done)
scheduling
- christia (May 17-20)
- dad’s conference (April 25th)
- dentist (May 7th)

More data processing!

running splitdax on 140310_L4E4 data
running bead averaging on 140310_L4E4 data
analyzing averaged beads from 140310_L4E4
Not all 140310_L4E4 data copied from STORM4! copying remaining data. Need to relaunch all analysis 🙁
running splitdax on dual color data — currently saves the 647 channel from the 750 movies because we haven’t told it to distinguish. This makes all the running a bit slower.

Issues with data

2014-01-14_F03-6_F04-7 beads alignment is clearly off — systematic shift in 750 647 observed
2014-01-15_F04-6_F03-7. bead alignment is okay, substantial shift (8-15 um?) in acquiring conventional images to STORM. Can’t use conventional masks! 🙁
2014-01-17_F03-6_F05-7. 750 data not analyzed (only beads are analyzed in 750, hopefully using the correct parameters).

Project 2

looking at beads and setting up data run with Hao

Ph Project

Ajaz found bug in k_d equation. Rerunning simulations. I think this actually expands the parameter space that gives the decreasing Ph behavior (as suits the logic). Previous dissociation rate had wrong dependence on mutant PH concentration in the cluster.
more explorations.

Posted in Summaries | Tagged chromatin, Ph, project 2 | Comments Off

Rethinking Project Organization

Posted on March 16, 2014 by admin

How do you organize your research projects?

For the last three years or so, I have adopted a project organization structure along the lines recommended by William Noble in PLoS CB: A Quick Guide to Organizing Computational Biology Projects.

The basic organizational idea here is that each project get’s its own directory. The top level of that directory is organized categorically. In my case I have Doc, Data, Results, Code and sometimes some other things. Code is version managed, and split into Functions (code that takes arguments and will be reused repeatedly) and Scripts code I intend to run only once. Since Scripts rapidly gets very large, and is run only once, maybe it shouldn’t be version managed, but I do occasionally re-run or write over scripts. Also version management acts as a backup if I actually push the versions somewhere (which I don’t at present), and a fail-safe against accidental deletion and overwriting of files. I recently added a third folder to Code for a number of projects called FigureCode. These are Scripts that start off in the Scripts folder, but once they are running and producing final versions of figures for that day which I am planning to present somewhere, these functional Figure producing scripts are saved in a new folder. All Scripts and FigureCode have the date (yymmdd) appended to the end of the filename.

I’ve encountered several problems having this system work with my needs. I take the time here to reflect on these problems and postulate some ways forward.

Big data projects

My projects folders live on in My Documents on C:\, whereas my data is stretched across (at present) 19 different external hard-drives (the most recent of which have 32Tb capacity). The data goes on these hard-drives as some function of the date on which it was collected, but get shuffled around to fill dead space (e.g. drive 10 has 500 GB free but tonight’s imaging session requires 600 GB but a few weeks later I have a 500 GB set that will just fit. Or I develop a cool compression algorithm for some data and decide some other data was a failed run and delete it, also opening up space on older disks which then gets filled). Consequently these data addresses aren’t permanent upon creation.

Ideally I would like the directory for the project to have links to the relevant data folders. Maybe this just requires more manual overhead to make sure I link and re-link files to the Data folder of the project — but that still sounds like a recipe for trouble.

It would be awesome instead to just tag data with a project name and have that automatically hook the data into the correct project folder. Maybe if my data folders include categorical tags (e.g. in the filename), I could write a little Daemon that looks through the data hard-drives and the data folders and updates all the links in the data-folders. This would be great to improve the tracking of datasets. One substantial challenge would be that some of the hard-drives go offline (to free up USB ports etc) and the Daemon would need to be able to tell did this move somewhere else or is it just disconnected. I think I could do it though — any file for which it can’t find a new link it just leaves the old link intact. Also need to deal with Windows assigning different drive letters to disks as you plug them into different ports (even though the folder has a common name). I’m sure I can write something smart enough to ignore the drive letter (search all) and just use the hard-drive disk name (e.g. Alistair2) though.

Folder names — dates vs. labels

William Noble uses categorical labels at top directory level, with date labels at the next level (e.g. Results\2014-02-14\). The date labeling is great, I like the alpha-numeric sorting by date, (year-month-day is a great idea), especially since date modified indexing reflects recent modifications note date results / data were taken, which is what I want. But a big folder full of dates is actually not too transparent — I stare at these and say which date did I take that data? and then flip through a bunch of them. Sometimes I get frustrated and go look up the date in my electronic lab notebook, but it’s nicer to have an at-a-glance system.

As a work around I have adopted a mixed date / categorical structure for this second level of organization, where some short tag is appended to the date to help trigger my memory about what it was: Results\2014-02-14_PHmutPSC.

Work not related to a specific project

I also spend a decent amount of time working on code that is not for a specific project (maybe that says something about my scientific efficiency). The idea though is that these additional coding projects make multiple aspects of my research faster more accurate and better documented. For example, several of my projects use STORM as an imaging method, and all my projects use Matlab to analyze data, so in collaboration with Jeff Moffitt (a fellow post-doc in the lab) I write code for a git repository we call matlab-storm. I also have a bunch of Matlab functions from my PhD work which I called Image-Analysis. These work alright as projects in their own right, since the scope is well defined. Though maybe they shouldn’t be since they aren’t really leading towards separate publications, though I suppose they could if we found the time to write it up.

A bigger challenge has been all the little functions I write to help with one project, which I then find myself re-using for other projects. My current (trial) solution for this is taking me further away from the basic projects organization I started with.

Master Toolbox Approach: matlab-functions

I recently started a new git repository called matlab-functions. As the name suggests, the goal of this project is to collect a large family of functions in one place. This way I don’t have to remember when I want to use that cool new function I wrote last week, if it was part of the Chromatin project Functions or the Ph Polymerization project Functions or whatever. The second and more important motivation was it makes it easier to share these functions. When it’s just me it’s not hard to just have all folders that contain custom functions added to my filepath via Matlab startup.m.

I’m also experimenting with more subfolders to organize these functions. These are reasonably sepecific — for instance, qPCR, BLAST and DeepSequencing are separate subfolders instead of having Genome Tools. More subfolders has the advantage of making it easier to understand what the project contains, since files are now grouped and organized. I guess ideally I would just tag files rather than put them in different subfolders and Windows Explorer window (or Nautilus / GNOME files on my Linuxbox) could display a list of tags instead of a list of folders, but operating systems seem to be behind the curve on this one.

This repository is now a private repo on Git with my research collaborators added as collaborators.

Old vs. new projects

As I accumulate projects in my Projects directory, it would be nice to see just the active ones (sort by date does okay here). Some of these projects are all published, pretty much wrapped up, and I haven’t done further work on them in years. Some are projects I started and abandoned. And some of course are the active ones that I hop between all the time.

Posted in Research Planning | Comments Off

Protected: Overview of 140308 L4E3 data

Posted on March 16, 2014 by admin

Posted in Genomics | Tagged imaging, project 2, troubleshooting | Comments Off

Friday 03/14/14

Posted on March 14, 2014 by admin

9:45a – 9:00p

Meetings

meetings with G0 students, present lab research, discuss lab life

Computing resources

Troubleshooting Tuck
high flucutating memory despite little use
memory drops precipitously when Hao closes matlab
TSTORMdata drive is still flashes in use, despite no read-write activity
crazy / slow behavior on tuck seems to be an incompatability of tuck with the 32TB tower.

PH project

working on writeup of model details (see sharelatex project)

Posted in Summaries | Comments Off

Thursday 03/13/14

Posted on March 13, 2014 by admin

9:00a — 11:30p

communication

Help Jaeger student with Unsupervised_Dotfinding.m (see email)

meetings

planning meeting for G0 presentation
- jeff will embed movies
- got slide from Bryan
- got movies from Colenso
- put in example images of chromatin
- need to rehearse talk!
meeting for project2 — see notes

PH project

primary focus today.
see post

Project 2

running more splitdax (I think this is never going to end)
acquiring more data tonight
move process-on-arrival scripts to matlab-functions

Posted in Summaries | Tagged communication, meetings, Ph | Comments Off

Protected: PH project update

Posted on March 13, 2014 by admin

Posted in Chromatin | Tagged Ph | Comments Off

Protected: project 2 update

Posted on March 13, 2014 by admin

Posted in Genomics | Tagged images, project2, result | Comments Off

Alistair's Lab Notebook

Protected: project 2 03/19/14

Monday 03/17/14

Computer Maintenance

Data processing

Chromatin Project

Data analysis

OligoPaints Project

Protected: Overview of 140310 L4E4 data

Sunday 03/16/14

Goals

More data processing!

Issues with data

Project 2

Ph Project

Rethinking Project Organization

How do you organize your research projects?

Big data projects

Folder names — dates vs. labels

Work not related to a specific project

Master Toolbox Approach: matlab-functions

Old vs. new projects

Protected: Overview of 140308 L4E3 data

Friday 03/14/14

Meetings

Computing resources

PH project

Thursday 03/13/14

communication

meetings

PH project

Project 2

Protected: PH project update

Protected: project 2 update

Categories

Links

GitHub Projects

November 2025
M	T	W	T	F	S	S
« Aug
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30

Computer Maintenance

Data processing

Chromatin Project

Data analysis

OligoPaints Project

Goals

More data processing!

Issues with data

Project 2

Ph Project

How do you organize your research projects?

Big data projects

Folder names — dates vs. labels

Work not related to a specific project

Master Toolbox Approach: matlab-functions

Old vs. new projects

Meetings

Computing resources

PH project

communication

meetings

PH project

Project 2

Categories

Links

Tags

GitHub Projects