Filtering interval data

Allowing small gaps

Some very brief changes in chromatin type may not be real. They can be small misclassifications, so we need to allow gaps. For example,

rrrrrrrrrrr kk rrrrrrrrrrrrrrrrr  bbb rrrrrrrrrrr  

should still be scored as one long red region. The kk and bbb stretches may just be background. Such short gaps often do not span a full gene and instead fall in the middle of one (depending, of course, on how "small" the gap is).

bbbbbbbbbbbbbbbbb kkkkkkkkk bbbbbbbbbbbbbb  

suggests that the PcG signal briefly dipped below threshold, but the region is primarily blue.
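A minimal Python sketch of the gap allowance, assuming each chromatin call is stored as one character per probe or fixed-size bin ('r' red, 'b' blue, 'k' black), with the visual spaces from the examples above removed. `merge_with_gaps` and `max_gap` are hypothetical names, not part of any existing tool. The sketch finds maximal runs of the target color and then bridges neighboring runs whose separating gap is at most `max_gap` bins.

```python
from typing import List, Tuple

def merge_with_gaps(labels: str, color: str, max_gap: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) bin intervals of `color`, bridging
    stretches of other labels that are at most `max_gap` bins long.

    Hypothetical helper: `labels` holds one chromatin call per bin,
    e.g. 'r' = red, 'b' = blue, 'k' = black.
    """
    # Step 1: collect maximal uninterrupted runs of the target color.
    runs = []
    i, n = 0, len(labels)
    while i < n:
        if labels[i] != color:
            i += 1
            continue
        j = i
        while j < n and labels[j] == color:
            j += 1
        runs.append((i, j))
        i = j
    # Step 2: merge consecutive runs whose separating gap is short enough.
    merged: List[Tuple[int, int]] = []
    for start, end in runs:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged

# The red example: 11 r, "kk", 17 r, "bbb", 11 r (spaces removed).
red_example = "r" * 11 + "kk" + "r" * 17 + "bbb" + "r" * 11
print(merge_with_gaps(red_example, "r", 3))  # → [(0, 44)]
print(merge_with_gaps(red_example, "r", 2))  # → [(0, 30), (33, 44)]
```

With a maximum gap of 3 bins the whole stretch is scored as one red region; tightening the limit to 2 leaves the "bbb" gap unbridged.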

Minimum enrichment despite gaps

We also need to require that a minimum fraction of each region is the indicated color, even when every gap is below the threshold. Some regions switch back and forth between types over very short distances, and the gap allowance then produces maximal regions with only a low fraction of the desired chromatin type. For example, if the gap threshold is 5:

 r bbbbb rr bbbbb r bbbbb rrr bbbbb rr bbb rr  

This region is mostly blue, yet it is a long string of reds separated by gaps no longer than the maximum allowed gap size, so it can be falsely classified as a red region.
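The minimum-enrichment check can be layered on top of gap merging. This is a sketch under the same assumed encoding (one character per bin, spaces removed); `color_runs`, `enriched_regions` and `min_fraction` are hypothetical names. It bridges gaps of up to `max_gap` bins between runs of the target color, then keeps a merged region only if at least `min_fraction` of its bins actually carry that color.

```python
from itertools import groupby
from typing import List, Tuple

def color_runs(labels: str, color: str) -> List[Tuple[int, int]]:
    """Half-open (start, end) intervals of each maximal run of `color`."""
    runs, pos = [], 0
    for label, group in groupby(labels):
        length = sum(1 for _ in group)
        if label == color:
            runs.append((pos, pos + length))
        pos += length
    return runs

def enriched_regions(labels: str, color: str, max_gap: int,
                     min_fraction: float) -> List[Tuple[int, int, float]]:
    """Gap-merged regions of `color` in which at least `min_fraction` of bins are that color."""
    merged: List[Tuple[int, int]] = []
    for start, end in color_runs(labels, color):
        # Bridge the gap to the previous run if it is short enough.
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    kept = []
    for start, end in merged:
        # Fraction of bins inside the merged region that carry the target color.
        frac = labels[start:end].count(color) / (end - start)
        if frac >= min_fraction:
            kept.append((start, end, round(frac, 3)))
    return kept

# The alternating example from the text, spaces removed, with a gap threshold of 5.
mixed = "r" + "b" * 5 + "rr" + "b" * 5 + "r" + "b" * 5 + "rrr" + "b" * 5 + "rr" + "bbb" + "rr"
print(enriched_regions(mixed, "r", 5, 0.5))  # → []
print(enriched_regions(mixed, "r", 5, 0.3))  # → [(0, 34, 0.324)]
```

Every gap in the alternating example is at most 5 bins, so gap merging alone yields a single 34-bin "red" region. Only 11 of those bins are red (about 0.32), and a 0.5 minimum fraction rejects it. The 0.5 cutoff is an arbitrary choice for illustration.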

Quality / Confidence scores?

  • However, maybe these examples aren't strong representatives of their respective classes, and we should be careful when using regions with gaps?
  • Can we get a confidence score based on the actual chromatin modification intensities? It would have been nice if the original paper had attached weights to all the color classifications: this region is 90% confident blue, while this other blue region would become black if a little noise were added to the PcG channel, so it is only 55% confident blue.
  • How about distinguishing actively transcribed vs. marked for transcription (e.g. acetylated chromatin) rather than red and yellow?
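One way to get the kind of confidence score wished for above, sketched under heavy assumptions: a toy nearest-centroid classifier over a made-up two-channel signal (PcG, active) with invented centroids, not the classification method or data of the original paper. Confidence is estimated by re-classifying the same signal many times with Gaussian noise added and reporting the fraction of trials that keep the original label. This mimics the "add a little noise to the PcG channel" thought experiment. `CENTROIDS`, `classify`, `confidence` and `noise_sd` are all hypothetical.

```python
import math
import random
from typing import Tuple

# Hypothetical centroids in an invented 2-D signal space: (pcg, active).
CENTROIDS = {
    "blue": (1.0, 0.0),   # Polycomb-like
    "black": (0.0, 0.0),  # no strong marks
    "red": (0.0, 1.0),    # active-like
}

def classify(signal: Tuple[float, float]) -> str:
    """Toy classifier: label of the nearest centroid (Euclidean distance)."""
    return min(CENTROIDS, key=lambda c: math.dist(signal, CENTROIDS[c]))

def confidence(signal: Tuple[float, float], noise_sd: float = 0.2,
               trials: int = 1000, seed: int = 0) -> Tuple[str, float]:
    """Return (label, fraction of noisy re-classifications that keep the label)."""
    rng = random.Random(seed)
    label = classify(signal)
    same = 0
    for _ in range(trials):
        # Perturb every channel independently with Gaussian noise.
        noisy = tuple(x + rng.gauss(0.0, noise_sd) for x in signal)
        if classify(noisy) == label:
            same += 1
    return label, same / trials

print(confidence((1.0, 0.0)))   # a "strong" blue call
print(confidence((0.55, 0.0)))  # a blue call close to the blue/black boundary
```

A signal sitting on the blue centroid keeps its label in nearly every trial. One just past the blue/black boundary, such as (0.55, 0.0), keeps it only roughly 60% of the time. That gap between the two is the "90% vs. 55% confident blue" distinction, though the actual numbers depend entirely on the assumed centroids and noise level.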

Gene models

We could also take gene models, and the logic they imply, into account.
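One hedged reading of this idea builds on the earlier observation that short gaps often fall inside a gene rather than spanning one: bridge a short gap only if it does not cover an entire gene. The sketch below assumes runs and gene models are already expressed as half-open bin intervals. `spans_full_gene`, `merge_gene_aware` and the gene coordinates in the usage lines are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bins

def spans_full_gene(gap: Interval, genes: List[Interval]) -> bool:
    """True if some gene lies entirely inside the gap."""
    gap_start, gap_end = gap
    return any(gap_start <= g_start and g_end <= gap_end for g_start, g_end in genes)

def merge_gene_aware(runs: List[Interval], genes: List[Interval],
                     max_gap: int) -> List[Interval]:
    """Bridge short gaps between runs, but never a gap that covers a whole gene."""
    merged: List[Interval] = []
    for start, end in sorted(runs):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = (prev_end, start)
            if start - prev_end <= max_gap and not spans_full_gene(gap, genes):
                merged[-1] = (prev_start, end)
                continue
        merged.append((start, end))
    return merged

# Red runs from the first example; gene coordinates are made up.
red_runs = [(0, 11), (13, 30), (33, 44)]
print(merge_gene_aware(red_runs, genes=[(5, 20)], max_gap=3))   # → [(0, 44)]
print(merge_gene_aware(red_runs, genes=[(30, 33)], max_gap=3))  # → [(0, 30), (33, 44)]
```

When the only gene overlaps a gap without fitting inside it, both gaps are bridged. A small gene sitting exactly in the 3-bin gap keeps the two red regions separate even though the gap is within the size limit.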

This entry was posted in Chromatin.