Team Lead: Jared Andrews
Summary: This project centers on multi-modal data integration to identify putative core regulatory circuitries (CRCs) on a sample-by-sample basis. In short, given a BED file of ATAC-seq peaks (or H3K27ac peaks), a BED file of super-enhancer (SE) calls from ROSE, and an expression matrix of counts/TPMs for a given sample, the goal is to identify putative CRCs based on bi-directional network analysis.
Current tools to perform these steps (coltron, CRCmapper, CRC) are no longer maintained, difficult to install or run, and/or out of date (Python 2, old annotations and motif databases, etc.). This project will create a new, simplified Python package that incorporates these methods for use on semi-processed datasets: matched ATAC-seq (optional), RNA-seq (optional), and H3K27ac SE calls for a given sample.
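To make the "bi-directional network analysis" step concrete, here is a minimal sketch in plain Python. It assumes one common reading of a CRC: a set of SE-associated TFs that each regulate themselves and mutually regulate one another. The function name `find_candidate_crcs`, the `min_size` parameter, the input format (a dict of predicted TF-to-TF edges, which in practice would come from motif hits in peaks within each TF's SE), and the toy TF names are all hypothetical. It is not the method used by coltron, CRCmapper, or CRC.

```python
def find_candidate_crcs(regulates, min_size=2):
    """Return maximal sets of mutually regulating, self-regulating TFs.

    regulates: dict mapping a TF name to the set of TFs it is predicted
    to regulate (hypothetical input format for this sketch).
    """
    # Step 1: keep TFs with a self-loop (predicted auto-regulation).
    auto = {tf for tf, targets in regulates.items() if tf in targets}

    # Step 2: keep only bi-directional edges between auto-regulated TFs.
    mutual = {
        tf: {
            other
            for other in regulates[tf]
            if other != tf and other in auto and tf in regulates[other]
        }
        for tf in auto
    }

    # Step 3: enumerate maximal cliques with basic Bron-Kerbosch recursion.
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            if len(clique) >= min_size:
                cliques.append(frozenset(clique))
            return
        for tf in list(candidates):
            expand(clique | {tf}, candidates & mutual[tf], excluded & mutual[tf])
            candidates = candidates - {tf}
            excluded = excluded | {tf}

    expand(set(), set(auto), set())
    # Largest cliques first; ties broken alphabetically for stable output.
    return sorted(cliques, key=lambda c: (-len(c), sorted(c)))


# Toy network with made-up TF names (not real data).
toy_edges = {
    "TF_A": {"TF_A", "TF_B", "TF_C"},
    "TF_B": {"TF_B", "TF_A", "TF_C"},
    "TF_C": {"TF_C", "TF_A", "TF_B"},
    "TF_D": {"TF_A"},   # no self-loop, so excluded
    "TF_E": {"TF_E"},   # self-loop but no mutual partner
}
crcs = find_candidate_crcs(toy_edges)
print([sorted(c) for c in crcs])  # [['TF_A', 'TF_B', 'TF_C']]
```

A real implementation would likely also filter TFs by expression and rank the candidate circuits, and those choices are left open here.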
In addition, it would be useful to build reports/dashboards (Dash, Shiny, etc.) that effectively visualize the resulting CRCs and their downstream gene targets, potentially with GO/pathway enrichment incorporated. Lastly, comparing CRCs between samples/groups would help reveal variable CRC members and true "core" members, and relate CRC membership to specific phenotypic states.
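One simple way to approach the between-sample comparison is to count how often each TF appears in a CRC across samples. The sketch below is only an illustration under stated assumptions: the function name `summarize_membership`, the `core_fraction` threshold, and the sample and TF names are hypothetical, and "core" is defined here simply as present in at least a given fraction of samples.

```python
from collections import Counter


def summarize_membership(crc_by_sample, core_fraction=1.0):
    """Split CRC members into 'core' and 'variable' sets.

    crc_by_sample: dict mapping a sample name to the collection of TFs
    called as CRC members in that sample (hypothetical input format).
    core_fraction: minimum fraction of samples a TF must appear in to
    be treated as core (1.0 means present in every sample).
    """
    n_samples = len(crc_by_sample)
    if n_samples == 0:
        return set(), set()

    # Count each TF once per sample, even if it is listed more than once.
    counts = Counter(
        tf for members in crc_by_sample.values() for tf in set(members)
    )
    core = {tf for tf, n in counts.items() if n / n_samples >= core_fraction}
    variable = set(counts) - core
    return core, variable


# Toy membership calls with made-up names (not real data).
toy_calls = {
    "sample_1": {"TF_A", "TF_B", "TF_C"},
    "sample_2": {"TF_A", "TF_B", "TF_D"},
    "sample_3": {"TF_A", "TF_B"},
}
core, variable = summarize_membership(toy_calls)
print(sorted(core), sorted(variable))  # ['TF_A', 'TF_B'] ['TF_C', 'TF_D']
```

Relating the variable members to phenotypic groups would need sample annotations and an appropriate statistical test, neither of which this sketch covers.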
A solution to this problem would allow us to incorporate putative CRC identification into our standard workflow for sample analysis. It would also give the field a much-needed update to these methodologies and provide a mechanism to better integrate these multi-modal datasets via cohesive, holistic analyses.
Skills Needed: Python, Dash, network analysis, interval operations in Python (pyranges, etc.), motif analysis, Python package development, and basic epigenomics knowledge.