/ code & data
Software and data from the GenomeDataLab:
[ SOFTWARE ] Statistical genomics software:
HyperClust by David Mas-Ponte. https://github.com/davidmasp/hyperclust
A statistical framework to detect clustered mutations in genomes, while accounting for mutation rate heterogeneity and for the estimated timing of the mutations.
associated with the publication Mas-Ponte & Supek (2020) Nature Genetics "DNA mismatch repair promotes APOBEC3-mediated diffuse hypermutation in human cancers"
CellLineMutSigs by Jurica Levatić. https://github.com/jlevatic/CellLineMutSigs
Extracts mutational signatures from cancer cell line genomes and associates them with drug activity.
associated with the publication Levatić, Salvadores, Fuster & Supek (2022) Nature Communications [in press] "Mutational signatures are markers of drug sensitivity of cancer cells"
[ SOFTWARE ] Bioinformatics and machine learning tools:
to be released soon: "Pipeline6" (working name) by Daniel Naro
A modern, extensible and scalable pipeline for cancer genomic data processing, based on the Nextflow environment.
Includes the gamut of bioinformatics tools from the Hartwig Medical Foundation "Platinum pipeline" (Sage, GRIDSS, PURPLE...) and several additional tools:
Strelka2 (SNV, indel calling), Manta (SV calling), Paragraph (germline SV genotyping), Sequenza (CNA calling), GangSTR (repeat indel calling), etc.
BioPanPipe by Daniel Ortiz-Martinez. https://daormar.github.io/bio-panpipe/
A cancer genomics pipeline implementing several tools for variant calling and download from genomics databases.
FastRandomForest2 (beta) by Jordi Piqué Sellés. https://github.com/GenomeDataScience/FastRandomForest
A re-implementation of the Random Forest classifier (RF) for the Weka machine learning environment, bringing massive speed and memory use improvements.
[ DATASETS ] CRISPR genetic screening experiment data:
APOBEC3A conditional essentiality screens by Josep Biayna (CRISPR screening experiment) and Miguel Álvarez (data analysis).
Genetic screens on lung adenocarcinoma cell lines A549 (both the TP53 wild-type and the TP53-/- isogenic pair), and LXF289.
With or without APOBEC3A overexpression, at multiple time points
Contains gRNA counts & enrichment analyses via MAGeCK
associated with the publication Biayna, [...] Supek and Stracker (2021) PLOS Biology. Data are available in the supplementary material of the open-access PLOS publication >> LINK
[ TO BE RELEASED SOON ] Genetic screens on the H358 lung adenocarcinoma cell line by Josep Biayna (CRISPR screening experiment) and Miguel Álvarez (data analysis).
With or without APOBEC3A overexpression, at multiple time points
Contains gRNA counts & enrichment analyses via MAGeCK
associated with upcoming manuscript by Álvarez et al.
[ DATASETS ] genomic analysis data:
Cancer cell line ranking by their similarity to TCGA tumors, via transcriptome and DNA methylome analysis, by Marina Salvadores. Lists a "golden set" of preferred cell lines, as well as the cell lines putatively misannotated to the incorrect tissue.
Associated with the publication by Salvadores et al., Science Advances, "Matching cell lines with cancer type and subtype of origin via mutational, epigenomic, and transcriptomic patterns".
The dataset can be downloaded via the supplementary material of the open-access Science Advances publication >> LINK.
“To invent, you need a good imagination and a pile of junk.” ― Thomas A. Edison