/ code & data

Software and data from the GenomeDataLab:

[ SOFTWARE ] Statistical genomics software:

HyperClust by David Mas-Ponte. https://github.com/davidmasp/hyperclust

CellLineMutSigs by Jurica Levatić. https://github.com/jlevatic/CellLineMutSigs

[ SOFTWARE ] Bioinformatics and machine learning tools:

To be released soon: "Pipeline6" (working name) by Daniel Naro

  • A modern, extensible and scalable pipeline for cancer genomic data processing based on the Nextflow environment

  • Includes the gamut of bioinformatics tools from the Hartwig Medical Foundation "Platinum pipeline" (SAGE, GRIDSS, PURPLE...) and several additional tools:

    • Strelka2 (SNV, indel calling), Manta (SV calling), Paragraph (germline SV genotyping), Sequenza (CNA calling), GangSTR (repeat indel calling), etc.


BioPanPipe by Daniel Ortiz-Martinez. https://daormar.github.io/bio-panpipe/

    • A cancer genomics pipeline implementing several tools for variant calling and for downloading data from genomics databases.

FastRandomForest2 (beta) by Jordi Piqué Sellés. https://github.com/GenomeDataScience/FastRandomForest

    • A re-implementation of the Random Forest classifier (RF) for the Weka machine learning environment, bringing massive speed and memory use improvements.

[ DATASETS ] CRISPR genetic screening experiment data:

APOBEC3A conditional essentiality screens by Josep Biayna (CRISPR screening experiment) and Miguel Álvarez (data analysis).

    • Genetic screens on the lung adenocarcinoma cell lines A549 (both the TP53 wild-type line and its isogenic TP53-/- counterpart) and LXF289.

    • [ TO BE RELEASED SOON ] Genetic screens on the H358 lung adenocarcinoma cell line by Josep Biayna (CRISPR screening experiment) and Miguel Álvarez (data analysis).

      • With and without APOBEC3A overexpression, at multiple time points

      • Contains gRNA counts & enrichment analyses via MAGeCK

      • Associated with an upcoming manuscript by Álvarez et al.

[ DATASETS ] Genomic analysis data:

Cancer cell line ranking by their similarity to TCGA tumors, via transcriptome and DNA methylome analysis, by Marina Salvadores. Lists a "golden set" of preferred cell lines, as well as the cell lines putatively misannotated to the incorrect tissue.

“To invent, you need a good imagination and a pile of junk.” ― Thomas A. Edison