Nextflow DSL2 pipeline for allele-specific RNA-seq analysis using STAR, SNPsplit, and featureCounts.
Ultra-minimalist -- 2 files only (main.nf + nextflow.config). Designed for solo bioinformaticians.
Pipeline Overview
SRA_DL["SRA_DOWNLOAD"]
GEO["GSE / GSM"] --> RESOLVE["RESOLVE_GEO"] --> SRA_DL
FQ_DIR["FASTQ directory"]
CSV["CSV samplesheet"]
end
SRA_DL --> FASTQS(("FASTQs"))
FQ_DIR --> FASTQS
CSV --> FASTQS
DOWNLOAD["DOWNLOAD_REFERENCES"] --> GPREP["SNPSPLIT_GENOME_PREP"]
GPREP --> IDX1["STAR_INDEX (N-masked)"]
DOWNLOAD --> IDX2["STAR_INDEX (reference)"]
IDX1 --> A1["STAR_ALIGN (N-masked)"]
IDX2 --> A2["STAR_ALIGN (reference)"]
FASTQS --> A1
FASTQS --> A2
A1 --> S1["SORT_DEDUP"] --> SNP["SNPSPLIT"]
A2 --> S2["SORT_DEDUP "] --> FC3["FEATURECOUNTS (reference)"]
SNP -->|"genome1"| FC1["FEATURECOUNTS (genome1)"]
SNP -->|"genome2"| FC2["FEATURECOUNTS (genome2)"]
FC1 --> O1["genome1 counts"]
FC2 --> O2["genome2 counts"]
FC3 --> O3["reference counts"]
FC1 & FC2 & FC3 --> MQC["MULTIQC"] --> O4["MultiQC report"]
classDef input fill:#0570b0,stroke:#0570b0,color:#fff
classDef process fill:#238b45,stroke:#238b45,color:#fff
classDef key fill:#cb181d,stroke:#cb181d,color:#fff,stroke-width:3px
classDef output fill:#6a51a3,stroke:#6a51a3,color:#fff
classDef data fill:#e6550d,stroke:#e6550d,color:#fff
classDef mqc fill:#41ab5d,stroke:#41ab5d,color:#fff
class SRA,GEO,FQ_DIR,CSV input
class SRA_DL,RESOLVE,DOWNLOAD,GPREP,IDX1,IDX2,A1,S1,A2,S2,FC1,FC2, FC3 process
class SNP key
class O1,O2,O3,O4 output
class FASTQS data
class MQC mqc
Quick Start
# From a FASTQ directory (auto-detects paired-end vs. single-end)
nextflow run IPNP-BIPN/SPLIT --fastq_dir /path/to/fastqs --outdir results -resume

# From SRA accessions
nextflow run IPNP-BIPN/SPLIT --sra_ids "SRR1234567,SRR1234568" --outdir results -resume

# From a GEO dataset (auto-resolves GSE -> SRR)
nextflow run IPNP-BIPN/SPLIT --sra_ids GSE80810 --outdir results -resume

# From a samplesheet CSV
nextflow run IPNP-BIPN/SPLIT --input samplesheet.csv --outdir results -resume
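The paired-end/single-end auto-detection mentioned above can be pictured with a small Python sketch. It is not the pipeline's actual code: the file-name convention (`_R1`/`_R2` or `_1`/`_2` suffixes) and the function name `group_fastqs` are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# ASSUMED naming convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
# (also _1 / _2, optional _001, .fq, uncompressed). The real rules may differ.
MATE_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R?(?P<mate>[12])(?:_001)?\.(?:fastq|fq)(?:\.gz)?$"
)
SINGLE_PATTERN = re.compile(r"^(?P<sample>.+?)\.(?:fastq|fq)(?:\.gz)?$")


def group_fastqs(filenames):
    """Group FASTQ file names by sample and label each sample 'PE' or 'SE'."""
    mates = defaultdict(dict)
    for name in sorted(filenames):
        paired = MATE_PATTERN.match(name)
        if paired:
            mates[paired.group("sample")][paired.group("mate")] = name
            continue
        single = SINGLE_PATTERN.match(name)
        if single:
            # No mate suffix: treat the file as the only read of its sample.
            mates[single.group("sample")]["1"] = name

    layout = {}
    for sample, files in mates.items():
        ordered = [files[key] for key in sorted(files)]
        # Two mates found -> paired-end; otherwise single-end.
        layout[sample] = ("PE" if len(ordered) == 2 else "SE", ordered)
    return layout


if __name__ == "__main__":
    demo = ["a_R1.fastq.gz", "a_R2.fastq.gz", "b.fastq.gz"]
    for sample, (kind, files) in group_fastqs(demo).items():
        print(sample, kind, files)
```

A sample with only one mate file present is reported as single-end in this sketch; a real pipeline might prefer to raise an error instead.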
Optional:
sra-tools, bgzip (htslib/tabix) -- for SRA download
Nextflow >= 23.04
How it works
References are downloaded automatically from Ensembl (GRCm39 genome + GTF) and the Mouse Genomes Project (MGP; SNP VCF). They are cached via storeDir, so they are downloaded only once.
SNPsplit genome preparation creates an N-masked genome where strain-discriminating SNP positions are replaced by N. This prevents alignment bias toward the reference allele.
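To make the N-masking idea concrete, here is a toy Python sketch. It is not SNPsplit's implementation: the helper names `n_mask` and `mismatches` are hypothetical, reads are assumed to align without gaps at position 1, and treating N as a neutral base when scoring is a simplifying assumption (real aligners have their own scoring rules).

```python
def n_mask(sequence, snp_positions):
    """Return `sequence` with the given 1-based SNP positions replaced by 'N'."""
    masked = list(sequence)
    for pos in snp_positions:
        if not 1 <= pos <= len(masked):
            raise ValueError(f"SNP position {pos} is outside the sequence")
        masked[pos - 1] = "N"
    return "".join(masked)


def mismatches(read, target):
    """Count mismatches of an ungapped read placed at position 1 of `target`.

    ASSUMPTION: an 'N' in the target is neutral and never counts as a mismatch.
    """
    return sum(1 for r, t in zip(read, target) if t != "N" and r != t)


reference = "ACGTACGTAC"
masked = n_mask(reference, [3])   # position 3 distinguishes the two strains

strain1_read = "ACGTAC"           # carries the reference allele (G)
strain2_read = "ACATAC"           # carries the alternative allele (A)

# Against the plain reference, the strain2 read is penalised ...
print(mismatches(strain1_read, reference), mismatches(strain2_read, reference))
# ... against the N-masked genome, both reads score the same.
print(mismatches(strain1_read, masked), mismatches(strain2_read, masked))
```

Because both alleles score identically at the masked position, neither strain's reads are favoured during alignment.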
Two parallel alignment tracks:
N-masked track: alignments used for allele-specific analysis (SNPsplit)
Reference track: standard alignments for total gene expression
SNPsplit assigns each read from the N-masked track to genome1 (strain1), genome2 (strain2), or unassigned, based on the alleles it carries at informative SNP positions.
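The assignment rule can be sketched in a few lines of Python. This is a simplified illustration, not SNPsplit's code: the function `assign_read` and its SNP dictionary are hypothetical, alignments are assumed to be ungapped, and reads showing evidence for both strains are simply left unassigned here.

```python
def assign_read(read_start, read_seq, snps):
    """Classify one read as 'genome1', 'genome2' or 'unassigned'.

    read_start: 1-based leftmost reference position of the read (ungapped).
    snps: dict mapping 1-based position -> (genome1_base, genome2_base).
    """
    genome1_hits = genome2_hits = 0
    for offset, base in enumerate(read_seq):
        alleles = snps.get(read_start + offset)
        if alleles is None:
            continue  # not an informative position
        if base == alleles[0]:
            genome1_hits += 1
        elif base == alleles[1]:
            genome2_hits += 1

    if genome1_hits and not genome2_hits:
        return "genome1"
    if genome2_hits and not genome1_hits:
        return "genome2"
    # No informative SNP covered, or conflicting evidence (simplification).
    return "unassigned"


if __name__ == "__main__":
    snps = {5: ("A", "G")}                  # toy SNP table
    print(assign_read(3, "TTACC", snps))    # read covers positions 3-7
    print(assign_read(3, "TTGCC", snps))
    print(assign_read(20, "ACGT", snps))
```

Reads that overlap no informative SNP cannot be attributed to either strain, which is why a share of reads typically ends up unassigned.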
featureCounts produces three count tables (genome1, genome2, and reference), using the gene_name attribute so that rows carry human-readable gene symbols (e.g., Gapdh instead of ENSMUSG00000057666).
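For readers unfamiliar with GTF attributes, the Python sketch below shows where gene_name and gene_id live in an annotation line. featureCounts selects the grouping attribute with its `-g` option (default gene_id); the parsing code, the function names, and the placeholder coordinates are illustrative assumptions, not part of the pipeline.

```python
import re

# GTF column 9 holds 'key "value";' pairs.
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn a GTF attribute column into a dict."""
    return dict(ATTRIBUTE.findall(field))


def feature_label(gtf_line, attribute="gene_name"):
    """Return the label a count table would use for this GTF record.

    Falling back to gene_id when `attribute` is absent is a choice made
    for this sketch only.
    """
    columns = gtf_line.rstrip("\n").split("\t")
    attrs = parse_attributes(columns[8])
    return attrs.get(attribute, attrs.get("gene_id"))


# Coordinates (100-200) are placeholders, not the real Gapdh locus.
line = (
    "6\tensembl\tgene\t100\t200\t.\t-\t.\t"
    'gene_id "ENSMUSG00000057666"; gene_name "Gapdh"; '
    'gene_biotype "protein_coding";'
)

print(feature_label(line))                       # symbol
print(feature_label(line, attribute="gene_id"))  # Ensembl ID
```

One trade-off to keep in mind: gene symbols are easier to read, but distinct Ensembl IDs can share a symbol, in which case their counts would be merged under one name.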
Resume & Cache
The pipeline relies on Nextflow's built-in cache (-resume): steps that have already completed are skipped automatically. References (genome, GTF, VCF, STAR indexes, N-masked genome) are persisted via storeDir and reused across runs.
# Re-run after a crash -- picks up exactly where it left off
nextflow run main.nf --fastq_dir fastqs --outdir results -resume
License
MIT