Synthesize.bio

[TEMPLATE] Comparing two groups with pydeseq2

IMPORTANT

This is a preview with example data.

Scroll to explore it. To customize the data and analysis, create a notebook from this template.

Introduction

This notebook performs differential expression analysis between two sample groups using PyDESeq2, a Python implementation of DESeq2.

You can run this analysis on new data:

  • Click "Edit data" at the top right.
  • This takes you to sample selection.
  • Select two groups of samples; these two groups are compared directly.
  • This example compares "responders" to "non-responders", but you can rename these sample groups to fit your design.

Data Summary

This section summarizes the samples in the responders and non-responders groups.


PCA plot

Here we plot each sample's scores on the first two principal components (PC1 and PC2), computed from the 3,000 most highly variable genes.
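A minimal NumPy sketch of this step, assuming the input is a genes × samples matrix that has already been log- or variance-stabilizing-transformed, and that "highly variable" means highest per-gene variance across samples. The function name `pca_top_variable` is hypothetical, and the template's own preprocessing may differ.

```python
import numpy as np


def pca_top_variable(expr: np.ndarray, n_top: int = 3000, n_pcs: int = 2) -> np.ndarray:
    """Return per-sample PC scores computed from the most variable genes.

    expr: genes x samples matrix of transformed expression values
          (assumed already log- or variance-stabilizing-transformed).
    """
    # Rank genes by variance across samples and keep the top n_top.
    variances = expr.var(axis=1)
    top = np.argsort(variances)[::-1][: min(n_top, expr.shape[0])]
    # PCA treats samples as observations: transpose to samples x genes.
    x = expr[top].T
    # Center each gene (column) so the PCs describe variation around its mean.
    x = x - x.mean(axis=0)
    # SVD of the centered matrix; the PC scores are U * S.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


# Simulated example: samples 4-7 are shifted up on the first 200 genes.
rng = np.random.default_rng(0)
expr = rng.normal(size=(5000, 8))
expr[:200, 4:] += 3.0
scores = pca_top_variable(expr)  # shape (8, 2): one row per sample
```

PC1 should separate the two simulated groups here, which is the kind of structure the PCA plot is meant to reveal.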


DESeq2 Results

Here is the results table from the differential expression analysis of responders versus non-responders. For each of the 60,684 genes, it reports the base mean (the mean of normalized counts across all samples), the log2 fold change, and the adjusted p-value.
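DESeq2, and PyDESeq2 by default, normalize counts with median-of-ratios size factors, and the base mean is the average of those normalized counts. The NumPy sketch below reproduces that calculation for illustration only; the function names are hypothetical, and in practice the package computes size factors and `baseMean` for you.

```python
import numpy as np


def size_factors(counts: np.ndarray) -> np.ndarray:
    """Median-of-ratios size factors.

    counts: genes x samples matrix of raw counts.
    """
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    # Log geometric mean per gene; any zero count makes it -inf.
    log_geo_means = log_counts.mean(axis=1)
    # Only genes with no zero counts contribute to the size factors.
    usable = np.isfinite(log_geo_means)
    # Per sample: median log ratio of its counts to the gene geometric means.
    log_ratios = log_counts[usable] - log_geo_means[usable, None]
    return np.exp(np.median(log_ratios, axis=0))


def base_mean(counts: np.ndarray) -> np.ndarray:
    """Mean of size-factor-normalized counts across all samples, per gene."""
    return (counts / size_factors(counts)).mean(axis=1)


# Sample 2 is sequenced twice as deeply as sample 1.
counts = np.array([[10, 20], [100, 200], [0, 5], [30, 60]])
sf = size_factors(counts)  # ~[0.707, 1.414]
bm = base_mean(counts)
```

Dividing by the size factors removes the two-fold depth difference, so gene 1 gets the same normalized count (about 14.14) in both samples.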


Volcano Plot

Volcano plot showing the distribution of differentially expressed genes between responders and non-responders. Each point represents a gene, with log2 fold change on the x-axis and statistical significance (-log10 p-value) on the y-axis.
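The plot coordinates come straight from the results table: x is the log2 fold change and y is −log10 of the p-value. The source does not say whether the y-axis uses raw or adjusted p-values, or which cutoffs mark a gene as significant. The cutoffs below (p < 0.05, |log2FC| ≥ 1), the gene names, and the function name `volcano_points` are illustrative assumptions.

```python
import math


def volcano_points(results, p_cutoff=0.05, lfc_cutoff=1.0):
    """Turn (gene, log2FC, p-value) rows into volcano-plot coordinates.

    Returns (gene, x, y, label) tuples with x = log2 fold change,
    y = -log10(p), and label in {"up", "down", "ns"}.
    The cutoffs are illustrative defaults, not the template's settings.
    """
    points = []
    for gene, lfc, p in results:
        # DESeq2 reports NA for some genes (e.g. filtered out); skip them.
        if p is None or lfc is None or math.isnan(p) or math.isnan(lfc):
            continue
        y = -math.log10(max(p, 1e-300))  # guard against log10(0)
        if p < p_cutoff and lfc >= lfc_cutoff:
            label = "up"
        elif p < p_cutoff and lfc <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"
        points.append((gene, lfc, y, label))
    return points


rows = [("GENE_A", 2.5, 1e-6), ("GENE_B", -1.8, 0.001),
        ("GENE_C", 0.2, 0.4), ("GENE_D", 3.0, None)]
points = volcano_points(rows)
```

The x and y values can be passed to any scatter-plot function, colored by label.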


P-value distribution histogram

The p-value histogram is a key diagnostic: if the test is well calibrated, p-values for genes that are not differentially expressed are roughly uniform, so the histogram should be flat apart from a peak near 0 contributed by differentially expressed genes. Low-count genes can create artificial peaks in this distribution, so for a meaningful interpretation we include only genes whose mean normalized count is above a threshold (e.g., > 1).
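A minimal sketch of the filter-then-bin step with NumPy. It assumes per-gene p-values and base means come from the results table, with NA values stored as NaN; the function name `pvalue_histogram` and the 20-bin layout are illustrative choices.

```python
import numpy as np


def pvalue_histogram(pvalues, base_means, min_base_mean=1.0, n_bins=20):
    """Histogram of p-values for genes whose base mean exceeds a threshold."""
    pvalues = np.asarray(pvalues, dtype=float)
    base_means = np.asarray(base_means, dtype=float)
    # Keep genes with enough counts and a defined p-value (NaN means NA).
    keep = (base_means > min_base_mean) & ~np.isnan(pvalues)
    counts, edges = np.histogram(pvalues[keep], bins=n_bins, range=(0.0, 1.0))
    return counts, edges
```

The bin counts and edges can then be drawn as a bar chart; a flat histogram with a spike near 0 is the expected healthy shape.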


Centered Gene Expression Heatmap

To get a more informative view than raw expression values, we plot how each gene's expression deviates from its own mean across all samples by centering the data (subtracting each gene's average).
The resulting heatmap reveals sample-specific patterns of up- and down-regulation that can be obscured in absolute expression data.
Here we plot log2-transformed transcripts per million (TPM) values, centered per gene.
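The transformation and centering can be sketched in a few lines of NumPy. The pseudocount of 1 (to keep log2 defined when TPM is 0) and the function name `centered_log2_tpm` are assumptions; the template does not state how it handles zeros.

```python
import numpy as np


def centered_log2_tpm(tpm: np.ndarray, pseudocount: float = 1.0) -> np.ndarray:
    """log2(TPM + pseudocount), then subtract each gene's mean across samples.

    tpm: genes x samples matrix of TPM values.
    """
    log_tpm = np.log2(tpm + pseudocount)
    # Row-wise centering: each gene's values now average to zero.
    return log_tpm - log_tpm.mean(axis=1, keepdims=True)


tpm = np.array([[1.0, 3.0, 7.0], [0.0, 0.0, 0.0]])
centered = centered_log2_tpm(tpm)  # row 1: [-1, 0, 1]; row 2: [0, 0, 0]
```

The centered matrix can then be drawn with a diverging colormap centered at zero, for example with seaborn's `clustermap` or matplotlib's `imshow`.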


Gene set enrichment results

Gene set enrichment results, showing biological pathways and processes that differ significantly between responders and non-responders. The analysis identifies which functional gene sets are over-represented among the differentially expressed genes.
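The template does not state which enrichment test it runs. One common way to ask whether a gene set is over-represented among differentially expressed (DE) genes is a one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test. The stdlib sketch below illustrates that test under this assumption; the function name and argument names are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(n_universe, n_in_set, n_de, n_de_in_set):
    """One-sided hypergeometric p-value P(X >= n_de_in_set).

    n_universe: genes tested; n_in_set: genes in the gene set (within the universe);
    n_de: differentially expressed genes; n_de_in_set: DE genes in the set.
    """
    total = comb(n_universe, n_de)
    upper = min(n_in_set, n_de)
    # Sum the probabilities of seeing at least n_de_in_set set members among the DE genes.
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_de - k)
        for k in range(n_de_in_set, upper + 1)
    )
    return tail / total


# 20 genes tested, 5 in the set, 4 DE genes, 3 of which fall in the set.
p = overrepresentation_pvalue(20, 5, 4, 3)  # 155 / 4845
```

Because many gene sets are tested at once, the resulting p-values would normally be adjusted for multiple testing before calling a set significantly enriched.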


This is the end of the analysis. To extend it, add more code chunks here; to start a new analysis, go to the top and click "Edit Data" or "copy notebook".
