Gene Set Enrichment Analysis (GSEA) tutorial

1. Download app

2. Prepare Data

a. Data matrix

Requirements:
- File format: .gct, tab-separated.
- 1st row: the version string, #1.2.
- 2nd row: two numbers, the number of genes and the number of samples.
- 3rd row: the column headers (NAME, Description, then one sample name per column).
- 1st column: the gene names.
- 2nd column: the description; fill in NA if it is empty.
- From the 3rd column onward: the expression values, one column per sample.
- The expression matrix should be normalized and log-transformed.

sample:
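As a rough illustration of the layout, the following Python sketch writes a tiny .gct file body using only the standard library. The gene names, sample names, expression values, and the helper name write_gct are all made up for illustration; the third row holds the column headers that the GCT format expects.

```python
import io

def write_gct(handle, genes, samples, matrix, descriptions=None):
    """Write an expression matrix in GCT layout (hypothetical helper)."""
    if descriptions is None:
        descriptions = ["NA"] * len(genes)  # fill NA when there is no description
    if len(matrix) != len(genes) or any(len(row) != len(samples) for row in matrix):
        raise ValueError("matrix shape must be len(genes) x len(samples)")
    handle.write("#1.2\n")                               # 1st row: version string
    handle.write(f"{len(genes)}\t{len(samples)}\n")      # 2nd row: gene count, sample count
    handle.write("\t".join(["NAME", "Description"] + list(samples)) + "\n")  # header row
    for gene, desc, row in zip(genes, descriptions, matrix):
        handle.write("\t".join([gene, desc] + [str(v) for v in row]) + "\n")

# Toy data: 3 genes x 4 samples, assumed already normalized and log-transformed.
genes = ["TP53", "BRCA1", "EGFR"]
samples = ["S1", "S2", "S3", "S4"]
matrix = [
    [7.1, 6.9, 5.2, 5.0],
    [4.3, 4.4, 6.8, 7.0],
    [8.0, 8.2, 8.1, 7.9],
]

buffer = io.StringIO()
write_gct(buffer, genes, samples, matrix)
gct_text = buffer.getvalue()
print(gct_text)
```

To produce a real file, pass an open file handle (for example open("expression.gct", "w")) instead of the in-memory buffer.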

b. Phenotype labels

Requirements:
- File format: .cls, with fields separated by tabs or spaces.
- 1st row: three numbers: the number of samples, the number of groups, and 1 (a fixed value).
- 2nd row: starts with #, followed by the group names.
- 3rd row: the group label of each sample, in the same order as the sample columns in the data matrix.

50 2 1 #MUT WT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT WT WT WT WT WT WT WT WT WT WT WT WT WT WT WT WT WT
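A .cls file like this can also be generated from a list of per-sample labels. The sketch below is a minimal example under a few assumptions: the helper name format_cls is hypothetical, the 33 MUT / 17 WT split is an assumed grouping for 50 samples, and a space is written after the # sign, which follows the layout shown in the GSEA format documentation (worth checking against your GSEA version).

```python
def format_cls(labels):
    """Build CLS text from one group label per sample (hypothetical helper)."""
    # Group names in order of first appearance, so they line up with the labels.
    groups = list(dict.fromkeys(labels))
    header = f"{len(labels)} {len(groups)} 1"   # sample count, group count, fixed 1
    names = "# " + " ".join(groups)             # '#' followed by the group names
    assignments = " ".join(labels)              # one label per sample, in column order
    return "\n".join([header, names, assignments]) + "\n"

# Assumed grouping: 33 MUT samples followed by 17 WT samples (50 in total).
labels = ["MUT"] * 33 + ["WT"] * 17
cls_text = format_cls(labels)
print(cls_text)
```

The labels must be listed in the same order as the sample columns of the data matrix, otherwise samples end up assigned to the wrong group.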

c. Gene set

The gene set file must be in .gmt format.

usually use the Molecular Signatures Database (MSigDB) offered from the

or customized gene set with
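For reference, a .gmt file is plain text with one gene set per line: the set name, a description, and then the member genes, all separated by tabs. The following Python sketch writes and re-reads such a file body; the set names, genes, and helper names (format_gmt, parse_gmt) are hypothetical and only meant to show the layout.

```python
def format_gmt(gene_sets, descriptions=None):
    """Build GMT text: one gene set per line (hypothetical helper)."""
    descriptions = descriptions or {}
    lines = []
    for name, genes in gene_sets.items():
        unique = list(dict.fromkeys(genes))   # drop duplicate genes, keep order
        desc = descriptions.get(name, "NA")   # 2nd field: description, NA if none
        lines.append("\t".join([name, desc] + unique))
    return "\n".join(lines) + "\n"

def parse_gmt(text):
    """Read GMT text back into a {set name: [genes]} dictionary."""
    sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue                          # skip blank lines
        name, _description, *genes = line.split("\t")
        sets[name] = genes
    return sets

# Made-up gene sets for illustration only.
custom_sets = {
    "MY_DNA_REPAIR_SET": ["TP53", "BRCA1", "ATM", "BRCA1"],
    "MY_GROWTH_SET": ["EGFR", "MYC"],
}
gmt_text = format_gmt(custom_sets)
print(gmt_text)
round_trip = parse_gmt(gmt_text)
```

Gene identifiers in the .gmt file should use the same naming scheme as the first column of the data matrix.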

3. Load data

Number of permutations: the default is 1000; a larger number gives more precise p-values but consumes more RAM.
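One way to see why more permutations give a more precise result: an empirical p-value computed from N permutations is always a multiple of 1/N, so its smallest non-zero value is 1/N. The sketch below shows this with a simplified label-permutation test on the difference of group means. This is not GSEA's enrichment score, and the data and function name are hypothetical.

```python
import random

def permutation_p_value(values, labels, n_perm, seed=0):
    """Empirical two-sided p-value for a difference in group means (toy example)."""
    rng = random.Random(seed)
    group_a = labels[0]  # treat the first label seen as group A

    def mean_diff(current_labels):
        a = [v for v, l in zip(values, current_labels) if l == group_a]
        b = [v for v, l in zip(values, current_labels) if l != group_a]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = abs(mean_diff(labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                  # permute the phenotype labels
        if abs(mean_diff(shuffled)) >= observed:
            hits += 1
    return hits / n_perm                       # always a multiple of 1 / n_perm

# Made-up expression values for one gene: 7 MUT samples, then 7 WT samples.
values = [5.1, 5.3, 4.9, 5.2, 5.0, 5.4, 5.2, 6.0, 6.2, 5.9, 6.1, 6.3, 5.8, 6.0]
labels = ["MUT"] * 7 + ["WT"] * 7

for n_perm in (100, 1000):
    p = permutation_p_value(values, labels, n_perm)
    print(f"n_perm={n_perm}: p={p}, smallest non-zero p={1 / n_perm}")
```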

Collapse dataset to gene symbols: choose ‘No’ if both the expression matrix and the gene set already use gene symbols.
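A quick way to check whether the two files use the same identifiers is to measure how many genes of each gene set appear among the row names of the expression matrix. The sketch below is a minimal example with made-up data; the helper name gene_set_coverage is hypothetical. Coverage near zero suggests the matrix uses different identifiers (probe IDs, for instance), in which case skipping the collapse step would not be appropriate.

```python
def gene_set_coverage(matrix_genes, gene_sets):
    """Fraction of each gene set found among the matrix row names (hypothetical helper)."""
    rows = set(matrix_genes)
    coverage = {}
    for name, genes in gene_sets.items():
        members = set(genes)
        coverage[name] = len(members & rows) / len(members) if members else 0.0
    return coverage

# Made-up gene sets and row names for illustration only.
gene_sets = {
    "SET_A": ["TP53", "BRCA1", "ATM", "CHEK2"],
    "SET_B": ["EGFR", "MYC"],
}
symbol_rows = ["TP53", "BRCA1", "EGFR", "MYC"]        # matrix keyed by gene symbol
probe_rows = ["1007_s_at", "1053_at", "117_at"]       # matrix keyed by probe ID

print(gene_set_coverage(symbol_rows, gene_sets))  # {'SET_A': 0.5, 'SET_B': 1.0}
print(gene_set_coverage(probe_rows, gene_sets))   # {'SET_A': 0.0, 'SET_B': 0.0}
```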

Permutation type: choose phenotype if each group has at least 7 samples; otherwise choose gene_set.
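The rule of thumb for the permutation type can be checked directly from the phenotype labels. In the sketch below, the function name is hypothetical and the threshold is a parameter; the default of 7 samples per group is an assumption based on the usual GSEA guidance, so adjust it if you follow a different cutoff.

```python
from collections import Counter

def recommend_permutation_type(labels, min_per_group=7):
    """Suggest 'phenotype' or 'gene_set' from the group sizes (hypothetical helper)."""
    counts = Counter(labels)                   # number of samples in each group
    if all(n >= min_per_group for n in counts.values()):
        return "phenotype"
    return "gene_set"

# Assumed example groupings.
large_study = ["MUT"] * 33 + ["WT"] * 17
small_study = ["MUT"] * 3 + ["WT"] * 3

print(recommend_permutation_type(large_study))  # phenotype
print(recommend_permutation_type(small_study))  # gene_set
```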

Plot graphs for the top sets of each phenotype: the number of enrichment plots shown in the results; use a larger number if your gene set file contains many gene sets.

RUN