RNA-seq Bioinformatics

Introduction to bioinformatics for RNA sequence analysis

POSIT Setup

Posit setup for use in CRI 2024 workshop

This tutorial explains how Posit Cloud RStudio was configured for the course. Students do not need to complete this exercise; it is provided as a reference for future course developers who wish to run their hands-on exercises on Posit RStudio.

The workshop organizers had already created a Posit workspace. For the workshop we used Posit projects with 16 GB RAM and 2 cores, running Ubuntu 20.04. With this configuration, we created a template that contains all the uploaded raw data files along with the R packages needed for the workshop. Students are meant to make copies of this template so that each of them has an RStudio environment with the raw data files in place and the packages pre-installed.

Upload raw data

Folders for the raw data were created using the RStudio terminal. Files were either uploaded from a local laptop or storage1 location using the Upload feature in the bottom-right pane of the RStudio window, or downloaded from genomedata.org using wget from the RStudio terminal.

mkdir data
mkdir outdir
mkdir outdir_single_cell_rna
mkdir package_installation

cd data
mkdir single_cell_rna
mkdir bulk_rna
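
The same folder layout can also be created in a way that is safe to re-run. The following is a minimal sketch, not the commands used for the workshop: `PROJECT_ROOT` is a hypothetical variable introduced here for illustration, and it defaults to the current directory.

```shell
# PROJECT_ROOT is a hypothetical variable (an assumption, not part of the
# workshop setup); it defaults to the directory the script is run from.
PROJECT_ROOT="${PROJECT_ROOT:-$PWD}"

# -p creates parent directories as needed and does not fail
# when a directory already exists, so the script can be re-run.
mkdir -p \
  "$PROJECT_ROOT/data/single_cell_rna" \
  "$PROJECT_ROOT/data/bulk_rna" \
  "$PROJECT_ROOT/outdir" \
  "$PROJECT_ROOT/outdir_single_cell_rna" \
  "$PROJECT_ROOT/package_installation"

# Show the two data subfolders that were just created.
ls -1 "$PROJECT_ROOT/data"
```

Using `mkdir -p` avoids the "File exists" error that plain `mkdir` reports if the template is rebuilt on top of an existing project.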

Files in single_cell_rna

Posit requires all files to be zipped prior to uploading and automatically unzips the folder after the upload. After uploading the files, we made a folder for the cellranger outputs and moved the .h5 files there. We also downloaded the inferCNV files using wget.

#organize cellranger outputs
cd /cloud/project/data/single_cell_rna
mkdir cellranger_outputs
mv *.h5 cellranger_outputs

#download inferCNV reference files and organize all reference files
mkdir reference_files
mv m8.all.v2023.2.Mm.symbols.gmt reference_files
mv Tumor_Calls_per_Variants_for_CRI.tsv reference_files
cd reference_files
wget https://data.broadinstitute.org/Trinity/CTAT/cnv/mouse_gencode.GRCm38.p6.vM25.basic.annotation.by_gene_id.infercnv_positions
wget https://data.broadinstitute.org/Trinity/CTAT/cnv/mouse_gencode.GRCm38.p6.vM25.basic.annotation.by_gene_name.infercnv_positions

#organize vartrix files
cd /cloud/project/data/single_cell_rna
mkdir cancer_cell_id 
cd cancer_cell_id
wget http://genomedata.org/cri-workshop/somatic_variants_exome/mcb6c-exome-somatic.variants.annotated.clean.tsv
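
If the upload has not finished or has already been organized, `mv *.h5 cellranger_outputs` fails because the pattern matches nothing. The sketch below shows one way to guard the step that moves the .h5 files. The function name `collect_h5` is hypothetical and was not part of the workshop setup.

```shell
# collect_h5 is a hypothetical helper (an assumption for illustration).
# It moves every .h5 file found directly inside the given directory
# into a cellranger_outputs subfolder of that directory.
collect_h5() {
  local dir="$1"
  local dest="$dir/cellranger_outputs"
  local moved=0
  local f

  mkdir -p "$dest"
  for f in "$dir"/*.h5; do
    # When nothing matches, the pattern stays as the literal "*.h5";
    # skip it instead of passing a nonexistent path to mv.
    [ -e "$f" ] || continue
    mv -- "$f" "$dest"/
    moved=$((moved + 1))
  done
  echo "Moved $moved .h5 file(s) into $dest"
}

# Example call, using the path from this tutorial:
# collect_h5 /cloud/project/data/single_cell_rna
```

Running the function a second time reports zero moved files rather than an error.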

Files in bulk_rna

cd /cloud/project/data/bulk_rna
wget http://genomedata.org/rnaseq-tutorial/batch_correction/GSE48035_ILMN.Counts.SampleSubset.ProteinCodingGenes.tsv
wget http://genomedata.org/rnaseq-tutorial/results/cshl2022/rnaseq/ENSG_ID2Name.txt
wget http://genomedata.org/rnaseq-tutorial/results/cshl2022/rnaseq/gene_read_counts_table_all_final.tsv
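
Before students copy the template, it may be worth confirming that every expected file arrived and is not empty. The following is a minimal sketch of such a check. The helper name `check_files` is hypothetical, and the check only tests for presence and non-zero size, not for file integrity.

```shell
# check_files is a hypothetical helper (an assumption for illustration).
# Usage: check_files DIRECTORY FILE [FILE...]
# Prints one status line per file and returns non-zero if any file
# is missing or has zero size.
check_files() {
  local dir="$1"
  local status=0
  local name
  shift

  for name in "$@"; do
    # -s is true only when the file exists and is larger than zero bytes.
    if [ -s "$dir/$name" ]; then
      echo "ok      $name"
    else
      echo "MISSING $name"
      status=1
    fi
  done
  return "$status"
}

# Example call for the bulk RNA files downloaded above:
# check_files /cloud/project/data/bulk_rna \
#   GSE48035_ILMN.Counts.SampleSubset.ProteinCodingGenes.tsv \
#   ENSG_ID2Name.txt \
#   gene_read_counts_table_all_final.tsv
```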

Back-up files

Installing packages

All packages were installed from CRAN, Bioconductor, or GitHub, except for CytoTRACE, which was downloaded to the package_installation folder and then installed using devtools.

#Download CytoTRACE tar.gz file
download.file("https://cytotrace.stanford.edu/CytoTRACE_0.3.3.tar.gz", destfile = "package_installation/CytoTRACE_0.3.3.tar.gz")

# Installing package installers
install.packages("devtools")
install.packages("BiocManager")

# Bulk RNA seq libraries
BiocManager::install("genefilter")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("data.table")
BiocManager::install("AnnotationDbi")
BiocManager::install("org.Hs.eg.db")
BiocManager::install("GO.db")
BiocManager::install("gage")
BiocManager::install("sva")
install.packages("gridExtra")
BiocManager::install("edgeR")
install.packages("UpSetR")
BiocManager::install("DESeq2")
install.packages("gtable")
BiocManager::install("apeglm")

# Intro to R packages
install.packages("tidyr")
install.packages("stringr")
install.packages("ggplot2")
install.packages("dplyr")
install.packages("tidyverse")
install.packages("MASS")
install.packages("ggpubr")

# Single-cell RNA seq libraries
BiocManager::install("sva") #need this for cytotrace
devtools::install_local("package_installation/CytoTRACE_0.3.3.tar.gz")
install.packages("Seurat")
install.packages("ggplot2")
install.packages("dplyr")
install.packages("Matrix")
install.packages("hdf5r")
install.packages("bench") # to mark time
install.packages("viridis")
install.packages("R.utils")
remotes::install_github("satijalab/seurat-wrappers")
BiocManager::install("celldex")
BiocManager::install("SingleR")
devtools::install_github("immunogenomics/presto")
BiocManager::install("EnhancedVolcano")
BiocManager::install("clusterProfiler")
BiocManager::install("org.Mm.eg.db")
install.packages("msigdbr")
BiocManager::install("scRepertoire")
BiocManager::install("BiocGenerics")
BiocManager::install("DelayedArray")
BiocManager::install("DelayedMatrixStats")
BiocManager::install("limma")
BiocManager::install("lme4")
BiocManager::install("S4Vectors")
BiocManager::install("SingleCellExperiment")
BiocManager::install("SummarizedExperiment")
BiocManager::install("batchelor")
BiocManager::install("HDF5Array")
BiocManager::install("terra")
BiocManager::install("ggrastr")
devtools::install_github("cole-trapnell-lab/monocle3")
install.packages("beanplot")
install.packages("mixtools")
install.packages("pheatmap")
install.packages("zoo")
install.packages("squash")
install.packages("showtext")
BiocManager::install("biomaRt")
BiocManager::install("scran")
devtools::install_github("diazlab/CONICS/CONICSmat", dependencies = FALSE)
install.packages("gprofiler2")
devtools::install_github(repo = "ncborcherding/scRepertoire")
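
Several packages in the lists above (for example ggplot2, dplyr and sva) are installed more than once because each course module lists its own requirements. Repeating an install is harmless but slows down building the template. The sketch below shows one way to list the repeats from the RStudio terminal. It assumes the commands have been saved to a script file, and the helper name `list_duplicate_packages` is hypothetical. It only inspects `install.packages` and `BiocManager::install` calls that start a line, so GitHub installs are ignored.

```shell
# list_duplicate_packages is a hypothetical helper (an assumption for
# illustration). It prints each package name that appears in more than
# one install.packages(...) or BiocManager::install(...) call.
list_duplicate_packages() {
  local script="$1"

  # 1. Keep only lines that start with one of the two install calls.
  # 2. Extract the first double-quoted string on each line (the package name).
  # 3. Sort the names and print only those that occur more than once.
  grep -E '^(install\.packages|BiocManager::install)\(' "$script" \
    | sed -E 's/^[^"]*"([^"]*)".*/\1/' \
    | sort \
    | uniq -d
}

# Example call; install_packages.R is a hypothetical file name:
# list_duplicate_packages install_packages.R
```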