How to set up your own server for genomics analysis

  1. Create a free-tier AWS server: Ubuntu Server 16.04 LTS (HVM), t2.micro with 1 vCPU and 1 GiB memory, 30 GB disk space.
  2. Use an SSH tool to connect to the server. I use Xshell, plus Xftp for file transfer.
  3. Set up the software environment. I have attached my commands below.
  4. For more information, please check. Also, I’d like to thank Kate Hertweck and Erin Becker from Data Carpentry for their teaching and help.

#By Lin Su, 08-09-2017



#Prework
 !Copy the FastQC and Trimmomatic-0.32 folders from the Data Carpentry server to your new server using FTP (e.g. Xftp).
 If you do not have access to the Data Carpentry server, email me at mail@su2lin.com and I will send them to you.

#Update your server system and install software
 sudo apt-get update
 sudo apt-get dist-upgrade
 ##sudo apt-get install fastqc
 ##sudo apt-get install trimmomatic
 sudo apt-get install bwa
 sudo apt-get install samtools
 sudo apt-get install bcftools
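
Trimmomatic runs as a .jar file, so the later steps also need java, which a fresh Ubuntu image may not have yet. Here is a minimal sketch that checks whether every tool the pipeline calls is on the PATH. check_tools is a hypothetical helper name, not part of any package.

```bash
#!/usr/bin/env bash
# Minimal sketch: report which required tools are on PATH.
# check_tools is a hypothetical helper; it returns the number of missing tools.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Usage (prints one line per tool and returns non-zero if anything is missing):
 check_tools java bwa samtools bcftools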

#Run FastQC
 #upload FastQC to ~
 chmod -R 755 ~/FastQC
 ##fastqc *.fastq
 ~/FastQC/fastqc ~/data/trimmed_fastq/*fastq
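
By default FastQC writes an HTML report and a <sample>_fastqc.zip archive next to each input file. The archive contains a tab-separated summary.txt with one PASS/WARN/FAIL line per module. Here is a minimal sketch that lists only the failed modules. It assumes unzip is installed, which is not in the install list above. fastqc_failures and summarize_fastqc are hypothetical names.

```bash
#!/usr/bin/env bash
# Minimal sketch: list FastQC modules that FAILed.
# fastqc_failures and summarize_fastqc are hypothetical helper names.

fastqc_failures() {
  # $1: a FastQC summary.txt (tab-separated columns: STATUS, MODULE, FILE)
  awk -F'\t' '$1 == "FAIL" { print $3 ": " $2 }' "$1"
}

summarize_fastqc() {
  # $1: directory holding the *_fastqc.zip archives (requires unzip)
  local dir="$1" zip
  for zip in "$dir"/*_fastqc.zip; do
    [ -e "$zip" ] || continue          # glob matched nothing
    unzip -o -q "$zip" -d "$dir"       # extracts <sample>_fastqc/ inside $dir
    fastqc_failures "${zip%.zip}/summary.txt"
  done
}
```

Usage:
 summarize_fastqc ~/data/trimmed_fastq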

#Clean reads using Trimmomatic
 #upload Trimmomatic-0.32 to ~
 java -jar ~/Trimmomatic-0.32/trimmomatic-0.32.jar SE \
 ~/data/trimmed_fastq/SRR097977.fastq \
 ~/data/trimmed_fastq/SRR097977.fastq_trim.fastq SLIDINGWINDOW:4:20 MINLEN:20
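
A loop can apply the same Trimmomatic settings to every FASTQ file in a folder instead of one sample at a time. Here is a minimal sketch that keeps this post's <input>_trim.fastq naming and assumes the jar is where the Prework step put it. trim_all and the RUN=echo dry-run switch are hypothetical conveniences, not Trimmomatic options.

```bash
#!/usr/bin/env bash
# Minimal sketch: run Trimmomatic SE on every .fastq file in a directory.
# trim_all and RUN are hypothetical; set RUN=echo to print the commands instead of running them.
TRIMMOMATIC_JAR="${TRIMMOMATIC_JAR:-$HOME/Trimmomatic-0.32/trimmomatic-0.32.jar}"

trim_all() {
  local dir="$1" fq
  for fq in "$dir"/*.fastq; do
    [ -e "$fq" ] || continue                       # glob matched nothing
    case "$fq" in *_trim.fastq) continue ;; esac   # skip outputs of earlier runs
    ${RUN:-} java -jar "$TRIMMOMATIC_JAR" SE \
      "$fq" "${fq}_trim.fastq" SLIDINGWINDOW:4:20 MINLEN:20
  done
}
```

Usage (dry run first, then the real run):
 RUN=echo trim_all ~/data/trimmed_fastq
 trim_all ~/data/trimmed_fastq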

#Setup the directories
 mkdir -p ~/results/sai ~/results/sam ~/results/bam ~/results/bcf ~/results/vcf

#Index the reference genome (from here on, paths are relative to the home directory)
 cd ~
 bwa index data/ref_genome/ecoli_rel606.fasta
 ##ls -alh ~/data/trimmed_fastq/SRR097977.fastq_trim.fastq
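
bwa index writes five files next to the FASTA: .amb, .ann, .bwt, .pac and .sa. If you re-run the commands, the index is rebuilt every time. Here is a minimal sketch that skips the rebuild when all five files already exist. bwa_index_exists is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Minimal sketch: succeed only if every BWA index file for a reference exists and is non-empty.
# bwa_index_exists is a hypothetical helper name.
bwa_index_exists() {
  local ref="$1" ext
  for ext in amb ann bwt pac sa; do
    [ -s "$ref.$ext" ] || return 1
  done
  return 0
}
```

Usage:
 bwa_index_exists data/ref_genome/ecoli_rel606.fasta || bwa index data/ref_genome/ecoli_rel606.fasta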

#Align reads to reference genome
 bwa aln data/ref_genome/ecoli_rel606.fasta \
 data/trimmed_fastq/SRR097977.fastq_trim.fastq > \
 results/sai/SRR097977.aligned.sai

#Convert the alignment to SAM format
 bwa samse data/ref_genome/ecoli_rel606.fasta \
 results/sai/SRR097977.aligned.sai \
 data/trimmed_fastq/SRR097977.fastq_trim.fastq > \
 results/sam/SRR097977.aligned.sam
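
A quick sanity check at this point is how many reads actually aligned. In SAM, bit 4 (0x4) of the FLAG column marks an unmapped read, so you can count directly from the text file. samtools flagstat reports similar numbers from the BAM. Here is a minimal awk-in-shell sketch. sam_mapping_summary is a hypothetical name. Its total counts alignment lines, which for bwa samse single-end output means one line per read.

```bash
#!/usr/bin/env bash
# Minimal sketch: count mapped and unmapped reads in a SAM file.
# sam_mapping_summary is a hypothetical helper name.
sam_mapping_summary() {
  # $1: SAM file; header lines start with '@', column 2 is the bitwise FLAG
  awk -F'\t' '
    /^@/ { next }
    { total++; if (int($2 / 4) % 2 == 1) unmapped++ }   # FLAG bit 0x4 = unmapped
    END { printf "total=%d mapped=%d unmapped=%d\n", total, total - unmapped, unmapped }
  ' "$1"
}
```

Usage:
 sam_mapping_summary results/sam/SRR097977.aligned.sam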

#Convert SAM to BAM; you might need to re-install samtools first (sudo apt-get install samtools)
 samtools view -S -b results/sam/SRR097977.aligned.sam > results/bam/SRR097977.aligned.bam

#Sort BAM file by coordinates
 ##samtools sort results/bam/SRR097977.aligned.bam results/bam/SRR097977.aligned.sorted
 #sort got killed, probably because the server has little memory (1 GiB), so use -m to limit the memory used per thread
 samtools sort -m 500M results/bam/SRR097977.aligned.bam results/bam/SRR097977.aligned.sorted
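
The sort command above uses the old samtools 0.1.x syntax, where the last argument is an output prefix and .bam is appended to it. Newer samtools releases instead expect -o with a full output file name. Here is a hedged sketch that picks the syntax from the "Version:" line samtools prints in its usage text. sort_bam and samtools_version are hypothetical names. The sketch assumes any 1.x install is recent enough to accept -o <file>.

```bash
#!/usr/bin/env bash
# Minimal sketch: sort a BAM with whichever samtools sort syntax is installed.
# sort_bam and samtools_version are hypothetical helper names.

samtools_version() {
  # Running samtools with no arguments prints usage, including "Version: ...", to stderr
  samtools 2>&1 | awk '/^Version:/ { print $2; exit }'
}

sort_bam() {
  # $1: input BAM, $2: output BAM (should end in .bam)
  local in_bam="$1" out_bam="$2"
  case "$(samtools_version)" in
    0.*) samtools sort -m 500M "$in_bam" "${out_bam%.bam}" ;;  # old form: output prefix
    *)   samtools sort -m 500M -o "$out_bam" "$in_bam" ;;      # assumed 1.x form: -o file
  esac
}
```

Usage:
 sort_bam results/bam/SRR097977.aligned.bam results/bam/SRR097977.aligned.sorted.bam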

#Calculate the read coverage of positions in the genome
 samtools mpileup -g -f data/ref_genome/ecoli_rel606.fasta results/bam/SRR097977.aligned.sorted.bam > results/bcf/SRR097977_raw.bcf

#Detect the single nucleotide polymorphisms (SNPs)
 bcftools view -bvcg results/bcf/SRR097977_raw.bcf > results/bcf/SRR097977_variants.bcf

#Filter and report the SNP variants in VCF (Variant Call Format)
 #you might get an error that vcfutils.pl is missing; usually fixed at line 36
 bcftools view results/bcf/SRR097977_variants.bcf | \
 /usr/share/samtools/vcfutils.pl varFilter - > results/vcf/SRR097977_final_variants.vcf
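
The final VCF is plain text. Header lines start with #, and every other line is one variant, with CHROM, POS, ID, REF and ALT as the first five tab-separated columns. Here is a minimal sketch that counts the variants and prints each position and base change. vcf_summary is a hypothetical name.

```bash
#!/usr/bin/env bash
# Minimal sketch: list CHROM, POS and REF>ALT for each variant, then the total count.
# vcf_summary is a hypothetical helper name.
vcf_summary() {
  # $1: VCF file; lines starting with '#' are headers
  awk -F'\t' '
    /^#/ { next }
    { n++; print $1 "\t" $2 "\t" $4 ">" $5 }
    END { print "variants: " n + 0 }
  ' "$1"
}
```

Usage:
 vcf_summary results/vcf/SRR097977_final_variants.vcf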

#Assess the alignment by visualization (optional step): index the sorted BAM so a viewer can load it
 samtools index results/bam/SRR097977.aligned.sorted.bam

#download these four files to your computer (e.g. with Xftp)
 #results/bam/SRR097977.aligned.sorted.bam
 #results/bam/SRR097977.aligned.sorted.bam.bai
 #results/vcf/SRR097977_final_variants.vcf
 #data/ref_genome/ecoli_rel606.fasta
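
Before downloading, check that all four files exist and are non-empty. A step that was killed (like the sort above) can leave a file missing or empty. Here is a minimal sketch. check_downloads is a hypothetical name, and it assumes the paths are relative to the home directory, as in the commands above.

```bash
#!/usr/bin/env bash
# Minimal sketch: confirm the four files to download exist and are non-empty.
# check_downloads is a hypothetical helper; $1 is the base directory (default: $HOME).
check_downloads() {
  local base="${1:-$HOME}" f missing=0
  for f in results/bam/SRR097977.aligned.sorted.bam \
           results/bam/SRR097977.aligned.sorted.bam.bai \
           results/vcf/SRR097977_final_variants.vcf \
           data/ref_genome/ecoli_rel606.fasta; do
    if [ -s "$base/$f" ]; then
      echo "ok:      $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage:
 check_downloads ~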

Please feel free to email me or leave a comment with any questions or suggestions.
