NGS memo

2018年9月30日星期日

DEseq2 usage

DEseq2 installation

source("https://bioconductor.org/biocLite.R")
biocLite("DESeq2")

2018年9月29日星期六

Window10 install Jbrowse on subsystem for linux (WSL)

1. install Ubuntu

1.a
Select the Start button and Open Microsoft Store
Search "linux" on Microsoft Store
Select and install Ubuntu

2018年8月7日星期二

One-line get sorted Bam file from bowtie

One line get sorted Bam file

$ read=collapsed_read.fasta       # collapsed fasta that fastx_collapser creates
$ genome_index=Genome             # basename of bowtie's .ebwt file
$ cat $read | fastx_uncollapser | bowtie -S -f $genome_index - | samtools view -u -h -F 4 - | samtools sort - -o aln.sorted.bam

"-" It mean <STDIN> that is <STDOUT> of previous command/ program.

fasta_uncollapser [-i INFILE] [-o OUTFILE]

-i [INFILE]      FASTA input file. default is STDIN.

-o [OUTFILE]     FASTA output file. default is STDOUT.

bowtie [options] <ebwt> <s> <hit>

<s>              The query input file.

                 If - is specified, Bowtie gets the reads from the "standard in" filehandle.

-f               The query input file (<s>) is FASTA files

-S/--sam         Print alignments in SAM format.

samtools view [option]

-h               Include the header in the output.

-u               Output uncompressed BAM.

                 This option saves time spent on compression/decompression

                 and is thus preferred when the output is piped to another

                 samtools command.

-F [INT]         Do not output alignments with any bits set in INT present

                 in the FLAG field. # 4 is unmapped reads

samtools sort [-o out.sam|out.bam|out.cram] [in.sam|in.bam|in.cram]

-m [INT]         Approximately the maximum required memory per thread,

                 specified either in bytes or with a K, M, or G suffix.

-@ [INT]         Set number of sorting and compression threads.

                 By default, operation is single-threaded

Finally, Create index file of Bam that some software requires

$ samtools index aln.sorted.bam aln.sorted.bam.bai

Get sequences of rRNA, tRNA, snRNA and snoRNA from Rfam database

Create a folder for storing Rfam and downloading index file, family.txt.

# Create and enter Rfam folder
$ cd $HOME
$ mkdir Rfam_ncRNA
$ cd Rfam_ncRNA
# Download Annotation of Rfam fasta file for selecting ncRNA
$ wget ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/database_files/family.txt.gz
$ gzip -d family.txt.gz

Create Rfam download link list

$ curl -l ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/fasta_files/ | awk '{print "ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/fasta_files/"$0}' >Rfam_link.txt
# curl -l will return a list of filenames on FTP

Grep ncRNA download link list from Rfam download link list.

$ cat family.txt | awk -F "\t" '{ if($19~/snRNA/ && $19!~/snoRNA/){print $1} }' | grep -f - Rfam_link.txt >Rfam_snRNA_link.txt
$ cat family.txt | awk -F "\t" '{ if($19~/snoRNA/){print $1} }' | grep -f - Rfam_link.txt >Rfam_snoRNA_link.txt
$ cat family.txt | awk -F "\t" '{ if($19~/tRNA/){print $1} }' | grep -f - Rfam_link.txt >Rfam_tRNA_link.txt
$ cat family.txt | awk -F "\t" '{ if($19~/rRNA/){print $1} }' | grep -f - Rfam_link.txt >Rfam_rRNA_link.txt

# $ Filtered condition: awk -F "\t" '{print $1,$19}' family.txt | head
# ...
# RF00003  Gene; snRNA; splicing;   #<--- snRNA match "snRNA" and don't match "snoRNA"
# ...
# RF00012  Gene; snRNA; snoRNA; CD-box;  #<--- snoRNA match "snRNA" and "snoRNA"

Download multiple files by list file

$ mkdir snRNA
$ mkdir snoRNA
$ mkdir tRNA
$ mkdir rRNA
$ wget -P snRNA -i Rfam_snRNA_link.txt 2>snRNA.log &
$ wget -P snoRNA -i Rfam_snoRNA_link.txt 2>snoRNA.log &
$ wget -P tRNA -i Rfam_tRNA_link.txt 2>tRNA.log &
$ wget -P rRNA -i Rfam_rRNA_link.txt 2>rRNA.log &

# -P [dir]       File was stored at this directory
# -i [file]      File is a list of download links.
# 2>file.log     STDERR was written in the log file.
# &              add "&" to tail of command, the command will run in the background

Wait for download finish!

All "wget" will be disappeared below "COMMAND".

$ top -u $USER 
   # Rfam_snoRNA have many files, you will wait for about 1~2hr! 
   # show numbers of link pre file
   # wc -l Rfam_*_link.txt 
   #   15 Rfam_rRNA_link.txt
   #  753 Rfam_snoRNA_link.txt
   #   18 Rfam_snRNA_link.txt
   #    2 Rfam_tRNA_link.txt
   #  788 total

Merge multiple fasta.gz files

$ zcat snRNA/*.gz >Rfam_snRNA.fasta
$ zcat snoRNA/*.gz >Rfam_snoRNA.fasta
$ zcat rRNA/*.gz >Rfam_rRNA.fasta
$ zcat tRNA/*.gz >Rfam_tRNA.fasta

2017年7月5日星期三

Fastx-toolkit Installation

fastx-toolkit Installation

Install libgtextutils (necessary package for fastx-tookit)

$ cul -LOk https://github.com/agordon/libgtextutils/releases/download/0.7/libgtextutils-0.7.tar.gz
$ tar -zxf libgtextutils-0.7.tar.gz
$ cd libgtextutils-0.7
$ ./configure prefix=/usr/bhhou/packages
$ make
$ make install
# Tell pkg-config to look for libraries in /usr/bhhou/packages/lib.
$ echo "export PKG_CONFIG_PATH=\$PKG_CONFIG_PATH:/usr/bhhou/packages/lib" >>~/.bash_prefile

Install fastx-toolkit

$ cul -LOk https://github.com/agordon/fastx_toolkit/releases/download/0.0.14/fastx_toolkit-0.0.14.tar.bz2
$ tar -jxf fastx_toolkit-0.0.12.tar.bz2
$ cd fastx_toolkit-0.0.12
$ ./configure prefix=/usr/bhhou/SOFTWARE
$ make
$ make install
# Add /usr/bhhou/packages/lib to PATH
$ echo "export PKG_CONFIG_PATH=\$PATH:/usr/bhhou/SOFTWARE/bin" >>~/.bash_prefile

Filter non-coding RNA ,multiple hits and size-selection using bowtie and shell script for sRNA-Seq

Software requirements:

1. Bowtie

2. Fastx-toolskit

3. Linux only

Building bowtie indexes of references for mapping

index_path=/usr1/bhhou/Reference
cd $index_path
genome_fasta=Genome.fasta
cDNA_fasta=cDNA.fasta
tRNA_fasta=tRNA.fasta
rRNA_fasta=rRNA.fasta
snRNA_fasta=snRNA.fasta
snoRNA_fasta=snoRNA.fasta
ncRNA_fasta=ncRNA.fasta
cat $tRNA_fasta $rRNA_fasta $snRNA_fasta $snoRNA_fasta >ncRNA.fasta
bowtie-build $genome_fasta ${genome_fasta%%.fasta}
bowtie-build $cDNA_fasta ${cDNA_fasta%%.fasta}
bowtie-build $ncRNA_fasta ${ncRNA_fasta%%.fasta}

genome_index=$index_path/Genome
cDNA_index=$index_path/cDNA
ncRNA_index=$index_path/ncRNA

Mapping

1. Filtering t/r/sn/snoRNAs,
2. Refraining reads having more than 20 genomic locus
3. Remain mapping reads of genome and cDNA
4. Allowing 18 to 26nt reads

cd /usr1/bhhou/sRNA
output=clean_collasped
mkdir $output
for read in /usr1/bhhou/sRNA/collasped/*_trimmed.fasta
do
read_id=${read%%_trimmed*}
read_id=${read_id##*/}
echo $read_id
fastx_collapser -i $read -o ${read_id}_collapsed.fasta
bowtie -v 0 -f $ncRNA_index ${read_id}_collapsed.fasta /dev/null --un ${read_id}_ncRNA_clean.fasta
bowtie -v 0 -f $genome_index ${read_id}_ncRNA_clean.fasta /dev/null --al ${read_id}_genome_map.fasta --un ${read_id}_genome_unmap.fasta
bowtie -v 0 -f -m 20 $genome_index ${read_id}_genome_map.fasta /dev/null --al ${read_id}_genome_clean.fasta
bowtie -v 0 -f $cDNA_index ${read_id}_genome_unmap.fasta /dev/null --al ${read_id}_cDNA_clean.fasta
cat ${read_id}_genome_clean.fasta ${read_id}_cDNA_clean.fasta > ${output}/${read_id}_collapsed_clean.fasta
cat ${output}/${read_id}_collapsed_clean.fasta | fasta_formatter -t | awk -F "\t" '{ if(length($2) >=18 && length($2)<=26){ print ">"$1"\n"$2} }' >${output}/${read_id}_collapsed_clean_size_selected.fasta
rm ${read_id}_*.fasta
done

2017年6月25日星期日

DNA to RNA, Reverse Complement using perl one-liner

2018年9月30日 星期日

2018年9月29日 星期六

1. install Ubuntu

2018年8月7日 星期二

One line get sorted Bam file

fasta_uncollapser [-i INFILE] [-o OUTFILE]

bowtie [options] <ebwt> <s> <hit>

samtools view [option]

samtools sort [-o out.sam|out.bam|out.cram] [in.sam|in.bam|in.cram]

Finally, Create index file of Bam that some software requires

Create a folder for storing Rfam and downloading index file, family.txt.

Create Rfam download link list

Grep ncRNA download link list from Rfam download link list.

Download multiple files by list file

Wait for download finish!

Merge multiple fasta.gz files

2017年7月5日 星期三

fastx-toolkit Installation

Install libgtextutils (necessary package for fastx-tookit)

Install fastx-toolkit

Software requirements:

Building bowtie indexes of references for mapping

Mapping

2017年6月25日 星期日

2018年9月30日星期日

2018年9月29日星期六

2018年8月7日星期二

2017年7月5日星期三

2017年6月25日星期日