Wednesday, July 5, 2017

Fastx-toolkit Installation

Install libgtextutils (a required dependency of fastx-toolkit)

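$ curl -LOk https://github.com/agordon/libgtextutils/releases/download/0.7/libgtextutils-0.7.tar.gz
$ tar -zxf libgtextutils-0.7.tar.gz
$ cd libgtextutils-0.7
$ ./configure --prefix=/usr/bhhou/packages
$ make
$ make install
# Tell pkg-config where to find gtextutils.pc (installed under <prefix>/lib/pkgconfig),
# so that the fastx-toolkit configure script can locate the library.
$ echo "export PKG_CONFIG_PATH=\$PKG_CONFIG_PATH:/usr/bhhou/packages/lib/pkgconfig" >>~/.bash_profile
$ source ~/.bash_profile
# If the fastx tools later fail with "error while loading shared libraries",
# /usr/bhhou/packages/lib may also need to be added to LD_LIBRARY_PATH.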
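Appending with `>>` adds a duplicate export line every time the command is re-run. A minimal sketch of an idempotent alternative follows; the helper name `add_to_profile` is hypothetical, and it assumes the startup file is `~/.bash_profile` (read by bash login shells).

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a shell startup file only if
# that exact line is not already present, so re-running is harmless.
add_to_profile() {
    local line=$1
    local file=${2:-$HOME/.bash_profile}   # default target file (assumption)
    touch "$file"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (commented out so that running this file does not modify your profile).
# Single quotes keep $PKG_CONFIG_PATH unexpanded, like the \$ escape above.
# add_to_profile 'export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/bhhou/packages/lib/pkgconfig'
# source ~/.bash_profile
```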

Install fastx-toolkit

$ curl -LOk https://github.com/agordon/fastx_toolkit/releases/download/0.0.14/fastx_toolkit-0.0.14.tar.bz2
$ tar -jxf fastx_toolkit-0.0.14.tar.bz2
$ cd fastx_toolkit-0.0.14
$ ./configure --prefix=/usr/bhhou/SOFTWARE
$ make
$ make install
# Add /usr/bhhou/SOFTWARE/bin to PATH
$ echo "export PATH=\$PATH:/usr/bhhou/SOFTWARE/bin" >>~/.bash_profile
$ source ~/.bash_profile
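An `export` written to the profile only affects shells started afterwards (or after sourcing the file), so it can help to confirm that the install directory is really on `PATH` before running the tools. A small sketch; `path_contains` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list in $2 (defaults to the current $PATH).
path_contains() {
    local dir=${1%/}              # ignore a trailing slash
    local list=${2-$PATH}
    # Wrap the list in colons so first and last entries match too
    case ":$list:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# path_contains /usr/bhhou/SOFTWARE/bin || echo "run: source ~/.bash_profile"
```

Matching on `:dir:` rather than a plain substring avoids false positives such as `/opt/a` matching `/opt/a/bin`.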

Filtering non-coding RNAs and multi-mapping reads, and size-selecting sRNA-Seq reads with Bowtie and a shell script

Software requirements:

1. Bowtie
2. Fastx-toolkit
3. Linux only
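Before running the commands below, a quick check that the required programs are on `PATH` can save a failed run partway through the loop. A minimal sketch; `check_tools` is a hypothetical helper, and the tool names in the example are the ones invoked later in this post.

```shell
#!/usr/bin/env bash
# Report which of the given commands cannot be found on PATH.
# Returns 0 if all are present, 1 otherwise.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Tools used by the commands in this post:
# check_tools bowtie bowtie-build fastx_collapser fasta_formatter
```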

Building Bowtie indexes of the reference sequences

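# Directory holding the reference FASTA files
index_path=/usr1/bhhou/Reference
cd "$index_path"
genome_fasta=Genome.fasta
cDNA_fasta=cDNA.fasta
tRNA_fasta=tRNA.fasta
rRNA_fasta=rRNA.fasta
snRNA_fasta=snRNA.fasta
snoRNA_fasta=snoRNA.fasta
ncRNA_fasta=ncRNA.fasta
# Merge the tRNA, rRNA, snRNA and snoRNA sequences into one ncRNA reference
cat "$tRNA_fasta" "$rRNA_fasta" "$snRNA_fasta" "$snoRNA_fasta" > "$ncRNA_fasta"
# Build one index per reference; ${x%%.fasta} strips the suffix, so the
# index basenames are Genome, cDNA and ncRNA
bowtie-build "$genome_fasta" "${genome_fasta%%.fasta}"
bowtie-build "$cDNA_fasta" "${cDNA_fasta%%.fasta}"
bowtie-build "$ncRNA_fasta" "${ncRNA_fasta%%.fasta}"

# Index basenames used by the mapping step below
genome_index=$index_path/Genome
cDNA_index=$index_path/cDNA
ncRNA_index=$index_path/ncRNA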
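`bowtie-build` writes several files sharing the basename given as its second argument. For a normal-sized reference, Bowtie 1 writes six `.ebwt` files per index, so a hedged way to confirm a build finished before mapping is to check they all exist. The helper name is hypothetical, and very large genomes are built as a "large" index with `.ebwtl` extensions instead, which this sketch does not cover.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if all six Bowtie 1 index files
# for basename $1 exist and are non-empty (small-index .ebwt layout only).
bowtie_index_complete() {
    local base=$1 suffix
    for suffix in 1.ebwt 2.ebwt 3.ebwt 4.ebwt rev.1.ebwt rev.2.ebwt; do
        if [ ! -s "$base.$suffix" ]; then
            echo "missing or empty: $base.$suffix" >&2
            return 1
        fi
    done
    return 0
}

# Example, after the bowtie-build commands above:
# for idx in "$genome_index" "$cDNA_index" "$ncRNA_index"; do
#     bowtie_index_complete "$idx" || exit 1
# done
```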

Mapping

1. Filter out tRNA, rRNA, snRNA and snoRNA reads.
2. Discard reads that map to more than 20 genomic loci.
3. Keep reads that map to the genome, plus reads that do not map to the genome but map to the cDNA.
4. Keep only reads 18 to 26 nt long.

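# Run in the same shell as the indexing step, so $ncRNA_index, $genome_index
# and $cDNA_index are still set.
cd /usr1/bhhou/sRNA
output=clean_collapsed
mkdir -p "$output"
for read in /usr1/bhhou/sRNA/collasped/*_trimmed.fasta
do
    # Sample ID: file name without the directory and the _trimmed* suffix
    read_id=${read%%_trimmed*}
    read_id=${read_id##*/}
    echo "$read_id"
    # Merge identical reads; headers become >rank-count
    fastx_collapser -i "$read" -o "${read_id}_collapsed.fasta"
    # 1. Keep reads that do NOT match the ncRNA reference (0 mismatches)
    bowtie -v 0 -f "$ncRNA_index" "${read_id}_collapsed.fasta" /dev/null --un "${read_id}_ncRNA_clean.fasta"
    # Split the remaining reads into genome-mapped and genome-unmapped
    bowtie -v 0 -f "$genome_index" "${read_id}_ncRNA_clean.fasta" /dev/null --al "${read_id}_genome_map.fasta" --un "${read_id}_genome_unmap.fasta"
    # 2. Keep genome-mapped reads with at most 20 alignments (-m 20)
    bowtie -v 0 -f -m 20 "$genome_index" "${read_id}_genome_map.fasta" /dev/null --al "${read_id}_genome_clean.fasta"
    # 3. Keep genome-unmapped reads that match the cDNA
    bowtie -v 0 -f "$cDNA_index" "${read_id}_genome_unmap.fasta" /dev/null --al "${read_id}_cDNA_clean.fasta"
    cat "${read_id}_genome_clean.fasta" "${read_id}_cDNA_clean.fasta" > "${output}/${read_id}_collapsed_clean.fasta"
    # 4. Keep sequences of 18-26 nt (fasta_formatter -t prints "name<TAB>sequence")
    fasta_formatter -i "${output}/${read_id}_collapsed_clean.fasta" -t | awk -F "\t" '{ if (length($2) >= 18 && length($2) <= 26) { print ">"$1"\n"$2 } }' > "${output}/${read_id}_collapsed_clean_size_selected.fasta"
    # Remove the intermediate files in the working directory
    rm "${read_id}"_*.fasta
done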
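fastx_collapser names each unique sequence `>rank-count` (for example `>1-2001` for the most abundant sequence, seen 2001 times), so the number of raw reads left after each filter can be recovered from the headers. A minimal sketch, assuming that header format and that no later step rewrites the names (the `fasta_formatter -t` and awk step above only reprint the name in `$1`). The function name is hypothetical; to count intermediate files, call it inside the loop before the `rm` line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each FASTA file whose headers follow
# fastx_collapser's ">rank-count" form, print the file name, the number
# of unique sequences and the total read count, tab-separated.
count_collapsed() {
    local f
    for f in "$@"; do
        awk -v name="$f" '
            /^>/ {
                n = split(substr($1, 2), part, "-")   # "1-2001" -> "1", "2001"
                uniq++
                total += part[n]
            }
            END { printf "%s\t%d\t%d\n", name, uniq, total }
        ' "$f"
    done
}

# Example (inside the loop, before the rm line):
# count_collapsed "${read_id}_collapsed.fasta" "${read_id}_ncRNA_clean.fasta" \
#     "${output}/${read_id}_collapsed_clean_size_selected.fasta"
```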

DESeq2 usage