Wednesday, July 5, 2017

Filtering non-coding RNA and multi-mapping reads, plus size selection, using bowtie and a shell script for sRNA-Seq

Software requirements:

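1. Bowtie (version 1; the commands below use bowtie and bowtie-build)
2. FASTX-Toolkit (provides fastx_collapser and fasta_formatter)
3. Linux (the script is Linux only)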
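
Before running anything, it can help to confirm that the required executables are on the PATH. The following is a minimal sketch, not part of the original script; the function name missing_tools is hypothetical, and the tool list is simply the set of commands called in this post.

```shell
# Print the name of every command given as an argument that is not on PATH.
# missing_tools is a hypothetical helper name, not part of any toolkit.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executables called by the commands in this post.
missing=$(missing_tools bowtie bowtie-build fastx_collapser fasta_formatter awk)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing >&2
else
    echo "All required tools found"
fi
```

If anything is reported as missing, install it or adjust PATH before going on to the index-building step.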

Building bowtie indexes of the reference sequences for mapping

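# Directory holding the reference FASTA files; the indexes are written here too
index_path=/usr1/bhhou/Reference
cd "$index_path"
genome_fasta=Genome.fasta
cDNA_fasta=cDNA.fasta
tRNA_fasta=tRNA.fasta
rRNA_fasta=rRNA.fasta
snRNA_fasta=snRNA.fasta
snoRNA_fasta=snoRNA.fasta
ncRNA_fasta=ncRNA.fasta
# Merge the four non-coding RNA classes into a single reference for filtering
cat "$tRNA_fasta" "$rRNA_fasta" "$snRNA_fasta" "$snoRNA_fasta" > "$ncRNA_fasta"
# Build one index per reference; the index prefix is the file name without .fasta
bowtie-build "$genome_fasta" "${genome_fasta%%.fasta}"
bowtie-build "$cDNA_fasta" "${cDNA_fasta%%.fasta}"
bowtie-build "$ncRNA_fasta" "${ncRNA_fasta%%.fasta}"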

genome_index=$index_path/Genome
cDNA_index=$index_path/cDNA
ncRNA_index=$index_path/ncRNA
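
Since every mapping step depends on these three prefixes, a quick check that the index files exist can save a failed run. This is a rough sketch under two assumptions: the indexes were built by bowtie version 1 without --large-index, so each prefix has six files ending in .ebwt, and the helper name bowtie_index_exists is made up for this example.

```shell
# Return 0 if all six files of a bowtie (version 1) small index exist and are non-empty.
# Assumption: indexes were built without --large-index, so the files end in .ebwt.
# bowtie_index_exists is a hypothetical helper name.
bowtie_index_exists() {
    for suffix in 1 2 3 4 rev.1 rev.2; do
        [ -s "$1.$suffix.ebwt" ] || return 1
    done
    return 0
}

# Check the three prefixes used in this post; index_path keeps its value if already set.
index_path=${index_path:-/usr1/bhhou/Reference}
for name in Genome cDNA ncRNA; do
    if bowtie_index_exists "$index_path/$name"; then
        echo "$name: index found"
    else
        echo "$name: index missing, run bowtie-build first"
    fi
done
```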

Mapping

1. Filter out reads that match tRNA, rRNA, snRNA, or snoRNA sequences.
2. Discard reads that map to more than 20 genomic loci.
3. Keep the remaining reads that map to the genome or to cDNA.
4. Keep reads that are 18 to 26 nt long.
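cd /usr1/bhhou/sRNA
output=clean_collasped
mkdir -p "$output"
for read in /usr1/bhhou/sRNA/collasped/*_trimmed.fasta
do
# Sample name: strip the _trimmed suffix, then the directory part
read_id=${read%%_trimmed*}
read_id=${read_id##*/}
echo "$read_id"
# Collapse identical reads into unique sequences
fastx_collapser -i "$read" -o "${read_id}_collapsed.fasta"
# Step 1: keep reads with no perfect match (-v 0) to the ncRNA reference
bowtie -v 0 -f "$ncRNA_index" "${read_id}_collapsed.fasta" /dev/null --un "${read_id}_ncRNA_clean.fasta"
# Split the remaining reads into genome-mapped and genome-unmapped
bowtie -v 0 -f "$genome_index" "${read_id}_ncRNA_clean.fasta" /dev/null --al "${read_id}_genome_map.fasta" --un "${read_id}_genome_unmap.fasta"
# Step 2: of the genome-mapped reads, keep those with at most 20 alignments (-m 20)
bowtie -v 0 -f -m 20 "$genome_index" "${read_id}_genome_map.fasta" /dev/null --al "${read_id}_genome_clean.fasta"
# Step 3: rescue genome-unmapped reads that map to cDNA
bowtie -v 0 -f "$cDNA_index" "${read_id}_genome_unmap.fasta" /dev/null --al "${read_id}_cDNA_clean.fasta"
cat "${read_id}_genome_clean.fasta" "${read_id}_cDNA_clean.fasta" > "${output}/${read_id}_collapsed_clean.fasta"
# Step 4: keep sequences of 18 to 26 nt (fasta_formatter -t prints header<TAB>sequence)
fasta_formatter -t -i "${output}/${read_id}_collapsed_clean.fasta" | awk -F "\t" '{ if(length($2) >=18 && length($2)<=26){ print ">"$1"\n"$2} }' > "${output}/${read_id}_collapsed_clean_size_selected.fasta"
# Remove the intermediate files from the working directory
rm "${read_id}"_*.fasta
done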
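
The size-selection step can also be tried on its own, without fasta_formatter, which makes it easy to test on a toy file. The sketch below is an alternative written for illustration, not the method used in the loop above. The function names size_select and total_reads are hypothetical, and total_reads assumes that the headers follow the rank-count format written by fastx_collapser (for example >1-5 for the most abundant sequence seen 5 times).

```shell
# Keep FASTA records whose sequence length is between min ($1) and max ($2), inclusive.
# Sequences wrapped over several lines are joined before the length is measured.
# size_select is a hypothetical helper name; it reads stdin and writes stdout.
size_select() {
    awk -v min="$1" -v max="$2" '
        function emit() {
            if (header != "" && length(seq) >= min && length(seq) <= max)
                print header "\n" seq
        }
        /^>/ { emit(); header = $0; seq = ""; next }
             { seq = seq $0 }
        END  { emit() }
    '
}

# Sum the read counts stored after the last "-" in each header.
# Assumption: headers look like ">rank-count", as written by fastx_collapser.
total_reads() {
    awk -F '-' '/^>/ { total += $NF } END { print total + 0 }'
}

# Toy input: a 20 nt sequence (wrapped over two lines), a 4 nt one, and a 32 nt one.
toy='>1-5
ACGTACGTAC
GTACGTACGT
>2-3
ACGT
>3-1
ACGTACGTACGTACGTACGTACGTACGTACGT'

# Only the 20 nt record passes the 18 to 26 nt window.
printf '%s\n' "$toy" | size_select 18 26
# Number of original reads represented by the surviving sequences.
printf '%s\n' "$toy" | size_select 18 26 | total_reads
```

On a real sample the same functions could be applied as size_select 18 26 < input.fasta > output.fasta, where both file names are placeholders.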
