NGS memo: 8月 2018

2018年8月7日星期二

One-line get sorted Bam file from bowtie

One line get sorted Bam file

$ read=collapsed_read.fasta       # collapsed fasta that fastx_collapser creates
$ genome_index=Genome             # basename of bowtie's .ebwt file
$ cat $read | fastx_uncollapser | bowtie -S -f $genome_index - | samtools view -u -h -F 4 - | samtools sort - -o aln.sorted.bam

"-" It mean <STDIN> that is <STDOUT> of previous command/ program.

fasta_uncollapser [-i INFILE] [-o OUTFILE]

-i [INFILE]      FASTA input file. default is STDIN.

-o [OUTFILE]     FASTA output file. default is STDOUT.

bowtie [options] <ebwt> <s> <hit>

<s>              The query input file.

                 If - is specified, Bowtie gets the reads from the "standard in" filehandle.

-f               The query input file (<s>) is FASTA files

-S/--sam         Print alignments in SAM format.

samtools view [option]

-h               Include the header in the output.

-u               Output uncompressed BAM.

                 This option saves time spent on compression/decompression

                 and is thus preferred when the output is piped to another

                 samtools command.

-F [INT]         Do not output alignments with any bits set in INT present

                 in the FLAG field. # 4 is unmapped reads

samtools sort [-o out.sam|out.bam|out.cram] [in.sam|in.bam|in.cram]

-m [INT]         Approximately the maximum required memory per thread,

                 specified either in bytes or with a K, M, or G suffix.

-@ [INT]         Set number of sorting and compression threads.

                 By default, operation is single-threaded

Finally, Create index file of Bam that some software requires

$ samtools index aln.sorted.bam aln.sorted.bam.bai

Get sequences of rRNA, tRNA, snRNA and snoRNA from Rfam database

Create a folder for storing Rfam and downloading index file, family.txt.

# Create and enter Rfam folder
$ cd $HOME
$ mkdir Rfam_ncRNA
$ cd Rfam_ncRNA
# Download Annotation of Rfam fasta file for selecting ncRNA
$ wget ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/database_files/family.txt.gz
$ gzip -d family.txt.gz

Create Rfam download link list

$ curl -l ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/fasta_files/ | awk '{print "ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/fasta_files/"$0}' >Rfam_link.txt
# curl -l will return a list of filenames on FTP

Grep ncRNA download link list from Rfam download link list.

$ cat family.txt | awk -F "\t" '{ if($19~/snRNA/ && $19!~/snoRNA/){print $1} }' | grep -f - Rfam_link.txt >Rfam_snRNA_link.txt
$ cat family.txt | awk -F "\t" '{ if($19~/snoRNA/){print $1} }' | grep -f - Rfam_link.txt >Rfam_snoRNA_link.txt
$ cat family.txt | awk -F "\t" '{ if($19~/tRNA/){print $1} }' | grep -f - Rfam_link.txt >Rfam_tRNA_link.txt
$ cat family.txt | awk -F "\t" '{ if($19~/rRNA/){print $1} }' | grep -f - Rfam_link.txt >Rfam_rRNA_link.txt

# $ Filtered condition: awk -F "\t" '{print $1,$19}' family.txt | head
# ...
# RF00003  Gene; snRNA; splicing;   #<--- snRNA match "snRNA" and don't match "snoRNA"
# ...
# RF00012  Gene; snRNA; snoRNA; CD-box;  #<--- snoRNA match "snRNA" and "snoRNA"

Download multiple files by list file

$ mkdir snRNA
$ mkdir snoRNA
$ mkdir tRNA
$ mkdir rRNA
$ wget -P snRNA -i Rfam_snRNA_link.txt 2>snRNA.log &
$ wget -P snoRNA -i Rfam_snoRNA_link.txt 2>snoRNA.log &
$ wget -P tRNA -i Rfam_tRNA_link.txt 2>tRNA.log &
$ wget -P rRNA -i Rfam_rRNA_link.txt 2>rRNA.log &

# -P [dir]       File was stored at this directory
# -i [file]      File is a list of download links.
# 2>file.log     STDERR was written in the log file.
# &              add "&" to tail of command, the command will run in the background

Wait for download finish!

All "wget" will be disappeared below "COMMAND".

$ top -u $USER 
   # Rfam_snoRNA have many files, you will wait for about 1~2hr! 
   # show numbers of link pre file
   # wc -l Rfam_*_link.txt 
   #   15 Rfam_rRNA_link.txt
   #  753 Rfam_snoRNA_link.txt
   #   18 Rfam_snRNA_link.txt
   #    2 Rfam_tRNA_link.txt
   #  788 total

Merge multiple fasta.gz files

$ zcat snRNA/*.gz >Rfam_snRNA.fasta
$ zcat snoRNA/*.gz >Rfam_snoRNA.fasta
$ zcat rRNA/*.gz >Rfam_rRNA.fasta
$ zcat tRNA/*.gz >Rfam_tRNA.fasta

NGS memo

2018年8月7日星期二

One-line get sorted Bam file from bowtie

One line get sorted Bam file

fasta_uncollapser [-i INFILE] [-o OUTFILE]

bowtie [options] <ebwt> <s> <hit>

samtools view [option]

samtools sort [-o out.sam|out.bam|out.cram] [in.sam|in.bam|in.cram]

Finally, Create index file of Bam that some software requires

Get sequences of rRNA, tRNA, snRNA and snoRNA from Rfam database

Create a folder for storing Rfam and downloading index file, family.txt.

Create Rfam download link list

Grep ncRNA download link list from Rfam download link list.

Download multiple files by list file

Wait for download finish!

Merge multiple fasta.gz files

DEseq2 usage

搜尋此網誌

2018年8月7日 星期二

One-line get sorted Bam file from bowtie

One line get sorted Bam file

fasta_uncollapser [-i INFILE] [-o OUTFILE]

bowtie [options] <ebwt> <s> <hit>

samtools view [option]

samtools sort [-o out.sam|out.bam|out.cram] [in.sam|in.bam|in.cram]

Finally, Create index file of Bam that some software requires

Get sequences of rRNA, tRNA, snRNA and snoRNA from Rfam database

Create a folder for storing Rfam and downloading index file, family.txt.

Create Rfam download link list

Grep ncRNA download link list from Rfam download link list.

Download multiple files by list file

Wait for download finish!

Merge multiple fasta.gz files

DEseq2 usage

2018年8月7日星期二