Use a regular expression for filtering sequences by id from a FASTA file, e.g. just certain chromosomes from a genome. There are other tools as part of bigger packages to install (and no regex support), mostly awk-based awkward (sorry for the pun) bash solutions, and scripts using packages that one needs to install and with still no support for regular expressions. This however is a simple, straightforward little python script for a simple task. It doesn’t do anything else and doesn’t need anything but a stock python installation. Based on the FASTA reader snippet.

Download here.

Usage:

python FASTAfilter.py [-h] regex infile outfile

From a FASTA-file with multiple >entries, filter by sequence ids using a
regex.

positional arguments:
regex Regex to filter entry ids, e.g. ‘chr[1-4]’. Note that the id does not contain the initial > character.
infile A FASTA input file, usually with multiple entries.
outfile The new file with only the matching entries.

optional arguments:
-h, –help show this help message and exit

INSTALL:

cd /data/software
wget http://dm516.user.srcf.net/fastafilter/FASTAfilter.zip
unzip FASTAfilter.zip
easy_install argparse

USAGE:

python FASTAfilter.py   [1-9,10,11,12,13,14,15,16,17,18,X]  \
/dat2/INPUT.fa \
/dat2/OUTPUT.fa

Error:

Traceback (most recent call last):
  File "FASTAfilter.py", line 3, in <module>
    import argparse
ImportError: No module named argparse

Solution:

run "easy_install argparse" as root user.

http://dm516.user.srcf.net/?p=314

Filter FASTA files的更多相关文章

  1. Extract Fasta Sequences Sub Sets by position

    cut -d " " -f 1 sequences.fa | tr -s "\n" "\t"| sed -s 's/>/\n/g' & ...

  2. elfinder中通过DirectoryStream.Filter实现筛选隐藏目录(二)

    今天还是没事看了看elfinder源码,发现之前说的两个版本实现都是基于不同的jdkelfinder源码浏览-Volume文件系统操作类(1), 带前端页面的是基于1.6中File实现,另一个是基于1 ...

  3. OpenFileDialog.Filter 属性

    如果 Filter 属性为 Empty,将显示所有文件. 始终显示文件夹. Filter 由以下部分组成:筛选器说明,后跟竖线 (|) 和筛选模式. 筛选器可以指定一个或多个文件类型. 说明描述了对话 ...

  4. python 高阶函数之filter

    前文说到python高阶函数之map,相信大家对python中的高阶函数有所了解,此次继续分享python中的另一个高阶函数filter. 先看一下filter() 函数签名 >>> ...

  5. Falcon Genome Assembly Tool Kit Manual

    Falcon Falcon: a set of tools for fast aligning long reads for consensus and assembly The Falcon too ...

  6. Linux command line exercises for NGS data processing

    by Umer Zeeshan Ijaz The purpose of this tutorial is to introduce students to the frequently used to ...

  7. 构建NCBI本地BLAST数据库 (NR NT等) | blastx/diamond使用方法 | blast构建索引 | makeblastdb

    参考链接: FTP README 如何下载 NCBI NR NT数据库? 下载blast:ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+ 先了解 ...

  8. STAR manual

    来源:STARmanual.pdf 来源:Calling variants in RNAseq PART0 准备工作 #STAR 安装前的依赖的工具 #Red Hat, CentOS, Fedora. ...

  9. &lt;二代測序&gt; 下载 NCBI sra 文件

    本文近期更新地址: http://blog.csdn.net/tanzuozhev/article/details/51077222 随着測序技术的不断提高.二代測序数据成指数增长. NCBI提供了S ...

随机推荐

  1. Hadoop格式化HDFS报错java.net.UnknownHostException: centos64

    异常描述 在对HDFS格式化,执行hadoop namenode -format命令时,出现未知的主机名的问题,异常信息如下所示: [shirdrn@localhost bin]$ hadoop na ...

  2. php 正则表达式三.模式修正

    1.贪婪模式和懒惰模式, 贪婪模式:php中正则默认是贪婪模式,匹配尽可能多 的字符,比如 $pattern='/a+b/'; $subject='aaaaaaaaab,那么可能会preg_match ...

  3. 微信商城 Common Log Format Apache CustomLog

    w 0- /Apr/::: +] "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, ...

  4. python array

    python中通常情况下for循环会枚举各个元素不会访问下标,例如: l = [1,2,4,6] for val in l: print l 但是有时候我们会需要在便利数组的同时访问下标,这时候可以借 ...

  5. Java基础 - 面向对象 - 类的定义

    package mingri.chapter_6; import java.util.Scanner; public class Person { /* * 类变量 * 定义方法: * 数据类型 变量 ...

  6. cookies与session

    一.cookies 本质:浏览器端保存的键值对 方便客户按照自己的习惯操作页面或软件,例如:用户验证,登陆界面,右侧菜单隐藏,控制页面列表显示条数... cookies是由服务端写在浏览器端,以后每次 ...

  7. Leetcode 之 Combination Sum系列

    39. Combination Sum 1.Problem Find all possible combinations of k numbers that add up to a number n, ...

  8. Hadoop的IO操作

    Hadoop的API官网:http://hadoop.apache.org/common/docs/current/api/index.html   相关的包 org.apache.hadoop.io ...

  9. 【教程】Microsoft Visual Studio 2015 安装Android SDK

    http://blog.sina.com.cn/s/blog_51f9ffaa0102vuhy.html Hi,大家好,自vs2015发布后,有很多小伙伴尝试使用VS2015开发安卓,由于是新手,一折 ...

  10. s5_day4作业

    # #流程控制练习题: # #==========>基础部分 # #练习一: # if True or False and False: # print('yes') # else: # pri ...