A simple Hi-C processing workflow

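1. What is Hi-C data?

Hi-C is a method for studying the three-dimensional structure of chromatin. It derives from Chromosome Conformation Capture (3C) and combines high-throughput sequencing with bioinformatic analysis to study the spatial relationships of chromatin DNA across the whole genome, yielding high-resolution information on chromatin 3D structure.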

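2. Advantages of Hi-C data

The interaction frequency between scaffolds can be used to correct errors in an assembled genome sequence.
Sequences are no longer just contig fragments; they are assigned to chromosomes, giving a chromosome-level assembly.
There is no need to build a mapping population; a single individual is enough for chromosome anchoring.
Compared with genetic maps, marker density is higher and sequence anchoring is more complete.
Structural variation such as chromosomal rearrangements can be studied.
QTL and GWAS intervals can be placed on a specific chromosome.
The 3D genome structure, chromosomal interactions, and their dynamic changes can be resolved for the species.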

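3. Current processing workflow

4. Main analysis tools

The main tools currently used for processing Hi-C data are HiC-Pro and Juicer.

#####Hi-C contact maps, TAD structure, loop structure, 3D modeling

####HiC-Pro installation####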

# -O sets the local file name so that it matches the tar command below
wget -c https://github.com/nservant/HiC-Pro/archive/refs/tags/v3.1.0.tar.gz -O HiC-Pro-3.1.0.tar.gz

tar -zxvf HiC-Pro-3.1.0.tar.gz

conda env create -f /data5/tan/zengchuanj/Software/HiC-Pro-3.1.0/environment.yml -p /data5/tan/zengchuanj/conda/conda/envs/HiC-Pro

conda activate HiC-Pro

#config-install.txt:

PREFIX = /data5/tan/zengchuanj/Software/HiC-Pro-3.1.0

BOWTIE2_PATH = /data5/tan/zengchuanj/conda/conda/envs/HiC-Pro/bin/bowtie2

SAMTOOLS_PATH = /data5/tan/zengchuanj/conda/conda/envs/HiC-Pro/bin/samtools

R_PATH = /data5/tan/zengchuanj/conda/conda/envs/HiC-Pro/bin/R

PYTHON_PATH = /data5/tan/zengchuanj/conda/conda/envs/HiC-Pro/bin/python

CLUSTER_SYS = TORQUE

make configure

make install

ref_dir = /data5/tan/zengchuanj/pipeline/Annotation/HIC/GRCm39.genome.fa.gz

gunzip GRCm39.genome.fa.gz

#build index

pwd:/data5/tan/zengchuanj/pipeline/Annotation/HIC

bowtie2-build GRCm39.genome.fa mouse

samtools faidx GRCm39.genome.fa

# file listing the size of each sequence in the genome

awk '{print $1 "\t" $2}' GRCm39.genome.fa.fai > mouse.genome.sizes
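
The same two columns can also be read in Python when the sizes are needed later in a script. This is only a minimal sketch; the function names `read_fai_sizes` and `write_sizes` are made up for illustration, and it assumes the usual tab-separated `.fai` layout with the sequence name in column 1 and its length in column 2.

```python
def read_fai_sizes(fai_path):
    """Return {sequence_name: length} from a samtools faidx index (.fai)."""
    sizes = {}
    with open(fai_path) as fai:
        for line in fai:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            sizes[fields[0]] = int(fields[1])
    return sizes


def write_sizes(sizes, out_path):
    """Write one 'name<TAB>length' line per sequence, like the awk command."""
    with open(out_path, "w") as out:
        for name, length in sizes.items():
            out.write(f"{name}\t{length}\n")


# Example (paths as used in this post):
# write_sizes(read_fai_sizes("GRCm39.genome.fa.fai"), "mouse.genome.sizes")
```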

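# create the restriction fragment file

bin=/data5/tan/zengchuanj/Software/HiC-Pro-3.1.0/bin/utils/digest_genome.py

#python $bin GRCm39.genome.fa -r mboi  -o  mouse_mobi.bed

# -r takes the restriction site with ^ marking the cut position; MboI cuts at ^GATC.
# GATCGATC is the ligation junction and belongs in LIGATION_SITE, not here.
python $bin GRCm39.genome.fa -r '^GATC'  -o  mouse_mobi.bed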
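
To make clear what the digestion step produces, here is a rough sketch of an in-silico digest in Python. It is not the code of `digest_genome.py`; the helper names `digest` and `to_bed` are hypothetical, and the six-column BED layout with `HIC_<chrom>_<n>` fragment names is an assumption about how the fragment file looks.

```python
import re


def digest(seq, site="GATC", cut_offset=0):
    """Cut seq at every occurrence of site.

    cut_offset is the position of ^ inside the site (0 for ^GATC, 1 for A^AGCTT).
    Returns (start, end) fragments, 0-based and end-exclusive.
    """
    seq = seq.upper()
    # A lookahead pattern also reports overlapping occurrences of the site.
    pattern = re.compile(f"(?={re.escape(site)})")
    cuts = sorted({m.start() + cut_offset for m in pattern.finditer(seq)})
    # Cuts at the very start or end would create empty fragments, so drop them.
    bounds = [0] + [c for c in cuts if 0 < c < len(seq)] + [len(seq)]
    return list(zip(bounds, bounds[1:]))


def to_bed(chrom, fragments):
    """Format fragments as BED lines (assumed layout: chrom, start, end, name, 0, +)."""
    return [
        f"{chrom}\t{start}\t{end}\tHIC_{chrom}_{i}\t0\t+"
        for i, (start, end) in enumerate(fragments, 1)
    ]
```

For MboI the defaults apply; for HindIII one would call `digest(seq, "AAGCTT", 1)`.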

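#config-hicpro.txt:

N_CPU: number of CPUs.

BOWTIE2_IDX_PATH: directory containing the Bowtie2 index.

REFERENCE_GENOME: prefix of the Bowtie2 index of the reference genome.

GENOME_SIZE: path to the chrom.sizes file.

GENOME_FRAGMENT: path to the BED file of restriction fragments.

LIGATION_SITE: the chimeric sequence formed when the digested ends are filled in and re-ligated. For HindIII it is AAGCTAGCTT; for MboI it is GATCGATC.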
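
The two junction sequences quoted above can be derived from the restriction site itself. The sketch below assumes an enzyme that leaves a 5' overhang (or blunt ends), which is then filled in and blunt-ligated; `ligation_junction` is an illustrative helper, not part of HiC-Pro.

```python
def ligation_junction(site_with_cut):
    """Derive the ligation junction from a site written with ^ at the cut position.

    After fill-in, the left end carries the site up to the end of the overhang
    and the right end starts at the cut position, so the junction is
    site[:len - cut] + site[cut:].
    """
    cut = site_with_cut.index("^")
    site = site_with_cut.replace("^", "")
    if cut > len(site) - cut:
        # 3' overhangs are not filled in the same way; out of scope for this sketch.
        raise ValueError("only 5' overhang or blunt cutters are handled")
    return site[: len(site) - cut] + site[cut:]


print(ligation_junction("A^AGCTT"))  # HindIII
print(ligation_junction("^GATC"))    # MboI
```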

## SYSTEM AND SCHEDULER - Start Editing Here !!

	N_CPU = 50  # number of CPU threads

	LOGFILE = hicpro.log  # log file name

	JOB_NAME = hicpro  # job name

	JOB_MEM = 100gb  # memory to request

	JOB_WALLTIME =

	JOB_QUEUE =

	JOB_MAIL = 

	PAIR1_EXT = _R1

	PAIR2_EXT = _R2

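	BOWTIE2_IDX_PATH = /data5/tan/lishix/jys/test/results/reads # directory containing the Bowtie2 index files (the index directory, not the reads directory)

	REFERENCE_GENOME = mouse # prefix of the Bowtie2 index, i.e. the name given to bowtie2-build above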

	BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder

	BOWTIE2_LOCAL_OPTIONS =  --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder

	GENOME_SIZE = /data5/tan/zengchuanj/pipeline/Annotation/HIC/mouse.genome.sizes # absolute path to the genome sizes file

	## Digestion Hi-C

	GENOME_FRAGMENT = /data5/tan/zengchuanj/pipeline/HIC/mouse_mobi.bed # absolute path

	LIGATION_SITE =  GATCGATC # ligation junction of the restriction enzyme; ask the sequencing provider which enzyme was used (MboI here)

	MIN_FRAG_SIZE = 100

	MAX_FRAG_SIZE = 100000

	MIN_INSERT_SIZE = 100

	MAX_INSERT_SIZE = 1000

	## Contact Maps

	BIN_SIZE = 20000 40000 150000 500000 1000000 # set the bin sizes according to your own needs

	MATRIX_FORMAT = upper
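
Before launching a long run it can help to check that the config file actually sets the parameters described earlier. The following is a small sketch under stated assumptions: the parser treats the file as simple `KEY = value` lines with `#` comments, and the `REQUIRED` list is taken from the parameters discussed in this post, not from HiC-Pro itself.

```python
# Keys discussed in this post; an assumption, not an official list.
REQUIRED = [
    "BOWTIE2_IDX_PATH",
    "REFERENCE_GENOME",
    "GENOME_SIZE",
    "GENOME_FRAGMENT",
    "LIGATION_SITE",
    "BIN_SIZE",
]


def read_hicpro_config(path):
    """Parse 'KEY = value' lines, ignoring '#' comments and blank lines."""
    cfg = {}
    with open(path) as fh:
        for line in fh:
            line = line.split("#", 1)[0].strip()  # drop full-line and inline comments
            if "=" not in line:
                continue
            key, value = line.split("=", 1)
            cfg[key.strip()] = value.strip()
    return cfg


def missing_keys(cfg, required=REQUIRED):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not cfg.get(key)]


# Example:
# print(missing_keys(read_hicpro_config("config-hicpro.txt")))
```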

/data5/tan/zengchuanj/Software/HiC-Pro-3.1.0/bin/HiC-Pro -c /data5/tan/zengchuanj/pipeline/HIC/HiC-Pro/config-hicpro.txt -i /data5/tan/zengchuanj/pipeline/HIC/HiC-Pro/fastq -o /data5/tan/zengchuanj/pipeline/HIC/HiC-Pro/results

# directory layout (one sub-directory per sample under the input directory):

	fastq/sample:

		sample_R1.fastq.gz

		sample_R2.fastq.gz
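
A quick way to confirm that the input directory follows this layout is to list the read pairs per sample. This is a hypothetical helper (`find_pairs` is not part of HiC-Pro); it assumes gzip-compressed FASTQ files and the `_R1` / `_R2` extensions set by PAIR1_EXT and PAIR2_EXT.

```python
from pathlib import Path


def find_pairs(fastq_root, ext1="_R1", ext2="_R2"):
    """Return {sample: [(r1_name, r2_name), ...]} for every sample sub-directory.

    Raises FileNotFoundError if an R1 file has no matching R2 file.
    """
    pairs = {}
    for sample_dir in sorted(p for p in Path(fastq_root).iterdir() if p.is_dir()):
        for r1 in sorted(sample_dir.glob(f"*{ext1}*.fastq.gz")):
            # Build the expected mate name by swapping the pair extension once.
            r2 = r1.with_name(r1.name.replace(ext1, ext2, 1))
            if not r2.exists():
                raise FileNotFoundError(f"missing mate for {r1}")
            pairs.setdefault(sample_dir.name, []).append((r1.name, r2.name))
    return pairs


# Example:
# print(find_pairs("/data5/tan/zengchuanj/pipeline/HIC/HiC-Pro/fastq"))
```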

#####juicer installation####

conda create -n juicer -c bioconda bwa -y

conda activate juicer

mkdir work && mkdir references && mkdir restriction_sites

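Juicer/juicer/references # folder for the reference genome files

Juicer/juicer/work # folder for the sample sequence files and the analysis results

Juicer/juicer/restriction_sites # folder for the restriction site maps of the reference genome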

# -O sets the local file name so that it matches the tar command below
wget https://github.com/aidenlab/juicer/archive/refs/tags/1.6.tar.gz -O juicer-1.6.tar.gz

tar -xzvf juicer-1.6.tar.gz

ln -s juicer/CPU scripts

# scripts should be inside the juicer directory

cd juicer/scripts/common

wget -c https://hicfiles.tc4ga.com/public/juicer/juicer_tools.1.9.9_jcuda.0.8.jar

ln -s juicer_tools.1.9.9_jcuda.0.8.jar  juicer_tools.jar

# build the genome index

pwd:/data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/references

bwa index GRCm39.genome.fa

# generate the restriction site file (the enzyme name is written MboI, matching -s MboI below)

python /data5/tan/zengchuanj/Software/juicer/misc/generate_site_positions.py MboI genome /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/references/GRCm39.genome.fa

# generate the chromosome sizes file

# genome_MboI.txt is produced by the previous step

awk 'BEGIN{OFS="\t"}{print $1, $NF}'  genome_MboI.txt > genome.chrom.sizes
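
The awk command relies on the layout of the restriction site file: one line per sequence, with the sequence name first, then the site positions, and the sequence length as the last field. A minimal Python sketch under that same assumption (the function name `site_file_summary` is made up) also reports how many sites each sequence has, which is a handy sanity check:

```python
def site_file_summary(path):
    """Return {name: (length, n_sites)} from a Juicer restriction site file.

    Assumes whitespace-separated fields: name, site positions, then the
    sequence length as the final field (the same assumption the awk command makes).
    """
    summary = {}
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            summary[fields[0]] = (int(fields[-1]), len(fields) - 2)
    return summary


# Example:
# for name, (length, n_sites) in site_file_summary("genome_MboI.txt").items():
#     print(name, length, n_sites, sep="\t")
```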

cd ./references

python /data5/tan/zengchuanj/pipeline/HIC/Juicer/misc/generate_site_positions.py MboI mm9 mm9.fasta

# the three arguments are the enzyme name, the reference genome name, and the path to the reference genome sequence file

nohup bash scripts/juicer.sh -d /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/test -D /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer -y /data5/tan/lishix/HIC/opt/juicer/restriction_sites/mm39_MboI.txt  -z /data5/tan/lishix/HIC/opt/juicer/references/Mus_musculus.GRCm39.dna.toplevel.fa -p restriction_sites/genome.chrom.sizes -s MboI -t 10 2> test.txt &

Usage:

	# nohup keeps the job running after the terminal closes; the trailing & runs it in the background

	nohup bash /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/scripts/juicer.sh \
	-z /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/references/GRCm39.genome.fa \
	-p /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/restriction_sites/genome.chrom.sizes \
	-y /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/restriction_sites/GRCm39.genome_MboI.txt \
	-s MboI \
	-d /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/work/ \
	-D /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer \
	-t 40 > log.txt &

	# -z: path to the reference genome FASTA; the matching bwa index must be in the same location

	# -p: chromosome sizes file

	# -y: path to the restriction site file of the genome

	# -d: directory holding the raw sample files

	# -D: installation directory of the software

	# -t: number of threads for the bwa alignment; by default all threads are used

# Plotting the Hi-C contact map

data_dir = /data5/tan/lishix/jys/test/results/

species = mouse

enzyme: MboI

# Visualize the HiC-Pro results with HiCPlotter.py

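python2.7 HiCPlotter.py -o genome \
    -f genome_500000_iced.matrix \
    -r 500000 -tri 1 \
    -bed genome_500000_abs.bed \
    -n genome \
    -wg 1 -chr chromosome7

	-o  output file name

	-f  the matrix file produced by HiC-Pro (_500000_iced.matrix)

	-r  resolution of the matrix

	-bed  the BED file produced by HiC-Pro (_500000_abs.bed)

	-n  title shown at the top of the output figure

	-chr  name of the last chromosome; it can be checked with "tail -n 1 *.bed"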
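
For readers who want to inspect the two HiC-Pro files directly, the sketch below loads them and computes the fraction of intra-chromosomal (cis) signal. It assumes the `_abs.bed` file has four columns (chromosome, start, end, bin id) and the `.matrix` file has three (bin id, bin id, count); the function names are illustrative only.

```python
def load_bins(bed_path):
    """Return {bin_id: (chrom, start, end)} from a HiC-Pro *_abs.bed file."""
    bins = {}
    with open(bed_path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 4:
                continue
            chrom, start, end, bin_id = fields[:4]
            bins[int(bin_id)] = (chrom, int(start), int(end))
    return bins


def cis_fraction(bins, matrix_path):
    """Share of the total signal whose two bins lie on the same chromosome."""
    cis = total = 0.0
    with open(matrix_path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 3:
                continue
            i, j, count = int(fields[0]), int(fields[1]), float(fields[2])
            total += count
            if bins[i][0] == bins[j][0]:
                cis += count
    return cis / total if total else 0.0


# Example:
# bins = load_bins("genome_500000_abs.bed")
# print(cis_fraction(bins, "genome_500000_iced.matrix"))
```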

# Calling TADs with Juicer (Arrowhead)

ref:https://github.com/aidenlab/juicer/wiki/Arrowhead

/data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/scripts/common/juicer_tools  arrowhead  --ignore_sparsity  /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/work/aligned/inter.hic   ./contact_domains_list/

## Calling loops with Juicer (HiCCUPS)

nohup java -jar /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/scripts/common/juicer_tools.jar hiccups --cpu --threads 19 -r 5000,10000 --ignore_sparsity  /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/work/aligned/inter.hic inter.hic.hiccups > loop.txt &

nohup java -jar /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/scripts/common/juicer_tools.jar hiccups --gpu --threads 19 -r 2500,5000,7500,10000,12500,15000,17500,20000,22500 --ignore_sparsity  /data5/tan/zengchuanj/pipeline/HIC/Juicer/juicer/work/aligned/inter.hic inter.hic.hiccups > loop.txt &
