2023年7月3日发(作者:)
Bulk-RNA-seq流程——从测序数据到count⽂件(AGSdata)2022-2-11软件安装conda的安装和配置wget -c /miniconda/Miniconda3-latest-Linux-x86_d 777 Miniconda3-latest-Linux-x86_ #给执⾏权限bash Miniconda3-latest-Linux-x86_ #运⾏当命令⾏前⾯出现(base)的时候说明现在已经在conda的环境中了conda config --add channels biocondaconda config --add channels conda-forge 清华镜像,哈⼯⼤镜像conda config --add channels /anaconda/pkgs/free/conda config --add channels /anaconda/pkgs/main/conda config --add channels /anaconda/cloud/conda-forge/conda config --add channels /anaconda/cloud/bioconda/conda config --add channels /anaconda/cloud/conda config --add channels /anaconda/pkgs/free/conda config --add channels /anaconda/pkgs/main/创建名为rna的环境变量:conda create -n rna python=2(许多软件依赖python2环境)环境退出:conda deactivate#创建名为rna的环境变量,许多软件依赖python2环境conda create -n rna python=2To activate this environment, use## $ conda activate rna## To deactivate an active environment, use## $ conda deactivateconda安装⽣信软件conda activate rna# 很多软件需要python版本较低conda install multiqc
conda install trim-galoreconda install hisat2conda install samtoolsconda install RSeQC测序数据处理数据下载axel -a "下载链接"质控QCfastqc⽣成质控报告,multiqc将各个样本的质控报告整合为⼀个。#! /bin/bash# Used for RNA-seq data by FastQC and multiQC#Usage# fastqc -o [output dir] --(no)extract -f [fastq|bam|sam] -c [contaminant file] seqfile1 .. seqfileN#简写代码:fastqc -t 8 -o
质控报告 Basic Statistics 从read⽔平来总览,判断测序质量。Encoding :测序平台的版本,因为不同版本的 error p的计算⽅法不⼀样。Total sequence:测序深度。⼀共测序的read数。是质量分析的主要参数。Sequence length:测序长度。%GC:GC碱基含量⽐,⼀般是物种特异性,⽐如⼈类是42%左右。
Perbase sequence quality 横坐标: 第1-100个测序得到的碱基纵坐标: 测序质量评估。这⾥的Q=-10*lg10(error P),即20%代表1%的错误读取率,30%代表0.1%的错误读取率箱型图: 红⾊线,是某个顺序下测序碱基所有测序质量的中位数。黄⾊块,是测序质量在25%-75%区域。蓝⾊线,平均数。⼀般要求: 测序箱型图10%的线⼤于Q=20。Q20过滤法。per tail sequence quality 横坐标:同上。纵坐标:tail的index编号。⽬的:防⽌测序过程中某些tail受不可控因素测序质量低。标准:蓝⾊表⽰质量⾼,浅⾊或暖⾊表⽰质量低,后续的分析可以去除低质量tail
trim_galore去除低质量readstrim_galore的参数trim_galore [options]
trim_galore -q 20 --phred33 --length 36 --stringency 3 --paired $fq1 $fq2 -o $dir
done
数据⽐较⼤,2.5G的⼤概要⼗来分钟
再次QC检验cleandata#! /bin/bashls /mnt/AGS_RNA-seq/clean_data/*gz | xargs fastqc -t 64multiqc ./
mapping
常⽤软件有star和HISAT2下载索引⽂件cd /mnt/AGS_RNA-seq/index_hisat2axel /hisat/grch38_ zxvf grch38_38_tran/grch38_tran/genome_2grch38_tran/genome_2grch38_tran/genome_2grch38_tran/genome_2grch38_tran/make_grch38_38_tran/genome_2grch38_tran/genome_2grch38_tran/genome_2grch38_tran/genome_2HISAT2⽤法HISAT2 version 2.1.0 by Daehwan Kim (infphilo@, /people/infphilo)Usage: hisat2 [options]* -x
HISAT2参数 -x
2022-2-15HISAT2对⽐#使⽤代码:hisat2 -p 6 -x
耗时:⼀个25G的sam⽂件整个流程⼤概15mins,我这⾥6个sam⼤概160G,总耗时约2hrBAM⽂件质控(可选)2022-2-16软件RSeQC RNA-Seq质控⼯具,多个python包,⽤于评估RNA-seq的数据:检查序列质量、核酸组分偏性、PCR偏性、GC含量偏性,还有RNA-seq特异性模块:评估测序饱和度、映射读数分布、覆盖均匀性、链特异性、转录⽔平RNA完整性等 参数#⽐对情况Usage: bam_ [options]Summarizing mapping statistics of a BAM or SAM s: --version show program's version number and exit -h, --help show this help message and exit -i INPUT_FILE, --input-file=INPUT_FILE Alignment file in BAM or SAM format. -q MAP_QUAL, --mapq=MAP_QUAL Minimum mapping quality (phred scaled) to determine "uniquely mapped" reads. default=30#基因组覆盖率Usage: read_ [options]Check reads distribution over exon, intron, UTR, intergenic ... etcThe following reads will be skipped: qc_failed PCR duplicate Unmapped Non-primary (or secondary)Options: --version show program's version number and exit -h, --help show this help message and exit -i INPUT_FILE, --input-file=INPUT_FILE Alignment file in BAM or SAM format. -r REF_GENE_MODEL, --refgene=REF_GENE_MODEL Reference gene model in bed format.#需要bed⽂件#下载bed⽂件/projects/rseqc/files/BED/Human_Homo_sapiens/hg38_/download#本地下载解压缩,将bed⽂件上传服务器#这个下载链接有时候没反应,科学上⽹后就能正常 ⽤法for i in {61,62,63,191,193,194}dobam_ -i /mnt/AGS_RNA-seq/bam_sort/${i}. > /mnt/AGS_RNA-seq/scripts_log/${i}_eread_ -i /mnt/AGS_RNA-seq/bam_sort/${i}. -r /mnt/AGS_RNA-seq/BED_file/hg38_ > /mnt/AGS_RNA-seq/scripts_log/${i}_distribCount:FeaturecountsRNA-seq的count有许多可选的软件,HTseq、Stringtie、featurecounts,看教程⾥HTseq和featurecounts的结果差异不⼤,但HTseq耗时太久,所以选择featurecounts进⾏下⼀步安装conda install subread⽂件准备输⼊⽂件需要⼀个注释⽂件和bam_sorted#下载注释⽂件cd /mnt/AGS_RNA-seq/GTFaxel /pub/current_gtf/homo_sapiens/Homo_ -d Homo_使⽤for i in {61,62,63,191,193,194}dodone# -T 使⽤的线程数# -p 如果是paird end 就⽤# -t 将exon作为⼀个feature# -g 将gene_id作为⼀个feature# -a 参考的gtf/gff# -o 输出⽂件# 最后加上bam⽂件,注意bam⽂件要为sorted#全部合在⼀个counts⽂件featureCounts -T 16 -p -t exon -g gene_id -a /mnt/AGS_RNA-seq/GTF/Homo_ -o /mnt/AGS_RNA-seq/featureCounts/${i}. /mnt/featureCounts -T 16 -p -t exon -g gene_id -a /mnt/AGS_RNA-seq/GTF/Homo_ -o /mnt/AGS_RNA-seq/featureCounts/ /mnt/A其他参数 输出结果 ========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ | __ | ____| / | __
===== | (___ | | | | |_) | |__) | |__ / | | | | ==== ___ | | | | _ <| _ /| __| / / | | | | ==== ____) | |__| | |_) | | | |____ / ____ | |__| | ========== |_____/ ____/|____/|_| _______/_/ ______/ v2.0.1//========================== featureCounts setting ===========================|| |||| Input files : 1 BAM file |||| o |||| |||| Output file : |||| Summary : y |||| Annotation : Homo_ (GTF) |||| Dir for temp files : /mnt/AGS_RNA-seq/featureCounts |||| |||| Threads : 16 |||| Level : meta-feature level |||| Paired-end : yes |||| Multimapping reads : not counted |||| Multi-overlapping reads : not counted |||| Min overlapping bases : 1 |||| |||| Chimeric reads : counted |||| Both ends mapped : not required |||| ||============================================================================////================================= Running ==================================|| |||| Load annotation file Homo_ ... |||| Features : 1552026 |||| Meta-features : 61541 |||| Chromosomes/contigs : 47 |||| |||| Process BAM file ... |||| Paired-end reads are included. |||| Total alignments : 30720085 |||| Successfully assigned alignments : 22645654 (73.7%) |||| Running time : 0.52 minutes |||| |||| Write the final count table. |||| Write the read assignment summary. |||| |||| Summary of counting results can be found in file "/mnt/AGS_RNA-seq/featur |||| eCounts/y" |||| ||============================================================================////================================= Running ==================================|| |||| Load annotation file Homo_ ... |||| Features : 1552026 |||| Meta-features : 61541 |||| Chromosomes/contigs : 47 |||| |||| Process BAM file ... |||| Paired-end reads are included. |||| Total alignments : 27504748 |||| Successfully assigned alignments : 19565202 (71.1%) |||| Running time : 0.40 minutes |||| |||| |||| Write the final count table. |||| Write the read assignment summary. |||| |||| Summary of counting results can be found in file "/mnt/AGS_RNA-seq/featur |||| eCounts/y" |||| ||============================================================================////================================= Running ==================================|| |||| Load annotation file Homo_ ... |||| Features : 1552026 |||| Meta-features : 61541 |||| Chromosomes/contigs : 47 |||| |||| Process BAM file ... |||| Paired-end reads are included. |||| Total alignments : 27341548 |||| Successfully assigned alignments : 17090457 (62.5%) |||| Running time : 0.37 minutes |||| |||| Write the final count table. |||| Write the read assignment summary. |||| |||| Summary of counting results can be found in file "/mnt/AGS_RNA-seq/featur |||| eCounts/y" |||| ||============================================================================////================================= Running ==================================|| |||| Load annotation file Homo_ ... |||| Features : 1552026 |||| Meta-features : 61541 |||| Chromosomes/contigs : 47 |||| |||| Process BAM file ... |||| Paired-end reads are included. |||| Total alignments : 34769594 |||| Successfully assigned alignments : 21749916 (62.6%) |||| Running time : 0.89 minutes |||| |||| Write the final count table. |||| Write the read assignment summary. |||| |||| Summary of counting results can be found in file "/mnt/AGS_RNA-seq/featur |||| eCounts/y" |||| ||============================================================================////================================= Running ==================================|| |||| Load annotation file Homo_ ... |||| Features : 1552026 |||| Meta-features : 61541 |||| Chromosomes/contigs : 47 |||| |||| Process BAM file ... |||| Paired-end reads are included. |||| Total alignments : 38580695 |||| Successfully assigned alignments : 22749538 (59.0%) |||| Running time : 0.78 minutes |||| Running time : 0.78 minutes |||| |||| Write the final count table. |||| Write the read assignment summary. |||| |||| Summary of counting results can be found in file "/mnt/AGS_RNA-seq/featur |||| eCounts/y" |||| ||============================================================================////================================= Running ==================================|| |||| Load annotation file Homo_ ... |||| Features : 1552026 |||| Meta-features : 61541 |||| Chromosomes/contigs : 47 |||| |||| Process BAM file ... |||| Paired-end reads are included. |||| Total alignments : 37682170 |||| Successfully assigned alignments : 27515658 (73.0%) |||| Running time : 0.56 minutes |||| |||| Write the final count table. |||| Write the read assignment summary. |||| |||| Summary of counting results can be found in file "/mnt/AGS_RNA-seq/featur |||| eCounts/y" |||| ||============================================================================//Stringtie#stringtie为转录本拼接和定量的软件#利⽤stringtie分别输出gene count数⽬和FPKM值,gene count数⽤户基因差异分析,FPKM值为基因表达值,⽤于heatmap等绘制#Usage: stringtie
发布者:admin,转转请注明出处:http://www.yc00.com/news/1688383655a129862.html
评论列表(0条)