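Problems encountered when adding a new species to JBrowse
This article was last updated 331 days ago; the information in it may have since developed or changed.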

When adding Fusarium oxysporum f. sp. cubense (NCBI link) to JBrowse, I used the following script:

IN='/var/www/html/JBrowse-1.16.9/Foc_4/data';
OUT='Foc_4_try';
set -e;
set -x;
# format the reference sequences
/var/www/html/JBrowse-1.16.9/bin/prepare-refseqs.pl --fasta $IN/GCF_000260195.1_FO_II5_V1_genomic.fna --out $OUT;
# official ITAG2.3 gene models
#/var/www/html/JBrowse-1.16.9/bin/flatfile-to-json.pl --out $OUT     --type mRNA --gff $IN/genomic.gff  --trackLabel genes --key 'Gene models' --getSubfeatures  --className transcript  --subfeatureClasses '{"CDS": "transcript-CDS", "exon": "exon"}' --arrowheadClass arrowhead --nameAttributes "locus_tag,Name";

/var/www/html/JBrowse-1.16.9/bin/flatfile-to-json.pl --out $OUT --gff $IN/genomic.gff --type mRNA --autocomplete all --trackLabel genes --key 'Gene models' --getSubfeatures  --className transcript --subfeatureClasses '{"CDS": "transcript-CDS", "exon": "exon"}' --arrowheadClass arrowhead;
# index feature names
/var/www/html/JBrowse-1.16.9/bin/generate-names.pl --out $OUT;

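After running it and opening the result in the JBrowse browser, I found that the Name field under Primary Data showed XM_031197233 instead of the FOIG_00001 I wanted.
The beginning of the GFF file: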

NW_022158687.1  RefSeq  region  1   4544391 .   +   .   ID=NW_022158687.1:1..4544391;Dbxref=taxon:1089451;Name=Unknown;chromosome=Unknown;forma-specialis=cubense tropical race 4;gbkey=Src;genome=genomic;mol_type=genomic DNA;old-name=Fusarium oxysporum f. sp. cubense tropical race 4 54006;strain=54006
NW_022158687.1  RefSeq  gene    138 518 .   +   .   ID=gene-FOIG_00001;Dbxref=GeneID:42025176;Name=FOIG_00001;end_range=518,.;gbkey=Gene;gene_biotype=protein_coding;locus_tag=FOIG_00001;partial=true;start_range=.,138
NW_022158687.1  RefSeq  mRNA    138 518 .   +   .   ID=rna-XM_031195970.1;Parent=gene-FOIG_00001;Dbxref=GeneID:42025176,GenBank:XM_031195970.1;Name=XM_031195970.1;end_range=518,.;gbkey=mRNA;locus_tag=FOIG_00001;orig_protein_id=gnl|WGS:AGND|FOIG_00001T0;orig_transcript_id=gnl|WGS:AGND|mrna_FOIG_00001T0;partial=true;product=uncharacterized protein;start_range=.,138;transcript_id=XM_031195970.1
NW_022158687.1  RefSeq  exon    138 518 .   +   .   ID=exon-XM_031195970.1-1;Parent=rna-XM_031195970.1;Dbxref=GeneID:42025176,GenBank:XM_031195970.1;end_range=518,.;gbkey=mRNA;locus_tag=FOIG_00001;orig_protein_id=gnl|WGS:AGND|FOIG_00001T0;orig_transcript_id=gnl|WGS:AGND|mrna_FOIG_00001T0;partial=true;product=uncharacterized protein;start_range=.,138;transcript_id=XM_031195970.1
NW_022158687.1  RefSeq  CDS 138 518 .   +   0   ID=cds-XP_031071737.1;Parent=rna-XM_031195970.1;Dbxref=GeneID:42025176,GenBank:XP_031071737.1;Name=XP_031071737.1;gbkey=CDS;locus_tag=FOIG_00001;orig_transcript_id=gnl|WGS:AGND|mrna_FOIG_00001T0;product=uncharacterized protein;protein_id=XP_031071737.1
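
The excerpt above suggests where the unwanted name comes from: on the mRNA line, the Name attribute holds the RefSeq transcript accession, while the wanted identifier sits in locus_tag. A minimal sketch for inspecting this, assuming standard GFF3 key=value attributes separated by semicolons; parse_attributes is a hypothetical helper written for illustration, not part of JBrowse:

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (hypothetical helper)."""
    attributes = {}
    for field in column9.strip().split(';'):
        if '=' in field:
            key, value = field.split('=', 1)
            attributes[key] = value
    return attributes


# Attribute column of the mRNA line shown above, shortened for readability
mrna_attributes = (
    'ID=rna-XM_031195970.1;Parent=gene-FOIG_00001;'
    'Name=XM_031195970.1;gbkey=mRNA;locus_tag=FOIG_00001'
)

attrs = parse_attributes(mrna_attributes)
print(attrs['Name'])       # XM_031195970.1
print(attrs['locus_tag'])  # FOIG_00001
```

Since the track is built with --type mRNA, it is these mRNA attributes, not those of the parent gene line, that end up in the feature details.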

So I changed the following line of the script

/var/www/html/JBrowse-1.16.9/bin/flatfile-to-json.pl --out $OUT --gff $IN/ne.gff --type mRNA    --autocomplete all --trackLabel genes --key 'Gene models' --getSubfeatures  --className transcript --subfeatureClasses '{"CDS": "transcript-CDS", "exon": "exon"}' --arrowheadClass arrowhead ; # index feature names

from --type mRNA to --type gene, hoping that the name would then be taken from ID=gene-FOIG_00001. Unfortunately that did not work, so I modified the GFF file itself instead,
using the following Python script:

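import re


def replace_name_with_locus_tag(file_name):
    """Print the GFF file with Name and locus_tag swapped on every mRNA line."""
    with open(file_name, 'r') as file:
        for line in file:
            columns = line.split('\t')
            if len(columns) > 2 and columns[2] == 'mRNA':
                # Match up to the next ';' or the end of the line, so that an
                # attribute in last position (no trailing ';') is still found.
                name_search = re.search(r'[\t;]Name=([^;\n]*)', line)
                locus_tag_search = re.search(r'[\t;]locus_tag=([^;\n]*)', line)
                if name_search and locus_tag_search:
                    old_name = name_search.group(1)
                    locus_tag = locus_tag_search.group(1)
                    # Swap the two values: Name gets the locus tag, and
                    # locus_tag keeps the original transcript accession.
                    line = line.replace(f'Name={old_name}', f'Name={locus_tag}')
                    line = line.replace(f'locus_tag={locus_tag}', f'locus_tag={old_name}')
            # Each line already ends with a newline, so do not add another;
            # otherwise the output contains blank lines between records.
            print(line, end='')


replace_name_with_locus_tag('genomic.gff')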

Usage: replace 'genomic.gff' with the path and name of your own GFF file, then run:

python3 script.py > new_file.gff
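
Before loading the rewritten file into JBrowse, it may be worth checking that the mRNA names really changed. The following is only a sketch: mrna_names is a hypothetical helper, and the two sample lines are shortened versions of the records shown earlier, written the way the script above would leave them.

```python
import re


def mrna_names(lines):
    """Return the Name value of every mRNA feature in an iterable of GFF lines."""
    names = []
    for line in lines:
        columns = line.rstrip('\n').split('\t')
        # A GFF3 feature line has exactly nine tab-separated columns
        if len(columns) == 9 and columns[2] == 'mRNA':
            match = re.search(r'(?:^|;)Name=([^;]*)', columns[8])
            if match:
                names.append(match.group(1))
    return names


# Shortened sample records after the Name/locus_tag swap
sample = [
    'NW_022158687.1\tRefSeq\tgene\t138\t518\t.\t+\t.\t'
    'ID=gene-FOIG_00001;Name=FOIG_00001\n',
    'NW_022158687.1\tRefSeq\tmRNA\t138\t518\t.\t+\t.\t'
    'ID=rna-XM_031195970.1;Name=FOIG_00001;locus_tag=XM_031195970.1\n',
]

print(mrna_names(sample))  # ['FOIG_00001']
```

On the real output, the same function can be given an open file handle for new_file.gff; any name still starting with XM_ points to an mRNA line that the script left unchanged, for example one without a locus_tag attribute.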