When adding Fusarium oxysporum f. sp. cubense (NCBI link) to JBrowse, I used the following script:
IN='/var/www/html/JBrowse-1.16.9/Foc_4/data';
OUT='Foc_4_try';
set -e;
set -x;
# format the reference sequences
/var/www/html/JBrowse-1.16.9/bin/prepare-refseqs.pl --fasta $IN/GCF_000260195.1_FO_II5_V1_genomic.fna --out $OUT;
# official ITAG2.3 gene models
#/var/www/html/JBrowse-1.16.9/bin/flatfile-to-json.pl --out $OUT --type mRNA --gff $IN/genomic.gff --trackLabel genes --key 'Gene models' --getSubfeatures --className transcript --subfeatureClasses '{"CDS": "transcript-CDS", "exon": "exon"}' --arrowheadClass arrowhead --nameAttributes "locus_tag,Name";
/var/www/html/JBrowse-1.16.9/bin/flatfile-to-json.pl --out $OUT --gff $IN/genomic.gff --type mRNA --autocomplete all --trackLabel genes --key 'Gene models' --getSubfeatures --className transcript --subfeatureClasses '{"CDS": "transcript-CDS", "exon": "exon"}' --arrowheadClass arrowhead
# index feature names
/var/www/html/JBrowse-1.16.9/bin/generate-names.pl --out $OUT;
After running it, I browsed the result in JBrowse and found that the Name field under Primary Data was XM_031197233 instead of the FOIG_00001 I wanted.
Head of the GFF file:
NW_022158687.1 RefSeq region 1 4544391 . + . ID=NW_022158687.1:1..4544391;Dbxref=taxon:1089451;Name=Unknown;chromosome=Unknown;forma-specialis=cubense tropical race 4;gbkey=Src;genome=genomic;mol_type=genomic DNA;old-name=Fusarium oxysporum f. sp. cubense tropical race 4 54006;strain=54006
NW_022158687.1 RefSeq gene 138 518 . + . ID=gene-FOIG_00001;Dbxref=GeneID:42025176;Name=FOIG_00001;end_range=518,.;gbkey=Gene;gene_biotype=protein_coding;locus_tag=FOIG_00001;partial=true;start_range=.,138
NW_022158687.1 RefSeq mRNA 138 518 . + . ID=rna-XM_031195970.1;Parent=gene-FOIG_00001;Dbxref=GeneID:42025176,GenBank:XM_031195970.1;Name=XM_031195970.1;end_range=518,.;gbkey=mRNA;locus_tag=FOIG_00001;orig_protein_id=gnl|WGS:AGND|FOIG_00001T0;orig_transcript_id=gnl|WGS:AGND|mrna_FOIG_00001T0;partial=true;product=uncharacterized protein;start_range=.,138;transcript_id=XM_031195970.1
NW_022158687.1 RefSeq exon 138 518 . + . ID=exon-XM_031195970.1-1;Parent=rna-XM_031195970.1;Dbxref=GeneID:42025176,GenBank:XM_031195970.1;end_range=518,.;gbkey=mRNA;locus_tag=FOIG_00001;orig_protein_id=gnl|WGS:AGND|FOIG_00001T0;orig_transcript_id=gnl|WGS:AGND|mrna_FOIG_00001T0;partial=true;product=uncharacterized protein;start_range=.,138;transcript_id=XM_031195970.1
NW_022158687.1 RefSeq CDS 138 518 . + 0 ID=cds-XP_031071737.1;Parent=rna-XM_031195970.1;Dbxref=GeneID:42025176,GenBank:XP_031071737.1;Name=XP_031071737.1;gbkey=CDS;locus_tag=FOIG_00001;orig_transcript_id=gnl|WGS:AGND|mrna_FOIG_00001T0;product=uncharacterized protein;protein_id=XP_031071737.1
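The sample above already hints at the cause: on the mRNA line, Name holds the RefSeq transcript accession, while the locus tag sits in the separate locus_tag attribute. Below is a minimal Python sketch for inspecting this, assuming the usual nine-column, tab-separated GFF3 layout. The helper name parse_attributes and the shortened sample lines are my own illustration and not part of JBrowse.

```python
def parse_attributes(gff_line):
    """Return (feature_type, attributes dict) for one GFF3 feature line."""
    columns = gff_line.rstrip("\n").split("\t")
    # Skip comment/pragma lines and anything that is not a 9-column feature
    if gff_line.startswith("#") or len(columns) != 9:
        return None, {}
    attributes = {}
    for pair in columns[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = value
    return columns[2], attributes


# Shortened copies of the gene and mRNA lines shown above (hypothetical trimming)
sample = [
    "NW_022158687.1\tRefSeq\tgene\t138\t518\t.\t+\t.\t"
    "ID=gene-FOIG_00001;Name=FOIG_00001;locus_tag=FOIG_00001",
    "NW_022158687.1\tRefSeq\tmRNA\t138\t518\t.\t+\t.\t"
    "ID=rna-XM_031195970.1;Parent=gene-FOIG_00001;"
    "Name=XM_031195970.1;locus_tag=FOIG_00001",
]

for line in sample:
    feature_type, attrs = parse_attributes(line)
    print(feature_type, attrs.get("Name"), attrs.get("locus_tag"))
```

For the two sample lines this prints the gene with Name FOIG_00001 and the mRNA with Name XM_031195970.1, both carrying locus_tag FOIG_00001, which matches what the mRNA track displayed.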
So I edited this line of the script:
/var/www/html/JBrowse-1.16.9/bin/flatfile-to-json.pl --out $OUT --gff $IN/ne.gff --type mRNA --autocomplete all --trackLabel genes --key 'Gene models' --getSubfeatures --className transcript --subfeatureClasses '{"CDS": "transcript-CDS", "exon": "exon"}' --arrowheadClass arrowhead
changing
--type mRNA
to
--type gene
in the hope that the name would be taken from ID=gene-FOIG_00001. Unfortunately that did not work, so I modified the GFF file itself with the following Python script:
import re


def replace_name_with_locus_tag(file_name):
    """Print the GFF with Name and locus_tag swapped on every mRNA line."""
    with open(file_name, 'r') as file:
        for line in file:
            columns = line.split('\t')
            if len(columns) > 2 and columns[2] == 'mRNA':
                # [^;\n]* also matches when the attribute is the last one on the line
                name_search = re.search(r'Name=([^;\n]*)', line)
                locus_tag_search = re.search(r'locus_tag=([^;\n]*)', line)
                if name_search and locus_tag_search:
                    old_name = name_search.group(1)
                    locus_tag = locus_tag_search.group(1)
                    # Swap the two values so Name carries the locus tag
                    line = line.replace(f'Name={old_name}', f'Name={locus_tag}')
                    line = line.replace(f'locus_tag={locus_tag}', f'locus_tag={old_name}')
            # Each line already ends with a newline, so print must not add another
            print(line, end='')


replace_name_with_locus_tag('genomic.gff')
Usage: replace 'genomic.gff' with the path to your own GFF file, then run:
python3 script.py > new_file.gff
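Before reloading the track, it may be worth checking that the rewrite did what was intended. The following is only a sketch under the same nine-column GFF3 assumption. The function names mrna_names and count_leftover_accessions are hypothetical, and the "XM_" prefix test is simply a heuristic based on the accessions seen in this file.

```python
def mrna_names(lines):
    """Collect the Name attribute of every mRNA feature in an iterable of GFF lines."""
    names = []
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != "mRNA":
            continue
        for pair in columns[8].split(";"):
            if pair.startswith("Name="):
                names.append(pair[len("Name="):])
    return names


def count_leftover_accessions(path):
    """Return (total mRNA names, names that still look like XM_ accessions)."""
    with open(path) as handle:
        names = mrna_names(handle)
    leftover = [name for name in names if name.startswith("XM_")]
    return len(names), len(leftover)


# Example call, assuming new_file.gff is in the current directory:
# total, leftover = count_leftover_accessions("new_file.gff")
# print(f"{total} mRNA names, {leftover} still look like XM_ accessions")
```

If the second number is not zero, some mRNA lines probably lacked a Name or locus_tag attribute and were left unchanged by the script.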