Usage

1. Prepare protein database

fGAP requires protein database in FASTA file. We recommend 3 ~ 4 organisms’ proteome to save running time. For this, we provide a script to build your database using NCBI API.

Usage:

# Assume you downloaded fGAP in $HOME/fGAP
python $HOME/fGAP/fgap/download_sister_orgs.py\
 --download_dir <download_directory>\
 --taxon <taxon>\
 --num_sisters <number_of_sisters>\
 --email_address <your_email_address>

E-mail address is needed for NCBI Entrez.

Example command:

python $HOME/fGAP/fgap/download_sister_orgs.py\
 --download_dir sister_orgs\
 --taxon "Schizophyllum"\
 --num_sisters 3\
 --email_address mbnmbn00@gmail.com

All taxon levels are allowed for –taxon argument, but genus level is appropriate. Now make protein database:

cd sister_orgs/
gunzip *.faa.gz
cat ./*faa > prot_db.faa

You can now input prot_db.faa to fGAP.

2. Run fGAP

To run fGAP, you need three main arguments

  • Genome assembly (FASTA)
  • Transcriptomic reads (FASTQ)
  • Protein database (FASTA)

Currently, fgap gets only Illumina paired-end reads files. The file names of two paired-end reads should have suffix like ‘XX_1.fastq’ and ‘XX_2.fastq’. The prefixes should be same without ‘_’ character. For example, File names would be like hyphae_1.fastq and hyphae_2.fastq.

Usage:

# Assume you downloaded fGAP in $HOME/fGAP
python $HOME/fGAP/fgap/fgap.py\
 --output_dir <output_directory>\
 --trans_read_files <transcriptome_reads_fastqs>\
 --project_name <project_name_without_space>\
 --genome_assembly <genome_assembly_fasta>\
 --augustus_species <augustus_species>\
 --org_id <organism_id>\
 --sister_proteome <sister_proteome>\
 --num_cores <number_of_cpus_to_be_used>\
  • Augustus species: you should provide one augustus_species used in Augustus. This is the list what Augustus provides.
Phylum Class Species augustus_species
Ascomycota Eurotiomycetes Aspergillus fumigatus aspergillus_fumigatus
Ascomycota Eurotiomycetes Aspergillus nidulans aspergillus_nidulans
Ascomycota Eurotiomycetes Aspergillus oryzae aspergillus_oryzae
Ascomycota Eurotiomycetes Aspergillus terreus aspergillus_terreus
Ascomycota Leotiomycetes Botrytis cinerea botrytis_cinerea
Ascomycota Saccharomycetes Candida albicans candida_albicans
Ascomycota Saccharomycetes Candida guilliermondii candida_guilliermondii
Ascomycota Saccharomycetes Candida tropicalis candida_tropicalis
Ascomycota Sordariomycetes Chaetomium globosum chaetomium_globosum
Ascomycota Eurotiomycetes Coccidioides immitis coccidioides_immitis
Basidiomycota Agaricomycetes Coprinus cinereus coprinus
Basidiomycota Agaricomycetes Coprinus cinereus coprinus_cinereus
Basidiomycota Agaricomycetes Cryptococcus neoformans gattii cryptococcus_neoformans_gattii
Basidiomycota Agaricomycetes Cryptococcus neoformans gattii cryptococcus_neoformans_neoformans_B
Basidiomycota Agaricomycetes Cryptococcus neoformans gattii cryptococcus_neoformans_neoformans_JEC21
Ascomycota Saccharomycetes Debaryomyces hansenii debaryomyces_hansenii
Microsporidia   Encephalitozoon cuniculi encephalitozoon_cuniculi_GB
Ascomycota Saccharomycetes Eremothecium gossypii eremothecium_gossypii
Ascomycota Sordariomycetes Fusarium graminearum fusarium_graminearum
Ascomycota Eurotiomycetes Histoplasma capsulatum histoplasma_capsulatum
Ascomycota Saccharomycetes Kluyveromyces lactis kluyveromyces_lactis
Basidiomycota Agaricomycetes Laccaria bicolor laccaria_bicolor
Ascomycota Saccharomycetes Lodderomyces elongisporus lodderomyces_elongisporus
Ascomycota Sordariomycetes Magnaporthe grisea magnaporthe_grisea
Ascomycota Sordariomycetes Neurospora crassa neurospora_crassa
Basidiomycota Agaricomycetes Phanerochaete chrysosporium phanerochaete_chrysosporium
Ascomycota Saccharomycetes Pichia stipitis pichia_stipitis
Mucoromycotina Mucorales Rhizopus oryzae rhizopus_oryzae
Ascomycota Saccharomycetes Saccharomyces cerevisiae saccharomyces_cerevisiae_S288C
Ascomycota Saccharomycetes Saccharomyces cerevisiae saccharomyces_cerevisiae_rm11-1a_1
Ascomycota Schizosaccharomycetes Schizosaccharomyces pombe schizosaccharomyces_pombe
Basidiomycota Ustilaginomycetes Ustilago maydis ustilago_maydis
Ascomycota Saccharomycetes Yarrowia lipolytica yarrowia_lipolytica
  • Organism ID will be used in naming gene ID

3. Output

Final output will be located in output directory you gave in the arguments

  • fgap_output_prot.faa
  • fgap_output.gff3
  • fgap_output_stats.html

4. Trouble-shootings

This is very beta version of the software, so please don’t hesistate reporting any bug or error you have encountered at mbnmbn00@korea.ac.kr or mbnmbn00@gmail.com.