Usage

BioKit is a Dart package for bioinformatics.

Ensure that you have BioKit installed before continuing.

This document aims to make you proficient with BioKit as quickly as possible. You can read it sequentially or, if you're reading this on biokit.org, use the heading menu on the right side of the page to jump to a topic of interest.

If you want a deeper look at how BioKit works, view our API Reference.

Creating Sequences

Create a DNA, RNA, or Peptide instance:

DNA dnaSeq = DNA(seq: 'ATGCTA');
RNA rnaSeq = RNA(seq: 'AUGCUA');
Peptide pepSeq = Peptide(seq: 'MSLAKR');

DNA and RNA classes must be initialized with a String of at least six valid nucleotides, while the Peptide class requires a minimum of two valid amino acids.

If any monomer in the sequence passed to the seq parameter is not valid for the class, an error is thrown.
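The validation rule can be pictured with a small standalone check. This is only a sketch, not BioKit's own code: validateDna is a hypothetical helper, and it assumes that the valid DNA alphabet is exactly A, T, G, and C and that an ArgumentError is an acceptable error type (the text above does not say which error BioKit throws).

```dart
/// Hypothetical validator illustrating the rules described above.
/// Assumption: valid DNA monomers are A, T, G, and C only.
void validateDna(String seq) {
  if (seq.length < 6) {
    throw ArgumentError('DNA needs at least 6 nucleotides, got ${seq.length}.');
  }
  // Look for the first character outside the assumed alphabet.
  final invalid = RegExp(r'[^ATGC]').firstMatch(seq);
  if (invalid != null) {
    throw ArgumentError(
        'Invalid nucleotide "${invalid.group(0)}" at index ${invalid.start}.');
  }
}

void main() {
  validateDna('ATGCTA'); // Passes silently.
  try {
    validateDna('ATGXTA');
  } on ArgumentError catch (e) {
    print(e.message); // Invalid nucleotide "X" at index 3.
  }
}
```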

Add Sequence Metadata

Optionally, you can add name, id, and desc metadata when you instantiate the class. Using DNA as an example:

DNA dnaSeq = DNA(seq: 'ATGCTA', name: 'My Name', id: 'My ID', desc: 'My Description');

If you do not set a value for the name, id, or desc fields at the time of instantiation, each will receive a default String value.

Get Properties

Return the values of the properties of a DNA, RNA, or Peptide instance:

dnaSeq.seq;
// ATGCTA
dnaSeq.len;
// 6
dnaSeq.id;
// Default ID
dnaSeq.name;
// Default name
dnaSeq.desc;
// Default description
dnaSeq.type;
// dna

Set Properties

Update the properties of a DNA, RNA, or Peptide instance:

dnaSeq.name = 'New name';
dnaSeq.id = 'New ID';
dnaSeq.desc = 'New description';

Sequence Info

View information about a DNA, RNA, or Peptide instance by calling its info() method or printing it to the console:

dnaSeq.info();
/*
{
"seq":"ATGCTA",
"type":"dna",
"monomers":6,
"name":"New name",
"id":"New ID",
"desc":"New description"
}
*/
print(dnaSeq);
/*
{
"seq":"ATGCTA",
"type":"dna",
"monomers":6,
"name":"New name",
"id":"New ID",
"desc":"New description"
}
*/

Random Sequences

Return a random DNA, RNA, or Peptide instance by calling the random() method and passing the desired sequence length to the len parameter:

// A random DNA instance with 20 nucleotides.
DNA dnaSeq = DNA.random(len: 20);
dnaSeq.info();
/*
{
"seq":"TAACTTCGATCGCTCTGGCA",
"type":"dna",
"monomers":20,
"name":"Default name",
"id":"Default ID",
"desc":"Default description"
}
*/

FASTA Data

BioKit contains a number of methods and functions for working with FASTA-formatted data.

UniProt ID

Return a String of protein data in FASTA format using the static uniprotIdToFASTA() method from the Utils class:

String proteinFASTA = await Utils.uniprotIdToFASTA(uniprotId: 'B5ZC00');
/*
>sp|B5ZC00|SYG_UREU1 Glycine--tRNA ligase OS=Ureaplasma urealyticum ...
MKNKFKTQEELVNHLKTVGFVFANSEIYNGLANAWDYGPLGVLLKNNLKNLWWKEFVTKQ
KDVVGLDSAIILNPLVWKASGHLDNFS ...
*/

Note that this method requires network access.

Read String

Use the readFASTA() method to parse FASTA-formatted String data.

readFASTA() can parse FASTA data containing multiple sequences, so it returns a List:

List<Map<String, String>> proteinMaps = await Utils.readFASTA(str: proteinFASTA);
/*
[
{
"seq":"MKNKFKTQEELVNHLKTVGFVFANSEIYNGLANAWDYGPLGVLLKNNLKNLWWKEFVTK ... ",
"id":"sp|B5ZC00|SYG_UREU1",
"desc":"Glycine--tRNA ligase OS=Ureaplasma urealyticum serovar 10 (... "
}
]
*/
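The shape of the returned List can be illustrated with a small standalone parser. This is a sketch rather than BioKit's implementation: parseFasta is a hypothetical name, and it assumes the id is the header text up to the first space and desc is the remainder, which matches the sample output above. The sequences in main() are shortened placeholders.

```dart
/// Parses FASTA text into maps with 'seq', 'id', and 'desc' keys.
List<Map<String, String>> parseFasta(String text) {
  final records = <Map<String, String>>[];
  for (final raw in text.split('\n')) {
    final line = raw.trim();
    if (line.isEmpty) continue;
    if (line.startsWith('>')) {
      // A header line starts a new record.
      final header = line.substring(1);
      final space = header.indexOf(' ');
      records.add({
        'seq': '',
        'id': space == -1 ? header : header.substring(0, space),
        'desc': space == -1 ? '' : header.substring(space + 1),
      });
    } else if (records.isNotEmpty) {
      // Sequence lines are appended to the most recent record.
      records.last['seq'] = records.last['seq']! + line;
    }
  }
  return records;
}

void main() {
  const fasta = '>HSBGPG Human gene for bone gla protein (BGP)\n'
      'GGCAGATTCC\n'
      'CCCTAGACCC\n'
      '>HSGLTH1 Human theta 1-globin gene\n'
      'CCACTGCACT\n';
  for (final record in parseFasta(fasta)) {
    print(record);
  }
}
```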

Read File

Read data from a FASTA-formatted txt file by passing its location to the path parameter:

List<Map<String, String>> dnaMaps = await Utils.readFASTA(path: './gene_bank.txt');
/*
[
{
"seq":"GGCAGATTCCCCCTAGACCCGCCCGCACCATGGTCAGGCATGCCCCTCCTCATCGCTGG ... ",
"id":"HSBGPG",
"desc":"Human gene for bone gla protein (BGP)"
},
{
"seq":"CCACTGCACTCACCGCACCCGGCCAATTTTTGTGTTTTTAGTAGAGACTAAATACCATA ... ",
"id":"HSGLTH1",
"desc":"Human theta 1-globin gene"
}
]
*/

Write File

Write the contents of a DNA, RNA, or Peptide instance to a FASTA-formatted txt file using the toFASTA() method:

// Get the first Map object.
Map<String, String> firstSeq = dnaMaps.first;
// Create a new DNA instance.
DNA dnaSeq = DNA(seq: firstSeq['seq']!, id: firstSeq['id']!, desc: firstSeq['desc']!);
// Write the instance contents to FASTA formatted file.
dnaSeq.toFASTA(path: '../deliverables', filename: 'my_dna_seq');
/*
>HSBGPG Human gene for bone gla protein (BGP)
GGCAGATTCCCCCTAGACCCGCCCGCACCATGGTCAGGCATGCCCCTCCTCATCGCTGGG
CACAGCCCAGAGGGTATAAACAGTGCTGGAGGCTGGCGGGGCAGGCCAGCTGAGTCCTGA
GCAGCAGCCCAGCGCAGCCACCGAGACA ...
*/
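The file layout shown above can be reproduced with a few lines of plain Dart. This is a sketch under assumptions: toFastaString is a hypothetical helper, and the 60-character line width is inferred from the sample output rather than taken from BioKit's documentation.

```dart
/// Builds FASTA text: a '>' header line followed by wrapped sequence lines.
/// Assumption: lines wrap at 60 characters, as in the sample output.
String toFastaString(String id, String desc, String seq, {int width = 60}) {
  final out = StringBuffer('>$id $desc\n');
  for (var i = 0; i < seq.length; i += width) {
    final end = i + width < seq.length ? i + width : seq.length;
    out.writeln(seq.substring(i, end));
  }
  return out.toString();
}

void main() {
  // A 130-character placeholder sequence wraps into lines of 60, 60, and 10.
  print(toFastaString('demo', 'Placeholder record', 'ACGTACGTAC' * 13));
}
```

The resulting String could then be written to disk with dart:io, for example through File.writeAsString().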

DNA Analysis Report

Create a DNA analysis report by calling the report() method on a DNA instance:

dnaSeq.report(path: '../deliverables', creator: 'John Doe', title: 'BGP Report');

+ Operator

Concatenate the sequences of two or more DNA, RNA, or Peptide instances of the same type with the + operator:

RNA rnaSeq1 = RNA(seq: 'AUGCAG');
RNA rnaSeq2 = RNA(seq: 'GCUGAA');
rnaSeq1 + rnaSeq2;
// AUGCAGGCUGAA

Reversing

Reverse a DNA, RNA, or Peptide instance's sequence with the reverse() method:

Peptide pepSeq = Peptide(seq: 'MPAG');
pepSeq.reverse();
// GAPM

Point Mutations

Return the number of positional differences between the sequences of two DNA, RNA, or Peptide instances of the same type with the difference() method:

DNA dnaSeq1 = DNA(seq: 'ATGCAT');
// Difference: "A" at index 1, and "T" at index 4.
DNA dnaSeq2 = DNA(seq: 'AAGCTT');
dnaSeq1.difference(oSeq: dnaSeq2);
// 2
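Counting positional differences is the Hamming distance between two strings. The sketch below shows the idea in plain Dart; hammingDistance is a hypothetical name, and the equal-length check is an assumption, since the text above does not say how difference() handles sequences of different lengths.

```dart
/// Counts the positions at which two equal-length sequences differ.
int hammingDistance(String a, String b) {
  if (a.length != b.length) {
    throw ArgumentError('Sequences must be the same length.');
  }
  var count = 0;
  for (var i = 0; i < a.length; i++) {
    if (a[i] != b[i]) count++;
  }
  return count;
}

void main() {
  print(hammingDistance('ATGCAT', 'AAGCTT')); // 2
}
```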

Motif Detection

BioKit has a number of functions and methods to convert motifs and to detect matches between a motif and the sequence of a DNA, RNA, or Peptide instance.

Find Motifs

Return the indices of all matches between a DNA, RNA, or Peptide instance's sequence and the sequence passed to the findMotif() method's motif parameter:

RNA rnaSeq = RNA(seq: 'GAUAUAUC');
rnaSeq.findMotif(motif: 'AUAU');
/*
{
"matchCount":2,
"matchIndices":[
{
"match":"AUAU",
"startIndex":1,
"endIndex":4
},
{
"match":"AUAU",
"startIndex":3,
"endIndex":6
}
]
}
*/

Set overlap to false to return only the match indices that do not overlap:

rnaSeq.findMotif(motif: 'AUAU', overlap: false);
/*
{
"matchCount":1,
"matchIndices":[
{
"match":"AUAU",
"startIndex":1,
"endIndex":4
}
]
}
*/
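The difference between overlapping and non-overlapping searches comes down to where the next search begins after a match. The following sketch illustrates this with a literal (non-regex) motif; motifStarts is a hypothetical helper and is not how findMotif() is implemented.

```dart
/// Returns the start index of every occurrence of a literal motif.
/// With overlap enabled, the search resumes one position after each match;
/// otherwise it resumes after the end of the match.
List<int> motifStarts(String seq, String motif, {bool overlap = true}) {
  final starts = <int>[];
  if (motif.isEmpty) return starts;
  var from = 0;
  while (true) {
    final idx = seq.indexOf(motif, from);
    if (idx == -1) break;
    starts.add(idx);
    from = overlap ? idx + 1 : idx + motif.length;
  }
  return starts;
}

void main() {
  print(motifStarts('GAUAUAUC', 'AUAU')); // [1, 3]
  print(motifStarts('GAUAUAUC', 'AUAU', overlap: false)); // [1]
}
```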

Shared Motifs

Return the longest shared motif between the sequences of two DNA, RNA, or Peptide instances of the same type:

DNA dnaSeq1 = DNA(seq: 'GATATA');
DNA dnaSeq2 = DNA(seq: 'AGCATA');
dnaSeq1.sharedMotif(oSeq: dnaSeq2);
// ATA
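Finding the longest shared motif is the classic longest-common-substring problem. One common approach is dynamic programming, sketched below; longestSharedMotif is a hypothetical name, and BioKit may use a different algorithm or tie-breaking rule.

```dart
/// Returns the longest substring common to both sequences.
/// On ties, the first candidate found while scanning [a] is returned.
String longestSharedMotif(String a, String b) {
  // prev[j] holds the length of the common suffix ending at a[i-2], b[j-1].
  var prev = List<int>.filled(b.length + 1, 0);
  var bestLen = 0;
  var bestEnd = 0; // Exclusive end index of the best match in a.
  for (var i = 1; i <= a.length; i++) {
    final curr = List<int>.filled(b.length + 1, 0);
    for (var j = 1; j <= b.length; j++) {
      if (a[i - 1] == b[j - 1]) {
        curr[j] = prev[j - 1] + 1;
        if (curr[j] > bestLen) {
          bestLen = curr[j];
          bestEnd = i;
        }
      }
    }
    prev = curr;
  }
  return a.substring(bestEnd - bestLen, bestEnd);
}

void main() {
  print(longestSharedMotif('GATATA', 'AGCATA')); // ATA
}
```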

Manually Convert Motif to Regex

The findMotif() method automatically converts motifs passed to its motif parameter to regular-expression format. However, you can also perform the conversion manually using the motifToRe() method:

Utils.motifToRe(motif: 'N{P}[ST]{P}');
// 'N[^P][S|T|][^P]'
// A motif without special notation is returned unchanged.
Utils.motifToRe(motif: 'ATGC');
// ATGC

Splicing

Return a sequence with all occurrences of a motif removed from a DNA, RNA, or Peptide instance's sequence by calling the splice() method and passing the motif to the motif parameter:

RNA rnaSeq = RNA(seq: 'AUCAUGU');
// Removes all occurrences of 'AU'.
rnaSeq.splice(motif: 'AU');
// CGU

Monomer Frequency

Return the frequency of each monomer in a DNA, RNA, or Peptide instance's sequence with the freq() method:

DNA dnaSeq = DNA(seq: 'AGCTTTTCAGC');
dnaSeq.freq();
/*
{
"A":2.0,
"G":2.0,
"C":3.0,
"T":4.0
}
*/

Percentage of Total

Return each monomer's share of the sequence as a percentage by passing true to the norm parameter of the freq() method:

dnaSeq.freq(norm: true);
/*
{
"A":18.2,
"G":18.2,
"C":27.3,
"T":36.4
}
*/
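Both outputs can be reproduced with a simple counting loop. The sketch below is an independent illustration, not BioKit's code: monomerFreq is a hypothetical name, and rounding percentages to one decimal place is inferred from the sample output.

```dart
/// Counts each monomer; with norm enabled, returns percentages of the total.
Map<String, double> monomerFreq(String seq, {bool norm = false}) {
  final counts = <String, double>{};
  for (var i = 0; i < seq.length; i++) {
    counts[seq[i]] = (counts[seq[i]] ?? 0.0) + 1;
  }
  if (!norm) return counts;
  // Convert each count to a percentage rounded to one decimal place.
  return counts.map((monomer, count) => MapEntry(
      monomer, double.parse((count / seq.length * 100).toStringAsFixed(1))));
}

void main() {
  print(monomerFreq('AGCTTTTCAGC'));
  // {A: 2.0, G: 2.0, C: 3.0, T: 4.0}
  print(monomerFreq('AGCTTTTCAGC', norm: true));
  // {A: 18.2, G: 18.2, C: 27.3, T: 36.4}
}
```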

Ignore the Stop Amino Acid

When the translate() method is called on a DNA or RNA instance, BioKit returns an amino acid sequence. When it encounters a stop codon, rather than stopping translation or ignoring the stop codon, BioKit places an "X" character at that position in the amino acid sequence:

// UAG is a stop codon
RNA rnaSeq = RNA(seq: 'CGGUAGACU');
rnaSeq.translate();
/*
{
"aaSeq":"RXT",
"nucCount":8,
"aaCount":3
}
*/

Therefore, if you use the aaSeq key's value to create a new Peptide instance and then call the freq() method, the "X" character is counted as part of the calculation:

// Create a Peptide instance using the RNA instance translation product.
Peptide pepSeq = Peptide(seq: rnaSeq.translate()['aaSeq']!);
pepSeq.freq();
/*
{
"R":1.0,
"X":1.0,
"T":1.0
}
*/

However, if you do not want the "X" character to be taken into account as part of the calculation, pass true to the ignoreStopAA parameter of the freq() method:

pepSeq.freq(ignoreStopAA: true);
/*
{
"R":1.0,
"T":1.0
}
*/

Modified Sequence Length

Return the length of a DNA, RNA, or Peptide instance's sequence with the len getter:

DNA dnaSeq = DNA(seq: 'ATGCGAT');
dnaSeq.len;
// 7

You can also return the length of the sequence minus a particular monomer by calling the lenMinus() method and passing the monomer you'd like to discount to the monomer parameter:

dnaSeq.lenMinus(monomer: 'A');
// 5

Generate Combinations

Return combinations of contiguous monomers from a DNA, RNA, or Peptide instance's sequence using the combinations() method:

Peptide pepSeq = Peptide(seq: 'MSTC');
pepSeq.combinations();
// [M, MS, MST, MSTC, S, ST, STC, T, TC]

Sort the combinations from longest to shortest by setting sorted to true:

pepSeq.combinations(sorted: true);
// [MSTC, MST, STC, MS, ST, TC, M, S, T]

Codon Frequency

Return the frequency of a codon in a DNA or RNA instance's sequence using the codonFreq() method, passing the codon of interest to the codon parameter:

RNA rnaSeq = RNA(seq: 'AUGAGGAUGCACAUG');
rnaSeq.codonFreq(codon: 'AUG');
// 3

Be aware that codonFreq() scans the sequence in batches of three nucleotides per step, starting with the first three nucleotides in the sequence. Therefore, the exact codon must be present in a batch in order to be detected.
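The effect of scanning in batches of three can be seen in a short standalone sketch. The function below shares its name with the BioKit method only for readability; it is an illustration of the described behavior, not the library's code.

```dart
/// Counts how often a codon appears when the sequence is read in
/// consecutive, non-overlapping triplets starting at index 0.
int codonFreq(String seq, String codon) {
  var count = 0;
  for (var i = 0; i + 3 <= seq.length; i += 3) {
    if (seq.substring(i, i + 3) == codon) count++;
  }
  return count;
}

void main() {
  // Triplets: AUG AGG AUG CAC AUG
  print(codonFreq('AUGAGGAUGCACAUG', 'AUG')); // 3
  // Triplets: CAU GAG. 'AUG' spans two batches, so it is not detected.
  print(codonFreq('CAUGAG', 'AUG')); // 0
}
```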

Complementary Strand

Return the complementary strand to a DNA or RNA instance's sequence with the complementary() method:

DNA dnaSeq = DNA(seq: 'AAACCCGGT');
dnaSeq.complementary();
// TTTGGGCCA

To return the reverse complementary strand, pass true to the rev parameter:

dnaSeq.complementary(rev: true);
// ACCGGGTTT
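For DNA, the complement follows the standard base pairing (A with T, G with C), and the reverse complement is the same strand read backwards. A minimal standalone sketch, which handles DNA only and is not BioKit's implementation:

```dart
/// Standard DNA base pairing.
const dnaComplement = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'};

/// Returns the complementary DNA strand, optionally reversed.
String complementary(String seq, {bool rev = false}) {
  final bases = seq.split('').map((base) {
    final pair = dnaComplement[base];
    if (pair == null) throw ArgumentError('Invalid nucleotide: $base');
    return pair;
  }).toList();
  return (rev ? bases.reversed : bases).join();
}

void main() {
  print(complementary('AAACCCGGT')); // TTTGGGCCA
  print(complementary('AAACCCGGT', rev: true)); // ACCGGGTTT
}
```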

Guanine & Cytosine Content

Return the percentage of Guanine and Cytosine content in a DNA or RNA instance's sequence with the gcContent() method:

DNA dnaSeq = DNA(seq: 'TCCCTACGCCG');
dnaSeq.gcContent();
// 72.73
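The percentage is the number of G and C nucleotides divided by the sequence length. A standalone sketch of that calculation follows; rounding to two decimal places is inferred from the sample output, and returning 0 for an empty sequence is an assumption made here to avoid dividing by zero.

```dart
/// Returns the percentage of G and C nucleotides, rounded to two decimals.
double gcContent(String seq) {
  if (seq.isEmpty) return 0.0; // Assumption: empty input yields 0.
  final gc = seq.split('').where((b) => b == 'G' || b == 'C').length;
  return double.parse((gc / seq.length * 100).toStringAsFixed(2));
}

void main() {
  // 8 of the 11 nucleotides are G or C.
  print(gcContent('TCCCTACGCCG')); // 72.73
}
```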

Translation

Return the amino acid translation product from a DNA or RNA instance's sequence, using the translate() method:

RNA rnaSeq = RNA(seq: 'AUGGCCAUGGCGCCCAGAACU');
rnaSeq.translate();
/*
{
"aaSeq":"MAMAPRT",
"nucCount":20,
"aaCount":7
}
*/

Return the translation of the reverse complementary strand by passing true to the rev parameter:

rnaSeq.translate(rev: true);
/*
{
"aaSeq":"SSGRHGH",
"nucCount":20,
"aaCount":7
}
*/

Modify the index at which translation starts by passing the desired start index to the startIdx parameter:

rnaSeq.translate(startIdx: 2);
/*
{
"aaSeq":"GHGAQN",
"nucCount":18,
"aaCount":6
}
*/
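The aaSeq values above can be reproduced with the standard genetic code. The sketch below is a standalone illustration rather than BioKit's implementation: it handles RNA only, returns just the amino acid String, and encodes the codon table compactly as a 64-character string ordered by the bases U, C, A, G. Stop codons are written as "X" to match the behavior described in this document.

```dart
/// Base order used to index the codon table.
const bases = 'UCAG';

/// Standard genetic code; '*' marks the stop codons UAA, UAG, and UGA.
const aminoAcids =
    'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

/// Translates RNA codon by codon from startIdx, ignoring trailing leftovers.
String translate(String rna, {int startIdx = 0}) {
  final out = StringBuffer();
  for (var i = startIdx; i + 3 <= rna.length; i += 3) {
    final a = bases.indexOf(rna[i]);
    final b = bases.indexOf(rna[i + 1]);
    final c = bases.indexOf(rna[i + 2]);
    if (a < 0 || b < 0 || c < 0) {
      throw ArgumentError('Invalid codon at index $i.');
    }
    final aa = aminoAcids[16 * a + 4 * b + c];
    out.write(aa == '*' ? 'X' : aa);
  }
  return out.toString();
}

void main() {
  print(translate('AUGGCCAUGGCGCCCAGAACU')); // MAMAPRT
  print(translate('AUGGCCAUGGCGCCCAGAACU', startIdx: 2)); // GHGAQN
  print(translate('CGGUAGACU')); // RXT
}
```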

Generate Proteins

Return proteins from open reading frames present in a DNA or RNA instance's sequence with the proteins() method:

DNA dnaSeq = DNA(seq: 'AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCTGAATGATCCGAGTAGCATCTCAG');
dnaSeq.proteins();
// [MLLGSFRGHPHVT, MGMTPE, MTPE, M, M]

Return only unique proteins by passing true to the unique parameter:

dnaSeq.proteins(unique: true);
// [MLLGSFRGHPHVT, MGMTPE, MTPE, M]

Transcription

Return the RNA transcription product from a DNA instance's sequence using the transcribe() method:

DNA dnaSeq = DNA(seq: 'TACGTAA');
dnaSeq.transcribe();
// UACGUAA

Change where transcription starts by passing the desired start index to the startIdx parameter:

dnaSeq.transcribe(startIdx: 3);
// GUAA

Restriction Sites

Return restriction sites in a DNA instance's sequence with the restrictionSites() method:

DNA dnaSeq = DNA(seq: 'TGCATGTCTATATG');
dnaSeq.restrictionSites();
/*
{
"TGCA":[
{
"startIdx":0,
"endIndex":4
}
],
"CATG":[
{
"startIdx":2,
"endIndex":6
}
],
"TATA":[
{
"startIdx":8,
"endIndex":12
}
],
"ATAT":[
{
"startIdx":9,
"endIndex":13
}
]
}
*/

Pass values to the minSiteLen and maxSiteLen parameters to change the restriction site search length.

Transition/Transversion Ratio

Return the transition/transversion ratio between two DNA instance sequences with the tranRatio() method:

DNA dnaSeq1 = DNA(seq: 'GACTGGTGGAAGT');
DNA dnaSeq2 = DNA(seq: 'TTATCGGCTGAAT');
dnaSeq1.tranRatio(oSeq: dnaSeq2);
// 0.29

Note that if the number of transversions is 0, the method returns -1, because division by 0 is undefined and would otherwise produce an infinite result.
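A transition swaps a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T), while a transversion swaps one class for the other. The standalone sketch below counts both and applies the -1 guard; it reuses the method name for readability but is not BioKit's code, and the equal-length check and two-decimal rounding are assumptions based on the sample output.

```dart
/// Purine bases; every other valid DNA base is a pyrimidine.
const purines = {'A', 'G'};

/// Returns transitions divided by transversions, or -1 with no transversions.
double tranRatio(String a, String b) {
  if (a.length != b.length) {
    throw ArgumentError('Sequences must be the same length.');
  }
  var transitions = 0;
  var transversions = 0;
  for (var i = 0; i < a.length; i++) {
    if (a[i] == b[i]) continue;
    // Same chemical class on both sides means a transition.
    if (purines.contains(a[i]) == purines.contains(b[i])) {
      transitions++;
    } else {
      transversions++;
    }
  }
  if (transversions == 0) return -1.0;
  return double.parse((transitions / transversions).toStringAsFixed(2));
}

void main() {
  // 2 transitions and 7 transversions.
  print(tranRatio('GACTGGTGGAAGT', 'TTATCGGCTGAAT')); // 0.29
  // Only a transition (A to G), so the guard value is returned.
  print(tranRatio('AAAA', 'AAAG')); // -1.0
}
```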

Double Helix Geometric Length

Return the geometric length (nm) of a double helix formed by a DNA instance's sequence using the dHelixGeoLen() method:

DNA dnaSeq = DNA(seq: 'ATGCATGC');
dnaSeq.dHelixGeoLen();
// 2.72

Double Helix Turns

Return the number of turns in a double helix formed by a DNA instance's sequence using the dHelixTurns() method:

DNA dnaSeq = DNA(seq: 'ATGCATGCATGCATGC');
dnaSeq.dHelixTurns();
// 1.6
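Both helix outputs are consistent with simple per-base-pair arithmetic. The sketch below assumes a rise of 0.34 nm per base pair and 10 base pairs per turn; these constants are inferred from the sample outputs (8 x 0.34 = 2.72 and 16 / 10 = 1.6), not confirmed from BioKit's source, and the function names are hypothetical.

```dart
/// Assumed rise per base pair in nanometres (inferred from sample output).
const nmPerBasePair = 0.34;

/// Assumed base pairs per helical turn (inferred from sample output).
const basePairsPerTurn = 10;

/// Geometric length in nm for a helix of the given number of base pairs.
double helixLengthNm(int basePairs) =>
    double.parse((basePairs * nmPerBasePair).toStringAsFixed(2));

/// Number of turns in a helix of the given number of base pairs.
double helixTurns(int basePairs) => basePairs / basePairsPerTurn;

void main() {
  print(helixLengthNm('ATGCATGC'.length)); // 2.72
  print(helixTurns('ATGCATGCATGCATGC'.length)); // 1.6
}
```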

Reverse Transcription

Return the reverse transcription product from an RNA instance's sequence using the revTranscribe() method:

RNA rnaSeq = RNA(seq: 'AUGCUAGU');
rnaSeq.revTranscribe();
// ATGCTAGT

Monoisotopic Mass

Return the monoisotopic mass (Da) of a Peptide instance's sequence using the monoMass() method:

Peptide pepSeq = Peptide(seq: 'MSTGARVD');
pepSeq.monoMass();
// 817.38

Modify the number of decimal places by passing the desired number of decimals to the decimals parameter:

pepSeq.monoMass(decimals: 1);
// 817.4

Return the monoisotopic mass in kDa by passing true to the kDa parameter:

pepSeq.monoMass(kDa: true);
// 0.82
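The values above match the sum of the standard monoisotopic residue masses of the amino acids in the sequence, with no water added for the termini. The following standalone sketch reproduces that calculation; it reuses the method and parameter names for readability but is not BioKit's implementation, and the residue-only summation is inferred from the sample output.

```dart
/// Standard monoisotopic residue masses in daltons.
const monoisotopicMass = <String, double>{
  'A': 71.03711, 'C': 103.00919, 'D': 115.02694, 'E': 129.04259,
  'F': 147.06841, 'G': 57.02146, 'H': 137.05891, 'I': 113.08406,
  'K': 128.09496, 'L': 113.08406, 'M': 131.04049, 'N': 114.04293,
  'P': 97.05276, 'Q': 128.05858, 'R': 156.10111, 'S': 87.03203,
  'T': 101.04768, 'V': 99.06841, 'W': 186.07931, 'Y': 163.06333,
};

/// Sums residue masses, optionally in kDa, rounded to [decimals] places.
double monoMass(String peptide, {int decimals = 2, bool kDa = false}) {
  var total = 0.0;
  for (var i = 0; i < peptide.length; i++) {
    final mass = monoisotopicMass[peptide[i]];
    if (mass == null) {
      throw ArgumentError('Unknown amino acid: ${peptide[i]}');
    }
    total += mass;
  }
  if (kDa) total /= 1000;
  return double.parse(total.toStringAsFixed(decimals));
}

void main() {
  print(monoMass('MSTGARVD')); // 817.38
  print(monoMass('MSTGARVD', decimals: 1)); // 817.4
  print(monoMass('MSTGARVD', kDa: true)); // 0.82
}
```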