BioPHP: PHP for Biocomputing



Source Code Listing of kegg.inc.php

			

Note: This is part of the BioPHP 1.1 alpha code set. The listing (1,600+ lines) is still rough. It also contains some comments (in green) and lengthy discussions (in red) of the algorithms or rules used. Lastly, it depends on two include files, the alpha versions of "seqdb.inc.php" and "etc.inc.php", which I will post shortly. Suggestions on improving it are very welcome!
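Until those include files are posted, the listing cannot run as shown, because it calls helpers such as left() and trim_element() that are not defined in it. The following is only a minimal sketch of what those two helpers might look like, inferred from how they are called in the listing (for example left($linestr, 12) and array_walk($paths, "trim_element")). The bodies are assumptions, not the actual contents of "etc.inc.php".

```php
<?php
// Hypothetical stand-ins for two helpers that the listing calls but does not define.
// Their real definitions presumably live in etc.inc.php; these bodies are guesses
// based only on the call sites.

// Return the first $length characters of $str, as in left($linestr, 12).
function left($str, $length) {
    return substr($str, 0, $length);
}

// array_walk() callback that trims one array element in place,
// as in array_walk($paths, "trim_element").
function trim_element(&$value, $key) {
    $value = trim($value);
}

// Usage: split one record line into its 12-column label and its value.
$line  = "ENTRY       C00001";
$label = trim(left($line, 12));
$value = trim(substr($line, 12));

// Usage: trim every element of a token list in place.
$ids = array(" R00001 ", "R00002\n");
array_walk($ids, "trim_element");
```

The fixed width of 12 mirrors the label column that every parser in the listing assumes.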
// This code was written by Serge Gregorio, Jr. (serge525@hotmail.com)
// and is under the General Public License (GPL) 2.0 license.

<?php

require_once("seqdb.inc.php");
require_once("etc.inc.php");

/*
kegg.inc - include file containing the classes for the Kegg Ligand Database
           objects - Enzyme, Reaction, Molecule, Ligand, and Compound.

Description: Contains the definition of the above classes and some helper functions.
Author: Serge Gregorio, Jr.
Date: May 4, 2003
License: General Public License 2.0

This code has been written as part of the GenePHP/BioPHP project, located at:
http://genephp.sourceforge.net
*/

class Compound {
    var $entry;
    var $name;
    var $formula;
    var $reaction;
    var $pathway;
    var $enzyme;
    var $dblink;
    // var $structures; - sample compound file I've obtained from Kegg does not seem to
    // contain any STRUCTURE/S entry.
}

class Enzyme {
    var $entry;
    var $name;
    var $class;
    var $sysname;
    var $reaction;
    var $substrate;
    var $product;
    var $comment;
    var $reference;
    var $pathway;
    var $ortholog;
    var $genes;
    var $disease;
    var $motif;
    var $structures;
    var $dblinks;
}

class Reaction {
    var $entry;
    var $name;
    var $definition;
    var $pathway;  // array e.g. ( ('MAP00220', 'Urea cycle'), ('MAP00221', 'Citric cycle') )
    var $equation;
    var $enzyme;   // array e.g. ( '3.6.3.50', '3.5.3.49', '3.6.1.8', ... )
}

class Ortholog {
    var $entry;
    var $name;
    var $definition;
    var $class;
    var $dblinks;
    var $genes;
}

class Ligand {
    // Still empty. Anyone interested in writing this?
}

class Mol {
    // Still empty. Anyone interested in writing this?
}

class ECRel {
    var $enzyme1;
    var $enzyme2;
    var $pathway;
}

// Example line:
// ec:1.1.1.1 ec:1.2.1.3 ECrel C00577 map00561 C00071 map00071 C05445 map00120
function parse_ecrel_kegg($fline) {
    $words = preg_split("/[\s]+/", $fline, -1, PREG_SPLIT_NO_EMPTY);
    $enzyme1 = $words[0];
    $enzyme2 = $words[1];
    $wno = 3;
    $outer = array();
    while(1) {
        if ($wno < count($words)) {
            $inner = array();
            $inner[] = $words[$wno];
            $inner[] = $words[$wno+1];
            $outer[] = $inner;
            $wno = $wno + 2;
        }
        else break;
    }
    $pathway = $outer;

    $oECRel = new ECRel();
    $oECRel->enzyme1 = $enzyme1;
    $oECRel->enzyme2 = $enzyme2;
    $oECRel->pathway = $pathway;
    return $oECRel;
}

class Genome {
    var $entry;
    var $name;
    var $definition;
    var $taxonomy;
    // Parser for these data fields not yet implemented
    var $lineage;
    var $reference;
    var $authors;
    var $title;
    var $journal;
    var $chromosome;
    var $chrom_seq;
    var $chrom_len;
    var $scaffold;
    var $scaf_seq;
    var $scaf_len;
    var $plasmid;
    var $plasm_seq;
    var $plasm_len;
    var $statistics;
    var $genome_map;
}

function parse_genome_kegg($flines) {
    // Initialize variables used inside the WHILE parser loop.
    $in_def_flag = FALSE;
    $def_string = "";
    $tax_string = "";

    while( list($lineno, $linestr) = each($flines) ) { // OPENS outermost while() loop
        $label = trim(left($linestr, 12));

        // Assume that ENTRY is always one string in one line.
        if ($label == "ENTRY") $entry = trim(substr($linestr,12));

        // Assume that NAME is always one string in one line.
        if ($label == "NAME") $name = trim(substr($linestr,12));

        // DEFINITION field may be one or more lines - follows CONTINUATION ($) convention
        if ($label == "DEFINITION") {
            $def_string = trim(substr($linestr,12));
            $in_def_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_def_flag) ) {
            if (left(trim(substr($linestr,12)),1) == '$')
                $def_string .= rtrim(substr($linestr,13));
            else
                $def_string .= " " .
                    trim(substr($linestr,12));
        }
        elseif ( (strlen($label) > 0) and ($in_def_flag) ) {
            $def_string = trim($def_string);
            $in_def_flag = FALSE;
        }

        // Assume that TAXONOMY is always one string in one line of the form:
        // TAXONOMY TAX:1234
        if ($label == "TAXONOMY") {
            $tax_string = trim(substr($linestr,12));
            $tax_array = preg_split("/:/", $tax_string, -1, PREG_SPLIT_NO_EMPTY);
            $taxonomy = trim($tax_array[1]);
        }

        if ($label == "///") break;
    }

    $oGenome = new Genome();
    $oGenome->entry = $entry;
    $oGenome->name = $name;
    $oGenome->definition = $def_string;
    $oGenome->taxonomy = $taxonomy;
    /* This section is still unfinished. Here, we are supposed to set the values of
       other attributes of the oGenome class like "lineage", "authors", "plasmid", etc. */
    return $oGenome;
}

function parse_ortho_kegg($flines) {
    // Initialization of variables.
    $def_string = "";     $in_def_flag = FALSE;
    $class_string = "";   $in_class_flag = FALSE;   $aClasses = array();
    $in_dblink_flag = FALSE;   $aDblinks = array();
    $in_genes_flag = FALSE;    $genes_string = "";    $aGenes = array();
    $in_dblink_flag = FALSE;   $dblink_string = "";   $aDblinks = array();

    while( list($lineno, $linestr) = each($flines) ) { // OPENS outermost while() loop
        $label = trim(left($linestr, 12));

        // Assume that ENTRY is always one line.
        if ($label == "ENTRY") {
            $entry = trim(substr($linestr,12,28));
            $entry_type = trim(substr($linestr,40,2));
        }

        // Assume that NAME is always one line.
        if ($label == "NAME") $name = trim(substr($linestr, 12));

        // DEFINITION field may be one or more lines - follows CONTINUATION ($) convention
        if ($label == "DEFINITION") {
            $def_string = trim(substr($linestr,12));
            $in_def_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_def_flag) ) {
            if (left(trim(substr($linestr,12)),1) == '$')
                $def_string .= rtrim(substr($linestr,13));
            else
                $def_string .= " " .
                    trim(substr($linestr,12));
        }
        elseif ( (strlen($label) > 0) and ($in_def_flag) ) {
            $def_string = trim($def_string);
            $in_def_flag = FALSE;
        }

        /* CLASS field - first line is the class, 2nd is subclass, 3rd is the
           sub-subclass. Store as an array.

           Example:
           CLASS       Metabolism; Carbohydrate Metabolism; Glycolysis / Gluconeogenesis [PATH:ot00010]
                       Metabolism; Lipid Metabolism; Fatty acid metabolism [PATH:ot00071]
                       Metabolism; Lipid Metabolism; Bile acid biosynthesis [PATH:ot00120]
                       Metabolism; Amino Acid Metabolism; Tyrosine metabolism [PATH:ot00350]
                       Metabolism; Metabolism of Complex Lipids; Glycerolipid metabolism [PATH:ot00561]

           Syntax:
           CLASS       class1; subclass1; sub-subclass1 [PATH:path_id1]
                       class2; subclass2; sub-subclass2 [PATH:path_id2]

           Algorithm:
           1) Concatenate all lines, adding a whitespace char in between lines.
           2) Separate/split by the ] character, e.g.
                 Metabolism; Carbo meta; Glycolysis [PATH:ot0001
           3) Further split each token by the [ character.
                 Metabolism; Carbo meta; Glycolysis
                 PATH:ot0001
           4) Further split the first token by ;
           5) Further split the second token by :
           6) We now have:
                 class       => Metabolism
                 subclass    => Carbo meta
                 subsubclass => Glycolysis
                 path_id     => ot0001
        */
        if ($label == "CLASS") {
            $aClasses = array();
            // $aClasses[] = trim(substr($linestr,12));
            $class_string .= trim(substr($linestr,12)) . " ";
            $in_class_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_class_flag) ) {
            $class_string .= trim(substr($linestr,12)) . " ";
        }
        elseif ( (strlen($label) > 0) and ($in_class_flag) ) {
            $class_string = trim($class_string);
            $temp = preg_split("/\]/", $class_string, -1, PREG_SPLIT_NO_EMPTY);
            foreach($temp as $temp2) {
                $inner = array();
                // Example of $temp2: Metabolism; Carbo meta; Glycolysis [PATH:ot0001
                $temp3 = preg_split("/[;\[\:]/", $temp2, -1, PREG_SPLIT_NO_EMPTY);
                // Example of $temp3: Metabolism, Carbo meta, Glycolysis, PATH, ot0001
                $inner["CLASS"] = trim($temp3[0]);
                $inner["SUBCLASS"] = trim($temp3[1]);
                $inner["SUBSUBCLASS"] = trim($temp3[2]);
                $inner["PATH_ID"] = trim($temp3[4]);
                $aClasses[] = $inner;
            }
            $in_class_flag = FALSE;
        }

        // Like the GENES section of the ENZYME class, we assume here one dbsource : one or more dblinks : one line
        if ($label == "DBLINKS") {
            $aDblinks = array();
            $dblink_string = trim(substr($linestr,12));
            $dblink_array = preg_split("/: /", $dblink_string, -1, PREG_SPLIT_NO_EMPTY);
            $db_code = trim($dblink_array[0]);
            $dblinks = trim($dblink_array[1]);
            $links_array = preg_split("/\s+/", $dblinks, -1, PREG_SPLIT_NO_EMPTY);
            array_walk($links_array, "trim_element");
            $aDblinks[$db_code] = $links_array;
            $in_dblink_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_dblink_flag) ) {
            // at the 2nd, 3rd, etc. line of the DBLINKS data field.
            $dblink_string = trim(substr($linestr,12));
            $dblink_array = preg_split("/: /", $dblink_string, -1, PREG_SPLIT_NO_EMPTY);
            $db_code = trim($dblink_array[0]);
            $dblinks = trim($dblink_array[1]);
            $links_array = preg_split("/\s+/", $dblinks, -1, PREG_SPLIT_NO_EMPTY);
            array_walk($links_array, "trim_element");
            $aDblinks[$db_code] = $links_array;
        }
        elseif ( (strlen($label) > 0) and ($in_dblink_flag) ) {
            $in_dblink_flag = FALSE;
        }

        /*
        MLO: mll6011 mll6012 mll7727 mll7731
        SME: SMb21122(mccA) SMb21124(mccB)
        ATU: Atu3478(mccB) Atu3479(mccA)
        ATC: AGR_L_2704 AGR_L_2706

        Unlike in the enzyme parser, here we do not assume 1 organism : one line :
        one or more gene names separated by whitespace(s). The code here supports
        multiple lines per organism, e.g.
        MLO: gene1 gene2 gene3
             gene4 gene5 gene6
        SME: gene1 gene2
        */

        // Let me try a different approach.
        if ($label == "GENES") {
            $genes_string .= trim(substr($linestr,12)) . " ";
            $in_genes_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_genes_flag) )
            // at the 2nd, 3rd, etc. line of the GENES data field.
            $genes_string .= trim(substr($linestr,12)) . " ";
        elseif ( (strlen($label) > 0) and ($in_genes_flag) ) {
            $genes_string = trim($genes_string);
            $genes_array = preg_split("/: /", $genes_string, -1, PREG_SPLIT_NO_EMPTY);
            // ABC, "A123 B234 C345 BCD", "X123 Y234 Z345 CDE", ..., "D123 E234 F345"
            // The last token in each string (except for the first and last) represents
            // a 3-letter organism code.
            // Desired output format: ( "ABC" => (A123, ...), "BCD" => (B123, ...) )
            $ctr = 0;
            $item_count = count($genes_array);
            foreach($genes_array as $gene) {
                $ctr++;
                if ($ctr == 1) $prev_label = $gene;
                if ( ($ctr > 1) and ($ctr < $item_count) ) {
                    $gene_names = preg_split("/\s+/", $gene, -1, PREG_SPLIT_NO_EMPTY);
                    $name_count = count($gene_names);
                    $last_item = array_pop($gene_names);
                    $gene_names = array_slice($gene_names, 0, $name_count-1);
                    array_walk($gene_names, "trim_element");
                    $aGenes[$prev_label] = $gene_names;
                    $prev_label = $last_item;
                }
                if ($ctr == $item_count) {
                    $gene_names = preg_split("/\s+/", $gene, -1, PREG_SPLIT_NO_EMPTY);
                    array_walk($gene_names, "trim_element");
                    $aGenes[$prev_label] = $gene_names;
                }
            }
            $in_genes_flag = FALSE;
        }

        if ($label == '///') break;
    } // CLOSES outermost while() loop

    // Save parsed values to a new Ortholog object
    $oOrtholog = new Ortholog();
    $oOrtholog->entry = $entry;
    $oOrtholog->entry_type = $entry_type;
    $oOrtholog->name = $name;
    $oOrtholog->definition = $def_string;
    $oOrtholog->class = $aClasses;
    $oOrtholog->dblinks = $aDblinks;
    $oOrtholog->genes = $aGenes;
    return $oOrtholog;
}

function parse_compound_kegg($flines) {
    // Initialization of variables.
    $in_name_flag = FALSE;     $aNames = array();     $name_string = "";
    $in_path_flag = FALSE;     $path_string = "";     $aPaths = array();
    $in_react_flag = FALSE;    $react_string = "";
    $in_enzyme_flag = FALSE;   $enzyme_string = "";
    $in_dblink_flag = FALSE;   $aDblinks = array();

    while( list($lineno, $linestr) = each($flines) ) { // OPENS outermost while() loop
        $label = trim(left($linestr, 12));

        // Assume that ENTRY is always one line.
        if ($label == "ENTRY") $entry = trim(substr($linestr, 12));

        // NAME entry is made up of one or more names, the preferred name is at
        // the first line, other alternative names are in succeeding lines. It
        // is possible for a long name to occupy two or more lines. But for now,
        // let's assume one name in one line.
        if ($label == "NAME") {
            $aNames = array();
            $aNames[] = trim(substr($linestr,12));
            $in_name_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_name_flag) ) {
            $aNames[] = trim(substr($linestr,12));
        }
        elseif ( (strlen($label) > 0) and ($in_name_flag) ) {
            $in_name_flag = FALSE;
        }

        // Assume FORMULA is only one line.
        if ($label == "FORMULA") {
            $formula = trim(substr($linestr,12));
        }

        if ($label == "PATHWAY") {
            // at the first line of PATHWAY data field.
            $path_string = trim(substr($linestr,12)) . " ";
            $in_path_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_path_flag) ) {
            // at the 2nd, 3rd, etc. line of the PATHWAY data field.
            $path_string .= trim(substr($linestr,12)) . " ";
        }
        elseif ( (strlen($label) > 0) and ($in_path_flag) ) {
            // at the first line of the next data section (or end of entry).
            $pathway = trim($path_string);
            $paths = preg_split("/PATH:/", $pathway, -1, PREG_SPLIT_NO_EMPTY);
            array_walk($paths, "trim_element");
            $aPaths = array();
            foreach($paths as $path) {
                $path_id = trim(left($path,10));
                $path_desc = trim(substr($path,10));
                $path_array = array($path_id, $path_desc);
                $aPaths[] = $path_array;
            }
            // set the flag to indicate end of PATHWAY data field.
            $in_path_flag = FALSE;
        }

        // REACTION field is like ENZYME, it's made up of one or more lines
        // containing REACTION IDs (of fixed length) separated by spaces.
        if ($label == "REACTION") {
            $react_string = trim(substr($linestr,12)) . " ";
            $in_react_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_react_flag) ) {
            $react_string .= trim(substr($linestr,12)) . " ";
        }
        elseif ( (strlen($label) > 0) and ($in_react_flag) ) {
            $react = trim($react_string);
            $aReactions = preg_split("/\s/", $react, -1, PREG_SPLIT_NO_EMPTY);
            array_walk($aReactions, "trim_element");
            $in_react_flag = FALSE;
        }

        // Assume that ENZYME data field is always one line.
        if ($label == "ENZYME") {
            $enzyme_string = trim(substr($linestr,12)) . " ";
            $in_enzyme_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_enzyme_flag) ) {
            $enzyme_string .= trim(substr($linestr,12)) . " ";
        }
        elseif ( (strlen($label) > 0) and ($in_enzyme_flag) ) {
            $enzyme = trim($enzyme_string);
            $aEnzymes = preg_split("/\s/", $enzyme, -1, PREG_SPLIT_NO_EMPTY);
            array_walk($aEnzymes, "trim_element");
            $in_enzyme_flag = FALSE;
        }

        // It appears there are no STRUCTURE/S entries in the 'COMPOUND' file.

        if ($label == "DBLINKS") {
            $aDblinks = array();
            $dblink = preg_split("/: /", trim(substr($linestr,12)), -1, PREG_SPLIT_NO_EMPTY);
            array_walk($dblink, "trim_element");
            $aDblinks[] = $dblink;
            $in_dblink_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_dblink_flag) ) {
            // at the 2nd, 3rd, etc. line of the DBLINK data field.
            $dblink = preg_split("/: /", trim(substr($linestr,12)), -1, PREG_SPLIT_NO_EMPTY);
            array_walk($dblink, "trim_element");
            $aDblinks[] = $dblink;
        }
        elseif ( (strlen($label) > 0) and ($in_dblink_flag) ) {
            $in_dblink_flag = FALSE;
        }

        if ($label == '///') break;
    } // CLOSES outermost while() loop

    // Save parsed values to a new Compound object
    $oCompound = new Compound();
    $oCompound->entry = $entry;
    $oCompound->name = $aNames;
    $oCompound->formula = $formula;
    $oCompound->pathway = $aPaths;
    $oCompound->reaction = $aReactions;
    $oCompound->enzyme = $aEnzymes;
    $oCompound->dblink = $aDblinks;
    return $oCompound;
}

// Each enzyme file should end with a blank line after the last record terminator ('///'),
// otherwise, the parser will skip over the last entry of the file which does not follow
// this format/rule.
function parse_enzyme_kegg($flines) {
    // Initialization of variables.
    $in_name_flag = FALSE;      $aNames = array();       $name_string = "";
    $in_sysname_flag = FALSE;   $sysname_string = "";
    $in_react_flag = FALSE;     $react_string = "";      $aReactions = array();
    $in_sub_flag = FALSE;       $sub_string = "";
    $in_prod_flag = FALSE;      $prod_string = "";       $aProducts = array();
    $in_comm_flag = FALSE;      $comm_string = "";
    $in_ref_flag = FALSE;       $ref_string = "";        $aRefs = array();   $inner = array();
    $in_path_flag = FALSE;      $path_string = "";       $aPaths = array();
    $in_dblink_flag = FALSE;    $aDblinks = array();
    $in_struct_flag = FALSE;    $struct_string = "";     $aStructs = array();
    $in_ortho_flag = FALSE;     $ortho_string = "";      $aOrthologs = array();
    $in_genes_flag = FALSE;     $genes_string = "";      $aGenes = array();
    $in_dis_flag = FALSE;       $dis_string = "";        $aDiseases = array();
    $in_motif_flag = FALSE;     $motif_string = "";      $aMotifs = array();

    while( list($lineno, $linestr) = each($flines) ) { // OPENS outermost while() loop
        $label = trim(left($linestr, 12));

        // Assume that ENTRY is always one line.
        if ($label == "ENTRY") $entry = trim(substr($linestr, 12));

        // NAME entry is made up of one or more names, the preferred name is at
        // the first line, other alternative names are in succeeding lines. It
        // is possible for a long name to occupy two or more lines. But for now,
        // let's assume one name in one line.
        if ($label == "NAME") {
            $aNames = array();
            $aNames[] = trim(substr($linestr,12));
            $in_name_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_name_flag) ) {
            $aNames[] = trim(substr($linestr,12));
        }
        elseif ( (strlen($label) > 0) and ($in_name_flag) ) {
            $in_name_flag = FALSE;
        }

        // CLASS field - first line is the class, 2nd is subclass, 3rd is
        // the sub-subclass. Store as an array.
        if ($label == "CLASS") {
            $aClasses = array();
            $aClasses[] = trim(substr($linestr,12));
            $in_class_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_class_flag) ) {
            $aClasses[] = trim(substr($linestr,12));
        }
        elseif ( (strlen($label) > 0) and ($in_class_flag) ) {
            $in_class_flag = FALSE;
        }

        // SYSNAME field may be one or more lines. SYSNAME follows conventions
        // for multiple lines as stated in the Ligand Manual (e.g. uses $ sign).
        if ($label == "SYSNAME") {
            $sysname_string = trim(substr($linestr,12));
            $in_sysname_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_sysname_flag) ) {
            if (left(trim(substr($linestr,12)),1) == '$')
                $sysname_string .= trim(substr($linestr,13));
            else
                $sysname_string .= " " . trim(substr($linestr,12));
        }
        elseif ( (strlen($label) > 0) and ($in_sysname_flag) ) {
            $sysname = trim($sysname_string);
            $in_sysname_flag = FALSE;
        }

        // REACTION field may be one or more lines. Format is different from
        // the REACTION field in 'COMPOUNDS' file. Follows the '$' convention
        // for multiple lines. When doing a scan(?) of $aReactions array, it
        // splits each string into one or more lines (\n) but the value is not
        // affected (it's still one string).
        if ($label == "REACTION") {
            $react_string = substr($linestr,12);
            $in_react_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_react_flag) ) {
            if (left(trim(substr($linestr,12)),1) == '$')
                $react_string .= substr($linestr,13);
            else
                $react_string .= " " . substr($linestr,12);
        }
        elseif ( (strlen($label) > 0) and ($in_react_flag) ) {
            $reaction = trim($react_string);
            $aReactions = preg_split("/;/", $reaction, -1, PREG_SPLIT_NO_EMPTY);
            array_walk($aReactions, "trim_element");
            $in_react_flag = FALSE;
        }

        // SUBSTRATE field - contains entries found in the LEFT SIDE of the
        // REACTION equation (before the '=' sign). Items are to be stored
        // in an array. Parsing technique will be a combination of those
        // used for CLASS and REACTION (array of possibly multi-line entries,
        // following the "$ convention").
        /* Get one line. Store in tempstring.
           Get next line. Check if first char is '$'.
           If it is, concatenate to tempstring.
           If not, then it's a new entry, store tempstring to array of substrate entries.
        */
        if ($label == "SUBSTRATE") {
            $aSubstrates = array();
            $sub_string = substr($linestr,12);
            $in_sub_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_sub_flag) ) {
            if (left(trim(substr($linestr,12)),1) == '$') {
                // current line is a continuation of the previous line(s).
                $sub_string .= substr($linestr,13);
            }
            else {
                // $aSubstrates[] = trim($sub_string);
                $sub_string = substr($linestr,12);
            }
        }
        elseif ( (strlen($label) > 0) and ($in_sub_flag) ) {
            $aSubstrates[] = trim($sub_string);
            $in_sub_flag = FALSE;
        }

        // PRODUCT field - contains entries found in the RIGHT SIDE of the
        // REACTION field. Parsed the same way as SUBSTRATE.
        if ($label == "PRODUCT") {
            $aProducts = array();
            $prod_string = substr($linestr,12);
            $in_prod_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_prod_flag) ) {
            if (left(trim(substr($linestr,12)),1) == '$') {
                // current line is a continuation of the previous line(s).
                $prod_string .= substr($linestr,13);
            }
            else {
                // $aProducts[] = trim($prod_string);
                $prod_string = substr($linestr,12);
            }
        }
        elseif ( (strlen($label) > 0) and ($in_prod_flag) ) {
            $aProducts[] = trim($prod_string);
            $in_prod_flag = FALSE;
        }

        // COMMENT field may be one or more lines
        if ($label == "COMMENT") {
            $comm_string .= trim(substr($linestr,12)) . " ";
            $in_comm_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_comm_flag) ) {
            // at the 2nd, 3rd, etc. line of the COMMENT data field.
            $comm_string .= trim(substr($linestr,12)) . " ";
        }
        elseif ( (strlen($label) > 0) and ($in_comm_flag) ) {
            $comment = trim($comm_string);
            $in_comm_flag = FALSE;
        }

        // REFERENCE field - skip this for now.
        /*
        REFERENCE   1  Branden, G.-I., Jornvall, H., Eklund, H. and Furugren, B.
                       Alcohol dehydrogenase. In: Boyer, P.D. (Ed.), The Enzymes,
                       3rd ed., vol. 11, Academic Press, New York, 1975, p. 103-190.
                    2  [UI:77115786] Jornvall, H. Differences between alcohol
                       dehydrogenases. Structural properties and evolutionary aspects.
                       Eur. J. Biochem. 72 (1977) 443-452.
                    3  Negelein, E. and Wulff, H.-J. Diphosphopyridinproteid ackohol,
                       acetaldehyd. Biochem. Z. 293 (1937) 351-389.
                    4  Sund, H. and Theorell, H. Alcohol dehydrogenase. In: Boyer, P.D.,
                       Lardy, H. and Myrback, K. (Eds.), The Enzymes, 2nd ed., vol. 7,
                       Academic Press, New York, 1963, p. 25-83.
                    5  Theorell, H. Kinetics and equilibria in the liver alcohol
                       dehydrogenase system. Adv. Enzymol. Relat. Subj. Biochem. 20
                       (1958) 31-49.

        Sample output:
        ( 0 => (ID => [UI:xxx],
                AUTHORS => (Branden, G.I., Jornvall, H., Eklund, H., Furugren, B.),
                TITLE => Alcohol dehydrogenase, ... ),
          1 => (ID => [UI:xxx],
                AUTHORS => (Branden, G.I., Jornvall, H., Eklund, H., Furugren, B.),
                TITLE => Alcohol dehydrogenase, ... )
        )
        */
        if ($label == "REFERENCE") {
            $ref_ctr = 0;
            $aRefs = array();
            $inner = array();
            // we are at the start of a reference set and the first reference entry.
            $ref_string = trim(substr($linestr,12));
            $ref_array = preg_split("/[\s]+/", $ref_string);
            $inner["ID"] = $ref_array[1];
            // $aRefs[] = $inner;
            $ref_string = "";
            $in_ref_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_ref_flag) ) {
            $current_string = trim(substr($linestr,12));
            $ref_array = preg_split("/[\s]+/", $current_string);
            $first_token = $ref_array[0];
            if (is_numeric($first_token) == TRUE) {
                // we are at the start of a reference (entry) within a reference set.
                $ref_ctr++;
                $inner["REFSTR"] = $ref_string;
                $aRefs[] = $inner;
                $inner = array();
                $ref_string = "";
            }
            else $ref_string .= $current_string . " ";
        }
        elseif ( (strlen($label) > 0) and ($in_ref_flag) ) {
            $ref_ctr++;
            $inner["REFSTR"] = $ref_string;
            $aRefs[] = $inner;
            $ref_string = "";
            $in_ref_flag = FALSE;
            $tempo = array();
            foreach($aRefs as $refno => $ref_array) { // opens FOREACH
                $ref_string = $ref_array["REFSTR"];
                $ref_string_len = strlen($ref_string);
                $period_flag = FALSE;
                $space_flag = FALSE;
                $author_string = "";
                $refstr = preg_split("/\.\s[^a]/", $ref_string, -1, PREG_SPLIT_NO_EMPTY);
                $pos = strpos($ref_string, $refstr[1]);
                $rest_ref = substr($ref_string, $pos-1);
                $authors = $refstr[0] . ".";
                $authors_array = preg_split("/\.\,/", $authors, -1, PREG_SPLIT_NO_EMPTY);
                array_walk($authors_array, "trim_element");
                $auth_len = count($authors_array);
                $ctr = 0;
                $temp = array();
                foreach($authors_array as $auth) {
                    $ctr++;
                    if ($ctr == $auth_len) {
                        $last_auth = preg_split("/\sand\s/", $auth, -1, PREG_SPLIT_NO_EMPTY);
                        $temp[] = $last_auth[0];
                        $temp[] = $last_auth[1];
                    }
                    else {
                        $temp[] = $auth . ".";
                    }
                }
                $tempo["ID"] = $ref_array["ID"];
                $tempo["AUTHORS"] = $temp;
                $tempo["REST_REF"] = $rest_ref;
                $outer[] = $tempo;
            } // closes FOREACH
        }

        // PATHWAY field - may be one or more lines.
        if ($label == "PATHWAY") {
            // at the first line of PATHWAY data field.
            $path_string = trim(substr($linestr,12)) . " ";
            $in_path_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_path_flag) ) {
            // at the 2nd, 3rd, etc.
            // line of the PATHWAY data field.
            $path_string .= trim(substr($linestr,12)) . " ";
        }
        elseif ( (strlen($label) > 0) and ($in_path_flag) ) {
            // at the first line of the next data section (or end of entry).
            $pathway = trim($path_string);
            $paths = preg_split("/PATH:/", $pathway, -1, PREG_SPLIT_NO_EMPTY);
            array_walk($paths, "trim_element");
            $aPaths = array();
            foreach($paths as $path) {
                $path_id = trim(left($path,10));
                $path_desc = trim(substr($path,10));
                $path_array = array($path_id, $path_desc);
                $aPaths[] = $path_array;
            }
            // set the flag to indicate end of PATHWAY data field.
            $in_path_flag = FALSE;
        }

        if ($label == "STRUCTURES") {
            $struct_string = trim(substr($linestr, 12));
            $aStructs = preg_split("/: /", $struct_string, -1, PREG_SPLIT_NO_EMPTY);
            $struct_db = $aStructs[0];
            $struct_string = $aStructs[1];
            $in_struct_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_struct_flag) ) {
            $struct_string .= " " . trim(substr($linestr,12));
        }
        elseif ( (strlen($label) > 0) and ($in_struct_flag) ) {
            $aStructs = preg_split("/\s/", $struct_string, -1, PREG_SPLIT_NO_EMPTY);
            $in_struct_flag = FALSE;
        }

        if ($label == "DBLINKS") {
            $aDblinks = array();
            $dblink = preg_split("/: /", trim(substr($linestr,12)), -1, PREG_SPLIT_NO_EMPTY);
            array_walk($dblink, "trim_element");
            $aDblinks[] = $dblink;
            $in_dblink_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_dblink_flag) ) {
            // at the 2nd, 3rd, etc. line of the DBLINK data field.
            $dblink = preg_split("/: /", trim(substr($linestr,12)), -1, PREG_SPLIT_NO_EMPTY);
            array_walk($dblink, "trim_element");
            $aDblinks[] = $dblink;
        }
        elseif ( (strlen($label) > 0) and ($in_dblink_flag) ) {
            $in_dblink_flag = FALSE;
        }

        /*
        ORTHOLOG    KO: K01967  3-methylcrotonyl-CoA carboxylase
                    KO: K01968  3-methylcrotonyl-CoA carboxylase alpha subunit
                    KO: K01969  3-methylcrotonyl-CoA carboxylase beta subunit
        PATHWAY     PATH: MAP00010  Glycolysis / Gluconeogenesis
                    PATH: MAP00071  Fatty acid metabolism
        */
        if ($label == "ORTHOLOG") {
            // at the first line of ORTHOLOG data field.
            $ortho_string = trim(substr($linestr,12)) . " ";
            $in_ortho_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_ortho_flag) ) {
            // at the 2nd, 3rd, etc. line of the ORTHOLOG data field.
            $ortho_string .= trim(substr($linestr,12)) . " ";
        }
        elseif ( (strlen($label) > 0) and ($in_ortho_flag) ) {
            // at the first line of the next data section (or end of entry).
            $ortholog = trim($ortho_string);
            $orthos = preg_split("/KO:/", $ortholog, -1, PREG_SPLIT_NO_EMPTY);
            array_walk($orthos, "trim_element");
            $aOrthologs = array();
            foreach($orthos as $ortho) {
                $ortho_id = trim(left($ortho,8));
                $ortho_desc = trim(substr($ortho,8));
                $ortho_array = array($ortho_id, $ortho_desc);
                $aOrthologs[] = $ortho_array;
            }
            // set the flag to indicate end of ORTHOLOG data field.
            $in_ortho_flag = FALSE;
        }

        /*
        MLO: mll6011 mll6012 mll7727 mll7731
        SME: SMb21122(mccA) SMb21124(mccB)
        ATU: Atu3478(mccB) Atu3479(mccA)
        ATC: AGR_L_2704 AGR_L_2706

        For now, assume one organism (and its 3-letter code) : one line : one or more
        genes separated by whitespace(s).
        */
        if ($label == "GENES") {
            $aGenes = array();
            $genes_string = trim(substr($linestr,12));
            $genes_array = preg_split("/: /", $genes_string, -1, PREG_SPLIT_NO_EMPTY);
            $org_code = $genes_array[0];
            $genes_codes = $genes_array[1];
            $genes_array = preg_split("/\s/", $genes_codes, -1, PREG_SPLIT_NO_EMPTY);
            $aGenes[] = array($org_code, $genes_array);
            $in_genes_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_genes_flag) ) {
            // at the 2nd, 3rd, etc. line of the GENES data field.
            $genes_string = trim(substr($linestr,12));
            $genes_array = preg_split("/: /", $genes_string, -1, PREG_SPLIT_NO_EMPTY);
            $org_code = $genes_array[0];
            $genes_codes = $genes_array[1];
            $genes_array = preg_split("/\s/", $genes_codes, -1, PREG_SPLIT_NO_EMPTY);
            $aGenes[] = array($org_code, $genes_array);
        }
        elseif ( (strlen($label) > 0) and ($in_genes_flag) ) {
            $in_genes_flag = FALSE;
        }

        // DISEASE data field - code will work even if one entry (beginning with MIM:) occupies 2
        // or more lines before the next MIM: is found.
        if ($label == "DISEASE") {
            $aDiseases = array();
            // at the first line of DISEASE data field.
            $dis_string = trim(substr($linestr,12)) . " ";
            $in_dis_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_dis_flag) ) {
            // at the 2nd, 3rd, etc. line of the DISEASE data field.
            $dis_string .= trim(substr($linestr,12)) . " ";
        }
        elseif ( (strlen($label) > 0) and ($in_dis_flag) ) {
            // at the first line of the next data section (or end of entry).
            $disease = trim($dis_string);
            $diseases = preg_split("/MIM:/", $disease, -1, PREG_SPLIT_NO_EMPTY);
            array_walk($diseases, "trim_element");
            $aDiseases = array();
            foreach($diseases as $dis) {
                $dis_id = trim(left($dis,8));
                $dis_desc = trim(substr($dis,8));
                $dis_array = array($dis_id, $dis_desc);
                $aDiseases[] = $dis_array;
            }
            // set the flag to indicate end of DISEASE data field.
            $in_dis_flag = FALSE;
        }

        /* Assumption: When broken into two or more lines, a motif is to be reconstructed by
           concatenating the different lines WITHOUT a whitespace in between lines. Other
           multi-line data fields are connected with whitespaces in between lines.
           MOTIF       PS: PS01055  K-[LIVM]-D-G-[LIVM]-[SA]-x(4)-Y-x(2)-G-x-[LF]-x(4)-
                                    [ST]-R-G-[DN]-G-x(2)-G-[DE]-[DENL]
                       PS: PS01056  [IV]-G-[KR]-[ST]-G-x-[LIVM]-[STNK]-x-[VT]-x(2)-L-x-
                                    [PS]-[IV]
                       PS: PS50172  BRCT domain profile
                       PS: PS00061  [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-
                                    {PC}-[SAGFYR]-[LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-
                                    [LIVMFYWGAPTHQ]-[GSACQRHM]
        */
        if ($label == "MOTIF") {
            $motif_string = trim(substr($linestr,12));
            $in_motif_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_motif_flag) ) {
            $motif_string .= trim(substr($linestr,12));
        }
        elseif ( (strlen($label) > 0) and ($in_motif_flag) ) {
            $motif = trim($motif_string);
            $motifs = preg_split("/PS:/", $motif, -1, PREG_SPLIT_NO_EMPTY);
            array_walk($motifs, "trim_element");
            $aMotifs = array();
            foreach($motifs as $motif) {
                $motif_id = trim(left($motif,9));
                $motif_desc = trim(substr($motif,9));
                $motif_array = array($motif_id, $motif_desc);
                $aMotifs[] = $motif_array;
            }
            $in_motif_flag = FALSE;
        }

        if ($label == '///') break;
    } // CLOSES outermost while() loop

    $oEnzyme = new Enzyme();
    $oEnzyme->entry = $entry;
    $oEnzyme->name = $aNames;
    $oEnzyme->class = $aClasses;
    $oEnzyme->sysname = $sysname;
    $oEnzyme->reaction = $aReactions;
    $oEnzyme->substrate = $aSubstrates;
    $oEnzyme->product = $aProducts;
    $oEnzyme->comment = $comment;
    $oEnzyme->reference = $outer;
    $oEnzyme->pathway = $aPaths;
    $oEnzyme->structures = $aStructs;
    $oEnzyme->dblinks = $aDblinks;
    $oEnzyme->ortholog = $aOrthologs;
    $oEnzyme->genes = $aGenes;
    $oEnzyme->disease = $aDiseases;
    $oEnzyme->motif = $aMotifs;
    return $oEnzyme;
}

function parse_ligand_kegg() {
}

function parse_mol_kegg() {
}

// This has the weird behavior of not reading the last entry/record if the
// file doesn't end with a blank line (below the END OF RECORD line). Fix
// this later.
function parseall($files, $object, $format) {
    $aoObj = array();
    foreach($files as $fname) {
        $fp = fopen($fname, "r");
        if ($fp == FALSE) die("Cannot open $fname!");
        $flines = array();
        $eor = get_eor($format);
        $funcname = get_parsefunc($object, $format);
        while(1) {
            $linestr = fgets($fp, 141);
            if (feof($fp) == TRUE) break;
            $flines[] = $linestr;
            if (trim($linestr) == $eor) {
                // Call the parser by name (a variable function) instead of building a string for eval().
                $aoObj[] = $funcname($flines);
                $flines = array();
            }
        }
        fclose($fp);
    }
    return $aoObj;
}

function get_eor($format) {
    $format = strtoupper($format);
    switch($format) {
        case "GENBANK": return '//';
        case "GB": return '//';
        case "SWISSPROT": return '//';
        case "SWP": return '//';
        case "KEGG": return '///';
        case "NCBILIT": return '--------------------------------------------------------';
    }
}

// $files is an array of name of all the files to be parsed.
function parseall_reaction_kegg($files) {
    $aoReactions = array();
    foreach($files as $fname) {
        $fp = fopen($fname, "r");
        if ($fp == FALSE) die("Cannot open $fname!");
        $flines = array();
        while(1) {
            $linestr = fgets($fp, 141);
            if (feof($fp) == TRUE) break;
            $flines[] = $linestr;
            if (trim($linestr) == '///') {
                $aoReactions[] = parse_reaction_kegg($flines);
                $flines = array();
            }
        }
        fclose($fp);
    }
    return $aoReactions;
}

// We will require the webpage(s) to pass this function an assoc array of
// values, where the keys correspond to the name of fields in a REACTION
// record in UPPERCASE. Example keys: ENTRY, DEFINITION, EQUATION, etc.
// Format: $fields = array("ENTRY" => value, "NAME" => value, ...)
function write_reaction_kegg($fields) {
    $result = str_pad("ENTRY", 12) . $fields["ENTRY"];
    $result .= "\n";
    if ( strlen(trim($fields["NAME"])) > 0 )
        $result .= str_pad("NAME",12) . linechop(rtrim($fields["NAME"]), 68, str_pad("$", 13, " ", STR_PAD_LEFT));
    // no need to put a "\n" because linechop puts one at the end already
    $result .= str_pad("DEFINITION",12) .
        linechop(rtrim($fields["DEFINITION"]), 68, str_pad("$", 13, " ", STR_PAD_LEFT));
    // no need to put a "\n" because linechop puts one at the end already
    $result .= str_pad("EQUATION",12) . linechopw($fields["EQUATION"], 68, " ");
    if ( (count($fields["PATHWAY"]) > 0) or (strlen($fields["PATHWAY_ADD"]) > 0) ) {
        $pway_add_tokens = preg_split("/::/", $fields["PATHWAY_ADD"], -1, PREG_SPLIT_NO_EMPTY);
        if ( (count($pway_add_tokens) == 0) and (strlen($fields["PATHWAY_ADD"]) > 0) )
            // there is exactly one PATHWAY entry in the text box
            $pway_add_tokens = array($fields["PATHWAY_ADD"]);
        $new_pway = array_merge($fields["PATHWAY"], $pway_add_tokens);
        $result .= str_pad("PATHWAY",12) . fmt_pway($new_pway, " ");
    }
    // The second test checks ENZYME_ADD (a string), mirroring the PATHWAY
    // branch above; $fields["ENZYME"] is an array and has no string length.
    if ( (count($fields["ENZYME"]) > 0) or (strlen($fields["ENZYME_ADD"]) > 0) ) {
        // Example: $fields["ENZYME_ADD"] = "1.1.1.1 2.2.2.2 3.3.3.3"
        //          $enz_add_tokens = array("1.1.1.1", "2.2.2.2", ... )
        $enz_add_tokens = preg_split("/\s+/", $fields["ENZYME_ADD"], -1, PREG_SPLIT_NO_EMPTY);
        $new_enzyme = array_merge($fields["ENZYME"], $enz_add_tokens);
        $result .= str_pad("ENZYME",12) . multientry_pad($new_enzyme, " ", 16, STR_PAD_RIGHT, 3, " ");
        $result .= "\n";
    }
    $result .= "///";
    return $result;
}

/*
PATHWAY     PATH: MAP00290  Valine, leucine and isoleucine biosynthesis
            PATH: MAP00650  Butanoate metabolism
            PATH: MAP00010  Glycolysis / Gluconeogenesis
*/
function fmt_pway($aPways, $prefix = "") {
    $result = "";
    $line_ctr = 0;
    foreach($aPways as $pway) {
        $line_ctr++;
        $pway_tokens = preg_split("/\s+/", $pway, -1, PREG_SPLIT_NO_EMPTY);
        array_walk($pway_tokens, "trim_element");
        $pway_line = "PATH: " . $pway_tokens[0] . " " . $pway_tokens[1] . "\n";
        if ($line_ctr == 1) $result .= $pway_line;
        else $result .= $prefix . $pway_line;
    }
    return $result;
}

function parse_reaction_kegg($flines) {
    // Initialization of variables.
    $entry_ctr = 0;
    $in_name_flag = FALSE;
    $in_def_flag = FALSE;
    $in_eq_flag = FALSE;
    $in_path_flag = FALSE;
    $in_enzyme_flag = FALSE;

    // Note: Should this function be modified to parse more than one record,
    // each of these lines should be placed whenever the end of a multi-line
    // data field is detected.
    $name_string = "";
    $def_string = "";
    $eq_string = "";
    $path_string = "";

    while( list($lineno, $linestr) = each($flines) ) { // OPENS outermost while() loop
        $entry_ctr++;
        $label = trim(left($linestr, 12));

        // Assume that ENTRY is always one line.
        if ($label == "ENTRY") $entry = trim(substr($linestr, 12));

        // NAME field may be one or more lines. Uses the $ sign notation.
        // June 5, 2003: Inside this IF stmt, changed all trim() to rtrim().
        if ($label == "NAME") {
            $name_string = rtrim(substr($linestr,12));
            $in_name_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_name_flag) ) {
            if (left(trim(substr($linestr,12)),1) == '$')
                $name_string .= rtrim(substr($linestr,13));
            else
                $name_string .= " " . rtrim(substr($linestr,12));
        }
        elseif ( (strlen($label) > 0) and ($in_name_flag) ) {
            $name = rtrim($name_string);
            $in_name_flag = FALSE;
        }

        /* DEFINITION data field - multi-word, multi-line, single entry.
           Follows the $ convention for line continuation.
           Example: DEFINITION  sn-Glycerol 3-phosphate + H2O <=> Glycerol + Orthophosphate
           Syntax:  DEFINITION  subs1 + subs2 + ... <=> prod1 + prod2 + ...
        */
        if ($label == "DEFINITION") {
            $def_string .= rtrim(substr($linestr,12));
            $in_def_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_def_flag) ) {
            if (left(trim(substr($linestr,12)),1) == '$')
                $def_string .= rtrim(substr($linestr,13));
            else
                $def_string .= " " . rtrim(substr($linestr,12));
        }
        elseif ( (strlen($label) > 0) and ($in_def_flag) ) {
            $def_string = rtrim($def_string);
            $in_def_flag = FALSE;
        }

        /* The ligand user manual says DEFINITION and EQUATION are mandatory
           and that the latter immediately follows the former. EQUATION label
           signals end of DEFINITION field.
        */
        if ($label == "EQUATION") {
            // trim "accumulated string" and store in $definition variable.
            $definition = trim($def_string);
            // set flag to FALSE to indicate we've exited the DEFINITION data field.
            $in_def_flag = FALSE;
            $eq_string = trim(substr($linestr,12)) . " ";
            $in_eq_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_eq_flag) ) {
            // at the 2nd, 3rd, etc. line of the EQUATION data field.
            $eq_string .= trim(substr($linestr,12)) . " ";
        }
        elseif ( (strlen($label) > 0) and ($in_eq_flag) ) {
            // at the first line of the next data section (or end of entry).
            $equation = trim($eq_string);
            $in_eq_flag = FALSE;
        }

        // PATHWAY data field may have one or more lines.
        if ($label == "PATHWAY") {
            // at the first line of PATHWAY data field.
            $path_string = trim(substr($linestr,12)) . " ";
            $in_path_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_path_flag) ) {
            // at the 2nd, 3rd, etc. line of the PATHWAY data field.
            $path_string .= trim(substr($linestr,12)) . " ";
        }
        elseif ( (strlen($label) > 0) and ($in_path_flag) ) {
            // at the first line of the next data section (or end of entry).
            $pathway = trim($path_string);
            $paths = preg_split("/PATH:/", $pathway, -1, PREG_SPLIT_NO_EMPTY);
            array_walk($paths, "trim_element");
            $aPaths = array();
            foreach($paths as $path) {
                $path_id = trim(left($path,10));
                $path_desc = trim(substr($path,10));
                $path_array = array($path_id, $path_desc);
                $aPaths[] = $path_array;
            }
            // set the flag to indicate end of PATHWAY data field.
            $in_path_flag = FALSE;
        }

        // ENZYME data field may be one or more lines.
        if ($label == "ENZYME") {
            $enzyme_string = trim(substr($linestr,12)) . " ";
            $in_enzyme_flag = TRUE;
        }
        elseif ( (strlen($label) == 0) and ($in_enzyme_flag) ) {
            $enzyme_string .= trim(substr($linestr,12)) .
                " ";
        }
        elseif ( (strlen($label) > 0) and ($in_enzyme_flag) ) {
            $enzyme = trim($enzyme_string);
            $aEnzymes = preg_split("/\s/", $enzyme, -1, PREG_SPLIT_NO_EMPTY);
            array_walk($aEnzymes, "trim_element");
            $in_enzyme_flag = FALSE;
        }

        if ($label == '///') break;
    } // CLOSES outermost while() loop

    // Save parsed values to a new Reaction object
    $oReaction = new Reaction();
    $oReaction->entry = $entry;
    $oReaction->name = $name;
    $oReaction->definition = $definition;
    $oReaction->pathway = $aPaths;
    $oReaction->equation = $equation;
    $oReaction->enzyme = $aEnzymes;
    return $oReaction;
}

/*
RULES USED IN GENERATING KEGG REACTION FILES FROM WEB INPUT

EXAMPLE ENTRY:

ENTRY       R00841
NAME        Glycerol-3-phosphate phosphohydrolase
DEFINITION  sn-Glycerol 3-phosphate + H2O <=> Glycerol + Orthophosphate
EQUATION    C00093 + C00001 <=> C00116 + C00009
PATHWAY     PATH: MAP00561  Glycerolipid metabolism
ENZYME      3.1.3.21
///

RULES FOR CONTINUATION LINES FOR SELECTED DATA FIELDS/ENTRIES:

1) The name of an enzyme or a chemical compound is sometimes too long to
   fit in one line, in which case continuation lines are used. The
   continuation line is indicated by the dollar sign ($) on column 13.
   Note that a long name is simply separated into two lines without any
   hyphenation. This rule applies to the following data items: NAME,
   SYSNAME, REACTION, SUBSTRATE, PRODUCT, and DEFINITION.

RULES FOR FORMING VALID REACTION RECORDS:

1) A Kegg reaction record consists of six (6) fields:
   a) ENTRY
   b) NAME
   c) DEFINITION
   d) EQUATION
   e) PATHWAY
   f) ENZYME
2) Each field has an entry label and entry data. The field labels are the
   same as above.
3) All field labels occupy positions 1 to 12 of the entry's first line.
   Succeeding lines do not have field labels, and are instead left blank.
4) All field data occupy positions 13 to 80.

The following describes each data field in a REACTION record:

5) ENTRY
   Example: ENTRY       R00841
   5.1.
        ENTRY data consists of a SINGLE WORD on a SINGLE LINE representing
        the reaction accession number and is mandatory for all REACTION
        records.
   5.2. OBS: The reaction accession number is a 6-character code. The first
        character is the letter "R", the 2nd to 6th are numeric digits (0-9).
   5.3. OBS: The reaction accession number is incremented by 1 in
        successive records (although a few exceptions have been observed.)

6) NAME
   Example: NAME        Glycerol-3-phosphate phosphohydrolase
   6.1. The NAME data item contains MULTIPLE WORDS on MULTIPLE LINES
        containing the recommended name. The $ convention for line
        continuations is followed for this entry.
        Note: MULTIPLE = one or more
              WORD = a string of one or more non-whitespace characters.
   6.2. This field is optional.

7) DEFINITION
   Example: DEFINITION  sn-Glycerol 3-phosphate + H2O <=> Glycerol + Orthophosphate
   Syntax:  DEFINITION  subs1 + subs2 + ... <=> prod1 + prod2 + ...
            where subs is a substrate compound name,
                  prod is a product compound name.
   7.1. MAN: The DEFINITION data item contains the chemical reaction in the
        form of an equation; substrates and products are separated by
        '<=>', and each compound in substrates and products is separated
        by ' + '. There may be a coefficient before the compound name.
        This item is mandatory for all entries.
   7.2. From 7.1, we can conclude that this is MULTI-WORD and MULTI-LINE.
   7.3. The DEFINITION entry follows the $ convention for line
        continuations.
   7.4. From 7.1, this entry contains the following special WORDS: +, <=>

8) EQUATION
   Example: EQUATION    C00093 + C00001 <=> C00116 + C00009
   Syntax:  EQUATION    compid1 + compid2 + ... <=> compid1' + compid2' + ...
   8.1. MAN: The EQUATION data item also contains the chemical reaction in
        the form of an equation. This item represents the chemical
        compounds by their compound IDs, whereas the names of the
        compounds are used in the DEFINITION data item. This item is
        mandatory for all entries.
   8.2. From 8.1, this entry is MULTI-WORD.
   8.3. OBS: This entry is a SPACED MULTI-LINE.
   8.4. From 8.1, this entry contains the following special WORDS: +, <=>

9) PATHWAY
   Example: PATHWAY     PATH: MAP00220  Urea cycle and metabolism of amino groups
                        PATH: MAP00910  Nitrogen metabolism
                        PATH: MAP00791  Atrazine degradation
   Syntax:  PATHWAY     PATH: [ PATH: ... ]
   9.1. MAN: The PATHWAY data item contains the link information to the
        KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway database:
        the pathway map accession number followed by the description. By
        clicking on this number in the WWW version of DBGET, the pathway
        diagram is displayed.
   9.2. OBS: The pathway map accession number is of the form MAP99999,
        where each 9 represents a digit from 0-9.
   9.3. From 9.1, one PATHWAY entry contains the following special WORD:
        "PATH:"
   9.4. This is a MULTI-WORD, MULTI-LINE and MULTI-ENTRY field. It follows
        the ONE LINE-ONE ENTRY rule.
   9.5. This field is OPTIONAL.

10) ENZYME
   Examples: ENZYME      3.1.3.21
             ENZYME      1.2.3.- 1.2.3.13
   Syntax:   ENZYME      [ ... ] [ ... ]
   10.1. MAN: The ENZYME data item contains the link information to the
         ENZYME section. The enzyme entries that catalyse the
         corresponding reaction are listed by their EC numbers.
   10.2. From 10.1, this is MULTI-WORD.
   10.3. OBS: From the parser code (which is based on OBS), it appears that
         this is MULTI-ENTRY, MULTI-LINE. It follows the
         ONE-LINE-MULTI-ENTRY rule, with ONE OR MORE SPACES as delimiter.
         The number of entries in a single line is indeterminate at this
         time.
   10.4. This field is OPTIONAL.

11) The END-OF-RECORD (EOR) marker
   11.1. MAN: The end-of-entry data item marks the end of the entry. It is
         denoted by the identifier consisting of three consecutive
         slashes, '///'. This item is mandatory for all entries.
   11.2. As a corollary to 11.1, the /// is mandatory even for the last
         entry. Also, it is a good idea to place an extra blank line after
         the last ///.
*/
?>
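
The listing notes that parseall() skips the last record when the file does not end with a blank line. The following is only a sketch of one possible way around that, not part of kegg.inc.php: it assumes the cause is the feof() test placed right after fgets(), and loops on the return value of fgets() instead. The names read_records() and demo_entry_only() are hypothetical, and call_user_func() is swapped in for the eval() call used in parseall().

```php
<?php
// Hypothetical helper (not in kegg.inc.php): splits a flat file into records
// on an end-of-record marker and hands each record's lines to a parser.
function read_records($fname, $eor, $parsefunc)
{
    $fp = fopen($fname, "r");
    if ($fp === false) {
        die("Cannot open $fname!");
    }
    $records = array();
    $flines = array();
    // Test the return value of fgets() rather than feof(): a final line
    // with no trailing newline is still returned before fgets() gives FALSE.
    while (($linestr = fgets($fp)) !== false) {
        $flines[] = $linestr;
        if (trim($linestr) == $eor) {
            // Call the parser by name; no eval() string is needed.
            $records[] = call_user_func($parsefunc, $flines);
            $flines = array();
        }
    }
    fclose($fp);
    return $records;
}

// Stand-in parser for this demonstration only: returns the ENTRY value.
function demo_entry_only($flines)
{
    foreach ($flines as $linestr) {
        if (trim(substr($linestr, 0, 12)) == "ENTRY") {
            return trim(substr($linestr, 12));
        }
    }
    return null;
}

// Two records; the file ends right after the last "///", with no newline.
$tmp = tempnam(sys_get_temp_dir(), "kegg");
file_put_contents($tmp, "ENTRY       R00841\n///\nENTRY       R00842\n///");
$entries = read_records($tmp, "///", "demo_entry_only");
unlink($tmp);
print_r($entries);
```

With the real parsers, the third argument would be a name such as "parse_reaction_kegg". Whether this matches the behavior wanted for the other formats handled by get_eor() has not been checked.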

Copyright © 2003 by Sergio Gregorio, Jr.
All rights reserved.