GenPerl consists of a collection of OO Perl modules and an API for interacting with GenPerl objects. Functionality currently implemented in the GenPerl API is mainly related to managing the persistence of GenPerl objects in a relational database. This includes simple create, read, update and delete operations as well as more complex querying functionality. There is also a small amount of analysis functionality implemented including the formatting of files for analysis by commonly used genetic analysis software.
GenPerl is designed to facilitate the development of Perl scripts for applications related to genetic analysis. As such, it does not include ready-to-use programs or friendly graphical user interfaces (GUIs). Instead, GenPerl provides reusable Perl modules that facilitate writing Perl scripts for the storage, formatting and analysis of a wide range of genetic data. GenPerl enables the development of applications that can process and analyze large quantities of data in customized ways that are typically difficult or impossible with large GUI-based systems.
The original version of GenPerl was developed in the Research Department of Genomica Corp., where it was used to produce research prototypes and to manage and analyze data related to research projects with which Genomica was involved.
GenPerl should be considered alpha software.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Steve Mathias
Genomica Corporation
1745 38th Street
Boulder, CO 80301
The following convention is observed in the naming of the attributes and relationships of genperl objects:
The tables below detail the internal structure of the GenPerl objects. When creating and updating objects, there are many instances where it is necessary to reference other objects. This is done in a general way via IDs or importIDs. IDs can only be used to reference objects that have already been saved to the database, and have thus been given a database ID. Otherwise, importIDs must be used. In short: to reference an object that already exists in the database, use its ID; to reference an object that is being imported as part of the same import session, use its importID.
Internally, all object references are anonymous hashes with the following key/value structure:
        {  name  => String 
           id/importID  => Integer/String }
The name is optional and is allowed for convenience. Either an id or an importID is required and identifies the referenced object. See the methods in Genetics::API::DB::Insert for details on how importIDs are saved and used.
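The reference structure described above can be checked with a few lines of plain Perl. The following is a minimal sketch only: check_object_ref is a hypothetical helper, not part of GenPerl, and it assumes nothing beyond the rule stated above (name optional, an id or an importID required, ids being integers).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of GenPerl): returns a description of the
# problem, or undef if the hash follows the reference convention.
sub check_object_ref {
    my ($ref) = @_;
    return "not a hash reference" unless ref($ref) eq 'HASH';
    my $has_id       = defined $ref->{id};
    my $has_importID = defined $ref->{importID};
    return "needs an id or an importID" unless $has_id || $has_importID;
    return "id must be an integer" if $has_id && $ref->{id} !~ /^\d+$/;
    return undef;    # the reference looks well formed
}

# One reference that follows the convention and one that does not.
for my $ref ({ name => 'JXPed1-1', importID => 12 }, { name => 'JXPed1-2' }) {
    my $problem = check_object_ref($ref);
    print defined $problem ? "invalid: $problem\n" : "ok\n";
}
```

A check like this is only a convenience before import; the API's own handling of references is described in Genetics::API::DB::Insert.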
Keep in mind that genperl objects may be instantiated and used without necessarily being stored in a genperl database. Therefore, Genetics::Object and its subclasses do not enforce the data type and format constraints required if the objects are to be saved to the genperl schema via the API. Any additional format or size requirements of the API are noted parenthetically in the tables below.
Genetics::Object is the abstract superclass for all GenPerl objects.
The GenPerl objects with which one interacts directly are:
| Required Attributes | ||
|---|---|---|
| Name | Data type and Format | Description | 
| name | String (<= 120 characters) | The name of the object. | 
| id | Integer | A database generated id. This should only be used for updating and deleting objects. | 
| OR | ||
| importID | String | An identifier that should be unique within an import session. These get stored in the database as keywords. | 
| Optional Attributes | ||
| Name | Data type and Format | Description | 
| dateCreated | String ("YYYY-MM-DD") | |
| dateModified | String ("YYYY-MM-DD") | This field is maintained by the database and is ignored on import and update. | 
| url | String (< 65638 bytes) | |
| comment | String (< 65638 bytes) | |
| NameAliases | Array pointer to a list of hash pointers.  The referenced hashes should have the following key/value structure: name => String (<= 120 characters) comment => String (< 120 characters) | |
| Contact | Hash pointer.  The referenced hash should have the following key/value structure: name => String (<= 120 characters) organization => String (<= 120 characters) comment => String (< 65638 bytes) | |
| DBXReferences | Array pointer to a list of hash pointers.  The referenced hashes should have the following key/value structure: accessionNumber => String (<= 32 characters) databaseName => String (<= 32 characters) schemaName => String (<= 120 characters) comment => String (< 65638 bytes) | |
| Keywords | Array pointer to a list of hash pointers.  The referenced hashes should have the following key/value structure: name => String (<= 32 characters) dataType => String (<= 32 characters) value => String (<= 32 characters) description => String (<= 255 characters) | |
| Required Attributes | ||
|---|---|---|
| Name | Data type and Format | Description | 
| clusterType | String ("Mixed", "Subject", "Kindred", "Marker", "SNP", "Genotype", "StudyVariable", "Phenotype", "HaplotypeMarkerCollection", "Haplotype", "Map", "FrequencySource", "DNASample", or "TissueSample") | The type of object the Cluster references. | 
| Optional Attributes | ||
| Name | Data type and Format | Description | 
| Contents | Array pointer to a list of object references. | These are the references to the Cluster's contents. Be careful when making references by importID to objects already saved to the database. This will likely have unpredictable results unless all the ImportID Keywords in the database are unique. | 
Example:
$cluster = new Genetics::Cluster(name => 'All Subjects',
				 importID => 271,
				 dateCreated => $today,
				 clusterType => "Subject", 
				 comment => "All Subjects imported from DMs Study JX", 
				 Keywords => [ {name => "Test Data", 
						dataType => "Boolean", 
						value => 1}, 
					     ], 
				 Contents=> [ {name => "JXPed1-1", importID => 12}, 
					      {name => "JXPed1-2", importID => 32}, 
					      {name => "JXPed1-3", importID => 22},
					      {name => "JXPed2-1", importID => 42},
					      {name => "JXPed2-2", importID => 43},
					      {name => "JXPed2-3", importID => 44},
					    ],
				) ;
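The examples in this document pass a $today variable for dateCreated without showing where it comes from. The following is a minimal sketch of one way to produce a string in the "YYYY-MM-DD" format, using the core POSIX module; the variable name simply mirrors the examples and is otherwise an assumption.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Today's local date in the "YYYY-MM-DD" format used for dateCreated.
my $today = strftime('%Y-%m-%d', localtime);
print "$today\n";
```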
| Optional Attributes | ||
|---|---|---|
| Name | Data type and Format | Description | 
| dateCollected | String ("YYYY-MM-DD") | Date the sample was collected/prepared. | 
| amount | Float (6,3) | |
| amountUnits | String ("g", "mg", "ug", or "ng") | |
| concentration | Float (6,3) | |
| concUnits | String ("mg/ml", "ug/ml", "ug/ul", or "ng/ul") | |
| Subject | Subject reference | Reference to the Subject from which the Sample is derived. | 
| Genotypes | Array pointer to a list of Genotype references | Reference(s) to Genotype(s) derived from the Sample. | 
Example:
$sample = new Genetics::DNASample(name => 'SM20.1-3',
				  importID => 272,
				  dateCreated => $today,
				  comment => "Third attempt to get DNA from this Sample", 
				  Keywords => [ {name => "Test Data", 
						 dataType => "Boolean", 
						 value => 1}, 
					      ], 
				  dateCollected => "2001-01-18",
				  amount => 3.26,
				  amountUnits => "mg",
				  concentration => 1.1,
				  concUnits => "mg/ml",
				  Subject => {name => 'EAPed20.1',
					      importID => 12},
				  Genotypes => [ {name => '1-D12S91',
						  importID => 13},
						 {name => '1-EAEx1.1',
						  importID => 14},
					       ],
				 ) ;
| Optional Attributes | ||
|---|---|---|
| Name | Data type and Format | Description | 
| ObsAlleleFrequencies | Array pointer to a list of hash pointers.  The referenced hashes should have the following key/value structure: Allele => { Marker => Marker Reference, name => String (<= 4 characters), type => String ("Code", "Size", "RepeatNumber", "Nucleotide", or "Undefined") } frequency => Float | |
| ObsHtFrequencies | Array pointer to a list of hash pointers.  The referenced hashes should have the following key/value structure: Haplotype => Haplotype Reference frequency => Float | |
Example:
$fs = new Genetics::FrequencySource(name => 'WICGR SNP Freqs',
				    importID => 270,
				    dateCreated => $today,
				    Keywords => [ {name => "Test Data", 
						   dataType => "Boolean", 
						   value => 1}, 
						], 
				    ObsAlleleFrequencies => [ {Allele => {Marker => {name => "EAEx1.1", 
										     importID => 265}, 
									  name => "T", 
									  type => "Nucleotide"}, 
							       frequency => "0.64",
							      },
							      {Allele => {Marker => {name => "EAEx1.1", 
										     importID => 265}, 
									  name => "C", 
									  type => "Nucleotide"}, 
							       frequency => "0.36",
							      }
							    ], 
				    ObsHtFrequencies => [ {Haplotype => {name => '12pEA1.2',
									 importID => 269,},
							   frequency => 1.00,
							  }
							]
				   ) ;
| Required Attributes | ||
|---|---|---|
| Name | Data type and Format | Description | 
| isActive | Boolean (1 or 0) | |
| Subject | Subject Reference | |
| Marker | Marker Reference | |
| AlleleCalls | Array pointer to a list of hash pointers.  The referenced hashes should have the following key/value structure: alleleName => String (<= 4 characters) alleleType => String ("Code", "Size", "RepeatNumber", "Nucleotide", or "Undefined") phase => String ("Unknown", "Maternal", or "Paternal") AssayAttrs => Array pointer to a list of AssayAttribute hash pointers. | The AssayAttributes in here apply to individual AlleleCalls. | 
| Optional Attributes | ||
| Name | Data type and Format | Description | 
| icResult | String ("Pass", "Fail", "Ambiguous", or "Unknown") | Inheritance check result. | 
| dateCollected | String ("YYYY-MM-DD") | |
| Sample | DNASample Reference | |
| AssayAttrs | Array pointer to a list of AssayAttribute hash pointers. | These are AssayAttributes that apply to the Genotype as a whole. | 
Example:
$gt = new Genetics::Genotype(name => '1-D12S91',
			     importID => 13,
			     dateCreated => $today,
			     Keywords => [ {name => "Test Data", 
					    dataType => "Boolean", 
					    value => 1}, 
					 ], 
			     isActive => 1,
			     icResult => "Pass",
			     dateCollected => "1993-11-13",
			     Subject => {name => "EAPed20.1", importID => 12},
			     Marker => {name => "D12S91", importID => 1},
			     AssayAttrs => [ {name => "lab",
					      dataType => "String",
					      value => "Lab 6"}, 
					     {name => "machineID",
					      dataType => "String",
					      value => "ABC1234"}
					   ],
			     AlleleCalls => [ {alleleName => 3, 
					       alleleType => "Code", 
					       phase => "Maternal", 
					       AssayAttrs => [ {name => "peakHeight",
								dataType => "Number",
								value => 367} ]
					      },
					      {alleleName => 1, 
					       alleleType => "Code", 
					       phase => "Paternal", 
					       AssayAttrs => [ {name => "peakHeight",
								dataType => "Number",
								value => 435} ]
					      },
					    ]
			    ) ;
| Required Attributes | ||
|---|---|---|
| Name | Data type and Format | Description | 
| MarkerCollection | HtMarkerCollection Reference | |
| Alleles | Array pointer to a list of hash pointers.  The referenced hashes should have the following key/value structure: name => String (<= 4 characters) type => String ("Code", "Size", "RepeatNumber", "Nucleotide", or "Undefined") | |
Example:
$ht = new Genetics::Haplotype(name => '12pEA1.1',
			      importID => 268,
			      dateCreated => $today,
			      Keywords => [ {name => "Test Data", 
					     dataType => "Boolean", 
					     value => 1}, 
					  ], 
			      MarkerCollection => {name => "12pEA1", importID => 267},
			      Alleles => [ {name => 2, type => "Code"}, 
					   {name => "C", type => "Nucleotide"} ], 
			     ) ;
| Required Attributes | ||
|---|---|---|
| Name | Data type and Format | Description | 
| Markers | Array pointer to a list of hash pointers.  The referenced hashes should have the following key/value structure: Marker => Marker References distToNext => Float (10,5) | The order of the Marker references should reflect the map order of the markers, and thus the corresponding allele order in the haplotypes derived from the marker collection. | 
| Optional Attributes | ||
| Name | Data type and Format | Description | 
| distanceUnits | String ("cM", "bp", "Kb", "Mb", "cR", "cR3000", "cR10000", or "Theta") | |
Example:
$hmc = new Genetics::HtMarkerCollection(name => '12pEA1',
					importID => 267,
					dateCreated => $today,
					Keywords => [ {name => "Test Data", 
						       dataType => "Boolean", 
						       value => 1}, 
						    ], 
					Markers => [ {Marker => {name => "D12S91", 
                                                                 importID => 1}, 
						      distToNext => "1.2"}, 
						     {Marker => {name => "EAEx1.1", 
						                 importID => 265},
						     }
						   ], 
					distanceUnits => "cM",
				       ) ;
| Required Attributes | ||
|---|---|---|
| Name | Data type and Format | Description | 
| isDerived | Boolean (1 or 0) | |
| Optional Attributes | ||
| Name | Data type and Format | Description | 
| Subjects | Array pointer to a list of Subject References | |
| DerivedFrom | Kindred Reference | |
Example:
$kindred = new Genetics::Kindred(name => 'JXPed2',
				 importID => 45,
				 dateCreated => $today,
				 comment => "Litt et al. (1994)", 
				 Keywords => [ {name => "Test Data", 
						dataType => "Boolean", 
						value => 1}, 
					       {name => "Disease", 
						dataType => "String", 
						value => "Episodic Ataxia"} ], 
				 NameAliases => [ {name => "Ped20", 
						   contactName => "J.P. Morgan"}, ], 
				 Subjects => [ {name => "EAPed20.1", importID => 42}, 
					       {name => "EAPed20.1000", importID => 43},
					       {name => "EAPed20.1001", importID => 44},
					     ], 
				) ;
| Required Attributes | ||
|---|---|---|
| Name | Data type and Format | Description | 
| orderingMethod | String ("Relative" or "Global") | Default value is Relative. | 
| distanceUnits | String ("cM", "bp", "Kb", "Mb", "cR", "cR3000", "cR10000", or "Theta") | Default value is cM | 
| OrderedMapElements | Array pointer to a list of hash pointers.  The referenced hashes should have the following key/value structure: Marker => Marker References distance => Float (10,5) | The order of the OME/Marker references should reflect the map order of the markers. | 
Example:
$map = new Genetics::Map(name => 'Chr12 2 PO',
			 importID => 121,
			 dateCreated => $today,
			 comment => "Stupid 2-marker map.", 
			 Keywords => [ {name => "Test Data", 
					dataType => "Boolean", 
					value => 1}, 
				     ], 
			 chromosome => "12",
			 orderingMethod => "Relative",
			 distanceUnits => "cM",
			 Organism => {genusSpecies => "Pongo pongo"},
			 OrderedMapElements => [ {SeqObj => {name => "D12S91", importID => 1},
						  distance => 1.3},
						 {SeqObj => {name => "EAEx1.1", importID => 265}}
					       ],
			) ;
| Required Attributes | ||
|---|---|---|
| Name | Data type and Format | Description | 
| malePloidy | Integer (<= 99) | Default value = 2. | 
| femalePloidy | Integer (<= 99) | Default value = 2. | 
| Optional Attributes | ||
| Name | Data type and Format | Description | 
| chromosome | String (<= 8 characters) | |
| polymorphismType | String () | |
| polymorphismIndex1 | Integer (<= 16777215) | |
| polymorphismIndex2 | Integer (<= 16777215) | |
| repeatSequence | String (<= 8 characters) | |
| Alleles | Array pointer to a list of hash pointers.  The referenced hashes should have the following key/value structure: name => String (<= 4 characters) type => String ("Code", "Size", "RepeatNumber", "Nucleotide", or "Undefined") | |
| Sequence | Hash pointer.  The referenced hash should have the following key/value structure: sequence => String (<= 65535 characters) length => Integer (<= 16777215) lengthUnits => String ("bp", "Kb", or "Mb") | This is the DNA sequence associated with the marker. | 
| ISCNMapLocations | Array pointer to a list of hash pointers.  The referenced hashes should have the following key/value structure: chrNumber => String (<= 8 characters) chrArm => String (<= 8 characters) band => String (<= 16 characters) bandingMethod => String (<= 32 characters) | These represent cytogenetic map locations associated with the marker. | 
| Organism | Hash pointer.  The referenced hash should have the following key/value structure: genusSpecies => String (<= 255 characters) subspecies => String (<= 120 characters) strain => String (<= 120 characters) | |
Example:
$marker = new Genetics::Marker(name => 'D12S91',
			       importID => 1,
			       dateCreated => $today,
			       comment => "Marker in EA critical region",
			       NameAliases => [ {name => "d12s6666"}
					      ], 
			       Contact => {name => "Jean Weissenbach",
					   organization => "Genethon"}, 
			       DBXReferences => [ {accessionNumber => "NT_009758.3",
						   databaseName => "GenBank"}
						],
			       Keywords => [ {name => "Test Data", 
					      dataType => "Boolean", 
					      value => 1},
					   ], 
			       chromosome => "12", 
			       malePloidy => 2,
			       femalePloidy => 2,
			       polymorphismType => "Repeat", 
			       polymorphismIndex1 => 7, 
			       polymorphismIndex2 => 11, 
			       repeatSequence => "CA",
			       Organism => {genusSpecies => "Homo sapiens"},
			       Alleles => [ {name => 1, type => "Code"}, 
					    {name => 2, type => "Code"},
					    {name => 3, type => "Code"},
					    {name => 4, type => "Code"},
					    {name => 5, type => "Code"},
					    {name => 6, type => "Code"} ],
			       ISCNMapLocations => [ {chrNumber => "12", 
						      chrArm => "p", 
						      band => "12.2.1", 
						      bandingMethod => "Giemsa"}, ],
			       Sequence => {lengthUnits => "bp",
					    length => 21,
					    sequence => "ACGTUMRCACAWSYKVHDBXN"},
			      ) ;
| Required Attributes | ||
|---|---|---|
| Name | Data type and Format | Description | 
| value | Float (12,5) OR String ("YYYY-MM-DD") OR Integer (<= 99) | The datatype of the value depends on the format of the StudyVariable with which the Phenotype is associated. | 
| isActive | Boolean (0 or 1) | |
| Subject | Subject Reference | |
| StudyVariable | StudyVariable Reference | |
| Optional Attributes | ||
| Name | Data type and Format | Description | 
| dateCollected | String ("YYYY-MM-DD") | |
| AssayAttrs | Array pointer to a list of AssayAttribute hash pointers. | |
Example:
$pt = new Genetics::Phenotype(name => 'JXPed1-1-Age',
			      importID => 266,
			      dateCreated => $today,
			      dateCollected => "1987-03-17",
			      Keywords => [ {name => "Test Data", 
					    dataType => "Boolean", 
					    value => 1}, 
					 ], 
			      Subject => {name => "JXPed1-1", importID => 12},
			      StudyVariable => {name => "Age", importID => 444},
			      AssayAttrs => [ {name => "Clinic Name",
					       dataType => "String",
					       value => "Sister of Gracious Mercy and Hope"},
					    ],
			      value => 12,
			      isActive => 1,
			     ) ;
| Required Attributes | ||
|---|---|---|
| Name | Data type and Format | Description | 
| snpType | String () | |
| functionClass | String () | |
| isConfirmed | Boolean (0 or 1) | Default value = 1. | 
| malePloidy | Integer (<= 99) | Default value = 2. | 
| femalePloidy | Integer (<= 99) | Default value = 2. | 
| Optional Attributes | ||
| Name | Data type and Format | Description | 
| chromosome | String (<= 8 characters) | |
| snpIndex | Integer (<= 16777215) | |
| confirmMethod | String (<= 255 characters) | |
| Alleles | Array pointer to a list of hash pointers.  The referenced hashes should have the following key/value structure: name => String (<= 4 characters) type => String ("Code", "Size", "RepeatNumber", "Nucleotide", or "Undefined") | |
| Sequence | Hash pointer.  The referenced hash should have the following key/value structure: sequence => String (<= 65535 characters) length => Integer (<= 16777215) lengthUnits => String ("bp", "Kb", or "Mb") | This is the DNA sequence associated with the marker. | 
| ISCNMapLocations | Array pointer to a list of hash pointers.  The referenced hashes should have the following key/value structure: chrNumber => String (<= 8 characters) chrArm => String (<= 8 characters) band => String (<= 16 characters) bandingMethod => String (<= 32 characters) | These represent cytogenetic map locations associated with the marker. | 
| Organism | Hash pointer.  The referenced hash should have the following key/value structure: genusSpecies => String (<= 255 characters) subspecies => String (<= 120 characters) strain => String (<= 120 characters) | |
Example:
$snp = new Genetics::SNP(name => 'EAEx1.1',
			 importID => 265,
			 dateCreated => $today,
			 Keywords => [ {name => "Test Data", 
					dataType => "Boolean", 
					value => 1},
					   ], 
			 chromosome => "12", 
			 malePloidy => 2,
			 femalePloidy => 2,
			 snpType => "Substitution",
			 functionClass => "Synonymous",
			 snpIndex => 2,
			 isConfirmed => 1,
			 confirmMethod => "Resequencing",
			 comment => "Highly expressed in muscle",
			 Organism => {genusSpecies => "Homo sapiens"},
			 Alleles => [ {name => "C", type => "Nucleotide"}, 
				      {name => "T", type => "Nucleotide"} ],
			 Sequence => {lengthUnits => "bp",
				      length => 17,
				      sequence => "ACGTUMRWSYKVHDBXN"},
			 ISCNMapLocations => [ {chrNumber => "12", 
						chrArm => "p", 
						band => "12.2", 
						bandingMethod => "Giemsa"}, 
					     ],
			) ;
| Required Attributes | ||
|---|---|---|
| Name | Data type and Format | Description | 
| category | String ("Trait", "StaticAffectionStatus", "StaticLiabilityClass", "DynamicAffectionStatus", "Environment", or "Treatment") | Default value = "Trait". DynamicAffectionStatus variables are not yet implemented. | 
| format | String ("Number", "Code", "Date", "DerivedNumber", or "DerivedCode") | The value of format determines the datatype of the Phenotype values associated with the StudyVariable. | 
| isXLinked | Boolean (1 or 0) | Default value = 0 | 
| Optional Attributes | ||
| Name | Data type and Format | Description | 
| description | String (<= 255 characters) | |
| lowerBound | Float (12,5) OR String ("YYYY-MM-DD") | Only appropriate for StudyVariables with a format of Number or Date. | 
| upperBound | Float (12,5) OR String ("YYYY-MM-DD") | Only appropriate for StudyVariables with a format of Number or Date. | 
| Codes | Array pointer to a list of hash pointers.  The referenced hashes should have the following key/value structure: code => Integer (<= 99) description => String (<= 255 characters) formula => String (<= 65535 characters) | Only appropriate for StudyVariables with a format of Code or DerivedCode. A formula is required for each Code definition if the format is DerivedCode | 
| AffStatDef | Hash pointer.  The referenced hash should have the following key/value structure: name => String (<= 120 characters) diseaseAlleleFreq => Float (7,6) pen11 => Float (7,6) pen12 => Float (7,6) pen22 => Float (7,6) AffStatElements => Array reference to a list of anonymous hashes (see Example). | Only appropriate for StudyVariables with a category of DynamicAffectionStatus.  DynamicAffectionStatus variables are not yet implemented; the data in this field is stored in the database, but it is not used for anything. When DynamicAffectionStatus variables are implemented, the AffStatDef will automatically map phenotype values onto a trait locus for genetic linkage analysis. | 
| LCDef | Hash pointer.  The referenced hash should have the following key/value structure: name => String (<= 120 characters) LiabilityClasses => Array reference to a list of anonymous hashes (see Example). | Only appropriate for StudyVariables with a category of DynamicAffectionStatus.  DynamicAffectionStatus variables are not yet implemented; the data in this field is stored in the database, but it is not used for anything. When DynamicAffectionStatus variables are implemented, the LCDef will automatically map phenotype values into liability classes for genetic linkage analysis. | 
Example:
$sv = new Genetics::StudyVariable(name => 'EA Aff Stat',
				  importID => 445,
				  dateCreated => $today,
				  Keywords => [ {name => "Test Data", 
						 dataType => "Boolean", 
						 value => 1}, 
					      ], 
				  description => "EA Trait Locus", 
				  category => "AffectionStatus", 
				  format => "Code", 
				  isXLinked => 0, 
				  Codes => [ {code => 0,
					      description => "Unknown EA Status"},
					     {code => 1,
					      description => "EA Unaffected"},
					      {code => 2,
					      description => "EA Affected"},
					   ], 
				  AffStatDef => {name => 'EA',
						 diseaseAlleleFreq => 0.001,
						 pen11 => 0.0,
						 pen12 => 0.0,
						 pen22 => 1.0,
						 AffStatElements => [ {code => 0,
								       type => "Unknown",
								       formula => "'EA Aff Stat' = 0"}, 
								      {code => 1,
								       type => "Unaffected",
								       formula => "'EA Aff Stat' = 1"}, 
								      {code => 2,
								       type => "Affected",
								       formula => "'EA Aff Stat' = 2"}, 
								    ],
						},
				  LCDef => {name => 'EA Default LC',
					    LiabilityClasses => [ {code => 0,
								   description => "Unknown Age",
								   pen11 => 0.0,
								   pen12 => 0.0,
								   pen22 => 1.0,
								   formula => "'Age' = ''"}, 
								  {code => 1,
								   description => "Age less than 40",
								   pen11 => 0.0,
								   pen12 => 0.2,
								   pen22 => 1.0,
								   formula => "'Age' < 40"}, 
								  {code => 2,
								   description => "Age less than 50",
								   pen11 => 0.0,
								   pen12 => 0.3,
								   pen22 => 1.0,
								   formula => "'Age' < 50"}, 
								  {code => 3,
								   description => "Age greater than or equal to 60",
								   pen11 => 0.0,
								   pen12 => 0.4,
								   pen22 => 1.0,
								   formula => "'Age' >= 60"}, 
								    ],
						},
				 ) ;
| Required Attributes | ||
|---|---|---|
| Name | Data type and Format | Description | 
| gender | String ("Unknown", "Male", "Female", or "Both") | |
| Optional Attributes | ||
| Name | Data type and Format | Description | 
| dateOfBirth | String ("YYYY-MM-DD") | |
| dateOfDeath | String ("YYYY-MM-DD") | |
| isProband | Boolean (1 or 0) | Default value = 0. | 
| Mother | Subject Reference | |
| Father | Subject Reference | |
| Kindred | Kindred Reference | |
| Organism | Hash pointer.  The referenced hash should have the following key/value structure: genusSpecies => String (<= 255 characters) subspecies => String (<= 120 characters) strain => String (<= 120 characters) | |
| Haplotypes | Array pointer to a list of hash pointers.  The referenced hashes should have the following key/value structure: Haplotype => Haplotype References phase => String ("Unknown", "Maternal", or "Paternal") | Haplotype assignments. | 
Example:
$subject = new Genetics::Subject(name => 'JXPed1-1',
				 importID => 12,
				 dateCreated => $today,
				 NameAliases => [ {name => "jb2002", 
						   contactName => "Gregor Mendel"}
						], 
				 Contact => {name => "J.P. Morgan, II", 
					     comment => "Referring physician"}, 
				 DBXReferences => [ {accessionNumber => "abc123",
						     databaseName => "Clinical Land",
						     schemaName => "Master",
						     comment => "Internal Clinical DB"}
						  ],
				 Keywords => [ {name => "Study Population", 
						dataType => "String", 
						description => "Internal study identifier.", 
						value => "Test"},
					       {name => "Foo", 
						dataType => "String", 
						description => "Crap", 
						value => "Bar"},
					       {name => "Test Data", 
						dataType => "Boolean", 
						value => 1}, 
					     ], 
				 gender => "Male",
				 dateOfBirth => "1937-08-18", 
				 dateOfDeath => "1997-02-15",
				 isProband => 1,
				 Mother => {name => "JXPed1-2", importID => 32}, 
				 Father => {name => "JXPed1-3", importID => 22}, 
				 Kindred => {name => "JXPed1", importID => 264}, 
				 Organism => {genusSpecies => "Homo sapiens"},
				) ;
| Required Attributes | ||
|---|---|---|
| Name | Data type and Format | Description | 
| tissue | String (<= 120 characters) | |
| Optional Attributes | ||
| Name | Data type and Format | Description | 
| dateCollected | String ("YYYY-MM-DD") | Date the sample was collected/prepared. | 
| amount | Float (6,3) | |
| amountUnits | String ("g", "mg", "ug", or "ng") | |
| Subject | Subject reference | Reference to the Subject from which the Sample is derived. | 
| Genotypes | Array pointer to a list of Genotype references | Reference(s) to Genotype(s) derived from the Sample. | 
| DNASamples | Array pointer to a list of DNASample references | Reference(s) to DNASample(s) derived from the TissueSample. | 
Example:
$sample = new Genetics::TissueSample(name => 'SM20.1',
				     importID => 273,
				     dateCreated => $today,
				     comment => "Hard to make DNA from this", 
				     Keywords => [ {name => "Test Data", 
						    dataType => "Boolean", 
						    value => 1}, 
						 ], 
				     tissue => "Muscle",
				     dateCollected => "2001-01-18",
				     amount => 150,
				     amountUnits => "g",
				     Subject => {name => 'EAPed20.1',
						 importID => 12},
				     DNASamples => [ {name => 'SM20.1-3',
						      importID => 272},
						   ]
				    ) ;
The GenPerl API functionality is separated into the following packages.
An instance of Genetics::API must be instantiated in order to interact with GenPerl objects in a database (or to access any other API methods, for that matter). If the initialization parameters include a 'DSN' value, the Genetics::API constructor uses its contents to construct a data source name string. This data source name, along with the user and password parameters, is passed directly to DBI->connect to create a database handle. An API instance can also be created without a database connection: just omit the DSN attribute from the arguments passed to the constructor. In this case the user and password are ignored, although this may change in the future.
Example:
$api = new Genetics::API(DSN => {driver => "mysql",
			         host => $Host,
			         database => $Database},
			 user => $UserName,
			 password => $Password) ;
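To make the data source name step more concrete, here is a rough sketch of how a DSN hash like the one above could be turned into a DBI data source name. The build_dsn helper is hypothetical and is not taken from the GenPerl source; the exact string that Genetics::API builds may differ. The "database=...;host=..." form is the one DBD::mysql accepts.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of GenPerl): builds a DBI data source name
# from a hash shaped like the DSN argument shown above.
sub build_dsn {
    my ($dsn) = @_;
    my $str = "dbi:$dsn->{driver}:database=$dsn->{database}";
    $str .= ";host=$dsn->{host}" if defined $dsn->{host};
    return $str;
}

# The host and database names here are placeholders.
my $data_source = build_dsn({ driver   => 'mysql',
                              host     => 'localhost',
                              database => 'genperl' });
print "$data_source\n";

# With DBI and the driver installed, a handle could then be created with:
#   my $dbh = DBI->connect($data_source, $user, $password);
```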
When saving new objects, references between objects may be made either via importIDs (saved as Keywords) or by absolute object IDs. IDs can only be used to reference objects that have already been saved to the database, and have thus been given a database ID. Otherwise, importIDs must be used.
When a new object identified with an importID is saved to the database, an id is generated for that object. However, its importID is also saved as a Keyword (with keywordTypeID = 1). These keywords are the mechanism the API uses to make connections between objects being imported with importIDs. In theory, an object could always be referenced via the importID with which it was identified when it was first imported. The API supports this. However, unless the importIDs you use are unique among all objects ever imported into a given schema, this approach will eventually lead to trouble.
Thus, the best plan is to stick with the following rule. If you want to reference an object that already exists in the database, use ids; if you are making references to objects that are being imported concurrently as part of the same import session, use importIDs.
The various save methods return the database id of the saved object.
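The rule above can be sketched as a small resolver that prefers a database id and falls back to an importID recorded during the current import session. This is only an illustration under stated assumptions: the API itself resolves importIDs through Keywords stored in the database, whereas the sketch uses an in-memory hash as a stand-in, and remember_import and resolve_reference are hypothetical names.

```perl
use strict;
use warnings;

# importID => database id, for objects saved during this import session.
# (Stand-in for the Keyword lookup the API performs in the database.)
my %session_ids;

# Record the id returned by a save method for an object's importID.
sub remember_import {
    my ($import_id, $db_id) = @_;
    $session_ids{$import_id} = $db_id;
}

# Turn a reference hash such as {name => ..., id => ...} or
# {name => ..., importID => ...} into a database id.
sub resolve_reference {
    my ($ref) = @_;
    return $ref->{id} if defined $ref->{id};
    if (defined $ref->{importID}) {
        my $id = $session_ids{ $ref->{importID} };
        die "No object with importID $ref->{importID} saved in this session\n"
            unless defined $id;
        return $id;
    }
    die "Reference needs either an id or an importID\n";
}

remember_import(121, 57);    # pretend the SNP with importID 121 was saved as id 57
print resolve_reference({ name => "SNPxyz",    importID => 121 }), "\n";
print resolve_reference({ name => "EAPed20.1", id       => 1   }), "\n";
```

The id 57 is made up for the illustration; in practice it would be whatever the save method returned.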
Example:
The following code will create a new SNP and a new Genotype. The new Genotype references the new SNP by importID, since the SNP is being imported in the same session, and references by id a Subject that has previously been saved to the database.
$snp = new Genetics::SNP(name => 'SNPxyz',
			 importID => 121,
			 dateCreated => $today,
			 chromosome => "X", 
			 malePloidy => 1,
			 femalePloidy => 2,
			 snpType => "Substitution",
			 functionClass => "Synonymous",
			 snpIndex => 2,
			 isConfirmed => 1,
			 confirmMethod => "Resequencing",
			 Organism => {genusSpecies => "Homo sapiens"},
			 Alleles => [ {name => "C", type => "Nucleotide"}, 
				      {name => "T", type => "Nucleotide"} ],
			) ;
$gt = new Genetics::Genotype(name => '1-SNPxyz',
			     importID => 122,
			     dateCreated => $today,
			     isActive => 1,
			     dateCollected => $today,
			     Subject => {name => "EAPed20.1", id => 1},
			     Marker => {name => "SNPxyz", importID => 121},
			     AlleleCalls => [ {alleleName => "C", 
					       alleleType => "Nucleotide", 
					       phase => "Unknown" 
					      },
					      {alleleName => "T", 
					       alleleType => "Nucleotide", 
					       phase => "Unknown"
					      }
					    ]
			    ) ;
$id = $api->insertSNP($snp) ;
$id = $api->insertGenotype($gt) ;
Currently, objects can be retrieved from the database by id and by name.
Example:
This code will retrieve and print the Subject with id = 1.
    $subject = $api->getSubject(1) ;
    $subject->print() ;
Objects in the database are identified for update by id only.
The methods in this package implement the following update behavior:
- The data in each object field completely replaces the data in the database for that field.
- Data for fields not present in the object is not affected.
- To delete the data for a particular field, set the value of that field to "DELETE" (irrespective of the normal datatype of that field).
- To add to the existing data for a particular field, use an appropriate method in Genetics::API or handle it manually.
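The update rules can be illustrated with plain hashes. The sketch below is not the API's implementation, which works against the database; apply_update is a hypothetical function that applies the same three rules (replace, leave untouched, delete on "DELETE") to an in-memory record.

```perl
use strict;
use warnings;

# Hypothetical illustration of the update rules on an in-memory record.
# $stored  : hash ref standing in for the data currently in the database
# $changes : hash ref holding the fields present in the object being saved
sub apply_update {
    my ($stored, $changes) = @_;
    for my $field (keys %$changes) {
        my $value = $changes->{$field};
        if (defined $value && !ref $value && $value eq 'DELETE') {
            delete $stored->{$field};       # "DELETE" removes the field's data
        }
        else {
            $stored->{$field} = $value;     # anything else replaces it completely
        }
    }
    return $stored;                         # fields not in $changes are untouched
}

my %stored = (
    name    => "SNPxyz",
    comment => "old comment",
    Alleles => [ { name => "C" }, { name => "T" } ],
);
apply_update(\%stored, { name => "SNPabc", comment => "DELETE" });
print join(", ", sort keys %stored), "\n";    # prints Alleles, name
```

The field names and values in %stored are made up for the example.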
Examples:
The following code will retrieve the Subject with id = 3, change its name, and then save the Subject with its new name back to the database.
    $subject = $api->getSubject(3) ;
    $subject->field("name", "foo") ;
    $api->updateSubject($subject) ;
The following code will completely replace the set of alleles associated with the SNP with id = 11:
    @alleles = ( {name => "A", type => "Nucleotide"},
	         {name => "C", type => "Nucleotide"} ) ;
    $snp = $api->getSNP(11) ;
    $snp->field("Alleles", \@alleles) ;
    $api->updateSNP($snp) ;
The following code will add an allele to those associated with the SNP with id = 11:
    $snp = $api->getSNP(11) ;
    $alleleListPtr = $snp->field("Alleles") ;
    push( @$alleleListPtr, {name => "A", type => "Nucleotide"} ) ;
    $snp->field("Alleles", $alleleListPtr) ;
    $api->updateSNP($snp) ;
Currently, objects can be deleted from the database by id only.
Example:
The following code will delete Genotypes:
foreach $id ( @badGenotypeIDs ) {
  $rv = $api->deleteGenotype($id) ;
  defined $rv or print "Error deleting Genotype w/ ID $id\n" ;
}
Example:
The following code ...
Example:
The following code...
In order to manage the persistence of GenPerl objects, the GenPerl API requires a relational database instance in which to store the data. All database interaction in the API is implemented using DBI, and thus the API could, in theory, support any RDBMS for which a DBI driver module exists. Right now, however, the API only works with MySQL. I chose MySQL mainly because it's free, fast, and relatively simple to administer. The API takes advantage of a couple of MySQL features, mainly AUTO_INCREMENT, which are not present in all RDBMSs, so the API code would probably need to be changed in order to use any other database. In addition, a suitable schema must be created within the database instance in which the data will be stored, and the DDL (see below) would need to be altered so that the column data types are ones the other database understands.
The DDL in this script can be used to create an appropriate schema in MySQL.
The sections below contain example code with comments explaining what the code is doing.
#!/usr/local/bin/perl -w
# Create an API instance
use Genetics::API ;
$api = new Genetics::API(DSN => {driver => "mysql",
				 host => $Host,
				 database => $Database},
			 user => "slm",
			 password => GetMysqlPassword()) ;
# Get the StudyVariable on which the Clusters will be based
$sv = $api->getObjectByName("Aff") ;
# Get affected Subjects and create a Cluster
@affSubjects = $api->getSubjectsByPhenotype($sv, 2) ;
$affCluster = $api->createCluster("HT Affecteds", \@affSubjects) ;
# Get unaffected Subjects and create a Cluster
@unaffSubjects = $api->getSubjectsByPhenotype($sv, 1) ;
$unaffCluster = $api->createCluster("Normals", \@unaffSubjects) ;
# Get the Marker whose allele distributions will be tested
$marker = $api->getObjectByName("agtT174M") ;
# Do the test
$api->chiSquareAssocTest($marker, "Nucleotide", $affCluster, $unaffCluster) ;
# This will print something along the lines of:
Allele counts: 
                C       T
HT_Affecteds    873     125
Normals         736     64
The Chi-Square value is 9.66582997149079, with 1 degrees of freedom
The probability that the difference in agtT174M allele distributions between 
HT_Affecteds and Normals is due to chance is less than 0.01.
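The statistic in the sample output can be reproduced from the allele counts alone. The following is a minimal sketch of a Pearson chi-square test of independence on a contingency table, without continuity correction. The chi_square function is a hypothetical standalone routine, not the code behind chiSquareAssocTest, which may be implemented differently.

```perl
use strict;
use warnings;

# Pearson chi-square statistic for a table of observed counts.
# Each argument is an array ref holding one row of the table.
# Returns the statistic and its degrees of freedom.
sub chi_square {
    my (@rows) = @_;
    my (@row_total, @col_total);
    my $n = 0;
    for my $i (0 .. $#rows) {
        for my $j (0 .. $#{ $rows[$i] }) {
            $row_total[$i] += $rows[$i][$j];
            $col_total[$j] += $rows[$i][$j];
            $n             += $rows[$i][$j];
        }
    }
    die "Empty table\n" unless $n;

    my $chi = 0;
    for my $i (0 .. $#rows) {
        for my $j (0 .. $#{ $rows[$i] }) {
            # Expected count under independence of row and column.
            my $expected = $row_total[$i] * $col_total[$j] / $n;
            $chi += ($rows[$i][$j] - $expected) ** 2 / $expected;
        }
    }
    my $df = (@rows - 1) * (@col_total - 1);
    return ($chi, $df);
}

# Allele counts (C, T) for HT_Affecteds and Normals from the output above.
my ($chi, $df) = chi_square([873, 125], [736, 64]);
printf "chi-square = %.4f, df = %d\n", $chi, $df;
```

For these counts the sketch gives a value of about 9.6658 with 1 degree of freedom, in line with the sample output.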
#!/usr/local/bin/perl -w
#Create an API instance
use Genetics::API ;
$api = new Genetics::API(DSN => {driver => "mysql",
				 host => $Host,
				 database => $Database},
			 user => "slm",
			 password => GetMysqlPassword()) ;
# Get the Clusters containing the Subjects whose allele frequencies will be graphed
$affCluster = $api->getObjectByName("HT Affecteds") ;
$unaffCluster = $api->getObjectByName("Normals") ;
# Get the Marker whose allele frequencies will be graphed
$marker = $api->getObjectByName("agtT174M") ;
# Graph the frequencies
$api->graphAlleleFreqs(
		       MARKER => $marker, 
		       FREQSOURCES => [ $affCluster, $unaffCluster ],
		       ALLELETYPE => "Nucleotide"
		      ) ;
# This will display a graph along the lines of:
 
#!/usr/local/bin/perl -w
#Create an API instance
use Genetics::API ;
$api = new Genetics::API(DSN => {driver => "mysql",
				 host => $Host,
				 database => $Database},
			 user => "slm",
			 password => GetMysqlPassword()) ;
# Get a Kindred to be used
$kindred = $api->getObjectByName("Ped20") ;
# Get the founder Subjects in the Kindred.  These will be used as the allele frequency source
push(@founders, $api->getFounders($kindred)) ;
# Get the Markers to be used
@markers = $api->getObjectsByType("Marker") ;
# Get the StudyVariable to be used as the trait locus
$sv = $api->getObjectByName("Aff") ;
# Get the StudyVariable that defines liability classes associated with the trait locus
$lc = $api->getObjectByName("Aff LC") ;
open(DAT, "> ea20.dat") or die "Can't write file: $!" ;
open(PRE, "> ea20.pre") or die "Can't write file: $!" ;
# Write the files
$api->writeLinkageFiles(
			KINDREDS => [ $kindred ], 
			MARKERS => \@markers, 
			AFS => \@founders,
			TRAIT => $sv, 
			LC => $lc,
			DATFILE => \*DAT, 
			PEDFILE => \*PRE, 
		       ) ;
close DAT ;
close PRE ;
# This will produce the following:
Locus file:
10 0 0 5
0 0.0 0.0 0
1 2 3 4 5 6 7 8 9 10
1 2 << affection status
0.9999 0.000100 << allele frequencies
5 << number of liability classes
0.000000 0.000000 0.000000 << penetrance values
0.020000 0.020000 1.000000 << penetrance values
0.040000 0.040000 1.000000 << penetrance values
0.050000 0.050000 1.000000 << penetrance values
0.100000 0.100000 1.000000 << penetrance values
3 6 # D12S91
0.5 0.1667 0.3333 0.0001 0.0001 0.0001 << allele frequencies
3 6 # D12S100
0.0833 0.0833 0.5 0.25 0.0001 0.0833 << allele frequencies
3 7 # CACNL1A1
0.0001 0.0714 0.5 0.0714 0.0714 0.2857 0.0001 << allele frequencies
3 5 # D12S372
0.0001 0.3571 0.3571 0.2143 0.0714 << allele frequencies
3 7 # pY2/1
0.0001 0.25 0.0833 0.0001 0.0001 0.1667 0.5 << allele frequencies
3 9 # pY21/1
0.0001 0.0001 0.1429 0.3571 0.0714 0.2143 0.0001 0.1429 0.0714 << allele frequencies
3 5 # KCNA5
0.3333 0.1667 0.25 0.25 0.0001 << allele frequencies
3 8 # D12S99
0.0714 0.0714 0.0714 0.2857 0.1429 0.2143 0.0714 0.0714 << allele frequencies
3 7 # D12S93
0.1 0.1 0.1 0.2 0.2 0.0001 0.3 << allele frequencies
0 0
0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 << map
1 0.10000 0.45000
Pedigree file:
Ped20 1 1001 1000 1 2 3 3 1 3 6 9 10 5 3 6 2 3 7 3 2 1 3 6 4
Ped20 1001 0 0 1 1 4 3 1 3 3 9 9 5 4 6 6 4 3 4 3 5 1 6 6
Ped20 1000 2002 2001 2 2 4 1 3 4 6 10 10 2 3 1 2 7 7 3 2 6 3 4 7
Ped20 100 1001 1000 1 1 3 1 1 3 4 9 10 5 2 6 1 4 7 3 3 1 6 6 7
Ped20 2002 0 0 1 1 4 1 1 1 4 9 10 2 2 1 5 5 7 3 4 4 6 4 7
Ped20 2001 0 0 2 2 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ped20 1002 2002 2001 1 1 4 1 2 4 5 10 7 2 2 1 1 7 10 3 4 6 5 4 7
Ped20 1006 2002 2001 2 2 4 1 3 4 6 10 10 2 3 1 2 7 7 3 2 6 3 4 7
Ped20 1007 0 0 1 1 3 3 3 4 6 1 8 2 3 0 0 4 7 2 4 3 6 3 4
Ped20 1008 2002 2001 2 2 4 1 3 1 6 9 10 2 3 5 2 5 7 4 2 4 3 4 4
Ped20 1009 0 0 1 1 4 1 2 3 3 2 9 2 3 1 1 4 5 1 2 2 5 1 2
Ped20 199 0 0 2 1 3 0 0 0 0 9 10 3 4 2 5 4 9 0 0 4 4 0 0
Ped20 1010 2002 2001 2 2 4 1 3 1 6 9 10 2 3 5 2 5 7 4 2 4 3 4 4
Ped20 1011 0 0 1 1 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ped20 102 1001 1000 1 2 2 3 3 3 6 9 10 4 3 6 2 4 7 4 2 5 3 6 4
Ped20 103 0 0 2 1 2 2 1 3 2 10 10 4 3 6 6 4 3 3 1 6 4 0 0
Ped20 104 0 0 2 1 3 3 1 4 3 9 9 3 2 6 6 10 5 1 1 8 7 6 3
Ped20 113 1007 1006 2 2 2 3 3 6 4 1 10 3 2 6 1 4 7 2 3 6 3 4 7
Ped20 114 1007 1006 2 2 2 3 3 6 6 8 10 3 3 6 2 4 7 2 2 3 3 4 4
Ped20 115 1009 1008 1 2 2 1 3 3 6 9 10 3 2 1 2 4 7 2 2 5 3 1 4
Ped20 116 1011 1010 1 2 3 3 3 2 6 9 10 3 3 1 2 4 7 1 2 3 3 4 4
Ped20 117 1011 1010 2 1 3 1 3 4 1 8 9 2 3 6 5 5 5 4 4 4 4 4 4
Ped20 9099 100 199 1 1 2 0 0 0 0 9 9 5 3 6 2 4 4 0 0 1 4 0 0
Ped20 9098 100 199 2 0 2 0 0 0 0 10 10 2 4 1 5 7 9 0 0 6 4 0 0
Ped20 9097 100 199 1 1 1 0 0 0 0 9 9 5 3 6 2 4 4 0 0 1 4 0 0
Ped20 9003 102 103 1 2 1 3 2 6 2 10 10 4 3 2 6 7 4 2 3 3 4 4 6
Ped20 9004 102 103 1 1 1 3 1 6 3 10 10 3 3 6 6 4 3 4 1 5 6 4 6
Ped20 9005 1 104 2 2 1 3 1 6 4 10 9 3 3 2 6 7 5 2 1 3 7 4 6
Ped20 9006 1 104 1 0 1 1 1 3 4 9 9 5 3 6 6 3 5 3 1 0 0 0 0
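For readers unfamiliar with the pedigree file, the sketch below splits one record into named fields. It assumes the column layout suggested by the data above (pedigree, individual, father, mother, sex, affection status, liability class, then one pair of alleles per marker); parse_pedigree_record is a hypothetical helper and is not part of Genetics::API.

```perl
use strict;
use warnings;

# Hypothetical helper: split one whitespace-separated pedigree record into
# named fields plus a list of [allele1, allele2] pairs, one per marker.
sub parse_pedigree_record {
    my ($line) = @_;
    my @fields = split ' ', $line;
    my %rec;
    @rec{qw(pedigree individual father mother sex affection liability_class)}
        = splice(@fields, 0, 7);
    die "Odd number of allele columns\n" if @fields % 2;
    my @genotypes;
    push @genotypes, [ splice(@fields, 0, 2) ] while @fields;
    $rec{genotypes} = \@genotypes;
    return \%rec;
}

my $rec = parse_pedigree_record(
    "Ped20 1 1001 1000 1 2 3 3 1 3 6 9 10 5 3 6 2 3 7 3 2 1 3 6 4");
printf "Individual %s: %d marker genotypes, first is %s/%s\n",
    $rec->{individual}, scalar @{ $rec->{genotypes} }, @{ $rec->{genotypes}[0] };
```

For the first record this reports nine marker genotypes, matching the nine markers listed in the locus file.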