Kristian Gray
Title
"New look HGNC gene families"
Home town
Handsworth, Sheffield
Where Benjamin Huntsman developed crucible steel
Highest degree
MSc in Bioinformatics from the University of Manchester
Previous teams
Cancer Genome Project and core software services team at the Sanger Institute
Current role
Scientific programmer within the HGNC
Unusual
Running 10K races and brewing all-grain beer. One does not help the other ;-)
New look HGNC gene families
A whirlwind tour of our new gene family resource
Kristian Gray
What do we mean by gene families?
Biological dictionary
"A group of genes that have arisen by duplication of an ancestral gene. Such genes show similarities of nucleotide sequence ..." (Oxford Dictionary of Biology)
HGNC gene family
A group of genes that share important characteristics such as homology, work within a complex unit or participate in the same process
Many biological dictionaries would define a gene family as "A group of genes that have arisen by duplication of an ancestral gene. Such genes show similarities of nucleotide sequence ...".
However, our gene families could be said to be "A group of genes that share important characteristics such as homology, work within a complex unit or participate in the same process". Many of our gene families do fit the stricter dictionary definition, and the genes within these sets usually share a common root or stem symbol.
Old gene families
Over 10 years ago we started displaying gene families to our users on our site.
Because we were assigning common root symbols to gene family members, it was a natural progression for us to create gene family reports.
This benefits many research groups interested in specific gene families by giving them a way to view and download all the genes associated with a family of interest.
Index
We had an alphabetical gene family index that contained all the families.
Clicking on a family within the index would take you to a family report page written by an HGNC curator within Drupal.
Family with hierarchy
For some families, such as the G protein coupled receptors, a hierarchy was created within Drupal to split the family into subsets. If we envision the families as a tree, node families contained sub-families and leaf families contained genes.
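The tree view described above can be sketched in a few lines of Python. This is an illustration only, not the HGNC implementation: the `Family` class, the `all_genes` helper, and the family names and gene symbols are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Family:
    """A family in the hierarchy (hypothetical model, for illustration only)."""
    name: str
    subfamilies: list[Family] = field(default_factory=list)  # filled on node families
    genes: list[str] = field(default_factory=list)           # filled on leaf families

    def is_leaf(self) -> bool:
        # A leaf family has no sub-families and holds genes directly.
        return not self.subfamilies


def all_genes(family: Family) -> list[str]:
    """Collect the genes of a family and of all its sub-families, depth first."""
    collected = list(family.genes)
    for sub in family.subfamilies:
        collected.extend(all_genes(sub))
    return collected


# Placeholder names, not real HGNC families or gene symbols.
leaf_a = Family("Subfamily A1", genes=["GENE1", "GENE2"])
leaf_b = Family("Subfamily A2", genes=["GENE3"])
root = Family("Family A", subfamilies=[leaf_a, leaf_b])

print(all_genes(root))
```

Walking the tree from a node family gathers the genes of every leaf beneath it, which is the gene set a user interested in the whole family would want.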
Family containing genes
If the hierarchy is only one level deep, the page is likely to display the genes for each of the sub-families.
A Drupal module creates the gene table by querying the DB for a given family.
To download a family gene set, the user can click on the "download gene family data" link for each table.
Problems with the old version
Family data had to be added to multiple locations within Drupal and our MySQL DB.
Database schema wasn't designed to handle hierarchies or genes within multiple families.
Each page was created by hand, with no defined template.
Pages could be more user friendly to navigate.
Users had to download each subset family to get all the genes associated with a family with a hierarchy.
Pages containing many subset families were slow to load.
Difficult to visualise complex hierarchies.
Couldn't search families effectively.
Family data had to be added to both Drupal and our MySQL DB, and the family had to be added to the index separately, which meant a lot of time was spent entering the same data in multiple locations.
The database schema wasn't designed to handle hierarchies or a gene belonging to more than one family. The only place that held hierarchy data was Drupal and the only place that held gene data was the DB, which was not ideal.
Each page was created by hand within Drupal by our curators and didn't have a defined template.
As a result, the pages were not friendly to navigate.
To download all the genes within a family containing a hierarchy, the user had to download a file for each subset family. In the case of the G protein coupled receptors, that is a lot of downloading and file concatenation.
Pages that contained many subsets were slow to load because the module that creates the gene tables was called once per subset, hitting the database many times before the page loaded.
It wasn't easy to visualise the hierarchy for a complex family such as the G protein coupled receptors.
And users couldn't search families effectively.
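To make the schema and download problems above concrete, here is a minimal sketch of one way a database could hold both the hierarchy and gene membership, so that all genes under a family come back from a single query. It is not the actual HGNC schema: the table names (`family`, `family_hierarchy`, `gene_family`), the `genes_in_family` helper and the data are hypothetical, and an in-memory SQLite database stands in for MySQL only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE family (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Parent/child links keep the hierarchy in the database itself.
    CREATE TABLE family_hierarchy (
        parent_id INTEGER NOT NULL REFERENCES family(id),
        child_id  INTEGER NOT NULL REFERENCES family(id),
        PRIMARY KEY (parent_id, child_id)
    );
    -- A join table lets one gene belong to several families.
    CREATE TABLE gene_family (
        gene_symbol TEXT NOT NULL,
        family_id   INTEGER NOT NULL REFERENCES family(id),
        PRIMARY KEY (gene_symbol, family_id)
    );
""")

# Placeholder data, not real HGNC families or gene symbols.
conn.executemany("INSERT INTO family VALUES (?, ?)",
                 [(1, "Family A"), (2, "Subfamily A1"), (3, "Subfamily A2")])
conn.executemany("INSERT INTO family_hierarchy VALUES (?, ?)", [(1, 2), (1, 3)])
conn.executemany("INSERT INTO gene_family VALUES (?, ?)",
                 [("GENE1", 2), ("GENE2", 2), ("GENE2", 3), ("GENE3", 3)])


def genes_in_family(conn, family_id):
    """Return every gene in a family and its sub-families using one query."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(id) AS (
            SELECT ?
            UNION
            SELECT h.child_id
            FROM family_hierarchy h JOIN subtree s ON h.parent_id = s.id
        )
        SELECT DISTINCT g.gene_symbol
        FROM gene_family g JOIN subtree s ON g.family_id = s.id
        ORDER BY g.gene_symbol
    """, (family_id,)).fetchall()
    return [row[0] for row in rows]


print(genes_in_family(conn, 1))
```

In this sketch a gene that sits in two subsets (`GENE2`) is listed once for the parent family, and the caller makes one database round trip instead of one per subset.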
Solution?
Rethink & redevelop a new gene families resource.
Family index
Family containing sub-families - title
Family containing sub-families - map
Family containing sub-families - description & subsets
Family containing sub-families - genes within all subsets
Family containing sub-families - useful information and downloads
Sub family containing genes
Family without a hierarchy
Search all example
Search families only
New HGNC BioMart server for genes and families
Acknowledgments