>> good morning. and thank you for joining us today for the nci cbiit speaker series. i'm tony kerlavage, head of the informatics program here at cbiit. i'll remind you that today's presentation is being recorded and will be available on the wiki for the speaker series as a screencast with voiceover, and also posted on the speaker series youtube playlist. if you google the nci speaker series you'll find that wiki page. information about our future speakers is available on twitter and our blog. so check out those sites for the latest information. you can google nci blog, and our twitter handle is @nci_ncip. today i'm quite happy to welcome dr. ada hamosh, who's the clinical director of the mckusick-nathans institute of genetic medicine at the johns hopkins university school of medicine. dr. hamosh is also the scientific director of online mendelian inheritance in man, omim, and the co-chair of the phenotype review committee
of the combined baylor-hopkins centers for mendelian genomics, which is a national human genome research institute-funded project to identify the genes responsible for known and novel mendelian disorders. the title of her presentation is "omim, online mendelian inheritance in man, a knowledge base of genes and genetic disorders." and with that, i'll turn the floor over to dr. hamosh. >> thank you. so, can i ask a question before i get started? how many of you have ever used omim? terrific. some of you haven't, so i'm going to start at the beginning. and i'm going to go through this relatively quickly, but if it's too quick please interrupt me, and i guess for the people who are remote, if i'm not making sense or not pointing correctly let me know.
ok. so, in the beginning there was mendel and then there was dr. mckusick, who started omim as a series of notes to himself in little brown books, of which he had 40,000 at the time of his death, and then eventually turned it into a real book. and this is the series of the published mendelian inheritance in man, whose first edition was in 1966 with 1,400 entries and no genes, 'cause it was all traits. we didn't have any genes in hand. we knew they existed but we didn't have them in hand. that goes out to the last publication in 1998, the 12th edition. since then it's only been available electronically, because that 1998 edition had 8,400 entries and it was published on onionskin, and i don't know how many volumes we'd have now. so in 1987, omim was actually adapted for the internet before there was a web. and then in 1995, it was adapted by the folks at ncbi for the world wide web. in -- oh i'm sorry, before i get to the next thing, omim has been steadily adding more information as we go along
so there are more mendelian phenotypes that we recognize, and then we also recognize the genes that are responsible for them and other genes that have known function, which is what our main remit is. and so, as of april 3rd there were 22,283 entries in omim, describing genes and phenotypes, and i'm going to go into detail about that in a moment. ok. so, at the very end of 2010, really the beginning of january 2011, omim migrated to a new website, which was paid for and -- it's actually hosted at ucsc but developed by johns hopkins, and that website is omim.org. ncbi still gets all the data in omim and indexes it, but once you do a search and a retrieval you will be taken to omim.org because that's where really all the data reside. and there's improved functionality in the omim.org search retrievals. so i'm going to show some of that.
so the first thing i'm going to show you, with this mouse to point, and if i get things wrong i'm sorry, is that across every page there's this -- stop moving -- this header, and right now i'm going to go to statistics. and on the statistics page you can see how omim is doing in terms of the type of entries that we have. and i'm going to go across this from the left. an asterisk entry is the description of a gene; a plus sign is an entry that has a gene and phenotype combined. that number used to be much higher, in the several hundreds. we've been eliminating them gradually. it takes a while because some of them were big hairy monsters that described, say, x-linked adrenoleukodystrophy and the gene and, you know, it takes three weeks for a person to sort those into two different entries. all of those are gone. the ones that are left are primarily extraordinarily rare disorders where there's maybe one person in the world with it, and it sort of seems funny to make an entry for a person.
so that's what's left, but eventually they'll be completely separated; there will be no plus entries. a number sign is a phenotype description where the molecular basis is known, and i'll show some examples in a moment. and the percent sign is a phenotype description or locus where the molecular basis is unknown. it may be mapped and we don't know the gene, it may be completely unmapped, but it's a phenotype. and then, there are a number of entries that don't have any symbol before them, and what they are is a phenotype that is suspected but not certain to be mendelian, and also some weird stuff. like if there's a gene in a mouse that humans don't have but it's kind of interesting in some way, we might actually describe that in omim. and because it's a completely free text database without any rigid structures that people have to adhere to, we can do whatever we want, which is very nice. and you can read about it if you're interested.
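[editor's sketch] the prefix symbols just described (asterisk, plus sign, number sign, percent sign, or no symbol) can be written down as a small lookup. this is only an illustration of the scheme as stated in the talk; the names `PREFIX_MEANING` and `classify_entry` are made up for this sketch and the mim numbers in the usage note are placeholders, not real entries.

```python
# Meaning of each entry prefix, paraphrased from the talk.
# The dictionary and function names are illustrative only.
PREFIX_MEANING = {
    "*": "gene",
    "+": "gene and phenotype combined",
    "#": "phenotype, molecular basis known",
    "%": "phenotype or locus, molecular basis unknown",
    "": "phenotype suspected but not certain to be mendelian, or other",
}


def classify_entry(label: str) -> tuple:
    """Split a label such as '#100002' into (mim_number, entry_type)."""
    label = label.strip()
    # A label with no leading symbol falls into the unprefixed category.
    prefix = label[0] if label and label[0] in "*+#%" else ""
    mim_number = label[len(prefix):].strip()
    return mim_number, PREFIX_MEANING[prefix]
```

for example, `classify_entry("*100001")` gives the number `100001` with type `gene`, and a label with no symbol falls into the last category.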
ok. and as of this morning i guess there were 14,704 genes and about 7,665 phenotypes described in omim. the other thing you can do from the statistics page is go to the gene map statistics. so omim keeps a catalog of its own, all the genes that are within omim and the number of genes that cause disorders and the number of disorders caused by genes, and this is molecularized disorders. and as of, again, this morning or yesterday morning there were 5,227 phenotypes. most of which are single-gene mendelian disorders but some of which are traits, like ptc tasting or skin pigmentation that where the molecular basis is known and then those are described by 3,212 different genes today. so still a very small percentage of the genome that we know that there's a phenotype associated with. this is just for people who like to see how we're doing in the world.
this is the growth of the gene-phen relationships as collected on april 7th. so, you know, we're making progress. i mean, we weren't making very much progress at all in the beginning, so we're getting there. and this is a different presentation of the opposite data, which is the growth of disease genes over time. again, making progress but still fairly linear, not exponential. now, the last thing you can find from the statistics page of omim is an update list, which goes back to when it was transitioned to the internet, from 1995 forward. and you can just see how things are going month by month. there's a little rss feed if you want to know how we're doing, and then at the end of this i'll show you some new functionality if you want to track things in omim.
ok. so, omim has gene entries and phenotype entries. i'm going to show you a little bit about what's different about each, and i'm going to first just go through the arrows on the left; rather than pointing to everything i'm going to just follow along. so on the top is the mim number -- the omim number. and that is the assigned unique identifier, and in general it never changes, in general. occasionally it changes, but for a gene it'll stay the same and for a phenotype gets its name first. the phenotype -- i think where there was a plus-sign entry, the number will stay with the phenotype and the gene will get a new number. so -- and this has an asterisk before it, which says it's a gene. this is the name of the gene, ataxia telangiectasia and rad3-related, or atr. right at the -- sorry, the second arrow down shows the hgnc-approved gene symbol, and that's a live link. if you click on that you go to hgnc's website to find out what they have to say about that.
we have cytogenetic location and genomic coordinates link on these -- that link on the cytogenetic location will take you to omim's gene map, a link on the genomic coordinates will take you to the ucsc browser. and for those of you who don't know, i'm guessing most of the people in this room do know. there's actually a little difference between the ncbi coordinates and the ucsc coordinates. and sometimes it's a big deal and sometimes it's a little deal, but it is -- there is a difference for some genes. the next thing, the next arrow on the left is a brief table that shows you the gene, the phenotype relationship. so, here we're in the gene entry and it says that this gene causes seckel syndrome-1, ok, and that's a live link to description of the phenotype. and that's a very nice quick view for somebody who wants to just literally spend a second to say, does this gene cause any phenotype? it's right there. and then, over on the right hand side there's a table of contents
and there's a table view for allelic variants in this entry, which i'm not going to take you to right now. and up in the far, far right there is google language translator which, you know, that doesn't do a great job but if your only language is chinese or portuguese and english is really hard for you it'll -- it's better than nothing, i think. and if i'm wrong please let me know. ok. so this is the phenotype entry and it has some differences. again starts with the mim number. it's going to start with -- it will have a number sign in front of it 'cause it's a phenotype entry. and this is one that has a gene that we know about, so it has a number sign. and again, going down on left are alternate titles and symbols for this particular phenotype and then this little table which is now the opposite, right, of the phenotype gene relationship.
and then there's phenotypic series and clinical synopses which i'm going to describe to you in more detail in just a second. the other thing is that we have icd and snomed ct codes, if they're known and available for that condition. so, there are very few icd-9 codes for most genetic, rare genetic diseases. so there might be an icd-10 code, there might be a snomed ct code. if they're there, if they're known they're there, ok? another quick thing to help people who are searching. ok. so, what kind of phenotypes do we catalog in omim? single gene mendelian disorders, those are easy, phenotypic traits, i mentioned a minute ago including -- and then susceptibility to drug reactions for instance, malignant hyperthermia, warfarin sensitivity and others. altered susceptibility or reaction to infection, for example, herpes simplex encephalitis which turns out to be an autosomal recessive condition, you won't manifest unless you're exposed to herpes but if you're exposed
to herpes and you have this condition you will have encephalitis. progression of hiv infection to aids or lack of progression of hiv infection to aids, which are single gene disorders or benefits, i guess, depending on how you look at it, then germline susceptibility to cancers. so we are -- we absolutely catalog all germline susceptibility to cancers. and we also catalog the genes that contribute to cancer phenotypes with -- that are not necessarily germline, they may be somatic, we won't necessarily catalog those allelic variants, the mutations responsible, but we will catalog that relationship. and then recurrent deletion and duplication syndromes, ok? any questions up to this point? nope. ok. now, you all may not live in this space but i do, so i just want you to know that it is not easy to define a phenotype or a disease. it's actually really, really, really hard. and there are many reasons for them and i've listed some on the slide,
and i'm actually going to read them to you because you need to think about my troubles in life. so, there are changes in diagnostic modalities over time. so, for instance, loeys-dietz syndrome, i don't know if you're familiar with that, it's a condition in which you have aortic root dilation, you have arterial tortuosity, you may have club foot, you may have craniosynostosis. you may have a lot of things, but the sort of defining feature that everybody has is this arterial tortuosity. well, until you had an mra or a cta, nobody knew there was arterial tortuosity; you can't see it. ok. so, the fact that you can do something different -- i mean, that's an example of an imaging modality. you know, until you could do transferrin isoelectric focusing, you didn't know there were congenital disorders of glycosylation. you didn't know there were organic acidurias until you could do those tests. so, as we are able to do different --
technologically do different things, we can diagnose different conditions. they were in front of us but we didn't know how to distinguish one from another. ok. changes in diagnostic criteria over time, differences in medical care and medical intervention in different countries, and also cohort effects over time. so cystic fibrosis, when it was initially described, was a lethal disorder in early childhood with pancreatic insufficiency. nobody knew that there was any lung disease from cystic fibrosis because you died before you had any. ok. then once you could give pancreatic enzymes, suddenly there's this terrible lung disease, and that's actually listed in icd-10 under lung disease because that's what's going to kill you, which is the way they make their decisions about where to categorize a multisystem condition. ok. cultural considerations.
if you're -- there are papers describing a phenotype and it's prevalent say in saudi arabia and none of the people who examined those patients were able to examine the women. so, they could not describe the manifestations in the women, we just don't know. ok. medical subspecialty bias, which i call the blind man and the elephant, it's unbelievable, ok? if something is only in ophthalmologic journals, you know everything on earth about the eye, i mean, more than you could imagine you could know about the eye. but they did not look at that person's hands or feet at all, let alone listen to their heart or lung. you just don't know. dentists tell you everything there is to know about teeth, it's fantastic. all other doctors tell you nothing about teeth. there could be horrible teeth things, you don't know, you just don't know. ok. ascertainment of the phenotype at different stages of development.
this is a really interesting problem because if initially you're reporting a problem that manifests in newborns, you have the phenotype of the newborn. they may develop different and very interesting complications that you don't know about, and somebody may be ascertaining a group of, say, adolescents. so, they're describing those adolescents, they'll tell you everything that's wrong with those adolescents, but they didn't see them as newborns. they report that as a new condition, this happens all the time, 'cause it's very interesting, very unusual, never seen before. and in fact it's not the same patient but it's the same disorder as what was seen in the newborns, but you can't recognize it because it's different, or sometimes maybe they're just trying to pull the wool over the reviewers' eyes, i'm not sure which way that goes, but that's a real issue, a very interesting issue. and then there are problems with phenotypic diversity at a locus, for instance lamin a, which i think is up to 15 different phenotypes now.
i think it always stays at number one, it's going to stay number one, so they'll just keep finding more so it can stay number one. and then genetic heterogeneity. so, different genes causing seemingly the same disorder, where once you know that, you actually start to see differences in those phenotypes, but you have to really know it and look at it and segregate people by the gene responsible. ok. so those are my challenges and i love them, but they do make life difficult. ok. so, how do we deal with all this? so, there's lumping and splitting of phenotypes. in general, in omim, we split based on the gene. ok. so -- and this is always the example that we use, long qt syndrome, because what could be easier than long qt? you do an ekg, there's a long qt, that's it, that's all, what's the point? but actually once we knew the different genetic -- and i remember having a conversation with dr. mckusick in the '90s saying, "oh come on. you really want to split these? it's long qt."
he's like, "well, now you could do a test and you'll know that one is different than another." and over time we found out that actually which gene is responsible absolutely dictates how you want to treat them, the risk, what the precipitants are for an arrhythmic event, and whether you need to implant a defibrillator; you know what the risk is of actually having an arrhythmia. so, it's completely different. all of these are long qt but they're totally different disorders. so, you know, we, in general, split. ok. and we make phenotypic series so people can try to find what we've done with related phenotypes. so this is a relatively new thing that we've added, probably in the last three or four years, and actually those two numbers are wrong; it's more than 280 phenotypic series comprising about 2,700 different conditions. and it's a nice way to look at things overall, because if you just stumble into one spinocerebellar ataxia, you can click on the
phenotypic series link. it can be -- you know, you don't want to go through 27 or 30, or whatever the number is, entries and read them all. ok. now, in terms of allelic variants in omim. so, omim has never been and will never be the place to catalog all variants. ok. that's clinvar's and dbsnp's job. and hgmd catalogs all published mutations; omim has always only reported a selected few. and the way we decide what those selected few are may seem random if you're coming to it after a period of time. it's not random; it initially made sense. so, we'll pick the first mutation or mutations to be discovered. and if the first paper reports 13 different mutations, we probably will not put in 13 mutations. if i'm doing that paper i would probably put in the few that are recurrent, and then if maybe one is a missense and one is a nonsense and one is a frameshift, i might put those in to represent that there are different classes of mutation that can occur causing the same phenotype in that gene. and we will report high population frequency mutations, a distinctive phenotype.
so, if this mutation causes cystic fibrosis and that mutation causes only congenital bilateral absence of the vas deferens, those would definitely be put in as representative different mutations; mutations of historic significance, unusual mechanisms of mutation, unusual pathogenetic mechanisms, and distinctive inheritance. so, if something is recessive and something is dominant from the same gene we would certainly put in an example of each of those, but it is just an example. the bulk of the genes with variants in omim have fewer than 10 variants. so, these are really selected examples; don't go there for a comprehensive view. ok. the other thing that we will do is if in the literature -- and by the way everything in omim is based on the biomedical literature. if it isn't published, it's not there. ok. now, there may be some links to certain group efforts that have not gotten to publication that we might represent, but it'll still have a citation; it won't exist just in, you know, in a vacuum. so, when a variant is reported --
initially reported as causing a disease and there's a subsequent paper that says no, it actually doesn't cause a disease, or whatever the change is, we will keep that variant and change its title to reclassify it as a variant of uncertain significance and explain in the text what we learned over time. ok. we also have clinical synopses, which is sort of a quick view of what's wrong with patients who have that condition. and it's organized in a kind of head-to-toe organ system way. the category and subcategory headings are fixed. they do not change. those are actually controlled vocabulary. the features are semi-controlled; some of them are quite controlled, like inheritance has seven options or eight options and that's all, and some of them are more variable. but there are -- all of these terms are -- i'm sorry, i'm going to give you one other thing before i go to the -- there's a -- sorry, let me go back one, excuse me, sorry.
so this is the full view going from head to toe, and you just open the whole thing and look at it. there is a quick view which we've developed from omim.org where you can click on the clinical -- so, i've searched for progeria, i got my retrieval. i picked the clinical synopsis on the far right of that return, and then what i can do is see the main headings of each of those phenotypes only. now lamin a/c [phonetic] went away because that's a gene. these are phenotypes only, and only phenotypes that have a clinical synopsis. it goes by organ system and i can mouse over and see what the features are. so, if i'm in clinic -- i spend a lot of time in clinic -- so i'm in clinic seeing a patient. i can very quickly look through this and say "yep, nope, yep, nope" and just get through it quickly and figure out what my patient might have. now, for those of you trying to mine data computationally, there are two ways to do this.
one is for people who have a particular interest in just one disorder. you can select that display clinical ids and this shows you the -- it happened to be noonan syndrome here. i'd switched disorders but doesn't matter. it has all the different phenotypic terminology terms. so, it has snomed ct terms, umls terms, hpo terms. it will have -- there's an international effort, called international consortium for human phenotype terminologies and it's trying to make mapping across all these so that everybody can talk to everybody. and we've actually met and we've decided on 2,300 terms that are enough to describe a person. they're not enough to describe the depths of any disorder, you know, they won't do that but they will describe a person that should allow interoperability across terminologies. anyway, these are available on this page and they're obviously available in a minable way from our api which i'll get to in a minute.
ok. so, in addition to what we do -- reading and writing about the literature -- we have copious links from all of our entries. and all links go directly to the relevant page of the other resource. so, if you're in cftr you're going to go to cftr. you're going to -- you know, et cetera. and the link only exists if there is a relevant page in the other resource. so, only those links that are live and useful, hopefully useful, are shown. ok. and we have lots of them. so, we have genome links, we have dna links, protein links. we have information about genes, variation, clinical resources. and again, not all of those will be -- in some entries all of those will be present, but for most of them it's the ones that actually have something useful in it, a relevant similar page. ok. and then we have animal models and cell lines and pathways. so, there's lots and lots of links, it's a really very easy place to get to a lot of places in the genome.
the other thing that we can do is a genomics coordinate search. so, if you've done a chromosome microarray and you get back a deletion or duplication you can put in that interval and see what genes are in that interval and you get -- let's see if i can show this to you. you get the omim gene map. this is the omim gene map. this gives you genomic -- i'm sorry, i just cannot control this mouse. the genomic coordinate search which will take you directly to ucsc, if you click on that, it gives you the gene name, the -- i'm sorry, the locus, the gene id, the name. it takes you to any related phenotype. and if you limit your retrieval to phenotype only -- i'm sorry, i went to ncbi here. i'm sorry. i do not go to ncbi. i went to ucsc and showed you the tracks. and that will take you -- if you pick it from the omim gene map this is the default view that you'll get of ucsc,
unless you've got a default of your own on your computer, and it won't overwrite your own default. but if you don't have one it'll take you to this view, which is nice because it shows you the tracks that are relevant to omim. and let me go back one to show you one more thing. so, if you click on the phenotype only series it'll show you only those genes that have a phenotype associated with [inaudible] gene interval. if you want to see the genes in the interval you can -- and again, the only genes that you'll see from this view are the genes that are in omim. so, you should always, once you've looked in here, go out to whatever your favorite browser is and make sure that you see whether there are any other genes in that interval that might be important to you that we haven't gotten to, or that you think might be interesting to look at for some reason. ok. questions before i go on to the next thing?
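[editor's sketch] the genomic coordinate search described above amounts to an interval-overlap filter, optionally limited to genes that have a phenotype. the following is a minimal sketch under that reading, not omim's implementation; the record type, the function name, and every gene symbol and coordinate are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """Hypothetical gene map row; all field names are assumptions."""
    symbol: str
    chrom: str
    start: int
    end: int
    phenotypes: list = field(default_factory=list)


def genes_in_interval(records, chrom, start, end, phenotype_only=False):
    """Return records overlapping [start, end] on chrom, ordered by start."""
    if start > end:
        raise ValueError("start must not exceed end")
    # Two closed intervals overlap when each begins before the other ends.
    hits = [r for r in records
            if r.chrom == chrom and r.start <= end and r.end >= start]
    if phenotype_only:
        hits = [r for r in hits if r.phenotypes]
    return sorted(hits, key=lambda r: r.start)


# Made-up records standing in for a deletion interval from a microarray.
EXAMPLE_RECORDS = [
    GeneRecord("GENE_A", "chr1", 1_000, 5_000, ["example phenotype 1"]),
    GeneRecord("GENE_B", "chr1", 4_500, 9_000),
    GeneRecord("GENE_C", "chr1", 20_000, 25_000, ["example phenotype 2"]),
    GeneRecord("GENE_D", "chr2", 1_000, 5_000, ["example phenotype 3"]),
]
```

with these records, a query on `chr1` from 4,000 to 10,000 returns GENE_A and GENE_B, and the phenotype-only version returns just GENE_A. as the talk notes, a list like this only contains genes that are in the catalog, so a general genome browser should still be checked for the same interval.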
nope. ok. so, in this case, i just want to show you that we've enabled a thesaurus. so, i typed in dwarfism and it gives you options of other related words if you want to expand the search in case it went in a different way. so this is really a thesaurus. there are already synonyms embedded in our vocabulary. so, for gallstones, we'll search cholelithiasis. you don't have to ask for it; it won't come up in a thesaurus search because it's a synonym. ok. alrighty. and then the other thing is once you've done the search, the terms that you've searched on that matched are highlighted, and it shows all the terms that matched. so, we default to an or search, which drives some people crazy 'cause they want the two answers. just so you know, google does an or search. ok. it's not that we're doing anything weird,
it's just the people want two answers so they've got the disease they want and it makes them very upset. but because it's highlighting the terms that match it's always the more terms that match, the higher the likelihood and so it's at the top of your search. you don't have to go to page 530 -- well, you won't go to 537 anyway, but, you know, it's there, it's really there. it's right on top, everybody relax. ok. now, we have an api, which supports requests over http. it has all of the functionality of the website, everything. you can do anything from the front end, from the back end. you can retrieve all or part of an omim entry. it will support batch queries. retrievals are available in xml, json, python, or ruby formats. and there's excellent online help documentation which i'm not saying 'cause i've used this, but because others have used it have said so 'cause i've not used it. ok. and i didn't write it either. so, this is just an example of the result that you'll get.
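[editor's sketch] to make the api description concrete, here is a small helper that only builds a request url and does not contact any server. the base url, the `entry` handler, and the parameter names `mimNumber`, `include`, `format`, and `apiKey` are assumptions for this sketch and should be checked against the online help documentation mentioned in the talk; the mim numbers and key in the usage note are placeholders. the four formats are the ones named in the talk.

```python
from urllib.parse import urlencode

# Assumed base URL and parameter names; verify against the official API help.
API_BASE = "https://api.omim.org/api"
FORMATS = {"xml", "json", "python", "ruby"}  # formats named in the talk


def build_entry_url(mim_numbers, api_key, include=("text",), fmt="json"):
    """Build a batch entry-retrieval URL for one or more MIM numbers."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    params = {
        # Several numbers in one request stands in for a batch query.
        "mimNumber": ",".join(str(n) for n in mim_numbers),
        # 'include' selects which parts of the entry come back (assumed).
        "include": ",".join(include),
        "format": fmt,
        "apiKey": api_key,
    }
    return f"{API_BASE}/entry?{urlencode(params)}"
```

calling `build_entry_url([100001, 100002], "YOUR_KEY")` produces a single url covering both placeholder entries; the resulting string can then be passed to whatever http client you prefer.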
and this is a way to do a clinical synopses search. and if you want to get the different phenotype ids because you want to do a cross, you know, query, it's all right there. it's not hidden, it's available, it's free, it's mapped. ok. and then there's a new feature that i'm going to spend a couple seconds on. for those of you who know omim, it's called mimmatch. i mean it when i say that you guys are very social media savvy, so we're not so social media savvy so we made our own thing. and this will allow you to -- you have to create an account, which is what you do here. and then you receive an email confirmation that your account is there, and you login and set your preferences. you can receive emails about new gene-phen relationship or updates to phenotypic series of interest. you can also follow a gene or phenotype by searching omim and selecting that gene.
so, if you -- you're passionate about -- actually i think that's what my next slide is. this just shows me that i've created an account. and what this says is that i can select to notify me when a new gene-phen relationship is cataloged. so, i get an email in the morning that tells me whatever we've added as a new gene-phen relationship if there's one that happened overnight. and anything you select from mimmatch generates a single email per day. you will not be flogged. one email per day if there's something we need to tell you, not if there isn't, ok? you can follow phenotypic series and this is the alphabetical list of the phenotypic series. you can go through and select what you like. and here, i typed in brca1. i'm sorry. let me show you, at the upper left, i typed in brca1. and i returned. i'm now in omim, logged in to mimmatch.
ok. and what i can do is select mimmatch and it will pop up this window. it says notify me on update. so, i want everything that omim adds to brca1, it will be there. i can also share my interest. so, i wouldn't recommend this for brca 1 because you may get too many answers. but if you happen to have a passionate interest in a gene that, you know, there's three people in the world studying about, but you're really interested in either finding other people who are interested in that gene, they've got an animal model or people who are interested in -- have patients with that condition if you're looking at a phenotype, you can actually find each other that way and only if you want to. you don't have to do this, this is totally optional. and it generates emails to each other and we have no knowledge of what's going on. ok. we do not know what's happening with this. and in this case, i've gone into the brca1 entry and i'm still logged into mimmatch so what you can see in the lower right hand corner is --
besides doing it on that initial page, i can here say notify me on update and share my interest if you want, totally optional. now, the other thing i want to share with you all is what we are doing with the literature to establish gene-phen relationships, which is an increasingly challenging problem right now because there are way too many journals and people can publish anything and they say what they like. and it's our job to decide whether what they're saying is true. and it's really hard. so basically, a single case or family with no functional data will either not go in at all or, if we spent an hour figuring out that it wasn't worth going in, it will go in as a variant of uncertain significance. so that in case somebody finds a second family with the same phenotype, another mutation of that gene or the same mutation of that gene, same phenotype, they will both bubble up to being real, but we're not giving it any certainty of any kind. we've put it in, in that way. if it is a single multiplex family with functional data, an animal model,
they've done enzyme assays, whatever functional data that we believe. ok. we've read it, we believe it, they have it. then, we establish the relationship, meaning we put a number sign in front of the phenotype and we put in that variant and we put the phenotype title on that variant. but we will put a parenthetical "one family" after it. meaning, caveat emptor, it's reported in one family, ok? and we put a question mark in the gene map. so the gene map has our list of all the, you know, gene-phen relationships. it will have a question mark 'cause it's only one family. ok. when there are multiple families with mutations in the same gene and the same phenotype, obviously there's no caveat, you just put it in. and what we're going to do in the next year -- and we've been doing this, by the way, since january of 2013. prior to that, you know, it was "brave new world" and nothing was quite as clear, so we will go back and review every entry in omim where there's
only one or maybe, if it's recessive, two allelic variants that purport to be the cause of a phenotype to make sure that they adhere to these criteria. ok. so, that's [inaudible] we'll get it hopefully by the end of this year. ok. questions about that? you guys have no questions. am i making sense? ok. wait until the end. fine. no problem. ok. so, you know, i'm talking at nci, i thought i should say something about cancer in omim. so as i said earlier, we catalog all mendelian germline cancer syndromes. we -- that's our job. and if we're not doing it, please let us know 'cause we need to fix that right away. we catalog cancers with recurrent genetic contribution. we catalog the fact that like there will be a, i think i have a screen capture later, like prostate cancer. we will have prostate cancer heredity -- hereditary with a number of different numbers after it and then the genes that seem to be contributing to that.
they aren't really familial and i'm not sure they should stay hereditary that's a whole another discussion we have to have. but, we will be cataloging both the cancer and the gene and its relationship, but not with a number sign, we won't have a number sign. we won't necessarily catalog the allelic variants that are somatic unless there's lots of biology. ok. and for example, the braf v600e mutation, which i'm just going to show you, i went into the braf entry. oh, that migrated, that's not supposed to be up by molecular genetics, it's been circled. i wonder, it's supposed to be under allelic variants. ok. so, if i click that -- interesting, i don't know how that got there. if i click allelic variants this takes me to the top of the allelic variant list and this is -- that val600glu which is the way we use old nomenclature because we have 22,483 variants and we're not changing them.
but it takes you straight to dbsnp, it takes you straight to clinvar, and this actually goes on for about four pages about this variant, because there's a lot of biology. and it's organized by what -- which kind of cancer is playing a role and then so on so it's in there. if you don't want to read, you -- there's a table view which is available there -- i'm sorry, at the top, in the little button webcast and also right under allelic variants in the table of contents. and that will take you to this view which is for people who like tables and not to read. and there's a link directly to clinvar as well as the dbsnp if there's an rs number. and it shows you here that the first one is this -- the somatic mutation and malignant melanoma. by the way if something is somatic only, it will say comma somatic.
if something is present, it needs to be mosaic to manifest like, mccune-albright or another condition like that, where if you were -- if you have that in every cell in your body, you wouldn't exist, you must be mosaic to have the disease and be on the planet. it shows -- it has a comma mosaic. ok. but -- so here, we've got a bunch of colorectal cancers, malignant melanoma and other cancers caused by, to which mutations of this contribute to disease. and then also cut off at the bottom was cardiofaciocutaneous syndrome which is the germline mutation. but before, you can see this is the variant number 12. so, we had plenty of variants in this gene. they were cancer linked only before we got to a germline mutation causing a disease, ok? all right. in here i put in just a search for cancer, i'm not sure i recommend that for anybody.
i did a search for cancer and then i saw what the thesaurus terms are, which are carcinoma, tumor, neoplasia. so, if i do cancer alone i get 3,279 entries in omim that answer to cancer and if i add all the terms in the thesaurus, there are 5,206 entries in omim that have some allusion to cancer, carcinoma, neoplasia, or tumor. so, you know, it's about a little bit less than a quarter of the entries in omim. so, there -- i just told you the first line, 574 of those entries have a clinical synopsis, 4,518 have a map position, 1,505 are on the map with a corresponding phenotype. i did that by doing that phenotype only search in the gene map and there are 13 phenotypic series that comprise 45 known genes that are cancer syndromes such as li-fraumeni, hereditary nonpolyposis colon cancer, et cetera. and that's dr. mckusick and those are the people that worked on this project, and these are past and present adjunct writers. and i can tell you other things but i'd like to take your questions on omim first.
>> great. let's thank dr. hamosh for the very interesting presentation. [ applause ] and i'll open the floor to questions. so, if you were in the room, please use the microphones in front of you. just hit the button there that says, push, hit it once and then make sure to turn it off after you're done. if you're on the webex you can raise your hand on the webex dashboard or simply type something into the chat box and we'll unmute your line. so, questions in the room? >> so, that was very interesting. i'm an early omim user. i actually have some of the hard bound books signed by victor. but anyway, so one question i have. i deal with the disease that i'm sure you know called fanconi anemia.
many journals now require if you're describing a genetic condition that you put in the omim number. the problem is there are many numbers, how do you choose which one to use for publication? >> so, if you're talking about a fanconi anemia and patients have mutations at a particular gene? >> no, i'm talking about in general. i'm going to write a paper about fanconi anemia and i'm gonna tell you about all the mutations in my cohort. >> one. so if you're talking about fanconi anemia alone -- >> right. >> let's back up. so, when there's a number, a number of numbers, so number fanconi anemia,
the number of noonan syndrome, and number of whatever the one numbered one -- >> ok. is this new? >> no, it's been forever, is always the one that discusses genetic heterogeneity and it discusses kind of the overview of the condition. >> right. i know. if i go in the omim and i type in fanconi anemia it comes up with fanc a, which is the most common, but it doesn't come out with -- unless i'm searching wrong. >> that's because fanconi anemia is different 'cause it -- yeah, right. thank you. this all, is [inaudible] or whatever, yes.
>> yes. >> well, a is equivalent of one. >> ok. i was gonna say there is no one. >> but for the others it would be one like long qt 1 or noonan 1 -- >> the first one. >> -- that's where the kind of discussion of the overview is. and if you're not sure it'll be the one that says genetic heterogeneity, 'cause it'll be a big paragraph that says genetic heterogeneity and describes it. and then after that use the phenotype caused by mutations in that gene 'cause it is a one to one.
and also in the clinical synopses, when we don't know the gene responsible for a disorder, obviously everything will be in the clinical synopses. once we understand that there's a gene -- you know, we know the gene responsible for 1 or many. that clinical synopsis will only include mutation positive patients so that we're not confused. ok. so, we'll actually narrow the clinical synopsis down to what we know with patients with the mutation, if that makes sense. >> ok. so, another example is dyskeratosis congenita. >> yup. >> which has three modes of inheritance? >> x-linked, autosomal dominant, autosomal recessive. >> we have all three.
>> and again, it's one of the older ones, so i don't think it has a one, it comes up with the x-linked which used to be the most common but probably isn't. >> isn't anymore, yes. why are you asking me such difficult questions? [laughter] >> it really worked. >> i would -- so are you talking about dyskeratosis congenita in general? >> yeah. again, you know, i'm going to say, "oh let's talk about this disease." >> i then would list -- i would say which is -- which has three different modes of inheritance.
>> and then comma -- >> have each one of them? >> yeah. >> ok. >> i would because they're different -- they're somewhat different conditions depending on the mode of inheritance and what happened -- >> i mean, even fanconi actually has -- >> yes. >> -- an x-linked in one case but we ignore that. >> yeah, yeah. >> so, that's what i would do.
>> ok. and then, if i can carry on, are there other people or shall i continue? i have one other issue. >> sure, sure. >> so -- and i have not seen the most recent version. but as i recall at least in the past, the description started out with phenotypic descriptions of patients before there were genes. and then, there would be a citation to an old paper and then there would be, you know, so and so did this and here's the new citation. now, we got a gene and here's another citation, and on and on. and you're reading through this long, long list. has that been curated or is that still there with the entire history of the reporting of that disorder?
>> ok. so, first of all, almost all, not all yet, but almost all, and very soon all, entries begin with a paragraph that says a description. just brief -- a paragraph that describes the condition or the gene, like the thumbnail that somebody wants to have. then again, i think we probably have another 2,500 or so entries that have not had been -- had structure added but that means 20,000 have, ok. and they -- so, they -- depending on whether it's a phenotype entry or gene entry they have different headings and they're broken into those headings. so, clinical features will live in one place, inheritance live in another place, management live in another place, you know, or gene function and molecular genetics and so on, to figure out depending on what kind of entries. those are under it, after that it is still chronological. and, you know, i know everybody would like omim to be beautiful reviews and i too would love to read a beautiful review, it's impossible, it's impossible.
so, what we will do is when we're cleaning up an entry we will go and stick some things under history like the stuff that's wrong, you know, we stick under history. we don't actually get rid of anything, but we stick it under history. you know, say people previously thought that blah, blah, blah. but we don't throw anything out because there are a number of instances, some orphan entries and -- orphan phenotypes in omim, where there's somebody -- a family described in 1953 with condition x, right? well, there were no -- we had no molecular techniques in 1953. that family may or may not have another affected member that will be reported on later. but -- so, we just don't get rid of it or somebody might not have that family but a different family and actually that phenotype matches that first one and we'll put the new family with the new molecular knowledge that we have into that old entry.
so, we actually kind of get rid of nothing, occasionally we consolidate an entry where two things that we thought were different are the same and we'll combine them because they're one thing now. but nothing goes away, ever really. yeah? >> i managed to try to develop a hypercard application looking at mendelian inheritance in man back in 1991. at that time a very high percentage, i forget what it was but in the range of 10 to 30%, of the references in mendelian inheritance in man were personal correspondence to victor. they're gone. >> they've been gone. >> ok. 'cause i was going to say if they're still there do you have the original letters? >> they've been gone i would say at least 15 years.
>> they were gradually removed from like mid-1990 on and i would say that they're gone. >> ok, thanks. >> i actually have an example of that which is that at one time in fanconi anemia there was a term of congenital aplastic anemia and -- or congenital hypoplastic anemia. that's the term that lou diamond used for diamond-blackfan anemia and it was in omim, in an early version for fanconi anemia. so, i wrote a letter to dr. mckusick and that came into the next omim which is dr. so and so says that the term doesn't belong but here it is. so, i'm glad to hear it's gone. >> yes. they're -- yeah, they're all gone and generally they've been replaced by the publication that reports it. you know, i mean, they -- initially they were -- i mean initially mim was dr. mckusick's observation in his very large clinic
out of which he made lots of nosology and distinctions and, you know, described what the different mps's were before we had any enzyme assays et cetera. so, you know, that's what it started as. it isn't and hasn't been for a very long time. and actually he wrote it by himself until about '94 -- by himself. >> are there any questions from the phone? i got a question for you, ada. you may have answered this but i just want to confirm when you were talking about multiple phenotypes and whether it's presenting in different tissues such as in the cfk, so we're talking about or presenting in different developmental stages.
what's the best way to actually get the summary for that? is that in the clinical synopsis? and if so, do you -- is it easiest to get there from the gene? would i go from cftr or from the phenotype? >> you can't get to a clinical synopsis from a gene. >> you can't, ok. >> you cannot, so that's easy. we can't do that. it's from the phenotype. and the -- a nuanced discussion of what manifests where is actually, you know, it's in the language and under the clinical features of the phenotype. it will be -- if there are specific things that, you know, manifest in the neonatal period, manifest later, that would be in the clinical synopsis as well. but, you know, some of these things are really nuanced and no one wants to read,
i totally understand that no one wants to read but sometimes you have to read it. so, we -- i mean, i think it's funny, i think at this point omim may be our lab notebook, like for us to know what we thought when we put it in and what we need to do next. it is kind of useful but, you know, on the other hand, if you go into an entry that has, is 30 pages long without features of which there are none anymore, that's really painful. so, there are headings, it is divided and, you know, if it's gene you really care about or a phenotype you really need to learn about sometimes reading the mim entry is actually quite useful, so. >> i would like to go back to your definitions of what constitutes a phenotype. and i wouldn't mind if you brought that slide back. >> sure.
>> there are some individuals who think that, you know, everybody is in a unique phenotype and that these -- the types of things that you're talking about as phenotypes are really basically kind of lumps of things that are similar for whatever reason. and so, i wondered if you had any particular thoughts about that because, you know, as we're getting more and more towards precision medicine we're -- >> the second half of this talk which i realize there is no way to do [inaudible] they have a whole bunch. >> i'll come and talk with you afterwards. >> i have a whole bunch of lectures actually about the challenge of standardizing phenotypic features and actually developed a whole tool that is based. so, all of our terminologies up until two years ago were based on phenotyping a disorder. so, the patients who have a disorder, the phenotypic features of that disorder.
if you're doing exome or genome sequencing, you had better know the phenotype of that person and that's a totally different question. so, you need to go head to toe on the person and if they have a cataract, ok, and i don't know, club feet and craniosynostosis, let's give them three different things. the bias of a geneticist is those three things are caused by one gene, that's our bias, that's what we think. occam's razor should be sharp, ok? then you look at an exome or a genome and you find they have a crystallin mutation. that crystallin mutation is responsible for their cataract and there's something else responsible for their craniosynostosis and club foot, which may be a [inaudible] two mutations, who knows, something else. if you don't have the comprehensive phenotype of that person you will not be able to interpret an exome or genome correctly, particularly not if you're doing, you know, tumor germline pairs -- that's even more complicated. so, you have to phenotype a person. that's a real difference and actually, with the thing
that we developed [inaudible] called phenodb which we're using for the center for mendelian genomics. i can show you some slides just a few if you want to hear them. anyway, which goes head to toe, it's sort of based on the omim clinical synopsis. let me just show you some slides, let's just go through. and don't look at your screen for a second and i'll be done. almost there. anyway, and it is a different view, and actually the ichpt, that's the international consortium of human phenotype terminologies, was developed in order to have that bias of a person because it's not the same. and now, i mean the latest i don't think it's published yet, but in the initial paper from the baylor group looking at, i think of it as the first 250 exomes, 4% of their population had two conditions. the latest data is 5% and my personal experience is like 25%, but i think that's 'cause i only see really, really weird patients now. but, you know, if -- and you don't -- you can't make sense of somebody with two rare genetic conditions because they didn't read any books and we can't sort it,
it's just one person that has too many problems but then once you do an exome they're like, "oh, ok, well that's either like that. i get it now." you know. so, anyway this is -- let me get to the picture. so, there were three independent efforts in 2012, two used hpo, i developed phenodb for the baylor-hopkins center for mendelian genomics and it's very nice. and it works, actually, we have over 3,000 families just in the baylor-hopkins instance of phenodb. it's also being used by university of washington for the same effort and by the community portal. and it's been downloaded about 176 times. it's available for anybody who'd like it, it's free. anyway, this is a home, the -- once you create an account, you go in and you say something about who the family is, what kind of disorder you think they might have. and this is why you need detailed phenotyping. this shows that you put in the family members that you have at hand to analyze.
and if you say someone is affected with that condition, whatever that condition is that you think you know or don't, you must add features. and this is the features page which tells you something about when you saw them and what you have to share. and it can upload -- it can upload anything, photograph, images, videos, whatever makes you happy. and then, this is the way the features work. so, there's a higher level, 21 kind of organ systems plus some labs and some in utero abnormalities of this person, its gimmish to try to get both history and a physical exam and a little medical, you know, all in one place. and the goal is that you pick abnormal, normal or unknown for each of these organ systems. so, if you're an ophthalmologist and you only looked in their eyes and you didn't look at any other part, you just say unknown for everything else 'cause you don't know, fine. if you do know, you say what you know.
as you pick abnormal, you go through and you can get down to the level that you understand, ok? so, for another thing let's say, you know, they have something wrong with their iris. you don't know what that is 'cause you're not an ophthalmologist, you just know that they look a little funny to you. you can put iris abnormality and stop right there, you know, or if you happen to know with an iris coloboma, you can do that. so, this just lets you click -- oh by the way, the other two things it does in the middle, upper, you -- as you select features, it shows you the features you selected to make sure you didn't leave something out, and it starts to do an omim search to see if there's a disorder that matches what you've put in, ok? and then, the other way you can do this if you don't want to go clicking through things 'cause life is short, you know exactly what's wrong with your patient, you can just type in and then this case, i typed in coloboma. it starts to bring -- give you what the options are that match.
you select one and it immediately fills what you selected and everything up. so, you can -- i can search across this if i want to find everybody with iris coloboma i can search all of phenodb for iris coloboma. but if i say, you know, maybe some people who entered this aren't this sophisticated, let me look for anybody that has an iris problem. i can search at that level and it automatically goes up the hierarchy. so, it's -- you don't have to click if you don't want to click. ok. and then, there's also an analysis tool which is actually more appropriately called a filtering and interpretation tool, which is very useful. this whole thing is really available, phenodb.net or phenodb.org. i'll tell you the difference right now. phenodb.net is sort of a research tool in a box. so, it has the phenotype module i just showed you. it has an elsi module. if you want to keep track of the consent forms that people have submitted
and whether they are able to be used in your study or not, and it actually has a box for people to comment on the validity of the consent, i'm not part of that module so i can't even see it. it also has restrictions per module depending on the user which your system's administrator would apply. it has a sample tracking module that is for the project. so, it's not a lims module, it's a -- did i get the consent? did i get the blood? did i turn it into dna? that kind of thing. is it in the sequencing lab? did it come out? where is this -- where are we in this project? and then the last module is this analysis module which starts from a vcf. it converts the vcf into an annovar file, so it doesn't do the initial --
it doesn't take a fastq, turn it into a bam file and then turn that into a vcf; it takes the vcf, turns it into an annovar file and then allows you to do a filtering depending on mode of inheritance, minor allele frequency, which build of dbsnp, whatever makes you happy. all those things are built in. and then it adds biological tools for you to validate the interpretation. so, these are just people, this is the kind of results you can have in here. again, it's stored and it's all a web-based tool and this is the form of inheritance, the -- who you want to sequence. i'm sorry, let me go back one. here, you can pick what kind of mutation you're looking for. again, minor allele frequency, indel span [phonetic]. and then, this gives you what you did and stores it, so you can go back and say, "what did i do? i don't remember what i did."
it's right there. you don't have to remember, it's all right there. and then, it shows you how each of your different choices has restricted the number of variants. so, down here, you're down to 22 variants, ok? and then it shows you also mgi, omim, gepsus, genecards, gene ontology, clinvar, uniprot, all those things are already built in as live links so that you can go just click on them and see what they say about it, whether they're expressed in the right tissue, whether it's appropriate for you, et cetera. that's what that shows you. and it also allows you to -- it saves all the analyses, allows you to compare different analyses. and in this case, it wants -- we're looking at a cohort and you could want to compare who's got -- who shares variants in the same gene, it does that for you. so, it's a nice tool.
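[the stepwise filtering described above, where each choice narrows the variant list and the tool keeps a record of what was done, can be illustrated with a minimal python sketch. this is not phenodb's code: the `Variant` fields, the `filter_with_log` function, the gene names and the thresholds are all hypothetical, and a real analysis would start from an annotated vcf rather than a hand-written list.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant record (not PhenoDB's data model)."""
    gene: str
    maf: float      # minor allele frequency in a reference population
    genotype: str   # "het" or "hom" in the affected person
    effect: str     # e.g. "missense", "synonymous"


def filter_with_log(variants, max_maf, inheritance, effects):
    """Apply filters one at a time and record how many variants survive each."""
    log = [("start", len(variants))]

    # Step 1: drop variants that are too common in the population.
    kept = [v for v in variants if v.maf <= max_maf]
    log.append((f"maf <= {max_maf}", len(kept)))

    # Step 2: keep only the kinds of mutation the user asked for.
    kept = [v for v in kept if v.effect in effects]
    log.append(("effect in " + ",".join(sorted(effects)), len(kept)))

    # Step 3: apply a simple inheritance model.
    if inheritance == "recessive":
        # Keep homozygous variants, or genes with at least two heterozygous hits.
        het_counts = {}
        for v in kept:
            if v.genotype == "het":
                het_counts[v.gene] = het_counts.get(v.gene, 0) + 1
        kept = [v for v in kept
                if v.genotype == "hom" or het_counts.get(v.gene, 0) >= 2]
    elif inheritance == "dominant":
        kept = [v for v in kept if v.genotype == "het"]
    log.append((f"inheritance = {inheritance}", len(kept)))

    return kept, log


# Made-up example data, for illustration only.
example = [
    Variant("GENE_A", 0.0001, "het", "missense"),
    Variant("GENE_B", 0.2, "het", "missense"),
    Variant("GENE_C", 0.001, "hom", "nonsense"),
    Variant("GENE_D", 0.0005, "het", "synonymous"),
    Variant("GENE_E", 0.002, "het", "missense"),
    Variant("GENE_E", 0.003, "het", "frameshift"),
]

kept, log = filter_with_log(
    example, max_maf=0.01, inheritance="recessive",
    effects={"missense", "nonsense", "frameshift"},
)
for step, count in log:
    print(step, count)
```

[keeping the log alongside the result is what lets someone come back later and see which choices cut the list down, and two saved runs can be compared by looking at the genes their surviving variants share.]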
so, that's -- phenodb.net has all four modules. phenodb.org gets rid of the sample tracking module and the elsi module, has the phenotype module and the analysis module, and it can be toggled in two ways. it can be toggled as fully de-identified and you can't upload a thing 'cause you might not have erased the identifiers, or fully identified for use in your own clinic or your own clinical lab or so on, where you want to keep track of everything. so, we use it in our clinic and we -- it shows, you know, which attending, which resident or fellow, which counselor. it has all the phi. and it also -- the other thing is that in the phenodb.net and the research tool, it has a million different states 'cause you're in the process of trying to figure out what's going on. for the clinical one, it has three options: in progress, i've seen the patient, have nothing back yet. solved, i know what's wrong with the patient, or unsolved.
and we're using it because for the patients in whom we sent whole exome, sometimes they have an answer, and many times they don't. and so, we want to be able to reanalyze as time goes on, and we know more disease genes to see if we can actually solve that patient without spending any more money. so, now, i was trying to answer a question when i got to showing you all this. did i answer your question? did i answer your question? i did? >> good, that's fantastic. ok. >> it was -- and maybe we can close with a bit more of a speculative question building upon that. but you've started to address in a very pragmatic way talking about what -- the work you're doing with phenodb here.
but, i mean clearly have this -- we have this collection of databases today from omim, clinvar, dbsnp that are going, you know, perhaps from, you know, very highly characterized to less well characterized data sets. i think you've hit the nail on the head in terms of our goal in precision medicine wanting to have a comprehensively phenotyped individual. so, it's really required for successful precision medicine. what's your vision of the future where we have to analyze a collection of, you know, of germline and somatic mutations and a myriad different types of somatic mutations, in terms of understanding holistically that individual? what are the barriers to getting to there? and what do you see is the future solution to that? ten years from now, 20 years from now, where will we be? >> i want to be [inaudible]. so, here's what we need. i know what we need. and i know that we should be looking that way and not mired in the details of what's wrong today 'cause it's just stupid. and i don't know if any of you have to use epic, but if you know, your life is good, mine is bad.
ok. so, what epic has, which was built for billing, is nothing on standardized phenotypic information. you could sort on lots of diagnosis terms, but no phenotypic features, nothing at all, no standardized phenotypic features, nothing. and no -- nothing genomic, not a single genomic indicator, nothing. and the family history in epic, i don't know if you've ever seen it, but it is nothing to be proud of. so, what we need is standardized phenotypic features that can be evolved over time where you don't get rid of anything. so, if you're born with cryptorchidism, it doesn't matter that someone surgically corrected, you had cryptorchidism, it doesn't change, ok? i mean you can see -- be correct in some place in the medical record, but not in a phenotypic feature that you have a living, breathing layered family history. so, if you're an oncologist and all you care about is their cancer family history, that's all you see.
if you're a cardiologist and that's all you care about, that's all you see. if you're a geneticist, and you're a fool and you want to see every damn thing, you see it, you know. and it is living, so as people add things about other relatives, it shows up, not sure how to do that in the hipaa environment in the united states, but that would be great. and then the last component is the whole genome and/or whatever tumors you have come along or other somatic things that you're looking at. and that it all lives there and it's queryable by the user for what they want. and for that user who doesn't know enough to query, it flags what's really important to come to their attention. so, i don't want to know the variants that are responsible for different pharmacologic, you know, who's a fastest one to publish. you know, i do not want to know. i don't want to know hla-dr [inaudible], i can't do it, ok? my brain doesn't do that. but, if there's something important in that patient's genome, that means i need to ask a different question
or do a different study or send a different test or not prescribe that medication, i would like it to come up in front of my eyes in big red bold letters so i won't miss it. so, that's what we need. i think we can get there, but we need -- nobody who's building electronic health records is even thinking about this stuff. >> well, thanks for that answer. i'm afraid we're out of time. so, i want to thank you again for this very intriguing presentation and thanks to everybody who's joined us in the room and online. and we hope you can join us for our next presentation which will be on june 11th and it will be dr. henry francis from the fda.