Dr. Lars Juhl Jensen
EMBL-Heidelberg, Heidelberg, Germany
Biomedical literature mining (and why we really need Open Access)
UPDATE: Presentation (PowerPoint) now online. ENDUPDATE
MEDLINE
17 million citations
too much to read -> literature mining (get a computer to read them)
but to do that, you need access to the papers
first discipline: information retrieval - finding the papers
ad hoc retrieval
MEDLINE - abstracts only
but would like to run on full text
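A minimal sketch of ad hoc retrieval, assuming a plain TF-IDF score over abstract text. The abstracts and their ids are made up for illustration, and the scoring is a generic textbook scheme, not necessarily anything shown in the talk.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs):
    """Return ids of matching docs, best first, using a simple TF-IDF score."""
    counts_by_doc = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in counts_by_doc.items():
        score = 0.0
        for term in set(tokenize(query)):
            # Document frequency: how many abstracts contain the term
            df = sum(1 for c in counts_by_doc.values() if term in c)
            if df == 0:
                continue
            score += counts[term] * math.log(1 + n_docs / df)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


# Made-up abstracts, for illustration only
abstracts = {
    "a1": "Cyclin dependent kinase regulates the yeast cell cycle.",
    "a2": "A survey of protein folding methods.",
    "a3": "Cell cycle arrest in yeast after DNA damage; cell size control.",
}

print(rank("cell cycle", abstracts))
```

The same function would run unchanged on full text; the limiting factor is access to the papers, not the code.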
next discipline: entity recognition
need synonym / mapping lists - built manually
plus orthographic variation
iHOP
http://www.ihop-net.org/UniPub/iHOP/
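A rough sketch of dictionary-based entity recognition, assuming a hand-made synonym list and a normalization step that absorbs orthographic variation (case, hyphens, spacing). The synonym table and the `GENE:n` identifiers are hypothetical placeholders, not a curated resource, and this is not a description of how iHOP works internally.

```python
import re

# Hypothetical synonym list: surface name -> placeholder identifier
SYNONYMS = {
    "CDC28": "GENE:1",
    "CDK1": "GENE:1",
    "cyclin-dependent kinase 1": "GENE:1",
    "IL-2": "GENE:2",
    "interleukin 2": "GENE:2",
}


def normalize(name):
    """Collapse case, hyphens, and whitespace so spelling variants compare equal."""
    return re.sub(r"[\s\-]+", "", name.lower())


LOOKUP = {normalize(name): entity_id for name, entity_id in SYNONYMS.items()}


def find_entities(text, max_words=4):
    """Greedy longest-match lookup of word n-grams against the synonym list."""
    tokens = re.findall(r"[A-Za-z0-9\-]+", text)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            entity_id = LOOKUP.get(normalize(candidate))
            if entity_id:
                found.append((candidate, entity_id))
                i += n
                break
        else:
            # No n-gram starting at this token matched; move on
            i += 1
    return found


print(find_entities("Expression of IL 2 and Cdc28 was measured."))
```

Normalizing this aggressively trades precision for recall: short names that collide with ordinary words would need extra filtering.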
discipline: information extraction
formalizing the facts - turning text into databases
Jensen et al., Nature Reviews Genetics 2006
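To make "turning text into databases" concrete, here is a minimal sketch, assuming a single hand-written sentence pattern and an in-memory SQLite table. The pattern, the verb list, the table name `interaction`, and the protein names `ProtA`/`ProtB` are all illustrative assumptions; real extraction systems use far richer grammars or statistical models.

```python
import re
import sqlite3

# Illustrative pattern: "<Name> <verb> <Name>", names starting with a capital letter
PATTERN = re.compile(
    r"\b(?P<subject>[A-Z][A-Za-z0-9]+) "
    r"(?P<verb>phosphorylates|binds|activates|inhibits) "
    r"(?P<object>[A-Z][A-Za-z0-9]+)\b"
)


def extract_facts(sentences):
    """Return (subject, verb, object) triples found by the pattern."""
    facts = []
    for sentence in sentences:
        for match in PATTERN.finditer(sentence):
            facts.append((match.group("subject"), match.group("verb"), match.group("object")))
    return facts


def store(facts):
    """Load the triples into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE interaction (subject TEXT, verb TEXT, object TEXT)")
    conn.executemany("INSERT INTO interaction VALUES (?, ?, ?)", facts)
    conn.commit()
    return conn


# Made-up sentences, for illustration only
sentences = [
    "ProtA phosphorylates ProtB during mitosis.",
    "We thank the reviewers for their comments.",
]
facts = extract_facts(sentences)
conn = store(facts)
print(conn.execute("SELECT subject, verb, object FROM interaction").fetchall())
```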
new discoveries - text mining
http://arrowsmith.psych.uic.edu/arrowsmith_uic/
mining temporal trends
timeline of buzzwords
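A buzzword timeline can be sketched as the fraction of records per year that mention a term. This assumes the input is a list of (year, text) pairs; the records below are made up, and the function name `term_timeline` is hypothetical.

```python
import re
from collections import Counter


def term_timeline(records, term):
    """Return {year: fraction of that year's records mentioning the term}."""
    totals = Counter()
    hits = Counter()
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    for year, text in records:
        totals[year] += 1
        if pattern.search(text):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Made-up records, for illustration only
records = [
    (2000, "Microarray analysis of yeast"),
    (2000, "Protein structure prediction"),
    (2005, "Systems biology of yeast"),
    (2005, "A systems biology approach to signalling"),
]

print(term_timeline(records, "systems biology"))
```

Dividing by the yearly total matters because the literature itself grows every year; raw counts would make almost any term look like a rising trend.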
integration of text and data
genotype to phenotype
Korbel et al., PLoS Biology 2005 (heatmap)
UPDATE: I'm almost certain he's referencing
Korbel JO, Doerks T, Jensen LJ, Perez-Iratxeta C, Kaczanowski S, et al. (2005) Systematic Association of Genes to Phenotypes by Genome and Literature Mining. PLoS Biol 3(5): e134. doi:10.1371/journal.pbio.0030134
ENDUPDATE
where are we now?
the tools are there... we need the text
Q: how are researchers using tools?
A: unfortunately many of them aren't aware the tools exist
Q: copyright obstacles - collections of abstracts are copyrighted (database protection) - is this a problem?
full text - could authors prepare a second abstract for literature mining specifically?
A: extraction of facts... isn't really a copyright violation
rather than having second abstract, just deposit semantic information and data directly into a database
Q: how does this relate to BioMart?
http://www.biomart.org/
A: they are trying to glue together different data sources