Research + Bio
My research is in the area of text mining.
I have focused on developing and evaluating text mining systems - both algorithms and user interfaces. More specifically, I have worked on identifying information for database annotation, entity extraction and entity relation extraction from text, summarization, document clustering and categorization, as well as identifying useful features in computer interfaces that allow end users to communicate efficiently with text mining systems. All of this work aims to make searching for information in electronic documents faster and more efficient.
Other areas I am interested in include knowledge representation and ontology design, the use of crowdsourcing for biomedical and linguistic studies, and studying the behavior of experts (in particular biomedical scientists), such as the language and media they use to communicate their ideas and findings. I have also combined knowledge acquisition with text mining to build a better understanding of a specific domain in relation to the beliefs of experts.
Short Papers, Abstracts, Posters, etc
Medelyan, O., Witten, I.H., Divoli, A. and Broekstra, J. (2013) Automatic construction of lexicons, taxonomies, ontologies, and other knowledge structures, WIREs Data Mining Knowl Discov, 3: 257-279. doi: 10.1002/widm.1097
Medelyan, O., Manion, S., Broekstra, J., Divoli, A., Huang A.L., and Witten, I. (2013)
Constructing a Focused Taxonomy from a Document Collection, ESWC 2013, Montpellier, France (Conference Paper)
Yao, L., Divoli, A., Mayzus, I., Evans, J.A. and Rzhetsky, A. (2011) Benchmarking Ontologies, Bigger vs Better, PLoS Computational Biology 7(1): e1001055 [PubMed]
Morgan, A.A., Lu, Z., Wang, X., Cohen, A.M., Fluck, J., Ruch, P., Divoli, A., Fundel, A., Leaman, R., Hakenberg, J., Sun, C., Liu, H., Torres, R., Krauthammer, M., Lau, W.W., Liu, H., Hsu, C-N., Schuemie, M., Cohen, K.B., Hirschman, L. (2008)
Overview of BioCreative II gene normalization, Genome Biology, 9(Suppl 2):S3
Smith, L., Tanabe, L.K., Johnson nee Ando, R., Kuo, C-J., Chung, I-F., Hsu, C-N., Lin, Y-S., Klinger, R., Friedrich, C.M., Ganchev, K., Torii, M., Liu, H., Haddow, B., Struble, C.A., Povinelli, R.J., Vlachos, A., Baumgartner, W.A., Hunter, L., Carpenter, B., Tsai, R.T-H., Dai, H-J., Liu, F., Chen, Y., Sun, C., Katrenko, S., Adriaans, P., Blaschke, C., Torres, R., Neves, M., Nakov, P., Divoli, A., Mana-Lopez, M., Mata, J., Wilbur, J.W. (2008)
Overview of BioCreative II gene mention recognition,
Genome Biology, 9(Suppl 2):S2
Hearst, M.A., Divoli, A., Guturu, H., Ksikes, A., Nakov, P., Wooldridge, M.A. and Ye, J. (2007)
BioText Search Engine: beyond abstract search,
Bioinformatics, 23: 2196-2197
Divoli, A., Hearst, M.A., Nakov, P.I., Schwartz, A. and Ksikes, A. (2006)
BioText Team Report for the TREC 2006 Genomics Track,
The Fifteenth Text REtrieval Conference Proceedings, Gaithersburg, MD, USA
(Participation Report Paper)
Mitchell, A.L., Divoli, A., Kim, J.-H., Hilario, M., Selimas, I. and Attwood, T.K. (2005)
METIS: Multiple Extraction Techniques for Informative Sentences,
Bioinformatics, 21: 4196-4197
Other Selected Talks (not listed above)
CS4HS @ Unitec
I presented a session on "Findability and usability: lessons learnt from text analytics" at Google's Computer Science for High Schools Workshop, held at Unitec in Auckland. Many thanks to the organizers, Mahsa Mohaghegh and Hossein Sarrafzadeh.
Automatic construction of lexicons, taxonomies, ontologies, and other knowledge structures
Our review paper, Automatic construction of lexicons, taxonomies, ontologies, and other knowledge structures, by Olena Medelyan, Ian Witten, Anna Divoli and Jeen Broekstra, was published in WIREs Data Mining and Knowledge Discovery. In this paper we outline current research on automatically creating abstract, structured representations of knowledge, such as lexicons, taxonomies and ontologies, from document collections.
Constructing a Focused Taxonomy from a Document Collection, by Olena Medelyan, Steve Manion, Jeen Broekstra, Anna Divoli, Anna Lan Huang and Ian Witten, was accepted at ESWC. I represented the team at the conference, which was held in Montpellier, France.
I attended ShareFEST in Philadelphia. I presented our work, "Extracting and Mapping SharePoint Content to Create a Custom Taxonomy", in the session "Implementation Tools & Technologies".
I attended Text Analytics World in San Francisco. I presented our work on "Automatic Taxonomy Generation for a News Group".
Kiwi Foo Camp
I was very happy to be invited back to Kiwi Foo Camp. Much like last year, I had a great time meeting and conversing with several amazing people!
TAW & HCIR
I attended Text Analytics World (TAW) and HCIR in Boston. My TAW talk in the Text Analytics and Taxonomy session was entitled "How taxonomies and facets bring end-users closer to big data".
I presented results from four user studies exploring end-user preferences for taxonomies and facets, both in terms of content and appearance. Taxonomies and facets can be generated automatically, but user preferences and needs can (and should) be taken into account.
CUES at EuroHCIR
Over the past few months, Matthew Pike, Max Wilson, Alyona Medelyan and I have been working on a system for evaluating user interfaces of websites and desktop applications. We presented this work, "CUES: Cognitive Usability Evaluation System", at EuroHCIR 2012 in Nijmegen, the Netherlands, as a paper and a demo poster. CUES integrates brain-derived signals and emotions with other common usability measures, such as interaction logs, screen capture, and think-aloud.
Do Peers See More in a Paper than its Authors?
Our paper "Do Peers See More in a Paper than its Authors?" by Anna Divoli, Preslav Nakov and Marti A. Hearst was accepted by Advances in Bioinformatics in a special issue on Literature Mining Solutions for Life Science Research. Citation sentences (or citances) are produced by peers and represent subjective points of interest of a paper. In this paper we focus on the area of molecular interactions and compare the content of abstracts (containing the main points of a paper as judged by the authors) with the content of citances (containing the main points of a paper as judged by peers). We use MeSH terms to annotate the content and present a detailed summary of the differences across the information types represented in abstracts and citances. We also examine the effects of other citations and of time. We propose that, collectively, the content of these citances can be used for automatic annotation (assigning relationships among biological entities and concepts) and for a number of NLP tasks such as producing summaries.
O'Reilly Strata Conference 2012
My colleague, Alyona Medelyan, and I presented "Mining Unstructured Data: Practical Applications" at the Data Science Session of O'Reilly's Strata Conference, a leading conference for data scientists and analysts. It was a great event with an interesting, busy program and fantastic speakers.
Kiwi Foo Camp
What an honor: three months in New Zealand and I found myself invited to Kiwi Foo Camp, where I met so many intelligent, driven people from all kinds of fields! The meeting took place in Warkworth. The sessions were very varied, the conversations lively and inspiring, and the werewolves vicious!
Interview with O'Reilly Radar
Alyona Medelyan and I were interviewed by O'Reilly Radar: "Unstructured data is worth the effort when you've got the right tools". The interview was also featured in Forbes: "Unlocking Opportunities in Messy Data".
Search interface feature evaluation in biosciences @ HCIR
Our paper, Search interface feature evaluation in biosciences by Anna Divoli and Alyona Medelyan, was accepted for full presentation at the HCIR 2011 Workshop. In this paper we report findings on desirable interface features for different search tasks in the biomedical domain. The workshop took place at Google's main campus in Mountain View, California. My colleague Alyona presented our work.
Guest editor for CTS 2011 special issue
I am guest editor for the CTS 2011 special issue of Elsevier's Future Generation Computer Systems. Here is the CFP. Abstract submission deadline: October 10. Full manuscript submission deadline: December 19. I look forward to an interesting special issue!
I joined Pingar as Senior Software Researcher. I look forward to working on text analytics, improving search systems, and interface usability research with immediate real-world applications. I am starting this exciting research post in the Silicon Valley office, but I will soon head to Auckland for at least several months.
Conflicting biomedical assumptions for mathematical modeling: The case of cancer metastasis
Our paper was accepted by PLoS Computational Biology: Conflicting biomedical assumptions for mathematical modeling: The case of cancer metastasis, by Anna Divoli, Eneida Mendonca, James Evans and Andrey Rzhetsky.
In this paper, individual viewpoints from 28 experts in clinical or molecular aspects of cancer metastasis were harvested and summarized computationally. Detailed analysis of the data reveals areas of disagreement and a range of opinions on underlying causes and processes in metastasis.
Invited talk @ CoHeB / CTS
I attended CTS (International Conference on Collaboration Technologies and Systems) in Philadelphia. The meeting covered a wide range of collaboration web technologies, human factors and HCI topics that I am interested in. I was invited to give a talk at CoHeB (Workshop on Collaboration Technologies and Systems in Healthcare and Biomedical Fields) as part of CTS. My talk was entitled "Expert Opinions in Cancer Metastasis: Harvesting Knowledge from Uncertainty and Discrepancies".
I visited the EBI interfaces group at the European Bioinformatics Institute in Hinxton. It was great to discuss usability and HCI issues for bioinformatics and computational biology with several scientists there. During my visit, I gave a talk on "Human factors in computational biology - from mathematical models to user interfaces". Special thanks to my host, Francis Rowland.
I visited the Cardiff School of Computer Science & Informatics, where I gave a talk on "Expert opinions in cancer metastasis: Uncertainty, discrepancies, range and models" and had wonderful conversations with several faculty members and other researchers there who work on biodiversity informatics, ontologies, and medical informatics. Special thanks to my host, Irena Spasic.
Benchmarking Ontologies, Bigger vs Better @ PLoS Computational Biology
One of our group's papers was recently accepted by PLoS Computational Biology: Benchmarking Ontologies, Bigger vs Better by Lixia Yao, Anna Divoli, Ilya Mayzus, James Evans and Andrey Rzhetsky
In this paper we introduce a family of ontology metrics and we test them on four medical ontologies and seven popular English thesauri.
Bio-Interfaces Google Group
Together with some of the Interfaces people at EBI (if you haven't already, check out their fantastic site: http://ebiinterfaces.wordpress.com), I have started a Google Group "Biological Interfaces".
I attended ISMB in Boston. I presented a poster: "Considering alternative views when modeling cancer metastasis".
Last updated: Oct 2013