Head of Research & Development (Dec 2013 - )
Chief Research Officer (May-Dec 2013)
Senior Software Researcher (2011-2013)
Pingar Research, Pingar

Postdoctoral Scholar (2007-2011)
Andrey Rzhetsky's Group, Department of Medicine and Institute for Genomics and Systems Biology, The University of Chicago

Postdoctoral Scholar (2006-2007)
Marti Hearst's BioText Group, School of Information, University of California, Berkeley

PhD Student (2002-2006)
Teresa Attwood's Protein Sequence Analysis Group, The University of Manchester

Contact Information

Email: annadivoli@gmail.com

Research + Bio

Research Interests

My research is in the area of text mining.

I have focused on developing and evaluating text mining systems, both algorithms and user interfaces. More specifically, I have worked on identifying information for database annotation, extracting entities and the relations between them from text, summarization, and document clustering and categorization. I have also worked on identifying useful features in computer interfaces that allow end users to communicate efficiently with text mining systems. All of this work aims to make the hunt for information in electronic documents faster and more efficient.

Other areas I am interested in include knowledge representation and ontology design, the use of crowdsourcing for biomedical and linguistic studies, and studying the behavior of experts (in particular biomedical scientists), such as the language and media they use to communicate their ideas and findings. I have also combined knowledge acquisition with text mining to build a better understanding of a specific domain in relation to the beliefs of experts.

Academic Background

2007-2011: postdoc at the Institute for Genomics and Systems Biology and the Department of Medicine at The University of Chicago with the Rzhetsky Group, PI: Prof Andrey Rzhetsky

2006-2007: postdoc at the School of Information at the University of California, Berkeley with the BioText Group, PI: Prof Marti Hearst

2002-2006: PhD at The University of Manchester with the Protein Sequence Analysis Group, PI: Prof Teresa Attwood
Thesis: "Biomedical Text Mining Approaches: Applications in Protein Family Annotation"

2000-2001: MSc in Biosystems and Informatics at The University of Liverpool
Thesis: "Applications of Computational Linguistics for the Investigation of the Functional Genomics of Eukaryotic Transcription Factors"
Dissertation (mini thesis): "Applications of Ideas from Linguistics to Functional Genomics: Formal Grammars for Biological Sequences"

1996-1999: BSc in Biomedical Sciences at The Manchester Metropolitan University



Short Papers, Abstracts, Posters, etc

    Mitchell, A.L., Bradley, P., Divoli, A. and Attwood, T.K. (2004) Sequence Analysis Workshop, MIPNETS training meeting, Liverpool, UK (Workshop Organised)

Other Selected Talks (not listed above)

    Divoli, A. (2008) Usability, Interfaces and Text Mining (work with Wooldridge, M.A. and Hearst, M.A.), Dagstuhl seminar "Ontologies and Text Mining for Life Sciences: Current Status and Future Perspectives", Germany (Invited Session)


Oct CS4HS @ Unitec
I presented a session on "Findability and usability: lessons learnt from text analytics" at Google's Computer Science for High Schools (CS4HS) Workshop, held at Unitec in Auckland. Many thanks to the organizers, Mahsa Mohaghegh and Hossein Sarrafzadeh.

Jul Automatic construction of lexicons, taxonomies, ontologies, and other knowledge structures
Our review paper "Automatic construction of lexicons, taxonomies, ontologies, and other knowledge structures" by Olena Medelyan, Ian Witten, Anna Divoli and Jeen Broekstra was published in WIREs Data Mining and Knowledge Discovery. In this paper we outline current research on automatically creating abstract, structured representations of knowledge, such as lexicons, taxonomies and ontologies, from document collections.

Our paper "Constructing a Focused Taxonomy from a Document Collection" by Olena Medelyan, Steve Manion, Jeen Broekstra, Anna Divoli, Anna Lan Huang and Ian Witten was accepted at ESWC. I represented the team at the conference, which was held in Montpellier, France.

Apr ShareFEST
I attended ShareFEST in Philadelphia, where I presented our work "Extracting and Mapping SharePoint Content to Create a Custom Taxonomy" in the session Implementation Tools & Technologies.

I attended Text Analytics World in San Francisco, where I presented our work "Automatic Taxonomy Generation for a News Group".

Feb Kiwi Foo Camp
I was very happy to be invited back to Kiwi Foo Camp. Much like last year, I had a great time meeting and conversing with several amazing people!


I attended Text Analytics World and HCIR in Boston. My TAW talk in the Text Analytics and Taxonomy session was entitled "How taxonomies and facets bring end-users closer to big data". I presented results from 4 user studies exploring end-user preferences for taxonomies and facets, both in terms of content and appearance. Taxonomies and facets can be generated automatically, but user preferences and needs can (and should) be taken into account.

Aug CUES at EuroHCIR
Over the past few months, Matthew Pike, Max Wilson, Alyona Medelyan and I have been working on a system for evaluating the user interfaces of websites and desktop applications. We presented this work, "CUES: Cognitive Usability Evaluation System", at EuroHCIR 2012 in Nijmegen, Netherlands, as a paper and a demo poster. CUES integrates brain-derived signals and emotions with other common usability measures, such as interaction logs, screen capture, and think-aloud protocols.

Jun Do Peers See More in a Paper than its Authors?
Our paper "Do Peers See More in a Paper than its Authors?" by Anna Divoli, Preslav Nakov and Marti A. Hearst was accepted by Advances in Bioinformatics in a special issue on Literature Mining Solutions for Life Science Research. Citation sentences (or citances) are produced by peers and represent subjective points of interest of a paper. In this paper we focus on the area of molecular interactions and compare the content of abstracts (containing the main points of a paper as judged by the authors) with the content of citances (containing the main points of a paper as judged by peers). We use MeSH terms to annotate the content and present a detailed summary of the differences across the information types represented in abstracts and citances; we also examine the effects of other citations and of time. We propose that, collectively, the content of these citances can be used for automatic annotation (assigning relationships among biological entities and concepts) and for a number of NLP tasks such as producing summaries.

Feb O'Reilly Strata Conference 2012
My colleague, Alyona Medelyan, and I presented "Mining Unstructured Data: Practical Applications" at the Data Science Session of O'Reilly's Strata Conference, a leading conference for data scientists and analysts. It was a great event with interesting, busy program and fantastic speakers.

Feb Kiwi Foo Camp
What an honor: only 3 months in New Zealand and I found myself invited to Kiwi Foo Camp, where I met so many intelligent, driven people from all kinds of fields! The meeting took place in Warkworth; the sessions were very diverse, the conversations lively and inspiring, and the werewolves vicious!

Feb Interview with O'Reilly Radar
Alyona Medelyan and I were interviewed by the O'Reilly Radar: "Unstructured data is worth the effort when you've got the right tools". The interview was also featured in Forbes: "Unlocking Opportunities in Messy Data"


Oct Search interface feature evaluation in biosciences @ HCIR
Our paper, Search interface feature evaluation in biosciences by Anna Divoli and Alyona Medelyan, was accepted for full presentation at the HCIR 2011 Workshop. In this paper we report findings on desirable interface features for different search tasks in the biomedical domain. The workshop took place at Google's main campus in Mountain View, California. My colleague Alyona presented our work.

Sep Guest editor for CTS 2011 special issue
I am guest editor for the CTS 2011 special issue of Elsevier's Future Generation Computer Systems. Here is the CFP. Abstract submission deadline: October 10; full manuscript submission deadline: December 19. I look forward to an interesting special issue!

Sep Pingar
I joined Pingar as Senior Software Researcher. I look forward to working on text analytics, improving search systems and interface usability research with immediate real-world applications. I am starting this exciting research post in the Silicon Valley office, but I will soon head to Auckland for at least several months.

Jun Conflicting biomedical assumptions for mathematical modeling: The case of cancer metastasis
Our paper was accepted by PLoS Computational Biology: Conflicting biomedical assumptions for mathematical modeling: The case of cancer metastasis by Anna Divoli, Eneida Mendonca, James Evans and Andrey Rzhetsky
In this paper individual viewpoints from 28 experts in clinical or molecular aspects of cancer metastasis were harvested and summarized computationally. Detailed analysis of the data reveals areas of disagreement and a range of opinions on underlying causes and processes in metastasis.

May Invited talk @ CoHeB / CTS
I attended CTS (International Conference on Collaboration Technologies and Systems) in Philadelphia. The meeting covered a wide range of Collaboration Web technologies, Human Factors and HCI topics I am interested in. I was invited to give a talk at CoHeB (Workshop on Collaboration Technologies and Systems in Healthcare and Biomedical Fields) as part of CTS. My talk was entitled "Expert Opinions in Cancer Metastasis: Harvesting Knowledge from Uncertainty and Discrepancies".

Apr EBI visit
I visited the EBI interfaces group at the European Bioinformatics Institute in Hinxton. It was great to discuss usability and HCI issues for bioinformatics and computational biology with several scientists there. During my visit, I gave a talk entitled "Human factors in computational biology - from mathematical models to user interfaces". Special thanks to my host, Francis Rowland.

Apr Cardiff visit
I visited the Cardiff School of Computer Science & Informatics, where I gave a talk entitled "Expert opinions in cancer metastasis: Uncertainty, discrepancies, range and models" and had wonderful conversations with several faculty members and other researchers there who work on biodiversity informatics, ontologies, and medical informatics. Special thanks to my host, Irena Spasic.


Nov Benchmarking Ontologies, Bigger vs Better @ PLoS Computational Biology
One of our group's papers was recently accepted by PLoS Computational Biology: Benchmarking Ontologies, Bigger vs Better by Lixia Yao, Anna Divoli, Ilya Mayzus, James Evans and Andrey Rzhetsky
In this paper we introduce a family of ontology metrics and we test them on four medical ontologies and seven popular English thesauri.

Aug Bio-Interfaces Google Group
Together with some of the Interfaces people at EBI (if you haven't already, check out their fantastic site: http://ebiinterfaces.wordpress.com ), I have started a Google Group "Biological Interfaces".

Jul ISMB 2010
I attended ISMB in Boston. I presented a poster: "Considering alternative views when modeling cancer metastasis".


