13 November 2014 at 14:00, Aquarium room, LIG
Patent Machine Translation (Handling large data with Moses)
Presenter: Bruno Pouliquen
Abstract: WIPO is in charge of Intellectual Property, and one of its main activities concerns patent information. Patents are usually available in several languages, so we developed our own tool, called Tapta, built with the open-source Moses toolkit. WIPO can use huge corpora (22 times more English text than the whole of Wikipedia), and we built specific tools to extract parallel texts; as a result, some of our bitexts are of world-class size (e.g. Chinese-English: 2 billion words). We will explain the specificities of patent texts and how we scaled our systems so that they run efficiently on huge models. We will also highlight some of the choices we made to manage several models and installations at various sites (most of them used daily in production environments) with a small team (fewer than two people). Finally, we will talk about the various user interfaces we built and about user acceptance.
Bio: Bruno Pouliquen is a senior software engineer specialized in patent machine translation, working at the World Intellectual Property Organization (WIPO) in Geneva since 2009. He holds a PhD in computer science (Faculty of Rennes, France, 2002) and later specialized in multilingual text mining (Joint Research Centre of the European Commission, Italy, 2001-2009). He published more than 50 scientific papers in computational linguistics between 1995 and 2014. Bruno's position at WIPO includes exploring statistical machine translation applied to the patent domain and providing his tool (built around Moses) to other services and international organizations (including the United Nations in New York). The tool is now used in production at various places, both for assimilation (gist translation) and for dissemination (translation accelerator).
27 October 2014 at 14:00, Lecture hall F107, INRIA Grenoble Rhône-Alpes, Montbonnot
Learning and aligning ontologies: methodologies and algorithms
Presenters: Kate Cerqueira Revoredo and Fernanda Baião
Federal University of the State of Rio de Janeiro (UNIRIO)
Abstract: Typically, an ontology formalizes a number of inter-related concepts in a domain of discourse. It is a crucial artifact for several applications, such as semantic data integration, systems interoperability and knowledge management. However, since manually defining such ontologies is a complex, time-consuming and error-prone task, there are a number of research efforts towards (semi-)automatically learning ontologies. Furthermore, the concepts and relationships of the domain of discourse evolve over time, so the corresponding ontology must evolve as well, and automatic mechanisms are also needed to improve the ontology evolution process. From another perspective, since several ontologies representing the same domain typically exist, there is a need to establish an alignment among them. Ontology matching is a very active research area that essentially aims to automatically establish correspondences between the entities of two ontologies. Despite the research efforts in these three areas, there are still interesting challenges to be solved. In this talk we will discuss approaches being developed in our research group for automatically learning and revising an ontology using theory revision techniques, as well as ontology matching approaches that incorporate user feedback through an active-learning technique and the development and reuse of correspondence antipatterns.
Bio: Kate Cerqueira Revoredo is a Professor in the Applied Informatics Department of the Federal University of the State of Rio de Janeiro, Brazil. She obtained her M.Sc. and D.Sc. degrees in Computer Science from the Federal University of Rio de Janeiro (COPPE-UFRJ). In 2006 she spent six months at the University of Freiburg, Freiburg (Germany) as part of her D.Sc. research. Her experience and research work focus on machine learning, data mining, social media analytics and ontology alignment. She serves on several program committees of national and international journals and conferences, and is a member of the Brazilian Computer Society and of its Special Commission in Artificial Intelligence.
Fernanda Baião has been an Associate Professor in the Department of Applied Informatics at the Federal University of the State of Rio de Janeiro (UNIRIO) since 2004. She worked at the University of Wisconsin, Madison (USA) as a visiting student in 2000. Her current research interests include conceptual modelling, well-founded representation languages, information architecture, data management in distributed and parallel environments, and scientific workflows. Since 2009 she has been one of the research members of the Brazilian WebScience Research Institute. She participates in research projects funded by Brazilian and international agencies.
6 October 2014 at 10:00, MJK lecture hall
A Personalised Recommendation System for Context-Aware Suggestions
Presenter: Fabio Crestani, University of Lugano (USI), Switzerland
Abstract: The recently introduced TREC Contextual Suggestion track addresses the problem of suggesting contextually relevant places to a user visiting a new city, based on his/her preferences and the location of the new city. In this talk I will frame the problem of representing and using context, and will introduce a new and more sophisticated approach to constructing user profiles for that track in order to provide more accurate and relevant recommendations. The results show that our system not only significantly outperforms the TREC 2013 Contextual Suggestion track baseline method (this year's results are not out yet), but also performs very well compared to other runs submitted to that track, achieving the best results in nearly half of all test contexts.
Towards a Game-Theoretic Framework for Information Retrieval
Presenter: Cheng Zhai, University of Illinois, USA
Abstract: The task of information retrieval (IR) has traditionally been defined as ranking a collection of documents in response to a query. While this definition has enabled most research progress in IR so far, it does not accurately model the actual retrieval task in a real IR application, where users tend to engage in an interactive process with multiple queries, and optimizing the overall performance of an IR system over an entire search session is far more important than its performance on an individual query. In this talk, I will present a new game-theoretic formulation of the IR problem whose key idea is to model information retrieval as a process in which a search engine and a user play a cooperative game, with the shared goal of satisfying the user's information need while minimizing the user's effort and the resource overhead on the retrieval system. Such a game-theoretic framework offers several benefits. First, it naturally suggests optimizing the overall utility of an interactive retrieval system over a whole search session, thus breaking the limitation of the traditional formulation, which optimizes the ranking of documents for a single query. Second, it models the interactions between users and a search engine, and can thus optimize the collaboration between a search engine and its users, maximizing the "combined intelligence" of the system and its users. Finally, it can potentially serve as a unified framework for optimizing both interactive information retrieval and active relevance judgment acquisition through crowdsourcing. I will discuss how the new framework not only covers several emerging directions in current IR research as special cases, but also opens up many interesting new research directions in IR.