Monday, May 30, 2016 : 9.45 — 10.35
Invited Talk: “IBM Watson Grand Challenge: Lessons Learned from a Machine Learning and Deep NLP Perspective”. Alessandro Moschitti, Qatar Computing Research Institute, HBKU.
Abstract: In February 2011, Watson, an advanced Question Answering (QA) system developed at IBM, defeated two all-time human champions of America’s most famous quiz show, Jeopardy!. Watson’s core technology is based on deep linguistic analysis, which allows it to outperform human experts on very complex and articulated questions. However, since then, it has become clear that translating this technology from a game setting into real-world applications presents further complex challenges, e.g., porting the developed rules and features to other domains and languages, training the system, and, most importantly, serving a real user need. In this talk, after introducing Watson with an overview of its technology, I will show some successful developments of its solutions in real-world applications, such as Community QA. In particular, I will focus on: (i) supervised machine learning methods, e.g., for answer passage re-ranking; (ii) text representations, ranging from word sequences to syntactic/semantic relational structures; (iii) powerful approaches to automatic feature engineering using structural kernels, e.g., for modeling the relations between questions and answers; and finally, (iv) deep learning models and their combination with kernels for answer passage/comment selection. I will show that the methods above constitute the current state of the art, at least according to the most important academic benchmarks of QA.
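The structural kernels mentioned in point (iii) build on the classic subset-tree kernel of Collins and Duffy, which scores two parse trees by counting their shared syntactic fragments. Below is a minimal, illustrative sketch: the tuple-based tree encoding and the example trees are assumptions made for this example, not details of Watson or of the talk.

```python
# Collins-Duffy subset-tree kernel (illustrative sketch).
# A tree node is encoded as (label, child1, child2, ...); leaf words are plain strings.

def subtrees(t):
    """Yield every internal node of a tree encoded as nested tuples."""
    yield t
    for child in t[1:]:
        if isinstance(child, tuple):
            yield from subtrees(child)

def match(n1, n2):
    """Number of common tree fragments rooted at nodes n1 and n2."""
    if not isinstance(n1, tuple) or not isinstance(n2, tuple):
        return 0                                 # leaf words contribute via their parent
    if n1[0] != n2[0]:
        return 0                                 # different node labels: no shared fragment
    kids1, kids2 = n1[1:], n2[1:]
    label = lambda k: k[0] if isinstance(k, tuple) else k
    if len(kids1) != len(kids2) or [label(k) for k in kids1] != [label(k) for k in kids2]:
        return 0                                 # productions must be identical
    prod = 1
    for c1, c2 in zip(kids1, kids2):
        prod *= 1 + match(c1, c2)                # each child: stop here, or extend the fragment
    return prod

def tree_kernel(t1, t2):
    """Total number of syntactic fragments shared by the two trees."""
    return sum(match(a, b) for a in subtrees(t1) for b in subtrees(t2))

q_tree = ("S", ("NP", ("D", "the"), ("N", "dog")), ("VP", ("V", "barks")))
print(tree_kernel(q_tree, q_tree))               # self-similarity counts all fragments
```

In a kernel machine such as an SVM, this count plays the role of the similarity K(T1, T2) between, e.g., the parse tree of a question and that of a candidate answer passage.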
Bio: Alessandro Moschitti is a Principal Research Scientist at the Qatar Computing Research Institute (QCRI), within Hamad Bin Khalifa University, and a professor in the CS Department of the University of Trento, Italy. He obtained his PhD in NLP from the University of Rome in 2003. He has worked as a research associate at the University of Texas at Dallas and as a visiting professor at Columbia University, the University of Colorado at Boulder, Johns Hopkins University, and the Massachusetts Institute of Technology. Moreover, he has been a visiting research scientist at the IBM Watson Research Center in NY, where he participated in the Jeopardy! Challenge with IBM Watson. His expertise concerns theoretical and applied Machine Learning in the areas of NLP, Information Retrieval, and Data Mining. He has devised innovative kernels within support vector and other kernel-based machines for advanced syntactic/semantic processing, documented in more than 230 scientific articles published in major venues. He was the General Chair of EMNLP 2014 and the Program co-Chair of CoNLL 2015. He has been PI for several European and US projects, and is currently the PI (on the QCRI side) of a large collaboration project with MIT CSAIL. He has received four IBM Faculty awards, one Google Faculty award, five best paper awards, and a best researcher award from the University of Trento.
Monday, May 30, 2016 : 14.00 — 14.50
Invited Talk: “Genomic Computing: Making Sense of the Signals from the Genome”. Stefano Ceri, Politecnico di Milano. [Slides]
Abstract: Genomic computing is a new science focused on understanding the functioning of the genome, as a premise to fundamental discoveries in biology and medicine. Next Generation Sequencing (NGS) allows the production of the entire human genome sequence at a cost of about US $1,000; many algorithms exist for the extraction of genome features, or “signals”, including peaks (enriched regions), mutations, or gene expression (intensity of transcription activity). The missing gap is a system supporting data integration and exploration, giving a “biological meaning” to all the available information; such a system can be used, e.g., for better understanding cancer or how the environment influences cancer development. The GeCo Project (Data-Driven Genomic Computing, an ERC Advanced Grant currently undergoing contract preparation) has the objective of revisiting genomic computing through the lens of basic data management, through models, languages, and instruments; the research group at DEIB is among the few centering their focus on genomic data integration. Starting from an abstract model, we have already developed a system for query processing that can be used to query processed data produced by several large genomic consortia, including ENCODE and TCGA; the system internally employs the Spark, Flink, and SciDB data engines, and prototypes can already be accessed from Cineca servers or downloaded from PoliMi servers. During the five years of the ERC project, the system will be enriched with data analysis tools and environments and will be made increasingly efficient. Most pathologies have a genetic component, hence a system capable of integrating the “big data” of genomics is of paramount importance.
Among the objectives of the project is the creation of an “open source” system available to biological and clinical research; while the project will provide public services that use only public data (anonymized and made available for secondary use, i.e., knowledge discovery), the use of the system within protected clinical contexts will enable personalized medicine, i.e., the adaptation of therapies to the specific needs of patients. The most ambitious objective is the development, during the five-year ERC project, of an “Internet for Genomics”, i.e., a protocol for collecting data from consortia and individual researchers, and a “Google for Genomics”, supporting indexing and search over huge collections of genomic datasets.
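As a concrete illustration of the region-oriented queries such a system answers, the sketch below counts, for each gene region, the experiment regions (“peaks”) that overlap it; this is the kind of map-style operation that the project's query system expresses and that its Spark/Flink/SciDB back-ends would execute at scale. The (chrom, start, stop) encoding and the sample regions are invented for this example.

```python
# Toy region-overlap "map" over processed genomic data (illustrative sketch).
# Regions are half-open intervals (chromosome, start, stop); data is made up.

def overlaps(a, b):
    """True if the two half-open regions lie on the same chromosome and intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def map_count(references, experiments):
    """For each reference region, count the experiment regions overlapping it."""
    return {ref: sum(overlaps(ref, e) for e in experiments)
            for ref in references}

genes = [("chr1", 100, 200), ("chr1", 500, 800)]
peaks = [("chr1", 150, 160), ("chr1", 190, 520), ("chr2", 100, 200)]
print(map_count(genes, peaks))
# the first gene overlaps two peaks, the second gene overlaps one
```

A production system would of course index the regions (e.g., by chromosome and sorted start position) rather than test every pair, but the semantics of the operation are the same.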
Bio: Stefano Ceri is Professor at the Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB) of Politecnico di Milano. He obtained his Dr. Eng. degree from Politecnico di Milano in July 1978. He was Visiting Professor at the Computer Science Department of Stanford University (1983-1990), Chairman of the Computer Science Section of DEI (1992-2004), and Director of Alta Scuola Politecnica (ASP) of Politecnico di Milano and Politecnico di Torino (2010-2013). His research work covers four decades (1976-2016) and has been generally concerned with extending database technology to incorporate new features: distribution, object-orientation, rules, streaming data, crowd-based and genomic computing. He has been awarded two ERC Advanced Grants, on “Search Computing” (2008-2013) and “Genomic Computing” (2016-2021). He is co-founder (2001) of WebRatio (http://www.webratio.com/). He is the recipient of the ACM SIGMOD “Edgar F. Codd Innovations Award” (2013), an ACM Fellow, and a member of the Academia Europaea.
Tuesday, May 31, 2016 : 9.30 — 10.20
Invited Talk: “Succinct Data Structures in Information Retrieval”. Rossano Venturini, University of Pisa. [Slides]
Abstract: Succinct data structures are used in many information retrieval applications, e.g., posting-list representation, language-model representation, indexing (social) graphs, query auto-completion, document retrieval, and indexing dictionaries of strings, just to mention the most recent ones. These new kinds of data structures mimic the operations of their classical counterparts within comparable time complexity while requiring much less space. The talk will introduce this field of research by presenting the most important succinct data structures together with their main applications to Information Retrieval problems.
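To make the time/space trade-off concrete, the sketch below implements rank1(i), the number of 1-bits in bits[0:i], which is the flagship operation of succinct bitvectors: cumulative counts stored once per block answer a query with one table lookup plus a short in-block scan. The block size and toy bitvector are illustrative choices; a real succinct structure would pack the bits into machine words and resolve the in-block part with broadword or table tricks, keeping the extra space at o(n) bits.

```python
# Rank support over a bitvector (illustrative sketch of a succinct-style structure).

class RankBitvector:
    def __init__(self, bits, block=8):
        self.bits, self.block = bits, block
        self.counts = [0]                        # cumulative number of 1s before each block
        for start in range(0, len(bits), block):
            self.counts.append(self.counts[-1] + sum(bits[start:start + block]))

    def rank1(self, i):
        """Number of 1-bits in bits[0:i]: one lookup plus a scan inside one block."""
        b = i // self.block
        return self.counts[b] + sum(self.bits[b * self.block:i])

bv = RankBitvector([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1])
print(bv.rank1(10))                              # 1-bits among the first ten positions
```

The companion operation select1(k) (position of the k-th 1-bit) is supported analogously, and together the two underpin compressed posting lists and wavelet-tree-based text indexes.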
Bio: Rossano Venturini is a researcher at the Computer Science Department of the University of Pisa. He received his Ph.D. from the Computer Science Department of the University of Pisa in 2010, discussing a thesis titled “On Searching and Extracting Strings from Compressed Textual Data”. His research interests are mainly focused on the design and analysis of algorithms and data structures, with special attention to problems of indexing and searching large collections. He received the Best Italian Ph.D. Thesis award in Theoretical Computer Science from the Italian Chapter of the EATCS in 2012, and two Best Paper awards at ACM SIGIR, in 2014 and 2015.
Tuesday, May 31, 2016 : 13.40 — 14.30
Invited Talk: “Recent Developments in Natural Language Parsing”. Giorgio Satta, University of Padua. [Slides]
Abstract: Parsing is the task of retrieving a syntactic structure for an input string/sentence. In the context of natural language processing, the parser is an important component in many end-to-end systems, such as question answering, information extraction, machine translation, and text summarization. In this presentation I will provide an overview of current research in natural language parsing. I will introduce and discuss formalisms, algorithms, methodologies and resources that have been adopted or newly developed in this area in the last ten years. I will also outline future research directions that might become prominent for this community in the years to come.
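As a reference point for the algorithms such an overview covers, the sketch below implements the classic CKY recognizer, which decides in O(n³) time whether a sentence is derivable from a context-free grammar in Chomsky normal form. The toy grammar and sentence are invented for this example; modern parsers discussed in the talk's time frame add probabilities, richer formalisms, or transition-based alternatives on top of this dynamic-programming core.

```python
# CKY recognition for a grammar in Chomsky normal form (illustrative sketch).
from collections import defaultdict

def cky_recognize(words, lexical, binary, start="S"):
    """Return True iff `words` is derivable from `start` under the CNF grammar."""
    n = len(words)
    chart = defaultdict(set)                     # (i, j) -> nonterminals spanning words[i:j]
    for i, w in enumerate(words):                # lexical rules A -> w fill length-1 spans
        chart[i, i + 1] = {A for A, term in lexical if term == w}
    for span in range(2, n + 1):                 # longer spans from shorter ones
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):            # every split point
                for A, B, C in binary:           # binary rules A -> B C
                    if B in chart[i, k] and C in chart[k, j]:
                        chart[i, j].add(A)
    return start in chart[0, n]

# Toy grammar, invented for illustration.
lexical = [("D", "the"), ("N", "dog"), ("N", "cat"), ("V", "saw")]
binary = [("S", "NP", "VP"), ("NP", "D", "N"), ("VP", "V", "NP")]
print(cky_recognize("the dog saw the cat".split(), lexical, binary))
```

Extending each chart cell with back-pointers and rule probabilities turns this recognizer into a probabilistic parser that returns the best tree rather than a yes/no answer.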
Bio: Giorgio Satta received a Ph.D. in Computer Science in 1990 from the University of Padua, Italy. He is currently a full professor at the Department of Information Engineering, University of Padua. His main research interests are in computational linguistics, mathematics of language, and formal language theory. He served as chair of the European Chapter of the Association for Computational Linguistics (EACL) for the years 2009-10. He has served on the standing committee of the Formal Grammar conference and on the editorial boards of the journals Computational Linguistics, Grammars, Research on Language and Computation, and Transactions of the Association for Computational Linguistics. He also served as program committee chair for the Annual Meeting of the Association for Computational Linguistics (ACL) in 2001.