publications
2019

NAACL
Competence-based Curriculum Learning for Neural Machine Translation. Platanios, Emmanouil Antonios, Stretcu, Otilia, Neubig, Graham, Póczos, Barnabás, and Mitchell, Tom In Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) 2019
Current state-of-the-art NMT systems use large neural networks that are not only slow to train, but also often require many heuristics and optimization tricks, such as specialized learning rate schedules and large batch sizes. This is undesirable as it requires extensive hyperparameter tuning. In this paper, we propose a curriculum learning framework for NMT that reduces training time, reduces the need for specialized heuristics or large batch sizes, and results in overall better performance. Our framework consists of a principled way of deciding which training samples are shown to the model at different times during training, based on the estimated difficulty of a sample and the current competence of the model. Filtering training samples in this manner prevents the model from getting stuck in bad local optima, making it converge faster and reach a better solution than the common approach of uniformly sampling training examples. Furthermore, the proposed method can be easily applied to existing NMT models by simply modifying their input data pipelines. We show that our framework can help improve the training time and the performance of both recurrent neural network models and Transformers, achieving up to a 70% decrease in training time, while at the same time obtaining accuracy improvements of up to 2.2 BLEU.
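The sampling scheme described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the square-root competence schedule discussed in the paper and a difficulty-sorted training set (e.g. sorted by sentence length or word rarity); the function names are hypothetical.

```python
import math
import random

def competence(t, T, c0=0.01):
    # Square-root competence schedule: the fraction of the difficulty-sorted
    # training set the model is allowed to sample from at step t, starting
    # at c0 and reaching 1 after T curriculum steps.
    return min(1.0, math.sqrt(t * (1 - c0 ** 2) / T + c0 ** 2))

def sample_batch(sorted_examples, t, T, batch_size=4):
    # sorted_examples: training examples sorted from easiest to hardest.
    # Only the easiest competence(t, T) fraction is visible at step t.
    cutoff = max(1, int(competence(t, T) * len(sorted_examples)))
    return random.choices(sorted_examples[:cutoff], k=batch_size)
```

Because the change is confined to which examples a batch may draw from, it can wrap an existing input pipeline without touching the model itself, which is the point made in the abstract.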
2018

EMNLP
Contextual Parameter Generation for Universal Neural Machine Translation. Platanios, Emmanouil Antonios, Sachan, Mrinmaya, Neubig, Graham, and Mitchell, Tom In Conference on Empirical Methods in Natural Language Processing (EMNLP) 2018
We propose a simple modification to existing neural machine translation (NMT) models that enables using a single universal model to translate between multiple languages while allowing for language-specific parameterization, and that can also be used for domain adaptation. Our approach requires no changes to the model architecture of a standard NMT system, but instead introduces a new component, the contextual parameter generator (CPG), that generates the parameters of the system (e.g., weights in a neural network). This parameter generator accepts source and target language embeddings as input, and generates the parameters for the encoder and the decoder, respectively. The rest of the model remains unchanged and is shared across all languages. We show how this simple modification enables the system to use monolingual data for training and also perform zero-shot translation. We further show that it is able to surpass state-of-the-art performance on both the IWSLT-15 and IWSLT-17 datasets and that the learned language embeddings are able to uncover interesting relationships between languages.
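The generator can be sketched as follows. This is a toy sketch under assumptions, not the paper's code: the generator is taken to be a single linear map (the simplest form the paper discusses), the dimensions are illustrative, and all names are hypothetical.

```python
import random

random.seed(0)
num_langs, lang_dim, num_params = 8, 4, 32  # illustrative sizes

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# One embedding vector per language; in the paper these are learned jointly
# with the rest of the model rather than fixed at random.
lang_embeddings = rand_matrix(num_langs, lang_dim)
# The contextual parameter generator here is a single linear map per side:
# theta = W @ l, where l is a language embedding.
W_enc = rand_matrix(num_params, lang_dim)
W_dec = rand_matrix(num_params, lang_dim)

def generate_parameters(src_lang, tgt_lang):
    # Encoder parameters come from the source-language embedding, decoder
    # parameters from the target-language embedding; everything else in
    # the NMT model is shared across all language pairs.
    return (matvec(W_enc, lang_embeddings[src_lang]),
            matvec(W_dec, lang_embeddings[tgt_lang]))
```

Note how any (source, target) pair, including pairs never seen jointly in training, yields a full parameter set, which is what makes zero-shot translation possible in this scheme.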

CACM
Never-Ending Learning. Mitchell, Tom M, Cohen, William W, Hruschka Jr, Estevam R, Pratim Talukdar, Partha, Yang, Bishan, Betteridge, Justin, Carlson, Andrew, Dalvi, Bhavana, Gardner, Matt, Kisiel, Bryan, Krishnamurthy, Jayant, Lao, Ni, Mazaitis, Kathryn, Mohamed, Thahir P, Nakashole, Ndapandula, Platanios, Emmanouil A, Ritter, Alan, Samadi, Mehdi, Settles, Burr, Wang, Richard C, Wijaya, Derry, Gupta, Abhinav, Chen, Xinlei, Saparov, Abulhair, Greaves, Malcolm, and Welling, Joel In Communications of the ACM 2018
Whereas people learn many different types of knowledge from diverse experiences over many years, and become better learners over time, most current machine learning systems are much more narrow, learning just a single function or data model based on statistical analysis of a single data set. We suggest that people learn better than computers precisely because of this difference, and we suggest that a key direction for machine learning research is to develop software architectures that enable intelligent agents to also learn many types of knowledge, continuously over many years, and to become better learners over time. In this paper we define more precisely this never-ending learning paradigm for machine learning, and we present one case study: the Never-Ending Language Learner (NELL), which achieves a number of the desired properties of a never-ending learner. NELL has been learning to read the Web 24 hours/day since January 2010, and so far has acquired a knowledge base with 120 million diverse, confidence-weighted beliefs (e.g., servedWith(tea, biscuits)), while learning thousands of interrelated functions that continually improve its reading competence over time. NELL has also learned to reason over its knowledge base to infer new beliefs it has not yet read from those it has, and NELL is inventing new relational predicates to extend the ontology it uses to represent beliefs. We describe the design of NELL, present experimental results illustrating its behavior, and discuss both its successes and shortcomings as a case study in never-ending learning. NELL can be tracked online at http://rtw.ml.cmu.edu, and followed on Twitter at @CMUNELL.

arXiv
Agreement-based Learning. Platanios, Emmanouil A In arXiv (1806.01258) 2018
Model selection is a problem that has occupied machine learning researchers for a long time. Recently, its importance has become evident through applications in deep learning. We propose an agreement-based learning framework that prevents many of the pitfalls associated with model selection. It relies on coupling the training of multiple models by encouraging them to agree on their predictions while training. In contrast with other model selection and combination approaches used in machine learning, the proposed framework is inspired by human learning. We also propose a learning algorithm defined within this framework which significantly outperforms alternatives in practice, and whose performance improves further with the availability of unlabeled data. Finally, we describe a number of potential directions for developing more flexible agreement-based learning algorithms.

arXiv
Deep Graphs. Platanios, Emmanouil A, and Smola, Alex In arXiv (1806.01235) 2018
We propose an algorithm for deep learning on networks and graphs. It relies on the notion that many graph algorithms, such as PageRank, Weisfeiler-Lehman, or message passing, can be expressed as iterative vertex updates. Unlike previous methods, which rely on the ingenuity of the designer, Deep Graphs are adaptive to the estimation problem. Training and deployment are both efficient, since the cost is O(|E| + |V|), where E and V are the sets of edges and vertices, respectively. In short, we learn the recurrent update functions rather than positing their specific functional form. This yields an algorithm that achieves excellent accuracy on both graph labeling and regression tasks.
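The "iterative vertex update" pattern the abstract refers to can be made concrete with PageRank, one of its named instances (this is just the classic algorithm written in that shape, not code from the paper; the paper's contribution is to learn the update function rather than fix it by hand):

```python
def pagerank(edges, num_vertices, damping=0.85, iters=50):
    # PageRank as an iterative vertex update: each sweep aggregates over
    # edges (O(|E|)) and then updates every vertex (O(|V|)).
    out_degree = [0] * num_vertices
    for u, _ in edges:
        out_degree[u] += 1
    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iters):
        incoming = [0.0] * num_vertices
        for u, v in edges:                      # O(|E|) edge pass
            incoming[v] += rank[u] / out_degree[u]
        rank = [(1 - damping) / num_vertices + damping * m
                for m in incoming]              # O(|V|) vertex update
    return rank
```

(For simplicity this sketch assumes every vertex has at least one outgoing edge.) Replacing the fixed update in the inner loop with a learned recurrent function is, at a high level, the move the paper makes.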
2017

arXiv
Active Learning amidst Logical Knowledge. Platanios, Emmanouil A, Kapoor, Ashish, and Horvitz, Eric In arXiv (1709.08850) 2017
Structured prediction is ubiquitous in applications of machine learning such as knowledge extraction and natural language processing. Structure often can be formulated in terms of logical constraints. We consider the question of how to perform efficient active learning in the presence of logical constraints among variables inferred by different classifiers. We propose several methods and provide theoretical results demonstrating that uncertainty-guided sampling, a commonly used active learning method, is ill-suited to this setting. Furthermore, experiments on ten different datasets demonstrate that the methods significantly outperform alternatives in practice. The results are of practical significance in situations where labeled data is scarce.

NIPS
Estimating Accuracy from Unlabeled Data: A Probabilistic Logic Approach. Platanios, Emmanouil A, Poon, Hoifung, Horvitz, Eric, and Mitchell, Tom M In Neural Information Processing Systems 2017
We propose an efficient method to estimate the accuracy of classifiers using only unlabeled data. We consider a setting with multiple classification problems where the target classes may be tied together through logical constraints. For example, a set of classes may be mutually exclusive, meaning that a data instance can belong to at most one of them. The proposed method is based on two intuitions: (i) when classifiers agree, they are more likely to be correct, and (ii) when classifiers make predictions that violate the constraints, at least one classifier must be making an error. Experiments on four real-world data sets produce accuracy estimates within a few percent of the true accuracy, using solely unlabeled data. Our models also outperform existing state-of-the-art solutions in both estimating accuracies and combining multiple classifier outputs. The results emphasize the utility of logical constraints in estimating accuracy, thus validating our intuition.
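Intuition (ii) already yields something quantitative on its own: on any unlabeled instance where two mutually exclusive classes are both predicted, some classifier must be wrong, so the violation rate lower-bounds the error. The sketch below illustrates only that observation, not the paper's probabilistic logic method; all names and the toy data are hypothetical.

```python
def error_lower_bound(pred_rows, mutually_exclusive):
    # pred_rows: one dict per instance, mapping class name -> predicted
    # membership. On any instance where two mutually exclusive classes are
    # both predicted true, at least one classifier erred, so the fraction
    # of such instances lower-bounds P(some classifier errs) without labels.
    violated = sum(
        any(row[a] and row[b] for a, b in mutually_exclusive)
        for row in pred_rows
    )
    return violated / len(pred_rows)

rows = [
    {"city": True,  "animal": False},
    {"city": True,  "animal": True},   # violates mutual exclusivity
    {"city": False, "animal": True},
    {"city": False, "animal": False},
]
bound = error_lower_bound(rows, [("city", "animal")])
```

The paper goes well beyond this bound, combining such constraints with agreement information in a probabilistic logic to estimate each classifier's accuracy.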
2016

ICML
Estimating Accuracy from Unlabeled Data: A Bayesian Approach. Platanios, Emmanouil A, Dubey, Avinava, and Mitchell, Tom M In International Conference on Machine Learning 2016
We consider the question of how unlabeled data can be used to estimate the true accuracy of learned classifiers, and the related question of how outputs from several classifiers performing the same task can be combined based on their estimated accuracies. To answer these questions, we first present a simple graphical model that performs well in practice. We then provide two nonparametric extensions to it that improve its performance. Experiments on two real-world data sets produce accuracy estimates within a few percent of the true accuracy, using solely unlabeled data. Our models also outperform existing state-of-the-art solutions in both estimating accuracies and combining multiple classifier outputs.
2015

AAAI
Never-Ending Learning. Mitchell, Tom M, Cohen, William W, Hruschka Jr, Estevam R, Pratim Talukdar, Partha, Betteridge, Justin, Carlson, Andrew, Dalvi, Bhavana, Gardner, Matt, Kisiel, Bryan, Krishnamurthy, Jayant, Lao, Ni, Mazaitis, Kathryn, Mohamed, Thahir P, Nakashole, Ndapandula, Platanios, Emmanouil A, Ritter, Alan, Samadi, Mehdi, Settles, Burr, Wang, Richard C, Wijaya, Derry, Gupta, Abhinav, Chen, Xinlei, Saparov, Abulhair, Greaves, Malcolm, and Welling, Joel In Association for the Advancement of Artificial Intelligence 2015
Whereas people learn many different types of knowledge from diverse experiences over many years, most current machine learning systems acquire just a single function or data model from just a single data set. We propose a never-ending learning paradigm for machine learning, to better reflect the more ambitious and encompassing type of learning performed by humans. As a case study, we describe the Never-Ending Language Learner (NELL), which achieves some of the desired properties of a never-ending learner, and we discuss lessons learned. NELL has been learning to read the web 24 hours/day since January 2010, and so far has acquired a knowledge base with over 80 million confidence-weighted beliefs (e.g., servedWith(tea, biscuits)). NELL has also learned millions of features and parameters that enable it to read these beliefs from the web. Additionally, it has learned to reason over these beliefs to infer new beliefs, and is able to extend its ontology by synthesizing new relational predicates. NELL can be tracked online at http://rtw.ml.cmu.edu, and followed on Twitter at @CMUNELL.

CMU
Estimating Accuracy from Unlabeled Data. Platanios, Emmanouil A Master’s Thesis, Carnegie Mellon University 2015
We consider the question of how unlabeled data can be used to estimate the true accuracy of learned classifiers. This is an important question for any autonomous learning system that must estimate its accuracy without supervision, and also when classifiers trained from one data distribution must be applied to a new distribution (e.g., document classifiers trained on one text corpus are to be applied to a second corpus). We first show how to estimate error rates exactly from unlabeled data when given a collection of competing classifiers that make independent errors, based on the agreement rates between subsets of these classifiers. We further show that even when the competing classifiers do not make independent errors, both their accuracies and error dependencies can be estimated by making certain relaxed assumptions. We then present an alternative approach based on graphical models that also allows us to combine the outputs of the classifiers into a single output label. A simple graphical model is introduced that performs well in practice. Then, two nonparametric extensions to it are presented that significantly improve its performance. Experiments on two real-world data sets produce accuracy estimates within a few percent of the true accuracy, using solely unlabeled data. Our graphical model approaches also outperform alternative methods for combining the classifiers’ outputs. These results are of practical significance in situations where labeled data is scarce and shed light on the more general question of how the consistency among multiple functions is related to their true accuracies.
2014

UAI
Estimating Accuracy from Unlabeled Data. Platanios, Emmanouil A, Blum, Avrim, and Mitchell, Tom M In Conference on Uncertainty in Artificial Intelligence 2014
We consider the question of how unlabeled data can be used to estimate the true accuracy of learned classifiers. This is an important question for any autonomous learning system that must estimate its accuracy without supervision, and also when classifiers trained from one data distribution must be applied to a new distribution (e.g., document classifiers trained on one text corpus are to be applied to a second corpus). We first show how to estimate error rates exactly from unlabeled data when given a collection of competing classifiers that make independent errors, based on the agreement rates between subsets of these classifiers. We further show that even when the competing classifiers do not make independent errors, both their accuracies and error dependencies can be estimated by making certain relaxed assumptions. Experiments on two real-world data sets produce estimates within a few percent of the true accuracy, using solely unlabeled data. These results are of practical significance in situations where labeled data is scarce and shed light on the more general question of how the consistency among multiple functions is related to their true accuracies.
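The independent-errors case admits a closed form worth spelling out. If classifiers i and j err independently with rates e_i, e_j < 1/2, their disagreement rate d_ij satisfies 1 - 2 d_ij = (1 - 2 e_i)(1 - 2 e_j), so three pairwise disagreement rates, all observable without labels, determine all three error rates. The sketch below illustrates this identity on simulated data; it is a toy version of the idea, not the paper's code, and all names are hypothetical.

```python
import random

def estimate_error_rates(disagreement):
    # disagreement[(i, j)]: observed rate at which classifiers i and j
    # disagree. Let x_i = 1 - 2*e_i; independence of errors gives
    # x_i * x_j = 1 - 2*d_ij, which three pairs solve in closed form.
    c = {k: 1.0 - 2.0 * v for k, v in disagreement.items()}
    x = [
        (c[(0, 1)] * c[(0, 2)] / c[(1, 2)]) ** 0.5,
        (c[(0, 1)] * c[(1, 2)] / c[(0, 2)]) ** 0.5,
        (c[(0, 2)] * c[(1, 2)] / c[(0, 1)]) ** 0.5,
    ]
    return [(1.0 - xi) / 2.0 for xi in x]

# Simulate three classifiers with independent errors. On unlabeled data we
# would only observe their (dis)agreements, which is all the estimator uses.
random.seed(0)
true_errors = [0.1, 0.2, 0.3]
n = 200_000
correct = [[random.random() >= e for e in true_errors] for _ in range(n)]
disagreement = {
    (i, j): sum(row[i] != row[j] for row in correct) / n
    for (i, j) in [(0, 1), (0, 2), (1, 2)]
}
estimates = estimate_error_rates(disagreement)
```

The "relaxed assumptions" in the abstract address exactly the case where this clean factorization fails because errors are correlated.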

PAMI
Gaussian Process-Mixture Conditional Heteroscedasticity. Platanios, Emmanouil A, and Chatzis, Sotirios P In IEEE Transactions on Pattern Analysis and Machine Intelligence 2014
Generalized autoregressive conditional heteroscedasticity (GARCH) models have long been considered one of the most successful families of approaches for volatility modeling in financial return series. In this paper, we propose an alternative approach based on methodologies widely used in the field of statistical machine learning. Specifically, we propose a novel nonparametric Bayesian mixture of Gaussian process regression models, each component of which models the noise variance process that contaminates the observed data as a separate latent Gaussian process driven by the observed data. This way, we essentially obtain a Gaussian process-mixture conditional heteroscedasticity (GPMCH) model for volatility modeling in financial return series. We impose a nonparametric prior with power-law nature over the distribution of the model mixture components, namely the Pitman-Yor process prior, to better capture modeled data distributions with heavy tails and skewness. Finally, we provide a copula-based approach for obtaining a predictive posterior for the covariances over the asset returns modeled by means of a postulated GPMCH model. We evaluate the efficacy of our approach in a number of benchmark scenarios, and compare its performance to state-of-the-art methodologies.
2012

NIPS
Nonparametric Mixtures of Multi-Output Heteroscedastic Gaussian Processes for Volatility Modeling. Platanios, Emmanouil A, and Chatzis, Sotirios P In Neural Information Processing Systems Workshop on Modern Nonparametric Methods in Machine Learning 2012
In this work, we present a nonparametric Bayesian method for multivariate volatility modeling. Our approach is based on the postulation of a novel mixture of multi-output heteroscedastic Gaussian processes to model the covariance matrices of multiple assets. Specifically, we use the Pitman-Yor process prior as the nonparametric prior imposed over the components of our model, which are taken to be multi-output heteroscedastic Gaussian processes obtained by introducing appropriate convolution kernels that combine simple heteroscedastic Gaussian processes under a multi-output scheme. We exhibit the efficacy of our approach in a volatility prediction task.