Eneldo Loza Mencía

Welcome to my personal research homepage!

My name is Eneldo Loza Mencía and I am a senior scientist in the fields of machine learning, artificial intelligence and data science. I am currently on parental leave but always open for new endeavours.

I got my PhD at TU Darmstadt and worked there in the group of Knowledge Engineering for quite some years.

I enjoy to investigate topics from theory and real life, to develop novel algorithms, to find end-to-end solutions to real data science problems, and to collaborate with and guide junior researchers and students.

You will find my contact, my research interests, some selected or all publications, research projects, data sets, software and teaching material on this site.

Contact

research@eneldo.net

Main interests

My main research interests are the following:

multi-label classification, exploitation of dependency structures in complex outputs spaces, large scale data, efficient and scalable methods, text classification
interpretable machine learning, rule-learning, rule extraction from neural networks
automatic detection of disease outbreaks, (non-specific) syndromic surveillance
automatic text summarization

To get a better idea of my research topics, take also a look at the cluster titles in the following list of selected publications.

Selected publications

A guided and commented list of some of my publications. The full list is available under publications.

Dynamic classifier chains: predicting positive labels one by one with trees

2022

Eneldo Loza Mencía, Moritz Kulessa, Simon Bohlender, and Johannes Fürnkranz

Tree-Based Dynamic Classifier Chains

Machine Learning Journal, Mar 2022

#multi-label #dynamic classifier chains #decision trees

Abs Bib DOI URL

Classifier chains are an effective technique for modeling label dependencies in multi-label classification. However, the method requires a fixed, static order of the labels. While in theory, any order is sufficient, in practice, this order has a substantial impact on the quality of the final prediction. Dynamic classifier chains denote the idea that for each instance to classify, the order in which the labels are predicted is dynamically chosen. The complexity of a naive implementation of such an approach is prohibitive, because it would require to train a sequence of classifiers for every possible permutation of the labels. To tackle this problem efficiently, we propose a new approach based on random decision trees which can dynamically select the label ordering for each prediction. We show empirically that a dynamic selection of the next label improves over the use of a static ordering under an otherwise unchanged random decision tree model. In addition, we also demonstrate an alternative approach based on extreme gradient boosted trees, which allows for a more target-oriented training of dynamic classifier chains. Our results show that this variant outperforms random decision trees and other tree-based multi-label classification methods. More importantly, the dynamic selection strategy allows to considerably speed up training and prediction.
@article{loza22DCC, author = {Loza Menc{\'{i}}a, Eneldo and Kulessa, Moritz and Bohlender, Simon and F{\"{u}}rnkranz, Johannes}, title = {Tree-Based Dynamic Classifier Chains}, journal = {Machine Learning Journal}, year = {2022}, month = mar, url = {http://arxiv.org/abs/2112.06672}, publisher = {Springer International Publishing}, doi = {10.1007/s10994-022-06162-3}, }

2020

Simon Bohlender, Eneldo Loza Mencía, and Moritz Kulessa

Extreme Gradient Boosted Multi-label Trees for Dynamic Classifier Chains

In Discovery Science - 23rd International Conference, DS 2020, Thessaloniki, Greece, October 19-21, 2020, Proceedings , Oct 2020

#multi-label #dynamic classifier chains #gradient boosting #decision trees

Abs Bib DOI URL

Classifier chains is a key technique in multi-label classification, since it allows to consider label dependencies effectively. However, the classifiers are aligned according to a static order of the labels. In the concept of dynamic classifier chains (DCC) the label ordering is chosen for each prediction dynamically depending on the respective instance at hand. We combine this concept with the boosting of extreme gradient boosted trees (XGBoost), an effective and scalable state-of-the-art technique, and incorporate DCC in a fast multi-label extension of XGBoost which we make publicly available. As only positive labels have to be predicted and these are usually only few, the training costs can be further substantially reduced. Moreover, as experiments on eleven datasets show, the length of the chain allows for a more control over the usage of previous predictions and hence over the measure one want to optimize.
@inproceedings{bohlender20XDCC, author = {Bohlender, Simon and Loza Menc{\'{i}}a, Eneldo and Kulessa, Moritz}, month = oct, title = {Extreme Gradient Boosted Multi-label Trees for Dynamic Classifier Chains}, booktitle = {Discovery Science - 23rd International Conference, {DS} 2020, Thessaloniki, Greece, October 19-21, 2020, Proceedings}, series = {Lecture Notes in Computer Science}, volume = {12323}, year = {2020}, pages = {471--485}, publisher = {Springer International Publishing}, institution = {Knowledge Engineering Group, Technische Universit{\"{a}}t Darmstadt}, url = {https://arxiv.org/abs/2006.08094}, doi = {10.1007/978-3-030-61527-7_31}, }

2018

Moritz Kulessa, and Eneldo Loza Mencía

Dynamic Classifier Chain with Random Decision Trees

In Proceedings of the 21st International Conference of Discovery Science (DS-18) , Limassol, Cyprus, Oct 2018

#multi-label #dynamic classifier chains #random decision trees

Abs Bib DOI URL

Classifier chains is an effective approach in order to exploit label dependencies in multi-label data. However, it has the disadvantages that the chain is chosen at total random or relies on a pre-specified ordering of the labels which is expensive to compute. Moreover, the same ordering is used for every test instance, ignoring the fact that different orderings might be best suited for different test instances. We propose a new approach based on random decision trees (RDT) which can choose the label ordering for each prediction dynamically depending on the respective test instance. RDT are not adapted to a specific learning task, but in contrast allow to define a prediction objective on the fly during test time, thus offering a perfect test bed for directly comparing different prediction schemes. Indeed, we show that dynamically selecting the next label improves over using a static ordering of the labels under an otherwise unchanged RDT model and experimental environment.
@inproceedings{mk:DS-18-DCC-RDT, author = {Kulessa, Moritz and Loza Menc{\'{i}}a, Eneldo}, month = oct, title = {Dynamic Classifier Chain with Random Decision Trees}, booktitle = {Proceedings of the 21st International Conference of Discovery Science (DS-18)}, series = {Lecture Notes in Artificial Intelligence}, volume = {11198}, year = {2018}, pages = {33--50}, publisher = {Springer-Verlag}, location = {Limassol, Cyprus}, isbn = {978-3-030-01771-2}, url = {https://ke-tud.github.io/publications/papers/DS18_DynamicClassifierChains_RDT.pdf}, doi = {10.1007/978-3-030-01771-2}, }

Dynamic classifier chains with deep neural networks: applying sequence learning to the idea of classifier chains

2017

Jinseok Nam, Eneldo Loza Mencía, Hyunwoo J. Kim, and Johannes Fürnkranz

Maximizing Subset Accuracy with Recurrent Neural Networks in Multi-label Classification

In Advances in Neural Information Processing Systems 30 (NIPS-17) , 2017

#multi-label #extreme classification #scalability #dynamic classifier chains #recurrent neural networks

Abs Bib Code URL

Multi-label classification is the task of predicting a set of labels for a given input instance. Classifier chains are a state-of-the-art method for tackling such problems, which essentially converts this problem into a sequential prediction problem, where the labels are first ordered in an arbitrary fashion, and the task is to predict a sequence of binary values for these labels. In this paper, we replace classifier chains with recurrent neural networks, a sequence-to-sequence prediction algorithm which has recently been successfully applied to sequential prediction tasks in many domains. The key advantage of this approach is that it allows to focus on the prediction of the positive labels only, a much smaller set than the full set of possible labels. Moreover, parameter sharing across all classifiers allows to better exploit information of previous decisions. As both, classifier chains and recurrent neural networks depend on a fixed ordering of the labels, which is typically not part of a multi-label problem specification, we also compare different ways of ordering the label set, and give some recommendations on suitable ordering strategies.
@inproceedings{jn:NIPS-17-MLC-RNN, author = {Nam, Jinseok and Loza Menc{\'{i}}a, Eneldo and Kim, Hyunwoo J. and F{\"{u}}rnkranz, Johannes}, editor = {Guyon, Isabelle and von Luxburg, Ulrike and Bengio, Samy and Wallach, Hanna M. and Fergus, Rob and Vishwanathan, S. V. N. and Garnett, Roman}, title = {Maximizing Subset Accuracy with Recurrent Neural Networks in Multi-label Classification}, booktitle = {Advances in Neural Information Processing Systems 30 (NIPS-17)}, year = {2017}, pages = {5419--5429}, address = {Long Beach, CA}, url = {http://papers.nips.cc/paper/7125-maximizing-subset-accuracy-with-recurrent-neural-networks-in-multi-label-classification}, }

2019

Jinseok Nam, Young-Bum Kim, Eneldo Loza Mencía, Sunghyun Park, Ruhi Sarikaya, and Johannes Fürnkranz

Learning Context-dependent Label Permutations for Multi-label Classification

In Proceedings of the 36th International Conference on Machine Learning (ICML-19) , 2019

#multi-label #dynamic classifier chains #extreme classification #scalability #recurrent neural networks #reinforcement learning

Bib URL

@inproceedings{jn:ICML-19,
  author = {Nam, Jinseok and Kim, Young{-}Bum and Loza Menc{\'{i}}a, Eneldo and Park, Sunghyun and Sarikaya, Ruhi and F{\"{u}}rnkranz, Johannes},
  editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  title = {Learning Context-dependent Label Permutations for Multi-label Classification},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning (ICML-19)},
  series = {Proceedings of Machine Learning Research},
  volume = {97},
  year = {2019},
  pages = {4733--4742},
  publisher = {{PMLR}},
  address = {Long Beach, California, {USA}},
  url = {http://proceedings.mlr.press/v97/nam19a.html},
}

Induction of rule-like disease patterns on symptomes using clinical patient data from emergency departments and time series of reported cases

2022

Michael Rapp, Moritz Kulessa, Eneldo Loza Mencía, and Johannes Fürnkranz

Correlation-Based Discovery of Disease Patterns for Syndromic Surveillance

Frontiers in Big Data, Jan 2022

#rule-learning #syndromic surveillance #pattern discovery #multi-instance learning

Abs Bib DOI URL

Early outbreak detection is a key aspect in the containment of infectious diseases, as it enables the identification and isolation of infected individuals before the disease can spread to a larger population. Instead of detecting unexpected increases of infections by monitoring confirmed cases, syndromic surveillance aims at the detection of cases with early symptoms, which allows a more timely disclosure of outbreaks. However, the definition of these disease patterns is often challenging, as early symptoms are usually shared among many diseases and a particular disease can have several clinical pictures in the early phase of an infection. As a first step toward the goal to support epidemiologists in the process of defining reliable disease patterns, we present a novel, data-driven approach to discover such patterns in historic data. The key idea is to take into account the correlation between indicators in a health-related data source and the reported number of infections in the respective geographic region. In an preliminary experimental study, we use data from several emergency departments to discover disease patterns for three infectious diseases. Our results show the potential of the proposed approach to find patterns that correlate with the reported infections and to identify indicators that are related to the respective diseases. It also motivates the need for additional measures to overcome practical limitations, such as the requirement to deal with noisy and unbalanced data, and demonstrates the importance of incorporating feedback of domain experts into the learning procedure
@article{rapp22diseasepatterns, author = {Rapp, Michael and Kulessa, Moritz and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes}, title = {Correlation-Based Discovery of Disease Patterns for Syndromic Surveillance}, journal = {Frontiers in Big Data}, volume = {4}, year = {2022}, month = jan, url = {https://arxiv.org/abs/2110.09208}, doi = {10.3389/fdata.2021.784159}, issn = {2624-909X}, }

Detection of outbreaks of known and unknown diseases with statistical methods and sum-product networks

2021

Moritz Kulessa, Bennet Wittelsbach, Eneldo Loza Mencía, and Johannes Fürnkranz

Sum-Product Networks for Early Outbreak Detection of Emerging Diseases

In Artificial Intelligence in Medicine - 19th International Conference on Artificial Intelligence in Medicine, AIME 2021, Virtual Event, June 15-18, 2021, Proceedings , 2021

#outbreak detection #non-specific syndromic surveillance #sum-product networks

Abs Bib DOI URL

Recent research in syndromic surveillance has focused primarily on monitoring specific, known diseases, concentrating on a certain clinical picture under surveillance. Outbreaks of emerging infectious diseases with different symptom patterns are likely to be missed by such a surveillance system. In contrast, monitoring all available data for anomalies allows to detect any kind of outbreaks, including infectious diseases with yet unknown syndromic clinical pictures. In this work, we propose to model the joint probability distribution of syndromic data with sum-product networks (SPN), which are able to capture correlations in the monitored data and even allow to consider environmental factors, such as the current influenza infection rate. Conversely to the conventional use of SPNs, we present a new approach to detect anomalies by evaluating p-values on the learned model. Our experiments on synthetic and real data with synthetic outbreaks show that SPNs are able to improve upon state-of-the-art techniques for detecting outbreaks of emerging diseases.
@inproceedings{kulessa21NSS-SPN, author = {Kulessa, Moritz and Wittelsbach, Bennet and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes}, editor = {Tucker, Allan and Henriques Abreu, Pedro and Cardoso, Jaime and Pereira Rodrigues, Pedro and Ria{\~{n}}o, David}, title = {Sum-Product Networks for Early Outbreak Detection of Emerging Diseases}, booktitle = {Artificial Intelligence in Medicine - 19th International Conference on Artificial Intelligence in Medicine, {AIME} 2021, Virtual Event, June 15-18, 2021, Proceedings}, booktitle_short = {Artificial Intelligence in Medicine (AIME)}, series = {Lecture Notes in Computer Science}, volume = {12721}, year = {2021}, pages = {61--71}, publisher = {Springer International Publishing}, isbn = {978-3-030-77211-6}, url = {https://doi.org/10.1007/978-3-030-77211-6\_7}, alt-url = {https://sci-hub.yncjkj.com/https://link.springer.com/chapter/10.1007/978-3-030-77211-6\_7}, doi = {10.1007/978-3-030-77211-6\_7}, }

2021

Moritz Kulessa, Eneldo Loza Mencía, and Johannes Fürnkranz

Revisiting Non-Specific Syndromic Surveillance

In Advances in Intelligent Data Analysis XIX - 19th International Symposium on Intelligent Data Analysis, IDA 2021, Porto, Portugal, April 26-28, 2021, Proceedings , Apr 2021

#outbreak detection #non-specific syndromic surveillance

Bib DOI URL

@inproceedings{kulessa21revisiting,
  author = {Kulessa, Moritz and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes},
  editor = {Henriques Abreu, Pedro and Pereira Rodrigues, Pedro and Fern{\'{a}}ndez, Alberto and Gama, Jo{\~{a}}o},
  month = apr,
  title = {Revisiting Non-Specific Syndromic Surveillance},
  booktitle = {Advances in Intelligent Data Analysis {XIX} - 19th International Symposium on Intelligent Data Analysis, {IDA} 2021, Porto, Portugal, April 26-28, 2021, Proceedings},
  year = {2021},
  pages = {128-140},
  publisher = {Springer International Publishing},
  url = {https://arxiv.org/abs/2101.12246},
  doi = {10.1007/978-3-030-74251-5\_11},
}

Detection of disease outbreak on time series of labelled reported cases with stacking

2020

Moritz Kulessa, Eneldo Loza Mencía, and Johannes Fürnkranz

Improving the Fusion of Outbreak Detection Methods with Supervised Learning

In Computational Intelligence Methods for Bioinformatics and Biostatistics - 16th International Meeting, CIBB 2019, Bergamo, Italy, September 4-6, 2019, Revised Selected Papers , Bergamo, Italy, Dec 2020

#outbreak detection #stacking

Bib DOI

@inproceedings{mk:CIBB2019,
  author = {Kulessa, Moritz and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes},
  keywords = {Fusion, Outbreak Detection, Stacking, Surveillance},
  month = dec,
  title = {Improving the Fusion of Outbreak Detection Methods with Supervised Learning},
  booktitle = {Computational Intelligence Methods for Bioinformatics and Biostatistics - 16th International Meeting, {CIBB} 2019, Bergamo, Italy, September 4-6, 2019, Revised Selected Papers},
  series = {Lecture Notes in Computer Science},
  volume = {12313},
  year = {2020},
  pages = {55--66},
  publisher = {Springer},
  location = {Bergamo, Italy},
  isbn = {978-3-030-63060-7},
  doi = {10.1007/978-3-030-63061-4_6},
}

Learning multilabel rules with boosting: efficient and effective

2021

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, and Eyke Hüllermeier

Gradient-Based Label Binning in Multi-Label Classification

In Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2021, Bilbao, Spain, September 13-17, 2021, Proceedings, Part III , 2021

#multi-label #rule-learning #gradient boosting #efficiency

Abs Bib DOI URL

In multi-label classification, where a single example may be associated with several class labels at the same time, the ability to model dependencies between labels is considered crucial to effectively optimize non-decomposable evaluation measures, such as the Subset 0/1 loss. The gradient boosting framework provides a well-studied foundation for learning models that are specifically tailored to such a loss function and recent research attests the ability to achieve high predictive accuracy in the multi-label setting. The utilization of second-order derivatives, as used by many recent boosting approaches, helps to guide the minimization of non-decomposable losses, due to the information about pairs of labels it incorporates into the optimization process. On the downside, this comes with high computational costs, even if the number of labels is small. In this work, we address the computational bottleneck of such approach – the need to solve a system of linear equations – by integrating a novel approximation technique into the boosting procedure. Based on the derivatives computed during training, we dynamically group the labels into a predefined number of bins to impose an upper bound on the dimensionality of the linear system. Our experiments, using an existing rule-based algorithm, suggest that this may boost the speed of training, without any significant loss in predictive performance.
@inproceedings{rapp21labelbinning, author = {Rapp, Michael and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes and H{\"{u}}llermeier, Eyke}, editor = {Oliver, Nuria and P{\'{e}}rez{-}Cruz, Fernando and Kramer, Stefan and Read, Jesse and Antonio Lozano, Jos{\'{e}}}, keywords = {Gradient boosting, multilabel classification, Rule Learning}, title = {Gradient-Based Label Binning in Multi-Label Classification}, booktitle = {Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, {ECML} {PKDD} 2021, Bilbao, Spain, September 13-17, 2021, Proceedings, Part {III}}, booktitle_short = {Machine Learning and Knowledge Discovery in Databases (ECML-PKDD)}, year = {2021}, series = {Lecture Notes in Computer Science}, volume = {12977}, pages = {462--477}, publisher = {Springer}, doi = {10.1007/978-3-030-86523-8\_28}, url = {https://arxiv.org/abs/2106.11690}, }

2020

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Vu-Linh Nguyen, and Eyke Hüllermeier

Learning Gradient Boosted Multi-label Classification Rules

In Machine Learning and Knowledge Discovery in Databases (ECML-PKDD) , 2020

#multi-label #rule-learning #gradient boosting

Abs Bib DOI URL

In multi-label classification, where the evaluation of predic-tions is less straightforward than in single-label classification, variousmeaningful, though different, loss functions have been proposed. Ideally,the learning algorithm should be customizable towards a specific choiceof the performance measure. Modern implementations of boosting, mostprominently gradient boosted decision trees, appear to be appealing fromthis point of view. However, they are mostly limited to single-label clas-sification, and hence not amenable to multi-label losses unless these arelabel-wise decomposable. In this work, we develop a generalization of thegradient boosting framework to multi-output problems and propose analgorithm for learning multi-label classification rules that is able to min-imize decomposable as well as non-decomposable loss functions. Usingthe well-known Hamming loss and subset 0/1 loss as representatives, weanalyze the abilities and limitations of our approach on synthetic dataand evaluate its predictive performance on multi-label benchmarks.
@inproceedings{rapp20boomer, author = {Rapp, Michael and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes and Nguyen, Vu-Linh and H{\"{u}}llermeier, Eyke}, editor = {Hutter, Frank and Kersting, Kristian and Lijffijt, Jefrey and Valera, Isabel}, keywords = {Gradient boosting, multilabel classification, Rule Learning}, title = {Learning Gradient Boosted Multi-label Classification Rules}, booktitle = {Machine Learning and Knowledge Discovery in Databases (ECML-PKDD)}, series = {Lecture Notes in Computer Science}, volume = {12459}, year = {2020}, pages = {124--140}, publisher = {Springer}, url = {https://link.springer.com/chapter/10.1007/978-3-030-67664-3_8}, doi = {https://doi.org/10.1007/978-3-030-67664-3_8}, }

Investigating the connection between label dependencies and the losses which are optimized

2022

Eyke Hüllermeier, Marcel Wever, Eneldo Loza Mencía, Johannes Fürnkranz, and Michael Rapp

A Flexible Class of Dependence-sensitive Multi-label Loss Functions

Machine Learning Journal, Jan 2022

European Conference of Machine Learning (ECML) Journal Track

#multi-label #non-additive measures #evaluation

Abs Bib DOI URL

Multi-label classification is the task of assigning a subset of labels to a given query instance. For evaluating such predictions, the set of predicted labels needs to be compared to the ground-truth label set associated with that instance, and various loss functions have been proposed for this purpose. In addition to assessing predictive accuracy, a key concern in this regard is to foster and to analyze a learner’s ability to capture label dependencies. In this paper, we introduce a new class of loss functions for multi-label classification, which overcome disadvantages of commonly used losses such as Hamming and subset 0/1. To this end, we leverage the mathematical framework of non-additive measures and integrals. Roughly speaking, a non-additive measure allows for modeling the importance of correct predictions of label subsets (instead of single labels), and thereby their impact on the overall evaluation, in a flexible way - by giving full importance to single labels and the entire label set, respectively, Hamming and subset 0/1 are rather extreme in this regard. We present concrete instantiations of this class, which comprise Hamming and subset 0/1 as special cases, and which appear to be especially appealing from a modeling perspective. The assessment of multi-label classifiers in terms of these losses is illustrated in an empirical study.
@article{wever22mlclosses, author = {H{\"{u}}llermeier, Eyke and Wever, Marcel and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes and Rapp, Michael}, title = {A Flexible Class of Dependence-sensitive Multi-label Loss Functions}, journal = {Machine Learning Journal}, volume = {111}, number = {2}, pages = {713--737}, year = {2022}, month = jan, note = {European Conference of Machine Learning (ECML) Journal Track}, url = {https://arxiv.org/abs/2011.00792}, doi = {10.1007/s10994-021-06107-2}, }

2020

Vu-Linh Nguyen, Eyke Hüllermeier, Michael Rapp, Eneldo Loza Mencía, and Johannes Fürnkranz

On Aggregation in Ensembles of Multilabel Classifiers

In Discovery Science , Oct 2020

#multi-label #aggregation #ensembles

Abs Bib DOI URL

While a variety of ensemble methods for multilabel classification have been proposed in the literature, the question of how to aggregate the predictions of the individual members of the ensemble has received little attention so far. In this paper, we introduce a formal framework of ensemble multilabel classification, in which we distinguish two principal approaches: "predict then combine" (PTC), where the ensemble members first make loss minimizing predictions which are subsequently combined, and "combine then predict" (CTP), which first aggregates information such as marginal label probabilities from the individual ensemble members, and then derives a prediction from this aggregation. While both approaches generalize voting techniques commonly used for multilabel ensembles, they allow to explicitly take the target performance measure into account. Therefore, concrete instantiations of CTP and PTC can be tailored to concrete loss functions. Experimentally, we show that standard voting techniques are indeed outperformed by suitable instantiations of CTP and PTC, and provide some evidence that CTP performs well for decomposable loss functions, whereas PTC is the better choice for non-decomposable losses.
@inproceedings{ln:MLCaggregation, author = {Nguyen, Vu-Linh and H{\"{u}}llermeier, Eyke and Rapp, Michael and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes}, editor = {Appice, Annalisa and Tsoumakas, Grigorios and Manolopoulos, Yannis and Matwin, Stan}, keywords = {Combine then Predict, Ensembles of Multilabel Classifiers, F-measure, Hamming loss, Predict then Combine, Subset 0/1 loss}, month = oct, title = {On Aggregation in Ensembles of Multilabel Classifiers}, booktitle = {Discovery Science}, series = {Lecture Notes in Computer Science}, volume = {12323}, year = {2020}, pages = {533--547}, publisher = {Springer International Publishing}, address = {Cham}, isbn = {978-3-030-61527-7}, url = {https://arxiv.org/abs/2006.11916}, doi = {https://doi.org/10.1007/978-3-030-61527-7_35}, }

Learning interpretable multilabel rules with separate-and-conquer and tricks to make it more efficient and more expressive

2019

Yannik Klein, Michael Rapp, and Eneldo Loza Mencía

Efficient Discovery of Expressive Multi-label Rules using Relaxed Pruning

In Discovery Science , Oct 2019

Best Student Paper Award

#multi-label #rule-learning

Abs Bib DOI URL

Being able to model correlations between labels is considered crucial in multi-label classification. Rule-based models enable to expose such dependencies, e.g., implications, subsumptions, or exclusions, in an interpretable and human-comprehensible manner. Albeit the number of possible label combinations increases exponentially with the number of available labels, it has been shown that rules with multiple labels in their heads, which are a natural form to model local label dependencies, can be induced efficiently by exploiting certain properties of rule evaluation measures and pruning the label search space accordingly. However, experiments have revealed that multi-label heads are unlikely to be learned by existing methods due to their restrictiveness. To overcome this limitation, we propose a plug-in approach that relaxes the search space pruning used by existing methods in order to introduce a bias towards larger multi-label heads resulting in more expressive rules. We further demonstrate the effectiveness of our approach empirically and show that it does not come with drawbacks in terms of training time or predictive performance.
@inproceedings{yk:Relaxed-Pruning, author = {Klein, Yannik and Rapp, Michael and Loza Menc{\'{i}}a, Eneldo}, editor = {Kralj Novak, Petra and {\v S}muc, Tomislav and D{\v z}eroski, Sa{\v s}o}, keywords = {Label Dependencies, multilabel classification, Rule Learning}, month = oct, title = {Efficient Discovery of Expressive Multi-label Rules using Relaxed Pruning}, booktitle = {Discovery Science}, year = {2019}, pages = {367--382}, publisher = {Springer International Publishing}, note = {Best Student Paper Award}, isbn = {978-3-030-33778-0}, url = {https://arxiv.org/abs/1908.06874}, doi = {10.1007/978-3-030-33778-0_28}, }

2018

Michael Rapp, Eneldo Loza Mencía, and Johannes Fürnkranz

Exploiting Anti-monotonicity of Multi-label Evaluation Measures for Inducing Multi-label Rules

In PAKDD 2018: Advances in Knowledge Discovery and Data Mining , 2018

#multi-label #rule-learning

Abs Bib DOI URL

Exploiting dependencies between labels is considered to be crucial for multi-label classification. Rules are able to expose label dependencies such as implications, subsumptions or exclusions in a human-comprehensible and interpretable manner. However, the induction of rules with multiple labels in the head is particularly challenging, as the number of label combinations which must be taken into account for each rule grows exponentially with the number of available labels. To overcome this limitation, algorithms for exhaustive rule mining typically use properties such as anti-monotonicity or decomposability in order to prune the search space. In the present paper, we examine whether commonly used multi-label evaluation metrics satisfy these properties and therefore are suited to prune the search space for multi-label heads.
@inproceedings{mr:ML-Antimonotonicity, author = {Rapp, Michael and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes}, editor = {Phung, Dinh Q. and Tseng, Vincent S. and Webb, Geoffrey I. and Ho, Bao and Ganji, Mohadeseh and Rashidi, Lida}, title = {Exploiting Anti-monotonicity of Multi-label Evaluation Measures for Inducing Multi-label Rules}, booktitle = {PAKDD 2018: Advances in Knowledge Discovery and Data Mining}, year = {2018}, pages = {29--42}, publisher = {Springer International Publishing}, address = {Cham}, isbn = {978-3-319-93034-3}, url = {https://arxiv.org/abs/1812.06833}, doi = {10.1007/978-3-319-93034-3_3}, }

Extraction of rules from (deep) neural networks in order to enhance understandability

2017

Camila González, Eneldo Loza Mencía, and Johannes Fürnkranz

Re-training Deep Neural Networks to Facilitate Boolean Concept Extraction

In Proceedings of the 20th International Conference on Discovery Science (DS-17) , Oct 2017

#extraction of rules from neural networks

Bib DOI URL

@inproceedings{jf:DS-17-DNN-Retraining,
  author = {Gonz{\'{a}}lez, Camila and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes},
  month = oct,
  title = {Re-training Deep Neural Networks to Facilitate Boolean Concept Extraction},
  booktitle = {Proceedings of the 20th International Conference on Discovery Science (DS-17)},
  series = {Lecture Notes in Computer Science},
  volume = {10558},
  year = {2017},
  pages = {127--143},
  publisher = {Springer-Verlag},
  isbn = {978-3-319-67785-9},
  url = {https://ke-tud.github.io/publications/papers/DS17-DNN-Retraining.pdf},
  doi = {10.1007/978-3-319-67786-6_10},
}

2016

Jan Ruben Zilke, Eneldo Loza Mencía, and Frederik Janssen

DeepRED – Rule Extraction from Deep Neural Networks

In Discovery Science: 19th International Conference, DS 2016, Bari, Italy, October 19–21, 2016, Proceedings , 2016

#extraction of rules from neural networks

Bib DOI URL

@inproceedings{zilke2016deepRED,
  author = {Zilke, Jan Ruben and Loza Menc{\'{i}}a, Eneldo and Janssen, Frederik},
  editor = {Calders, Toon and Ceci, Michelangelo and Malerba, Donato},
  title = {DeepRED -- Rule Extraction from Deep Neural Networks},
  booktitle = {Discovery Science: 19th International Conference, DS 2016, Bari, Italy, October 19--21, 2016, Proceedings},
  year = {2016},
  pages = {457--473},
  publisher = {Springer International Publishing},
  isbn = {978-3-319-46307-0},
  url = {https://ke-tud.github.io/publications/papers/DS16DeepRED.pdf},
  doi = {10.1007/978-3-319-46307-0_29},
}

Instead of just averaging, how to combine the predictions of the individual trees in an ensemble of random decision trees if you know about the certainty of the predictions

2021

Florian Busch, Moritz Kulessa, Eneldo Loza Mencía, and Hendrik Blockeel

Combining Predictions under Uncertainty: The Case of Random Decision Trees

In Discovery Science , Sep 2021

#random decision trees #combination of predictions #uncertainty

Bib DOI URL

@inproceedings{busch21RDTcombination,
  author = {Busch, Florian and Kulessa, Moritz and Loza Menc{\'{i}}a, Eneldo and Blockeel, Hendrik},
  editor = {Soares, Carlos and Torgo, Lu{\'{i}}s},
  month = sep,
  title = {Combining Predictions under Uncertainty: The Case of Random Decision Trees},
  booktitle = {Discovery Science},
  series = {Lecture Notes in Computer Science},
  volume = {12986},
  year = {2021},
  pages = {78--93},
  publisher = {Springer},
  url = {https://doi.org/10.1007/978-3-030-88942-5\_7},
  doi = {10.1007/978-3-030-88942-5\_7},
}

Analyzing what is important in texts, what is relevant for different text quality criteria, and which features are useful for automatically producing text summaries

2020

Margot Mieskes, Eneldo Loza Mencía, and Tim Kronsbein

A Data Set for the Analysis of Text Quality Dimensions in Summarization Evaluation

In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020) , May 2020

Data set available at \urlhttps://github.com/keelm/DIP-SumEval

#automatic text summarization #dataset #human evaluation

Abs Bib URL

Automatic evaluation of summarization focuses on developing a metric to represent the quality of the resulting text. However, text qualityis represented in a variety of dimensions ranging from grammaticality to readability and coherence. In our work, we analyze the depen-dencies between a variety of quality dimensions on automatically created multi-document summaries and which dimensions automaticevaluation metrics such as ROUGE, PEAK or JSD are able to capture. Our results indicate that variants of ROUGE are correlated tovarious quality dimensions and that some automatic summarization methods achieve higher quality summaries than others with respectto individual summary quality dimensions. Our results also indicate that differentiating between quality dimensions facilitates inspectionand fine-grained comparison of summarization methods and its characteristics. We make the data from our two summarization qualityevaluation experiments publicly available in order to facilitate the future development of specialized automatic evaluation methods.
@inproceedings{mm:DIP-SumEval, author = {Mieskes, Margot and Loza Menc{\'{i}}a, Eneldo and Kronsbein, Tim}, month = may, title = {A Data Set for the Analysis of Text Quality Dimensions in Summarization Evaluation}, booktitle = {Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020)}, year = {2020}, pages = {6690–-6699}, publisher = {European Language Resources Association}, address = {Marseille, France}, note = {Data set available at \url{https://github.com/keelm/DIP-SumEval}}, url = {https://aclanthology.org/2020.lrec-1.826}, }

2018

Markus Zopf, Teresa Botschen, Tobias Falke, Benjamin Heinzerling, Ana Marasovic, Todor Mihaylov, Avinesh P.V.S., Eneldo Loza Mencía, Johannes Fürnkranz, and Anette Frank

What’s important in a text? An extensive evaluation of linguistic annotations for summarization

In Proceedings of the 5th International Conference on Social Networks Analysis, Management and Security (SNAMS-18) , Valencia, Spain, Oct 2018

#automatic text summarization #feature evaluation

Bib DOI URL

@inproceedings{aiphes:text-important,
  author = {Zopf, Markus and Botschen, Teresa and Falke, Tobias and Heinzerling, Benjamin and Marasovic, Ana and Mihaylov, Todor and P.V.S., Avinesh and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes and Frank, Anette},
  month = oct,
  title = {What’s important in a text? An extensive evaluation of linguistic annotations for summarization},
  booktitle = {Proceedings of the 5th International Conference on Social Networks Analysis, Management and Security (SNAMS-18)},
  year = {2018},
  pages = {272--277},
  location = {Valencia, Spain},
  url = {https://ieeexplore.ieee.org/document/8554853},
  doi = {10.1109/SNAMS.2018.8554853},
}

2018

Markus Zopf, Eneldo Loza Mencía, and Johannes Fürnkranz

Which Scores to Predict in Sentence Regression for Text Summarization?

In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018) , Jun 2018

#automatic text summarization #evaluation

Bib DOI URL

@inproceedings{TUD-CS-2018-0032,
  author = {Zopf, Markus and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes},
  editor = {Walker, Marilyn A. and Ji, Heng and Stent, Amanda},
  month = jun,
  title = {Which Scores to Predict in Sentence Regression for Text Summarization?},
  booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018)},
  volume = {1: Long Papers},
  year = {2018},
  pages = {1782--1791},
  url = {http://aclweb.org/anthology/N18-1161},
  doi = {10.18653/v1/N18-1161},
}

Using preference learning for dealing with importance in Automatic Text Summarization

2016

Markus Zopf, Eneldo Loza Mencía, and Johannes Fürnkranz

Beyond Centrality and Structural Features: Learning Information Importance for Text Summarization

In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning , Berlin, Germany, Aug 2016

#automatic text summarization #pairwise learning

Abs Bib URL

Most automatic text summarization systems proposed to date rely on centrality and structural features as indicators for information importance. In this paper, we argue that these features cannot reliably detect important information in heterogeneous document collections. Instead, we propose CPSum, a summarizer that learns the importance of information objects from a background source. Our hypothesis is tested on a multi-document corpus where we remove centrality and structural features. CPSum proves to be able to perform well in this challenging scenario whereas reference systems fail.
@inproceedings{ZopfLozaFuernkranz2016CPSum, author = {Zopf, Markus and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes}, month = aug, title = {Beyond Centrality and Structural Features: Learning Information Importance for Text Summarization}, booktitle = {Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning}, year = {2016}, pages = {84-94}, publisher = {Association for Computational Linguistics}, location = {Berlin, Germany}, url = {http://www.aclweb.org/anthology/K16-1009}, }

Link between multilabel classification and unsupervised learning: Learning domain-depending embeddings with the help of a labelled background corpus

2016

Eneldo Loza Mencía, Gerard de Melo, and Jinseok Nam

Medical Concept Embeddings via Labeled Background Corpora

In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) , May 2016

#medical embeddings #background corpora

Abs Bib URL

In recent years, we have seen an increasing amount of interest in low-dimensional vector representations of words. Among other things, these facilitate computing word similarity and relatedness scores. The most well-known example of algorithms to produce representations of this sort are the word2vec approaches. In this paper, we investigate a new model to induce such vector spaces for medical concepts, based on a joint objective that exploits not only word co-occurrences but also manually labeled documents, as available from sources such as PubMed. Our extensive experimental analysis shows that our embeddings lead to significantly higher correlations with human similarity and relatedness assessments than previous work. Due to the simplicity and versatility of vector representations, these findings suggest that our resource can easily be used as a drop-in replacement to improve any systems relying on medical concept similarity measures.
@inproceedings{loza16medsim, author = {Loza Menc{\'{i}}a, Eneldo and de Melo, Gerard and Nam, Jinseok}, month = may, title = {Medical Concept Embeddings via Labeled Background Corpora}, booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)}, year = {2016}, pages = {4629--4636}, publisher = {European Language Resources Association (ELRA)}, address = {Paris, France}, isbn = {978-2-9517408-9-1}, url = {http://www.lrec-conf.org/proceedings/lrec2016/pdf/1190_Paper.pdf}, }

Learning embeddings for documents, labels and their descriptions, and words together: better classification accuracy and enables zero-shot learning

2016

Jinseok Nam, Eneldo Loza Mencía, and Johannes Fürnkranz

All-in Text: Learning Document, Label, and Word Representations Jointly

In Proceedings of the 30th AAAI Conference on Artificial Intelligence , Phoenix, Arizona, 2016

#multi-label #embeddings #extreme classification #scalability #zero-shot learning

Bib Code URL

@inproceedings{nam16allintext,
  author = {Nam, Jinseok and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes},
  editor = {Schuurmans, Dale and Wellman, Michael P.},
  title = {All-in Text: Learning Document, Label, and Word Representations Jointly},
  booktitle = {Proceedings of the 30th AAAI Conference on Artificial Intelligence},
  year = {2016},
  pages = {1948--1954},
  publisher = {AAAI Press},
  location = {Phoenix, Arizona},
  url = {http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12058},
}

Multilabel-Classification of tweets

2016

Axel Schulz, Eneldo Loza Mencía, and Benedikt Schmidt

A Rapid-Prototyping Framework for Extracting Small-Scale Incident-Related Information in Microblogs: Application of Multi-Label Classification on Tweets

Information Systems, Apr 2016

#twitter mining #sentiment analysis #multi-label #framework

Bib DOI URL

@article{schulz15mltweets,
  author = {Schulz, Axel and Loza Menc{\'{i}}a, Eneldo and Schmidt, Benedikt},
  month = apr,
  title = {A Rapid-Prototyping Framework for Extracting Small-Scale Incident-Related Information in Microblogs: Application of Multi-Label Classification on Tweets},
  journal = {Information Systems},
  volume = {57},
  year = {2016},
  pages = {88-110},
  url = {https://ke-tud.github.io/publications/papers/infsys16tweetsMLC.pdf},
  doi = {10.1016/j.is.2015.10.010},
}

Exploiting hierarchies and joint embedding for (zero-shot) multilabel classification

2015

Jinseok Nam, Eneldo Loza Mencía, Hyunwoo J. Kim, and Johannes Fürnkranz

Predicting Unseen Labels using Label Hierarchies in Large-Scale Multi-label Learning

In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases , Porto, Portugal, 2015

#multi-label #embeddings #zero-shot learning

Abs Bib DOI URL

An important problem in multi-label classification is to capture label patterns or underlying structures that have an impact on such patterns. One way of learning underlying structures over labels is to project both instances and labels into the same space where an instance and its relevant labels tend to have similar representations. In this paper, we present a novel method to learn a joint space of instances and labels by leveraging a hierarchy of labels. We also present an efficient method for pretraining vector representations of labels, namely label embeddings, from large amounts of label co-occurrence patterns and hierarchical structures of labels. This approach also allows us to make predictions on labels that have not been seen during training. We empirically show that the use of pretrained label embeddings allows us to obtain higher accuracies on unseen labels even when the number of labels are quite large. Our experimental results also demonstrate qualitatively that the proposed method is able to learn regularities among labels by exploiting a label hierarchy as well as label co-occurrences.
@inproceedings{nam15wsabieH, author = {Nam, Jinseok and Loza Menc{\'{i}}a, Eneldo and Kim, Hyunwoo J. and F{\"{u}}rnkranz, Johannes}, title = {Predicting Unseen Labels using Label Hierarchies in Large-Scale Multi-label Learning}, booktitle = {Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases}, year = {2015}, pages = {102-118}, publisher = {Springer International Publishing}, location = {Porto, Portugal}, issn = {0302-9743}, isbn = {978-3-319-23527-1}, url = {https://ke-tud.github.io/publications/papers/ECML2015Nam.pdf}, doi = {10.1007/978-3-319-23528-8_7}, }

Solutions for tasks when objects may belong to labels with a certain grade, e.g. with 0 to 5 stars mapping of movies to genres

2014

Christian Brinker, Eneldo Loza Mencía, and Johannes Fürnkranz

Graded Multilabel Classification by Pairwise Comparisons

In 2014 IEEE International Conference on Data Mining (ICDM 2014) , Dec 2014

#multi-label #ordered classification #pairwise learning

Abs Bib DOI URL

The task in multilabel classification is to predict for a given set of labels whether each individual label should be attached to an instance or not. Graded multilabel classification generalizes this setting by allowing to specify for each label a degree of membership on an ordinal scale. This setting can be frequently found in practice, for example when movies or books are assessed on a one-to-five star rating in multiple categories. In this paper, we propose to reformulate the problem in terms of preferences between the labels and their scales, which then be tackled by learning from pairwise comparisons. We present three different approaches which make use of this decomposition and show on three datasets that we are able to outperform baseline approaches. In particular, we show that our solution, which is able to model pairwise preferences across multiple scales, outperforms a straight-forward approach which considers the problem as a set of independent ordinal regression tasks.
@inproceedings{brinker14gmlc, author = {Brinker, Christian and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes}, month = dec, title = {Graded Multilabel Classification by Pairwise Comparisons}, booktitle = {2014 IEEE International Conference on Data Mining (ICDM 2014)}, year = {2014}, pages = {731--736}, publisher = {Curran Associates, IEEE}, address = {Shenzhen, China}, issn = {1550-4786}, isbn = {978-1-4799-4303-6}, url = {www.ke.tu-darmstadt.de/publications/papers/ICDM14graded.pdf}, doi = {10.1109/ICDM.2014.102}, }

Use of neural networks and techniques from deep learning for large scale text classification

2014

Jinseok Nam, Jungi Kim, Eneldo Loza Mencía, Iryna Gurevych, and Johannes Fürnkranz

Large-Scale Multi-label Text Classification - Revisiting Neural Networks

In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD-14), Part 2 , Sep 2014

#multi-label #neural networks #deep learning

Bib DOI URL

@inproceedings{nam14revisiting,
  author = {Nam, Jinseok and Kim, Jungi and Loza Menc{\'{i}}a, Eneldo and Gurevych, Iryna and F{\"{u}}rnkranz, Johannes},
  editor = {Calders, Toon and Esposito, Floriana and H{\"{u}}llermeier, Eyke and Meo, Rosa},
  month = sep,
  title = {Large-Scale Multi-label Text Classification - Revisiting Neural Networks},
  booktitle = {Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD-14), Part 2},
  series = {Lecture Notes in Computer Science},
  volume = {8725},
  year = {2014},
  pages = {437--452},
  publisher = {Springer Berlin Heidelberg},
  isbn = {978-3-662-44850-2},
  url = {https://ke-tud.github.io/publications/papers/NN_MLC_ecml2014_camera_ready.pdf},
  doi = {10.1007/978-3-662-44851-9_28},
}

Dissertation about (m)any aspect(s) of Efficient Pairwise Multilabel Classification, and more:

2012

Eneldo Loza Mencía

Efficient Pairwise Multilabel Classification

Jul 2012

submitted on 2012-06-10, defended on 2012-07-24

Abs Bib URL

Multilabel classification learning is the task of learning a mapping between objects and sets of possibly overlapping classes and has gained increasing attention in recent times. A prototypical application scenario for multilabel classification is the assignment of a set of keywords to a document, a frequently encountered problem in the text classification domain. With upcoming Web 2.0 technologies, this domain is extended by a wide range of tag suggestion tasks and the trend definitely is moving towards more data points and more labels. This work provides an extended introduction into the topic of multilabel classification, a detailed formalization and a comprehensive overview of the present state-of-the-art approaches. A commonly used solution for solving multilabel tasks is to decompose the original problem into several subproblems. These subtasks are usually easy to solve with conventional techniques. In contrast to the straightforward approach of training one classifier for independently predicting the relevance of each class (binary relevance), this work focuses particularly on the pairwise decomposition of the original problem in which a decision function is learned for each possible pair of classes. The main advantage of this approach, the improvement of the predictive quality, comes at the cost of its main disadvantage, the quadratic number of classifiers needed (with respect to the number of labels). This thesis presents a framework of efficient and scalable solutions for handling hundreds or thousands of labels despite the quadratic dependency. As it turns out, training such a pairwise ensemble of classifiers can be accomplished in linear time and only differs from the straightforward binary relevance approach (BR) by a factor relative to the average number of labels associated to an object, which is usually small. Furthermore, the integration of a smart scheduling technique inspired from sports tournaments safely reduces the quadratic number of base classifier evaluations to log-linear in practice. Combined with a simple yet fast and powerful learning algorithm for linear classifiers, data with a huge number of high dimensional points, which was not amenable to pairwise learning before, can be processed even under real-time conditions. The remaining bottleneck, the exploding memory requirements, is coped by taking advantage of an interesting property of linear classifiers, namely the possibility of dual reformulation as a linear combination of the training examples. The suitability is demonstrated on the novel EUR-Lex text collection, which particularly puts the main scalability issue of pairwise learning to test. With its almost 4,000 labels and 20,000 documents it is one of the most challenging test beds in multilabel learning to date. The dual formulation allows to maintain the mathematical equivalent to 8 million base learners needed for conventionally solving EUR-Lex in almost the same amount of space as binary relevance. Moreover, BR was clearly beaten in the experiments. A further contribution based on hierarchical decomposition and arrangement of the original problem allows to reduce the dependency on the number of labels to even sub-linearity. This approach opens the door to a wide range of new challenges and applications but simultaneously maintains the advantages of pairwise learning, namely the excellent predictive quality. It was even shown in comparison to the flat variant that it has a particularly positive effect on balancing recall and precision on datasets with a large number of labels. The improved scalability and efficiency allowed to apply pairwise classification to a set of large multilabel problems with a parallel base of data points but different domains of labels. A first attempt was made in this parallel tasks setting in order to investigate the exploitation of label dependencies by pairwise learning, with first encouraging results. The usage of multilabel learning techniques for the automatic annotation of texts constitutes a further obvious but so far missing connection to multi-task and multi-target learning. The presented solution considers the simultaneous tagging of words with different but possibly overlapping annotation schemes as a multilabel problem. This solution is expected to particularly benefit from approaches which exploit label dependencies. The ability of pairwise learning for this purpose is obviously restricted to pairwise relations, therefore a technique is investigated which explores label constellations that only exist locally for a subgroup of data points. In addition to the positive effect of the supplemental information, the experimental evaluation demonstrates an interesting insight with regards to the different behavior of several state-of-the-art approaches with respect to the optimization of particular multilabel measures, a controversial topic in multilabel classification.
@phdthesis{loza2012diss, author = {Loza Menc{\'{i}}a, Eneldo}, keywords = {efficiency, multilabel classification, pairwise classification, scalability}, month = jul, title = {Efficient Pairwise Multilabel Classification}, type = {Dissertation}, year = {2012}, school = {Technische Universit{\"{a}}t Darmstadt, Knowledge Engineering Group}, note = {submitted on 2012-06-10, defended on 2012-07-24}, url = {http://tuprints.ulb.tu-darmstadt.de/3226/7/loza12diss.pdf}, urn = {urn:nbn:de:tuda-tuprints-32260}, tuprints-url = {http://tuprints.ulb.tu-darmstadt.de/3226/} }

Application of Subgroup Discovery finding locally exceptional patterns in multilabel data in order to exploit label dependencies:

2012

Wouter Duivesteijn, Eneldo Loza Mencía, Johannes Fürnkranz, and Arno J. Knobbe

Multi-label LeGo – Enhancing Multi-label Classifiers with Local Patterns

In Advances in Intelligent Data Analysis XI – Proceedings of the 11th International Symposium on Data Analysis (IDA-11) , Berlin, Oct 2012

Longer version available at https://ke-tud.github.io/bibtex/publications/show/2342

#multi-label #local patterns #exceptionality mining

Bib DOI URL

@inproceedings{duivesteijn12MultilabelLego,
  author = {Duivesteijn, Wouter and Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes and Knobbe, Arno J.},
  editor = {Hollm{\'{e}}n, Jaakko and Klawonn, Frank and Tucker, Allan},
  month = oct,
  title = {Multi-label LeGo -- Enhancing Multi-label Classifiers with Local Patterns},
  booktitle = {Advances in Intelligent Data Analysis XI -- Proceedings of the 11th International Symposium on Data Analysis (IDA-11)},
  series = {Lecture Notes in Computer Science},
  volume = {7619},
  year = {2012},
  pages = {114--125},
  publisher = {Springer},
  location = {Berlin},
  note = {Longer version available at https://ke-tud.github.io/bibtex/publications/show/2342},
  isbn = {978-3-642-34155-7},
  url = {https://ke-tud.github.io/publications/papers/IDA-12.pdf},
  doi = {10.1007/978-3-642-34156-4_12},
}

Connection between multi-task learning and multilabel classification in order to exploit label dependencies:

2010

Eneldo Loza Mencía

Multilabel Classification in Parallel Tasks

In Working Notes of the 2nd International Workshop on Learning from Multi-Label Data at ICML/COLT 2010 , Haifa, Israel, Jun 2010

#multi-label #transfer learning

Abs Bib URL

In real world multilabel problems, it is often the case that e.g. documents are simultaneously classified with labels from multiple domains, such as genres in addition to topics. In practice, each of these problems is solved independently without taking advantage of possible label correlations between domains. Following the multi-task learning setting, in which multiple similar tasks are learned in parallel, we propose a global learning approach that jointly considers all domains. It is empirically demonstrated in this work that this approach is effective despite its simplicity when using a multilabel learner that takes label correlations into account.
@inproceedings{loza10pt, author = {Loza Menc{\'{i}}a, Eneldo}, editor = {Zhang, Min-Ling and Tsoumakas, Grigorios and Zhou, Zhi-Hua}, month = jun, title = {Multilabel Classification in Parallel Tasks}, booktitle = {Working Notes of the 2nd International Workshop on Learning from Multi-Label Data at ICML/COLT 2010}, year = {2010}, pages = {29-36}, location = {Haifa, Israel}, url = {https://ke-tud.github.io/publications/papers/loza10mlpt.pdf}, crossref = {zhang10mlworkshop}, }

Usage of XML-specific features and machine learning techniques for information extraction applied to documents from the French IPR Law:

2009

Eneldo Loza Mencía

Segmentation of legal documents

In Proceedings of the 12th International Conference on Artificial Intelligence and Law , Barcelona, Spain, Jun 2009

#text segmentation #information extraction

Abs Bib DOI URL

An overwhelming number of legal documents is available in digital form. However, most of the texts are usually only provided in a semi-structured form, i.e. the documents are structured only implicitly using text formatting and alignment. In this form the documents are perfectly understandable by a human, but not by a machine. This is an obstacle towards advanced intelligent legal information retrieval and knowledge systems. The reason for this lack of structured knowledge is that the conversion of texts in conventional form into a structured, machine-readable form, a process called segmentation, is frequently done manually and is therefore very expensive. We introduce a trainable system based on state-of-the-art Information Extraction techniques for the automatic segmentation of legal documents. Our system makes special use of the implicitly given structure in the source digital file as well as of the explicit knowledge about the target structure. Our evaluation on the French IPR Law demonstrates that the system is able to learn an effective segmenter given only a few manually processed training documents. In some cases, even only one seen example is sufficient in order to correctly process the remaining documents.
@inproceedings{loza09segmentation, author = {Loza Menc{\'{i}}a, Eneldo}, month = jun, title = {Segmentation of legal documents}, booktitle = {Proceedings of the 12th International Conference on Artificial Intelligence and Law}, year = {2009}, pages = {88--97}, publisher = {ACM}, location = {Barcelona, Spain}, address = {New York, NY, USA}, isbn = {978-1-60558-597-0}, url = {https://ke-tud.github.io/publications/papers/loza09segmentation.pdf}, doi = {10.1145/1568234.1568245}, }

Enhancement of the Calibrated Label Ranking approach by the efficient voting strategy QWeighted that reduces the predictive costs from quadratic to n log n:

2010

Eneldo Loza Mencía, Sang-Hyeun Park, and Johannes Fürnkranz

Efficient Voting Prediction for Pairwise Multilabel Classification

Neurocomputing, Mar 2010

#multi-label #pairwise learning #aggregation #scalability

Abs Bib DOI URL

The pairwise approach to multilabel classification reduces the problem to learning and aggregating preference predictions among the possible labels. A key problem is the need to query a quadratic number of preferences for making a prediction. To solve this problem, we extend the recently proposed QWeighted algorithm for efficient pairwise multiclass voting to the multilabel setting, and evaluate the adapted algorithm on several real-world datasets. We achieve an average-case reduction of classifier evaluations from n^2 to n + n d log n, where n is the total number of possible labels and d is the average number of labels per instance, which is typically quite small in real-world datasets.
@article{jf:Neurocomputing, author = {Loza Menc{\'{i}}a, Eneldo and Park, Sang-Hyeun and F{\"{u}}rnkranz, Johannes}, keywords = {efficient classification, learning by pairwise comparison, multilabel classification, voting aggregation}, month = mar, title = {Efficient Voting Prediction for Pairwise Multilabel Classification}, journal = {Neurocomputing}, volume = {73}, number = {7-9}, year = {2010}, pages = {1164 - 1176}, issn = {0925-2312}, url = {https://ke-tud.github.io/publications/papers/neucom10.pdf}, doi = {10.1016/j.neucom.2009.11.024}, note2 = {Volume: Advances in Computational Intelligence and Learning - 17th European Symposium on Artificial Neural Networks 2009, 17th European Symposium on Artificial Neural Networks 2009}, }

Dual reformulation of MLPP in order to deal with a large number of labels (up to 4000) though quadratic number of base classifiers, introduction of the EUR-Lex dataset:

2010

Eneldo Loza Mencía, and Johannes Fürnkranz

Efficient Multilabel Classification Algorithms for Large-Scale Problems in the Legal Domain

In Semantic Processing of Legal Texts – Where the Language of Law Meets the Law of Language , May 2010

accompanying EUR-Lex dataset available at \urlhttps://ke-tud.github.io/resources/eurlex

#multi-label #rule-learning #scalability #data set

Abs Bib DOI URL

In this paper we apply multilabel classification algorithms to the EUR-Lex database of legal documents of the European Union. For this document collection, we studied three different multilabel classification problems, the largest being the categorization into the EUROVOC concept hierarchy with almost 4000 classes. We evaluated three algorithms: (i) the binary relevance approach which independently trains one classifier per label; (ii) the multiclass multilabel perceptron algorithm, which respects dependencies between the base classifiers; and (iii) the multilabel pairwise perceptron algorithm, which trains one classifier for each pair of labels. All algorithms use the simple but very efficient perceptron algorithm as the underlying classifier, which makes them very suitable for large-scale multilabel classification problems. The main challenge we had to face was that the almost 8,000,000 perceptrons that had to be trained in the pairwise setting could no longer be stored in memory. We solve this problem by resorting to the dual representation of the perceptron, which makes the pairwise approach feasible for problems of this size. The results on the EUR-Lex database confirm the good predictive performance of the pairwise approach and demonstrates the feasibility of this approach for large-scale tasks.
@incollection{jf:SemanticLaw, author = {Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes}, editor = {Francesconi, Enrico and Montemagni, Simonetta and Peters, Wim and Tiscornia, Daniela}, keywords = {EUR-Lex Database, learning by pairwise comparison, Legal Documents, multilabel classification, Text Classification}, month = may, title = {Efficient Multilabel Classification Algorithms for Large-Scale Problems in the Legal Domain}, booktitle = {Semantic Processing of Legal Texts -- Where the Language of Law Meets the Law of Language}, edition = {1}, series = {Lecture Notes in Artificial Intelligence}, volume = {6036}, year = {2010}, pages = {192-215}, publisher = {Springer-Verlag}, note = {accompanying EUR-Lex dataset available at \url{https://ke-tud.github.io/resources/eurlex}}, isbn = {978-3-642-12836-3}, url = {https://ke-tud.github.io/publications/papers/loza10eurlex.pdf}, doi = {10.1007/978-3-642-12837-0_11}, }

General extension of pairwise classification by Calibration in order to divide the predicted ranking into relevant and irrelevant labels:

2008

Johannes Fürnkranz, Eyke Hüllermeier, Eneldo Loza Mencía, and Klaus Brinker

Multilabel Classification via Calibrated Label Ranking

Machine Learning, Jun 2008

#multi-label #pairwise learning

Bib DOI URL

@article{jf:MLJ-Multilabel,
  author = {F{\"{u}}rnkranz, Johannes and H{\"{u}}llermeier, Eyke and Loza Menc{\'{i}}a, Eneldo and Brinker, Klaus},
  month = jun,
  title = {Multilabel Classification via Calibrated Label Ranking},
  journal = {Machine Learning},
  volume = {73},
  number = {2},
  year = {2008},
  pages = {133--153},
  url = {https://ke-tud.github.io/publications/papers/MLJ08.pdf},
  doi = {10.1007/s10994-008-5064-8},
}

The effective Multilabel Pairwise Perceptrons (MLPP) algorithm on the large Reuters RCV1 dataset:

2008

Eneldo Loza Mencía, and Johannes Fürnkranz

Pairwise Learning of Multilabel Classifications with Perceptrons

In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IJCNN-08) , 2008

#multi-label #pairwise learning #neural networks

Abs Bib DOI URL

Multiclass multilabel perceptrons (MMP) have been proposed as an efficient incremental training algorithm for addressing a multilabel prediction task with a team of perceptrons. The key idea is to train one binary classifier per label, as is typically done for addressing multilabel problems, but to make the training signal dependent on the performance of the whole ensemble. In this paper, we propose an alternative technique that is based on a pairwise approach, i.e., we incrementally train a perceptron for each pair of classes. Our evaluation on four multilabel datasets shows that the multilabel pairwise perceptron (MLPP) algorithm yields substantial improvements over MMP in terms of ranking quality and overfitting resistance, while maintaining its efficiency. Despite the quadratic increase in the number of perceptrons that have to be trained, the increase in computational complexity is bounded by the average number of labels per training example.
@inproceedings{jf:IJCNN-08, author = {Loza Menc{\'{i}}a, Eneldo and F{\"{u}}rnkranz, Johannes}, title = {Pairwise Learning of Multilabel Classifications with Perceptrons}, booktitle = {Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IJCNN-08)}, year = {2008}, pages = {2900--2907}, organization = {IEEE}, address = {Hong Kong}, isbn = {978-1-4244-1821-3}, url = {https://ke-tud.github.io/publications/papers/loza08MLPP.pdf}, doi = {10.1109/IJCNN.2008.4634206}, }