All Publications


  • Convex Geometry and Duality of Over-parameterized Neural Networks JOURNAL OF MACHINE LEARNING RESEARCH Ergen, T., Pilanci, M. 2021; 22
  • Convex Neural Autoregressive Models: Towards Tractable, Expressive, and Theoretically-Backed Models for Sequential Forecasting and Generation Gupta, V., Bartan, B., Ergen, T., Pilanci, M. IEEE. 2021: 3890–3894
  • Energy-Efficient LSTM Networks for Online Learning IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS Ergen, T., Mirza, A. H., Kozat, S. S. 2020; 31 (8): 3114–26

    Abstract

    We investigate variable-length data regression in an online setting and introduce an energy-efficient regression structure built on long short-term memory (LSTM) networks. For this structure, we also introduce highly effective online training algorithms. We first provide a generic LSTM-based regression structure for variable-length input sequences. To reduce the complexity of this structure, we then replace the regular multiplication operations with an energy-efficient operator, i.e., the ef-operator. To further reduce the complexity, we apply factorizations to the weight matrices in the LSTM network so that the total number of parameters to be trained is significantly reduced. We then introduce online training algorithms based on the stochastic gradient descent (SGD) and exponentiated gradient (EG) algorithms to learn the parameters of the introduced network. Thus, we obtain highly efficient and effective online learning algorithms based on the LSTM network. Thanks to our generic approach, we also provide and simulate an energy-efficient gated recurrent unit (GRU) network in our experiments. Through an extensive set of experiments, we illustrate significant performance gains and complexity reductions achieved by the introduced algorithms with respect to the conventional methods. (A hypothetical code sketch of the multiplication-free, factorized structure appears after this entry.)

    View details for DOI 10.1109/TNNLS.2019.2935796

    View details for Web of Science ID 000557365700035

    View details for PubMedID 31536023
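    As a rough illustration of the complexity-reduction ideas in this abstract (a sketch under assumptions, not the authors' exact construction), the snippet below replaces each scalar multiplication in a matrix-vector product with a multiplication-free operator and applies the weight matrix as two low-rank factors. The ef_op definition sign(a)*sign(b)*(|a|+|b|), the rank r, and all sizes are illustrative assumptions.

    ```python
    import numpy as np

    def ef_op(a, b):
        # Assumed multiplication-free "ef-operator": sign(a)*sign(b)*(|a|+|b|),
        # in the spirit of the multiplication-free network literature; it
        # trades each scalar product for sign logic and an addition.
        return np.sign(a) * np.sign(b) * (np.abs(a) + np.abs(b))

    def ef_matvec(W, x):
        # Energy-efficient analogue of W @ x: every product W[i, j] * x[j]
        # becomes ef_op(W[i, j], x[j]), summed over j.
        return ef_op(W, x[None, :]).sum(axis=1)

    rng = np.random.default_rng(0)
    n_h, n_x, r = 16, 8, 4            # hidden size, input size, factor rank

    # Factorizing W ~= U @ V cuts the parameter count from n_h * n_x to
    # r * (n_h + n_x); U and V are what SGD/EG would actually update.
    U = rng.standard_normal((n_h, r)) * 0.1
    V = rng.standard_normal((r, n_x)) * 0.1

    x = rng.standard_normal(n_x)
    z = ef_matvec(V, x)               # apply the factors in sequence so the
    h = np.tanh(ef_matvec(U, z))      # whole pipeline stays multiplication-free
    print(h.shape)                    # (16,)
    ```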

  • Unsupervised Anomaly Detection With LSTM Neural Networks IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS Ergen, T., Kozat, S. S. 2020; 31 (8): 3127–41

    Abstract

    We investigate anomaly detection in an unsupervised framework and introduce long short-term memory (LSTM) neural network-based algorithms. In particular, given variable-length data sequences, we first pass these sequences through our LSTM-based structure and obtain fixed-length sequences. We then find a decision function for our anomaly detectors based on the one-class support vector machine (OC-SVM) and support vector data description (SVDD) algorithms. For the first time in the literature, we jointly train and optimize the parameters of the LSTM architecture and the OC-SVM (or SVDD) algorithm using highly effective gradient and quadratic programming-based training methods. To apply the gradient-based training method, we modify the original objective criteria of the OC-SVM and SVDD algorithms, where we prove the convergence of the modified objective criteria to the original criteria. We also provide extensions of our unsupervised formulation to the semisupervised and fully supervised frameworks. Thus, we obtain anomaly detection algorithms that can process variable-length data sequences while providing high performance, especially for time series data. Our approach is generic so that we also apply this approach to the gated recurrent unit (GRU) architecture by directly replacing our LSTM-based structure with the GRU-based structure. In our experiments, we illustrate significant performance gains achieved by our algorithms with respect to the conventional methods. (A hypothetical code sketch of this joint training idea appears after this entry.)

    View details for DOI 10.1109/TNNLS.2019.2935975

    View details for Web of Science ID 000557365700036

    View details for PubMedID 31536024
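    A minimal sketch of the joint training idea above, under assumptions that are not the paper's exact formulation: an LSTM maps each variable-length sequence to a fixed-length vector (mean pooling over hidden states is an assumption), and a one-class-SVM-style objective is minimized jointly over the LSTM weights and the hyperplane (w, rho). The non-smooth hinge max(0, z) is smoothed by the softplus (1/beta) * log(1 + exp(beta * z)), which approaches the hinge as beta grows; nu and beta are illustrative hyperparameters.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LSTMOcSvm(nn.Module):
        def __init__(self, n_in, n_hidden):
            super().__init__()
            self.lstm = nn.LSTM(n_in, n_hidden, batch_first=True)
            self.w = nn.Parameter(torch.randn(n_hidden) * 0.1)  # hyperplane normal
            self.rho = nn.Parameter(torch.zeros(()))             # hyperplane offset

        def embed(self, seq):
            # Variable-length (1, T, n_in) sequence -> fixed-length vector.
            out, _ = self.lstm(seq)
            return out.mean(dim=1).squeeze(0)

        def loss(self, seqs, nu=0.1, beta=10.0):
            # Smoothed OC-SVM objective: softplus replaces the hinge so that
            # gradients flow through both (w, rho) and the LSTM weights.
            slack = torch.stack([self.rho - self.w @ self.embed(s) for s in seqs])
            hinge = F.softplus(beta * slack) / beta
            return 0.5 * self.w @ self.w - self.rho + hinge.mean() / nu

    model = LSTMOcSvm(n_in=3, n_hidden=8)
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    seqs = [torch.randn(1, T, 3) for T in (5, 9, 7)]  # variable lengths
    opt.zero_grad(); model.loss(seqs).backward(); opt.step()
    ```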

  • A novel distributed anomaly detection algorithm based on support vector machines DIGITAL SIGNAL PROCESSING Ergen, T., Kozat, S. S. 2020; 99
  • Online Training of LSTM Networks in Distributed Systems for Variable Length Data Sequences IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS Ergen, T., Kozat, S. S. 2018; 29 (10): 5159–65

    Abstract

    In this brief, we investigate online training of long short-term memory (LSTM) architectures in a distributed network of nodes, where each node employs an LSTM-based structure for online regression. In particular, each node sequentially receives a variable-length data sequence with its label and can only exchange information with its neighbors to train the LSTM architecture. We first provide a generic LSTM-based regression structure for each node. In order to train this structure, we put the LSTM equations in a nonlinear state-space form for each node and then introduce a highly effective and efficient distributed particle filtering (DPF)-based training algorithm. We also introduce a distributed extended Kalman filtering-based training algorithm for comparison. Here, our DPF-based training algorithm guarantees convergence to the performance of the optimal LSTM coefficients in the mean square error sense under certain conditions. We achieve this performance with communication and computational complexity on the order of first-order gradient-based methods. Through both simulated and real-life examples, we illustrate significant performance improvements with respect to the state-of-the-art methods. (A hypothetical code sketch of the particle-filtering view appears after this entry.)

    View details for DOI 10.1109/TNNLS.2017.2770179

    View details for Web of Science ID 000445351300049

    View details for PubMedID 29990241
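    To illustrate the state-space view behind the DPF-based training (single node only; the neighbor information exchange of the distributed algorithm is omitted here), the sketch below runs a bootstrap particle filter over the parameters of a toy nonlinear regressor standing in for the LSTM: the weights follow a random walk (state equation) and each label is a noisy observation of the network output (observation equation). The tanh model, noise scales, and particle count are illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def f(theta, x):
        # Stand-in for the LSTM output map y_t = f(x_t; theta); the bootstrap
        # particle filter only needs to evaluate it, not differentiate it.
        return np.tanh(theta @ x)

    n_p, d = 500, 4                       # particles, parameter dimension
    theta_true = rng.standard_normal(d)
    particles = rng.standard_normal((n_p, d))
    weights = np.full(n_p, 1.0 / n_p)
    q_std, r_std = 0.01, 0.1              # state / observation noise scales

    for t in range(200):
        x = rng.standard_normal(d)
        y = f(theta_true, x) + r_std * rng.standard_normal()
        # Propagate: theta_t = theta_{t-1} + process noise.
        particles += q_std * rng.standard_normal((n_p, d))
        # Reweight by the observation likelihood (stabilized in log space).
        ll = -0.5 * ((y - f(particles, x)) / r_std) ** 2
        weights = weights * np.exp(ll - ll.max())
        weights /= weights.sum()
        # Resample when the effective sample size degenerates.
        if 1.0 / np.sum(weights ** 2) < n_p / 2:
            idx = rng.choice(n_p, size=n_p, p=weights)
            particles, weights = particles[idx], np.full(n_p, 1.0 / n_p)

    theta_hat = weights @ particles       # posterior-mean parameter estimate
    print(np.linalg.norm(theta_hat - theta_true))
    ```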

  • Efficient Online Learning Algorithms Based on LSTM Neural Networks IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS Ergen, T., Kozat, S. S. 2018; 29 (8): 3772–83

    Abstract

    We investigate online nonlinear regression and introduce novel regression structures based on long short-term memory (LSTM) networks. For the introduced structures, we also provide highly efficient and effective online training methods. To train these novel LSTM-based structures, we put the underlying architecture in a state-space form and introduce highly efficient and effective particle filtering (PF)-based updates. We also provide stochastic gradient descent and extended Kalman filter-based updates. Our PF-based training method guarantees convergence to the optimal parameter estimation in the mean square error sense provided that we have a sufficient number of particles and satisfy certain technical conditions. More importantly, we achieve this performance with a computational complexity on the order of first-order gradient-based methods by controlling the number of particles. Since our approach is generic, we also introduce a gated recurrent unit (GRU)-based approach by directly replacing the LSTM architecture with the GRU architecture, and we demonstrate the superiority of our LSTM-based approach in the sequential prediction task on different real-life data sets. In addition, the experimental results illustrate significant performance improvements achieved by the introduced algorithms with respect to the conventional methods over several different benchmark real-life data sets. (A hypothetical code sketch of the LSTM-to-GRU genericity appears after this entry.)

    View details for DOI 10.1109/TNNLS.2017.2741598

    View details for Web of Science ID 000439627700038

    View details for PubMedID 28920911
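    The genericity claimed in this abstract, where the LSTM is replaced by a GRU directly, can be sketched as follows (a hypothetical structure; a plain SGD step stands in for the paper's PF/EKF-based updates): the recurrent core is a constructor argument, so swapping nn.LSTM for nn.GRU requires no other change.

    ```python
    import torch
    import torch.nn as nn

    class OnlineRNNRegressor(nn.Module):
        # Hypothetical generic structure: the recurrent core (nn.LSTM or
        # nn.GRU) is injected; everything else is shared.
        def __init__(self, core, n_in, n_hidden):
            super().__init__()
            self.rnn = core(n_in, n_hidden, batch_first=True)
            self.readout = nn.Linear(n_hidden, 1)

        def forward(self, seq):                        # seq: (1, T, n_in)
            out, _ = self.rnn(seq)
            return self.readout(out[:, -1]).squeeze()  # predict from last state

    for core in (nn.LSTM, nn.GRU):
        model = OnlineRNNRegressor(core, n_in=2, n_hidden=8)
        opt = torch.optim.SGD(model.parameters(), lr=1e-2)
        seq, y = torch.randn(1, 6, 2), torch.tensor(0.5)
        loss = (model(seq) - y) ** 2                   # one online regression step
        opt.zero_grad(); loss.backward(); opt.step()
    ```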