Austrian research company NXAI releases its new xLSTM 7B model, once again highlighting the efficiency and performance advantages of the xLSTM architecture.
The new pre-trained model plays in the premier league of 7B models even without fine-tuning. It has emerged as the best non-transformer large language model, as demonstrated by its next-word prediction performance and by evaluations on standard benchmarks such as Massive Multitask Language Understanding (MMLU). The main feature of the xLSTM 7B model, however, is its speed: it is much more efficient than other large language models, generating results faster and with considerably fewer compute resources. This makes xLSTM 7B the champion in speed and energy efficiency.
The xLSTM architecture is available in NXAI's GitHub repository, and a pre-trained model is available on Hugging Face, ready for the developer community to fine-tune.
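For orientation, here is a minimal sketch of how such a checkpoint would typically be loaded ahead of fine-tuning; the repository id "NX-AI/xLSTM-7b" and support via the Hugging Face transformers library are assumptions to be verified against the model card, not details confirmed by this announcement:

    # Minimal sketch: load the pre-trained checkpoint and run a quick
    # generation smoke test before fine-tuning. The repo id below is an
    # assumption; check the actual Hugging Face model card.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("NX-AI/xLSTM-7b")
    model = AutoModelForCausalLM.from_pretrained("NX-AI/xLSTM-7b")

    inputs = tokenizer("The xLSTM architecture", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

From here, standard fine-tuning workflows apply as they would for any causal language model.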
"Our scaling predictions from our paper have come true. We have the best large language model based on recurrent neural networks (RNNs). We have the most energy-efficient model in the LLM world with fast inference time", explains Prof. Dr. Sepp Hochreiter, Chief Researcher at NXAI and inventor of xLSTM.
What's more, the developers at the Linz-based company are open-sourcing the training dataset, the training and evaluation scripts, the code, the weights, the training and inference kernels, and, in addition to the PyTorch implementation, a JAX implementation.
"We are pleased that many people will use the advantages of our architecture in their products and will be able to build their own applications based on xLSTM. In particular, AI applications at the edge and embedded benefit massively from our efficiency and speed. However, we are also concerned about trust in AI technology. Our approach finally creates transparency and ensures sovereignty for the developer", says Hochreiter, explaining the open source step.
"We are committed to open source and scientific freedom, which is why every researcher in the world can use the xLSTM7 model for their research. xLSTM B7 is a model from Europe for the world. But NXAI's focus is on the many medium-sized companies in Europe. We want to enable SMEs to use AI in products and services in order to generate added value", Hochreiter explains.
Since the initial publication of the xLSTM architecture in spring of this year, many developers have already presented solutions using the approach. xLSTM is particularly in demand in the industrial sector. "I see xLSTM in the field of robotics, because in inference it is much faster, more memory-efficient, and more energy-efficient than transformers", explains Hochreiter.
A few days ago, a research paper proposed a large recurrent action model for robotics based on xLSTM. Industry experts report that the architecture also comes into its own in the automotive industry thanks to its longer and variable memory. The same applies to medical technology and the mobility industry. "In addition, xLSTM is already used for multivariate time series forecasting and shows superior performance in long-term forecasting compared to other methods", reports Hochreiter. xLSTM is more than an LLM.
Background: In contrast to the transformer architecture, xLSTM's compute cost grows only linearly with the text or sequence length, so it requires less computing power during operation. This is a major advantage, since complex tasks require much more text for both the task description and the solution, and time series may have long-term dependencies. xLSTM thus makes industrial applications possible for which transformer models are too slow.
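To make the scaling contrast concrete, the toy calculation below (an illustration, not NXAI's kernels) counts per-token work during generation: a transformer re-reads its entire growing cache at every step, summing to quadratic total work, while a recurrent model such as xLSTM updates a fixed-size state at constant cost per token, summing to linear total work:

    # Illustrative only: total per-token operations during generation.
    # A transformer attends to all t cached tokens at step t (O(n^2) total);
    # a recurrent model does one fixed-size state update per token (O(n)).

    def transformer_total_ops(seq_len: int) -> int:
        return sum(t for t in range(1, seq_len + 1))

    def recurrent_total_ops(seq_len: int) -> int:
        return seq_len

    for n in (1_000, 10_000, 100_000):
        print(f"n={n}: transformer ~{transformer_total_ops(n):,} ops, "
              f"recurrent ~{recurrent_total_ops(n):,} ops")

At 100,000 tokens, the toy transformer count is roughly 50,000 times the recurrent one, which is the gap behind the speed and energy claims above.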