
Zafer Doğan - Random feature model on reducing the attention cost in transformers


Random Feature Attention (RFA) has been proposed as an attention mechanism whose time and space costs are linear in the sequence length, addressing the quadratic cost of conventional softmax attention in transformers. By using random feature methods to approximate the softmax kernel, RFA offers a more scalable alternative for processing long sequences. Our goal here is to characterize the training and generalization performance of this model under some universality constraints.