Abstract:
Financial text mining is becoming increasingly important as the number of financial documents grows rapidly. With progress in machine learning, extracting valuable information from financial literature has gained attention among researchers, and deep learning has boosted the development of effective financial text mining models. However, because deep learning models require a large amount of labeled training data, applying deep learning to financial text mining is often unsuccessful owing to the lack of training data in financial fields. Recent research on training contextualized language representation models on text corpora sheds light on the possibility of leveraging large unlabeled financial text corpora. We introduce F-BERT (BERT for financial text mining), a domain-specific language representation model pre-trained on large-scale financial corpora. Based on the BERT architecture, F-BERT effectively transfers knowledge from a large amount of financial text to financial text mining models with minimal task-specific architecture modifications. The results show that F-BERT outperforms most current state-of-the-art models, which demonstrates the effectiveness and robustness of the proposed model.