BloombergGPT: Bloomberg Develops Domain-Specific Language Model for Financial NLP Tasks


Published on:

Bloomberg, a leading financial data company, has developed a large language model (LLM) called BloombergGPT, designed specifically for natural language processing (NLP) tasks in the financial industry.

The financial sector, with its complex terminology and specialized language, has long needed a domain-specific language model, and BloombergGPT is a significant step toward building and applying one.

According to Shawn Edwards, Bloomberg's Chief Technology Officer, BloombergGPT is the first LLM focused on the financial domain. The model is intended to improve existing financial NLP tasks such as sentiment analysis, named entity recognition, and news classification. It is also meant to open new ways of making use of the vast quantity of data available on the Bloomberg Terminal, bringing the full potential of AI to the financial sector.

To build BloombergGPT, the company's ML Product and Research team drew on the forty years of financial language documents that Bloomberg has collected and maintained. The team assembled a 363-billion-token dataset of English financial documents and combined it with a 345-billion-token public dataset, producing a training corpus of over 700 billion tokens. They then trained a 50-billion-parameter decoder-only causal language model on a portion of this corpus.
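
A decoder-only causal language model learns to predict each token using only the tokens that come before it. The minimal Python sketch below illustrates that general idea; it is not Bloomberg's code, and the function names `causal_mask` and `next_token_pairs`, as well as the token IDs, are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def causal_mask(seq_len: int) -> List[List[bool]]:
    """Return a seq_len x seq_len mask where mask[i][j] is True when
    position i may attend to position j, i.e. only when j <= i (no peeking ahead)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def next_token_pairs(token_ids: List[int]) -> List[Tuple[List[int], int]]:
    """Build causal LM training examples: each prefix is paired with the token that follows it."""
    return [(token_ids[: i + 1], token_ids[i + 1]) for i in range(len(token_ids) - 1)]


if __name__ == "__main__":
    # Hypothetical token IDs standing in for a short financial sentence.
    tokens = [101, 7, 42, 9]

    # Visualize which earlier positions each position may attend to ("x" = allowed).
    for row in causal_mask(len(tokens)):
        print(["x" if allowed else "." for allowed in row])

    # Each context predicts exactly one next token.
    for context, target in next_token_pairs(tokens):
        print(context, "->", target)
```

In a real transformer, a mask like this is applied inside self-attention so that a position ignores all later positions, and the loss over every next-token target is computed in a single pass rather than by enumerating prefixes as this sketch does for clarity.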

Bloomberg evaluated BloombergGPT on existing finance-specific NLP benchmarks, a suite of Bloomberg-internal benchmarks, and broad categories of general-purpose NLP tasks drawn from popular benchmarks. According to Bloomberg, the model outperforms similarly sized open models on financial tasks while performing on par with or better than them on general NLP benchmarks.

According to Gideon Mann, head of Bloomberg's ML Product and Research team, the quality of machine learning and NLP models comes down to the data used to train them. Bloomberg's extensive collection of financial documents allowed the team to carefully curate large, clean, domain-specific datasets for training LLMs that are best suited to financial use cases.

Vishak is Editor-in-chief at Code and Hack, with a passion for AI and coding. He writes engaging, informative content on AI topics including machine learning, natural language processing, and coding, and follows the latest news and breakthroughs in these fields so his articles keep readers informed.
