Building an on-the-fly LLM parser for music royalty statements
01 May, 2026, by Tom Mullen
We get asked a lot about where, when and how to use AI (LLMs).
In a recent post we covered some of the areas where music catalog investment funds can benefit most from the technology, but I wanted to bring that to life a little more.
Almost all funds or opcos investing in and managing music assets have to process royalty statements. They come in all different shapes and sizes from all different places.
Roughly 80% of them arrive in consistent formats from the same small number of sources. The other 20%, however, can often be a cause of real pain, and when we get down to the last 1-2% we're talking about obscure formats that you've never seen before.
Ingesting an entire data room full of statements only to find that a handful have jammed the process can be a huge pain. At that point we're faced with either manual entry or relying on some elaborate techniques to guess what's what.
This is an excellent use case where we can marry traditional technology with a beautiful LLM.
Why the old stuff?
LLMs are hugely powerful, but they can be quite slow and very expensive in comparison to other more established approaches depending on what you're trying to achieve.
Parsing a document (PDF, Excel, Word) is a great example of where multiple technologies are better than one.
Take PDF parsing: there are loads of tried and tested tools that will consume a PDF and spit out text, and they do it in a fraction of the time and with far less compute than an LLM would.
The output from that initial parse, however, makes a fantastic input for an LLM. Text is significantly smaller in size, and the LLM doesn't have to waste cycles on parsing the document to get to the important part - understanding what's in it.
Coupled together, they make a fast and powerful combination that is very well suited to building a new parser for an obscure royalty statement on the fly.
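To make the split concrete, here is a minimal sketch of the cheap first stage, assuming poppler's `pdftotext` command-line tool is installed. The function names and the prompt wording are illustrative assumptions, not the code used in the demo.

```python
import subprocess


def pdftotext_command(pdf_path: str) -> list[str]:
    """Build the poppler call: -layout keeps columns aligned, '-' writes to stdout."""
    return ["pdftotext", "-layout", pdf_path, "-"]


def pdf_to_text(pdf_path: str) -> str:
    """Extract plain text from a PDF (needs poppler-utils on the PATH)."""
    result = subprocess.run(
        pdftotext_command(pdf_path), capture_output=True, text=True, check=True
    )
    return result.stdout


def build_prompt(statement_text: str, max_chars: int = 8000) -> str:
    """Wrap the extracted text in a short instruction for the LLM.

    The wording and the character cap are assumptions for illustration.
    """
    excerpt = statement_text[:max_chars]
    return (
        "The text below was extracted from a music royalty statement.\n"
        "Identify the column headers and describe what each one contains.\n\n"
        + excerpt
    )
```

The LLM only ever sees the text that `build_prompt` produces, so the slow, expensive step works on a small input rather than on the raw PDF.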
Quick demo
We set up a quick demo based on work we have done before. The process is simple: a new royalty statement is uploaded and our existing parsers don't recognize it as a known format.
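The "known format or not" check can be sketched as a small parser registry. This is an assumption about how such a check might look; the parser names and marker strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StatementParser:
    name: str
    matches: Callable[[str], bool]  # does this text look like the parser's format?


# Hypothetical known formats, recognised here by a marker string in the text.
KNOWN_PARSERS = [
    StatementParser("distributor_a", lambda text: "DISTRIBUTOR A ROYALTY REPORT" in text),
    StatementParser("publisher_b", lambda text: "Publisher B Statement of Account" in text),
]


def find_parser(text: str, parsers=KNOWN_PARSERS) -> Optional[StatementParser]:
    """Return the first parser that recognises the text, or None for an unknown format."""
    for parser in parsers:
        if parser.matches(text):
            return parser
    return None
```

A `None` result is the signal that the statement belongs to the unfamiliar tail and needs a new parser.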
We choose to create a new parser for it in the workflow. That uses PDF-to-text technology (poppler in this case) and passes the text to an LLM (in this case an open-source, locally run one for better privacy controls) to work out which fields relate to those we are interested in.
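A minimal sketch of the field-mapping step, assuming the statement text has already been extracted. The target field names are hypothetical, and the model is passed in as a plain `complete` callable because the post does not say which local model or client is used.

```python
import json
from typing import Callable

# Hypothetical target schema: the fields we want out of every statement.
TARGET_FIELDS = ["song_title", "territory", "period", "units", "amount"]


def mapping_prompt(statement_text: str) -> str:
    """Ask for a JSON object that maps each target field to a column header."""
    return (
        "Below is text extracted from a royalty statement.\n"
        "Map each of these fields to the column header that holds it: "
        + ", ".join(TARGET_FIELDS) + ".\n"
        "Reply with a JSON object only, using null where no column fits.\n\n"
        + statement_text
    )


def propose_mapping(statement_text: str, complete: Callable[[str], str]) -> dict:
    """Get a field mapping from the LLM and keep only answers we can verify."""
    raw = complete(mapping_prompt(statement_text))
    try:
        answer = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # unusable reply: treat as "nothing mapped"
    if not isinstance(answer, dict):
        return {}
    mapping = {}
    for field in TARGET_FIELDS:
        header = answer.get(field)
        # Guard against invented headers: it must appear in the source text.
        if isinstance(header, str) and header and header in statement_text:
            mapping[field] = header
    return mapping
```

Checking each proposed header against the extracted text is a cheap way to discard answers the model made up, and anything left unmapped can be handed to a person.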
The outcome is a pretty nifty way to deal with the unknown and get back to the job in hand - working out the value!
Dialing in a workflow
To benefit from this type of approach, you need a well-thought-through workflow with a human in the process. LLMs, machine learning models and other AI-based technologies are improving all the time, but for many tasks a thoughtful workflow with a touch of human input can be the most effective approach (by a significant margin).
Understanding where those inference bottlenecks may occur and where a human can easily shortcut them is super important and can save a lot of expensive and sometimes unnecessary engineering time.
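One way to picture the human step is as a gate: the automated mapping goes through only when every required field is filled, and a person supplies whatever is missing. The sketch below is illustrative only; the required fields and function names are assumptions.

```python
# Hypothetical: fields a parser must map before it can be saved.
REQUIRED_FIELDS = ["song_title", "period", "amount"]


def fields_needing_review(mapping: dict, required=REQUIRED_FIELDS) -> list:
    """List the required fields the automated step could not map."""
    return [field for field in required if field not in mapping]


def apply_corrections(mapping: dict, corrections: dict) -> dict:
    """Merge a reviewer's answers over the automated mapping (reviewer wins)."""
    merged = dict(mapping)
    merged.update(corrections)
    return merged


def ready_to_save(mapping: dict, required=REQUIRED_FIELDS) -> bool:
    """A parser is saved only once nothing is left for review."""
    return not fields_needing_review(mapping, required)
```

A reviewer pointing at one missing column takes seconds, which is often cheaper than engineering the model to find it unaided.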
There are lots of examples in the music catalog acquisition and management domain where this is true - parsing royalty statements and understanding "what is a song" for valuation purposes are two that we see regularly.
They can mean very different things to different people and solving for them with a fully automated process can be a challenging and brave endeavor, a perilous journey that need not necessarily be taken.
LLMs are akin to magic, but there are lots of other tried and tested tools in the box (including the amazing humans at the keyboard) - so use them wisely.