Creating a database of musical scores

Ichiro Fujinaga
Chair of the Music Technology Area of the Schulich School of Music at McGill University

Ichiro Fujinaga likens his life’s work and goal to something he nicknames “Google Scores.” Like Google Books, his project aims to establish a huge database of musical scores that will be freely accessible to anyone who uses the search engine.

“What I’ve been working on for the last 30 years is optical music recognition,” says the Chair of the Music Technology Area of the Schulich School of Music at McGill University in Montreal. “Just like optical character recognition, I’m basically trying to teach computers how to read music.”

Fujinaga laughs when he says “it’s taking too long,” a comment that takes on new meaning when you realize just how he happened upon this career in the first place.

Fujinaga’s parents immigrated to Canada from Japan when he was 12 years old. At the time, he was studying piano and basic music theory and ended up playing in a rock band in high school. Inspired by music, and obligated to get a degree because his dad was a professor, he decided he wanted to study music. But he couldn’t pass the music theory exam in Canada because he had learned all of his music theory in Japanese and it wasn’t translating well.

“Given that, I took a year off and worked at IBM,” Fujinaga explains. “I was also taking private music theory lessons and somewhere in that year, someone convinced me I should pursue a math degree. So I did that. Then I went back into music.”

When he was studying music theory in the 1980s, he was being asked to harmonize chords with certain melodies. While he was capable of doing it himself, he decided it was too much work — especially since he figured he could just get a computer to do it.

“I was going to write a program,” he says. “But it turns out it was difficult. During that time, I decided that if computers have a lot of data, then, using them, you should be able to figure out the right way to harmonize the melodies. To do that, I had to get lots of music into the computer. And that was the start of my master’s degree project, which I started in the 1990s and which essentially continues to this day.”

The project has grown in scope and size, but he admits the initial idea was “that I was being lazy and wanted the computer to do the work.” Talking about his work today, the modest researcher laughs when he says: “Computers are pretty stupid and I’m not smart enough to teach them.”

Joking aside, his project — called SIMSSA or the Single Interface for Music Score Search and Analysis — aims to help musicologists search for the music they’re listening to, or for music for which they don’t have the score. Alternatively, they may want to do big-data studies across different eras of music.

“Our main goal is to allow musicologists to be able to search and then analyze,” he says. “In the digital humanities, researchers do the same thing, but they’re looking at, for example, a large corpus of text. They might be searching for some similarities in 19th-century female writers, for example. They can do that because most of the literature is now scanned and available through optical character recognition. This project would be similar, but it might be called Google Scores.”

Essentially, users will be able to search the full collection of music or look for a particular melody or sequence of chords. But before they can do that, his team needs to build a data set of music, something that isn’t currently available.

The project concentrates on Medieval and Renaissance music because he likes the music, but also, and perhaps more importantly, because it’s no longer bound by copyright, as it dates from between the ninth and 16th centuries.

“Ultimately, we want to be able to make this dataset available worldwide through any browser,” he says.

The challenges are many, however. For one, some of the Medieval scores he has don’t have staff lines, so the team is going back in time to figure out what the symbols (which differ from modern Western music notation) mean, and what the staffless notation sounded like.

“Once we have lots of data from Europe, we’ll know how music was transmitted, and how it was spread over the years if we can trace the lineage of the same text with similar melodies.”

Fujinaga uses Compute Canada and Calcul Québec’s resources to do his work. Most of the musical score transcripts are on parchment, some of which has been damaged over the centuries, causing the ink to fade and making the scores harder to read.

“You have to figure out where the text is and where the notes are,” he explains, adding that the team uses cloud computing to accomplish this. “That’s one of the steps we use Compute Canada resources for. It’s called document analysis.”

While he could do his work without this resource, it would cost a lot more and would be slower, he says.

“In that sense, it’s been invaluable.”
