OR/MS Today - June 2008
Math & Music
Math & Music: The Perfect Match
Operations research has much to offer in terms of solving problems in music composition, analysis and performance.
By Elaine Chew
Operations research is a field that prides itself on its versatility in the mathematical modeling of complex problems to find optimal or feasible solutions. It should come as no surprise that O.R. has much to offer in terms of solving problems in music composition, analysis and performance. Operations researchers have tackled problems in areas ranging from airline yield management and computational finance, to computational biology and radiation oncology, to computational linguistics, and in eclectic applications such as diet formulation, sports strategies and chicken dominance behavior, so why not music?
I am an operations researcher and a musician. Over the past 10 years, I have devoted much time to exploring and building a career at the interface between music and operations research. At the University of Southern California, I founded the Music Computation and Cognition (MuCoaCo) Laboratory (www-rcf.usc.edu/~mucoaco) and direct research in music and computing. To bring music applications to the attention of the O.R. community, I have organized invited clusters at INFORMS meetings: "O.R. in the Arts: Applications in Music" (www-rcf.usc.edu/~echew/INFORMS/cluster.html) at the 2003 INFORMS meeting in Atlanta, and "Music, Computation and Artificial Intelligence" (www-rcf.usc.edu/~echew/INFORMS/ics2005.html) at the 2005 INFORMS Computing Society meeting in Annapolis, Md. The summer 2006 issue (Vol. 18, No. 3) of the INFORMS Journal on Computing featured a special cluster of papers on "Computation in Music" that I edited with Roger Dannenberg, Joel Sokol and Mark Steedman.
The goal of this article is to provide broader perspectives on the growth and maturation of music and computing as a discipline, resources to learn more about the field, and some ways in which operations research impacts and can influence this rapidly expanding field. My objective is to show that music, in its analysis, composition and performance, presents rich application areas for O.R. techniques.
Any attempt to give a comprehensive overview of the field in a few pages is doomed to failure, given the vastness of the domain. Instead, I will provide selected examples, focusing on a few research projects at the MuCoaCo Laboratory and some related work, of musical problems that can be framed and solved mathematically and computationally. Many of the techniques will be familiar to the O.R. community and some will borrow from other computing fields. These examples will feature the modeling of music itself, the analysis and description of its structures, and the manipulation of these structures in composition and performance.
Beyond such practical interests, mathematical and computational approaches to solving problems in music analysis and in music making have their own scientific and intellectual merit as a natural progression in the evolution of the disciplines of musicology, music theory, performance, composition and improvisation. Formal models of human capabilities in creating, analyzing and reproducing music serve to further knowledge in human perception and cognition, and advance the state of the art in psychology and neuroscience. By modeling music making and analysis, we gain a deeper understanding of the levels and kinds of human creativity engaged by these activities.
The rich array of problems in music analysis, generation (composition/improvisation) and rendering (expressive performance) presents new and familiar challenges to mathematical and computational modeling and analytical techniques, the bread and butter of operations researchers, in a creative and vast domain.
The intensifying activities in mathematics/computation and music are best reflected in the proliferation of conferences founded in only the past few years. A partial listing of some of the main conferences, together with the years they were founded and the URLs of their earliest available Web sites, is shown in Table 1. As in other fast-moving fields in computing applications, publications in this new field most frequently appear in the peer-reviewed proceedings of these conferences, which are archived in online libraries for ready access.
One cannot begin research in any field without appropriate contextual knowledge. It can take years to familiarize oneself with the state of the art, even in a relatively new field. To give graduate science and engineering students opportunities to acquaint themselves with mathematical and computational modeling of music, and to try their hand at small-scale projects that could potentially grow into larger-scale projects or thesis research, I have designed a three-semester course sequence on topics in engineering approaches to music cognition (www-scf.usc.edu/~ise575). Each course in the sequence focuses on a topic in one of three areas: music analysis, performance and composition/improvisation.
The course allows students to learn by example, by surveying and presenting literature on current research in the field, and to learn by doing, by creating (designing and implementing) their own research projects on the topic addressed in class. Since the inception of the class in 2003, all course material, including week-by-week syllabi, paper reports, presentations, student project descriptions and demonstration software, has been posted online as open courseware and serves as a resource to the community. Figure 1 is an example of a week-by-week syllabus from the 2006 class, which focused on computational modeling of expressive performance.
While the courses provide a broad and structured introduction to topics in music and computing (the reader is urged to check them out for an overview of research in the field), the next sections provide some concrete examples of mathematical and computational modeling of problems in music.
Perhaps more than most time-based or sequential data, music information possesses a high degree of structure, symmetry and invariance (Shepard, 1982). Tonal music, which refers to almost all of the music that we hear, consists of collections of sequences, or sequences of collections, of tones, also called pitches. The most common way to represent the fundamental frequencies of these tones is on a logarithmic scale as on a piano keyboard. However, two pitches next to each other on this logarithmic scale can sound jarring when sounded simultaneously, while pitches farther apart can sound more pleasing. A number of representations have been proposed in which proximity mirrors perceived closeness.
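To make the logarithmic representation concrete, here is a minimal sketch using the standard MIDI convention (pitch number 69 is A4 at 440 Hz), in which frequency doubles every 12 semitones and pitches an octave apart (a factor-of-two frequency ratio) share a pitch class. The function names are illustrative only.

```python
def midi_to_frequency(pitch: int, a4_hz: float = 440.0) -> float:
    # Equal temperament on a logarithmic scale: each semitone multiplies
    # frequency by 2^(1/12); pitch 69 is A4 at 440 Hz.
    return a4_hz * 2.0 ** ((pitch - 69) / 12.0)

def pitch_class(pitch: int) -> int:
    # Pitches an octave apart (frequencies related by a power of two)
    # belong to the same pitch class.
    return pitch % 12

if __name__ == "__main__":
    for p in (60, 61, 67, 72):   # C4, C#4, G4, C5
        print(p, pitch_class(p), round(midi_to_frequency(p), 2))
```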
An example of such a model is the spiral array (Chew, 2000). Like Euler's tonnetz (Cohn, 1997) and Longuet-Higgins' (1962ab) harmonic network, the spiral array arranges pitch classes (classes of pitches whose frequencies are related by a power of two) in a lattice so that neighbors along one axis are related by 2:3 frequency ratios and neighbors along a second axis are related by 4:5 frequency ratios. Unlike planar and network models of musical entities, the spiral array wraps the plane into a three-dimensional configuration to exploit its continuous interior space, and uses the convexity of tonal structures, such as chords and keys, to define spatial representations of these objects in the interior.
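As a rough illustration of the geometry, the sketch below places pitch classes along the line of fifths on a helix that makes a quarter turn per perfect fifth, so that pitches a major third apart line up along the vertical axis. The radius and rise-per-fifth are left here as free parameters with illustrative defaults; the actual spiral array model calibrates them and further derives chord and key representations as weighted combinations of pitch positions.

```python
import math

def spiral_array_position(k: int, r: float = 1.0, h: float = 0.5):
    """Position of the k-th pitch class along the line of fifths
    (..., F = -1, C = 0, G = 1, D = 2, ...): a quarter turn per fifth,
    so pitch classes four fifths apart (k and k + 4, a major third)
    align vertically. r (radius) and h (rise per fifth) are
    illustrative defaults, not the model's calibrated values."""
    return (r * math.sin(k * math.pi / 2),
            r * math.cos(k * math.pi / 2),
            k * h)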
In tonal music, the key refers to the predominant pitch set used in the music and is identified by what is perceived to be the most stable tone. An early key finding algorithm by Longuet-Higgins and Steedman (1971) used shape matching on the harmonic network to determine the key (Figure 2). Inspired by Longuet-Higgins and Steedman's method and by interior point approaches to linear optimization, the center of effect generator (CEG) algorithm by Chew (2000) uses the three-dimensional configuration of the harmonic network and the interior space to track the evolving tonal context. In the CEG method, a piece of music or its melody generates a sequence of centers of effect that traces a path in the interior of the array of pitches. The key at any given point in time is computed by a nearest neighbor search for the closest major or minor key representation on the respective major/minor key helices. By moving from the lattice to the interior space, the model becomes more robust to noise and finds the key in fewer note events.
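A skeletal sketch of the center-of-effect idea follows, assuming the spiral_array_position function above and a precomputed table of key representations. In the full model these key representations are themselves built up from chord representations; here the table is simply assumed to be given.

```python
def center_of_effect(notes, r=1.0, h=0.5):
    """`notes` is a list of (line_of_fifths_index, duration) pairs for the
    notes heard so far. The center of effect is their duration-weighted
    centroid in the spiral array."""
    total = sum(d for _, d in notes)
    cx = cy = cz = 0.0
    for k, d in notes:
        x, y, z = spiral_array_position(k, r, h)
        cx += d * x
        cy += d * y
        cz += d * z
    return (cx / total, cy / total, cz / total)

def nearest_key(ce, key_centers):
    """`key_centers` maps key names to assumed (x, y, z) representations.
    Report the key whose representation lies closest to the center of effect."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(key_centers, key=lambda name: dist2(ce, key_centers[name]))
```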
The spiral array model and its associated tonal analysis algorithms (Figure 3) have been implemented in the MuSA.RT (Music on the Spiral Array Real-Time) system for interactive tonal analysis and visualization (Chew & François, 2005). Any operations researcher who has implemented a mathematical algorithm for computational use can attest that it is one thing to design an algorithm of low computational complexity and another to implement it so that it runs in real time. MuSA.RT is built using François' Software Architecture for Immersipresence framework (2004), which allows for the efficient processing of multiple concurrent data streams. MuSA.RT has been featured in numerous presentations, including The Mathematics in Music concert-conversation, which made its debut in Los Angeles, in Victoria, B.C., and in Singapore in 2007. Venues in 2008 include North Carolina State University and Massachusetts Institute of Technology.
Apart from the pitch structures described in the previous paragraphs, music also possesses time structures. When listening to music, humans are quickly able to pick out the beat and to tap along with it. The patterns of long and short durations along the grid of the beat produce rhythm. Periodicity in the accent patterns leads to the establishment of meter. Using the mathematical model for metrical analysis developed at the Multimedia Laboratory headed by Guerino Mazzola (now at the University of Minnesota), and described and advanced by Anja Volk (2002), Chew, Volk and Lee (2005) proposed a method for classifying dance music by analyzing its metrical patterns, providing an alternative to the inter-onset-interval distribution and autocorrelation methods proposed by Dixon, Pampalk and Widmer (2003).
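The metrical-analysis model itself is beyond a short sketch, but the autocorrelation idea mentioned above can be illustrated simply: quantize note onsets onto a fine time grid and autocorrelate the resulting indicator sequence; peaks in the profile point to candidate beat and bar periods. The sketch below is only a schematic version of that idea, not the method of Dixon, Pampalk and Widmer.

```python
def onset_autocorrelation(onsets, resolution=0.05, max_lag_s=2.0):
    """Quantize onset times (in seconds) onto a grid and autocorrelate the
    resulting 0/1 indicator sequence. Peaks in the returned profile mark
    lags at which onsets tend to recur, i.e. candidate beat/bar periods."""
    n = int(round(max(onsets) / resolution)) + 1
    grid = [0.0] * n
    for t in onsets:
        grid[int(round(t / resolution))] = 1.0
    max_lag = min(n - 1, int(max_lag_s / resolution))
    # profile[lag - 1] corresponds to a period of lag * resolution seconds
    return [sum(grid[i] * grid[i - lag] for i in range(lag, n))
            for lag in range(1, max_lag + 1)]
```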
Apart from the issues of determining tonal and rhythmic contexts, which can be thought of as finding vertical structures, there are equally interesting and challenging questions with regard to determining horizontal structures, such as melody and voice. A number of models have been proposed for finding motivic patterns (an example of a motive is the opening four notes of Beethoven's Fifth) and for separating voices (independent melodic threads superimposed in time). Their sequential nature means that techniques inspired by DNA sequence analysis and tree structure approaches lend themselves readily to solving problems of motivic pattern discovery; see, for example, Conklin and Anagnostopoulou (2006) and Lartillot (2005). In the same vein of computational biology-inspired approaches, we proposed a contig mapping approach to voice separation by first fragmenting, then assembling, the voices that make up a polyphonic (multi-voice) composition (Chew and Wu, 2004).
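The contig mapping approach fragments the music where the texture changes and then stitches the voice fragments back together. As a much simpler point of comparison (and explicitly not the Chew and Wu method), the baseline below assigns each incoming note greedily to the voice whose previous pitch is closest, reflecting the tendency of voices to move by small intervals.

```python
def assign_voices_greedy(notes, num_voices):
    """Baseline voice assignment: `notes` is a list of (onset_time, pitch)
    pairs. Each note, in onset order, joins the voice whose most recent
    pitch is nearest; empty voices are filled first."""
    voices = [[] for _ in range(num_voices)]
    last_pitch = [None] * num_voices

    def cost(v, pitch):
        # Prefer empty voices, then the voice with the closest previous pitch.
        return (1, abs(last_pitch[v] - pitch)) if last_pitch[v] is not None else (0, 0)

    for onset, pitch in sorted(notes):
        v = min(range(num_voices), key=lambda i: cost(i, pitch))
        voices[v].append((onset, pitch))
        last_pitch[v] = pitch
    return voices
```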
Taking advantage of the fact that music bears similarities with language, Steedman (1996) proposed a formal grammar for generating jazz chord sequences. Computational linguistics approaches typically require large corpora of annotated data for learning, which poses numerous challenges for music data. In efforts spearheaded by Ching-Hua Chuan and Reid Swanson, we are seeking ways to generate style-specific accompaniment given only a few examples (Chuan & Chew, 2007) and to segment melodies using unsupervised techniques so as to assist in computational creativity projects in music (Swanson, Chew and Gordon, 2008).
Most systems for generating music originate in some Markov model, as exemplified by Pachet's Continuator (2003), which builds a variable order Markov model from an input sequence for generating new sequences. An alternate approach, using Factor Oracles, is proposed by Assayag and Dubnov (2004) for their OMax human-machine improvisation system.
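The sketch below shows the idea in its simplest form: a fixed-order Markov model that records every observed continuation of each length-n context in the input and then random-walks over those continuations. Systems such as the Continuator use variable-order models and richer note representations; this captures only the underlying principle.

```python
import random
from collections import defaultdict

def train_markov(sequence, order=2):
    """Record every observed continuation of each length-`order` context."""
    model = defaultdict(list)
    for i in range(len(sequence) - order):
        model[tuple(sequence[i:i + order])].append(sequence[i + order])
    return model

def generate(model, seed, order=2, length=32):
    """Random walk over learned continuations, falling back to a random
    seed symbol when the current context was never observed."""
    out = list(seed)
    while len(out) < length:
        choices = model.get(tuple(out[-order:]))
        out.append(random.choice(choices) if choices else random.choice(list(seed)))
    return out

# Example: learn from a pitch sequence and generate a new one in a similar style.
melody = [60, 62, 64, 62, 60, 62, 64, 65, 67, 65, 64, 62, 60]
print(generate(train_markov(melody), seed=melody[:2]))
```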
Inspired by OMax, Alexandre François led the development of MIMI (Multimodal Interaction in Musical Improvisation), which centers on a performer-centric interaction environment. MIMI's visual interface (François, Chew and Thurmond, 2007), achieved through collaborative design, gives the user information about the current state of the system, including the region from which musical material is being sampled for recombination, 10 seconds' lead time before the new material sounds, and an equal amount of time to review the improvisation and plan future strategies during the performance. MIMI made its international debut at the Musical Instrument Museum in Berlin in May 2007 at the International Conference on Mathematics and Computation in Music.
In the Expression Synthesis Project (ESP) (Chew et al., 2005, 2006), we employ an analysis-by-synthesis approach to study expressive performance. Taking a cue from motion capture for the generation of realistic animation, ESP uses the metaphor of driving for expressive performance, and a driving (wheel and pedals) interface to map car motion to tempo (speed) and amplitude in the rendering of an expressionless piece. The road is designed to guide expression: bends in the road encourage slowdowns, while straight sections promote speedups. Buttons on the wheel allow the user to change the articulation by shortening or lengthening the notes. A virtual radius mapping strategy (Liu et al.) ensures tempo smoothness.
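As a hedged sketch of how such a mapping might look (the function forms and constants here are hypothetical, not those of the ESP system), one can blend each road segment's radius of curvature with its neighbors and let the smoothed radius set a tempo ceiling that drops for tight bends and approaches a base tempo on straight road.

```python
def smoothed_radius(radii, window=5):
    """Blend each road segment's radius of curvature with its neighbors
    (a stand-in for a virtual-radius idea) so the implied tempo ceiling
    changes gradually rather than jumping at each bend."""
    half = window // 2
    out = []
    for i in range(len(radii)):
        lo, hi = max(0, i - half), min(len(radii), i + half + 1)
        out.append(sum(radii[lo:hi]) / (hi - lo))
    return out

def tempo_ceiling(radius, base_bpm=120.0, k=0.5):
    """Hypothetical monotone mapping: tighter bends (small radius) allow a
    lower maximum tempo; straight road (large radius) approaches base_bpm."""
    return base_bpm * (1.0 - 1.0 / (1.0 + k * radius))
```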
A goal of ESP is to make high-level decision-making in expressive performance widely accessible to experts and novices alike. The system was exhibited at the University of Southern California Festival 125 Pavilion over three days in October 2005. It was showcased again at the National University of Singapore Techno-Arts Festival in March 2007.
One cannot discuss expression without considering its effect on the emotion perceived or experienced by the listener. In Parke, Chew and Kyriakakis (2007ab), we present quantitative modeling, visualization and regression analyses of the emotion perceived in film with and without music, showing that perceived emotion in film with music can be reliably predicted from perceived emotion in the film alone and in the music alone. In his master's thesis, Mosst (2006) models and visualizes individuals' emotion responses to music using features extracted from the music's audio. An interesting finding is that humans' responses to music differ widely and require individualized models for accurate prediction.
Elaine Chew (email@example.com) is the Edward, Frances and Shirley B. Daniels Fellow at the Radcliffe Institute for Advanced Study at Harvard University in Cambridge, Mass. She is currently on sabbatical from the University of Southern California Viterbi School of Engineering in Los Angeles, where she is associate professor of industrial and systems engineering, and of electrical engineering.
OR/MS Today copyright © 2008 by the Institute for Operations Research and the Management Sciences. All rights reserved.