Introducing Music Space

Dr Jacques Steyn
Monash SA

In this presentation it is argued that the Symbolic Music Representation's logical domain operates only within the notation system of music, and that there is a need for a more abstract or higher-level logical domain, similar to that of SMDL. I call this domain Music Space, in which music objects and events are described on a 3-dimensional coordinate grid. The axes have various possible systems of reference points, ranging from none at all (to allow for unmeasured scores), to time, duration, frequency, and pitch. Music Space allows for the music objects and events within this coordinate system to be rendered graphically in any kind of notation system, including different typographical styles, as well as audibly via some musical instruction system, such as MIDI.

In this paper the XML-specific aspects of a possible SMR system are addressed. No details about rendering machines are presented, except to point out where they are required for interpreting and rendering the base XML code. The focus is thus on an XML-based music markup language.

I would like to begin with the big picture of music. Although the mandate of the present SMR work group is to specifically investigate music notation within the MPEG framework, there are many aspects of music that should be kept in mind. A couple of years from now MPEG may wish to address other aspects of music, and if we do not consider the bigger issues, what we propose now may be a handicap in future.

Thus far in the history of MPEG, the sections of the standards that deal with audio music revolved around what I will call canned music, that is, the focus is on the final event that is heard. The emphasis was on recorded music, which I call external aspects of music. In a manner of speaking, the focus was on sound waves.

The SMR project focuses on internal aspects of music. The focus is not on the complete and final output of audible music, but on more intrinsic details. In other words, the properties of a music event will be described by SMR, although these properties fosuc on a specific subset of the totality of music, namely notation. The totality of internal aspects of music is very complex and covers much more ground than the mandate of the MPEG SMR workgroup. Despite focusing on only a subset of music, I maintain that the totality of music needs to be kept in mind. Although music notation is merely a symbolic expression of a subsystem of the complex music system, developing a markup language for this subset will eventually need to fit into the bigger picture.

SMR's position within a multimedia document

The attempt by the SMR work group to create an XML-based music notation system, is really an attempt to create part of a multimedia document. Within a networked environment, the basic requirements of a multimedia document need to incorporate the following features (adapted from Boll et al - 1999; and the W3C's CSS and XML Recommendations):

temporal model: multimedia documents unfold during the course of time. Different media objects may occur at different time intervals and the application needs to know when to trigger which object.
spatial model: different visual objects need to be positioned in relation to one another within the display unit (which may be paper, CRT, plasma, etc) and within its available dimensions. Within the available space objects may be positioned relatively or absolutely in terms of some reference point, which is typically dependent on the nesting of objects
fragmentation: the user must be in control on selecting which part(s) of a document she wants to access. The user must be able to fragment a document according to will: in terms of a text document, the user may want to view only a small section of the entire document; in video, the user may want to view only a specific scene; in music, the user may want to listen to a certain phrase, or even instrument within an orchestra.
user preferences: the user must be in control of stylistic features. For text the font-family, color schemes, etc must be possible. For sound controls such as volume and effects must be possible, but on a more atomic level, the user must be able to select instruments, tempo, performance genre, and so on of a composition.
scalability: it must be possible for a user to access a multimedia document on the device of choice, whether I-TV, PC, handheld device or whatever.

The Music Space I introduce here, accommodates each of these requirements.

Underlying any approach to the internal or intrinsic aspects of music would be some or other definition of this thing we call "music", and its components. On the most abstract level, and related to what SMDL calls the analytical domain, there are philosophical issues about music as an object of scientific analysis. Through the centuries there have been various theories, ranging from the mystical Greek philosophers' notion of music of the spheres to 20th century reduction of music to physical sound waves. No matter what kind of philosophical approach we have for defining this object, our ideas need to be expressed in some written form. But let me not linger at a philosophical discussion my personal view is more along the lines of process philosophy which would regard music not as an object, but as an emerging event but I will be very pragmatic. My approach to the music object will be biased toward the physics of music. A notational system would merely be a conventional graphic representation of music sound.

SMDL and SMR

SMDL distinguishes between four domains for a language that would mark the components of music. Note that the SMDL notion of Logical Domain differs from the definition of the term as implemented by the MPEG SMR work group.

Logical domain: this domain covers basic music content, and includes aspects such as pitches, rhythms, harmonies, dynamics, tempi, articulations, accents, etc.
Gestural domain: this is an expression of the logical domain during a performance
Visual domain: this is the visual expression of scores
Analytical domain: this domain addresses theoretical analyses of the other three domains.

The logical domain of the MPEG SMR work group focuses on sequences of symbolic music representation events and their relationships, with the intention to be a kind of meta-notation. It does not apply to such a broad scope as SMDL's definition. For the sake of clarity, in this paper I will call the SMR logical domain rather by the name logical notation domain to avoid confusion with the SMDL definition.

The scope of MPEG SMR falls within the SMDL visual domain. It is my view that although the SMR work group focuses on the visual domain, the other domains should nevertheless be considered, even if not addressed explicitly by the eventual system of markup. The reason is simple: we should not paint ourselves into a restrictive corner as future MPEG projects may consider other parts of the totality of music. The SMDL logical domain is much more universal and generic, and could be a good starting point. Developing systems for expressing that SMDL Logical domain visually would be relatively easy if we have a solid basis to work from.

Time and timeless systems

One of the most challenging aspects of SMR is to develop a system that can handle both time-based and "time-less" (or unmeasured or unbarred) music. All other multimedia systems that have been developed thus far accept some concept of time (however abstractly) to serve as basic reference timeline on which multimedia objects are placed, and within which events take place.

Here is a brief summary of how various systems implement time in their temporal models, following Boll et al (1999). These authors distinguish between three types of temporal models:

point-based: each media element in the multimedia document is referenced by a point in time. A point has three possible relationships to other points. It occurs before, after or at the same time (i.e. equal) to another point.
interval-based: intervals of time are used as basis. However, time is not always known, and variations on the classic interval-based system have been developed to accommodate open-time intervals
event-based: each multimedia event is referenced to an action located within the event. An action is triggered when its location within the event is reached

MHEG-5 has an event-based temporal model. These events are linked with actions.

HyTime uses a Finite-Coordinate-Space (FCS) module which consists of n-dimensional coordinate spaces, which may include time dimensions. These events are placed at points within the FCS, and are thus point-based. HyTime distinguishes between musical time and real time. Real time is concrete time and typically clock-based. Musical time is an abstract time system where units are relative to one another. It is like a fragment of CWN notation with relative note values, without knowing what the absolute time values are. Both these systems can be applied and synchronized with one another.

SMIL is an interval-based application. Multimedia elements are scheduled either sequentially (or in my terms, horizontally) or in parallel (for simultaneous events, in my terms, vertically or along the z-axis).

Music Space

My present view is to express the basic module of an XML-based music markup language, which relates to SMDL's Logical Domain, in terms of a Cartesian coordinate system, which I will call Music Space in order to identify it as a specific application of a Cartesian system. There are several reasons for this, which should become evident as I proceed. The most important reason is perhaps that my approach is very pragmatic. We could design a totally novel system with new vocabulary and thereby ignore centuries of music concepts, or we could use a kind of best practice approach, which is often done in software development. I will follow this second approach and assume that CWN, despite all its flaws, does a pretty damn good job. But although my terminology may show a bias towards western music concepts, there is not a one-to-one correlation between the way I use the terms within Music Space, and traditional CWN, which will also become (hopefully) evident as I proceed.

SMDL's Logical Domain is based on the HyTime Finite Coordinate Space (FCS), which is expressed on a single axis, measure in virtual time. The Cartesian coordinate system I have in mind has three dimensions, expressed on three different axes. As argued before, I view music more or less along the same lines as the SMDL Logical Domain. My definition is derived from the physics of sound, and the terms I use are biased toward the terms of western music.

A definition of music depending on the physics of sound, would regard music as a function of frequency and time. In Music Space time would be on the x-axis, while frequency would be on the y-axis, and the bias toward western music concepts should be evident. Common Western Music Notation can also be viewed as expressing music events on these two axes, and in fact, I would argue, from the physics of sound point of view, there can be no music without time and frequency. By its very definition, the cultural activity we call music has some or other form of pitch and duration.

Although this approach of Music Space on the surface may seem to restrict the expression of kinds of music such as time-less music, of which James Ingram is a proponent, it is not, as I will show later.

As I have also pointed out previously, music is also characterized by two kinds of simultaneous sets of events. A specific instrument, such as a piano, can be used to play different notes simultaneously. And of course, different instruments can produce different sounds that are expressed simultaneously in time. As the sounds of all these instruments are expressed along the time axis, they will need to be represented on the x-axis.

But I am also a pragmatist. And the score of a symphony consisting of many instruments will become very much illegible if all the instrument's notes are expressed visually on the same x-axis. The western notation tradition assigns a different score part to the different instruments, and synchronize the parts by using a bar-structure. My approach with Music Space is to use the z-axis for the scores of different instruments. In a manner of speaking, the z-axis adds depth to the x-axis and the y-axis. Different layers on top of the x-axis and y-axis can be expressed by their position on the z-axis within Music Space.

Music Space contains the basic or core objects and events of music, similar to SMDL's Logical Domain. Symbolic Music Notation is concerned with the SMDL Visual Domain. A basic criterion for the success of the Music Space is that it must be translatable not only into the Visual Domain, but into other possible domains as well. It must be possible to translate Music Space into MIDI or other systems as well.

Music Space allows for a user to select modes for reference points on the different axes. For example, for the x-axis, when Time is selected as the base reference, grid reference points can be either point-based, interval-based, or event-based. The coordinate system is thus much more complex than the HyTime FCS as many different reference systems can be used. More than one system can be used simultaneously and they can be synchronized with one another.

Music Space is an abstract system, but for human interface it will obviously need some graphical representation, as illustrated below. However, do not confuse this representation with the Music Space itself. Music objects and events within this space will be marked with XML elements and attributes, using a still different graphic description system, namely human language, specifically in the case of MML, names that are recognizable in English.

In the illustration below there is no significance within Music Space for the red or blue rectangles. They only obtain meaning once a reference system is attached to them. For example, when a time-based system is attached to the x-axis, the meaning of a particular event would be duration in terms of time. When a space-based system is attached, the meaning of a particular event would be in a space measuring system, such as millimeters or picas. To make this more explicit, although the x-axis may be used in the majority of scores to represent time, its inherent property is not time. It has no inherent property. If Time is assigned to it, it obtains time-related attributes; if Space is assigned to it, it obtains space-related attributes; and both may be assigned to it.

This illustration of Music Space is for the events of two objects, which in this case may be instruments, a red (the base instrument) and a blue object. It is difficult to represent this 3D system in the 2D graphic display you see right now.

Music Space is a 3-D system that can be easily used within 3-D computer monitors which are already being built in laboratories, and may be available commercially within the next five years or so. By assigning different XSL style sheets, it will be possible to collapse the z-axis in order to make 2-D representation possible. In terms of a 2-D computer monitor, a user may be able to scroll down cascading levels, as one does in a typical cascading windows interface environment. This is one option. Another is this. By using a different style sheet a user may be possible to arrange different parts in the traditional printed CWN score format, or other systems of representation.

The above is somewhat abstract, so let me proceed with more practical matters.

When implemented, applications of MPEG SMR will have to render the XML either onto paper or onto display media, as well as an instruction system such as MIDI. Some or other pixel-related output must thus be possible. It may be possible to use the MPEG BIFS and XMT. For my MML project I envisage a browser-based system which should demand relatively little programmatic coding to translate the XML language into some or other graphical format, most probably using SVG and XSL. There are already a host of W3C Recommendations which, when eventually implemented by browsers, will demand very little additional programming to translate MML into a browser-friendly expression. Already the content of MML can be displayed in textual format by any standard XML-browser without the need for any plug-in. This displayed content is not in the format of the blobs and sticks of CWN, but the core music objects and events can nevertheless be read in a human-friendly format. If SVG and XSL cannot handle the translation of MML into a graphic format properly, it is envisaged that translating this textual content into music notation glyphs can be achieved with a Java browser plug-in. This approach would thus comply with the SMR requirements of multi-platform expressions. And as the W3C is working on several initiatives, such as Content Selection for Device Independence, to make XML content available on different devices, a browser-based approach makes much sense. This approach would avoid the need to develop complex specialized software applications to handle music in general, and specifically notation.

The Recommendations proposed by the W3C are based on the basic concepts of scalability, interoperability, the sensible rendering of document fragments, and the ability to render documents (or fragments) on any compliant devices. These are also the goals of the MPEG SMR work group. As a large proportion of the W3C's Recommendations have been implemented by various developers, notably the Mozilla open source browser builders, I think it wise to follow suit and focus on the delivery of an XML-based SMR system on compliant browsers. By following this route, quite a vast body of practical implementation tools are already available. If we could successfully adapt the SMR specifications to fully acknowledge and depend on W3C Recommendations, the complexity of the rendering engines that need to be developed for the idiosyncratic requirements of symbolic music notation, would be minimal. Being browser-biased does not prevent software development companies to nevertheless develop their own non-browser based interfaces.

The XML-based markup of Music Space can be translated graphically utilizing various W3C Recommendations. For text the CSS Recommendation could be used. For look-up tables, Schema could be used. For restructuring or re-ordering music events, the DOM and XSLT can be used. For the notation blobs and sticks, SVG could be used. For user input XForms could be used.

The MML application model

Given the above background, this is how I envisage the implementation of an XML-based SMR system. During my discussions with various participants in this group, I realized that if one does not have experience in developing web solutions, its magnificent possibilities are not appreciated. So to simplify matters, I will take you through some steps of how a user might use an implementation, such as I envisage for MML -- and SMR would be a component (or module) within this larger system.

As introductory note, keep in mind the different usage strategies and procedures followed by the different types of users of the proposed system. The software developers of this system will need to provide a lot of standard data. For example, many different types of look-up tables, catering, for example, for the many different possible tuning systems, will need to be set up. A user such as a composer may not want to use any of the provided data structures, but may want to set up his own tuning system, to mention but one possibility. The end-user, or consumer of the product, may merely want to see some or other symbolic expression of a composition. In this case the default of the composition, as specified by the composer, or some or other musicologist, must be available. And if the user has specific needs, such as a blind user, or a user who prefers a different display style, he must be able to implement the changes with minimal effort. Then there is of course the more adventurous user who may want to see how the score of a Gregorian chant is rendered using the setup of Baroque notation, or Schenkerian analysis. All these different options should be relatively easy to achieve with the approach I propose.

For my explanation of the system, I will follow a typical set of procedures that a composer user may follow. It should be possible to implement the steps in different following orders, although for some configurations there may be dependencies, which means that one selection may influence the possible remaining possibilities.

Very briefly, to set up the system the following steps are required. Note that as this system is browser-based, several sub-systems (e.g. selecting the human language and character set to be used) may already have been set-up by the user. Furthermore, a user may skip some steps. SMR and MIDI, for example, have different requirements to translate objects and event from Music Space into their respective formats. Developers would supply many possible sets from which a user merely needs to select. For example, the average Joe Earthling may not be interested in exactly how CWN note names map to frequencies when the tuning system using A440 is used as Reference Note. But such data will nevertheless be available for editing if so required.

Set-up Music Space Parameters
Placing objects in Music Space
Name objects
Select Tuning Reference Note
Select Tuning System
Select Timing Mechanism
Define inherent object and event properties

Note that these steps need not be setup in a specific order, although, due to interdependencies, it may be more practical to do so in real applications.

Step 1. Set-up Music Space Parameters

In this step a user specifies the base values of the Music Space coordinates. The abstract values contained in Music Space will eventually be rendered graphically (either on paper or on a computer or similar screen), or may be translated into a Performance Instruction (PI) system, such as MIDI. Here are some possbilities for the different axes.

x-axis

On the x-axis horizontal relationships between objects and events in Music Space are expressed. The possible value systems are either time or space based. It should also be possible to declare no system at all, in which case objects and events are defined in terms of their neighbours on the same axis, such as Object 1 precedes Object 2.

In terms of Performance Instructions (PI), real time would be of importance. A user needs to specify which timing system should be the basis (e.g. MIDI Time Code, SMPTE, BPM, etc) for a default time.

In terms of the graphic rendering of objects and events, the default spacing between rendered glyphs must be made explicit. Although a user specifies the measurements here, the actual rendering should be done with a style sheet. CSS 2.0 provides various systems of measurement units. Here follows my own representation of the CSS 2.0 Recommendation.

Relative length

This length is a percentage value, with reference to the available space as defined by the containing box of the object

Fixed Relative length

em (or quad-width): this is the value of the computed font size of the glyph
A font-size value in em refers to the font size of the parent element.
ex (x-height) : a font's x-height, equaling the height of the lowercase "x" character in the Latin font family. In music notation some other measurement may be used, such as the height of the blob in typical music notation.
px (pixel): a device-dependent measurement. A pixel is the smallest unit on a viewing device.

Its actual size varies according to screen resolution.

Absolute length

in (inch): unit of measurement (originally a British system of measurement)
1 in = 2.54 cm
cm (centimeter): unit in the decimal measuring system
1 cm = 0.39 in = 10 mm
mm (millimeter): unit in the decimal measuring system
10 mm = 1 cm
pt (point): there are different possible absolute values for point. In CSS2 the value of a point is fixed at 1/72th of an inch.
1 pt = 0.013 inch (ie 1/72th) = 1 pt = 0.357 cm
pc (pica): there are different possible absolute values for pica. The length of a pica depends on the length of a point. In CSS2 the length of a pica is 12 points.
1 pc = 12 pts = 1 pc = 4.28 cm

Music publishers will be able to implement a particular set of proposed measurements, but an end-user should be able to override these settings according to her own preferences. Note that the above settings are for the overall general default values, but these values may be specified for individual events, which means that each object and event can have its own unique properties. For example, the default space value between event representations for print may be 5px, but each event representation can be moved to any other configuration. It is thus possible that not a single event representation in a score actually contains the default values.

The requirements for rendering Music Space in audio or graphically are quite different. In practice, publishers of scores may not be interested in providing mark-up for rendering a score in audio via some or other system, such as MIDI. And a composer may perhaps only be interested in the audio rendition of his work, and not be interested at all in the notational representation of his work. Ideally both systems should be marked. But the absence of one of the descriptive systems is not a requirement for the successful implementation of the other. The system, as I envisage it, would allow setting up the x-axis for either graphic rendering, or for eventual audio rendering, or for both, in which case I suspect they need to be synchronized.

It may, of course, also be possible to develop filters to translate between the systems. Such translation filters are already available in music software sequencers that can translate between MIDI and notation, although the results are typically very frightening to notation purists.

y-axis

On the y-axis vertical relationships between objects in Music Space are expressed. The possible value systems are either frequency or space based. If Time is the base reference, the y-axis would be used for pitch and frequency. If Space is the base structure, again the measurement systems mentioned above for the x-axis would apply to the y-axis. On the y-axis simultaneous events played by the same instrument are indicated.

Ideally, though, as I will explain in more detail below, a system should be developed for music notation glyphs along the lines of CSS font descriptors.

z-axis

On the z-axis simultaneous relationships between different sets of objects and events in Music Space are expressed. This axis represents different layers of x-axes and their y-axes. For example, a piano object (marked on both the x-axis and y-axis) may be on layer 1 of the z-axis, while a tuba may be on layer 2. Each layer has its own x-axis and y-axis. The first (or top) layer will always serve as reference point for the other layers, but as intrinsic values of objects and events can be specified, lower layers may be offset in both time and space with reference to the top (or base) layer.

Step 2. Placing objects in Music Space

The basis of the MML system is the Music Space. Within this space each music object and event is assigned an Identifier (using the SGML and XML ID token attribute) and, if so required, a human-friendly name. The implementation system will automatically assign an ID if the user does not do so. The values of an ID and a Name may be the same. Although this abstract environment is called Music Space, non-music objects may also be placed here, or at least referred to. In this space the most typical objects will be music notes. Other possible generic classes of objects may be actors, dancers, props, steps (as in dance steps), and so on, and each class may take many different members. I will focus on music notes in this discussion.

Step 3. Name objects

Each individual object must be assigned a unique name. Here I will only explain objects within the class notes.

The content of a specific implementation of Music Space would be specified by using an XML Schema. The most typical note names may be the CWN note names, and in my opinion this would be the default if no other system is specified. As this is Schema-driven, all possible music systems can be described. The names of notes within Music Space could be anything, eg. Blip1, Blip2, Blip3, etc, using any human language and any character set. This is in keeping with the design of XML 1.0 element names, which could be written in any Unicode character.

Although Music Space is abstract, and can take any value for name content, I suspect the most commonly used names would be A,B,C,D,E,F and G. As the proposed system is browser-based, the textual interface would be language and character-set specific. This means that a user may set up his browser to use the ISO-10646 or ISO-2022-JP (Japanese) character-set. In HTML 4.0 and XHTML 1.0 it is possible to mix character sets by using the charset attribute. This functionality should be extended to the domain of the music application.

The note names could be listed in a look-up table which may be in the format of a data array, or the Schema List Datatype (in which values are whitespace separated). I suspect a data array would be more efficient. Assigning note names to music space would cover the whole possible range of possible music pitches, but a particular music instance may not utilize all, in fact, it probably never will.

The default setup of note names must most probably range from (A0...G0)...(A9...G9). Many other naming conventions are possible, ranging from Helmholz numbers, to note names in any conceivable language.

Note that the naming of events and objects within Music Space does not need to correlate with the note names of a specific notation system. What I envisage herfe is that the names of events in Music Space may indeed be Blip1, Blip2, etc. But the CWN note names can be mapped onto these abstract names, where Blip1 may map to A4, Blip2 to Bb5, and so on. But the same events in Music Space can also be mapped to the names of a different notation system.

Step 4. Select Tuning Reference Note

For audio rendering via a system such as MIDI, a Tuning Reference Note would be important. This would not be relevant for notation purposes. When required, there should be a mapping between the names in Step 2 and possible base frequencies. Step 4 relates in conceptualization to the SMDL pitchgram (1995: 28).

The purpose of Step 4 is to make possible the quick selection of tuning to either A=440Hz, or A=442Hz, or other possible systems. Once a user makes a selection, the frequencies of other notes depend on the related look-up table, which can specified in detail in Step 5.

Step 5. Select tuning system

A user must have the ability to select between tuning systems such as Equal Temperament and Just Intonation, and any other possible systems.

Although developers should provide look-up tables of standard relationships between Tuning Reference Notes, Dominant Pitches, and Tuning Systems, users will have the ability to change any specific single aspect.

Here is a possible set-up, provided to the user by the system, for the frequency of the set of A notes, where A4 is selected as 440Hz:

A0		27.5Hz
A1		55Hz
A2		110Hz
A3		220Hz
A4		440Hz
A5		880Hz
A6		1'760Hz
A7		3'950Hz
A8		7'900Hz
A9		15'900Hz

Step 6. Select Timing Mechanism

A default Timing Mechanism is selected with the following possible values: none, CWN-duration (for score fragments that serve as examples for pitches and relative note duration, but for which the time signature is absent), MTC (MIDI Time Clock), BPM (Beats Per Minute), SMPTE, etc.

Step 7. Define inherent object and event properties

Especially in performed music, no two notes have the exact values in terms of time, and often frequency (in which case overtones and harmonics have an impact on the base frequency). For notation purposes duration is often averaged out to the convention of the specific notation system. For example, the default value of note duration may be a quarter note. This is a requirement of CWN and not necessarily of other systems. In piano roll notation, for example, the rectangle's horizontal dimension relates to a much more precise duration, which could be precise to the level of milliseconds.

The general frequency (or pitch) and time (or duration) parameters are set up in Step #1 (Music Space Parameters), while the inherent properties of each object and event are set up in Step 7, if required.

If the values of both the x-axis and y-axis are declared as none, objects and events can only be defined in terms of one another. In a manner of speaking, the inherent x-axis and y-axis value of objects and events are specified without reference to the external x-axis and y-axis of Music Space.

But let me take the possible values one at a time.

If inherent time is specified, values such as the following could be declared. For the sake of simplicity, let us consider some possibilities of only two events, Event1 and Event2:

1. <event1 duration=150ms />...

The duration of event1 is 150ms

2.  <event1 begin=ABC end=BXF /> 
    <event2 begin=BXF end=XZZ />

Event2 begins when event1 ends, as indicated by the same values of the two different attributes. In this second example the actual length of Event1 is unknown, and neither is the absolute onset time of Event2. All we know is that when Event1 stops, Event2 begins.

By being able to declare the properties of events in this way, no reference to a timeline is required, and thus this approach makes possible the description of unmeasured or time-less music.

Set-up sets

It should be possible to set up the above value systems for each object on the z-axis. The base object on the z-axis will serve as default, and values may populate automatically as other layers are added. However, a user should be able to change any property of any specific object. For example, the base layer may represent a piano, while the 20th layer (as indicated on the z-axis) may be for a saxophone. A user may specify one tuning system for the piano, and another for the saxophone.

This system would also allow for the synchronization of other classes of objects, such as choreographed dance steps, or introducing non-music events into the score.

SMR - Notation

So where does the notation specific markup come in? In MML a Notation Module is linked to the base or core module of Music Space.

Again the application system can be set up according to user requirements, while the common practice of CWN may serve as default. Let me explain some aspects of how I envisage the system.

Bar

The traditional barline may be set up by default. If a user selects 4/4 as measure, this measurement is mapped as an overlay on top of the x-axis of Music Space. A bar is not an inherent characteristic of Music Space, but a notation convention. A user will be able to insert bars anywhere in a score. This may result in a score which does not follow the conventions of CWN. A user may have the option to enforce this weird set-up, or, as envisaged, he may push a button that will automatically translate his incorrect notation into a more conventional one.

By distinguishing the core aspects of music from notational ones, Music Space allows for bar-less or unmeasured music. Any notational system can be overlayed over Music Space.

Music Notation

Music Space does not contain notation-specific markup. Such markup is done within a notation module. The markup of the notation module is linked to the markup of Music Space when so required. The markup of the notation module can be expressed using a wide variety of graphic symbols, or glyphs.

Several systems for the graphical representation of CWN music symbols are available, such as WEDELMUSIC's Table of Symbols, Berthelemy and Bellini's Music Notation Examples for Validation (2003), Unicode 3.1 Range 1D100-1D1FF, and others. A rendering machine needs to interpret the music markup and translate it into a graphic representation. One method to achieve this is to use one of these systems as a look-up table and map specific markup to matching glyphs.

Another method would be to describe glyphs more abstractly in a fashion similar to typographical Panose, which is a classification system for Latin typeface, and which describes fonts in terms of 10 categorizing key features. The font descriptors of CSS 2.0 make use of this system, and in addition to this, web fonts can be designed by using a grid system within which a glyph's main characteristics are described. In my view this approach would be much more economical.

To implement this approach, the typological symbols first need to be categorized in terms of their functions. The work of Berthelemy and Bellini may serve as a basis for this, but I view this much more abstract and functionally.

And yet another method would be to use SVG. For the rendering of music notation this would be pretty musch straightforward. But for creating SMR into SVG would imply some advanced editor.

Grouping function in notation

In CWN there are several graphic symbols for grouping, and their functions are very similar. Some of these symbols are the following, and my definitions for them are not formal at all.

bar: for grouping sequences of notes into temporal-based groups
tie: for grouping notes together artificially in order to maintain a logical sum of note duration values within a bar
beam: to group notes visually together in sub-groups within a bar
tuplet: to group visually together with a particular rhythm
slur: to group visually together that form a kind of unit

Abstractly these symbols all serve the function of grouping notes together within different contexts. There are conventions that determine how the grouping is done in CWN, but other visual methods of grouping are also possible. One example that comes to mind is the use of color: to use different colors for these functions, although the results may not be pleasing if different functions overlap and result in the mixing of colors.

From the perspective of the SMR's logical notation domain, a possible markup fragment may look like this:

<group>
  <bar id=1>A B C 
   <notation>
      <bind begin="1" end="2" />
      <slur beat="1" begin="1" /> 
   </notation>
  </bar>
</group>

But the actual graphic rendering of a bind or a slur may depend on the preferences of a user. Here are some possible variations in the rendering, and in my view this should be handled by a music style sheet.

Bind

Slur

Synchronizing Music Space with other systems

MHEG-5, HyTime, SMDL, SMIL, etc. all have a kind of timeline which may serve as tool for synchronizing with the x-axis of Music Space. Mapping may not always be successful, as Music Space allows a much more open-ended horizontal reference system. Recall that the x-axis can be specified in terms of various time systems, or space systems. And it could even be specified without any reference to a timeline when properties are defined internally with reference to neighboring objects and events.

Pressing needs

The system described here depends on many conventional data sets, ranging from tuning systems, to scales, to note names, to notation symbols and so on. What needs to be done is to collect all these posibilities, classify them (in an abstract manner as I have shown above with Grouping) and assign metadata to them. After all, given a specific tuning system, and A is tuned to say, 442Hz, the data for this look-up table needs to be composed only once. Perhaps SMR may look into collecting all these possibilities.

Conclusion

In this presentation I introduced Music Space as an abstract coordinate system which serves as a referencing mechanism for the XML-based markup of music (and non-music) objects and events within a multimedia document. Within Music Space the core of music is described. This space can be assigned many different systems of measurement and labeling, and serves as the basis for the universal aspects of music.

More specific cultural and other conventions are mapped onto the Music Space. For instance, a notational module can be mapped onto the Music Space, without altering the basic characteristics of music objects and events. When a notation module is attached, the music objects and events can be translated into many different notation systems. Any system of symbolic music representation can be hooked onto the base of Music Space.

In addition to be rendered graphically as music notation, the music objects and events within Music Space can also be mapped to instructional systems such as MIDI, for audio rendering. Again, adding this module will not alter the basic music objects and events.

There are several W3C Recommendations that may be able to make all the above possible. We have just obtained some funding to build an implementation of this, but the goals seems to be quite achievable.

In my view only an approach as proposed here -- where an abstract system, such as Music Space, is marked first, with all the systems of variations are marked as additional modules and the generic data contained in look-up tables -- will be able to achieve the very broad and universal goal of the MPEG SMR work group.

References

Standards and Recommendations

Content Selection for Device Independence (DISelect) 1.0
W3C Working Draft 2004
CSS 1 W3C Recommendation 1996, revised 1999
CSS 2 W3C Recommendation 1998
CSS 3 W3C Recommendation several Candidate Recommendations in 2003 and 2004
DSSSL ISO/IEC 10179:1995
HyTime ISO/IEC 10744:1992
MHEG-5 ISO/IEC 13522-5
MHEG-6 13522-6
SMDL ISO/IEC DIS 10743 (1995)
SMIL 1 W3C Recommendation 1998
SMIL 2 W3C Recommendation 2001
SVG W3C Proposed Recommendation 2001
Unicode 3.1 Musical Symbols: Range 1D100-1D1FF 2001
Xforms 1.0 W3C Recommendation 2003
XML 1.0 W3C Recommendation 1998

Berthelemy J, Bellini P 2003
Music Notation Examples for Validation
MusicNetwork

Boll S, Klas W, Westermann U 1999
A Comparison of Multimedia Document models Concerning Advanced Requirements
Technical Report - Ulmer Informatik-Berichte No 99-01, Department of Computer Science, University of Ulm, Germany

Boll S, Klas W, Westermann U 2000
Multimedia Document Models
Multimedia Tools and Applications, Vol 11 No 3

Goldfarb C, Newcomb SR, Kimber WE, Newcomb PJ 1997
A Reader's Guide to the HyTime Standard

Ingram J The Notation of Time 1985, 1999

Ingram J Developing Traditions of Music Notation and Performance on the Web 2002

Ingram J Music Notation 2002

Steyn J 1999 2004 Music Markup Language http://www.musicmarkup.info/