Archived Posts from this Category
I am helping to start a regular face-to-face event series which will bring together the people in the Vancouver area who work in technology globalization, internationalization, localization, and translation (GILT) for networking and learning. This post is the second in a series where I put into words my percolating thoughts about this group. See also, A Technology Globalization meetup for the Vancouver Area: (1) What, Who (Oct 31, 2014).
Happily, this group has already started. We held our first meeting on Monday, Dec 8, 2014. Our placeholder Twitter feed is @imlig1604; follow that and you’ll stay connected when we pick our final name. And we have a group on LinkedIn for sharing ideas. The link isn’t very memorable, but go to LinkedIn Groups and search for “Vancouver localization”; you will find us. (We don’t yet have an account on the Meetup.com service.) If you are in the Lower Mainland and are interested, I would welcome your participation.
Continuing with my reflections about this group, here are thoughts on why this group should exist, and what it might be named.
Serious or “classical” music has brought me great joy throughout my life. I have sung in choruses since childhood, and in operas for twenty years. I’m not a skilled musician. But being a participant makes the beauty and value of our shared musical heritage vividly alive. The efforts of musicians world-wide, amateur and pro, great and small, are what lets us pass the heritage on to future generations.
The information age is transforming our lives, sector by sector: business, science, entertainment, communication. We have SMS and emails to help us communicate. We have spell-checkers and auto-correct to help us write. We have web terminals in our pockets that let us read the best of the old books and the freshest of the newest microblogs. We have a huge range of recordings and videos for playback on demand.
Yet in all of this, the practice of music is in some ways stuck in the 1500s, or at best the 19th century. When we start to sing, we pull out printed paper booklets more often than we pull out tablet screens. Rehearsals are bogged down because different people have different editions of the same musical work, with different page numbers. Wrong notes, or missing accidentals, in 50-year-old scores are uncorrected. Music directors lose rehearsal time to dictating cuts, assigning this line to the tenor 1s and that to the tenor 2s, telling us where to breathe and what bowing to use. And for the grand “Messiah” sing-along, a chorus must haul out hundreds of excess copies of chorus scores, distribute them to the audience, and then, hardest of all, collect them all back at the end.
The information age has provided us tools to solve these problems much more simply, for text and photos at least. We have word-processor files and photo-editors, which let us make corrections. We take for granted being able to re-typeset the modified text into a beautifully laid-out document, with our choice of typefaces. We can cast the documents into PDF files, and send them to their destinations. If there are errors, or tweaks specific to our project, it’s no problem to make a quick modification and redo the layout. If we want everyone in the room to read something, we can have them load it on a web page using their mobile device.
It is time that we do the same thing with music. It is time that it become routine for music scores to be handled in a revisable, reusable, high-quality digital form. Let’s call them “digiscores”. We should be able to make minor corrections. We should have the music equivalent of ebook readers at our disposal. We should be able to distribute scores electronically as conveniently as we distribute ebooks or emails.
Many of the great works of serious music date from the 19th century or earlier. They have long since entered the public domain. They are our shared heritage, part of our cultural soup. They should be freely available to everyone to mash-up and create with. But the notes of Verdi and Mozart are trapped in printed form, in books that are hard to obtain, or expensive due to the high overhead of low sales volumes. Publishers layer a new libretto translation on top of the public domain notes, and put a “do not photocopy” on the combination. A secondary school music teacher cannot pull Mozart from the cultural soup to use for the choir, because the packaging is obstructed by unnecessary copyright.
What we need are the public domain music scores, in revisable, reusable, high quality “digiscore” form, available as public domain digital files. In this form, they can be hosted cheaply, distributed for free, and used by everyone from the top symphonies, to the school music teachers, to the music-lovers exploring on their own.
Many talented people are innovating in this space. Many pieces are available. The Internet Music Score Library Project (IMSLP), aka the Petrucci Music Library, is making scanned images of public domain music scores freely available by the hundreds of thousands — but they are not revisable “digiscores”. There is music recognition computer software like Audiveris, SmartScan, and many others — but their output needs proofreading and correcting by humans before it is a usable “digiscore”. Project Gutenberg has proved the model of providing revisable digital versions of public domain works — but for texts, not music. The Project Gutenberg Distributed Proofreading project has a powerful structure for turning computer-generated drafts into final form — but they too have more traction for texts than for music. The Musopen project is commissioning quality recordings of a few of these works — but a recording of someone else’s performance is not what a chorus needs to make its own performance. MusicXML provides a promising foundation for a digiscore format — but a format is not a corpus. Musescore, Lilypond, Sibelius, Finale, and other tools put music entry and notation in the hands of a wider and wider audience — but we need a wider and wider group to use those tools. The Internet Archive is willing and able to host and distribute freely-available content — but someone has to provide the content.
There is a need for initiatives to harness the good will of music lovers, to equip them with tools and social structures, and help them turn public-domain music scores (and scans of scores) into public-domain digiscores, for free public use and re-use. I seek to contribute my energy to forming one such initiative. I will communicate more in the future. For now, this is my direction and my purpose.
If this vision excites you, please let me know in the comments below. (Later, there will be an announcement email list to join, and a web site at which to register, and so on.) There is a lot of work to do, and with many volunteers in an effective social structure, great results are possible. Wikipedia has shown us that. I would love to have your help.
Think of the application programming interface (API) for an application environment: an operating system, a markup language, a language’s standard library. What internationalisation (i18n) functionality would you expect to see in such an API? There are some obvious candidates: a text string substitution-from-resources capability like gettext(). A mechanism for formatting dates, numbers, and currencies in culturally appropriate ways. Data formats for text that can handle text in a variety of languages. Some way to determine what cultural conventions and language the user prefers. There is clearly a whole list one could make.
Wouldn’t it be interesting, and useful, to have such a list? Probably many organisations have made such lists in the past. Who has made such a list? Are they willing to share it with the internationalisation and localisation community? Is there value in developing a “good practices” statement with such a list? And, most importantly, who would like to read such a list? How would it help them? In what way would such a list add value?
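As a rough illustration of the candidate functionality listed above, here is a minimal Python sketch of three of the items: string substitution from resources, culturally appropriate number and date formatting, and choosing a locale from the user's preferences. Everything in it is an assumption made for illustration: the `MESSAGES` and `CONVENTIONS` tables, the function names, and the two sample locales are hypothetical, and a real API would load such data from resource catalogues rather than hard-code it.

```python
from datetime import date

# Hypothetical message catalogues; a real API would load these from
# resource files (as gettext() does) instead of hard-coding them.
MESSAGES = {
    "en-US": {"greeting": "Hello, {name}!"},
    "fr-FR": {"greeting": "Bonjour, {name} !"},
}

# Illustrative conventions only: thousands separator, decimal separator,
# and a short date pattern for each supported locale.
CONVENTIONS = {
    "en-US": (",", ".", "{m:02d}/{d:02d}/{y}"),
    "fr-FR": ("\u202f", ",", "{d:02d}/{m:02d}/{y}"),
}

DEFAULT_LOCALE = "en-US"


def pick_locale(preferred):
    """Return the best supported locale for an ordered list of preferences."""
    for tag in preferred:
        if tag in MESSAGES:
            return tag
        # Otherwise accept any supported locale with the same language subtag.
        language = tag.split("-")[0]
        for supported in MESSAGES:
            if supported.split("-")[0] == language:
                return supported
    return DEFAULT_LOCALE


def translate(key, locale_tag, **values):
    """Look up a message by key and substitute the named values into it."""
    catalogue = MESSAGES[locale_tag]
    if key not in catalogue:
        # Fall back to the default catalogue when a translation is missing.
        catalogue = MESSAGES[DEFAULT_LOCALE]
    return catalogue[key].format(**values)


def format_number(value, locale_tag):
    """Format a number to two decimals using the locale's separators."""
    thousands, decimal, _ = CONVENTIONS[locale_tag]
    us_style = f"{value:,.2f}"
    # str.translate swaps both separators in a single pass.
    return us_style.translate({ord(","): thousands, ord("."): decimal})


def format_date(day, locale_tag):
    """Format a date using the locale's short date pattern."""
    pattern = CONVENTIONS[locale_tag][2]
    return pattern.format(d=day.day, m=day.month, y=day.year)


if __name__ == "__main__":
    chosen = pick_locale(["fr-CA", "en-US"])
    print(chosen)
    print(translate("greeting", chosen, name="Ana"))
    print(format_number(1234567.891, chosen))
    print(format_date(date(2013, 2, 23), chosen))
```

The sketch keeps each capability in its own small function, which mirrors how such a list of API features might be itemised: one entry per function.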
OpenDataDay 2013 was celebrated last Saturday, February 23rd 2013, at over 100 hackathons and work days in 38 countries around the world. The City of Vancouver hosted a hackathon at Vancouver City Hall, and I joined in. My project was a language census of Vancouver’s open data datasets. Here’s what I set out to do.
Vancouver’s Open Data Day was a full house of some 80 grassroots activists, with attendance throughout the day by city staff, including Linda, the caretaker of the Vancouver Open Data portal and the voice of @VanOpenData on Twitter. I missed the “Speed Data-ing” session in the morning, where participants could circulate among city providers of datasets to talk directly about what was available and what each side wanted. I’m told that national minister the Honourable Tony Clement was also there (who now is responsible for the Government of Canada’s Open Data portal data.gc.ca, but who also in 2010 helped turn off the spigot of open data at its source by killing the long form census). I saw Councilmember Andrea Reimer there for the afternoon working session, and listening to the day-end wrap-ups, tweeting summaries of each project. I won’t try to describe all the projects. Take a look at the Vancouver Open Data Day 2013 wiki page, or the tweets tagged #vodhd13 (for Vancouver), and tagged #OpenData (worldwide).
I gave myself two goals for the hackathon. First, provide expertise and increased visibility for internationalisation and multi-lingual issues among the participants. Second, work on a modest project which would move internationalisation of local data forward.
My vision is that apps based on Vancouver open data should be localised into all the languages in which Vancouver residents want them. Over 30% of the people in the Vancouver region speak a language other than English at home, says Stats Canada. That is over 700,000 people of the 2.9m people in the area. Now of course localising those apps and web sites is a task for the developer. My discipline, internationalisation (i18n), is a set of design and implementation techniques to make it cheaper and easier to localise an app or web site. At some point, an app or web site presents data sourced from an open data dataset. In order for the complete user experience to be localised, the dataset also needs to be localised. A challenge of enabling localisation of open data-sourced apps is to set up formats, social structures, and incentive structures which make it easier for datasets to get localised into the languages which matter to the end users.
To that end, I picked a modest project for the day. It was to make a language census of the city of Vancouver’s Open Data datasets. The link is to a project page I started on the Open Data Day wiki. I intended it to be a simple table describing the Vancouver datasets, but it ended up with a good deal of explanation in the front matter. I won’t repeat all that, but just give a couple of examples.
The 3-1-1 Contact Centre Interactions dataset (CSV format) has rows like (I’ve simplified):
Category1     , Category2     , Category3          , Mode    , 2012-11, 2012-12, 2013-1
CSG - Licenses, Animal Control, Dead Animals Pickup, Voice In, 22     , 13     , 13
While the Animal Control Inventory Deceased Animals dataset (CSV format) has rows like (again, simplified):
ID  , Date      , CatOther   , Description              , Sex, ACO            , Bag
7126, 2013-02-23, SDC        , Tan/black medium hair cat,    , Duty driver- JT, 13-00033
7127, 2013-02-23, Dead Budgie,                          ,    , Duty driver-JT , 13-00034
7128, 2013-02-26, Cat        , Black and White          , F  ,                , 13-00035
Note that most of the fields are simply data: dates, numbers, codes. These do not need to be localised. Some of the fields, like the Category fields in the 311 Interactions, are English-language phrases. But they are pulled from a controlled vocabulary, and so could be translated once into the target language, and would not usually need to be updated when new data is released. In contrast, a few fields in the Animal Control Inventory dataset, e.g. CatOther, Description, and ACO, seem to contain free text in English. Potentially, every new record in the dataset represents a new translation task.
The purpose of the language census is to go through the datasets in the Vancouver Open Data catalogue, and the fields for each dataset, and simply identify which fields are data, which are controlled vocabulary, and which are free text. It’s not a major exercise. It doesn’t involve programming. Yet I believe it’s an important building block towards the vision of localised apps driven by open data.
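The census itself is done by hand, but a small sketch may show how its result could be used to estimate translation effort. The Python below is only an illustration under stated assumptions: the `FIELD_KINDS` labels are my reading of the Animal Control sample (treating `Sex` as a code is an assumption), and `translation_units` is a hypothetical helper, not part of any existing tool. It counts each distinct value once for a controlled-vocabulary field, every non-empty value for a free-text field, and nothing for plain data.

```python
import csv
import io

# Rows adapted from the simplified Animal Control sample, whitespace tidied.
SAMPLE = """ID,Date,CatOther,Description,Sex,ACO,Bag
7126,2013-02-23,SDC,Tan/black medium hair cat,,Duty driver- JT,13-00033
7127,2013-02-23,Dead Budgie,,,Duty driver-JT,13-00034
7128,2013-02-26,Cat,Black and White,F,,13-00035
"""

# Census labels assigned by hand. Treating "Sex" as a code is an assumption.
FIELD_KINDS = {
    "ID": "data",
    "Date": "data",
    "CatOther": "free_text",
    "Description": "free_text",
    "Sex": "data",
    "ACO": "free_text",
    "Bag": "data",
}


def translation_units(rows, field_kinds):
    """Estimate how many strings need translating for each field."""
    units = {}
    for field, kind in field_kinds.items():
        values = [row[field].strip() for row in rows if row[field].strip()]
        if kind == "controlled_vocabulary":
            # Each distinct term is translated once and then reused.
            units[field] = len(set(values))
        elif kind == "free_text":
            # Every non-empty value is potentially a new translation task.
            units[field] = len(values)
        else:
            # Dates, numbers, and codes need no translation.
            units[field] = 0
    return units


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    for field, count in translation_units(rows, FIELD_KINDS).items():
        print(f"{field}: {count}")
```

For a dataset that grows over time, the free-text counts grow with every new record, while controlled-vocabulary counts grow only when a new term appears.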
Incidentally, this exercise inspired me to propose another dataset for the Vancouver catalogue: a dataset listing the datasets. There are 130 datasets in the Vancouver Open Data catalogue, and more are on the way. The only listing of them is an HTML page intended for human consumption. It would be nice to have a machine-readable table in CSV or XML format, describing the names and URLs and formats of the datasets in some structured way.
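To make the idea of a dataset of datasets concrete, here is a minimal Python sketch of such a listing in CSV form. The column names (`name`, `url`, `formats`) are my own assumption about what a structured listing might contain, and the URLs are placeholders on example.org, not real addresses in the Vancouver catalogue.

```python
import csv
import io

# Hypothetical catalogue entries; the URLs are placeholders, not real links.
CATALOGUE = [
    {
        "name": "3-1-1 Contact Centre Interactions",
        "url": "https://example.org/datasets/311-interactions.csv",
        "formats": "CSV",
    },
    {
        "name": "Animal Control Inventory Deceased Animals",
        "url": "https://example.org/datasets/deceased-animals.csv",
        "formats": "CSV",
    },
]

FIELDNAMES = ["name", "url", "formats"]


def write_catalogue(entries):
    """Serialise catalogue entries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def read_catalogue(text):
    """Parse CSV text back into a list of catalogue entries."""
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    text = write_catalogue(CATALOGUE)
    print(text)
    for entry in read_catalogue(text):
        print(entry["name"], "->", entry["url"])
```

With a listing like this, a script could walk the whole catalogue and fetch each dataset, which is what a census across 130 datasets would otherwise require doing by hand.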
I’m happy to report success at my first goal, also. Several participants stopped by to talk with me about language support and internationalisation. I’m hopeful that it will help the non-English localisation of the apps, and city datasets, happen a little bit sooner.
If you would like to help in the language census, the project page is a wiki, and you are welcome to make constructive edits. See you there! Or, add a comment below.
With the year coming to an end, it is the season of making donations to organisations doing good in the world. In both Canada and the USA, this is motivated by a tax deadline; donations to certain charities by December 31 can be tax deductions for that year. It’s an opportunity to lay out here a concept that I helped draft a decade ago: the “Social Justice Tithe”.
The Social Justice Tithe means giving at least 10% of your income to some combination of charities, religious groups, and political groups that enact your values.
Another stimulating Internationalisation and Unicode Conference (IUC36) just finished up last week (October 22-24, 2012). As usual it was rich with interesting people, stimulating subjects, and inspiration. My tutorial, Building multilingual websites in Drupal 7 and Joomla! 2.5, was well-attended and seemed to go well. My final paper and slides are posted at the preceding link.
I never met Derek Miller. I take that back. I may well have met him, say at the Northern Voice conference, the annual gathering of the B.C. blogging and social media scene. I almost certainly heard him play drums; I’m told his band, The Neurotics, played at the start line of the Vancouver Sun Run, our annual 50,000 person 10k stampede. Certainly we had a lot of friends in common. But I became aware of Derek Miller through one of his intriguing ideas. I then grew to admire his bravery, his unsentimental clarity, his humour, his compassion, as he compellingly narrated his own journey towards death. And as the community, in which he made waves and I bob in the ripples, mourned him, it became clear how many people loved and admired him.
I first came across Derek when researching what people were learning about digital legacies: what happens to one’s online persona and works when one dies. Derek apparently coined the term “digital executor”, the person who has the responsibility to take over all one’s blogs and accounts and presence on the net on one’s death. I think it is a brilliant term.
I’m an amateur opera and symphonic chorus singer. Most of the classical music and opera I perform is old. Not just pre-iPhone old, but usually well over a hundred years old. These works have outlived even the outrageously long copyright terms imposed on our culture by greedy commercial interests. They are clearly in the public domain; they have returned to the shared culture from which they grew.
But when I want to learn a new work, like Verdi’s opera Macbeth or Mozart’s Requiem, why do I find myself paying $24-$40 for a music score which probably cost $5 to print? Why does the book contain stern warnings not to photocopy the contents, even if it is little more than a facsimile of a previous edition, which itself is in the public domain? It is because these music score products still cling to a pre-internet business model, based on selling “molecules” (the physical artifact of the book) for a price based on the value of the “bits” (the information or arrangement of notes we call the musical composition, plus the value of the editing, plus the value of the typesetting), and the costs of distributing and warehousing those molecules.
This shouldn’t be. The music itself — the bits, the abstract genius which is Beethoven’s or Mahler’s, not the later editorial changes, or the molecules on which the bits are printed — is in the public domain, so its cost is zero. Volunteers are willing to scan or transcribe old musical scores for free. So a digital file with a score ought to be accessible for the marginal cost of storage, duplication and delivery. And in an era of cheap disks and high-speed internet, that marginal cost is zero.
Many classical music and opera scores are indeed available, free for the downloading. Below are links to some useful sites for the classical or opera musician to find them. But there’s more. In the digital world, scores should get better, too: more correct, easier to use, more customised. If a fraction of every chorus and orchestra pitched in to ratchet forward the quality of the free scores for music they perform, we could make a huge difference.
During July-Sept 2009, the Government of Canada held public copyright consultations, with an eye to writing new copyright law. They asked for submissions addressing five topics. Here’s one of my submissions, on “Competition and Investment”. It’s hard to tell what will become of these consultations. My submission did eventually show up on the official submissions page, but I still want to publish it for the record on my own blog. I have two more submissions, “Copyright and you (me)” and “Copyright and the test of time”, which I published in recent weeks.
Q: What sorts of copyright changes do you believe would best foster competition and investment in Canada?
A: Three changes:
During July-Sept 2009, the Government of Canada held public copyright consultations, with an eye to writing new copyright law. They asked for submissions addressing five topics. Here’s one of my submissions, on the “test of time”. It’s hard to tell what will become of these consultations, because the government may fall (again) before Parliament gets a chance to pass a new bill. My submission did eventually show up on the official submissions page, but I still want to publish it for the record on my own blog. I have two more submissions, one on “Copyright and you (me)” which I published last month, and one which I’ll dribble out in the coming days.
Q: Based on Canadian values and interests, how should copyright changes be made in order to withstand the test of time?
A: The largest single dynamic is the change in delivery of cultural works from physical containers (paper books, CD disks, celluloid film) to digital information (ebooks, music files, computer networks).
Physical containers are: