Open Data Day was celebrated here on Saturday, 4 March 2017. The Open Data Society of B.C. sponsored a buzzing, successful hackathon, with participants from several communities in the Lower Mainland, Vancouver Island, and even Washington State.
I plunged back into my continuing project for Vancouver Open Data Day, to make a language census of Vancouver’s Open Data Catalogue. You can check out our Team Meta #VODay hackathon report as submitted via GitHub. I’ve summarised it below. I was very pleased to be awarded the City of Vancouver Focus Challenge Prize for the work we accomplished that day.
I had worked on this language census in past years. See for instance my 2014 blog post, Open Data Day 2014, and a dataset dataset for Vancouver. Each team took on a name, so I took on “Team Meta”. The language census is “metadata” — data about data. And “Meta” is an anagram of “Team”.
Prototype Problem Statement
My vision is that apps based on Vancouver open data should be localised into all the languages in which Vancouver residents want them.
Over 30% of the people in the Vancouver region speak a language other than English at home, says Stats Canada. That is over 700,000 people of the 2.3m people in the area. Now of course localising those apps and web sites is a task for the developer. My discipline, internationalisation (i18n), is a set of design and implementation techniques to make it cheaper and easier to localise an app or web site. At some point, an app or web site presents data sourced from an open data dataset.
In order for the complete user experience to be localised, the dataset also needs to be localised. A challenge of enabling localisation of open data-sourced apps is to set up formats, social structures, and incentive structures which make it easier for datasets to get localised into the languages which matter to the end users.
To that end, I picked a modest project for the day. It was to make a language census of the City of Vancouver’s Open Data datasets. The link is to a project page I started on the Open Data Day wiki.
However, the Vancouver Open Datasets catalogue is published as an HTML table. It is difficult to copy this table and process it with software. So, the first problem to solve was converting this catalogue into a standard, machine-readable format.
- Generate a version of the Vancouver Open Data catalogue as a dataset.
- Update the dataset list in the Vancouver Open Data language census with the most recent datasets.
- Incremental progress on the language census.
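The first goal, turning a catalogue published as an HTML table into a dataset, can be sketched with the Python standard library alone. This is only an illustration of the idea, not the script Team Meta used on the day. The sample HTML, its column names, and the helper names `TableRows` and `table_to_records` are all made up for the example, and the real catalogue page may be laid out differently.

```python
import json
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every cell in every <tr> of an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_records(html):
    """Treat the first row as column names; return one dict per later row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical stand-in for the catalogue page, not real catalogue content.
SAMPLE_HTML = """
<table>
  <tr><th>Dataset</th><th>Formats</th></tr>
  <tr><td>Example dataset A</td><td>CSV, KML</td></tr>
  <tr><td>Example dataset B</td><td>CSV</td></tr>
</table>
"""

records = table_to_records(SAMPLE_HTML)
print(json.dumps(records, indent=2))
```

Once the rows are plain dictionaries, they can be written out as JSON or CSV and fed to other tools, which is what a copy-and-paste from the HTML page does not allow.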
What Open Data Sets Will You Use?
Our work is captured in two wiki pages.
- http://wiki.opendataday.org/Vancouver_Open_Data_language_census lists all the datasets in the Vancouver Open Data catalogue, and notes what kind of language-specific data would need to be translated.
- http://wiki.opendataday.org/Vancouver_Open_Data_Catalogue_dataset is a JSON object listing the datasets in the Vancouver Open Data Catalogue. It is based on the Project Open Data Metadata Schema.
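To give a feel for what such a JSON object looks like, here is a rough sketch that wraps a list of dataset records in a catalogue structure. It assumes version 1.1 of the Project Open Data Metadata Schema and uses only a handful of its fields; the wiki page may use a different version or a different selection of fields. The function name `make_catalog`, the sample record, and its identifier are hypothetical.

```python
import json


def make_catalog(records):
    """Wrap simple dataset records in a Project Open Data style catalogue.

    Assumes schema version 1.1 and fills in only a small subset of fields.
    """
    datasets = []
    for rec in records:
        entry = {
            "@type": "dcat:Dataset",
            "title": rec["title"],
            "description": rec["description"],
            "identifier": rec["identifier"],
            "accessLevel": "public",
            "publisher": {
                "@type": "org:Organization",
                "name": "City of Vancouver",
            },
        }
        # "language" is a list of language tags such as "en".
        # Leave it out when the language has not been surveyed yet.
        if "language" in rec:
            entry["language"] = list(rec["language"])
        datasets.append(entry)
    return {
        "@type": "dcat:Catalog",
        "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
        "dataset": datasets,
    }


# Hypothetical record, for illustration only.
sample_records = [
    {
        "title": "Example dataset A",
        "description": "Placeholder description of an example dataset.",
        "identifier": "example-dataset-a",
        "language": ["en"],
    },
]

catalog = make_catalog(sample_records)
print(json.dumps(catalog, indent=2))
```

The `language` field is the natural place to record language census results per dataset, though whether the census should live inside the catalogue or alongside it is a design question this sketch does not settle.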
Next Steps for next year
It was satisfying to keep making forward progress on this language census, one day a year. But for next year, I think, I would like to see an actual service delivered in a language other than English. Perhaps I’ll be able to make progress on this between now and Open Data Day 2018.
- Complete the Vancouver Open Data language census. This can happen incrementally. It will need to be updated as Vancouver changes its open data catalogue.
- Define requirements for an open-data-driven service delivered in a language other than English. This will require market research. For example, it would be helpful to ask groups that provide services to Lower Mainland residents in languages other than English which smartphone app or website service would really help them if it were available in a particular language other than English.
- Implement this service. That might be the right project for Open Data Day 2018 itself.
- Translate the datasets needed to drive this service. That could happen during Open Data Day, if the right translators attend, or outside that event.