吳實錄

Annals of Wu

漢藏緬語々言研究ㄟ博客
a sinotibetoburman linguistics blog
2013-06-15

Update Coming For Shanghainese Phonetic Corpus tools - corpus

In a couple of weeks I'll have a huge update to the phonetic corpus finished. Previously I put together a rough tool covering a few thousand characters, based on the Guanyun tables, but obviously that approach takes a big hit in accuracy.

The latest update will cover over 8,400 characters, plus a pretty large set of mono- and multi-syllabic words, over 30,000 in all.

In addition to the IPA data, the new set also includes a uniform romanisation and tentative definitions pulled from a number of open-source dictionaries and open forums that cover this sort of material.

As part of this update, I'll also be refreshing the version of the data used on Tatoeba and similar sites.

If you'd like to take the romanisation for a test run, you can do so at this page:
http://phonemica.net/fawu

Note that it currently only supports traditional characters.

Very busy week coming up, but after June 20 I'll have a lot more free time, and I'll be trying to update here regularly.

    About

    A semi-academic linguistics blog about Sinotibetan, previously focused primarily on Wú, a Sinitic language spoken in the Yangtze Delta region. Topics now include historical linguistics, documentation, language rights, sociolinguistics and learning materials. It also acts as the dev blog for Phonemica from time to time.

    I'm a linguist based in Asia, working on documentation and historical development of Sinotibetan. In addition to academic research, I'm heavily involved in Phonemica, an organisation that promotes crowd-sourced preservation of local languages.

    I'm currently in the field, so getting in touch isn't easy. However, you can try to email me at the following address and I'll respond as soon as I'm able:

    yhilan.ko@gmail.com
    © 2009-2017