Challenges

Fix broken links in XWiki CMS

The site odo.lv uses the XWiki content management system. The site contains many broken links, which can be fixed automatically by the LinkFixer application. However, not all links are fixed properly (or fixed at all). The biggest known issue is that when a link cannot be fixed and is removed, extra space is left in the fixed content, which may break the markup syntax of XWiki documents. The task is to review the existing solution and improve it, mostly by fixing the regular-expression rules for text changes. Participants in this project should know the general design principles of Java web applications, relational databases and regular expressions.
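
The extra-space issue described above can be illustrated with a minimal sketch. It is not the actual LinkFixer code: the class and method names are hypothetical, and it assumes links are written in XWiki 2.x syntax, that is [[label>>target]] or [[target]], without link parameters. A labelled link is replaced by its label, and an unlabelled link is removed together with one adjacent run of spaces so that no double space remains.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of LinkFixer.
// Assumption: XWiki 2.x link syntax, [[label>>target]] or [[target]].
class BrokenLinkCleaner {

    static String removeLink(String content, String target) {
        // Quote the target so that characters such as '.' or '?' are matched literally.
        String q = Pattern.quote(target);
        // 1. Labelled link: keep only the label text.
        String result = content.replaceAll("\\[\\[([^\\]>]*)>>" + q + "\\]\\]", "$1");
        // 2. Unlabelled link after spaces or tabs: drop the link and the whitespace before it.
        result = result.replaceAll("[ \\t]+\\[\\[" + q + "\\]\\]", "");
        // 3. Unlabelled link at the start of a line: drop the link and the whitespace after it.
        result = result.replaceAll("\\[\\[" + q + "\\]\\][ \\t]*", "");
        return result;
    }

    public static void main(String[] args) {
        String broken = "http://x.invalid/a";
        // Prints "See for details."
        System.out.println(removeLink("See [[" + broken + "]] for details.", broken));
    }
}
```

Removing the whitespace on only one side of the link is a deliberate choice here: it keeps exactly one separator between the neighbouring words. Real rules would also need to handle links inside lists and tables, which this sketch does not attempt.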

Development of the Diphone Java editor

Diphone Studio is a closed-source *.wav sound file editor running on Windows that can prepare data for the MBROLATOR tool, which prepares the database for the MBROLA text-to-speech synthesizer. The task is to study the functionality of the Diphone Studio tool (the tool will be delivered to the student by special agreement) and to implement it as an open-source tool in the Java programming language, so that it can run on several operating systems.

Att01.png

Additional information:

Multi-lingual text tokenization solution

In speech synthesis, one important step is tokenization, which classifies characters according to their meaning and converts them to words. For example, "." (full stop) may mark the end of a sentence, but it may also be a decimal delimiter or mark an ordinal number (in some languages). The job is to create a Java application that reads text from standard input and writes tokenized text to standard output according to tokenization rules stored in a text configuration file. For example:

In 01.01.2020 avg. download speed in Latvia was 34.23Mb/s

is converted to

In zero one point zero one point two thousand and twenty average download speed in
latvia was thirty four point twenty three megabits per second.

The configuration file for tokenization rules should be compatible with the rules of the eSpeak NG engine (but may extend them).
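
The full stop ambiguity described above can be sketched as follows. This is only an illustration under stated assumptions: the class name, the enum values and the hard-coded abbreviation list are hypothetical, and the rules are not in eSpeak NG format. In the real application they would be loaded from the configuration file. The sketch classifies every "." as a point inside a number, the end of a known abbreviation, or the end of a sentence; ordinal numbers are not handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; rule names and the abbreviation list are assumptions.
class FullStopClassifier {

    enum DotKind { NUMBER_POINT, ABBREVIATION, SENTENCE_END }

    // Assumption: a real tool would read these from the rules file.
    static final Set<String> ABBREVIATIONS = Set.of("avg", "etc");

    // Returns one classification per '.' character, in order of appearance.
    static List<DotKind> classify(String text) {
        List<DotKind> kinds = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != '.') {
                continue;
            }
            boolean digitBefore = i > 0 && Character.isDigit(text.charAt(i - 1));
            boolean digitAfter = i + 1 < text.length() && Character.isDigit(text.charAt(i + 1));
            if (digitBefore && digitAfter) {
                kinds.add(DotKind.NUMBER_POINT);
            } else if (ABBREVIATIONS.contains(wordBefore(text, i).toLowerCase(Locale.ROOT))) {
                kinds.add(DotKind.ABBREVIATION);
            } else {
                kinds.add(DotKind.SENTENCE_END);
            }
        }
        return kinds;
    }

    // Returns the run of letters that ends directly before the given index.
    static String wordBefore(String text, int dotIndex) {
        int start = dotIndex;
        while (start > 0 && Character.isLetter(text.charAt(start - 1))) {
            start--;
        }
        return text.substring(start, dotIndex);
    }

    // Reads lines from standard input and prints the classifications of each line.
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(classify(line));
        }
    }
}
```

For the example sentence above, the dots in 01.01.2020 and 34.23 would be classified as NUMBER_POINT and the dot in "avg." as ABBREVIATION. Converting the classified tokens into words is a separate step that this sketch leaves out.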

Probably useful links:

Continue to develop eSpeakNG Jeditor

eSpeakEdit is the editor of the phonemes (spoken sounds) used by the eSpeak text-to-speech synthesiser. Currently, eSpeakEdit is not maintained, so it needs to be redeveloped with simpler and more modern technologies. The latest version of eSpeakEdit is available on GitHub; it can be compiled and built, but it runs with errors.

The job is to reimplement the editor's graphical interface in Java Swing, fully implementing the functions of the original eSpeakEdit written in C++. An incomplete Java version of the tool is available at https://github.com/valdisvi/espeak-ng-jeditor/. Look for TODOs for some of the known tasks.

fig3.png

Tags Java English
Created by Valdis Vītoliņš on 2017-07-06 09:39
Last modified by Valdis Vītoliņš on 2022-01-03 13:54
 
Creative Commons Attribution 3.0 Unported License