primoqert.blogg.se - Dex online romanian

In this article, we introduce an extended, freely available resource for the Romanian language, named RoLEX. The dataset was developed mainly for speech processing applications, yet its applicability extends beyond this domain. RoLEX includes over 330,000 curated entries with information regarding lemma, morphosyntactic description, syllabification, lexical stress and phonemic transcription. The process of selecting the list of word entries and semi-automatically annotating the complete lexical information associated with each of the entries is thoroughly described. The dataset's inherent knowledge is then evaluated in a task of concurrent prediction of syllabification, lexical stress marking and phonemic transcription. The evaluation looked into several dataset design factors, such as the minimum viable number of entries for correct prediction, the optimisation of the minimum number of required entries through expert selection and the augmentation of the input with morphosyntactic information, as well as the influence of each task on the overall accuracy. The best results were obtained when the orthographic form of the entries was augmented with the complete morphosyntactic tags. A word error rate of 3.08% and a character error rate of 1.08% were obtained this way. We show that training on a carefully selected subset of entries can yield a performance similar to that obtained with a randomly selected set twice as large.

Deep learning enables the development of efficient end-to-end speech processing applications while bypassing the need for expert linguistic and signal processing features. Yet, recent studies show that good quality speech resources and phonetic transcription of the training data can enhance the results of these applications. In this paper, the RECOApy tool is introduced. RECOApy streamlines the steps of data recording and pre-processing required in end-to-end speech-based applications. The tool implements an easy-to-use interface for prompted speech recording, spectrogram and waveform analysis, utterance-level normalisation and silence trimming, as well as grapheme-to-phoneme conversion of the prompts in eight languages: Czech, English, French, German, Italian, Polish, Romanian and Spanish. The grapheme-to-phoneme (G2P) converters are deep neural network (DNN) based architectures trained on lexicons extracted from the Wiktionary online collaborative resource. Because the languages differ in their degree of orthographic transparency and in the number of available phonetic entries, the DNN's hyperparameters are optimised with an evolution strategy. The phoneme and word error rates of the resulting G2P converters are presented and discussed. The tool, the processed phonetic lexicons and the trained G2P models are made freely available.

This paper evaluates four different sequence-to-sequence deep neural network architectures aimed at jointly solving three tasks: phonetic transcription, lexical stress assignment and syllabification. These text processing tasks are considered essential components of high quality text-to-speech or automatic speech recognition systems, with phonetic transcription being the most frequently used in these types of applications. Although each of the tasks has been individually and extensively analyzed in the scientific literature, few studies target a concurrent solution for them. In general, lexical stress assignment and syllabification are used as augmenting input features to the phonetic transcription model and are not considered target features. The proposed network architectures include recurrent, convolution and attention neural layers and were evaluated on hand-checked English and Romanian datasets. The models were evaluated in terms of accuracy for the concurrent prediction of all three tasks, as well as with the syllabification or lexical stress predictions discarded. The best results were obtained with a combination of convolution and attention layers, where the accuracy of the joint prediction for the three tasks was 58.96% for English and 86.64% for Romanian. The same model for English obtains an accuracy of 59.70% when syllables are discarded and 64% when the prediction of lexical stress is ignored. For Romanian, the same best performing model obtains an accuracy of 88.83% without syllables and 93.84% without lexical stress.
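The idea of scoring the joint prediction of phonetic transcription, lexical stress and syllabification, and then scoring it again with one of the tasks discarded, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the output format (one character per phoneme, `.` as syllable boundary, `'` as stress marker), the helper names `strip_marks` and `accuracy`, and the toy strings, which are invented and not taken from any of the datasets described here. It is not the evaluation code used by the papers.

```python
# Hypothetical marker symbols; the real datasets may encode these differently.
SYLLABLE_MARK = "."   # assumed syllable-boundary symbol
STRESS_MARK = "'"     # assumed lexical-stress symbol


def strip_marks(transcription, marks):
    """Return the transcription with every symbol listed in `marks` removed."""
    return "".join(ch for ch in transcription if ch not in marks)


def accuracy(references, predictions, ignore=()):
    """Fraction of entries predicted exactly right.

    Symbols listed in `ignore` are removed from both the reference and the
    prediction before comparison, which discards the corresponding task.
    """
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    if not references:
        raise ValueError("at least one entry is required")
    correct = sum(
        strip_marks(ref, ignore) == strip_marks(pred, ignore)
        for ref, pred in zip(references, predictions)
    )
    return correct / len(references)


# Invented toy data: entry 2 has a misplaced stress, entry 4 a misplaced
# syllable boundary; the phoneme sequences themselves are all correct.
REFERENCES = ["'ma.sa", "ko.'pil", "a.'pa", "'lu.na"]
PREDICTIONS = ["'ma.sa", "'ko.pil", "a.'pa", "'lun.a"]

joint = accuracy(REFERENCES, PREDICTIONS)
no_syllables = accuracy(REFERENCES, PREDICTIONS, ignore=(SYLLABLE_MARK,))
no_stress = accuracy(REFERENCES, PREDICTIONS, ignore=(STRESS_MARK,))
phonemes_only = accuracy(
    REFERENCES, PREDICTIONS, ignore=(SYLLABLE_MARK, STRESS_MARK)
)

print(f"joint accuracy:          {joint:.2%}")
print(f"syllables discarded:     {no_syllables:.2%}")
print(f"lexical stress ignored:  {no_stress:.2%}")
print(f"phonemes only:           {phonemes_only:.2%}")
```

Because an entry counts as correct only when all three annotations match, the joint score can never exceed any of the relaxed scores, which is consistent with the pattern in the figures reported for English and Romanian.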