Researcher Develops System Called Saroi To Detect, Correct Syntactic Mistakes

September 21, 2009

Saroi is a general-purpose tool which, apart from dealing with errors, can be used to query the structure of analysis trees and to search for linguistic structures in those trees.

A researcher at the University of the Basque Country (UPV/EHU) has analyzed the existing tools for the detection and correction of syntactic mistakes. To detect errors of context (for example, concordance errors), it is necessary to analyse the tree structure of the sentence. The researcher did not find a suitable tool for this purpose and so she created one, Saroi, which deals with mistakes in syntax and, moreover, can be used for querying analysis trees and for carrying out searches for linguistic structures in such trees.

Saroi is a flexible and easy-to-use tool, and is capable of working with linguistic characteristics at different levels (morphological, morphosyntactic, syntactic, etc.).

The author of the PhD thesis and of Saroi is Ms Maite Oronoz Anchordoqui, and her work is entitled Euskarazko errore sintaktikoak detektatzeko eta zuzentzeko baliabideen garapena: datak, postposizio-lokuzioak eta komunztadura (Development of resources for the detection and correction of syntactic errors in Basque: dates, postpositional phrases and concordance). In the opinion of the researcher, given the situation of Euskara (the Basque language), which is in the process of standardisation, rich in dialects and surrounded by strong languages, it is common for users not to use the linguistic structures established by the norms of the Euskaltzaindia (the Royal Academy of the Basque Language). The author of the PhD thesis has defined errors as those structures which deviate from correct structures and norms.

Errors in concordance

In her research work, the author of the thesis detected errors in the way dates were expressed, in postpositional phrases and, above all, in concordances between the verb and those elements that function as subject, direct object and indirect object.
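
As an illustration only, the idea of checking concordance over a sentence tree can be sketched in a few lines of Python. This is not Saroi's implementation or rule language, which the article does not describe. The `Node` class, the feature names and the placeholder tokens are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical tree node: a word, its syntactic role and its features."""
    form: str
    role: str                      # e.g. "verb", "subject", "direct_object"
    features: dict
    children: list = field(default_factory=list)

# Assumed mapping: which verb feature each kind of dependent must agree with.
AGREEMENT_SLOTS = {
    "subject": "subj",
    "direct_object": "dobj",
    "indirect_object": "iobj",
}

def find_agreement_errors(node):
    """Walk the tree and report dependents whose (person, number)
    differs from what the governing verb is marked for."""
    errors = []
    if node.role == "verb":
        for child in node.children:
            slot = AGREEMENT_SLOTS.get(child.role)
            if slot is None:
                continue  # this dependent does not take part in agreement
            expected = node.features.get(slot)
            actual = (child.features.get("person"), child.features.get("number"))
            if expected is not None and expected != actual:
                errors.append((child.form, child.role, actual, expected))
    for child in node.children:
        errors.extend(find_agreement_errors(child))
    return errors

# Toy sentence with placeholder tokens: the verb is marked for a singular
# subject, but the subject is plural.
tree = Node(
    "VERB", "verb",
    {"subj": (3, "sg"), "dobj": (3, "pl")},
    [
        Node("SUBJ", "subject", {"person": 3, "number": "pl"}),
        Node("OBJ", "direct_object", {"person": 3, "number": "pl"}),
    ],
)
print(find_agreement_errors(tree))
```

The sketch shows why the tree is needed: the verb and the element it must agree with may be far apart in the sentence, so the check follows the subject, direct object and indirect object relations rather than word order.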

To find these mistakes, Ms Oronoz automatically analysed both correct corpora and corpora containing errors. For this she used the syntactic analysis process of the IXA Group at the Faculty of Informatics of the UPV/EHU and, to represent the linguistic information, an XML-based annotation framework.

After analyzing the various error detection and correction techniques, Ms Oronoz opted to use those based on knowledge of the language. So, to detect and correct mistakes in local contexts (groups of five or six words), she used two tools well known in the field of automatic language processing: Xerox Finite State Tool (XFST), to work on the errors detected, and Grammar of Restrictions (GR), to deal with the incorrect usage of postpositional phrases.

About the author

Ms Maite Oronoz Anchordoqui (Hondarribia, 1972) is a graduate in Informatics. She has been a researcher with the IXA Group since 1996 and an assistant lecturer at the Faculty of Informatics at the UPV/EHU since 2001. Her thesis was supervised by Ms Arantza Díaz de Illarraza and Mr Koldo Gojenola from the Department of Languages and Computer Systems of the same Faculty.

