Publications

Network Security for QoS Routing Metrics

Conference paper

Abstract: Data security is an essential requirement, especially when sending information over a network. Network security has three goals: confidentiality, integrity and availability (or access). Encryption is the most common technique used to achieve these goals. However, the computing community has not yet agreed on a standard method for measuring data security. The ultimate goal of this study is to define security metrics based on different aspects of network security, and then demonstrate how these metrics could be used in Quality of Service (QoS) routing to find the most secure path connecting two distant nodes (source and destination) across an internetwork. Three security metrics are proposed, derived from three important aspects of network security: authentication, encryption and traffic filtering techniques (firewalls and intrusion detection systems). The metrics follow different composition rules: the first is binary, the second is either concave or additive, and the last is multiplicative. Routing algorithms that make use of these metrics were implemented in the C# programming language to test the viability of the proposed solution. Computational effort and blocking probability, the most commonly used performance measures, were used to assess the behavior and performance of these routing algorithms. The results show that the algorithms were able to find feasible paths between communicating parties and yielded reasonable savings in the computational effort needed to find an acceptable path. The price paid for these savings was a higher blocking probability.
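The composition rules named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's C# implementation: the `Link` fields, their scales, and the `compose_path` function are hypothetical names chosen for illustration. The sketch only shows how a binary metric combines by logical AND, a concave metric by minimum, an additive metric by sum, and a multiplicative metric by product along a path.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Link:
    authenticated: bool   # binary metric: does the link authenticate its peers?
    encryption: int       # hypothetical encryption strength score (e.g. key bits)
    filtering: float      # hypothetical filtering effectiveness in [0, 1]


def compose_path(links, encryption_rule="concave"):
    """Combine per-link security metrics into path-level metrics."""
    if encryption_rule == "concave":
        encryption = min(link.encryption for link in links)   # weakest link dominates
    elif encryption_rule == "additive":
        encryption = sum(link.encryption for link in links)   # scores accumulate
    else:
        raise ValueError(f"unknown rule: {encryption_rule}")
    return {
        "authenticated": all(link.authenticated for link in links),  # binary: logical AND
        "encryption": encryption,
        "filtering": prod(link.filtering for link in links),         # multiplicative
    }


# Made-up two-hop path, for illustration only
path = [Link(True, 128, 0.9), Link(True, 256, 0.5)]
print(compose_path(path))
```

A QoS routing algorithm could then compare candidate paths on these composed values, for example by rejecting any path whose binary metric is false before comparing the remaining metrics.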

Ibrahem Ali Mohammed Almerhag, Abduelbaset Mustafa Alia Goweder, (05-2010), The International Islamic University, Kuala Lumpur, Malaysia: Proceedings of ICCCE 2010, 151-157

Unsupervised Sentence Boundary Detection Approach for Arabic.

Conference paper

ABSTRACT

Punkt (German for period) is a sentence boundary detection system that divides an English text into a list of sentences using an unsupervised algorithm developed by Kiss and Strunk (2006) [6]. This algorithm is based on the assumption that a large number of ambiguities in the determination of sentence boundaries can be eliminated once abbreviations have been identified.

The Punkt system was adapted to support the Arabic language. The modified Punkt is trained on an Arabic corpus to build a model of abbreviations, collocations, and words that start sentences. An evaluation of the modified Punkt system showed an accuracy close to 99% in detecting Arabic sentence boundaries.
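The assumption behind Punkt, that most boundary ambiguities disappear once abbreviations are known, can be shown with a toy Python sketch. This is a simplification, not the Kiss and Strunk algorithm: the real system learns abbreviations, collocations and sentence starters from an unlabelled corpus, whereas the hypothetical `split_sentences` function below takes a hand-written abbreviation set. The example uses English text for readability.

```python
def split_sentences(text, abbreviations):
    """Split on sentence-final punctuation, except after known abbreviations."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith((".", "?", "!")):
            # A period that follows a known abbreviation is not a boundary
            if token.endswith(".") and token[:-1].lower() in abbreviations:
                continue
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences


# Hand-written abbreviation set (an assumption; Punkt would learn this from data)
print(split_sentences("Dr. Smith arrived. He spoke to Prof. Jones.", {"dr", "prof"}))
```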

Abduelbaset Mustafa Alia Goweder, (12-2009), University of Science and Technology, Yemen: Proceedings of ACIT 2009, 289-297

A General Technique for Graduating SQL Schema from XML Schema.

Conference paper

It is possible to generate an SQL schema from an XML Schema manually; however, generating it automatically would generally be very beneficial. This paper presents the components of an XML Schema-driven generation architecture that uses an XSL stylesheet, together with an algorithm for this type of generation. The inputs of the algorithm are an XML Schema and an XSL stylesheet, and the output is an SQL schema. The proposed algorithm shows how this component can be generated automatically. The algorithm is evaluated by testing it on different examples.
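To make the input and output of such a generator concrete, here is a rough Python sketch that turns a very small XML Schema into `CREATE TABLE` statements. It is not the paper's algorithm: it replaces the XSL stylesheet with a direct traversal using Python's `xml.etree.ElementTree`, handles only top-level elements with a flat `xs:sequence`, and uses a hypothetical type mapping (`SQL_TYPES`).

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical mapping from XML Schema built-in types to SQL column types
SQL_TYPES = {"string": "VARCHAR(255)", "integer": "INTEGER", "date": "DATE"}


def schema_to_sql(xsd_text):
    """Emit one CREATE TABLE per top-level element that has a flat sequence."""
    root = ET.fromstring(xsd_text.strip())
    statements = []
    for element in root.findall(f"{XS}element"):
        columns = []
        for child in element.iterfind(f"{XS}complexType/{XS}sequence/{XS}element"):
            type_name = child.get("type", "").split(":")[-1]   # drop the "xs:" prefix
            columns.append(f"  {child.get('name')} {SQL_TYPES.get(type_name, 'TEXT')}")
        if columns:
            body = ",\n".join(columns)
            statements.append(f"CREATE TABLE {element.get('name')} (\n{body}\n);")
    return "\n\n".join(statements)


# Made-up example schema, for illustration only
EXAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="pages" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""
print(schema_to_sql(EXAMPLE_XSD))
```

A full generator would also need to handle nested complex types, attributes, keys and cardinalities, which this sketch leaves out.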

Ali Sayeh Ahmed Elbekai, Abduelbaset Mustafa Alia Goweder, (12-2009), University of Science and Technology, Yemen: Proceedings of ACIT 2009, 177-185

An Anti-Spam System using Artificial Neural Networks and Genetic Algorithms

Conference paper

E-mail has become one of the fastest and most economical forms of communication, which also makes it prone to misuse. One such misuse is the sending of unsolicited, unwanted e-mails known as spam or junk e-mail. This paper presents and discusses an implementation of an anti-spam filtering system that uses a Multi-Layer Perceptron (MLP) as a classifier and a Genetic Algorithm (GA) as the training algorithm. Standard genetic operators and advanced GA techniques are used to train the MLP. The implemented filtering system achieved an accuracy of about 94% in detecting spam e-mails and 89% in detecting legitimate e-mails.
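The idea of training an MLP with a GA instead of backpropagation can be sketched in a few lines of Python. This is a toy under stated assumptions, not the paper's system: the network size, the operators (tournament selection, one-point crossover, Gaussian mutation, elitism), all parameter values and the two-feature data set are made up for illustration.

```python
import math
import random


def mlp_predict(weights, x, n_hidden=2):
    """Forward pass: len(x) inputs -> n_hidden tanh units -> one sigmoid output."""
    n_in, pos, hidden = len(x), 0, []
    for _ in range(n_hidden):
        total = weights[pos + n_in]                      # bias of this hidden unit
        total += sum(w * v for w, v in zip(weights[pos:pos + n_in], x))
        hidden.append(math.tanh(total))
        pos += n_in + 1
    total = weights[pos + n_hidden]                      # output bias
    total += sum(w * h for w, h in zip(weights[pos:pos + n_hidden], hidden))
    total = max(-60.0, min(60.0, total))                 # keep exp() in range
    return 1.0 / (1.0 + math.exp(-total))


def accuracy(weights, data):
    hits = sum((mlp_predict(weights, x) >= 0.5) == bool(label) for x, label in data)
    return hits / len(data)


def train_ga(data, n_weights, generations=40, pop_size=30, seed=0):
    """Evolve MLP weight vectors; fitness is accuracy on the training data."""
    rng = random.Random(seed)
    fitness = lambda w: accuracy(w, data)
    population = [[rng.uniform(-1, 1) for _ in range(n_weights)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]                    # elitism: keep the best as is
        while len(next_gen) < pop_size:
            mother = max(rng.sample(population, 3), key=fitness)  # tournament selection
            father = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, n_weights)            # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [g + rng.gauss(0, 0.3) if rng.random() < 0.2 else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness), history


# Made-up data: (hypothetical spam-word ratio, contains-link flag) -> 1 for spam
TOY_DATA = [([0.9, 1.0], 1), ([0.8, 0.0], 1), ([0.7, 1.0], 1),
            ([0.1, 0.0], 0), ([0.2, 1.0], 0), ([0.0, 0.0], 0)]
best, history = train_ga(TOY_DATA, n_weights=9)  # 2*(2+1) hidden + (2+1) output weights
print(accuracy(best, TOY_DATA), history[0], history[-1])
```

Because the best individual is copied unchanged into each new generation, the best fitness recorded in `history` never decreases.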

Abduelbaset Mustafa Alia Goweder, (12-2008), University of Sfax, Sfax, Tunisia: Proceedings of ACIT 2008, 177-185

Arabic Broken Plural using a Machine Translation Technique

Conference paper

Abstract: The Arabic language presents significant challenges to many natural language processing applications. The broken plural (BP) problem is one of these challenges, especially for information retrieval applications. It is difficult to reduce Arabic broken plurals to their associated singulars, because no obvious rules exist and there are no standard stemming algorithms that can process them. This paper addresses the broken plural problem by developing a method that identifies broken plurals in unvowelised Arabic text and reduces them to their correct singular forms. The new approach combines the simple broken plural matching approach with a machine translation system and an English stemmer. A set of experiments was conducted to evaluate the performance of the proposed method using a number of text samples extracted from a large Arabic corpus (Al-Hayat newspaper). The results obtained are analyzed and discussed.

Abduelbaset Mustafa Alia Goweder, (12-2008), University of Sfax, Sfax, Tunisia: Proceedings of ACIT 2008, 64-71

A Hybrid Method for Stemming Arabic Text

Conference paper

Abstract: Several stemming approaches have been applied to the Arabic language, yet no complete stemmer for this language is available. The existing stem-based stemmers for Arabic text perform poorly in terms of accuracy and error rates. To improve stemming accuracy, a hybrid method is proposed for stemming Arabic text to produce stems (not roots). Better stemming accuracy will in turn greatly improve many applications, including information retrieval, document classification, machine translation, text analysis and text compression. The proposed method integrates three different stemming techniques: morphological analysis, affix removal and dictionaries.

Abduelbaset Mustafa Alia Goweder, (12-2008), University of Sfax, Sfax, Tunisia: Proceedings of ACIT 2008, 125-132

Identifying Broken Plurals in Unvowelised Arabic Text

Conference paper

Irregular (so-called broken) plural identification in modern standard Arabic is a problematic issue for information retrieval (IR) and language engineering applications, but the effect of broken plurals on IR performance has never been examined. Broken plurals (BPs) are formed by altering the singular (as in English: tooth → teeth) through the application of interdigitating patterns on stems, so the singular cannot be recovered by standard affix-stripping stemming techniques. We developed several methods for BP detection and evaluated them on an unseen test set. We incorporated the BP detection component into a new light-stemming algorithm that conflates both regular and broken plurals with their singular forms. We also evaluated the new light-stemming algorithm within the context of information retrieval, comparing its performance with other stemming algorithms.
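A small Python sketch can show why affix stripping alone cannot recover the singular of a broken plural. The affix lists below are a tiny illustrative subset chosen for this example, and `light_stem` is a hypothetical function, not the light stemmer described in the paper.

```python
PREFIXES = ("ال",)              # definite article
SUFFIXES = ("ون", "ين", "ات")   # regular (sound) plural endings


def light_stem(word, min_len=3):
    """Strip at most one listed prefix and one listed suffix, keeping min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


# Regular plural: "the teachers" reduces to the singular stem "teacher"
print(light_stem("المعلمون"))
# Broken plural: "books" is returned unchanged, although its singular is "كتاب"
print(light_stem("كتب"))
```

The broken plural differs from its singular inside the stem rather than at its edges, which is why a separate detection component is needed before conflation.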

Abduelbaset Mustafa Alia Goweder, (07-2004), Barcelona, Spain: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 246-253

Broken Plural Detection for Arabic Information Retrieval

Conference paper

Abstract

Due to the high number of inflectional variations of Arabic words, empirical results suggest that stemming is essential for Arabic information retrieval. However, current light stemming algorithms do not extract the correct stem of irregular (so-called broken) plurals, which constitute ~10% of Arabic texts and ~41% of plurals. Although light stemming in particular has led to improvements in information retrieval [5, 6], the effect of broken plurals on the performance of information retrieval systems has not been examined. We propose a light stemmer that incorporates a broken plural recognition component, and evaluate it within the context of information retrieval. Our results show that identifying broken plurals and reducing them to their correct stems significantly improves the performance of information retrieval systems.

Abduelbaset Mustafa Alia Goweder, (07-2004), The University of Sheffield, UK: The 27th Annual International ACM SIGIR Conference, 566-567

Assessment of a Significant Arabic Corpus

Conference paper

The development of Language Engineering and Information Retrieval applications for Arabic requires the availability of sizeable, reliable corpora of modern Arabic text. These are not routinely available. This paper describes how we constructed an 18.5 million word corpus from Al-Hayat newspaper text, with articles tagged as belonging to one of 7 domains. We outline the profile of the data and how we assessed its representativeness. The literature suggests that the statistical profile of Arabic text is significantly different from that of English in ways that might affect the applicability of standard techniques. The corpus allowed us to verify a collection of experiments which had, so far, only been conducted on small, manually collected datasets. We draw some comparisons with English and conclude that there is evidence that Arabic data is much sparser than English for the same data size.

Abduelbaset Mustafa Alia Goweder, (08-2001), Toulouse, France: Proceedings of ACL 2001, 71-78
