This slide covers CROSPELL Engine, a Natural Language Processing engine built from multiple approaches. It covers a wide variety of topics in text and image processing, from spell checking to topic prediction. The project was made in late 2012 and delivered in early 2013 at the F.I.T.E of Damascus, Syria as the final project in the NLP course (with Ola Al Naameh and Mhd Hasan Sarhan).

 System Specification (Implementation details can be found in the doc.)

1. Auto-correction

The auto-correction algorithm makes sure that a misspelled word is matched with a proper correct word. Many approaches can be implemented for this. The one I opted for is the distance between keys on the keyboard map.


But one should make sure to pick the right algorithm. Keys on the keyboard map are not laid out linearly.


The distances between keys are also not linear. The best fit for this is a Gaussian curve to measure the right distance.
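
To make the keyboard-distance idea concrete, here is a minimal sketch in Python. It is not the engine's actual code: the QWERTY layout, the row offsets, the `sigma` parameter and all function names are my assumptions for illustration. It places each key on a 2D grid, measures the Euclidean distance between two keys, and passes it through a Gaussian so that neighbouring keys score close to 1 and far-apart keys score close to 0.

```python
import math

# Assumed layout: QWERTY letter rows with an approximate horizontal
# stagger (in key widths). The real engine may use a different map.
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75)]
KEY_POS = {ch: (col + offset, row)
           for row, (keys, offset) in enumerate(ROWS)
           for col, ch in enumerate(keys)}

def key_distance(a, b):
    """Euclidean distance between two lowercase letter keys."""
    (x1, y1), (x2, y2) = KEY_POS[a], KEY_POS[b]
    return math.hypot(x1 - x2, y1 - y2)

def key_similarity(a, b, sigma=1.0):
    """Gaussian of the key distance: 1.0 for the same key, near 0 for far keys."""
    d = key_distance(a, b)
    return math.exp(-(d * d) / (2 * sigma * sigma))

def word_score(typed, candidate, sigma=1.0):
    """Product of per-letter similarities; only same-length words are compared."""
    if len(typed) != len(candidate):
        return 0.0
    score = 1.0
    for t, c in zip(typed, candidate):
        score *= key_similarity(t, c, sigma)
    return score

def suggest(typed, dictionary, sigma=1.0):
    """Dictionary words ranked from most to least likely intended word."""
    scored = [(word_score(typed, w, sigma), w) for w in dictionary]
    return [w for s, w in sorted(scored, reverse=True) if s > 0]

print(suggest("cst", ["cat", "cut", "dog"]))
```

With this toy layout, "cst" ranks "cat" first because "s" sits right next to "a". The sketch only handles substitutions between lowercase letters; insertions and deletions would need an edit-distance step on top.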


The CyperSpell Algorithm maps (possibly) misspelled words to their correctly spelled counterparts (using a dictionary).


The user can write in real time and the system will auto-correct (or suggest) the correct words when the user misspells. The system also remembers which words the user has misspelled before and ranks their chosen corrections higher in the list of suggestions.
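
The history-based ranking could look roughly like the sketch below. The class and method names are hypothetical, and I am assuming the history is a simple count of which correction the user picked for each misspelling; the engine may store this differently.

```python
from collections import Counter, defaultdict

class SuggestionHistory:
    """Remembers which correction the user picked for each misspelling."""

    def __init__(self):
        # misspelled word -> Counter of corrections the user chose for it
        self._choices = defaultdict(Counter)

    def record(self, typed, chosen):
        """Store that the user accepted `chosen` as the fix for `typed`."""
        self._choices[typed][chosen] += 1

    def rerank(self, typed, suggestions):
        """Move previously chosen corrections to the front.

        sorted() is stable, so suggestions the user never picked keep
        the order the spell checker gave them.
        """
        counts = self._choices.get(typed, Counter())
        return sorted(suggestions, key=lambda word: -counts[word])

history = SuggestionHistory()
history.record("teh", "the")
print(history.rerank("teh", ["ten", "tea", "the"]))
```

Because the sort is stable, the original spell-checker ranking acts as the tie-breaker for anything the user has not chosen before.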


2. Language Identification

The user can input text in any language and the system can figure out what that language is (as long as the corresponding corpora are provided).


If there is more than one language in the text, the system will list them, ranked according to their occurrences (frequencies in the text).
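
As a rough illustration of corpus-based identification with ranking, here is a small sketch. It assumes the simplest possible method (counting how many words of the input appear in each language's corpus vocabulary); the function names and the two tiny corpora are made up for the example, and the engine itself may well use n-gram statistics instead.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercased word tokens; \\w is Unicode-aware, so Arabic works too."""
    return re.findall(r"\w+", text.lower())

def build_vocab(corpora):
    """Map each language name to the set of words seen in its corpus."""
    return {lang: set(tokenize(text)) for lang, text in corpora.items()}

def rank_languages(text, vocab):
    """Count, per language, how many input tokens occur in its vocabulary."""
    counts = Counter()
    for token in tokenize(text):
        for lang, words in vocab.items():
            if token in words:
                counts[lang] += 1
    return counts.most_common()  # most frequent language first

# Toy corpora, for illustration only.
corpora = {
    "en": "the cat is on the table and the dog is here",
    "fr": "le chat est sur la table et le chien est ici",
}
vocab = build_vocab(corpora)
print(rank_languages("the dog is on the table et le chat", vocab))
```

Words shared between languages (like "table" here) count for both, which is one reason real systems usually lean on character n-grams and much larger corpora.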


3. Word Prediction

Using bi-grams and tri-grams, the system can suggest auto-completions while the user is writing.
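
Here is a minimal bi-gram sketch of that idea (tri-grams work the same way with two words of context). The function names and the tiny training text are my own assumptions, not the engine's code.

```python
from collections import Counter, defaultdict

def train_bigrams(tokens):
    """For each word, count the words that follow it in the training text."""
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model

def predict_next(model, prev_word, prefix="", k=3):
    """Top-k followers of prev_word that start with the typed prefix."""
    candidates = model.get(prev_word, Counter())
    matches = [(w, c) for w, c in candidates.items() if w.startswith(prefix)]
    # Most frequent first; alphabetical order breaks ties.
    matches.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in matches[:k]]

tokens = "i like tea i like coffee i like tea i love tea".split()
model = train_bigrams(tokens)
print(predict_next(model, "like"))
print(predict_next(model, "i", prefix="lo"))
```

The `prefix` argument is what turns next-word prediction into auto-completion: as the user types more letters, the candidate list narrows.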


3. Topic Prediction

Using bi-grams and tri-grams, the system can suggest the topic that best matches the paragraph. The system actually lists all the possible topic predictions and ranks them according to the best match.
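
One simple way to rank topics with n-grams is to count how many bi-grams the paragraph shares with a reference corpus per topic. The sketch below assumes exactly that scoring; the names and the two toy topic corpora are hypothetical, and the engine's real scoring may differ.

```python
from collections import Counter

def bigrams(text):
    """Multiset of adjacent word pairs in the text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

def rank_topics(paragraph, topic_corpora):
    """Score each topic by bi-gram overlap with the paragraph, best first."""
    para = bigrams(paragraph)
    scores = {}
    for topic, corpus in topic_corpora.items():
        ref = bigrams(corpus)
        # A shared bi-gram counts at most as often as it occurs on both sides.
        scores[topic] = sum(min(n, ref[bg]) for bg, n in para.items())
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy topic corpora, for illustration only.
topics = {
    "sports": "the team won the match and the team scored a goal",
    "cooking": "add the salt and stir the soup then add the pepper",
}
print(rank_topics("the team scored a goal in the match", topics))
```

Every topic gets a score, so the full ranked list can be shown to the user rather than just the winner.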


4. Dictionary

The system also provides an Arabic-English dictionary.


5. Image Processing using NLP Approaches

Using Minimum Edit Distance (MED), we can match images with others having similar properties (colors in our case). However, this approach is shallow since it fails completely when images are resized or rotated. Anyway, it’s just for fun!


The system works best when comparing images of similar size that have not been transformed.
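
To show how MED can be applied to images, here is a small sketch. It assumes an image is given as rows of (r, g, b) tuples, that colors are quantized into coarse buckets before comparison, and that insert, delete and substitute all cost 1 (plain Levenshtein). All of these are illustrative choices of mine, not necessarily what the engine does.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (all operations cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def quantize(pixel, levels=4):
    """Reduce each 0-255 channel to a coarse bucket so close colors match."""
    step = 256 // levels
    return tuple(channel // step for channel in pixel)

def image_distance(img_a, img_b, levels=4):
    """Edit distance between the row-by-row quantized color sequences."""
    seq_a = [quantize(p, levels) for row in img_a for p in row]
    seq_b = [quantize(p, levels) for row in img_b for p in row]
    return edit_distance(seq_a, seq_b)

red, dark_red, blue = (250, 10, 10), (200, 0, 0), (0, 0, 255)
img1 = [[red, red], [blue, blue]]
img2 = [[dark_red, red], [blue, blue]]
img3 = [[blue, blue], [blue, red]]
print(image_distance(img1, img2), image_distance(img1, img3))
```

Since the pixels are flattened row by row, rotating or resizing an image reorders the whole sequence, which is exactly why the approach breaks down on transformed images.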



6. ISRI and Porter Stemming Algorithms

Both the ISRI and Porter stemming algorithms are implemented in the engine.



7. Genome Matching using Minimum Edit Distance

Interestingly, the engine implements genome matching using MED. The initial interface is:


The user can input two genomes and the system will find the match between the two.
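
A minimal sketch of matching two sequences with MED follows. It assumes unit costs for insert, delete and substitute, and it uses a traceback to show the alignment with "-" for gaps; the function name and the cost choice are my assumptions, not taken from the engine.

```python
def align(a, b):
    """Return (edit distance, aligned a, aligned b) with '-' marking gaps."""
    n, m = len(a), len(b)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # delete from a
                             dist[i][j - 1] + 1,         # insert from b
                             dist[i - 1][j - 1] + cost)  # match / substitute

    # Walk back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        diag_cost = 0 if i > 0 and j > 0 and a[i - 1] == b[j - 1] else 1
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + diag_cost:
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return dist[n][m], "".join(reversed(top)), "".join(reversed(bottom))

distance, row_a, row_b = align("ACGT", "AGT")
print(distance)
print(row_a)
print(row_b)
```

The table is quadratic in the sequence lengths, so this is only practical for short fragments; real genome tools use more specialised scoring and algorithms.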



8. Sentiment Analysis

The system implements a light sentiment analyzer. Just write a sentence or a paragraph and the system will provide the corresponding emotion for it.
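
A "light" analyzer can be as simple as a word list with scores. The sketch below assumes a lexicon-based approach with a one-step negation rule; the lexicon, the negation words and the three output labels are all invented for illustration, and the engine may work differently.

```python
import re

# Tiny hypothetical lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"happy": 1, "love": 1, "great": 1, "good": 1,
           "sad": -1, "hate": -1, "bad": -1, "terrible": -1}
NEGATIONS = {"not", "never", "no"}

def sentiment(text):
    """Label text as 'positive', 'negative' or 'neutral'."""
    score = 0
    negate = False
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in NEGATIONS:
            negate = True  # flips the next sentiment word we meet
            continue
        polarity = LEXICON.get(token, 0)
        if polarity != 0:
            score += -polarity if negate else polarity
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great engine"))
print(sentiment("This is not good"))
```

This handles only English and a handful of words; a usable analyzer needs a much larger lexicon or a trained classifier.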


You can download the full project documentation [in Arabic – بالعربية] here. I would be happy to upload the engine source code along with its interface for anyone to use, but the language corpora are quite big (the project is 400 MB!), so if anyone is interested, don’t hesitate to contact me by mail and I’ll figure something out!
