<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
    <channel>
        <title><![CDATA[NLP — Mohammad Shaker]]></title>
        <description><![CDATA[NLP articles by Mohammad Shaker]]></description>
        <link>https://mohammadshaker.com/en/blog/category/NLP</link>
        <image>
            <url>https://mohammadshaker.com/opengraph-image</url>
            <title>NLP — Mohammad Shaker</title>
            <link>https://mohammadshaker.com/en/blog/category/NLP</link>
        </image>
        <generator>RSS for Node</generator>
        <lastBuildDate>Wed, 15 Apr 2026 10:01:54 GMT</lastBuildDate>
        <atom:link href="https://mohammadshaker.com/en/blog/category/NLP/feed.xml" rel="self" type="application/rss+xml"/>
        <language><![CDATA[en-US]]></language>
        <item>
            <title><![CDATA[Elasticsearch Out of the Box Use Cases]]></title>
            <description><![CDATA[Elasticsearch ships with NLP-friendly features that most teams underuse: phrase-based did-you-mean suggestions, completion-based autocomplete, fuzzy matching, and built-in text analyzers. This post surveys those out-of-the-box capabilities and how they apply directly to Arabic and multilingual search applications.]]></description>
            <link>https://mohammadshaker.com/en/blog/elasticsearch-out-of-the-box-use-cases</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/elasticsearch-out-of-the-box-use-cases</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[engineering]]></category>
            <category><![CDATA[genre-classification]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <category><![CDATA[research]]></category>
            <category><![CDATA[visualization-libraries]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sun, 09 Aug 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Automatically Extracting Valuable Content from News Streams.]]></title>
            <description><![CDATA[A news content aggregator pulling from 50+ sources needs more than a firehose — it needs a pipeline that scores, filters, and ranks articles by quality. The main signals are readability, informativeness, and source reliability. Combining these lets you surface the 5% of articles worth reading and suppress the rest.]]></description>
            <link>https://mohammadshaker.com/en/blog/automatically-extracting-valuable-content-from-news-streams</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/automatically-extracting-valuable-content-from-news-streams</guid>
            <category><![CDATA[almeta.io]]></category>
            <category><![CDATA[ideas]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <category><![CDATA[research]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Fri, 21 Feb 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Abstractive Summarization in Underresourced Languages]]></title>
            <description><![CDATA[Abstractive summarization for low-resource languages is harder than extractive summarization because it requires generating new text, not just selecting sentences. Morphological complexity and the scarcity of training data compound the difficulty for languages like Arabic. Transfer learning from high-resource language models is the most practical path forward.]]></description>
            <link>https://mohammadshaker.com/en/blog/abstractive-summarization-in-underresourced-languages</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/abstractive-summarization-in-underresourced-languages</guid>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Wed, 29 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[An Initial, Failed Solution For The Event Detection Task]]></title>
            <description><![CDATA[Our first Arabic event detection system combined TF-IDF vectors, NER features, and timestamp proximity, then clustered articles sequentially against a 1,400-article ground truth spanning 120 events. It failed — and the failure analysis revealed which feature combinations actually helped and why the similarity threshold was the breaking point.]]></description>
            <link>https://mohammadshaker.com/en/blog/an-initial-failed-solution-for-the-event-detection-task</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/an-initial-failed-solution-for-the-event-detection-task</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Wed, 29 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Building a Test Collection for Event Detection Systems Evaluation]]></title>
            <description><![CDATA[Evaluating an event detection system requires a labeled test collection — but building one for Arabic news means resolving annotation disagreements, defining event boundaries, and selecting a corpus that reflects real news diversity. We detail our methodology for constructing a 1,400-article benchmark across 120 events.]]></description>
            <link>https://mohammadshaker.com/en/blog/building-a-test-collection-for-event-detection-systems-evaluation</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/building-a-test-collection-for-event-detection-systems-evaluation</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Wed, 29 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Initial Genre Classification Experiments]]></title>
            <description><![CDATA[Automatically classifying Arabic news articles by genre — politics, sports, business, science — lets a news aggregator route and rank content intelligently. We describe initial experiments using NLP features extracted from a corpus of Arabic news articles across major outlets, evaluating multiple classification models and reporting where genre confusion is highest.]]></description>
            <link>https://mohammadshaker.com/en/blog/initial-genre-classification-experiments</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/initial-genre-classification-experiments</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[genre-classification]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <category><![CDATA[research]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Wed, 29 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[News Stream Clustering - Sequential Clustering in Action]]></title>
            <description><![CDATA[Sequential clustering for news streams groups incoming articles into event clusters in real time, without a fixed cluster count. Each document is compared to existing cluster centroids and either assigned to the best match above a similarity threshold or used to start a new cluster. We show this algorithm applied to an Arabic news stream with real results.]]></description>
            <link>https://mohammadshaker.com/en/blog/news-stream-clustering-sequential-clustering-in-action</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/news-stream-clustering-sequential-clustering-in-action</guid>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <category><![CDATA[research]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Wed, 29 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Towards Contrary-View Detection in News]]></title>
            <description><![CDATA[Contrary-view detection finds news articles that cover the same topic but from opposing viewpoints. The problem splits into two steps: topic identification and viewpoint divergence ranking. We formalize the task, review approaches from stance detection and topic modeling, and present a document-similarity-based pipeline for surfacing contrasting perspectives in Arabic news.]]></description>
            <link>https://mohammadshaker.com/en/blog/towards-contrary-view-detection-in-news</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/towards-contrary-view-detection-in-news</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <category><![CDATA[research]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Mon, 27 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[An Implementation of a News Stream Sequence Clustering Algorithm]]></title>
            <description><![CDATA[Sequential clustering for news streams assigns each incoming article to an existing event cluster or creates a new one, without knowing the number of clusters in advance. We implement this online algorithm with incremental centroid updates and a similarity threshold, then evaluate it against our custom Arabic news test collection.]]></description>
            <link>https://mohammadshaker.com/en/blog/an-implementation-of-a-news-stream-sequence-clustering-algorithm</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/an-implementation-of-a-news-stream-sequence-clustering-algorithm</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sun, 26 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Automatic Sentence Paraphrasing]]></title>
            <description><![CDATA[Automatic sentence paraphrasing rewrites text to preserve meaning while changing surface form — useful for data augmentation, summarization, and plagiarism detection. Rule-based systems use hand-crafted transformation templates; statistical approaches learn from parallel corpora; neural models like seq2seq generate more fluent paraphrases but require large training sets.]]></description>
            <link>https://mohammadshaker.com/en/blog/automatic-sentence-paraphrasing</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/automatic-sentence-paraphrasing</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sun, 26 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Contrary View Detection Based On Document Similarity]]></title>
            <description><![CDATA[Contrary view detection using document similarity works in two stages: first find articles on the same topic via high similarity, then re-rank by dissimilarity of viewpoint signals — sentiment polarity, entity framing, and opinion markers. We implement and evaluate this pipeline on Arabic news, measuring how well similarity metrics alone can approximate ideological opposition.]]></description>
            <link>https://mohammadshaker.com/en/blog/contrary-view-detection-based-on-document-similarity</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/contrary-view-detection-based-on-document-similarity</guid>
            <category><![CDATA[almeta.io]]></category>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sun, 26 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Contrary view detection based on VODUM]]></title>
            <description><![CDATA[VODUM (Viewpoint and Opinion Discovery Unification Model) extends LDA by jointly modeling topics and viewpoints, making it a natural fit for contrary view detection. We apply VODUM to Arabic news to surface articles that cover the same event from opposing ideological angles, comparing it against document similarity baselines on viewpoint divergence metrics.]]></description>
            <link>https://mohammadshaker.com/en/blog/contrary-view-detection-based-on-vodum</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/contrary-view-detection-based-on-vodum</guid>
            <category><![CDATA[almeta.io]]></category>
            <category><![CDATA[analysing-content,-not-publishers]]></category>
            <category><![CDATA[biased-and-sensational]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <category><![CDATA[research]]></category>
            <category><![CDATA[values]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sun, 26 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Major Tasks in Dialectal Arabic Processing]]></title>
            <description><![CDATA[Dialectal Arabic NLP is harder than Modern Standard Arabic because dialects vary widely across 22 countries, lack standardized spelling, and have far fewer labeled datasets. This survey covers the best available systems and benchmarks for dialect identification, sentiment analysis, and machine translation across Arabic varieties.]]></description>
            <link>https://mohammadshaker.com/en/blog/major-tasks-in-dialectical-arabic-processing</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/major-tasks-in-dialectical-arabic-processing</guid>
            <category><![CDATA[almeta.io]]></category>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[ideas]]></category>
            <category><![CDATA[nlp]]></category>
            <category><![CDATA[research]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sun, 26 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Multi-document summarization. The What, Why and How]]></title>
            <description><![CDATA[Multi-document summarization merges information from several articles covering the same event into one coherent summary. The main challenges are redundancy elimination, cross-document coreference, and information ordering. Extractive methods copy sentences directly; abstractive methods generate new text. Both need to handle temporal inconsistencies across articles.]]></description>
            <link>https://mohammadshaker.com/en/blog/multi-document-summarization-the-what-why-and-how</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/multi-document-summarization-the-what-why-and-how</guid>
            <category><![CDATA[almeta.io]]></category>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <category><![CDATA[research]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sun, 26 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Smart Services For Social Media Marketing]]></title>
            <description><![CDATA[NLP-powered smart services for social media marketing go beyond scheduling tools — they handle content selection, consumer intent analysis, trend detection, and automated generation. We survey the key service categories and the NLP techniques behind them: sentiment analysis, topic modeling, entity recognition, and text generation pipelines.]]></description>
            <link>https://mohammadshaker.com/en/blog/smart-services-for-social-media-marketing</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/smart-services-for-social-media-marketing</guid>
            <category><![CDATA[ideas]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <category><![CDATA[social-media-marketing]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sun, 26 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Auto-Tagging Content with NLP]]></title>
            <description><![CDATA[Auto-tagging articles with NLP can save writers significant time and improve content discoverability. The main approaches are NER-based candidate extraction, graph-based keyword ranking like TextRank, statistical methods like TF-IDF, and deep learning keyphrase generation. Each trades precision for coverage in different ways.]]></description>
            <link>https://mohammadshaker.com/en/blog/auto-tagging-content-with-nlp</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/auto-tagging-content-with-nlp</guid>
            <category><![CDATA[almeta.io]]></category>
            <category><![CDATA[ideas]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <category><![CDATA[research]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Tue, 21 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Aspect-level Vs Entity-level Sentiment Analysis]]></title>
            <description><![CDATA[Document-level sentiment analysis misses critical nuance: a review can hate a phone's RAM but love its price. Aspect-level sentiment analysis (ALSA) and entity-level sentiment analysis (ELSA) solve this by pinpointing the sentiment target. This post explains the difference and why it matters for political bias detection.]]></description>
            <link>https://mohammadshaker.com/en/blog/aspect-level-vs-entity-level-sentiment-analysis</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/aspect-level-vs-entity-level-sentiment-analysis</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sun, 19 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[From Sentiment to Political Bias in the Arab World and the Arabic Content]]></title>
            <description><![CDATA[Political bias detection in Arabic news requires going beyond sentiment analysis — political framing, entity alignment, and source stance all carry ideological signal. We trace the pipeline from basic sentiment labeling to multidimensional orientation detection, covering the unique challenges Arabic presents: dialectal variation, implicit framing, and geopolitical media alignment patterns.]]></description>
            <link>https://mohammadshaker.com/en/blog/from-sentiment-to-political-bias-in-the-arab-world-and-the-arabic-content</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/from-sentiment-to-political-bias-in-the-arab-world-and-the-arabic-content</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sun, 19 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Multidimensional Topic Modelling. The What? and The How?]]></title>
            <description><![CDATA[Standard LDA assigns each document a topic distribution across a single latent dimension. Multidimensional topic modeling extends this to learn multiple latent variables simultaneously — such as topic, perspective, and writing style — in a single generative model, giving richer document representations for tasks like stance detection and bias analysis.]]></description>
            <link>https://mohammadshaker.com/en/blog/multidimensional-topic-modelling-the-what-and-the-how</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/multidimensional-topic-modelling-the-what-and-the-how</guid>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sun, 19 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Viewpoint, Topic and Opinion Discovery in an Opinionated Document]]></title>
            <description><![CDATA[Two outlets can cover the same event with opposite viewpoints while using different words entirely. Probabilistic topic models like LDA and VODUM can discover these latent viewpoints, topics, and opinions from text without labels. This post explains the theory and how it applies to detecting partisan bias in news coverage.]]></description>
            <link>https://mohammadshaker.com/en/blog/viewpoint-topic-and-opinion-discovery-in-an-opinionated-document</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/viewpoint-topic-and-opinion-discovery-in-an-opinionated-document</guid>
            <category><![CDATA[almeta.io]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sun, 19 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[An Overview of The Event Extraction Task in NLP]]></title>
            <description><![CDATA[Event extraction identifies triggers, arguments, and roles from text using NLP, covering ACE-style structured extraction and open-domain approaches.]]></description>
            <link>https://mohammadshaker.com/en/blog/an-overview-of-the-event-extraction-task-in-nlp</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/an-overview-of-the-event-extraction-task-in-nlp</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Analysis of the Readability Metric Results in Almeta News Feed]]></title>
            <description><![CDATA[Applying the AARIBase readability formula to Almeta's Arabic news feed shows a clear pattern: longer articles score lower on readability, and long sentences have an outsized effect. Two articles with identical read-time can differ dramatically in readability score based entirely on sentence length, not word complexity.]]></description>
            <link>https://mohammadshaker.com/en/blog/analysis-of-the-readability-metric-results-in-almeta-news-feed</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/analysis-of-the-readability-metric-results-in-almeta-news-feed</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Aspect Detection and Named Entity Linking (NEL): Using SPARQL and DBpedia]]></title>
            <description><![CDATA[Named entity linking (NEL) connects entity mentions in news text to structured knowledge bases like DBpedia via SPARQL queries, enabling richer aspect detection. We use this pipeline to capture how different publishers cover the same entity — person, organization, location — and surface cross-article interaction patterns.]]></description>
            <link>https://mohammadshaker.com/en/blog/aspect-detection-and-named-entity-linking-nel-using-sparql-and-dbpedia</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/aspect-detection-and-named-entity-linking-nel-using-sparql-and-dbpedia</guid>
            <category><![CDATA[almeta.io]]></category>
            <category><![CDATA[engineering]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Automatically Tagging Data for Content Informativity Scoring]]></title>
            <description><![CDATA[Training a supervised informativeness classifier requires labeled data, but manually annotating thousands of Arabic articles is impractical. We use summarization-based similarity as a proxy label — comparing each article's lead paragraph to its summary — and validate this approach against the Kalimat and EASC Arabic datasets.]]></description>
            <link>https://mohammadshaker.com/en/blog/automatically-tagging-data-for-content-informativity-scoring</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/automatically-tagging-data-for-content-informativity-scoring</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[informativity]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Can you measure a text Informativeness using its summary?]]></title>
            <description><![CDATA[If a summary captures the essential information in an article, then similarity between the full text and its summary should proxy for informativeness. We test this hypothesis using ROUGE, cosine similarity, and other metrics against Arabic summarization datasets, examining whether high similarity reliably predicts that an article is informative rather than creative.]]></description>
            <link>https://mohammadshaker.com/en/blog/can-you-measure-a-text-informativeness-using-its-summary</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/can-you-measure-a-text-informativeness-using-its-summary</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Clickbait Detection Using Word2Vec Representation]]></title>
            <description><![CDATA[Representing clickbait headlines as averaged word2vec vectors captures semantic similarity between sensational phrases better than bag-of-words models. We use t-SNE projections to validate whether clickbait and non-clickbait headlines cluster separately in embedding space before training a classifier on Arabic news headlines.]]></description>
            <link>https://mohammadshaker.com/en/blog/clickbait-detection-using-word2vec-representation</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/clickbait-detection-using-word2vec-representation</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Comparison of Available TTS Services]]></title>
            <description><![CDATA[A comparison of text-to-speech services for Arabic, evaluating Google Cloud TTS WaveNet and other APIs on voice quality, latency, and cost.]]></description>
            <link>https://mohammadshaker.com/en/blog/comparison-of-available-tts-services</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/comparison-of-available-tts-services</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[nlp]]></category>
            <category><![CDATA[research]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[How to Detect Cliches in Text]]></title>
            <description><![CDATA[Cliches reduce text informativeness because they carry no new information — the reader already knows what comes next. Detecting them computationally means extracting collocations with high PMI scores, then cross-referencing against known idiom lists. High cliche density is a reliable signal that an article is formulaic rather than informative.]]></description>
            <link>https://mohammadshaker.com/en/blog/how-to-detect-cliches-in-text</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/how-to-detect-cliches-in-text</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[How to Detect Clickbait Headlines using NLP?]]></title>
            <description><![CDATA[Clickbait detection in NLP uses a combination of linguistic features: exaggerated sentiment, forward-reference patterns, question-form headlines, and punctuation abuse. We survey the key NLP approaches — from rule-based methods to machine learning classifiers — and explain how each maps to observable headline characteristics.]]></description>
            <link>https://mohammadshaker.com/en/blog/how-to-detect-clickbait-headlines-using-nlp</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/how-to-detect-clickbait-headlines-using-nlp</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[How to Measure Text Readability?]]></title>
            <description><![CDATA[Text readability formulas like Flesch-Kincaid and Gunning Fog Index estimate difficulty from sentence length and syllable count — but they were designed for English. Arabic readability requires adapted metrics like AARIBase, which weighs character count, average word length, and average sentence length to score Arabic text difficulty.]]></description>
            <link>https://mohammadshaker.com/en/blog/how-to-measure-text-readability</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/how-to-measure-text-readability</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[How to Rank Articles Based on How Informative They Are - Using Snorkel]]></title>
            <description><![CDATA[Ranking articles by informativeness is easier than scoring them. Humans naturally excel at pairwise comparisons, and AI models mirror that advantage. Snorkel's weak supervision framework lets you build an informativeness ranker without expensive manual labels by turning human heuristics into labeling functions.]]></description>
            <link>https://mohammadshaker.com/en/blog/how-to-rank-articles-based-on-how-informative-they-are-using-snorkel</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/how-to-rank-articles-based-on-how-informative-they-are-using-snorkel</guid>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Informativity Detection - Almeta's Research Gist]]></title>
            <description><![CDATA[Measuring article informativeness requires breaking an abstract concept into quantifiable features: readability, cliche density, term-level informativeness, and skimmability. Almeta's approach treats informativeness as a supervised learning problem — training a model on proxy-labeled data from human summaries to rank Arabic news articles by quality.]]></description>
            <link>https://mohammadshaker.com/en/blog/informativity-detection-almetas-research-gist</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/informativity-detection-almetas-research-gist</guid>
            <category><![CDATA[almeta.io]]></category>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Political Orientation Detection - AI and NLP Approach]]></title>
            <description><![CDATA[Political orientation detection uses NLP to classify news articles automatically along ideological dimensions: left, right, or center. The main approaches use stance classification, topic modeling, and framing analysis. Arabic media presents additional challenges because political alignment often maps to geopolitical affiliation rather than a simple left-right spectrum.]]></description>
            <link>https://mohammadshaker.com/en/blog/political-orientation-detection-ai-and-nlp-approach</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/political-orientation-detection-ai-and-nlp-approach</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Search Service Frameworks Evaluation]]></title>
            <description><![CDATA[Choosing between Lucene, Elasticsearch, and Solr for an Arabic NLP application depends on more than raw indexing speed. Arabic language support — tokenization, diacritization, stemming — is often the deciding factor. We evaluate all three frameworks on indexing performance, query capabilities, Arabic-language handling, and horizontal scalability.]]></description>
            <link>https://mohammadshaker.com/en/blog/search-service-frameworks-evaluation</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/search-service-frameworks-evaluation</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Stance Detection - State of the Art]]></title>
            <description><![CDATA[Stance detection classifies a text's position toward a specific target — agree, disagree, or neutral — and splits into two branches: objective stance for fact verification and subjective stance for opinion classification. State-of-the-art methods range from feature-engineered classifiers to attention-based deep learning models fine-tuned on task-specific datasets.]]></description>
            <link>https://mohammadshaker.com/en/blog/stance-detection-state-of-the-art</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/stance-detection-state-of-the-art</guid>
            <category><![CDATA[almeta.io]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Subjective Stance Detection: What Is It and How to Do It?]]></title>
            <description><![CDATA[Subjective stance detection extracts sentiment toward a specific target — not the whole document — enabling fine-grained opinion mining. Given a review saying 'The RAM is small, but the price is low,' it can identify opposing sentiments for each aspect. This post explains the task and covers both feature-based and neural approaches.]]></description>
            <link>https://mohammadshaker.com/en/blog/subjective-stance-detection-what-is-it-and-how-to-do-it</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/subjective-stance-detection-what-is-it-and-how-to-do-it</guid>
            <category><![CDATA[almeta.io]]></category>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[engineering]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Supervised Article Informativeness Prediction - The What and the How]]></title>
            <description><![CDATA[Supervised informativeness prediction trains a classifier to score text quality using features like readability, cliche density, content-to-function word ratio, and term informativeness. We explain the full pipeline — feature extraction, label generation, model selection, and evaluation — applied to Arabic news articles in the Almeta project.]]></description>
            <link>https://mohammadshaker.com/en/blog/supervised-article-informativeness-prediction-the-what-and-the-how</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/supervised-article-informativeness-prediction-the-what-and-the-how</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[ideas]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Term Informativeness Estimation in the Arabic Language]]></title>
            <description><![CDATA[Not all words contribute equally to a document's meaning. Term informativeness estimation assigns a score to each word or phrase based on how much unique content it carries — using TF-IDF variants, semantic similarity, and corpus-level statistics. We apply these methods to Arabic text, where morphological richness makes term boundaries harder to define.]]></description>
            <link>https://mohammadshaker.com/en/blog/term-informativeness-estimation-in-the-arabic-language</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/term-informativeness-estimation-in-the-arabic-language</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[What Makes an Article Informative - And How Computers Can Measure Informativity of a Text Content]]></title>
            <description><![CDATA[An informative article is easy to recognize but hard to define computationally. Measurable proxies include readability score, skimmability via header and list structure, content-to-function word ratio, cliche density, and term-level informativeness. We explore how to combine these features into a quantitative informativeness signal for Arabic news.]]></description>
            <link>https://mohammadshaker.com/en/blog/what-makes-an-article-informative-and-how-computers-can-measure-informativity-of</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/what-makes-an-article-informative-and-how-computers-can-measure-informativity-of</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[ideas]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sat, 18 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Google's AutoML Overview]]></title>
            <description><![CDATA[Google AutoML lets teams with limited ML expertise build custom models for text classification, translation, and image recognition. Its main strength is automating architecture search and training. Its key limitation for Arabic NLP is that several services are English-first, with variable quality on other languages.]]></description>
            <link>https://mohammadshaker.com/en/blog/googles-automl-overview</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/googles-automl-overview</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Thu, 16 Jan 2020 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[How to Fact-Check using Natural Language Processing Techniques? A Literature Review]]></title>
            <description><![CDATA[Automated fact-checking with NLP breaks into three sub-tasks: claim detection, evidence retrieval, and verdict prediction. Closed-source tools like Full Fact cover well-known claims; open research systems use knowledge graphs, stance classifiers, and claim verification models. We survey the landscape and explain what each approach can and cannot handle.]]></description>
            <link>https://mohammadshaker.com/en/blog/how-to-fact-check-using-natural-language-processing-techniques-a-literature-revi</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/how-to-fact-check-using-natural-language-processing-techniques-a-literature-revi</guid>
            <category><![CDATA[almeta.io]]></category>
            <category><![CDATA[fact-check]]></category>
            <category><![CDATA[nlp]]></category>
            <category><![CDATA[research]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Tue, 08 Oct 2019 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Event Detection in Media using NLP and AI]]></title>
            <description><![CDATA[Event detection in NLP automatically identifies real-world occurrences in news text — who did what, where, and when. Document-level approaches cluster articles by topic; sentence-level approaches extract ACE-style event triggers and arguments. Both are foundational for news aggregators, fact-checking systems, and media monitoring pipelines.]]></description>
            <link>https://mohammadshaker.com/en/blog/event-detection-in-media-using-nlp-and-ai</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/event-detection-in-media-using-nlp-and-ai</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[ml]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Mon, 30 Sep 2019 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Top 3 Exciting Ideas in NLP in 2018]]></title>
            <description><![CDATA[Three NLP ideas from 2018 reshaped the field: BERT's bidirectional pre-training, which conditions on context from both directions; the SWAG benchmark for commonsense reasoning, with 113k multiple-choice questions; and LISA, a model that jointly learns syntactic structure and semantic role labeling in a single pass. Each attacked a different gap between machine and human language understanding.]]></description>
            <link>https://mohammadshaker.com/en/blog/top-3-exciting-ideas-in-nlp-in-2018</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/top-3-exciting-ideas-in-nlp-in-2018</guid>
            <category><![CDATA[ideas]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Sun, 04 Aug 2019 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Biggest Challenges in Arabic Natural Language Processing]]></title>
            <description><![CDATA[Arabic NLP is harder than most languages for three compounding reasons: rich morphology produces thousands of word forms from a single root, dialectal variation across 26 countries means Modern Standard Arabic models often fail on colloquial text, and annotated training data remains scarce compared to English. Each challenge amplifies the others.]]></description>
            <link>https://mohammadshaker.com/en/blog/biggest-challenges-in-arabic-natural-language-processing</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/biggest-challenges-in-arabic-natural-language-processing</guid>
            <category><![CDATA[arabic]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Thu, 01 Aug 2019 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[4 Biggest Open Problems in NLP]]></title>
            <description><![CDATA[The four biggest open problems in NLP are natural language understanding, ambiguity resolution, training data scarcity, and semantic meaning extraction. Ambiguity alone covers lexical, syntactic, and referential confusion that models still struggle with. Despite LLM advances, these challenges remain unsolved research frontiers with no clean algorithmic fix.]]></description>
            <link>https://mohammadshaker.com/en/blog/4-biggest-open-problems-in-nlp</link>
            <guid isPermaLink="true">https://mohammadshaker.com/en/blog/4-biggest-open-problems-in-nlp</guid>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Mohammad Shaker]]></dc:creator>
            <pubDate>Fri, 26 Jul 2019 00:00:00 GMT</pubDate>
        </item>
    </channel>
</rss>