What are NGrams?

An N-Gram is a contiguous sequence of characters, syllables, phonemes or words found in text or speech. When the units are words, some people refer to the sequences as shingles.

However, since Shingles is also the name for a rash, we’ll refer to them as NGrams or N-Grams. A sequence of two words is called a Bigram, a sequence of three words is called a Trigram, and so forth.
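To make the definition concrete, here is a minimal Python sketch that splits a sentence into word NGrams. The function name ngrams and the simple whitespace tokenisation are my own assumptions for illustration; real tools usually handle punctuation and casing more carefully.

```python
def ngrams(text, n):
    """Return the contiguous n-word sequences in text as a list of tuples."""
    # Assumption: lowercasing and splitting on whitespace is good enough here.
    words = text.lower().split()
    # Slide a window of n words across the text, one word at a time.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "fruit flies like a banana"
print(ngrams(sentence, 2))  # bigrams
print(ngrams(sentence, 3))  # trigrams
```

A five-word sentence gives four bigrams and three trigrams, because the window can only start where there are enough words left to fill it.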

While any combination of words may be considered a bigram or trigram, the objective is to identify common pairs or triplets. For example, “loved ones” is a bigram that frequently appears in content for the social care and pet care industries, but far less often in content about automobiles.
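Finding the common pairs is mostly a counting exercise. The sketch below is one possible way to do it with Python's standard library; the function name top_ngrams, the regular expression used to pick out words, and the sample text are all assumptions made up for this example.

```python
import re
from collections import Counter


def top_ngrams(text, n=2, k=3):
    """Return the k most frequent n-word phrases in text with their counts."""
    # Assumption: a word is a run of letters or straight apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Zip n staggered copies of the word list to build each n-word window.
    # Note: this simple version lets phrases run across sentence boundaries.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams).most_common(k)


sample = (
    "We care for your loved ones. Our carers treat loved ones like family, "
    "because loved ones deserve respect."
)
print(top_ngrams(sample, n=2, k=1))  # [('loved ones', 3)]
```

Every other bigram in the sample appears only once, so “loved ones” stands out straight away.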

These sequences of words are common in English and are a staple of Natural Language Processing (NLP) and Natural Language Understanding (NLU). In English, the surrounding words can change the meaning of a word even though the word itself stays the same. Here are some examples:

  • Time flies when you’re having fun.
  • Fruit flies like a banana.

In the first example, the word flies is used as a verb meaning to pass quickly. In the second example, flies is used as a noun referring to insects. We can therefore see that the surrounding words modify the meaning.

How can I check my NGrams?

Google kindly provides an NGram Viewer covering the books it has analysed, and it offers some great examples. For example, the name Albert is often found alongside the surname Einstein, but this behaviour only started around 1920.

Another great tool you can use is the Guide to Data Mining Analyser, which extracts the NGrams from a body of text. It is a quick way to find useful bigrams and trigrams without reading your competitor’s content. However, by skipping competitor research you will lose a lot of information about tone and audience.

Figure: Google's NGram Viewer

How can I improve my NGrams?

Since NGrams form a part of natural language processing, it makes sense to think about these terms naturally. Being familiar with the topic will most likely help you to use bigrams and trigrams effectively. Furthermore, spending the time to consider your audience and tone will improve your writing.

The temptation for search analysts is always to view content as a commodity, not as a service. The most common question is therefore: can I modify or improve the NGrams in my text?

If you read and research the content in the SERPs, you will often stumble across repeated phrases. These contiguous sequences can be as long as 4 – 5 words. Including these phrases in your copy can improve your chances of ranking.
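If you have already copied the text of a few ranking pages, you can look for these repeated phrases programmatically. The following is only a rough sketch under my own assumptions: shared_phrases is a hypothetical helper, the pages list is invented sample text rather than real SERP data, and fetching the pages is left out entirely.

```python
import re
from collections import Counter


def shared_phrases(pages, n=4, min_pages=2):
    """Return the n-word phrases that appear on at least min_pages pages."""
    page_counts = Counter()
    for page in pages:
        words = re.findall(r"[a-z0-9']+", page.lower())
        # Use a set so a phrase is counted once per page, however often
        # it is repeated within that page.
        grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        page_counts.update(grams)
    return sorted(p for p, count in page_counts.items() if count >= min_pages)


# Invented sample text standing in for the pages you have read.
pages = [
    "Our guide explains how to care for your loved ones at home.",
    "Learn how to care for your pets and family.",
    "Tips on how to care for your garden.",
]
print(shared_phrases(pages, n=5, min_pages=3))  # ['how to care for your']
```

Counting pages rather than raw occurrences is a deliberate choice here: a phrase repeated ten times on a single page says less about the SERP than a phrase that every page uses once.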

Another great way to improve the use of NGrams in your content is through Keyword Research. When you discover long-tail phrases that would make sense in your content, you should include them! I have found that including them 2 – 3 times often works best.

How many bigrams and trigrams should I use?

How frequently you should use a bigram or trigram will most likely depend on how specific the phrase is. Broad phrases may appear more frequently in the article. Specific expressions should be used 2 – 3 times per page.
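If you want to check how often a phrase already appears on a page, a quick count is enough. This is a small sketch with a hypothetical helper called phrase_count; it assumes you have the page copy as plain text and matches whole words only.

```python
import re


def phrase_count(text, phrase):
    """Count whole-word, case-insensitive occurrences of phrase in text."""
    # Allow any amount of whitespace between the words of the phrase.
    parts = [re.escape(word) for word in phrase.lower().split()]
    pattern = r"\b" + r"\s+".join(parts) + r"\b"
    return len(re.findall(pattern, text.lower()))


copy = "Loved ones matter. Your loved ones come first, and our beloved ones too."
print(phrase_count(copy, "loved ones"))  # 2
```

The word boundaries stop “beloved ones” from being counted as a match for “loved ones”.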

While there is no limit on your usage, it is uncommon in English to keep repeating the same phrase. An example of this is the use of pronouns in place of nouns that have already been mentioned.

In summary, you shouldn’t worry too much about NGram usage. However, you should consider whether your content sounds stale or repetitive. Furthermore, for peculiar expressions, limit yourself to 2 – 3 uses per web page.