What is TF-IDF?

TF-IDF stands for term frequency – inverse document frequency. It is a weighting scheme used in information retrieval that assigns a weight to each word on a page. To do this, a search engine can use the following formula:

(Term Frequency / Total # of Terms) * log(Total # of Documents / Documents with Term)

Here are some examples of how that weighting might look, using a base-10 logarithm:

  • (13/1500) * log(1,000,000,000/1,000,000) = 0.026
  • (22/1000) * log(1,000,000,000/500,000) = 0.073
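The two examples above can be reproduced with a minimal Python sketch. It assumes a base-10 logarithm, which is what the figures above imply. The function name `tf_idf` and its parameter names are illustrative only and are not taken from any particular search engine.

```python
import math

def tf_idf(term_count, total_terms, total_docs, docs_with_term):
    """Weight of one term on one page, using the formula above.

    Assumes a base-10 logarithm, which matches the worked examples.
    """
    tf = term_count / total_terms                   # term frequency
    idf = math.log10(total_docs / docs_with_term)   # inverse document frequency
    return tf * idf

first = tf_idf(13, 1500, 1_000_000_000, 1_000_000)
second = tf_idf(22, 1000, 1_000_000_000, 500_000)
print(round(first, 3), round(second, 3))
```

The second page scores higher for two reasons: its term makes up a larger share of the page, and that term appears in fewer documents overall.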

This formula favors content (lexical) words and reduces the weight of function words. Examples of function words include ‘the’, ‘and’, ‘him’, ‘or’, ‘her’.

These words are a mix of conjunctions, prepositions, articles, and pronouns. Because they appear in almost every document, their inverse document frequency is close to zero, so TF-IDF reduces their weight. Conversely, words that appear in few documents receive the highest weights.
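To see why common words lose their weight, here is a small sketch of the inverse document frequency part on its own. The three-page corpus is invented for illustration, and the helper name `idf` is hypothetical. It assumes a base-10 logarithm and that the term appears in at least one document.

```python
import math

# Hypothetical three-page corpus, invented for illustration.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the parrot learned the alphabet",
]

def idf(term, documents):
    """Base-10 inverse document frequency of a term across the documents.

    Assumes the term occurs in at least one document.
    """
    containing = sum(1 for d in documents if term in d.split())
    return math.log10(len(documents) / containing)

idf_the = idf("the", docs)        # appears on all 3 pages: log10(3/3) = 0
idf_cat = idf("cat", docs)        # appears on 2 of 3 pages
idf_parrot = idf("parrot", docs)  # appears on 1 of 3 pages
print(idf_the, idf_cat, idf_parrot)
```

Because ‘the’ is on every page, its inverse document frequency is zero, so its TF-IDF weight is zero no matter how often it is repeated.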

TF-IDF Scatter Graph

How can I check TF-IDF?

To check TF-IDF on your website, you may wish to use a free tool such as Wordcounter. This tool will show you the most commonly used words for a given page.

The column you’re looking for is titled ‘non-common keywords’. These are the words that show up least across the web.

By checking manually, you can quickly see which terms your page leans toward.
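A rough version of this manual check can also be scripted. The sketch below is not how Wordcounter itself works. It simply counts the words on a page and skips a short, hand-picked list of function words. The names `FUNCTION_WORDS` and `top_terms` and the sample text are assumptions made for illustration.

```python
import re
from collections import Counter

# Small, hand-picked list of function words; a real list would be much longer.
FUNCTION_WORDS = {"the", "and", "or", "him", "her", "a", "of", "to", "in", "is"}

def top_terms(text, n=3):
    """Return the n most frequent words in text, ignoring function words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in FUNCTION_WORDS)
    return counts.most_common(n)

# Invented sample page text.
page = "The espresso machine and the grinder: grinder settings decide the espresso."
result = top_terms(page, n=2)
print(result)
```

The words that remain at the top of the list are the terms the page is most strongly associated with.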

Using Wordcounter for TF-IDF

How can I improve my TF-IDF?

The best way to improve TF-IDF is to create meaningful content that is concise and interesting. To do so, avoid needless repetition and look for relevant terms that your page doesn’t yet mention.

Shorter content raises the term frequency, because the same number of mentions is divided by fewer total terms, while rarer terms raise the inverse document frequency. Combined, these will give you plenty of valuable descriptive words for each page.
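Both effects can be checked with a short sketch. All figures here are hypothetical, and a base-10 logarithm is assumed, as in the worked examples earlier in the article.

```python
import math

def tf_idf(term_count, total_terms, total_docs, docs_with_term):
    # Same formula as earlier in the article, with a base-10 logarithm (assumed).
    return (term_count / total_terms) * math.log10(total_docs / docs_with_term)

# Hypothetical figures: a term used 10 times, in a web of 1,000,000 pages.
long_page = tf_idf(10, 2000, 1_000_000, 10_000)   # 2,000-word page
short_page = tf_idf(10, 1000, 1_000_000, 10_000)  # same mentions, half the words
rare_term = tf_idf(10, 2000, 1_000_000, 100)      # term found on far fewer pages
print(long_page, short_page, rare_term)
```

Halving the page length doubles the term frequency, and choosing a term found on fewer pages raises the inverse document frequency. Either change lifts the weight above that of the long page.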

Two useful tools for this task are the online Thesaurus and Dictionary. They can help you discover related words within the same field to inspire your content.

All of these are considered part of keyword research for your website.

Using a Thesaurus for Keyword Research