At some point if you write a blog, you will get a nagging feeling that you ought to be doing more about SEO, or Search Engine Optimisation. Or the plugin you use round the back end of the blog to monitor the situation will start nagging you about it, which is what generally happens in my case.
Search Engine Optimisation is how you explain to search engines (well, Google really, let’s face it) what your blog post is about and why they should put it number one in the list of any results when someone types in a query related to your topic.
And this has a lot to do with what discourse analysis says about aspects of how texts hang together. And the limitations of computing. Or possibly the limitations of people’s understanding of computing and/or discourse. I haven’t quite decided which yet.
Now, obviously we are all aware that texts have structure. Introductions, arguments for, arguments against, conclusion. Sort of thing.
But we also use words to tie it all together, which is called lexical cohesion.
One way we do that is via referencing. ‘Boris Johnson’ will not always appear as ‘Boris Johnson’ in a 2025 newspaper article about the last Prime Minister of a United Kingdom. He might become ‘he’.
But he may well also be ‘Boris’, ‘Johnson’, ‘the Prime Minister’, ‘that shambolic figure’, ‘the Churchill wannabe’ and ‘the man of ruthless self-interest’. Which is an example of elegant variation in a lexical chain of words and phrases all really referring to the same thing. It is considered good style and is a feature of the way we recognise a text as a text and not a list of random sentences.
One of the things linguists have been able to do with the advent of computing is feed texts into a computer to make a bank of language samples called a corpus. They can then analyse this corpus for patterns of language use with far greater statistical authority than some old English geezer with a quill pen using his intuition and a few interviews to work out which words we actually use and what they mean in order to write them all down in the first dictionary.
And one of the things you can look at using this sort of software is how the words of a text tell you what it is about.
(I know! The revelation! Bear with me!)
This has led (via a few other steps) to the somewhat depressing conclusion that in order to understand comfortably any given English language text, you need to know in the region of the most frequent 16 000 words*. Just to give you some idea of why this is somewhat horrifying for a learner of English, survival level, a sort of everyday vocabulary for getting about, is said to be 2 – 3 000 words and once you reach Upper Intermediate level, which is pretty jolly competent really, you might expect to know 10 000. For a given value of know. For a given value of words too.
So an extra 6 000 is quite a big ask, especially as there are some complicating factors which I am not going to get drawn into here.
Why has it led to this conclusion?
Well, a large proportion of a text, a very large proportion, will be made up of very common words. But the final few percent are the words that are specific to the topic of the text. And even in non-specialist texts, this will probably involve a whole bunch of words that are not at all frequent, words that are connected to the topic at hand. Words that, if you broke the text down into its component parts, you might be able to categorise as ‘verbs to describe ways of eating’, ‘adjectives to describe smell, taste and food texture’, ‘different ways of referring to cooking techniques’, and ‘lots of synonyms or near synonyms for “cod”, “fried potatoes” and “eating with your fingers”’. Until you could confidently say without actually reading it from start to finish that this text is probably about a foodstuff called fish and chips.
Which, of course, is what Google’s search engines (and similar) do when they are trying to work out what your online text is about in preparation for serving it up when someone types ‘fish and chips’ in the search bar.
More or less.
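For the curious, the general idea can be sketched in a few lines of Python: count the words, throw away the very common ones, and see what is left. This is only a toy under stated assumptions. The tiny COMMON_WORDS set, the topic_words function and the sample sentences are all invented for illustration (a real corpus tool would use a frequency list drawn from millions of words), and it is certainly not a description of what Google actually does.

```python
import re
from collections import Counter

# Hypothetical stand-in for a list of very frequent English words.
# A real frequency list would be drawn from a large corpus.
COMMON_WORDS = {
    "the", "a", "an", "and", "of", "in", "on", "with", "was", "were",
    "is", "it", "we", "our", "to", "from", "then", "until",
}


def topic_words(text, top_n=5):
    """Return the most frequent words in text that are not in COMMON_WORDS."""
    # Lower-case the text and split it into runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Count only the words that are not on the common-word list.
    counts = Counter(w for w in words if w not in COMMON_WORDS)
    return counts.most_common(top_n)


# Invented sample text, purely for illustration.
sample = (
    "We ate the cod with our fingers. The cod was battered and the chips "
    "were crispy. Then we ate the chips and the cod until it was gone."
)

print(topic_words(sample, top_n=3))
```

Even with a word list this crude, ‘cod’ comes out on top, with ‘chips’ not far behind, which is roughly the guess a human skim-reader would make too.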
However, there is a lot of writing out there on the internet now. If you want your piece to compete with all of the many many lots of words about fish and chips you are going to have to make it stand up and SCREAM ‘my post is the most relevant’.
Preferably by using the exact fish and chip related phrase that people are googling for.
In the past, this meant breaking some of the offline text construction tendencies in that once you had settled on your keywords, you overdid the repetition element of lexical cohesion. Which is always a feature of texts, because after all there really are only so many ways you can call Boris Johnson an idiot before you run out of paraphrase. So a text about Boris Johnson will indeed contain a higher number of examples of the name Boris Johnson than you would expect in a text which is not about Boris Johnson.
But only to a certain degree. Normally.
Have you, in short, ever come across a post online and noticed that they seem to have rather overdone the term ‘learn about SEO and lexical cohesion’ by shoehorning that exact phrasing no less than three times into one short paragraph? And thereafter another twenty times with no variation at all for the rest of the piece?
It doesn’t help that such writers will also eschew ellipsis (If you want to learn about SEO and lexical cohesion this article will give you everything you need to learn about SEO and lexical cohesion) or substitution (If you have read a thousand articles where you learn about SEO and lexical cohesion, you may not want to read another article where you learn about SEO and lexical cohesion, rather than simply ‘another one’, but…).
Luckily for teachers trying to find authentic online texts to train our students on, this practice is starting to be considered outdated, presumably because either search engines are now a little more sophisticated, or the people who try to game them have become more sophisticated in their understanding of how they work.
Although my SEO plugin does still tell me off if I use the keyword I am trying to rank for fewer than X number of times for a given length of text, which generally results in me having to go back and do a bit of editing out of pronouns or synonyms.
I suspect there is still a bit of an advantage therefore in using your keywords more frequently than seems natural (to me), but I have not tried out what happens if I go overboard and use too many examples of the SEO keywords. Is there an upper limit according to Yoast? Might have to experiment.
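For anyone who would rather measure this than wait for a plugin to complain, a minimal sketch might look like the following. The function name keyword_density and the ‘exact matches per 100 words’ measure are assumptions made up for illustration, as are the two sample sentences. This is not a claim about the formula or the thresholds that Yoast or any other plugin uses.

```python
import re


def keyword_density(text, phrase):
    """Return (exact occurrences of phrase, occurrences per 100 words).

    Illustrative measure only; not any plugin's actual formula.
    """
    # Split both the text and the phrase into lower-case words.
    words = re.findall(r"[a-z']+", text.lower())
    target = re.findall(r"[a-z']+", phrase.lower())
    n = len(target)
    if not words or n == 0:
        return 0, 0.0
    # Slide a window of n words along the text and count exact matches.
    hits = sum(
        1 for i in range(len(words) - n + 1) if words[i:i + n] == target
    )
    return hits, 100 * hits / len(words)


# Invented examples: one keyword-stuffed, one using ordinary substitution.
stuffed = (
    "If you want to learn about SEO this article will help you "
    "learn about SEO because to learn about SEO is good."
)
natural = (
    "If you want to learn about SEO this article will help you "
    "do it because it is good."
)

for label, text in [("stuffed", stuffed), ("natural", natural)]:
    hits, per_100 = keyword_density(text, "learn about SEO")
    print(f"{label}: {hits} exact matches, {per_100:.1f} per 100 words")
```

The version that lets ‘do it’ stand in for the keyword scores one exact match where the stuffed version scores three, which is presumably the sort of thing that sends a writer back to edit out the pronouns.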
Of course, more interesting is what will happen if copywriters win and shift what is considered the optimal, correct, elegant or [insert your own adjective here] way to achieve lexical cohesion in a text. Certainly one wonders what keyword stuffing has done to corpus data in the meantime.
And of course, I have made a massive assumption that this practice should be considered ‘wrong’, which is always a dangerous thing to do when considering language use in the wild.
Have you ever noticed this phenomenon or is it less glaringly obvious than I think it is? Would it not bother you if it became the norm or are you relieved it is on the wane?
*I got this from From Corpus to Classroom: Language Use and Language Teaching by O’Keeffe, McCarthy and Carter.