Lexical Cohesion and SEO for Bloggers

If you write a blog, at some point you will get a nagging feeling that you ought to be doing more about SEO, or Search Engine Optimisation. Or the plugin you use round the back end of the blog to monitor the situation will start nagging you about it, which is what generally happens in my case.

Search Engine Optimisation is how you explain to search engines (well, Google really, let’s face it) what your blog post is about and why they should put it number one in the list of any results when someone types in a query related to your topic.

And this has a lot to do with what discourse analysis says about aspects of how texts hang together. And the limitations of computing. Or possibly the limitations of people’s understanding of computing and/or discourse. I haven’t quite decided which yet.

Now, obviously we are all aware that texts have structure. Introductions, arguments for, arguments against, conclusion. Sort of thing.

But we also use words to tie it all together, which is called lexical cohesion.

Image by Alicja from Pixabay

One way we do that is via referencing. ‘Boris Johnson’ will not always appear as ‘Boris Johnson’ in a 2025 newspaper article about the last Prime Minister of a United Kingdom. He might become ‘he’.

But he may well also be ‘Boris’, ‘Johnson’, ‘the Prime Minister’, ‘that shambolic figure’, ‘the Churchill wannabe’ and ‘the man of ruthless self interest’. Which is an example of elegant variation in a lexical chain of words and phrases all really referring to the same thing. It is considered good style and is a feature of the way we recognise a text as a text and not a list of random sentences.
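If you wanted to see a lexical chain laid out, a rough Python sketch might look like the one below. The list of referring expressions and the sample sentences are made up by me for illustration, and the matching is deliberately crude (a bare ‘he’ could of course refer to anyone), so this is a toy rather than how a discourse analyst would actually do it.

```python
import re

# Hand-picked expressions assumed to point at the same person.
# A real analysis would need a human (or far cleverer software) to decide this.
CHAIN = ["Boris Johnson", "Boris", "Johnson", "the Prime Minister", "he"]

def find_chain(text, expressions=CHAIN):
    """Return every chain expression found in the text, in order of appearance."""
    # Try longer expressions first so 'Boris Johnson' is not split in two.
    ordered = sorted(expressions, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(e) for e in ordered) + r")\b"
    return [m.group(0) for m in re.finditer(pattern, text, flags=re.IGNORECASE)]

sample = (
    "Boris Johnson spoke today. The Prime Minister said he was confident. "
    "Johnson then left."
)
print(find_chain(sample))
```

Four mentions, four different forms, one referent: that is elegant variation at work.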

One of the things linguists have been able to do with the advent of computing is feed texts into a computer to make a bank of language samples called a corpus. They can then analyse this corpus for patterns of language use with far greater statistical authority than some old English geezer with a quill pen using his intuition and a few interviews to work out which words we actually use and what they mean in order to write them all down in the first dictionary.

And one of the things you can look at using this sort of software is how the words of a text tell you what it is about.

(I know! The revelation! Bear with me!)

This has led (via a few other steps) to the somewhat depressing conclusion that in order to understand comfortably any given English language text, you need to know in the region of the most frequent 16 000 words*. Just to give you some idea of why this is somewhat horrifying for a learner of English, survival level, a sort of everyday vocabulary for getting about, is said to be 2 000 to 3 000 words and once you reach Upper Intermediate level, which is pretty jolly competent really, you might expect to know 10 000. For a given value of know. For a given value of words too.

So an extra 6 000 is quite a big ask, especially as there are some complicating factors which I am not going to get drawn into here.

Why has it led to this conclusion?

Well, a large proportion of a text, a very large proportion, will be made up of very common words. But the final few percent are the words that are specific to the topic of the text. And even in non-specialist texts, this will probably involve a whole bunch of words that are not at all frequent, words that are connected to the topic at hand. Words that, if you broke the text down into its component parts, you might be able to categorise as ‘verbs to describe ways of eating’, ‘adjectives to describe smell, taste and food texture’, ‘different ways of referring to cooking techniques’, and ‘lots of synonyms or near synonyms for “cod”, “fried potatoes” and “eating with your fingers”’. Until you could confidently say without actually reading it from start to finish that this text is probably about a foodstuff called fish and chips.
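For the curious, here is a minimal Python sketch of that idea. The tiny COMMON_WORDS list and the sample sentences are my own stand-ins (a real analysis would use a frequency list drawn from a large corpus, running to thousands of entries), so treat it as an illustration and not as how any actual corpus software works.

```python
import re
from collections import Counter

# Hypothetical stand-in for a 'most frequent words' list.
COMMON_WORDS = {
    "the", "a", "and", "of", "in", "with", "is", "it",
    "to", "your", "you", "then", "on",
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def topic_words(text, common=COMMON_WORDS):
    """Count the words that are NOT on the common-word list."""
    return Counter(w for w in tokenize(text) if w not in common)

def coverage(text, common=COMMON_WORDS):
    """Share of the running words accounted for by the common-word list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for w in tokens if w in common) / len(tokens)

sample = (
    "Dip the cod in the batter and fry it. Then fry the chips, "
    "sprinkle the chips with salt and eat the cod with your fingers."
)
print(topic_words(sample).most_common(3))  # the leftover, topic-flavoured words
print(round(coverage(sample), 2))          # how much the common words cover
```

Strip out the common words and what is left (cod, fry, chips, batter, salt, fingers) gives the game away without anyone having to read the thing.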

Which, of course, is what Google’s search engines (and similar) do when they are trying to work out what your online text is about in preparation for serving it up when someone types ‘fish and chips’ in the search bar.

More or less.

However, there is a lot of writing out there on the internet now. If you want your piece to compete with all of the many many lots of words about fish and chips you are going to have to make it stand up and SCREAM ‘my post is the most relevant’.

Preferably by using the exact fish and chip related phrase that people are googling for.

In the past, this meant breaking some of the offline text construction tendencies in that once you had settled on your keywords, you overdid the repetition element of lexical cohesion. Which is always a feature of texts, because after all there really are only so many ways you can call Boris Johnson an idiot before you run out of paraphrase. So a text about Boris Johnson will indeed contain a higher number of examples of the name Boris Johnson than you would expect in a text which is not about Boris Johnson.

But only to a certain degree. Normally.

Have you, in short, ever come across a post online and noticed that they seem to have rather overdone the term ‘learn about SEO and lexical cohesion’ by shoehorning that exact phrasing no less than three times into one short paragraph? And thereafter another twenty times with no variation at all for the rest of the piece?

It doesn’t help that such writers will also eschew ellipsis (‘If you want to learn about SEO and lexical cohesion this article will give you everything you need to learn about SEO and lexical cohesion’, where a plain ‘everything you need’ would do) or substitution (‘If you have read a thousand articles where you learn about SEO and lexical cohesion, you may not want to read another article where you learn about SEO and lexical cohesion, but…’, where ‘another one’ would do).

Luckily for teachers trying to find authentic online texts to train our students on, this practice is starting to be considered outdated, presumably because either search engines are now a little more sophisticated, or the people who try to game them have become more sophisticated in their understanding of how they work.

Although my SEO plugin does still tell me off if I use the keyword I am trying to rank for fewer than X number of times for a given length of text, which generally results in me having to go back and do a bit of editing out of pronouns or synonyms.
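I do not know exactly how my plugin does its sums, so the following Python sketch is only my guess at the general shape of such a check. The function names and the low and high thresholds are invented for illustration and are not taken from Yoast or any other real tool.

```python
import re

def count_phrase(text, phrase):
    """Count exact, case-insensitive, whole-word occurrences of a phrase."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))

def keyphrase_density(text, phrase):
    """Occurrences of the phrase per 100 running words."""
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    return 100 * count_phrase(text, phrase) / len(words)

def verdict(text, phrase, low=0.5, high=3.0):
    """Nag the way a plugin might; low and high are made-up thresholds."""
    density = keyphrase_density(text, phrase)
    if density < low:
        return "use the keyphrase more"
    if density > high:
        return "you may be overdoing it"
    return "fine"

post = "We love fish and chips. " + "It is lovely. " * 10
print(round(keyphrase_density(post, "fish and chips"), 2))
print(verdict(post, "fish and chips"))
```

Note that a check like this only counts the exact phrase, which is precisely why every ‘it’ or ‘the dish’ you write in place of the keyword counts against you.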

I suspect there is still a bit of an advantage therefore in using your keywords more frequently than seems natural (to me), but I have not tried out what happens if I go overboard and use too many examples of the SEO keywords. Is there an upper limit according to Yoast? Might have to experiment.

Of course, more interesting is what will happen if copywriters win and shift what is considered the optimal, correct, elegant or [insert your own adjective here] way to achieve lexical cohesion in a text. Certainly one wonders what keyword stuffing has done to corpus data in the meantime.

And of course, I have made a massive assumption that this practice should be considered ‘wrong’, which is always a dangerous thing to do when considering language use in the wild.

Have you ever noticed this phenomenon or is it less glaringly obvious than I think it is? Would it not bother you if it became the norm or are you relieved it is on the wane?

*I got this from From Corpus to Classroom: Language Use and Language Teaching by O’Keeffe, McCarthy and Carter.

Hello World

My name is Heather. I am a native Brit, and I have been an English as a Foreign Language teacher since 1996.

I started out volunteering as a teaching assistant in Moscow. Having discovered to my complete surprise I really liked teaching, I got qualified in EFL instruction, moved to Russia full time and never looked back. Except that time I taught History to teenagers.

Since then I have worked in both the UK and Russia, in private language schools and the state sector, as a teacher, an academic manager and a teacher trainer.

This blog isn’t really about that though. It’s my love letter to discourse analysis, social media and online communication.

Image by Free Photos from Pixabay

What discourse analysis is can be quite hard to define. Whole books have been written on the topic, but let’s have a stab, shall we?

Discourse analysis is the study of language at text level, with text being defined much more widely than neatly complete written articles in newspapers or whole novels. It’s a fairly interdisciplinary sort of field involving everyone from linguists, the language teaching profession, sociologists, anthropologists, to computer scientists trying to programme AI, and that’s not even an exhaustive list.

To me, it’s super interesting because it’s where sentences or words stop and communication begins. It’s about the choice of phrasing. What intonation does to the message. It’s about the aspects of language which cannot be described by a grammar reference book. And it’s about the nature of how we cope with trying to construct utterances in real time, and what happens when we can wield words with more consideration.

And it’s about why it all goes wrong and we have cut all ties with Auntie Vera because of the way she used ‘well’ on WhatsApp.

I will mostly be writing about whatever I have last been reading on the topic, possibly illustrated with stuff people have said on social media. I love online communication. I happen to think that because it is an interesting blend of spoken and written language, it has turned us all into discourse analysts. Moves that people might have got away with in ephemeral speaking get clocked much more easily by casual onlookers on the Internet. Plus, of course, some of the gloves are off in a medium which transcends the need to get along with your neighbour for the foreseeable future.

I wanted to call the blog WEAPONISING DISCOURSE ANALYSIS ON SOCIAL MEDIA, in fact.

But I couldn’t figure out how to fit that neatly into a URL. Or come up with a version of the name that would not get me an immediate reputation on Twitter.

So Those Sharp Words it is. Thanks to my friend who is much better at snappy titles than I am.

I am highly unlikely to have an original thought on this topic. I am not going to be doing formal discourse analysis myself. But I hope there are other people out there who find this as interesting as I do, and I am looking forward to connecting with them.