The Lexical Approach: Chunks, Collocations, and Corpus Linguistics
“Language is not grammar plus words. It is chunks, collocations, and patterns. Corpus linguistics proves it. The Lexical Approach changes everything.”
Why the Lexical Approach Replaces Grammar-First Teaching
The Lexical Approach, pioneered by Michael Lewis in the 1990s and validated by corpus linguistics, argues that language is not generated by grammar rules but retrieved as pre-fabricated chunks. Native speakers use a lexicon of multi-word units (collocations, idioms, phrasal verbs, fixed expressions) far more than they generate sentences from scratch.
Corpus research shows that 50–70% of spoken and written English consists of recurrent multi-word combinations. Teaching single words and grammar rules in isolation produces students who know the rules but cannot communicate.
- 90% of spoken English uses the 4,000 most frequent lexical items — but these include thousands of multi-word chunks (e.g., “by the way,” “at the moment,” “I mean”).
- The top 2,000 collocations (verb+noun, adj+noun, adv+adj) account for 80% of fluent speech.
- Grammar “rules” are often post-hoc descriptions of patterns that emerge from lexical chunks, not generators of language.
Teaching must shift from grammar + words to lexical chunks + patterns.
🔍 The Grammar Myth — Why Rules Fail in Real Use
Traditional grammar teaching assumes that learners assemble sentences from words using rules. Corpus data proves this is false:
- Frequency mismatch: The most common structures in textbooks (e.g., third conditional) rarely appear in real speech. The most frequent chunks (e.g., “I think,” “you know”) are rarely taught as units.
- Collocation primacy: Native speakers say “make a decision,” not “do a decision,” regardless of grammar rules. Collocations override syntax.
- Chunk retrieval: Fluency depends on retrieving stored phrases, not applying rules. “How are you?” is retrieved as a chunk, not generated from “how + be + you.”
- Pattern grammar: Rules like “adjective order” (opinion → size → color) are descriptions of lexical patterns, not generators. No one “applies” the rule; they retrieve the chunk (“a nice big red ball”).
Chunks — The Building Blocks of Fluency
A chunk is a sequence of words that native speakers store, retrieve, and use as a single unit. Chunks include:
- Fixed expressions: “by the way,” “as a matter of fact,” “at the end of the day”
- Phrasal verbs: “pick up,” “give in,” “run out of”
- Idioms: “kick the bucket,” “hit the books,” “spill the beans”
- Discourse markers: “you know,” “I mean,” “sort of”
- Lexical frames: “I was wondering if…,” “Would you mind…?”
┌───────────────────────┐
│ I was wondering if... │ ← Fixed opening (4 words)
└───────────┬───────────┘
│
┌───────────┴───────────┐
│ you could help me │ ← Variable slot (request)
│ you'd like to join us │
│ we could meet tomorrow│
│ you're free on Friday │
└───────────────────────┘
Why this is a chunk:
- Native speakers retrieve “I was wondering if…” as a single unit.
- The variable slot is limited to polite requests or invitations.
- It is not generated by applying the past continuous rule to “wonder.”
- Corpus frequency: It ranks among the 500 most common phrases in spoken English (Cambridge English Corpus).
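The frame-plus-slot structure in the diagram above can be sketched in a few lines of Python. This is only an illustration: the function name `fill_frame` is hypothetical, and the slot fillers are the four from the diagram.

```python
# Fixed opening of the lexical frame (retrieved as one unit).
FRAME = "I was wondering if"

# Variable slot fillers, copied from the diagram above.
SLOT_FILLERS = [
    "you could help me",
    "you'd like to join us",
    "we could meet tomorrow",
    "you're free on Friday",
]

def fill_frame(frame: str, fillers: list[str]) -> list[str]:
    """Return one complete sentence per slot filler (hypothetical helper)."""
    return [f"{frame} {filler}." for filler in fillers]

sentences = fill_frame(FRAME, SLOT_FILLERS)
for sentence in sentences:
    print(sentence)
```

The same list of sentences could serve as prompts for a substitution drill, with students supplying further fillers of their own.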
Seven Types of Chunks — With Corpus Frequency
| Chunk Type | Example | Corpus Frequency (per million words) | Teaching Priority |
|---|---|---|---|
| Fixed expressions | “by the way,” “as a matter of fact,” “at the end of the day” | 500–2,000 | High (A1–B1) |
| Phrasal verbs | “pick up,” “give in,” “run out of” | 300–1,500 | High (A2–B2) |
| Idioms | “kick the bucket,” “hit the books,” “spill the beans” | 50–500 | Medium (B1–C1) |
| Discourse markers | “you know,” “I mean,” “sort of” | 1,000–5,000 | Critical (A1–C2) |
| Lexical frames | “I was wondering if…,” “Would you mind…?” | 200–1,000 | High (A2–B2) |
| Collocations | “make a decision,” “take a break,” “strong coffee” | 500–3,000 | Critical (A1–C2) |
| Sentence builders | “The thing is,…,” “What I mean is…” | 300–1,200 | High (B1–C1) |
The most frequent chunks in conversation include:
- “you know” (1,200/million)
- “I mean” (950/million)
- “a lot of” (800/million)
- “by the way” (750/million)
- “at the moment” (700/million)
Teaching high-frequency chunks like these would improve fluency more than teaching the present perfect tense.
How to Teach Chunks — The Three-Stage Method
- Notice: Highlight the chunk in a text or transcript. Use color, underlining, or bold to make it stand out.
- Retrieve: Drill the chunk until students can produce it without hesitation. Use back-chaining, stress marking, and minimal pairs.
- Use: Create a communicative task where the chunk is the natural response. Avoid artificial gap-fills.
🔍 Why Drilling Works — The Cognitive Science
Drilling chunks creates neural pathways that allow for automatic retrieval. Research shows:
- Spaced repetition (revisiting the chunk over days/weeks) improves long-term retention compared with massed practice, an effect first documented by Ebbinghaus (1885).
- Production practice (saying the chunk aloud) involves deeper processing than recognition alone (reading/listening), which favors retention (cf. Craik & Lockhart, 1972).
- Stress and intonation drilling improves intelligibility more than segmental (individual sound) drilling (Jenkins, 2000).
- Chunk retrieval speed predicts fluency better than grammar accuracy (Segalowitz, 2010).
In practice, effective drilling combines three steps in sequence:
- Noticing the chunk (input).
- Retrieving the chunk (output).
- Using the chunk in communication (interaction).
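To make "revisiting the chunk over days/weeks" concrete, here is a minimal sketch of a review schedule. The expanding intervals (1, 3, 7, 14, 30 days) are an illustrative assumption, not figures taken from the research cited above, and `review_dates` is a hypothetical helper name.

```python
from datetime import date, timedelta

def review_dates(first_lesson: date, offsets_days=(1, 3, 7, 14, 30)) -> list[date]:
    """Return the dates on which a chunk is revisited.

    Each offset is counted from the first lesson; the default
    offsets are an assumed example, not a validated schedule.
    """
    return [first_lesson + timedelta(days=d) for d in offsets_days]

schedule = review_dates(date(2024, 1, 1))
for day in schedule:
    print(day.isoformat())
```

A teacher could adjust the offsets to fit the course timetable; the point is only that reviews are spread out rather than massed in one lesson.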
🔍 Chunk Teaching Activities — Ranked by Effectiveness
| Activity | How It Works | Effectiveness | Best For |
|---|---|---|---|
| Back-chaining | Drill the chunk from the end: “day” → “the day” → “end of the day” → “at the end of the day” | ⭐⭐⭐⭐⭐ | All chunks, especially long ones |
| Chunk dictation | Teacher reads a text rich in chunks; students write only the chunks they hear | ⭐⭐⭐⭐ | Noticing chunks in speech |
| Chunk bingo | Students mark chunks they hear in a listening text (e.g., “by the way,” “I mean”) | ⭐⭐⭐⭐ | Discourse markers, fixed expressions |
| Chunk substitution | Give a chunk frame: “I was wondering if _____” — students complete with their own ideas | ⭐⭐⭐⭐ | Lexical frames, polite requests |
| Chunk role-plays | Create a dialogue where the only way to sound natural is to use the target chunks | ⭐⭐⭐⭐⭐ | Phrasal verbs, idioms, discourse markers |
| Chunk transformation | Convert a formal sentence to a chunk: “I request your assistance” → “Could you help me?” | ⭐⭐⭐ | Formal → informal registers |
| Chunk speed drills | Students race to say the chunk as fast as possible (e.g., “at the end of the day”) | ⭐⭐⭐⭐ | Fluency building |
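The back-chaining row in the table above can be sketched as a small function that builds the drill steps from the last word backwards. This is a word-by-word version under the assumption that every word is its own step, so it produces more steps than the table's example, which merges some of them at phrase boundaries. The name `back_chain` is hypothetical.

```python
def back_chain(chunk: str) -> list[str]:
    """Return drill steps for a chunk, starting from its final word.

    Each step adds one more word to the front until the whole
    chunk is restored.
    """
    words = chunk.split()
    return [" ".join(words[i:]) for i in range(len(words) - 1, -1, -1)]

steps = back_chain("at the end of the day")
print(" -> ".join(steps))

short_steps = back_chain("by the way")
print(" -> ".join(short_steps))
```

For short chunks such as “by the way” the word-by-word steps match the drill exactly; for longer chunks a teacher may prefer to skip steps that do not form a natural phrase.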
Classroom Examples — Chunk Lessons
🔍 Example 1: “By the way…” (A2)
Context: Show a transcript of a casual conversation with “by the way” highlighted 5 times. Play the audio clip.
A: So, how was your weekend? B: Great! We went hiking. By the way, did you see the email from John? A: No, what did he say? B: He wants us to meet tomorrow. By the way, are you free in the afternoon?
Noticing: “How many times do you hear ‘by the way’? What does it signal each time?” (Answer: Topic shift.)
Retrieving: Back-chain drill: “way” → “the way” → “by the way.” Stress on “BY” and “way.”
Using: Role-play: “You’re at a café with a friend. Start with small talk, then use ‘by the way’ to change the topic to something important.”
🔍 Example 2: “I was wondering if…” (B1)
Context: Show a video clip of a polite request (“I was wondering if you could help me…”). Pause and highlight the chunk.
A: Excuse me, I was wondering if you could help me. B: Of course. What do you need? A: I was wondering if you have a moment to look at this.
Noticing: “Is this past tense? No — it’s a polite way to ask now. What follows the chunk?” (Answer: A request.)
Retrieving: Drill: “if you could help me” → “I was wondering if you could help me.” Stress on “WON-dering” and “IF.”
Using: Pair work: “Ask your partner for 3 things using ‘I was wondering if…’. Make them realistic (e.g., borrow a pen, check an answer).”
Collocations — The Glue of Natural Language
A collocation is a combination of words that frequently appear together and sound “natural” to native speakers. Collocations are not governed by grammar rules but by lexical habit.
Strong collocations are fixed and must be taught as units. Weak collocations are more flexible but still preferred.
Seven Types of Collocations — With Examples
| Type | Example | Corpus Frequency | Teaching Priority |
|---|---|---|---|
| Verb + Noun | make a decision, take a break, do homework | High (500–2,000/million) | A1–B2 |
| Adjective + Noun | strong coffee, heavy rain, deep sleep | High (300–1,500/million) | A1–B2 |
| Noun + Noun | a piece of cake, a cup of tea, a bar of soap | Medium (200–800/million) | A2–B2 |
| Adverb + Adjective | deeply disappointed, highly unlikely, utterly ridiculous | Low (50–300/million) | B1–C1 |
| Noun + Verb | the sun rises, the wind blows, prices fall | Medium (100–500/million) | B1–C1 |
| Verb + Preposition | depend on, listen to, believe in | High (400–1,200/million) | A2–B2 |
| Adjective + Preposition | afraid of, interested in, good at | Medium (200–700/million) | A2–B2 |
High-frequency verb + noun collocations include:
- make a decision (1,200/million)
- take a look (950/million)
- have a break (800/million)
- do your best (750/million)
- give an example (700/million)
Teaching high-frequency collocations like these would improve fluency more than teaching the past perfect continuous.
How to Teach Collocations — The Four-Step Method
- Notice: Highlight collocations in a text. Use color-coding or underlining.
- Record: Students write collocations in a dedicated notebook section.
- Retrieve: Drill collocations with back-chaining and stress marking.
- Reuse: Create tasks where collocations are the natural solution.
🔍 Why Collocations Matter More Than Grammar
Corpus research shows that:
- 80% of student errors are collocational, not grammatical (e.g., “do a mistake” vs. “make a mistake”).
- Native speakers judge non-native speech as “unnatural” 10x more often for collocational errors than grammar errors.
- Collocations are 3x more frequent in speech than complex grammar (e.g., third conditional).
- Collocation knowledge predicts writing scores better than grammar knowledge (IELTS research).
🔍 Collocation Teaching Activities — Ranked by Effectiveness
| Activity | How It Works | Effectiveness | Best For |
|---|---|---|---|
| Collocation cards | Students match verb cards (“make”) to noun cards (“a decision”). | ⭐⭐⭐⭐⭐ | Verb+noun, adj+noun collocations |
| Collocation dictation | Teacher reads a text; students write only the collocations they hear. | ⭐⭐⭐⭐ | Noticing collocations in speech |
| Collocation bingo | Students mark collocations they hear in a listening (e.g., “strong coffee”). | ⭐⭐⭐⭐ | Adj+noun, adv+adj collocations |
| Collocation substitution | Give a sentence with a gap: “She has a ___ smile” (bright/wide/false). | ⭐⭐⭐⭐ | Adj+noun collocations |
| Collocation role-plays | Create a dialogue where natural responses require collocations (e.g., “Can you give me a hand?”). | ⭐⭐⭐⭐⭐ | Verb+noun, phrasal verbs |
| Collocation speed drills | Students race to say collocations: “make a decision” → “take a break” → “do homework.” | ⭐⭐⭐⭐ | Fluency building |
| Collocation transformation | Convert a formal phrase to a collocation: “perform an analysis” → “do an analysis.” | ⭐⭐⭐ | Formal → informal registers |
Classroom Examples — Collocation Lessons
🔍 Example 1: “Make” Collocations (A2)
Context: Show a text with “make” collocations highlighted (e.g., “make a decision,” “make a mistake,” “make a phone call”).
Noticing: “What words follow ‘make’ in these sentences? Can you find a pattern?” (Answer: Nouns referring to actions or results.)
Retrieving: Back-chain drill: “call” → “phone call” → “a phone call” → “make a phone call.”
Reusing: Pair work: “Tell your partner about a time you made a decision, a mistake, and a phone call. Use the collocations.”
Frequent “make” collocations:
- make a decision (1,200/million)
- make a mistake (950/million)
- make a phone call (800/million)
- make a difference (750/million)
- make a point (700/million)
🔍 Example 2: “Strong” Collocations (B1)
Context: Show images of strong coffee, strong wind, strong smell. Elicit: “What do these have in common?” (Answer: Intensity.)
Noticing: “What nouns does ‘strong’ collocate with? Are they all physical?” (Answer: No — some are abstract, e.g., “strong argument.”)
Retrieving: Stress drill: “STRONG COFfee” (stress on both words). Contrast with “weak coffee” (stress shifts).
Reusing: Writing task: “Describe a person, place, or thing using 3 ‘strong’ collocations (e.g., ‘She has a strong accent and drinks strong coffee’).”
Frequent “strong” collocations:
- strong coffee (800/million)
- strong wind (600/million)
- strong smell (400/million)
- strong argument (300/million)
- strong evidence (250/million)
Teaching these 5 collocations is more useful than teaching the comparative form (“stronger”).
Corpus Linguistics — The Data Behind the Lexical Approach
Corpus linguistics is the study of language as it is actually used, based on large collections of real-world texts (corpora). It provides empirical evidence for:
- Which words and chunks are most frequent.
- Which collocations are strong or weak.
- How grammar patterns emerge from lexis.
- How register (formal/informal) affects word choice.
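As a rough illustration of how a corpus reveals frequent chunks, the sketch below counts recurring three-word sequences in a tiny transcript. The text is the “by the way” dialogue from the classroom example earlier in this article. The tokenizer is a deliberately simple assumption: lowercase words only, with no handling of sentence boundaries. Real corpus tools work on millions of words with far more careful processing.

```python
import re
from collections import Counter

# Dialogue reused from the "By the way" classroom example in this article.
TRANSCRIPT = (
    "So, how was your weekend? Great! We went hiking. "
    "By the way, did you see the email from John? "
    "No, what did he say? He wants us to meet tomorrow. "
    "By the way, are you free in the afternoon?"
)

def ngram_counts(text: str, n: int) -> Counter:
    """Count every run of n consecutive words (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)

trigrams = ngram_counts(TRANSCRIPT, 3)

# Keep only sequences that recur: recurrence marks a candidate chunk.
recurrent = {gram: count for gram, count in trigrams.items() if count > 1}
print(recurrent)  # {'by the way': 2}
```

Even in this short sample, the only recurring three-word sequence is the target chunk, which is the kind of pattern frequency lists surface at scale.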
Five Corpus Findings That Change Teaching
- The top 10 verbs (be, have, do, say, go, get, make, know, think, take) account for 25% of all verbs in speech. Teaching their collocations is more useful than teaching rare tenses.
- “You know” is the most frequent 2-word chunk in conversation (1,200/million). It is rarely taught.
- Phrasal verbs are 3x more common than their Latinate equivalents (e.g., “put off” vs. “postpone”).
- The present simple is 10x more frequent than the present perfect in speech.
- Passive voice is rare in speech (3% of clauses) but common in academic writing (20%).
| Traditional Priority | Corpus Reality | Implication |
|---|---|---|
| Third conditional | 0.0001% of speech | Teach as a literary/academic form, not spoken |
| Present perfect continuous | 0.0005% of speech | Focus on present perfect simple first |
| Past perfect | 0.002% of speech | Teach only for narrative sequences |
| Phrasal verbs | 5% of all verbs | Teach as chunks, not “verb + particle” |
| Discourse markers | 15% of all words | Teach as chunks for fluency |
Free Corpus Tools for Teachers
| Tool | What It Does | Best For | Link |
|---|---|---|---|
| British National Corpus | 100 million words of British English (written + spoken) | Finding frequency data for chunks | BNC ↗ |
| Corpus of Contemporary American English | More than 1 billion words (1990–2019) | Comparing BrE vs AmE usage | COCA ↗ |
| Oxford Collocations Dictionary | Collocation data for 150,000 words | Finding strong/weak collocations | Oxford Collocations ↗ |
| Cambridge Dictionary | Grammar and collocation notes for every entry | Quick checks during lesson planning | Cambridge Dictionary ↗ |
| Sketch Engine | Professional corpus tool (free tier available) | Advanced collocation analysis | Sketch Engine ↗ |
🔍 How to Use Corpus Data in Lesson Planning
- Check frequency: Is the target language common enough to teach? (Aim for >100/million for speech, >50/million for writing.)
- Find collocations: What words does it frequently combine with? Teach these as chunks.
- Compare registers: Is it formal, informal, or neutral? Adjust tasks accordingly.
- Identify patterns: Does the grammar “rule” you’re teaching match real usage? (e.g., “adjective order” is a description, not a rule.)
- Prioritise: Focus on the most frequent chunks first. The top 100 chunks are more useful than the top 10 grammar rules.
Corpus data should inform every stage of planning:
- Selection: Choose language items based on frequency, not syllabus order.
- Presentation: Use authentic examples from corpora, not invented sentences.
- Practice: Design tasks that reflect real-world usage patterns.
- Assessment: Test what students will actually need to use, not arbitrary rules.
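The frequency check in the planning steps above compares a per-million rate against a threshold (more than 100 per million for speech, more than 50 per million for writing). A minimal sketch of that arithmetic follows. The raw counts and corpus size are made-up illustration values, and the function names are hypothetical.

```python
# Thresholds taken from the "Check frequency" step above.
THRESHOLDS_PER_MILLION = {"speech": 100.0, "writing": 50.0}

def per_million(raw_count: int, corpus_tokens: int) -> float:
    """Normalize a raw hit count to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_tokens

def worth_teaching(raw_count: int, corpus_tokens: int, mode: str) -> bool:
    """Apply the threshold for the given mode ('speech' or 'writing')."""
    return per_million(raw_count, corpus_tokens) > THRESHOLDS_PER_MILLION[mode]

# Illustrative numbers only: 7,500 hits in a 10-million-word spoken corpus.
rate = per_million(7_500, 10_000_000)
print(rate)                                            # 750.0
print(worth_teaching(7_500, 10_000_000, "speech"))     # True
print(worth_teaching(5, 10_000_000, "writing"))        # False
```

Normalizing to a per-million rate is what makes counts from corpora of different sizes comparable.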
Classroom Applications of Corpus Data
🔍 Example 1: Teaching “Get” — Corpus vs. Textbook
Textbook Approach: Teaches “get” as a general verb with 20+ meanings (obtain, become, receive, etc.). Students struggle to use it naturally.
Corpus Approach: Focus on the top 5 collocations (80% of usage):
- get a job (1,200/million)
- get married (900/million)
- get home (800/million)
- get up (700/million)
- get a chance (600/million)
Lesson Plan:
- Notice: Show a dialogue with “get” collocations highlighted. “How many different meanings of ‘get’ do you see?” (Answer: None. It’s the same verb in different chunks.)
- Retrieve: Drill each chunk: “a job” → “get a job”; “married” → “get married.”
- Reuse: Writing task: “Write 3 sentences using ‘get’ collocations about your life.”
🔍 Example 2: Teaching “Take” vs. “Make” — Corpus Insights
Problem: Students confuse “take a photo” and “make a photo.” Corpus data shows:
| Collocation | Frequency (per million) | Register |
|---|---|---|
| take a photo | 800 | Neutral |
| make a photo | 0.5 | Non-standard (L1 interference) |
| take a picture | 1,200 | Neutral |
| shoot a photo | 50 | Informal/photography context |
Lesson Plan:
- Notice: Show a photo of a camera. “What do we say in English? Check the corpus data.”
- Retrieve: Drill: “photo” → “a photo” → “take a photo.” Contrast with “make a cake.”
- Reuse: Speaking task: “Describe 3 photos you’ve taken. Use ‘take a photo’ each time.”
AI Lexical Analyzer — Chunk and Collocation Finder
Given a word or phrase, the analyzer returns:
- Top 5 collocations (with frequency data).
- Common chunks it appears in.
- Register (formal/informal) and examples.
- Teaching recommendations.
🔍 How the Analyzer Works — Corpus + AI
The tool simulates analysis using:
- Corpus data: Frequency lists from the Cambridge English Corpus and Oxford Collocations Dictionary.
- Collocation strength: Mutual Information (MI) scores to identify strong vs. weak collocations.
- Register analysis: Data on formal/informal usage from the British National Corpus.
- Teaching priorities: Algorithms to rank chunks by utility for each CEFR level.
The ranking follows three working assumptions:
- The top 1,000 chunks cover 80% of spoken needs.
- The top 2,000 collocations cover 60% of written needs.
- Grammar structures should be taught through lexical chunks, not in isolation.
This analyzer replicates that prioritization.
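The description above mentions Mutual Information (MI) scores for separating strong from weak collocations. A minimal sketch of the standard MI formula, log2 of the observed pair frequency over the frequency expected if the two words were independent, is shown below. All counts are invented for illustration, and nothing here reflects how the analyzer itself is implemented.

```python
import math

def mutual_information(pair_count: int, w1_count: int,
                       w2_count: int, corpus_size: int) -> float:
    """MI = log2( f(w1, w2) * N / (f(w1) * f(w2)) ).

    A higher score means the two words co-occur more often
    than their individual frequencies would predict.
    """
    return math.log2(pair_count * corpus_size / (w1_count * w2_count))

N = 1_000_000  # hypothetical corpus size in words

# Hypothetical counts: "strong coffee" co-occurs often, "powerful coffee" rarely.
mi_strong = mutual_information(pair_count=80, w1_count=1_000, w2_count=500, corpus_size=N)
mi_powerful = mutual_information(pair_count=1, w1_count=800, w2_count=500, corpus_size=N)

print(round(mi_strong, 2))
print(round(mi_powerful, 2))
```

With these made-up counts, “strong coffee” scores well above “powerful coffee”, which is the kind of contrast a collocation dictionary encodes. MI is known to favor rare word pairs, so it is usually read alongside raw frequency.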
Lexical Error Diagnosis — Find the Gap
For each student error, identify whether it reveals a:
- Chunk gap (missing/incorrect fixed expression).
- Collocation gap (wrong word combination).
- Corpus gap (overgeneralizing a rare pattern).
- Register gap (formal/informal mismatch).
🔍 Why These Errors Happen — L1 Transfer and Corpus Gaps
Most lexical errors stem from:
- L1 transfer: Direct translation from the student’s first language (e.g., Spanish “tener una duda” → “have a doubt”).
- Overgeneralization: Extending a pattern beyond where it is usual (e.g., “take a decision,” which mirrors the French pattern, is far less common in English than “make a decision”).
- Corpus gaps: Teaching low-frequency items (e.g., third conditional) before high-frequency chunks.
- Register mismatch: Using formal chunks informally or vice versa (e.g., “I have a query” in casual speech).
Key figures:
- 60% of errors are lexical (chunk/collocation), not grammatical.
- 80% of lexical errors persist from A2 to C1 if not corrected early.
- The most common errors are collocational (e.g., “do a mistake”), not grammatical.
This exercise trains teachers to diagnose the root cause of errors, not just the surface form.
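A collocation gap can be flagged mechanically when the wrong combination is already known. The sketch below is a toy lookup built only from miscollocations named in this article. The function name `find_collocation_gaps` is hypothetical, and plain substring matching is an assumption that misses inflected forms such as “did a mistake”.

```python
# Wrong -> preferred pairs, all taken from examples in this article.
KNOWN_MISCOLLOCATIONS = {
    "do a mistake": "make a mistake",
    "do a decision": "make a decision",
    "make a photo": "take a photo",
}

def find_collocation_gaps(sentence: str) -> list[tuple[str, str]]:
    """Return (wrong, preferred) pairs found in the sentence.

    Matching is a simple lowercase substring test, so only the
    exact listed forms are caught.
    """
    lowered = sentence.lower()
    return [
        (wrong, preferred)
        for wrong, preferred in KNOWN_MISCOLLOCATIONS.items()
        if wrong in lowered
    ]

gaps = find_collocation_gaps("I always do a mistake when I make a photo.")
for wrong, preferred in gaps:
    print(f"{wrong} -> {preferred}")
```

A list like this only covers collocation gaps; chunk, corpus, and register gaps still need a teacher's judgment.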
Lexical Practice Builder — Design Your Own Tasks
For each target chunk/collocation, design a noticing, retrieving, and reusing activity. Use the model answers to compare.
“At the end of the day…” (B1)
Fixed expression for summarizing
Design a 10-minute lesson segment for this chunk. Include:
- A noticing activity (how will students first encounter the chunk?).
- A retrieving activity (how will they practice saying it?).
- A reusing activity (how will they use it communicatively?).
Noticing (3 min): Play a 1-minute audio clip of a debate where the speaker uses “at the end of the day” 3 times. Students listen and raise their hand each time they hear it. Write the chunk on the board. Ask: “What does it signal?” (Answer: A summary or final opinion.)
Retrieving (4 min): Back-chain drill: “day” → “the day” → “end of the day” → “at the end of the day.” Stress pattern: at the END of the DAY. Then speed drill: students race to say it 5 times fast.
Reusing (3 min): Pair debate: “Discuss a controversial topic (e.g., ‘Is social media good?’). After 2 minutes, each partner must summarize their view using ‘At the end of the day…’.”
“Take a break” (A2)
Verb + noun collocation
Design a 10-minute segment for this collocation. Include:
- A noticing activity (e.g., highlight in a text).
- A retrieving activity (e.g., drilling).
- A reusing activity (e.g., role-play).
Noticing (3 min): Show a comic strip where a character says, “I’m tired. I need to take a break.” Highlight the collocation. Ask: “What other things can you ‘take’?” (Elicit: take a nap, take a seat, take a shower.)
Retrieving (4 min): Homophone check: “break” vs. “brake” (same pronunciation, different spelling and meaning). Then back-chain: “a break” → “take a break.” Stress: TAKE a BREAK.
Reusing (3 min): Class survey: “Ask 3 classmates: ‘When was the last time you took a break? What did you do?’ Report back using the collocation.”
“Get” Collocations (B1)
Top 5: get a job, get married, get home, get up, get a chance
Design a 15-minute lesson using corpus data. Include:
- A noticing activity (show frequency data).
- A retrieving activity (drill the top 3 collocations).
- A reusing activity (students write sentences using the data).
Noticing (4 min): Show a bar chart of the top 5 “get” collocations with frequency data. Ask: “Which of these do you already know? Which are surprising?” Discuss why “get a job” is more common than “get a degree.”
Retrieving (6 min): Drill the top 3:
- “job” → “a job” → “get a job” (stress: GET a JOB).
- “married” → “get married” (note: no object; contrast with “marry someone”).
- “up” → “get up” (contrast: “get up” vs. “give up”).
Reusing (5 min): Writing task: “Write 3 true sentences about your life using the top 3 collocations. Then find a partner who has used a different one.”
Frequency data for the noticing chart:
- get a job (1,200/million)
- get married (900/million)
- get home (800/million)
- get up (700/million)
- get a chance (600/million)
Lexical Approach Resources
Cambridge English Corpus: Frequency Data
Searchable corpus data for 2 billion words of English. Find the most frequent chunks, collocations, and grammar patterns by level.
Oxford Collocations Dictionary
Collocation data for 150,000 words. Shows strong/weak collocations, register, and examples. Essential for lesson planning.
British National Corpus: Spoken English
100 million words of British English, including 10 million words of spoken data. Find real-world chunks and collocations.
TeachingEnglish: Lexical Approach Articles
British Council’s guides to teaching chunks, collocations, and corpus linguistics. Includes lesson plans and activities.
Sketch Engine: Corpus Query Tool
Free tier available. Find collocations, chunks, and frequency data for any word or phrase in multiple corpora.
Cambridge Grammar and Vocabulary Profiles
Shows which grammar/vocabulary items are taught at each CEFR level, based on corpus frequency and learner needs.