Complete career guide · United Kingdom

Become a Data Analyst
in 12 months

A structured, beginner-to-job-ready roadmap with projects, certifications, CV guidance, and model interview answers — built for the UK job market.

5 phases · 12 months · 30+ curated resources · 10 interview Q&As

The roadmap

Phase 1: Foundations (months 0–3)
Excel / Google Sheets · Basic statistics · SQL fundamentals · Python basics · pandas & numpy · Data cleaning

Start here regardless of background. Master pivot tables, VLOOKUP/XLOOKUP, and basic charting in Excel — most UK employers still test this. SQL is the single most important skill: learn SELECT, WHERE, GROUP BY, aggregates, and JOIN types. Build Python fluency with pandas and numpy. Understand basic statistics: mean vs median, distribution, correlation, and p-value intuition.
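The Phase 1 SQL patterns (SELECT, WHERE, GROUP BY, aggregates, JOIN) can be practised with nothing but Python's built-in sqlite3 module, so there is no database to install. A minimal sketch, with invented table names and figures:

```python
import sqlite3

# Invented toy data to practise SELECT / WHERE / GROUP BY / JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, region TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'London'), (2, 'Manchester');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 210.0);
""")

# Revenue and order count per region, for orders over £50
rows = conn.execute("""
    SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.amount > 50
    GROUP BY c.region
    ORDER BY total DESC;
""").fetchall()
print(rows)  # → [('Manchester', 1, 210.0), ('London', 2, 200.0)]
```

Typing small queries like this against data you invented yourself is the fastest way to make GROUP BY and JOIN behaviour stick.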

Phase 2: Core tools (months 3–6)
Power BI · Tableau · Advanced SQL · Window functions · Python viz · Git & GitHub · Data warehousing

Power BI is the dominant BI tool in UK enterprises and the NHS — prioritise it. For advanced SQL, master window functions (ROW_NUMBER, LAG, RANK), CTEs, and subqueries. Extend your Python skills to matplotlib and seaborn. Set up a GitHub repo and commit your notebooks. Learn star schema and dimensional modelling concepts.
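Window functions are easy to experiment with locally as well, since SQLite has supported them since version 3.25. A sketch of a CTE plus LAG() computing month-on-month change, using an invented table:

```python
import sqlite3

# Invented monthly figures; LAG() needs SQLite 3.25+ (2018 onwards).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_sales (month TEXT, revenue REAL);
    INSERT INTO monthly_sales VALUES
        ('2025-01', 100.0), ('2025-02', 120.0), ('2025-03', 90.0);
""")

# CTE + LAG(): month-on-month revenue change
rows = conn.execute("""
    WITH with_prev AS (
        SELECT month,
               revenue,
               LAG(revenue) OVER (ORDER BY month) AS prev_revenue
        FROM monthly_sales
    )
    SELECT month, revenue - prev_revenue AS mom_change
    FROM with_prev
    ORDER BY month;
""").fetchall()
print(rows)  # → [('2025-01', None), ('2025-02', 20.0), ('2025-03', -30.0)]
```

Note that the first month has no previous row, so its change comes back as NULL (None in Python) rather than zero, which is exactly the behaviour interviewers expect you to know.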

Phase 3: Projects & certifications (months 6–9)
UK public datasets · Dashboard design · End-to-end analysis · Clustering / ML-lite · Kaggle · GitHub portfolio

Build three portfolio projects using real UK public data (ONS, data.gov.uk, NHS Digital, HM Land Registry). Earn the Google Data Analytics Certificate, Microsoft PL-300, and optionally the dbt Analytics Engineer certification. Enter Kaggle competitions to benchmark your skills against peers.

Phase 4: CV, LinkedIn & job search (months 9–11)
ATS-optimised CV · LinkedIn profile · Job boards · Networking · Salary research

UK CVs: 1–2 pages, no photo, no DOB. Lead with a 3-line summary, then a skills section with ATS keywords (SQL, Python, Power BI, DAX, ETL, KPI). Quantify every bullet. On LinkedIn, set your headline and populate the Featured section with project screenshots. Apply via LinkedIn Jobs, Indeed UK, Reed, Totaljobs, and CW Jobs.

Phase 5: Interview preparation (months 11–12)
SQL challenges · Case studies · STAR method · Tool demos · Mock interviews · Offer negotiation

UK data analyst interviews typically have three components: a technical SQL/Python screen, a take-home case study or live task, and a competency interview using the STAR method. Practise SQL on StrataScratch and HackerRank. Prepare 3–4 STAR stories from your portfolio projects. For the take-home task, structure your answer: context → finding → recommendation → caveat.


Portfolio projects

PROJECT 01
UK Housing Prices Dashboard
Python · Power BI · HM Land Registry data
PROJECT 02
E-commerce Sales Analysis
Python · SQL · Tableau · Kaggle dataset
PROJECT 03
Customer Segmentation
Python · scikit-learn · K-Means clustering
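Project 03 would use sklearn.cluster.KMeans on real customer features (spend, recency, frequency), but the algorithm itself is simple enough to sketch in pure Python. A toy one-dimensional version with invented spend values, just to show the assign-then-recompute loop at the heart of K-Means:

```python
# Toy 1-D K-Means with invented spend values. A real version of
# Project 03 would use sklearn.cluster.KMeans on multiple features.
def kmeans_1d(values, k, iters=20):
    centroids = values[:k]  # naive init, fine for a sketch
    for _ in range(iters):
        # 1. Assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # 2. Move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

spend = [12, 15, 14, 90, 95, 100, 300, 310]
centroids = kmeans_1d(spend, k=3)
print([round(c, 2) for c in centroids])  # → [13.67, 95.0, 305.0]
```

Being able to explain those two steps in an interview matters more than memorising the scikit-learn API.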

Recommended certifications

Google Data Analytics
Coursera — widely recognised
Microsoft PL-300
Power BI Data Analyst
dbt Analytics Engineer
Modern data stack — tech roles
AWS Cloud Practitioner
Optional — cloud awareness

UK salary benchmarks (2025)

Level | London | Outside London | Experience
Junior / Entry | £28k – £38k | £24k – £32k | 0 – 2 years
Mid-level | £38k – £55k | £32k – £45k | 2 – 5 years
Senior | £55k – £75k | £45k – £62k | 5+ years
Lead / Principal | £75k – £100k+ | £60k – £85k | 8+ years

Interview Q&A

01
Write a SQL query to find the top 3 products by revenue in each category.
SQL

Window functions appear in almost every UK technical screen. Employers want to see you can write CTEs and RANK() without a hint.

Model answer
WITH ranked AS (
    SELECT
        category,
        product_name,
        SUM(revenue) AS total_revenue,
        RANK() OVER (
            PARTITION BY category
            ORDER BY SUM(revenue) DESC
        ) AS rnk
    FROM sales
    GROUP BY category, product_name
)
SELECT category, product_name, total_revenue
FROM ranked
WHERE rnk <= 3
ORDER BY category, rnk;
Interviewer tip: Walk the interviewer through your logic as you type. Saying "I'm partitioning by category so the ranking resets for each one" shows you understand the concept, not just the syntax.
02
How would you identify and handle duplicate records in a dataset using SQL?
SQL

Data quality is a core analyst responsibility. UK interviewers often ask this after a "tell me about a data problem you've solved" question.

Model answer
-- Find duplicates
SELECT email, COUNT(*) AS cnt
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

-- Keep only the most recent row per email
WITH deduped AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY email
               ORDER BY created_at DESC
           ) AS rn
    FROM customers
)
SELECT * FROM deduped WHERE rn = 1;
Interviewer tip: Always ask what "duplicate" means in context — same primary key, same email, or same combination of fields. This question tests your data quality instincts as much as your SQL.
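The "keep the most recent row per email" logic in the SQL answer can also be mirrored in plain Python, which is a handy way to sanity-check the query on a small sample. Rows and field names below are invented:

```python
# Invented sample rows; same "keep the latest row per email" logic
# as the SQL answer, expressed in plain Python.
rows = [
    {"email": "a@example.com", "created_at": "2025-01-01", "name": "A v1"},
    {"email": "a@example.com", "created_at": "2025-03-01", "name": "A v2"},
    {"email": "b@example.com", "created_at": "2025-02-01", "name": "B"},
]

latest = {}
for row in sorted(rows, key=lambda r: r["created_at"]):
    latest[row["email"]] = row  # later rows overwrite earlier ones

deduped = list(latest.values())
print([r["name"] for r in deduped])  # → ['A v2', 'B']
```

In pandas the equivalent is df.sort_values('created_at').drop_duplicates('email', keep='last'), which is worth mentioning if the interviewer asks how you would do it outside the database.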
03
What is the difference between mean and median, and when would you use each?
Statistics

A classic statistics screening question. Interviewers want to see you reach for median when data is skewed — a very common real-world situation.

Model answer
The mean is the arithmetic average — it sums all values and divides by count. The median is the middle value when data is sorted. The mean is sensitive to outliers: a dataset of salaries like £25k, £28k, £30k, and £500k gives a mean of about £146k — deeply misleading. The median gives £29k, which better represents the typical employee. Use the mean when data is roughly symmetric and free of extreme outliers. Use the median for skewed distributions, income data, and house prices. The ONS always reports median house prices for exactly this reason.
Interviewer tip: Referencing a UK-specific example like ONS house price data shows local market awareness and tends to land well with UK interviewers.
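The salary figures in that answer can be verified in a few lines with Python's statistics module:

```python
from statistics import mean, median

# The salary example from the answer, checked numerically:
# one outlier drags the mean far above the typical value.
salaries = [25_000, 28_000, 30_000, 500_000]
print(mean(salaries))    # → 145750 (pulled up by the £500k outlier)
print(median(salaries))  # → 29000.0 (the typical employee)
```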
04
A/B test results show a p-value of 0.03. Is the result significant? What would you do next?
Statistics

Tests your understanding of statistical significance without blindly accepting results. Especially relevant in e-commerce, fintech, and product analyst roles.

Model answer
A p-value of 0.03 means that, if the null hypothesis were true, there would be a 3% probability of observing a result at least this extreme. With a 0.05 threshold, the result is statistically significant — but I would not stop there. I would check: (1) Was the sample size large enough? (2) Was the test run for a full business cycle to avoid novelty effects? (3) Is the effect size practically significant — a 0.1% lift might be significant but not worth shipping. (4) Were there confounding factors like a simultaneous marketing campaign? I would report both p-value and confidence interval, framed around business impact.
Interviewer tip: Interviewers love candidates who push back on "just ship it because p < 0.05". Mentioning effect size and practical significance shows maturity.
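To make the p-value concrete, here is a sketch of a two-sided two-proportion z-test using only the standard library. The conversion counts are invented and the normal approximation is assumed to hold; in practice you would reach for a library such as statsmodels or your experimentation platform:

```python
from math import sqrt, erfc

# Two-sided two-proportion z-test from first principles (normal
# approximation assumed; conversion counts below are invented).
def two_proportion_p(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return erfc(abs(z) / sqrt(2))  # two-sided p from the normal CDF

# 5.0% vs 5.7% conversion on 10,000 visitors per arm
p = two_proportion_p(conv_a=500, n_a=10_000, conv_b=570, n_b=10_000)
print(round(p, 3))  # ≈ 0.028, significant at the 0.05 level
```

Seeing the sample sizes inside the standard-error term also makes the interview point about "was the sample large enough" tangible: halve n and the same lift stops being significant.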
05
Walk me through how you would build a sales performance dashboard in Power BI from scratch.
Tools

Power BI is the dominant BI tool in UK enterprise. This question is extremely common in retail, finance, and NHS roles.

Model answer
I would start by connecting to the data source — typically SQL Server or SharePoint Excel in a UK enterprise context. I would use Power Query to clean and transform: remove nulls, standardise date formats, create a proper date dimension table. In the data model, I would set up relationships between the fact table and dimensions (products, regions, customers). I would write DAX measures for the key KPIs — total revenue, revenue vs target, month-on-month growth, and rolling 12-month average. On the canvas: a card visual for headline KPI, a bar chart for regional breakdown, a line chart for trend, and slicers for date and product category. Finally, publish to Power BI Service with scheduled refresh.
Interviewer tip: Mentioning the date dimension table and DAX measures signals you understand how Power BI actually works, not just how to drag visuals onto a canvas.
06
How would you approach cleaning a messy dataset in Python before analysis?
Tools

Real-world data is always dirty. UK analyst roles — especially NHS, retail, and financial services — involve heavily messy data.

Model answer
import pandas as pd

df = pd.read_csv('data.csv')

# Inspect shape, types, nulls, and duplicates first
print(df.shape, df.dtypes)
print(df.isnull().sum(), df.duplicated().sum())

# Handle nulls (assignment rather than inplace=True, which is
# deprecated for chained calls in recent pandas)
df['revenue'] = df['revenue'].fillna(df['revenue'].median())
df = df.dropna(subset=['customer_id'])

# Fix types — note dayfirst=True for UK date format
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# Remove duplicates
df = df.drop_duplicates(subset=['transaction_id'])

# Standardise text
df['region'] = df['region'].str.strip().str.title()

# Validate ranges
df = df[df['revenue'] >= 0]
Interviewer tip: Note dayfirst=True for UK date formats — a small detail that shows you have worked with real UK data. Interviewers notice this.
07
Tell me about a time you found an insight that changed a business decision.
Behavioural

The most common STAR-method question in UK data analyst interviews. Interviewers want to see impact, not just technical work.

Model answer
Situation: While analysing weekly sales data, I noticed our highest-margin category had a sharp drop in repeat purchase rate over 6 weeks — masked by new customer growth keeping headline revenue flat. Task: Investigate the root cause. Action: I built a cohort retention analysis in Python, segmented by acquisition channel, and found that customers from a specific paid social campaign had 40% lower 90-day retention. I presented this in a 5-slide deck with the revenue impact modelled out. Result: The marketing team paused that campaign, reinvesting the budget into higher-retention channels. Repeat purchase rate recovered 18% over the following quarter.
Interviewer tip: Use a real story from your portfolio projects if you lack work experience. The structure — a number, an action, a result — is what matters, not the company size.
08
How do you explain a complex analysis to a non-technical stakeholder?
Behavioural

Communication is consistently rated as the top gap in junior UK data analysts. This question tests whether you are a translator, not just a technician.

Model answer
My approach is to lead with the decision, not the methodology. Most stakeholders do not need to know I used a window function — they need to know whether to launch the product or not. I anchor on the agreed business question, then present the single most important finding in one sentence. I use visuals wherever possible — a simple bar chart beats a table of numbers every time. Caveats are framed in business terms: "we are confident enough to act on this, but we should revisit in 30 days." I check understanding by asking "does this match what you were expecting?" rather than "do you have any questions?"
Interviewer tip: The phrase "lead with the decision, not the methodology" is a strong signal to interviewers. It shows you think like a business analyst, not just a data technician.
09
How would you measure the success of a new product feature launch?
Business

Common in product analyst, e-commerce, and SaaS roles. Tests whether you can define metrics before looking at data.

Model answer
I would define success before the launch, not after. I would work with the product team to identify the primary metric — the one thing the feature is supposed to move (e.g. conversion rate for a new checkout flow). I would also define guardrail metrics (things we must not break, like page load time) and secondary metrics (engagement, NPS). I would set up a pre/post comparison or an A/B test if engineering can support it. I would report results at three time horizons — 7-day, 30-day, and 90-day — because some features show a novelty effect that fades.
Interviewer tip: Mentioning guardrail metrics and the novelty effect shows product analytics maturity. These concepts come up constantly in UK tech and e-commerce roles.
10
Our sales have dropped 15% this month. How would you investigate?
Business

A classic case study question used by UK retailers, banks, and consultancies. Tests structured thinking under pressure.

Model answer
I would start by validating the data itself — is the 15% drop real, or is there a pipeline issue or a change in how sales are recorded? Assuming the data is clean, I would decompose: is the drop across all products, regions, and channels — or isolated to one segment? If isolated, that points to a specific root cause (competitor move, supply issue, campaign ending). If broad, it suggests a macro factor (seasonal effect, pricing change, economic headwind). I would check the same period last year and the prior month, then look at external signals like Google Trends and competitor activity. I would form 2–3 hypotheses and bring structured findings to the team within 24 hours, not a single answer.
Interviewer tip: "Validate the data first" is the answer most junior candidates skip. Starting there tells interviewers you have seen data quality issues in the wild.