Explanation of 10 Classic SQL Interview Questions

Explanation of 10 Classic SQL Interview Questions

The following questions are based on the author's experience and intermediate-to-advanced SQL problems shared online.

Last updated 5/2/2022 6:23 PM
俊欣 关于数据分析与可视化
11 min read
Category
Database
Tags
Database

I. Summary

As a data practitioner, SQL is the most commonly used language for data extraction and simple preprocessing in daily work. Due to its widespread use and ease of learning, it is also widely adopted by other roles such as product managers and developers. This article mainly combines classic interview questions to provide SQL methods and practical techniques for passing data development interviews. The following questions are based on the author's experience and medium-to-high difficulty SQL questions shared online.

II. Problem-Solving Approach

  • Easy — Tests usage of group by & limit or less frequently used functions like rand(); involves table joins.

  • Medium — Tests basic window functions; involves table joins, with tricky parts often including self-joins.

  • Hard — May involve medians or more complex data extraction concepts, possibly requiring generating columns based on specific rules; typically, creating intermediate tables makes such problems clearer.

III. SQL Exam Questions

Question 1

  • order table: fields goods_id, amount;
  • pv view table: fields goods_id, uid;
  • Goods are ranked by total sales amount into top 10, top 10~top 20, and others.

Find the number of browsing users for each group (same user within a group counts only once).

create table if not exists test.nil_goods_category as
select goods_id
,case when nn<= 10 then 'top10'
      when nn<= 20 then 'top10~top20'
      else 'other' end as goods_group
from
(
    select goods_id
    ,row_number() over(partition by goods_id order by sale_sum desc) as nn
    from
    (
        select goods_id,sum(amount) as sale_sum
        from order
        group by 1
    ) aa
) bb;
select b.goods_group,count(distinct a.uid) as num
from pv a
left join test.nil_goods_category b
on a.goods_id = b.goods_id
group by 1;

Question 2

Product activity table goods_event: g_id (may be duplicate), t1 (start time), t2 (end time).

Given a time period (t3, t4), find the number of products with activities during that period.

1.

select count(distinct g_id) as event_goods_num
from goods_event
where (t1<=t4 and t1>=t3)
or (t2>=t3 and t2<=t4)
select count(distinct g_id) as event_goods_num
from goods_event
where (t1<=t4 and t1>=t3)
union all

Question 3

Product activity log table event: fields goods_id, time;

Find the latest activity time of the product that participated in the most activities.

select a.goods_id,a.time
from event a
inner join
(
    select goods_id,count(*)
    from event
    group by gooods_id
    order by count(*) desc
    limit 1
) b
on a.goods_id = b.goods_id
order by a.goods_id,a.time desc

Question 4

User login log data; define a session: logs from the same user within one hour are considered one session.

Generate a session column.

drop table if exists koo.nil_temp0222_a2;
create table if not exists koo.nil_temp0222_a2 as
select *
    ,row_number() over(partition by userid order by inserttime) as nn1
from
(
    select a.*
    ,b.inserttime as inserttime_aftr
    ,datediff(b.inserttime,a.inserttime) as session_diff
  from
  (
    select userid,inserttime
      ,row_number() over(partition by userid order by inserttime asc) nn
    from koo.nil_temp0222
    where userid = 1900000169
  ) a
  left join
  (
     select userid,inserttime
      ,row_number() over(partition by userid order by inserttime asc) nn
    from koo.nil_temp0222
    where userid = 1900000169
  ) b
  on a.userid =  b.userid and a.nn = b.nn-1
) aa
where session_diff >10 or nn = 1
order by userid,inserttime;

drop table if exists koo.nil_temp0222_a2_1;
create table if not exists koo.nil_temp0222_a2_1 as
select a.*
,case when b.nn is null then a.nn+3 else b.nn end as nn_end
from koo.nil_temp0222_a2 a
left join koo.nil_temp0222_a2 b
on a.userid = b.userid
and a.nn1 = b.nn1 - 1;

select a.*,b.nn1 as session_id
from
(
  select userid,inserttime
    ,row_number() over(partition by userid order by inserttime asc) nn
  from koo.nil_temp0222
  where userid = 1900000169
) a
left join koo.nil_temp0222_a2_1 b
on a.userid = b.userid
and a.nn>=b.nn
and a.nn<b.nn_end

Question 5

Order table with fields: order number and time.

Get the last three orders on the last day of each month.

select *
from
(
  select *
  ,rank() over(partition by mm order by dd desc) as nn1
  ,row_number() over(partition by mm,dd order by inserttime desc) as nn2
  from
  (select cast(right(to_date(inserttime),2) as int) as dd,month(inserttime) as mm,userid,inserttime
  from koo.nil_temp0222) aa
) bb
where nn1 = 1 and nn2<=3;

Question 6

Database table Tourists records daily visitor counts for a scenic spot in July as follows:

id date visits 1 2017-07-01 100 …… Coincidentally, the id field equals the day of the month.

Now, select dates where there are more than 100 visitors for three consecutive days.

Example output: date 2017-07-01 ……

select a.*,b.num as num2,c.num as num3
from table  a
left join table b
on a.userid = b.userid
and a.dt = date_add(b.dt,-1)
left join table c
on a.userid = c.userid
and a.dt = date_add(c.dt,-2)
where b.num>100
and a.num>100
and c.num>100

Question 7

Table A has 21 columns: first column id, remaining 20 feature columns named d1-d20, with 100K rows.

Another table B (pattern table) has the same structure with 50K rows.

Find rows in A that match any row in B, and record the corresponding ids.

Two conditions qualify:

  • All feature columns match exactly.
  • At most one feature column does not match, while the other 19 match exactly (unknown which column).
select aa.*
from
(
  select *,concat(d1,d2,d3……d20) as mmd
  from table
) aa
left join
(
  select id,concat(d1,d2,d3……d20) as mmd
  from table
) bb
on aa.id = bb.id
and aa.mmd = bb.mmd
select a.*,sum(d1_jp,d2_jp……,d20_jp) as same_judge
from
(
  select a.*
  ,case when a.d1 = b.d1 then 1 else 0 end as d1_jp
  ,case when a.d2 = b.d2 then 1 else 0 end as d2_jp
  ,case when a.d3 = b.d3 then 1 else 0 end as d3_jp
  ,case when a.d4 = b.d4 then 1 else 0 end as d4_jp
  ,case when a.d5 = b.d5 then 1 else 0 end as d5_jp
  ,case when a.d6 = b.d6 then 1 else 0 end as d6_jp
  ,case when a.d7 = b.d7 then 1 else 0 end as d7_jp
  ,case when a.d8 = b.d8 then 1 else 0 end as d8_jp
  ,case when a.d9 = b.d9 then 1 else 0 end as d9_jp
  ,case when a.d10 = b.d10 then 1 else 0 end as d10_jp
  ,case when a.d20 = b.d20 then 1 else 0 end as d20_jp
  ,case when a.d11 = b.d11 then 1 else 0 end as d11_jp
  ,case when a.d12 = b.d12 then 1 else 0 end as d12_jp
  ,case when a.d13 = b.d13 then 1 else 0 end as d13_jp
  ,case when a.d14 = b.d14 then 1 else 0 end as d14_jp
  ,case when a.d15 = b.d15 then 1 else 0 end as d15_jp
  ,case when a.d16 = b.d16 then 1 else 0 end as d16_jp
  ,case when a.d17 = b.d17 then 1 else 0 end as d17_jp
  ,case when a.d18 = b.d18 then 1 else 0 end as d18_jp
  ,case when a.d19 = b.d19 then 1 else 0 end as d19_jp
  from table a
  left join table b
  on a.id = b.id
) aa
where sum(d1_jp,d2_jp……,d20_jp) = 19

Question 8

We store user ratings of products as sparse vectors in table t:

  • t fields: uid, goods_id, star. uid is user id.
  • goods_id is product id. star is user rating for that product (value 1-5).

Now we want to calculate the inner product between pairs of vectors. The semantic meaning here is:

For two different users, if they both rated the same set of products, multiply each pair of their ratings and sum these products.

Example: The database table has the following data:

U0 g0 2
U0 g1 4
U1 g0 3
U1 g1 1

The result after calculation:

U0 U1 23+41=10 ……

select aa.uid1,aa.uid2
,sum(star_multi) as result
from
(
  select a.uid as uid1
  ,b.uid as uid2
  ,a.goods_id
  ,a.star * b.star as star_multi
  from t a
  left join t b
  on a.goods_id = b.goods_id
  and a.udi<>b.uid
) aa
group by 1,2
select uid1,uid2,sum(multiply) as result
from
(select t.uid as uid1, t.uid as uid2, goods_id,a.star*star as multiply
from a left join b
on a.goods_id = goods_id
and a.uid<>uid) aa
group by goods

Question 9

Given a table of numbers and frequencies, find the median of the numbers.

select a.*
,b.s_mid_n
,c.l_mid_n
,avg(b.s_mid_n,c.l_mid_n)
from
(
  select
  case when mod(count(*),2) = 0 then count(*)/2 else (count(*)+1)/2 end as s_mid
  ,case when mod(count(*),2) = 0 then count(*)/2+1 else (count(*)+1)/2 end as l_mid
  from table
) a
left join
(
  select id,num,row_number() over(partition by id order by num asc) nn
  from table
) b
on a.s_mid = b.nn
left join
(
  select id,num,row_number() over(partition by id order by num asc) nn
  from table
) c
on a.l_mid = c.nn

Question 10

Table order has three fields: store ID, order time, order amount.

Find stores that have sales every week within a month.

select distinct credit_level
from
(
  select credit_level,count(distinct nn) as number
  from
  (
    select userid,credit_level,inserttime,month(inserttime) as mm
    ,weekofyear(inserttime) as week
    ,dense_rank() over(partition by credit_level,month(inserttime) order by weekofyear(inserttime) asc) as nn
    from koo.nil_temp0222
    where substring(inserttime,1,7) = '2019-12'
    order by credit_level ,inserttime
  ) aa
  group by 1
) bb
where number = (select count(distinct weekofyear(inserttime))
from koo.nil_temp0222
where substring(inserttime,1,7) = '2019-12')
Keep Exploring

Related Reading

More Articles
Recent update 5/18/2026

枝见 Zhijian: A Markdown Mind Map Editor Built with Avalonia

This article introduces Zhijian, a local mind map editor based on Avalonia, supporting blank creation, folder loading, precise onboarding guidance, macOS shortcut adaptation, outline/Markdown/mind map synchronization, node notes, thumbnails, zoom, canvas dragging, and Markdown/OPML/XMind file exchange.

Continue Reading