I. Summary
As a data practitioner, SQL is the most commonly used language for data extraction and simple preprocessing in daily work. Due to its widespread use and ease of learning, it is also widely adopted by other roles such as product managers and developers. This article mainly combines classic interview questions to provide SQL methods and practical techniques for passing data development interviews. The following questions are based on the author's experience and medium-to-high difficulty SQL questions shared online.
II. Problem-Solving Approach
Easy — Tests usage of
group by&limitor less frequently used functions likerand(); involves table joins.Medium — Tests basic window functions; involves table joins, with tricky parts often including self-joins.
Hard — May involve medians or more complex data extraction concepts, possibly requiring generating columns based on specific rules; typically, creating intermediate tables makes such problems clearer.
III. SQL Exam Questions
Question 1
ordertable: fieldsgoods_id,amount;pvview table: fieldsgoods_id,uid;- Goods are ranked by total sales amount into top 10, top 10~top 20, and others.
Find the number of browsing users for each group (same user within a group counts only once).
create table if not exists test.nil_goods_category as
select goods_id
,case when nn<= 10 then 'top10'
when nn<= 20 then 'top10~top20'
else 'other' end as goods_group
from
(
select goods_id
,row_number() over(partition by goods_id order by sale_sum desc) as nn
from
(
select goods_id,sum(amount) as sale_sum
from order
group by 1
) aa
) bb;
select b.goods_group,count(distinct a.uid) as num
from pv a
left join test.nil_goods_category b
on a.goods_id = b.goods_id
group by 1;
Question 2
Product activity table goods_event: g_id (may be duplicate), t1 (start time), t2 (end time).
Given a time period (t3, t4), find the number of products with activities during that period.
1.
select count(distinct g_id) as event_goods_num
from goods_event
where (t1<=t4 and t1>=t3)
or (t2>=t3 and t2<=t4)
select count(distinct g_id) as event_goods_num
from goods_event
where (t1<=t4 and t1>=t3)
union all
Question 3
Product activity log table event: fields goods_id, time;
Find the latest activity time of the product that participated in the most activities.
select a.goods_id,a.time
from event a
inner join
(
select goods_id,count(*)
from event
group by gooods_id
order by count(*) desc
limit 1
) b
on a.goods_id = b.goods_id
order by a.goods_id,a.time desc
Question 4
User login log data; define a session: logs from the same user within one hour are considered one session.
Generate a session column.
drop table if exists koo.nil_temp0222_a2;
create table if not exists koo.nil_temp0222_a2 as
select *
,row_number() over(partition by userid order by inserttime) as nn1
from
(
select a.*
,b.inserttime as inserttime_aftr
,datediff(b.inserttime,a.inserttime) as session_diff
from
(
select userid,inserttime
,row_number() over(partition by userid order by inserttime asc) nn
from koo.nil_temp0222
where userid = 1900000169
) a
left join
(
select userid,inserttime
,row_number() over(partition by userid order by inserttime asc) nn
from koo.nil_temp0222
where userid = 1900000169
) b
on a.userid = b.userid and a.nn = b.nn-1
) aa
where session_diff >10 or nn = 1
order by userid,inserttime;
drop table if exists koo.nil_temp0222_a2_1;
create table if not exists koo.nil_temp0222_a2_1 as
select a.*
,case when b.nn is null then a.nn+3 else b.nn end as nn_end
from koo.nil_temp0222_a2 a
left join koo.nil_temp0222_a2 b
on a.userid = b.userid
and a.nn1 = b.nn1 - 1;
select a.*,b.nn1 as session_id
from
(
select userid,inserttime
,row_number() over(partition by userid order by inserttime asc) nn
from koo.nil_temp0222
where userid = 1900000169
) a
left join koo.nil_temp0222_a2_1 b
on a.userid = b.userid
and a.nn>=b.nn
and a.nn<b.nn_end
Question 5
Order table with fields: order number and time.
Get the last three orders on the last day of each month.
select *
from
(
select *
,rank() over(partition by mm order by dd desc) as nn1
,row_number() over(partition by mm,dd order by inserttime desc) as nn2
from
(select cast(right(to_date(inserttime),2) as int) as dd,month(inserttime) as mm,userid,inserttime
from koo.nil_temp0222) aa
) bb
where nn1 = 1 and nn2<=3;
Question 6
Database table Tourists records daily visitor counts for a scenic spot in July as follows:
id date visits 1 2017-07-01 100 …… Coincidentally, the id field equals the day of the month.
Now, select dates where there are more than 100 visitors for three consecutive days.
Example output: date 2017-07-01 ……
select a.*,b.num as num2,c.num as num3
from table a
left join table b
on a.userid = b.userid
and a.dt = date_add(b.dt,-1)
left join table c
on a.userid = c.userid
and a.dt = date_add(c.dt,-2)
where b.num>100
and a.num>100
and c.num>100
Question 7
Table A has 21 columns: first column id, remaining 20 feature columns named d1-d20, with 100K rows.
Another table B (pattern table) has the same structure with 50K rows.
Find rows in A that match any row in B, and record the corresponding ids.
Two conditions qualify:
- All feature columns match exactly.
- At most one feature column does not match, while the other 19 match exactly (unknown which column).
select aa.*
from
(
select *,concat(d1,d2,d3……d20) as mmd
from table
) aa
left join
(
select id,concat(d1,d2,d3……d20) as mmd
from table
) bb
on aa.id = bb.id
and aa.mmd = bb.mmd
select a.*,sum(d1_jp,d2_jp……,d20_jp) as same_judge
from
(
select a.*
,case when a.d1 = b.d1 then 1 else 0 end as d1_jp
,case when a.d2 = b.d2 then 1 else 0 end as d2_jp
,case when a.d3 = b.d3 then 1 else 0 end as d3_jp
,case when a.d4 = b.d4 then 1 else 0 end as d4_jp
,case when a.d5 = b.d5 then 1 else 0 end as d5_jp
,case when a.d6 = b.d6 then 1 else 0 end as d6_jp
,case when a.d7 = b.d7 then 1 else 0 end as d7_jp
,case when a.d8 = b.d8 then 1 else 0 end as d8_jp
,case when a.d9 = b.d9 then 1 else 0 end as d9_jp
,case when a.d10 = b.d10 then 1 else 0 end as d10_jp
,case when a.d20 = b.d20 then 1 else 0 end as d20_jp
,case when a.d11 = b.d11 then 1 else 0 end as d11_jp
,case when a.d12 = b.d12 then 1 else 0 end as d12_jp
,case when a.d13 = b.d13 then 1 else 0 end as d13_jp
,case when a.d14 = b.d14 then 1 else 0 end as d14_jp
,case when a.d15 = b.d15 then 1 else 0 end as d15_jp
,case when a.d16 = b.d16 then 1 else 0 end as d16_jp
,case when a.d17 = b.d17 then 1 else 0 end as d17_jp
,case when a.d18 = b.d18 then 1 else 0 end as d18_jp
,case when a.d19 = b.d19 then 1 else 0 end as d19_jp
from table a
left join table b
on a.id = b.id
) aa
where sum(d1_jp,d2_jp……,d20_jp) = 19
Question 8
We store user ratings of products as sparse vectors in table t:
tfields:uid,goods_id,star.uidis user id.goods_idis product id.staris user rating for that product (value 1-5).
Now we want to calculate the inner product between pairs of vectors. The semantic meaning here is:
For two different users, if they both rated the same set of products, multiply each pair of their ratings and sum these products.
Example: The database table has the following data:
U0 g0 2
U0 g1 4
U1 g0 3
U1 g1 1
The result after calculation:
U0 U1 23+41=10 ……
select aa.uid1,aa.uid2
,sum(star_multi) as result
from
(
select a.uid as uid1
,b.uid as uid2
,a.goods_id
,a.star * b.star as star_multi
from t a
left join t b
on a.goods_id = b.goods_id
and a.udi<>b.uid
) aa
group by 1,2
select uid1,uid2,sum(multiply) as result
from
(select t.uid as uid1, t.uid as uid2, goods_id,a.star*star as multiply
from a left join b
on a.goods_id = goods_id
and a.uid<>uid) aa
group by goods
Question 9
Given a table of numbers and frequencies, find the median of the numbers.
select a.*
,b.s_mid_n
,c.l_mid_n
,avg(b.s_mid_n,c.l_mid_n)
from
(
select
case when mod(count(*),2) = 0 then count(*)/2 else (count(*)+1)/2 end as s_mid
,case when mod(count(*),2) = 0 then count(*)/2+1 else (count(*)+1)/2 end as l_mid
from table
) a
left join
(
select id,num,row_number() over(partition by id order by num asc) nn
from table
) b
on a.s_mid = b.nn
left join
(
select id,num,row_number() over(partition by id order by num asc) nn
from table
) c
on a.l_mid = c.nn
Question 10
Table order has three fields: store ID, order time, order amount.
Find stores that have sales every week within a month.
select distinct credit_level
from
(
select credit_level,count(distinct nn) as number
from
(
select userid,credit_level,inserttime,month(inserttime) as mm
,weekofyear(inserttime) as week
,dense_rank() over(partition by credit_level,month(inserttime) order by weekofyear(inserttime) asc) as nn
from koo.nil_temp0222
where substring(inserttime,1,7) = '2019-12'
order by credit_level ,inserttime
) aa
group by 1
) bb
where number = (select count(distinct weekofyear(inserttime))
from koo.nil_temp0222
where substring(inserttime,1,7) = '2019-12')