Parallel Query Design for Tree Tables

Parallel Query Design for Tree Tables

Discussion on tree table design

Last updated 5/24/2022 6:53 PM
长空X
9 min read
Category
Sharing
Tags
Architecture Design

This article is contributed by netizen 长空X, welcome to share and repost

Original author: 长空 X (CSDN username "长空 X", author of CkTools, github: https://github.com/hjkl950217)

Original link: https://www.cnblogs.com/gtxck/articles/16293295.html

Background

Today, while chatting with 懒得勤快 about handling tree tables, we found that both of us currently have to use recursive queries to query tree tables. This approach is very inefficient and hard to maintain. Is there a simpler way to query in parallel? Later, we actually discussed a method, and he quickly implemented it on his website.

懒得勤快 Official Site

Disclaimer

The several solutions in the article are the result of our discussions and some online resources. There are countless design approaches. The designs introduced in the article are for most scenarios that require tree tables, but they do not represent the optimal solution! The optimal solution is already a combination of factors such as design approach, team skill level, business situation, etc. This sharing is only meant to help you find your optimal solution faster.

What is a Tree Table?

In a relational database, a table that stores tree structures. For example, if a field needs to select a category with levels one, two, ... N, it can be designed like this:

ID PID Name or Content
1 Comment 1
2 1 Comment 2
3 1 Comment 3
4 3 Comment 4

Such data can form a tree from our university data structures, used to express hierarchical relationships. Here, Id is usually best as a number, but there are cases where it is not a number; this may affect the choice of solution, as will be mentioned later. The entity definition for this data structure is generally as follows:

class CommentEntity
{
	public int ID {get;set;}
	public int PID {get;set;}
	//.. some data fields
	public CommentEntity ParentNode {get;set;}
	public List<CommentEntity> ChildNode {get;set;}
}

The entity defines ParentNode to point to the parent node, and ChildNode to point to several child nodes. If you have knowledge of linked lists from data structures, you can see that these two fields act like pointer fields.

Data is stored in the database as rows. If we get the data and assemble the ParentNode and ChildNode references, we can use it according to actual business needs.

What is it Used For?

Anything with a subordinate relationship can be stored this way, for example: permission relationships, categories, types, level divisions, administrative divisions, comments, etc.

The tricky part is the inconvenience of querying. For instance, if you want to query all data under a first-level category, the traditional way is to first query the first-level category with id=1, then query data where PID=1, then query data where PID = the ID of the previously queried data, and so on recursively multiple times until done.

Goal

Let's use comments as an example.

Requirements:

  1. When entering the page, paginate query the main comments, then display replies hierarchically
  2. Be able to query all subordinate comments based on a specific comment
  3. Query in parallel rather than recursively
  4. Each comment data can be either a main comment or a sub-comment

Solution 1: Using Tag to Mark the Tree

This solution adds a field tag to mark the entire tree, with the following structure:

ID PID Tag Content
1 ArticleId1 Comment 1
2 1 ArticleId1 Comment 2
3 1 ArticleId1 Comment 3
4 3 ArticleId1 Comment 4

Tag is used for database queries, ID and PID for assembling data in memory. Also, create a non-clustered index on the Tag column.

Query method:

Here, the new field is the same across all nodes in the tree. At most, query the database twice, then rearrange the reference relationships in memory using Pid, pruning unnecessary data.

First query: Use the comment ID to query the article ID (if the article ID is already known, go directly to step 2)

Second query: Use the article ID to query all data

Pagination query: After querying, prune unnecessary data in memory

These considerations are based on:

  1. If Id is a number, consecutive data is likely to be stored contiguously on disk, which improves disk I/O efficiency. If Id is not a number, creating a non-clustered index on ArticleId also allows fast querying.
  2. Assembling reference relationships in memory is very fast and does not require recursion (traverse and use PID to find, then directly add to ChildNode and assign to ParentNode)
  3. The design logic is simple, maintainable by developers at or above intern level

Disadvantage: If a comment tree has 1000 levels, it will inevitably retrieve a huge amount of useless data

Improvement: Using Level to Mark Depth

Add a level field:

ID PID tag level Content
1 ArticleId1 1 Comment 1
2 1 ArticleId1 2 Comment 2
3 1 ArticleId1 2 Comment 3
4 3 ArticleId1 3 Comment 4

When querying, attach level to reduce the amount of useless data transferred, then reuse the assembly code above.

Solution 2: Using Path to Mark Dependency Path

Borrowing an image from the internet to illustrate the idea directly (source unknown, will remove if infringement):

Combine with the above and modify:

ID PID Tag Path Content
1 ArticleId1 Comment 1
2 1 ArticleId1 1 Comment 2
3 1 ArticleId1 1 Comment 3
4 3 ArticleId1 1,2 Comment 4

When writing a child node, you need to know the path of the parent node, but generally this requirement can be met. Tag and Path are used for database queries, ID and PID for assembling data in memory.

Query method:

Query all: Query all data by article ID, then assemble using Pid in memory

Query data under id 2:

First query: Query the path of id=2

Second query: Query id=2 or startwith $",2"

Pagination query:

First query the first X items sorted by time using article ID, then perform a second query to get nested comment data. In the second query, multiple startwith conditions can be combined.

Also, it is recommended to redundantly include the level field to reduce query workload. Although level information is implicitly contained in path, it is not query-friendly.

These considerations are based on:

  1. Similar to solution 1, but with lower learning cost

Disadvantage: Not a serious disadvantage; when filtering with path to query child nodes, indexes cannot be utilized.

Solution 3: Not Designing Nested Comments

Taking inspiration from Zhihu's design, a picture says it all:

In Zhihu's structure, there are only comments and replies. A reply only needs to save the ID of the previous comment. This approach is not only simple to design but also provides an excellent reading experience (deep nesting is not necessarily good-looking).

ID PID GroupID Tag Content
1 1 ArticleId1 Comment 1
2 1 1 ArticleId1 Comment 2
3 1 1 ArticleId1 Comment 3
4 3 1 ArticleId1 Comment 4
5 2 ArticleId1 Comment 5

Query method:

Query all: Query all data where PID is null by article ID, then assemble using PID in memory

Query data under id 1 and its subordinates: Query data where GroupID = 1. In this design, replies are not queried separately.

Advantages: Very low learning cost and low storage pressure

Solution 4: Using Recursion

Didn't we say we wouldn't use recursion? Why is it mentioned here? Because:

  1. Some team members may stubbornly believe that databases should not return extra data or add redundant nodes
  2. MySQL 8.0 added RECURSIVE to implement recursion at the database level
  3. Other unavoidable reasons

So if none of the previous three solutions suit your situation, you may still have to go down the recursion route. I won't go into details here; there are many articles online on this topic.

Summary

Solutions 1, 2, and 3 reduce query cost and learning cost by adding redundant fields and leverage the characteristics of different storage (databases are not good at computation, memory is suitable for fast read/write) to achieve the goal.

Solution 3 also achieves a win-win between technical cost and customer experience by analyzing and optimizing business.

Solution 4 is a fallback plan.

I personally prefer the level+path combination, as it can handle not only comments but also other tree structures well. After all, developers cannot always have the opportunity to influence business requirements, right?

If you have a better solution, feel free to leave a comment and discuss!

Keep Exploring

Related Reading

More Articles
Recent update 5/18/2026

枝见 Zhijian: A Markdown Mind Map Editor Built with Avalonia

This article introduces Zhijian, a local mind map editor based on Avalonia, supporting blank creation, folder loading, precise onboarding guidance, macOS shortcut adaptation, outline/Markdown/mind map synchronization, node notes, thumbnails, zoom, canvas dragging, and Markdown/OPML/XMind file exchange.

Continue Reading