This article is contributed by netizen
长空X, welcome to share and repostOriginal author: 长空 X (CSDN username "长空 X", author of CkTools, github: https://github.com/hjkl950217)
Original link: https://www.cnblogs.com/gtxck/articles/16293295.html
Background
Today, while chatting with 懒得勤快 about handling tree tables, we found that both of us currently have to use recursive queries to query tree tables. This approach is very inefficient and hard to maintain. Is there a simpler way to query in parallel? Later, we actually discussed a method, and he quickly implemented it on his website.
Disclaimer
The several solutions in the article are the result of our discussions and some online resources. There are countless design approaches. The designs introduced in the article are for most scenarios that require tree tables, but they do not represent the optimal solution! The optimal solution is already a combination of factors such as design approach, team skill level, business situation, etc. This sharing is only meant to help you find your optimal solution faster.
What is a Tree Table?
In a relational database, a table that stores tree structures. For example, if a field needs to select a category with levels one, two, ... N, it can be designed like this:
| ID | PID | Name or Content |
|---|---|---|
| 1 | Comment 1 | |
| 2 | 1 | Comment 2 |
| 3 | 1 | Comment 3 |
| 4 | 3 | Comment 4 |

Such data can form a tree from our university data structures, used to express hierarchical relationships. Here, Id is usually best as a number, but there are cases where it is not a number; this may affect the choice of solution, as will be mentioned later.
The entity definition for this data structure is generally as follows:
class CommentEntity
{
public int ID {get;set;}
public int PID {get;set;}
//.. some data fields
public CommentEntity ParentNode {get;set;}
public List<CommentEntity> ChildNode {get;set;}
}
The entity defines ParentNode to point to the parent node, and ChildNode to point to several child nodes. If you have knowledge of linked lists from data structures, you can see that these two fields act like pointer fields.
Data is stored in the database as rows. If we get the data and assemble the ParentNode and ChildNode references, we can use it according to actual business needs.
What is it Used For?
Anything with a subordinate relationship can be stored this way, for example: permission relationships, categories, types, level divisions, administrative divisions, comments, etc.
The tricky part is the inconvenience of querying. For instance, if you want to query all data under a first-level category, the traditional way is to first query the first-level category with id=1, then query data where PID=1, then query data where PID = the ID of the previously queried data, and so on recursively multiple times until done.
Goal
Let's use comments as an example.

Requirements:
- When entering the page,
paginate querythe main comments, then display replies hierarchically - Be able to query all subordinate comments based on a specific comment
- Query in parallel rather than recursively
- Each comment data can be either a main comment or a sub-comment
Solution 1: Using Tag to Mark the Tree
This solution adds a field tag to mark the entire tree, with the following structure:
| ID | PID | Tag | Content |
|---|---|---|---|
| 1 | ArticleId1 | Comment 1 | |
| 2 | 1 | ArticleId1 | Comment 2 |
| 3 | 1 | ArticleId1 | Comment 3 |
| 4 | 3 | ArticleId1 | Comment 4 |
Tag is used for database queries, ID and PID for assembling data in memory. Also, create a non-clustered index on the Tag column.
Query method:
Here, the new field is the same across all nodes in the tree. At most, query the database twice, then rearrange the reference relationships in memory using Pid, pruning unnecessary data.
First query: Use the comment ID to query the article ID (if the article ID is already known, go directly to step 2)
Second query: Use the article ID to query all data
Pagination query: After querying, prune unnecessary data in memory
These considerations are based on:
- If Id is a number, consecutive data is
likelyto be storedcontiguously on disk, which improves disk I/O efficiency. If Id is not a number, creating a non-clustered index onArticleIdalso allows fast querying. - Assembling reference relationships in memory is very fast and does not require recursion (traverse and use PID to find, then directly add to
ChildNodeand assign toParentNode) - The design logic is simple, maintainable by developers at or above intern level
Disadvantage: If a comment tree has 1000 levels, it will inevitably retrieve a huge amount of useless data
Improvement: Using Level to Mark Depth
Add a level field:
| ID | PID | tag | level | Content |
|---|---|---|---|---|
| 1 | ArticleId1 | 1 | Comment 1 | |
| 2 | 1 | ArticleId1 | 2 | Comment 2 |
| 3 | 1 | ArticleId1 | 2 | Comment 3 |
| 4 | 3 | ArticleId1 | 3 | Comment 4 |
When querying, attach level to reduce the amount of useless data transferred, then reuse the assembly code above.
Solution 2: Using Path to Mark Dependency Path
Borrowing an image from the internet to illustrate the idea directly (source unknown, will remove if infringement):

Combine with the above and modify:
| ID | PID | Tag | Path | Content |
|---|---|---|---|---|
| 1 | ArticleId1 | Comment 1 | ||
| 2 | 1 | ArticleId1 | 1 | Comment 2 |
| 3 | 1 | ArticleId1 | 1 | Comment 3 |
| 4 | 3 | ArticleId1 | 1,2 | Comment 4 |
When writing a child node, you need to know the path of the parent node, but generally this requirement can be met. Tag and Path are used for database queries, ID and PID for assembling data in memory.
Query method:
Query all: Query all data by article ID, then assemble using Pid in memory
Query data under id 2:
First query: Query the path of id=2
Second query: Query id=2 or startwith $",2"
Pagination query:
First query the first X items sorted by time using article ID, then perform a second query to get nested comment data. In the second query, multiple startwith conditions can be combined.
Also, it is recommended to redundantly include the level field to reduce query workload. Although level information is implicitly contained in path, it is not query-friendly.
These considerations are based on:
- Similar to solution 1, but with lower learning cost
Disadvantage: Not a serious disadvantage; when filtering with path to query child nodes, indexes cannot be utilized.
Solution 3: Not Designing Nested Comments
Taking inspiration from Zhihu's design, a picture says it all:

In Zhihu's structure, there are only comments and replies. A reply only needs to save the ID of the previous comment. This approach is not only simple to design but also provides an excellent reading experience (deep nesting is not necessarily good-looking).
| ID | PID | GroupID | Tag | Content |
|---|---|---|---|---|
| 1 | 1 | ArticleId1 | Comment 1 | |
| 2 | 1 | 1 | ArticleId1 | Comment 2 |
| 3 | 1 | 1 | ArticleId1 | Comment 3 |
| 4 | 3 | 1 | ArticleId1 | Comment 4 |
| 5 | 2 | ArticleId1 | Comment 5 |
Query method:
Query all: Query all data where PID is null by article ID, then assemble using PID in memory
Query data under id 1 and its subordinates: Query data where GroupID = 1. In this design, replies are not queried separately.
Advantages: Very low learning cost and low storage pressure
Solution 4: Using Recursion
Didn't we say we wouldn't use recursion? Why is it mentioned here? Because:
- Some team members may stubbornly believe that databases should not return extra data or add redundant nodes
- MySQL 8.0 added
RECURSIVEto implement recursion at the database level - Other unavoidable reasons
So if none of the previous three solutions suit your situation, you may still have to go down the recursion route. I won't go into details here; there are many articles online on this topic.
Summary
Solutions 1, 2, and 3 reduce query cost and learning cost by adding redundant fields and leverage the characteristics of different storage (databases are not good at computation, memory is suitable for fast read/write) to achieve the goal.
Solution 3 also achieves a win-win between technical cost and customer experience by analyzing and optimizing business.
Solution 4 is a fallback plan.
I personally prefer the level+path combination, as it can handle not only comments but also other tree structures well. After all, developers cannot always have the opportunity to influence business requirements, right?
If you have a better solution, feel free to leave a comment and discuss!