Performance Analysis of Thirty-Eight Linked List Variations Versus STL


Introduction

A few days ago I wrote an article proposing a few improvements on the basic concept of the linked list. As promised in that first article, I have implemented each variant of the original linked list and have run performance tests under different conditions to see how they all compare.

Feel free to read this long, boring article, or just look at the data sheet (PDF format) in which I have collected all the data along with a condensed explanation.


Overview of the Variants
The base form from which I start is the doubly linked list. This is a structure which can store data and grow without ever running out of storage space. (At least, not until the RAM and swap are both full.) As more data is stored in the doubly linked list, more memory is allocated, and pointers are used to navigate between these separate memory locations like links on a chain.

The main problem with the linked list is the overhead associated with retrieving a piece of data from the list. Because the data is not stored contiguously, retrieving a piece of data means that you have to start at one link and follow the pointers to jump from one link to the next until you reach your destination. In that respect, the way to speed up the retrieval process is to limit the number of “jumps” that have to be made to find the data that needs to be retrieved.

My first modification to the process is to add a “cursor.” The cursor is a simple pointer that keeps track of the linked list member that contained the piece of information requested last. The concept is simple. Typically when you want to get data out of a linked list you retrieve the first piece of data, then the second, and so on, consecutively requesting each piece of data in turn.

By enabling the linked list to keep track of which link it was on last, it guarantees that for consecutive access, the next piece of data that will be requested is only one link away from the current link. Processor cycles will not be wasted navigating between the links. The principle is to minimize the processor cycles spent moving between links by anticipating which link will be needed next.

However, this modification does little if the data in the linked list is requested randomly. This is because the next piece of data requested could be anywhere in the linked list, in any one of the links. To speed this up I made the linked list keep track of three locations: the first link of the list, the last accessed link, and the last link. When data is requested, the linked list does a quick check to see which one of these known links is closest to the link containing the requested data. Then it starts from that link and traverses the linked list until it reaches the desired destination. Using this technique the linked list never has to traverse more than half of the links, and frequently only a third of the links will have to be traversed before the destination link is reached.

Once again, retrieval from the linked list was improved by providing more known access points so that time would not be wasted moving from link to link. However, there is a limit to this technique. For lists with thousands of links it will reach a point where three access points (beginning, end, and last-accessed link) will not be enough. You could perhaps speed it up by adding an access point at the middle of the list, or at the one-fourth and three-fourths marks, but even that isn’t a real solution.

For the final version of the linked list I greatly reduced the need to traverse the list by making each link store multiple pieces of data. This type of linked list is called an unrolled linked list. Even if each link only contains two pieces of data, it cuts the number of links in half, thus making retrieval roughly twice as fast.

By tailoring the amount of data each link will hold to the size of the linked list the number of links can be minimized and the retrieval speeds can be made much faster.


About the Linked List Tests

All tests were performed on a linked list that stored ten thousand characters. After storing the ten thousand characters in the linked list I performed two retrieval tests. The first test did one million consecutive retrievals of characters from the list. The second test did one million random retrievals of characters from the list.

The computer I used for the tests is my development machine, a MacBook running Mac OS X on a 2 GHz Intel Core 2 Duo with 2 GB of 667 MHz DDR2 SDRAM.

The timing results are given in seconds and were obtained with the clock() routine to get the start and end times. This means that there is a theoretical error margin of ±10 milliseconds. In practice, however, multiple iterations of each test on my processor and operating system gave an error of about ±2 milliseconds.


Traditional Double Linked List

This is a classic double linked list, the kind straight out of any beginner's C++ primer. It stores one piece of data per link, and each retrieval means that the class has to start at the beginning of the list and traverse each link until it reaches the destination link.

Storage Time    Consecutive Retrieval    Random Retrieval
.003            59.954                   59.964

As would be expected, this is a very slow implementation of the linked list. Because it does not keep track of which link was used last, a request for data from the ten thousandth link in the list requires that all 9,999 previous links be scanned before it reaches the target link. Over one million retrievals the random retrieval time balances out to be just about the same as the consecutive time.


Double Linked List With Cursor

This version is the very same, except that it includes a cursor that allows it to keep track of which member contained the piece of data requested last. Like its predecessor it only stores one piece of data per link.

Storage Time    Consecutive Retrieval    Random Retrieval
.003            .064                     36.723

This version significantly speeds up consecutive retrieval. That is to be expected, as the linked list never has to scan through more than one link. However, it also shows fairly impressive gains for random retrieval. This is because the requested piece of data is more likely to be close to the cursor than to the beginning of the list.

The cursor has links on both sides of it, unlike the start of the list, which has links on only one side. The result is a whopping 23-second speedup.


Double Linked List With Smart Link Cursor

This version is the same as the last except that it uses intelligent decision making in choosing which member to start at. If the desired link is closest to the cursor it starts navigating the list at the cursor. If the desired link is closer to the beginning or the end it starts at those links instead.

Storage Time    Consecutive Retrieval    Random Retrieval
.003            .066                     18.477

There is a slight increase in the time required to do one million consecutive retrievals, but that is to be expected because of the extra logic added to this variant. However, the gains for random retrieval more than make up for the loss in consecutive retrieval.


Unrolled Linked List Variants

These variants all have one thing in common: each link holds more than one piece of information. Interestingly, this simple change has even greater effects than any of the other modifications.

Each time the amount of data stored per link is doubled, the amount of time required for retrieval is cut in half. In fact, for any link size of n the retrieval time will be roughly 1/n of the time required when each link stores only one piece of data.


Traditional Double Linked List Unrolled

Link Size    Storage Time    Consecutive Retrieval    Random Retrieval
1            .003            59.954                   59.964
2            .001            30.048                   30.103
3            .001            20.055                   20.094
4            .001            15.040                   15.052
5            .000            11.992                   12.037
6            .000            9.981                    10.026
7            .000            8.539                    8.577
8            .000            7.450                    7.491
9            .000            6.600                    6.643
10           .000            5.916                    5.959
1000         .000            .098                     .129




Unrolled Double Linked List with Cursor

Link Size    Storage Time    Consecutive Retrieval    Random Retrieval
1            .003            .064                     36.723
2            .001            .056                     18.309
3            .001            .053                     12.174
4            .001            .050                     9.120
5            .000            .050                     7.277
6            .000            .049                     6.060
7            .000            .048                     5.187
8            .000            .048                     4.539
9            .000            .048                     4.039
10           .000            .048                     3.631
1000         .000            .048                     .122


In this variant increasing the link size has very little effect on consecutive retrieval speed because the link cursor minimizes link traversal anyway.

However, the "double the link size, halve the retrieval time" rule continues to hold for random retrieval.



Unrolled Double Linked List with Smart Link Cursor

Link Size    Storage Time    Consecutive Retrieval    Random Retrieval
1            .003            .066                     18.447
2            .001            .062                     9.283
3            .001            .060                     6.223
4            .001            .059                     4.689
5            .000            .059                     3.765
6            .000            .059                     3.148
7            .000            .059                     2.706
8            .000            .059                     2.369
9            .000            .058                     2.112
10           .000            .058                     1.900
1000         .000            .057                     .110


As mentioned earlier the logic associated with the smart link cursor makes access times for consecutive retrieval slightly slower.

However, the access times for random retrieval are cut nearly in half, at least until the larger link sizes, where function overhead causes the speed gains to level off toward a natural limit.


A Minor Variation: Bitwise Shift Rather Than Division

For the variations in which each link holds multiple pieces of information, a division is required to determine which link holds the desired piece of information:

linkHoldingDesiredData = indexOfDesiredData / amountOfDataPerLink

My idea was to use a bitwise right shift rather than a division. However, this had minimal effect on retrieval speed while restricting link sizes to powers of two. The following speed tests were run on a version of the linked list with a normal link cursor.

Link Size    Storage Time    Consecutive Retrieval    Random Retrieval
2            .001            .049                     18.198
4            .001            .043                     9.013
8            .000            .040                     4.470
16           .000            .039                     2.262
1024         .000            .038                     .103


The Final Analysis Versus STL
I decided that to test how successful these linked list modifications are, I would test them against similar Standard Template Library containers. In the final version, in which each link holds more than one piece of information, the class is more like the STL vector or deque than the STL list class. In addition, my class allows random access, whereas the list class from STL only allows consecutive access.

STL Type    Storage Time    Consecutive Retrieval    Random Retrieval
Vector      .002            .103                     .120
Deque       .001            .153                     .168

From what research I've done it appears that deque works pretty much like my linked list class in that it allocates chunks of memory and links these together. However, there must be one significant difference: it doesn't have a cursor like my version. You'll notice that there is not a significant difference between consecutive access and random access.

In my best version consecutive access is twice as fast as random retrieval, while in the STL versions the two are nearly the same speed. Consecutive retrieval from my class takes only about half the time that it takes from the STL vector.

My Class: .057 seconds for consecutive access
STL Vector: .103 seconds for consecutive access

My Class: .110 seconds for random access
STL Vector: .120 seconds for random access

For a more condensed and organized overview of the results look at the PDF datasheet I put together from the test results.


Conclusion

In real-life applications, a program using a linked list with ten thousand members and making one million retrievals probably wouldn't notice the minor speed benefits of my class. The truth of the matter is that anything that would have to be done with those retrieved results would probably take considerably longer than the retrieval itself.

However, this has still been a very interesting experiment into the nature of the linked list and how to optimize it.

1 comment:

  1. (1) You should test an implementation of std::list (use std::advance(mylist.begin(), N) for random access). (2) The three containers tested have very different performance for insertion/removal at random/begin/end positions, which should be compared. (3) The Standard Library is designed to use allocators, which may add some overhead relative to your list class. (4) It would also be interesting to compare memory usage (via a custom allocator).
