6.006 Lec 2: Data Structures and Dynamic Arrays
Table of Contents
Difference between an interface and a data structure
The interface can also be called ADT or API.
The interface, often called ADT or API, offers the general specifications on solving the problem; it describes what operations the data structure must perform and what they will be achieving while the data structures is the actual implementation of these functions. It consists of all the commands (algorithm), describing how the interface operations will be implemented and function.
The main interfaces
There are two main interfaces:
- set
- sequence
DS approaches
Additionally there are two schools of thought:
- static arrays Their size is predefined and difficult to change
- dynamic structures Focusing on variable length and easy addition/deletion operations.
Now, we will combine these two together, focusing on sequences.
Analyzing static sequences
Static sequences are to be used when dealing with data in which the order is important, however the volume of it does not change often.
Interface functions
The methods that static data structures need to implement efficiently
build()
, O(n) The creation of a new array, runs in linear time, due to the fact that the computer needs tomalloc()
space for n elements and initialize all of their elementslength()
, O(1) Returns n, no matter how many elements are used inside the arrayiter_seq()
, O(n) Visit every element from the beginning of the array to the end of the arrayget_at(i)
, O(1)set_at(i,x)
, O(1) These two functions take constant time as explained in the word RAM model
The natural solution to this problem is the static array.
Word RAM model of computation
RAM stands out for Random Access Memory, meaning that the computer (CPU) can access every element in it in constant time. This differentiates data structures that follow this model to ones that focus on other operations.
Practically, this means that all items are stored in a continuous location in memory, where the ith field consists of xi
x[i] = xi
Reality and memory
I must have misunderstood something here, cause I can not explain it clearly
This is more of a theoretical warning at this point but, in order for the static data structures to be implemented it is crucial that the memory increases at least as fast as the \(\lg{n}\). Meaning that if you have less space than the \(\lg{n}\) you can not store all the values that you want, therefore you can not use the data structure.
Analyzing dynamic sequences
Dynamic sequences are to be used when we care more about quick insertion and sorting and not so much about random access speeds
Interface functions
Analyzing the previous statement a little bit more, we use dynamic sequences when we find ourselves dealing with the following operations1:
insert/delete_first(i,x)
, O(1) We do not need to allocate more space and copy every element in our array to another location in memory, like in ds, but we just need to create a new node make it point tohead
and set it up as the new head. Thus, constant running time.insert/delete_last()
, O(n) or O(1) Normaly O(n), due to the fact that you need to follow the pointers of each node in the list to reach the last node but O(1) is achievable through data structure augmentation; inserting a new field for our data structure ( a pointer to thetail
node ), to have instant access to the last element.insert_at(i,x),delete_at(i)
, O(n) Even though it takes 0(1) when you have reached the previous node, you can not perform the operation at 0(1), because you need to iterate through (worst case scenario) the entire array
Linked List
The linked list is a data structure based on the Dynamic Sequence ADT. It is a
sequence of nodes, each one with a value
and a next
field, pointing to the
next node.
Unlike arrays, linked lists are not stored in a consecutive chunk of memory; they are found in random positions in ram. This is the reason why you need pointers in each node, and why RAM operations do not work efficiently
Dynamic arrays
In Python: lists
Let’s try to create an improved data structure, kind of combining dynamic and static data structures to get the best out of both worlds.
The main constraint that we have with simple arrays, is that for every time that you want to add an element you have to resize the whole array. That translates into allocating a n+1 chunk of memory, copying all of the arrays elements and then adding the new one. To address that, we could create a larger array of size rougly n. Then, whenever it gets big enough, resize it but not by expanding it by a constant. Instead, by doubling (multiplying by \(c>1\)) the existing size.
This simple change would lead to:
insert_last()
: O(n)->O(1) 2
It actually is not always correct, but most of the time. There will be times
when the size of the array will no longer suffice and it will have to resize
- Fewer
malloc
calls and copying values around in memory, thus increased performance. - To visualize, think of performing n
insert_last()
in a row and compare the efficiency of static array to dynamic array
Summary: Comparing operations
operation | static array | linked list |
---|---|---|
insert/deleteat | 0(n) | 0(1) |
get/setat | 0(1) | 0(n) |
get/setlast | 0(n)/O(1) |