Recent Posts

Showing posts with label arrays. Show all posts
Showing posts with label arrays. Show all posts

How to: use arrays in C++

Arrays on the type level

An array type is denoted as T[n] where T is the element type and n is a positive size, the number of elements in the array. The array type is a product type of the element type and the size. If one or both of those ingredients differ, you get a distinct type:

#include <type_traits>

static_assert(!std::is_same<int[8], float[8]>::value, "distinct element type");
static_assert(!std::is_same<int[8],   int[9]>::value, "distinct size");

Note that the size is part of the type, that is, array types of different size are incompatible types that have absolutely nothing to do with each other. sizeof(T[n]) is equivalent to n * sizeof(T).

Array-to-pointer decay

The only "connection" between T[n] and T[m] is that both types can implicitly be converted to T*, and the result of this conversion is a pointer to the first element of the array. That is, anywhere a T* is required, you can provide a T[n], and the compiler will silently provide that pointer:

                  +---+---+---+---+---+---+---+---+
the_actual_array: |   |   |   |   |   |   |   |   |   int[8]
                  +---+---+---+---+---+---+---+---+
                    ^
                    |
                    |
                    |
                    |  pointer_to_the_first_element   int*

This conversion is known as "array-to-pointer decay", and it is a major source of confusion. The size of the array is lost in this process, since it is no longer part of the type (T*). Pro: Forgetting the size of an array on the type level allows a pointer to point to the first element of an array of any size. Con: Given a pointer to the first (or any other) element of an array, there is no way to detect how large that array is or where exactly the pointer points to relative to the bounds of the array.

Arrays are not pointers

The compiler will silently generate a pointer to the first element of an array whenever it is deemed useful, that is, whenever an operation would fail on an array but succeed on a pointer. This conversion from array to pointer is trivial, since the resulting pointer value is simply the address of the array. Note that the pointer is not stored as part of the array itself (or anywhere else in memory). An array is not a pointer.

tatic_assert(!std::is_same<int[8], int*>::value, "an array is not a pointer");
One important context in which an array does not decay into a pointer to its first element is when the & operator is applied to it. In that case, the & operator yields a pointer to the entire array, not just a pointer to its first element. Although in that case the values (the addresses) are the same, a pointer to the first element of an array and a pointer to the entire array are completely distinct types:

static_assert(!std::is_same<int*, int(*)[8]>::value, "distinct element type");
The following ASCII art explains this distinction:

      +-----------------------------------+
      | +---+---+---+---+---+---+---+---+ |
+---> | |   |   |   |   |   |   |   |   | | int[8]
|     | +---+---+---+---+---+---+---+---+ |
|     +---^-------------------------------+
|         |
|         |
|         |
|         |  pointer_to_the_first_element   int*
|
|  pointer_to_the_entire_array              int(*)[8]
Note how the pointer to the first element only points to a single integer (depicted as a small box), whereas the pointer to the entire array points to an array of 8 integers (depicted as a large box).

The same situation arises in classes and is maybe more obvious. A pointer to an object and a pointer to its first data member have the same value (the same address), yet they are completely distinct types.

If you are unfamiliar with the C declarator syntax, the parenthesis in the type int(*)[8] are essential:

  • int(*)[8] is a pointer to an array of 8 integers.
  • int*[8] is an array of 8 pointers, each element of type int*.

Accessing elements

C++ provides two syntactic variations to access individual elements of an array. Neither of them is superior to the other, and you should familiarize yourself with both.

Pointer arithmetic

Given a pointer p to the first element of an array, the expression p+i yields a pointer to the i-th element of the array. By dereferencing that pointer afterwards, one can access individual elements:

std::cout << *(x+3) << ", " << *(x+7) << std::endl;
If x denotes an array, then array-to-pointer decay will kick in, because adding an array and an integer is meaningless (there is no plus operation on arrays), but adding a pointer and an integer makes sense:

   +---+---+---+---+---+---+---+---+
x: |   |   |   |   |   |   |   |   |   int[8]
   +---+---+---+---+---+---+---+---+
     ^           ^               ^
     |           |               |
     |           |               |
     |           |               |
x+0  |      x+3  |          x+7  |     int*

(Note that the implicitly generated pointer has no name, so I wrote x+0 in order to identify it.)

If, on the other hand, x denotes a pointer to the first (or any other) element of an array, then array-to-pointer decay is not necessary, because the pointer on which i is going to be added already exists:

   +---+---+---+---+---+---+---+---+
   |   |   |   |   |   |   |   |   |   int[8]
   +---+---+---+---+---+---+---+---+
     ^           ^               ^
     |           |               |
     |           |               |
   +-|-+         |               |
x: | | |    x+3  |          x+7  |     int*
   +---+
Note that in the depicted case, x is a pointer variable (discernible by the small box next to x ), but it could just as well be the result of a function returning a pointer (or any other expression of type T*).

Indexing operator

Since the syntax *(x+i) is a bit clumsy, C++ provides the alternative syntax x[i]:

std::cout << x[3] << ", " << x[7] << std::endl;
Due to the fact that addition is commutative, the following code does exactly the same:

std::cout << 3[x] << ", " << 7[x] << std::endl;
The definition of the indexing operator leads to the following interesting equivalence:

&x[i]  ==  &*(x+i)  ==  x+i
However, &x[0] is generally not equivalent to x. The former is a pointer, the latter an array. Only when the context triggers array-to-pointer decay can x and &x[0] be used interchangeably. For example:

T* p = &array[0];  // rewritten as &*(array+0), decay happens due to the addition
T* q = array;      // decay happens due to the assignment
On the first line, the compiler detects an assignment from a pointer to a pointer, which trivially succeeds. On the second line, it detects an assignment from an array to a pointer. Since this is meaningless (but pointer to pointer assignment makes sense), array-to-pointer decay kicks in as usual.

Ranges

An array of type T[n] has n elements, indexed from 0 to n-1; there is no element n. And yet, to support half-open ranges (where the beginning is inclusive and the end is exclusive), C++ allows the computation of a pointer to the (non-existent) n-th element, but it is illegal to dereference that pointer:

   +---+---+---+---+---+---+---+---+....
x: |   |   |   |   |   |   |   |   |   .   int[8]
   +---+---+---+---+---+---+---+---+....
     ^                               ^
     |                               |
     |                               |
     |                               |
x+0  |                          x+8  |     int*
For example, if you want to sort an array, both of the following would work equally well:

std::sort(x + 0, x + n);
std::sort(&x[0], &x[0] + n);
Note that it is illegal to provide &x[n] as the second argument since this is equivalent to &*(x+n), and the sub-expression *(x+n) technically invokes undefined behavior in C++ (but not in C99).

Also note that you could simply provide x as the first argument. That is a little too terse for my taste, and it also makes template argument deduction a bit harder for the compiler, because in that case the first argument is an array but the second argument is a pointer. (Again, array-to-pointer decay kicks in.)









How to access and process nested objects, arrays or JSON

Preliminaries

In JavaScript there are only two (actually one) data types which can contain other data types: objects and arrays (a special form of objects).

Both types expose a key -> value structure. Keys in arrays must be numeric, whereas any string can be used as key in objects. The key-value pairs are also called the "properties".

Properties can be accessed either using dot notation

var value = obj.someProperty;
or bracket notation, if the property name would not be a valid JavaScript identifier name [spec], or the name is the value of a variable:

// the space is not a valid character in identifier names
var value = obj["some Property"];

// property name as variable
var name = "some Property";
var value = obj[name];
For the same reason, array elements can only be accessed using bracket notation:

var value = arr[5];

// property name / index as variable
var x = 5;
var value = arr[x];

Wait... what about JSON?

JSON is a textual representation of data, just like XML, YAML, CSV, and others. To work with such data, it first has to be converted to JavaScript data types, i.e. arrays and objects (and how to work with those was just explained). How to parse JSON is explained in the question How to parse JSON in JavaScript .

Accessing nested data structures

A nested data structure is an array or object which refers to other arrays or objects, i.e. its values are arrays or objects. Such structures can be accessed by consecutively applying dot or bracket notation.

Here is an example:

var data = {
    code: 42,
    items: [{
        id: 1,
        name: 'foo'
    }, {
        id: 2,
        name: 'bar'
    }]
};
Lets assume we want to access the name of the second item.

Here is how we can do it step-by-step:

As we can see data is an object, hence we can access its properties using dot notation. The items property is accessed as follows:

data.items
The value is an array, to access its second element, we have to use bracket notation:

data.items[1]
This value is an object and we use dot notation again to access the name property. So we eventually get:

var item_name = data.items[1].name;
What if the property names are dynamic and I don't know them beforehand?

If the property names are unknown or we want to access all properties of an object / elements of an array, we can use the for...in [MDN] loop for objects and the for [MDN] loop for arrays to iterate over all properties / elements.

for(var prop in data) {
    // `prop` contains the name of each property, i.e. `'code'` or `'items'`
    // consequently, `data[prop]` refers to the value of each property, i.e.
    // either `42` or the array
}
To iterate over all properties of data, we can iterate over the object like so:

for(var i = 0, l = data.items.length; i < l; i++) {
    // `i` will take on the values `0`, `1`, `2`,..., i.e. in each iteration
    // we can access the next element in the array with `data.items[i]`, example:
    // 
    // var obj = data.items[i];
    // 
    // Since each element is an object (in our example),
    // we can now access the objects properties with `obj.id` and `obj.name`. 
    // We could also use `data.items[i].id`.
}
One could also use for...in to iterate over arrays, but there are reasons why this should be avoided: Why is 'for(var item in list)' with arrays considered bad practice in JavaScript?.

Depending on where the object comes from (and what you want to do), you might have to test in each iteration whether the property is really a property of the object, or it is an inherited property. You can do this with Object#hasOwnProperty [MDN].

What if the "depth" of the data structure is unknown to me?

In addition to unknown keys, the "depth" of the data structure (i.e. how many nested objects / array) it has, might be unknown as well. How to access deeply nested properties depends on the exact data structure then.

If the data structure contains repeating structures, e.g. the representation of a binary tree, the solution typically includes to recursively [Wikipedia] access each level of the data structure.

Here is an example to get the first leaf node of a binary tree:

function getLeaf(node) {
    if (node.leftChild) {
        return getLeaf(node.leftChild); // <- recursive call
    }
    else if (node.rightChild) {
        return getLeaf(node.rightChild); // <- recursive call
    }
    else { // node must be a leaf node
        return node;
    }
}

var first_leaf = getLeaf(root);
DEMO

A more generic way to access a nested data structure with unknown keys and depth is to test the type of the value and act accordingly.

Here is an example which adds all primitive values inside a nested data structure into an array (assuming it does not contain any functions).

function toArray(obj) {
    var result = [];
    for (var prop in obj) {
        var value = obj[prop];
        if (typeof value === 'object') {
            result.push(toArray(value)); // <- recursive call
        }
        else {
            result.push(value);
        }
    }
    return result;
}
DEMO

Helpers

Since the structure of a complex object or array is not necessarily obvious, we can inspect the value at each step to decide how to move further. console.log [MDN] and console.dir [MDN] help us doing this. For example (output of the Chrome console):

> console.log(data.items)
 [ Object, Object ]
Here we see that that data.items is an array with two elements which are both objects. In Chrome console the objects can even be expanded and inspected immediately.

> console.log(data.items[1])
  Object
     id: 2
     name: "bar"
     __proto__: Object
This tells us that data.items[1] is an object and after expanding it we see that it has three properties, id, name and __proto__. The latter is an internal property used for the prototype chain of the object. The prototype chain and inheritance is out of scope for this answer though.