Friday, 17 March 2017

Structures & Unions




Structures

A structure is a customised user-defined data type in C. It is by definition a collection of variables of any type that are referenced under one name, providing a convenient means of keeping related information together.

Some terminology :-

structure definition  :-  the  template used to create structure variables.
structure elements  :-   the member variables of the structure type

Defining Structures


Syntax :                       struct  tag   {
                                                type  var_1 ;
                                                type var_2 ;
                                                ...
                                    type var_n ;
                                                }  ;

The keyword struct tells the compiler we are dealing with a structure and must be present whenever we refer to the new type, tag is an identifier which is the name given to the customised "type".

A variable of this new type can now be defined as follows for example. Note that the keyword struct has to be used in conjunction with our own name for the structure, tag.

                        struct tag variable ; 

For Example :-
                        struct RECORD {
                                    int rec_no ;
                                    char name[30] ;
                                    char town[40] ;
                                    char country[ 20 ]
                                    } ;
                        struct RECORD person ;

The compiler will automatically allocate enough storage to accommodate all the elements. To find out how much storage is required one might do the following

                                    size = sizeof ( person ) ;
or
                                    size = sizeof( struct RECORD ) ;

NB : The name of a structure is not the address of the structure as with array names.

Accessing Structure Elements


For example define a complex type structure as follows.

                        struct complex {
                                    double real ;
                                    double imaginary ;                   // Note that a variable may also be
                                    } cplx ;                                     // defined at structure definition time

The elements of the structure are accessed using the dot operator, . , as follows

                        cplx.real  =  10.0 ;
                        cplx.imag  =  20.23 ;
                        scanf ( "%lf", &cplx.real ) ;

or if we want to access struct RECORD already defined

                        puts( person.name ) ;

or character by character
           
                        person.name[i] = 'a' ;

Thus we treat structure elements exactly as normal variables and view the dot operator as just another appendage like the indirection operator or an array index.

Initialising Structures


Structure elements or fields can be initialised to specific values as follows :-

            struct id {
                        char name[30] ;
                        int id_no ;
                        } ;
            struct id student = { "John", 4563 } ;
 

Structure Assignment

The name of a structure variable can be used on its own to reference the complete structure. So instead of having to assign all structure element values separately, a single assignment statement may be used to assign the values of one structure to another structure of the same type.

For Example :-
                        struct {
int a, b ;
                                    }  x = {1, 2 }, y ;

                        y = x ;              // assigns values of all fields in x to fields in y

Creating more Complex Structures with Structures

Once again emphasising that structures are just like any other type in C we can create arrays of structures, nest structures, pass structures as arguments to functions, etc.

For example we can nest structures as follows creating a structure employee_log that has another structure as one of its members.

                        struct time {
                                    int hour ;
                                    int min ;
                                    int sec ;
                        } ;
                        struct employee_log {
                                    char name[30] ;
                                    struct time start, finish ;
                                    } employee_1 ;

To access the hour field of time in the variable employee_1 just apply the dot operator twice

                        employee_1.start.hour = 9 ;

Typically a company will need to keep track of more than one employee so that an array of employee_log would be useful.

            struct employee_log  workers[100] ;

To access specific employees we simply index using square braces as normal, e.g. workers[10]. To access specific members of this structure we simply apply the dot operator on top of the index.

            workers[10].finish.hour = 10 ;

When structures or arrays of structures are not global they must be passed to functions as parameters subject to the usual rules. For example

                        function1( employee_1 ) ;

implements a call to function1 which might be prototyped as follows

                        void  function1( struct employee_log emp ) ;

Note however that a full local copy of the structure passed is made so if a large structure is involved memory the overhead to simply copy the parameter will be high so we should employ call by reference instead as we will see in the next section.

Passing an array of structures to a function also follows the normal rules but note that in this case as it is impossible to pass an array by value no heavy initialisation penalty is paid - we essentially have call by reference. For example

                        function2( workers ) ;

passes an array of structures to function2 where the function is prototyped as follows.

                        function2( struct employee_log staff[ ] ) ;

Structure Pointers

As we have said already we need call by reference calls which are much more efficient than normal call by value calls when passing structures as parameters. This applies even if we do not intend the function to change the structure argument.

A structure pointer is declared in the same way as any pointer for example

                        struct address {
                                    char name[20] ;
                                    char street[20] ;
                                    } ;
                        struct address person ;
                        struct address *addr_ptr ;

declares a pointer addr_ptr to data type struct address.

To point to the variable person declared above we simply write

                        addr_ptr =  &person ;

which assigns the address of person to addr_ptr.

To access the elements using a pointer we need a new operator called the arrow operator, ->, which can be used only with structure pointers. For example

                        puts( addr_ptr -> name ) ;

For Example :- Program using a structure to store time values.

            #include <stdio.h>

            struct time_var {
                        int hours, minutes, seconds ;
                        } ;
            void display ( const struct time_var * ) ;                      /* note structure pointer and const */

            void main()
            {
            struct time_var time ;

            time.hours = 12 ;
            time.minutes = 0 ;
            time.seconds = 0 ;
            display( &time ) ;
            }
            void display( const struct time_var *t )
            {
            printf( "%2d:%2d;%2d\n", t -> hours, t -> minutes, t -> seconds ) ;
            }

Note that even though we are not changing any values in the structure variable we still employ call by reference for speed and efficiency. To clarify this situation the const keyword has been employed.

Dynamic allocation of structures


The memory allocation functions may also be used to allocate memory for user defined types such as structures. All malloc() basically needs to know is how much memory to reserve.

For Example :-
            struct coordinate {
                                    int x, y, z ;
                                    } ;
                        struct coordinate *ptr ;

                        ptr = (struct coordinate * ) malloc( sizeof ( struct coordinate )  ) ;

Bit--Fields

Bit--fields are based directly on structures with the additional feature of allowing the programmer to specify the size of each of the elements in bits to keep storage requirements at a minimum. However bit--field elements are restricted to be of type int ( signed or unsigned ).

For Example :-
                        struct clock {
                                    unsigned hour : 5 ;
                                    unsigned minutes : 6 ;
                                    unsigned seconds : 6 ;
                                    } time ;

This time structure requires 17 bits to store the information now so the storage requirement is rounded up to 3 bytes. Using the normal structure format and 32-bit integer elements we would require 12 bytes so we achieve a substantial saving.

Bit--fields can be used instead of the bitwise operators in system level programming, for example to analyse the individual bits of values read from a hardware port we might define the following bit-field.
                        struct status {
                                    unsigned bit0 : 1 ;
                                    unsigned bit1 : 1 ;
                                    ...
                                    unsigned bit15 : 1 ;
                                    } ;

If  we are interested in bit 15 only we need only do the following

                        struct status {
                                    unsigned : 15 ;             /* skip over first 15 bits with a "non-member"  */
                                    unsigned bit15 : 1 ;
                                    } ;

There are two further restrictions on the use of bit--fields

·      Cannot take the address of a bit--field variable
·      Cannot create an array of bit--field variables

Can you suggest reasons for this ?

Unions

A union is data type where the data area is shared by two or more members generally of different type at different times.

For Example :-
union u_tag {
            short ival ;
            float fval ;
            char cval ;
            } uval ;

The size of uval will be the size required to store the largest single member, 4 bytes in this case to accommodate the floating point member.

Union members are accessed in the same way as structure members and union pointers are valid.

                                    uval.ival = 10 ;
                                    uval.cval = 'c' ;

When the union is accessed as a character we are only using the bottom byte of storage, when it is accessed as a short integer the bottom two bytes etc. It is up to the programmer to ensure that the element accessed contains a meaningful value.

A union might be used in conjunction with the bit-field struct status in the previous section to implement binary conversions in C.

For Example :-
                                    union conversion {
                                    unsigned short num ;
                                                struct status bits ;
                                    } number ;

We can load number with an integer

                        scanf( "%u", &number.num );

Since the integer and bit--field elements of the union share the same storage if we now access the union as the bit--field variable bits we can interpret the binary representation of num directly.

i.e.                   if ( uvar.bits.bit15 )
                                    putchar( '1' ) ;
                        else
                                    putchar('0') ;
                        ...
                        if ( uvar.bits.bit0 )
                                    putchar( '1' ) ;
                        else
                                    putchar('0') ;

Admittedly rather inefficient and inelegant but effective.

Enumerations

An enumeration is a user defined data type whose values consist of a set of named integer constants, and are used for the sole purpose of making program code more readable.

Syntax:            enum tag { value_list } [ enum_var ] ;

where tag is the name of the enumeration type, value_list is a list of valid values for the enumeration, and where enum_var is an actual variable of this type.

For Example :-
                        enum colours { red, green, blue, orange } shade ;
                                                            //  values red - 0, green - 1, blue - 2, orange - 3

                        enum day { sun = 1, mon, tue, wed = 21, thur, fri, sat } ;
                        enum day weekday ;
                                                            //  values are 1, 2, 3, 21, 22, 23, 24

Variables declared as enumerated types are treated exactly as normal variables in use and are converted to integers in any expressions in which they are used.

For Example :-
                        int i ;

                        shade = red ;    // assign a value to shade enum variable

                        i = shade ;        // assign value of enum to an int

                        shade = 3 ;       // assign valid int to an enum, treat with care

The typedef Keyword

C makes use of the typedef  keyword to allow new data type names to be defined. No new type is created, an existing type will now simply be recognised by another name as well. The existing type can be one of the in-built types or a user-defined type.

Syntax :           typedef  type  name ;

where type is any C data type and name is the new name for this type.

For Example :-
                        typedef  int INTEGER ;
                        INTEGER i ;                          // can now declare a variable of type ‘INTEGER’

                        typedef double * double_ptr ;
                        double_ptr  ptr ;                                   //  no need of * here as it is part of the type

                        typedef struct coords {
                                    int x, y ;
                                    } xycoord ;                  //  xycoord is now a type name in C
                        xycoord  coord_var ;
The use of typedef  makes program code easier to read and when used intelligently can facilitate the porting of code to a different platform and the modification of code. For example in a first attempt at a particular program we might decide that floating point variables will fill our needs. At a later date we decide that all floating point variables really should be of type double so we have to change them all. This problem is trivial if we had used a typedef as follows :-

                        typedef float FLOATING ;

To remedy the situation we modify the user defined type as follows 

                        typedef double FLOATING ;

Linked Lists

When we have needed to represent collections of data up until now we have typically used data structures such as arrays whose dimensions were fixed at compile time or which were dynamically allocated as required. These arrays could have been arrays of basic data types, e.g. doubles, or could have been arrays of structures when more complex data needed to be represented.

However this method of representing data, while perfectly adequate in many situations, is limited and can be very inefficient when we are dealing with situations where the data collection has to be modified at run-time.

For example if we are dealing with an ordered list of records and we need to insert a new record, for John say, in the correct position in the list we might have to go through the following steps.

1.   Expand the memory space allocated to the array to take one more record.

Alan
Ben
Sam
Tom


2. Determine the correct position in the array at which to insert the element – position 3.

3. Reposition the data so that the appropriate position is left unused.

Alan
Ben

Sam
Tom

4. Insert the new element into the list.

Alan
Ben
John
Sam
Tom


Whatever our internal representation of the data, steps 1,2 & 4 will have to be carried out in one form or another. However step 3 can be avoided if we represent the data in an alternate fashion.

If we can represent our data as a chain of inter-connected records as illustrated below where each record has its own memory space and knows which record comes before and after it we have a much more flexible structure.

Now when we want to insert a new record in position we still must allocate the appropriate amount of memory to store it (note in this case it is a discrete block) and we still must figure out which position it belongs in but we no longer have to manipulate the position of the other records in memory. These will now stay exactly where they are but we will have to inform the records before and after the position at which we wish to insert that they will be pointing to a different record.

Using the normal terminology we say we have a list of nodes linked together by pointer links. To insert a new item into the list we break an existing link and create two new links incorporating the new node into the list.

Nodes or Self-Referential Structures


A structure that contains a pointer member that points to a structure of the same type as itself is said to be a self-referential structure. For example :-

struct node {
            char name[20] ;
            struct node *nextnode ;
};
This defines a new type, struct node, which has a pointer member, nextnode, which points to a structure of the same type as itself. This node pointer allows this current node to be linked to another and so a list of linked nodes can be built up.

Note that even though the structure is not fully defined when the nextnode field is specified the compiler does not have a problem as all it needs to know is that there is such a type.

The above definition of struct node allows us to build a singly linked list i.e. each node only knows where the node following it is located. A null pointer is used to indicate the end of a linked list structure.

The diagram below illustrates how this node structure would be used to represent our linked list above. Each node contains two fields the actual data and a pointer link to the next node. The final node in the list contains a null pointer link as indicated by the slash in the diagram.

Connecting the Nodes

The basic building block in a linked list is just a C structure so we can initialise it in the same way as any other structure. For example in our above situation to allocate and initialise the first node we might do the following :-

            struct node *node_1 ;

                        node_1 = ( struct node *)malloc( sizeof ( struct node ) ) ;

            strcpy( node_1->name, “Alan” ) ;
            node_1->nextnode = NULL ;

This block of code initialises this node only and because the nextnode pointer is assigned a null pointer value it designates it as the final (and only) node in the list.

If we want to add on an item to the list we can do the following :-

                        struct node *node_2 ;

            node_2 = ( struct node *)malloc( sizeof ( struct node ) ) ;

            strcpy( node_2->name, “Ben” ) ;
                        node_2->nextnode = NULL ;
            node_1->nextnode = node_2 ;

Note that it is only the last line of code that adds the item onto the list. The remaining code just allocates storage and initialises a new node as before.

We now have a linked list with two nodes. The node pointer node_1 points to the first element in the list which is normally called the list head. As all the nodes in the list are connected via their link pointer this is in fact a pointer to the complete linked list so we wouldn’t need to store node_2, etc. at all. The last node has a null pointer which identifies it as the last node in the linked list and is called the list tail.

Operations on Linked Lists

When working with linked lists we will need to perform a number of operations quite often so it makes sense to build up a library of these at the outset. We will need to be able to insert items into the list, remove items from the list, traverse the list, etc.

We will continue to use our definition of a node except that we will typedef it as follows :-

typedef struct node {
            char name[20] ;
            struct node *nextnode ;
} list_node ;

Note in this case we cannot use the typedef  name to declare nextnode as this name is not in existence at that point.

To declare the linked list we simply declare a pointer to one of these structure types.

                        list_node *ptr_head = NULL ;
Traversing a List

In many list operations we will need to process the information in each node, for example to print the complete information in the list; this is termed traversing a list.

An iterative version of a print_list function might be as follows :-

void print_list( list_node *p_head )
{
list_node *node_ptr ;

for ( node_ptr = p_head ; node_ptr != NULL ; node_ptr = node_ptr->nextnode )
            puts( node_ptr->name ) ;
}

whereas a recursive version might be implemented as follows :-

void print_list( list_node *p_head )
{
if ( p_head == NULL )
            puts(“\n” );
else
            {
            puts( p_head->name ) ;
            print_list( p_head->nextnode ) ;
            }
}
Inserting a Node

There are a variety of ways this function can be implemented - inserting a node in the correct order, at the head, at the tail, by position, etc. For our example we will write a function to insert a new node in the correct order assuming the node has been fully created before being passed to us.

void insert( list_node **p_head, list_node *new )                  
{                                                         
list_node *node_ptr = *p_head ;
list_node *prev = NULL ;

while (  node_ptr != NULL  && (strcmp( new->name, node_ptr->name ) > 0 ) )
            {
            prev = node_ptr ;
            node_ptr = node_ptr->nextnode ;
            }

if ( prev == NULL )                 // head position
            {
            new->nextnode = *p_head ;
            *p_head = new ;
            }
else
            {
            prev->nextnode = new ;
            new->nextnode = node_ptr ;
            }
}
Removing a Node

We will implement this as a function which finds the node by the name field, unlinks it from the list and returns a pointer to the removed node without freeing its associated memory.

list_node *delete( list_node **p_head, char *name )
{
list_node *prev = NULL, *curr, *node_ptr = *p_head ;

while (  node_ptr != NULL  && strcmp( name, node_ptr->name ) )
            {
            prev = node_ptr ;
            node_ptr = node_ptr->nextnode ;
            }
if ( prev == NULL )                 // removing head position
            {
            *p_head = (*p_head)->nextnode ;
            return node_ptr ;
            }
else      {
            prev->nextnode = node_ptr->nextnode ;
            return node_ptr ;
            }
}
Note the use of double indirection to type list_node in each of the above functions. Can you explain why this is necessary ?

The following main function is a trivial illustration of how to use the above functions in practice.

void main()
{
list_node *ptr_head = NULL ;
struct node *node_1, *node_2 ;

node_1 = ( struct node *)malloc( sizeof ( struct node ) ) ;
strcpy( node_1->name, "Alan" ) ;
insert( &ptr_head, node_1 ) ;

node_2 = ( struct node *)malloc( sizeof ( struct node ) ) ;
strcpy( node_2->name, "Ben" ) ;
insert( &ptr_head, node_2) ;

puts( “List with Alan & Ben”) ;
print_list( ptr_head ) ;

delete( &ptr_head, "Ben" ) ;

puts( “List with Alan”) ;
print_list( ptr_head ) ;
}

What we have discussed in the example above is one of the more general variations of a linked list. Many different implementations can be implemented with slightly different functionality - for example we can implement doubly linked lists, stacks (LIFO) or queues (FIFO) using the same basic building blocks.

 

7.7 Efficiency Considerations


As described previously pointers should be used to implement call-by-reference in passing large user defined data items to functions to reduce the copying overhead involved in call-by-value. If implementing a typical call-by-value situation use of the const keyword can protect us from inadvertent modification of the referenced argument.


Care should be taken not to go overboard in defining large complicated data structures without need and to use them appropriately.

For example when dealing with lists of data of any type a normal array may sometimes be a better choice as a data structure rather than a linked list. The big advantage a linked list has over a linear array is that you can insert or delete items at will without shifting the remaining data.

However the extra overhead involved in setting up the linked list, both the calls to malloc() to allocate each new element and the ‘extra’ pointer associated with each element, may not justify its selection.

Suppose we need to keep a list of integers. Then most of the storage is taken up with the extra pointer information used solely for maintaining the list ( the pointer to the next element and the pointer to each integer ).

 Exercises

1.  Write a program which will simulate the action of a  digital clock continuously as closely as possible. Use a structure to hold the values of hours, minutes and seconds required. Your program should include a function to display the current time and to update the time appropriately using a delay function/loop to simulate real time reasonably well.  Pointers to the clock structure should be passed to the display and update functions rather than using global variables.

2.  Complex numbers are not supported as a distinct type in C but are usually implemented by individual users as user defined structures with a set of functions representing the operations possible on them. Define a data type for complex values and provide functions to support addition, subtraction, multiplication and division of these numbers. Use call by reference to make these functions as efficient as possible and try to make their use as intuitive as you can.

3.   Write a program which will

(a) set up an array of structures which will hold the names, dates of birth, and addresses of  a number of employees.
            (b) assign values to the structure from the keyboard.
(c) sort the structures into alphabetical order by name using a simple bubble sort algorithm.
            (d)  print out the contents of a specific array element on demand.

4.  Use a combination of bit--fields and unions in C to print out the bit patterns of any unsigned
integer values input at the keyboard.

5.   BCD (Binary Coded Decimal)  codes are commonly used in industrial applications to represent
decimal numbers. Here four bits are used to represent each decimal digit , e.g.

                0000  ==>   0
                0001  ==>   1
                ...
                1001  ==>   9

Thus short int can be used to represent a four digit decimal number.

Use Bit--Fields to implement conversion routines to encode a decimal number into BCD format and back again.

6.  Redo problem 1 using a bit--field structure as an illustration of how memory requirements may
be reduced. Compare the old and new memory requirements for data storage.

7. Using a doubly linked list data structure write a program that maintains an alphabetically ordered list of student registration records. The information held in each record should include the student’s name, student identity number, address, course, etc. Your program should allow the user to add on records to the list, to remove records from the list, to amend records and to produce a screen dump of all records. Use distinct functions for all major operations if possible.

8.   Use a FIFO implementation of a linked list to represent a queue of people entered by the user.