Structures
A structure is a
customised user-defined data type in C. It is by definition a collection of
variables of any type that are referenced under one name, providing a
convenient means of keeping related information together.
Some terminology :-
structure
definition :- the
template used to create structure variables.
structure
elements :- the member variables of the structure type
Defining Structures
Syntax : struct tag {
type var_1 ;
type
var_2 ;
...
type var_n ;
} ;
The keyword struct tells the compiler we are
dealing with a structure and must be present whenever we refer to the new type,
tag is an identifier which is the
name given to the customised "type".
A variable of this
new type can now be defined as follows for example. Note that the keyword
struct has to be used in conjunction with our own name for the structure, tag.
struct tag variable
;
For Example :-
struct RECORD {
int rec_no ;
char
name[30] ;
char
town[40] ;
char
country[ 20 ]
} ;
struct RECORD person ;
The compiler will
automatically allocate enough storage to accommodate all the elements. To find
out how much storage is required one might do the following
size =
sizeof ( person ) ;
or
size =
sizeof( struct RECORD ) ;
NB : The name of a structure is not the
address of the structure as with array names.
Accessing Structure Elements
For example define a
complex type structure as follows.
struct complex {
double real
;
double
imaginary ; // Note that
a variable may also be
} cplx ; // defined
at structure definition time
The elements of the
structure are accessed using the dot
operator, . , as follows
cplx.real = 10.0
;
cplx.imag =
20.23 ;
scanf ( "%lf",
&cplx.real ) ;
or if we want to access
struct RECORD already defined
puts( person.name ) ;
or character by
character
person.name[i] = 'a' ;
Thus we treat
structure elements exactly as normal variables and view the dot operator as
just another appendage like the indirection operator or an array index.
Initialising Structures
Structure elements or
fields can be initialised to specific values as follows :-
struct id {
char name[30] ;
int id_no ;
} ;
struct id student = {
"John", 4563 } ;
Structure Assignment
The name of a
structure variable can be used on its own to reference the complete structure.
So instead of having to assign all structure element values separately, a
single assignment statement may be used to assign the values of one structure
to another structure of the same type.
For Example :-
struct {
int a, b ;
} x = {1, 2 }, y ;
y = x ; // assigns values of all fields in
x to fields in y
Creating more Complex Structures with Structures
Once again
emphasising that structures are just like any other type in C we can create
arrays of structures, nest structures, pass structures as arguments to
functions, etc.
For example we can
nest structures as follows creating a structure employee_log that has another
structure as one of its members.
struct time {
int hour ;
int min ;
int sec ;
} ;
struct employee_log {
char
name[30] ;
struct time
start, finish ;
} employee_1
;
To access the hour
field of time in the variable employee_1 just apply the dot operator twice
employee_1.start.hour =
9 ;
Typically a company
will need to keep track of more than one employee so that an array of employee_log would be useful.
struct employee_log workers[100] ;
To access specific
employees we simply index using square braces as normal, e.g. workers[10]. To
access specific members of this structure we simply apply the dot operator on
top of the index.
workers[10].finish.hour = 10 ;
When structures or
arrays of structures are not global they must be passed to functions as
parameters subject to the usual rules. For example
function1( employee_1 )
;
implements a call to
function1 which might be prototyped as follows
void function1( struct employee_log emp ) ;
Note however that a
full local copy of the structure passed is made so if a large structure is
involved memory the overhead to simply copy the parameter will be high so we
should employ call by reference instead as we will see in the next section.
Passing an array of
structures to a function also follows the normal rules but note that in this
case as it is impossible to pass an array by value no heavy initialisation
penalty is paid - we essentially have call by reference. For example
function2( workers ) ;
passes an array of
structures to function2 where the
function is prototyped as follows.
function2( struct
employee_log staff[ ] ) ;
Structure Pointers
As we have said
already we need call by reference calls which are much more efficient than
normal call by value calls when passing structures as parameters. This applies
even if we do not intend the function to change the structure argument.
A structure pointer
is declared in the same way as any pointer for example
struct address {
char
name[20] ;
char
street[20] ;
} ;
struct address person ;
struct address *addr_ptr ;
declares a pointer
addr_ptr to data type struct address.
To point to the
variable person declared above we simply write
addr_ptr = &person ;
which assigns the
address of person to addr_ptr.
To access the
elements using a pointer we need a new operator called the arrow operator,
->, which can be used only with
structure pointers. For example
puts( addr_ptr ->
name ) ;
For Example :-
Program using a structure to store time values.
#include <stdio.h>
struct time_var {
int hours, minutes,
seconds ;
} ;
void display ( const struct time_var
* ) ; /* note
structure pointer and const */
void main()
{
struct time_var time ;
time.hours = 12 ;
time.minutes = 0 ;
time.seconds = 0 ;
display( &time ) ;
}
void display( const struct time_var
*t )
{
printf( "%2d:%2d;%2d\n", t
-> hours, t -> minutes, t -> seconds ) ;
}
Note that even though
we are not changing any values in the structure variable we still employ call
by reference for speed and efficiency. To clarify this situation the const keyword has been employed.
Dynamic allocation of structures
The memory allocation
functions may also be used to allocate memory for user defined types such as
structures. All malloc() basically needs to know is how much memory to reserve.
For Example :-
struct coordinate {
int x, y, z
;
} ;
struct coordinate *ptr ;
Bit--Fields
Bit--fields are based
directly on structures with the additional feature of allowing the programmer
to specify the size of each of the elements in bits to keep storage
requirements at a minimum. However bit--field elements are restricted to be of
type int ( signed or unsigned ).
For Example :-
struct clock {
unsigned
hour : 5 ;
unsigned
minutes : 6 ;
unsigned
seconds : 6 ;
} time ;
This time structure
requires 17 bits to store the information now so the storage requirement is
rounded up to 3 bytes. Using the normal structure format and 32-bit integer
elements we would require 12 bytes so we achieve a substantial saving.
Bit--fields can be
used instead of the bitwise operators in system level programming, for example
to analyse the individual bits of values read from a hardware port we might
define the following bit-field.
struct status {
unsigned
bit0 : 1 ;
unsigned
bit1 : 1 ;
...
unsigned
bit15 : 1 ;
} ;
If we are interested in bit 15 only we need only
do the following
struct status {
unsigned :
15 ; /* skip over first 15
bits with a "non-member" */
unsigned
bit15 : 1 ;
} ;
There are two further
restrictions on the use of bit--fields
·
Cannot take the address of a
bit--field variable
·
Cannot create an array of
bit--field variables
Can you suggest reasons for this ?
Unions
A union is data type where the data area is
shared by two or more members generally of different type at different times.
For Example :-
union u_tag {
short
ival ;
float
fval ;
char
cval ;
}
uval ;
The size of uval will be the size required
to store the largest single member, 4 bytes in this case to accommodate the
floating point member.
Union members are accessed in the same way
as structure members and union pointers are valid.
uval.ival
= 10 ;
uval.cval
= 'c' ;
When the union is accessed as a character
we are only using the bottom byte of storage, when it is accessed as a short
integer the bottom two bytes etc. It is up to the programmer to ensure that the
element accessed contains a meaningful value.
A union might be used in conjunction with
the bit-field struct status in the
previous section to implement binary conversions in C.
For Example :-
union
conversion {
unsigned
short num ;
struct
status bits ;
}
number ;
We can load number with an integer
scanf(
"%u", &number.num );
Since the integer and bit--field elements
of the union share the same storage if we now access the union as the
bit--field variable bits we can
interpret the binary representation of num directly.
i.e. if
( uvar.bits.bit15 )
putchar(
'1' ) ;
else
putchar('0')
;
...
if
( uvar.bits.bit0 )
putchar(
'1' ) ;
else
putchar('0')
;
Admittedly rather inefficient and inelegant
but effective.
Enumerations
An enumeration is a user defined data type
whose values consist of a set of named integer constants, and are used for the
sole purpose of making program code more readable.
Syntax: enum tag { value_list } [ enum_var ]
;
where tag is the name of the enumeration
type, value_list is a list of valid values for the enumeration, and where
enum_var is an actual variable of this type.
For Example :-
enum
colours { red, green, blue, orange } shade ;
// values red - 0, green - 1, blue - 2, orange -
3
enum
day { sun = 1, mon, tue, wed = 21, thur, fri, sat } ;
enum
day weekday ;
// values are 1, 2, 3, 21, 22, 23, 24
Variables declared as enumerated types are
treated exactly as normal variables in use and are converted to integers in any
expressions in which they are used.
For Example :-
int
i ;
shade
= red ; // assign a value to shade enum
variable
i
= shade ; // assign value of enum
to an int
shade
= 3 ; // assign valid int to an enum, treat with care
The typedef Keyword
C makes use of the typedef keyword to allow new
data type names to be
defined. No new type is created, an existing type will now simply be recognised
by another name as well. The existing type can be one of the in-built types or
a user-defined type.
Syntax : typedef type name ;
where type is any C data type and name is
the new name for this type.
For Example :-
typedef int INTEGER ;
INTEGER
i ; // can now declare a variable of type
‘INTEGER’
typedef
double * double_ptr ;
double_ptr ptr ; // no need of * here as it is part of the type
typedef
struct coords {
int
x, y ;
}
xycoord ; // xycoord is now a type name in C
xycoord coord_var ;
The use of typedef makes program code easier to read and when
used intelligently can facilitate the porting of code to a different platform
and the modification of code. For example in a first attempt at a particular
program we might decide that floating point variables will fill our needs. At a
later date we decide that all floating point variables really should be of type
double so we have to change them all. This problem is trivial if we had used a
typedef as follows :-
typedef
float FLOATING ;
To remedy the situation we modify the user
defined type as follows
typedef
double FLOATING ;
Linked Lists
When we have needed to represent
collections of data up until now we have typically used data structures such as
arrays whose dimensions were fixed at compile time or which were dynamically
allocated as required. These arrays could have been arrays of basic data types,
e.g. doubles, or could have been arrays of structures when more complex data
needed to be represented.
However this method of representing data,
while perfectly adequate in many situations, is limited and can be very
inefficient when we are dealing with situations where the data collection has
to be modified at run-time.
For example if we are dealing with an
ordered list of records and we need to insert a new record, for John say, in
the correct position in the list we might have to go through the following
steps.
1. Expand the memory space allocated to the array to take one more
record.
Alan
|
Ben
|
Sam
|
Tom
|
|
2. Determine the
correct position in the array at which to insert the element – position 3.
3. Reposition
the data so that the appropriate position is left unused.
Alan
|
Ben
|
|
Sam
|
Tom
|
4. Insert the
new element into the list.
Alan
|
Ben
|
John
|
Sam
|
Tom
|
Whatever our internal representation of the
data, steps 1,2 & 4 will have to be carried out in one form or another.
However step 3 can be avoided if we represent the data in an alternate fashion.
If we can represent our data as a chain of
inter-connected records as illustrated below where each record has its own
memory space and knows which record comes before and after it we have a much
more flexible structure.
Now when we want to insert
a new record in position we still must allocate the appropriate amount of
memory to store it (note in this case it is a discrete block) and we still must
figure out which position it belongs in but we no longer have to manipulate the
position of the other records in memory. These will now stay exactly where they
are but we will have to inform the records before and after the position at
which we wish to insert that they will be pointing to a different record.
Using the normal
terminology we say we have a list of nodes linked together by pointer links. To
insert a new item into the list we break an existing link and create two new
links incorporating the new node into the list.
Nodes or Self-Referential Structures
A structure that contains a
pointer member that points to a structure of the same type as itself is said to
be a self-referential structure. For example :-
struct node {
char name[20] ;
struct node *nextnode ;
};
This defines a new type, struct node, which has a pointer
member, nextnode, which points to a
structure of the same type as itself. This node pointer allows this current
node to be linked to another and so a list of linked nodes can be built up.
Note that even though the
structure is not fully defined when the nextnode
field is specified the compiler does not have a problem as all it needs to
know is that there is such a type.
The above definition of
struct node allows us to build a singly linked list i.e. each node only knows
where the node following it is located. A null pointer is used to indicate the
end of a linked list structure.
The diagram below
illustrates how this node structure would be used to represent our linked list
above. Each node contains two fields the actual data and a pointer link to the
next node. The final node in the list contains a null pointer link as indicated
by the slash in the diagram.
Connecting the Nodes
The basic building block in
a linked list is just a C structure so we can initialise it in the same way as
any other structure. For example in our above situation to allocate and
initialise the first node we might do the following :-
struct node *node_1 ;
node_1 = ( struct node *)malloc( sizeof (
struct node ) ) ;
strcpy( node_1->name, “Alan” ) ;
node_1->nextnode = NULL ;
This block of code
initialises this node only and because the nextnode
pointer is assigned a null pointer value it designates it as the final (and
only) node in the list.
If we want to add on an
item to the list we can do the following :-
struct node *node_2 ;
node_2 = ( struct node *)malloc( sizeof ( struct node ) )
;
strcpy( node_2->name, “Ben” ) ;
node_2->nextnode = NULL ;
node_1->nextnode = node_2 ;
Note that it is only the
last line of code that adds the item onto the list. The remaining code just
allocates storage and initialises a new node as before.
We now have a linked list
with two nodes. The node pointer node_1
points to the first element in the list which is normally called the list head. As all the nodes in the list are
connected via their link pointer this is in fact a pointer to the complete
linked list so we wouldn’t need to store node_2, etc. at all. The last node has
a null pointer which identifies it as the last node in the linked list and is
called the list tail.
Operations on Linked Lists
When working with linked
lists we will need to perform a number of operations quite often so it makes
sense to build up a library of these at the outset. We will need to be able to
insert items into the list, remove items from the list, traverse the list, etc.
We will continue to use our
definition of a node except that we will typedef it as follows :-
typedef struct node {
char name[20] ;
struct node *nextnode ;
} list_node ;
Note in this case we cannot
use the typedef name to declare nextnode as this name is
not in existence at that point.
To declare the linked list
we simply declare a pointer to one of these structure types.
list_node *ptr_head = NULL ;
Traversing a List
In many list operations we
will need to process the information in each node, for example to print the
complete information in the list; this is termed traversing a list.
An iterative version of a print_list function might be as follows
:-
void print_list( list_node
*p_head )
{
list_node *node_ptr ;
for ( node_ptr = p_head ;
node_ptr != NULL ; node_ptr = node_ptr->nextnode )
puts( node_ptr->name ) ;
}
whereas a recursive version
might be implemented as follows :-
void print_list( list_node
*p_head )
{
if ( p_head == NULL )
puts(“\n” );
else
{
puts( p_head->name ) ;
print_list( p_head->nextnode ) ;
}
}
Inserting a Node
There are a variety of ways
this function can be implemented - inserting a node in the correct order, at
the head, at the tail, by position, etc. For our example we will write a
function to insert a new node in the correct order assuming the node has been
fully created before being passed to us.
void insert( list_node
**p_head, list_node *new )
{
list_node *node_ptr =
*p_head ;
list_node *prev = NULL ;
while ( node_ptr != NULL && (strcmp( new->name,
node_ptr->name ) > 0 ) )
{
prev = node_ptr ;
node_ptr = node_ptr->nextnode ;
}
if ( prev == NULL ) // head position
{
new->nextnode = *p_head ;
*p_head = new ;
}
else
{
prev->nextnode = new ;
new->nextnode = node_ptr ;
}
}
Removing a Node
We will implement this as a
function which finds the node by the name field, unlinks it from the list and
returns a pointer to the removed node without freeing its associated memory.
list_node *delete(
list_node **p_head, char *name )
{
list_node *prev = NULL,
*curr, *node_ptr = *p_head ;
while ( node_ptr != NULL && strcmp( name, node_ptr->name )
)
{
prev = node_ptr ;
node_ptr = node_ptr->nextnode ;
}
if ( prev == NULL ) // removing head position
{
*p_head = (*p_head)->nextnode ;
return node_ptr ;
}
else {
prev->nextnode = node_ptr->nextnode ;
return node_ptr ;
}
}
Note the use of double
indirection to type list_node in each
of the above functions. Can you explain why this is necessary ?
The following main function
is a trivial illustration of how to use the above functions in practice.
void
main()
{
list_node
*ptr_head = NULL ;
struct
node *node_1, *node_2 ;
node_1
= ( struct node *)malloc( sizeof ( struct node ) ) ;
strcpy(
node_1->name, "Alan" ) ;
insert(
&ptr_head, node_1 ) ;
node_2
= ( struct node *)malloc( sizeof ( struct node ) ) ;
strcpy(
node_2->name, "Ben" ) ;
insert(
&ptr_head, node_2) ;
puts(
“List with Alan & Ben”) ;
print_list(
ptr_head ) ;
delete(
&ptr_head, "Ben" ) ;
puts(
“List with Alan”) ;
print_list(
ptr_head ) ;
}
What we have discussed in
the example above is one of the more general variations of a linked list. Many
different implementations can be implemented with slightly different
functionality - for example we can implement doubly linked lists, stacks (LIFO)
or queues (FIFO) using the same basic building blocks.
7.7 Efficiency Considerations
As described previously pointers should be used to implement
call-by-reference in passing large user defined data items to functions to
reduce the copying overhead involved in call-by-value. If implementing a
typical call-by-value situation use of the const keyword can protect us from
inadvertent modification of the referenced argument.
Care should be taken not to go overboard in defining large
complicated data structures without need and to use them appropriately.
For example when dealing with lists of data of any type a normal
array may sometimes be a better choice as a data structure rather than a linked
list. The big advantage a linked list has over a linear array is that you can
insert or delete items at will without shifting the remaining data.
However the extra overhead involved in setting up the linked list,
both the calls to malloc() to allocate each new element and the ‘extra’ pointer
associated with each element, may not justify its selection.
Suppose we need to keep a list of integers. Then most of the storage
is taken up with the extra pointer information used solely for maintaining the
list ( the pointer to the next element and the pointer to each integer ).
Exercises
1. Write a program which will
simulate the action of a digital clock
continuously as closely as possible. Use a structure to hold the values of
hours, minutes and seconds required. Your program should include a function to
display the current time and to update the time appropriately using a delay
function/loop to simulate real time reasonably well. Pointers to the clock structure should be
passed to the display and update functions rather than using global variables.
2. Complex numbers are not
supported as a distinct type in C but are usually implemented by individual
users as user defined structures with a set of functions representing the
operations possible on them. Define a data type for complex values and provide
functions to support addition, subtraction, multiplication and division of
these numbers. Use call by reference to make these functions as efficient as
possible and try to make their use as intuitive as you can.
3. Write a program which will
(a) set up an
array of structures which will hold the names, dates of birth, and addresses
of a number of employees.
(b)
assign values to the structure from the keyboard.
(c) sort the
structures into alphabetical order by name using a simple bubble sort
algorithm.
(d) print out the contents of a specific array
element on demand.
4. Use a combination of
bit--fields and unions in C to print out the bit patterns of any unsigned
integer values input at the keyboard.
5. BCD (Binary Coded
Decimal) codes are commonly used in
industrial applications to represent
decimal numbers. Here four bits are used to
represent each decimal digit , e.g.
0000 ==>
0
0001 ==>
1
...
1001 ==>
9
Thus short int can be used to represent a
four digit decimal number.
Use Bit--Fields to implement conversion
routines to encode a decimal number into BCD format and back again.
6. Redo problem 1 using a
bit--field structure as an illustration of how memory requirements may
be reduced. Compare the old and new memory
requirements for data storage.
7. Using a doubly linked list data structure write a program that
maintains an alphabetically ordered list of student registration records. The
information held in each record should include the student’s name, student
identity number, address, course, etc. Your program should allow the user to
add on records to the list, to remove records from the list, to amend records
and to produce a screen dump of all records. Use distinct functions for all
major operations if possible.
8.
Use a FIFO implementation
of a linked list to represent a queue of people entered by the user.