Skip to main content

Building C++ Code Manually

In this section, we have five objectives:

  1. How to compile a program into a shared object
  2. How to compile a program into an executable
  3. How to link a shared object to an executable
  4. How to run an executable which relies on a shared object
  5. How to tell the compiler to optimize code

An executable is a piece of code that can be run in a terminal to produce some sort of output. A shared object is code that an executable relies on, but in and of itself is not executable. Shared objects are the primary way that C++ enables software to be re-used across code bases and avoid code duplication.

Imagine you have two services:

  1. A movie database which tracks information such as movie description, user reviews, critic reviews, actors, etc.
  2. A grocery database which tracks information such as product name, product description, quantity available, price, etc.

Both services are databases. Databases provide a few general functions: creating records, reading records, updating records, and deleting records (CRUD). As opposed to developing entire database management systems for these two use cases, one could develop a single generic database technology, and then build the movie and grocery databases using that single technology.

In C++, this can be done using a shared library. In our example:

  1. src/database_lib.cc implements the Create, Read, Update, Delete (CRUD) operations.
  2. src/grocery_db.cc implements the grocery database using CRUD operations
  3. src/movies_db.cc implements the movie database using CRUD operations

Setup + Repo Structure

git clone https://github.com/grc-iit/grc-tutorial.git
cd grc-tutorial
export GRC_TUTORIAL=${PWD}
cd ${GRC_TUTORIAL}/cpp/02-cpp-build-manually

In this example, our repo is structured as follows:

  1. src: contains all source files
  2. include: contains all header files

This is the typical way a C++ repo is structured. This is because in many cases header files are meant to be public and available to other programs, whereas source files are almost always private. Having them in separate directories makes accomplishing these two objectives much easier.

Database Library Header

#ifndef DATABASE_LIB_H
#define DATABASE_LIB_H

#include <string>

class Database {
public:
std::string file_;

Database(const std::string &file) : file_(file) {}

void create();
void read();
void update();
void del();
};

#endif

The header file defines the Database class. The class contains the CRUD methods. The methods aren't implemented here in this case. Their implementation is in the source file.

Database Library Source

#include <iostream>
#include "database_lib.h"

void Database::create() {
std::cout << file_ << ": in create" << std::endl;
}

void Database::read() {
std::cout << file_ << ": in read" << std::endl;
}

void Database::update() {
std::cout << file_ << ": in update" << std::endl;
}

void Database::del() {
std::cout << file_ << ": in delete" << std::endl;
}

The source file implements the Database methods. In this case, all they do is make print statements for simplicity. We include "database_lib.h" in order to find the Database class we're implementing.

Grocery Database Source

#include "database_lib.h"

int main() {
Database db("grocery");
db.create();
db.read();
db.update();
db.del();
}

Here we create the grocery database by creating the Database class located in "database_lib.h".

Movies Database Source

#include "database_lib.h"

int main() {
Database db("movies");
db.create();
db.read();
db.update();
db.del();
}

Here we create the movies database by creating the Database class located in "database_lib.h".

C++ Compiler Pipeline

C/C++ compilers are divided into 4 phases:

  1. Preprocessing: This will locate and load #include files and make simple modifications to source code
  2. Compiling: Will turn the pre-processed source code into assembly code.
  3. Assembling: Will convert assembly code into machine code (i.e., object code).
  4. Linking: Will locate shared libraries and ensure that any missing symbols are resolved in the object code.

The Build Directory

First, we should make a build directory to store our files. This makes cleaning up intermediate files much easier.

mkdir build

Compile + Assemble database_lib.cc

We first try to compile the database into an object file below

g++ src/database_lib.cc -fpic -c -o build/database_lib.o

-fPIC stands for Force Poisition Independent Code. Whenever trying to build a shared object, this is necessary. This is because a shared library can be loaded at different locations in a program, so having the addresses in the code being fixed is problematic.

-c tells g++ to build the object file.

-o database_lib/database_lib.o sets the output of the compilation to be database_lib/database_lib.o.

You should receive the following error:

src/database_lib.cc:2:10: fatal error: database_lib.h: No such file or directory
2 | #include "database_lib.h"
|          ^~~~~~~~~~~~~~~~
compilation terminated.

This is because the compiler doesn't know to look in the include directory for the database_lib.h file. In order to force the compiler to search for this file there are two options.

Fix 1: The -I flag

g++ src/database_lib.cc -I${PWD}/include -fpic -c -o build/database_lib.o

-I${PWD}/include will ensure the compiler searches the include directory for headers

Fix 2: Environment Variables

INCLUDE=${PWD}/include \
CPATH=${PWD}/include \
g++ src/database_lib.cc -fpic -c -o build/database_lib.o

INCLUDE and CPATH are sometimes searched by the compiler for header files. This approach is also viable because you don't need to modify build scripts in order for it to work.

g++ -shared build/database_lib.o -o build/libdatabase_lib.so

This command will produce the shared library. Note, the general naming convention of a shared object is "libNAME.so". Many compilers expect your shared object to begin with the word "lib" and have an "so" extension.

Create the Executables

g++ src/grocery_db.cc -I${PWD}/include -ldatabase_lib -o grocery_db
g++ src/movies_db.cc -I${PWD}/include -ldatabase_lib -o movies_db

-ldatabase_lib tells the compiler to search for libdatabase_lib.so

You should receive the following error:

/usr/bin/ld: cannot find -ldatabase_lib
collect2: error: ld returned 1 exit status

This is because the compiler doesn't know where to search for libdatabase_lib.so. We have to tell it to search the build directory. There are two fixes.

Fix 1: The -L flag

g++ src/grocery_db.cc -I${PWD}/include -L${PWD}/build -ldatabase_lib -o build/grocery_db
g++ src/movies_db.cc -I${PWD}/include -L${PWD}/build -ldatabase_lib -o build/movies_db

-L${PWD}/build tells the compiler to search this directory for shared objects.

Fix 2: Environment Variables

export LIBRARY_PATH=${PWD}/build
g++ src/grocery_db.cc -I${PWD}/include -ldatabase_lib -o build/grocery_db
g++ src/movies_db.cc -I${PWD}/include -ldatabase_lib -o build/movies_db

Run the Executable

build/grocery_db
build/movies_db

You should see the following errror:

build/grocery_db: error while loading shared libraries: libdatabase_lib.so: cannot open shared object file: No such file or directory

This is because shared objects are loaded dynamically at runtime. Linking is only the last phase of compilation. There are two fixes to this problem:

Fix 1: Install the shared objects

The OS will search a number of paths by default when loading a program. Probably the most popular spots are /usr/lib and /usr/local/lib. You can copy-paste your shared object in this location to resolve the issue. This approach, however, requires root privileges.

sudo cp build/libdatabase_lib.so /usr/lib/libdatabase_lib.so
build/grocery_db
build/movies_db
sudo rm /usr/lib/libdatabase_lib.so

From grocery_db:

grocery: in create
grocery: in read
grocery: in update
grocery: in delete

From movies_db:

movies: in create
movies: in read
movies: in update
movies: in delete

Fix 2: Environment Variables

export LD_LIBRARY_PATH=${PWD}/build:${LD_LIBRARY_PATH}
build/grocery_db
build/movies_db

From grocery_db:

grocery: in create
grocery: in read
grocery: in update
grocery: in delete

From movies_db:

movies: in create
movies: in read
movies: in update
movies: in delete

Introduce Compiler Optimization

One of the things to consider when building code is how optimized you want the code to be. Compilers allow for varying levels of optimization. You may ask, why would I ever want unoptimized code? There are two main reasons:

  1. Some optimizations are dangerous and can break your program
  2. Debugging is much harder since code can get rearranged for optimization

We won't go into detail about every optimization compilers provide. We will only mention the typical optimizations used to make C++ code perform well. The optimizations described in this section are all safe and will not affect program correctness.

In GCC and Clang, compiler optimization levels are tuned using the "-O" parameter. There are four levels of optimization:

  1. No optimization
  2. Some optimization
  3. Moderate optimization
  4. Heavy optimization
# No optimization
g++ src/database_lib.cc -I${PWD}/include -O0 -fpic -c -o build/database_lib.o
# Some optimization
g++ src/database_lib.cc -I${PWD}/include -O1 -fpic -c -o build/database_lib.o
# Moderate optimization
g++ src/database_lib.cc -I${PWD}/include -O2 -fpic -c -o build/database_lib.o
# Heavy optimization
g++ src/database_lib.cc -I${PWD}/include -O3 -fpic -c -o build/database_lib.o