Character Strings

There are two important string-related concepts you must come to fully understand:

the differences between the Java String class and the C++ std::string class;
the differences between the C++ std::string class and C++ "character arrays".

You will undoubtedly need both C++ std::string instances and C++ character arrays, so you must understand each fully: when each is typically employed, how usage of the two differs, and how to convert from one to the other.

Java String class versus C++ std::string class
The two are very similar in many respects (although note the different capitalization in their names). Some of the main differences:

the two have different public method prototypes, although they have largely the same functionality.

instances can be compared by content using == in C++, whereas equals must be used in Java (where == compares object references rather than contents).

Java

void compare(String s1, String s2)
{
    if (s1.equals(s2))
        System.out.println("s1 and s2 are the same");
    else
        System.out.println("s1 and s2 are different");
}

C++

#include <iostream>
#include <string>

void compare(std::string s1, std::string s2)
{
    if (s1 == s2)
        std::cout << "s1 and s2 are the same\n";
    else
        std::cout << "s1 and s2 are different\n";
}
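To make the first two differences concrete, here is a minimal sketch that pairs a few std::string operations with their Java counterparts in comments. The helper names textAfter and sameText are hypothetical and exist only for illustration.

```cpp
#include <string>

// Hypothetical helper: returns the text after the first occurrence of
// "sep", or an empty string if "sep" does not occur in "s".
std::string textAfter(const std::string& s, char sep)
{
    std::string::size_type pos = s.find(sep);   // Java: s.indexOf(sep)
    if (pos == std::string::npos)               // Java: indexOf returns -1
        return "";
    return s.substr(pos + 1);                   // Java: s.substring(pos + 1)
}

// Hypothetical helper: true when both strings hold the same characters.
bool sameText(const std::string& a, const std::string& b)
{
    return a == b;                              // Java: a.equals(b)
}
```

Note that find reports "not found" with the constant std::string::npos rather than with -1, and that the C++ method for the number of characters is size() (or length()).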

Java String objects are immutable whereas C++ std::string objects can be modified. This difference is often not noticeable. Here is one example where it is noticeable:

Java

public class StringTest
{
    // Since String is a class in Java, "str" refers to the caller's
    // object, much as the reference parameter does in the C++ version.
    // But String is immutable: += builds a new String and rebinds
    // only the local "str".
    private static void addToString(String str)
    {
        str += "def";
        System.out.println("In addToString, str = " + str);
    }

    public static void main(String[] args)
    {
        String myString = "abc";
        System.out.println(
            "In main before call to addToString, myString = "
            + myString);
        addToString(myString);
        System.out.println(
            "In main after call to addToString, myString = "
            + myString);
    }
}

Java output:

In main before call to addToString, myString = abc
In addToString, str = abcdef
In main after call to addToString, myString = abc

C++

#include <iostream>
#include <string>

void addToString(std::string& str)
{
    str += "def";
    std::cout << "In addToString, str = "
              << str << '\n';
}

int main()
{
    std::string myString = "abc";
    std::cout <<
        "In main before call to addToString, myString = "
        << myString << '\n';
    addToString(myString);
    std::cout <<
        "In main after call to addToString, myString = "
        << myString << '\n';
    return 0;
}

C++ output:

In main before call to addToString, myString = abc
In addToString, str = abcdef
In main after call to addToString, myString = abcdef
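The C++ result above depends on the & in the parameter type. As a minimal sketch (the function names appendByValue and appendByReference are hypothetical, not part of the example above), the following contrasts a by-value parameter, which leaves the caller's string unchanged, with a by-reference parameter, which modifies it.

```cpp
#include <string>

// Hypothetical variant: "str" is a copy of the caller's string, so the
// caller's object is untouched. The modified copy is returned instead.
std::string appendByValue(std::string str)
{
    str += "def";
    return str;
}

// Hypothetical variant: "str" is an alias for the caller's string, so
// the change is visible to the caller, as with addToString.
void appendByReference(std::string& str)
{
    str += "def";
}
```

Calling appendByValue on a string holding "abc" leaves that string as "abc", which is the same visible result the Java program produces; calling appendByReference changes it to "abcdef".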

C++ std::string class versus C++ character arrays (char*)

Note

The string literal in the declaration of "msg" below (const char* msg = "abc";) is a syntactical shortcut for creating the character array:

const char msg[ ] = { 'a', 'b', 'c', '\0' };

The special character '\0' (a zero byte) is a sentinel character marking the end of the string.

This syntactical shortcut is a remnant from the old C days (and old C++ days before the std::string class was added) when this was the only way character string data could be stored and manipulated in a C/C++ program.

The reason "msg" should be declared as "const" is that character string literals have type "const char[]" in C++. As we saw in item #1 above, such literals can be used to initialize std::string instances; however, trying to use one to initialize a non-const char* pointer as:

char* msg = "abc";

will result in most compilers issuing a warning message. Some will flag it as an error.
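The equivalence between the literal and the spelled-out array can be checked directly. The following is a minimal sketch; the names fromLiteral, spelledOut, storageSize, visibleLength, and sameContents are hypothetical and chosen only for this illustration.

```cpp
#include <cstddef>
#include <cstring>

// Both arrays hold the same four bytes: 'a', 'b', 'c', '\0'.
const char fromLiteral[] = "abc";
const char spelledOut[]  = { 'a', 'b', 'c', '\0' };

// Number of bytes the array occupies, including the '\0' sentinel.
std::size_t storageSize()
{
    return sizeof(fromLiteral);
}

// Number of characters before the '\0' sentinel.
std::size_t visibleLength()
{
    return std::strlen(fromLiteral);
}

// True when the two arrays contain identical bytes.
bool sameContents()
{
    return sizeof(fromLiteral) == sizeof(spelledOut)
        && std::memcmp(fromLiteral, spelledOut, sizeof(fromLiteral)) == 0;
}
```

The array occupies four bytes even though only three characters are visible, because the sentinel needs a slot of its own.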

In addition to the C++ class std::string, C++ can also hold character data in arrays whose base type is char. Such instances are first and foremost arrays, hence all the aspects of C++ arrays mentioned in the Arrays section apply. There are many ways to declare and use character arrays, but you must be careful that the actual array is sufficiently large for all operations you will perform.

As a very simple example, suppose I want to create a character array, msg, that holds "abc". I could do the following:

const char* msg = "abc"; // See the "Note" above.

The fact that the compiler appends a sentinel zero byte to the array is important because I can output this string by executing:

std::cout << msg;

The obvious question is: "How can this work since msg is passed to the output operator of std::cout without also passing its length?" (Recall that a C++ function that receives an array this way has no means of determining its length.) The answer is that functions that receive character arrays like this expect to see the end of the character string marked with the special '\0' sentinel. In this case, for example, when sending msg to std::cout, the implementation just starts writing one character after another to the screen until it encounters the '\0' character.
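The sentinel-driven loop just described can be sketched in a few lines. This is only an illustration of the idea, not how any particular library implements output; the function name countUntilSentinel is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: walks the array one character at a time and
// stops at the first '\0'. Returns the number of characters seen
// before the sentinel.
std::size_t countUntilSentinel(const char* text)
{
    std::size_t n = 0;
    while (text[n] != '\0')
        ++n;
    return n;
}
```

Because the loop stops at the first zero byte, anything stored after an embedded '\0' is never reached. If the sentinel is missing altogether, the loop runs past the end of the array, which is undefined behavior.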

The output operator "<<" is not unique in operating this way. There are in fact several standard C/C++ functions that accept character arrays and assume they are terminated by this '\0' sentinel. They perform various operations like comparison, concatenation, determining the length of the string in the character array, etc. See Appendix H of Carrano for a listing of all such functions, some of which are declared when you #include <cstring>; others when you #include <cstdlib>.
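As a minimal sketch of a few <cstring> functions, the helper below concatenates two character arrays into a caller-supplied buffer. The name joinInto and its parameters are hypothetical. The explicit size check is there because std::strcpy and std::strcat do not verify that the destination is large enough.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: writes "first" followed by "second" into
// "buffer". Returns false, leaving the buffer untouched, if the buffer
// cannot hold both pieces plus the '\0' sentinel.
bool joinInto(char* buffer, std::size_t capacity,
              const char* first, const char* second)
{
    std::size_t needed = std::strlen(first) + std::strlen(second) + 1;
    if (needed > capacity)
        return false;
    std::strcpy(buffer, first);    // copies "first" including its '\0'
    std::strcat(buffer, second);   // appends "second" over that '\0'
    return true;
}
```

Joining "abc" and "def" needs a buffer of at least seven characters: six for the text and one for the sentinel.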

This use of character arrays has been available since the earliest versions of the C language (hence the name of the header file: <cstring>). By contrast, the std::string class is a relative newcomer, introduced as C++ evolved from C. While std::string is generally much easier to work with, there are times when the character array must be used. One example is the C++ main function, as we saw in item #5 on the Similarities page. You will undoubtedly encounter many other examples when using various C-based APIs. There is a plethora of such toolkits, including graphics interfaces, database systems, GPU programming interfaces, and many other very useful libraries originally developed as C-based APIs.

Given a zero-byte terminated character array, you can create an equivalent std::string instance as follows:

void hereIsACharArray(const char* msg)
{
    // Copies the characters up to, but not including, the '\0' sentinel.
    std::string msgAsString(msg);

    // … work with msgAsString …
}
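The same constructor is a convenient way to handle the character arrays that main receives. The following is a minimal sketch; the helper name toStrings is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: converts the argc/argv pair received by main
// into a vector of std::string objects.
std::vector<std::string> toStrings(int argc, char* argv[])
{
    std::vector<std::string> result;
    for (int i = 0; i < argc; ++i)
        result.push_back(std::string(argv[i]));   // copies up to the '\0'
    return result;
}
```

Each std::string owns its own copy of the characters, so the resulting vector can be used without further reference to argv.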

You can extract the character array from an std::string instance by using the c_str method:

void hereIsAString(std::string myString)
{
    // The returned array is terminated by the '\0' sentinel.
    const char* myStringAsCharArray = myString.c_str();

    // … work with myStringAsCharArray …
}

You are not allowed to modify the string returned from c_str; notice that myStringAsCharArray is declared as const char*. The returned pointer also remains valid only until the std::string is next modified or destroyed.
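When a C-based API needs a character array it can modify, one option is to copy the characters into a separate writable buffer first. The following is a minimal sketch under that assumption; legacyUppercase is a hypothetical stand-in for such an API, and uppercased is a hypothetical wrapper around it.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for a C-style function that modifies its
// zero-byte terminated argument in place.
void legacyUppercase(char* text)
{
    for (; *text != '\0'; ++text)
        *text = static_cast<char>(
            std::toupper(static_cast<unsigned char>(*text)));
}

// Hypothetical wrapper: leaves "input" unchanged and returns the
// modified copy as a std::string.
std::string uppercased(const std::string& input)
{
    // Copy the characters plus the terminating '\0' into a writable buffer.
    std::vector<char> buffer(input.c_str(),
                             input.c_str() + input.size() + 1);
    legacyUppercase(buffer.data());
    return std::string(buffer.data());
}
```

The original std::string is never written through the pointer returned by c_str, so the restriction described above is respected.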