Recall that there is a difference between the ASCII character '1' and the internal binary representation of the number 1 as, say, a C++ or Java int. The former is represented in eight bits as 00110001; the latter would be represented in 32 bits as 00000000000000000000000000000001.

Similarly, files may consist of sequences of ASCII characters or sequences of internal binary representations of data of various types (e.g., intermixed sequences of int, float, double, etc.). A file containing the three ASCII characters "123" would be three bytes long:

001100010011001000110011

A binary file containing the three integers 1, 2, and 3 would be 12 bytes long:

00000000000000000000000000000001 00000000000000000000000000000010 00000000000000000000000000000011

A binary file containing the single integer 123 would be 4 bytes long:

00000000000000000000000001111011

(For a variety of reasons, you should not try to conclude anything from these examples in terms of typical relative sizes of files written in these two formats. The main point is simply to illustrate that different bit encodings are used for ASCII as opposed to binary file representations.)
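The size difference described above can be checked with a small experiment. The following is a minimal sketch, not part of the original notes: the function names and the file names in the usage note are made up for illustration, and the binary write uses `std::ostream::write`, which these pages do not otherwise cover. It assumes nothing about `int` beyond `sizeof(int)`, which is 4 on most current platforms.

```cpp
#include <fstream>
#include <string>

// Writes the number as ASCII digits: 123 becomes the three bytes '1' '2' '3'.
void writeAsText(const std::string& path, int value)
{
    std::ofstream out(path);
    out << value;
}   // the ofstream destructor flushes and closes the file

// Writes the raw in-memory bytes of the int: sizeof(int) bytes, typically 4.
void writeAsBinary(const std::string& path, int value)
{
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}

// Returns the length of the file in bytes, or -1 if it cannot be opened.
long fileSize(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // start at the end
    if (!in)
        return -1;
    return static_cast<long>(in.tellg());
}
```

Calling `writeAsText("demo.txt", 123)` should produce a 3-byte file, while `writeAsBinary("demo.bin", 123)` should produce a file of `sizeof(int)` bytes. Note that on little-endian machines the bytes of the binary file appear on disk in the reverse order from the bit pattern shown above.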

Both C++ and Java have facilities for reading and writing both ASCII and binary files. We will consider only C++ facilities for reading and writing ASCII files in these web pages.

ASCII File I/O

  1. External files: The Basics
    1. Header files
      #include <iostream> contains classes istream and ostream as well as objects cin, cout, and cerr (all in namespace std)
      #include <fstream> contains classes ifstream and ofstream (in namespace std)
    2. The streams cin, cout, and cerr are always open and available. External files can be opened in one of two ways. Either:
      std::ifstream inp("someFile.txt");
      std::ofstream myoutputfile("someOtherFile.txt");
      or equivalently:
      std::ifstream inp;
      inp.open("someFile.txt");
      std::ofstream myoutputfile;
      myoutputfile.open("someOtherFile.txt");
    3. Reading from and writing to ASCII file streams is the same as reading from std::cin or writing to std::cout. For example, you can simply use the >> and << operators. You can also use the get and put methods.
    4. When you are done reading from or writing to an external file, you should always close the file. That will allow the file to be reopened later in your program. Also – in the case of output files – it guarantees that everything you wrote actually gets flushed from internal buffers and written to disk:
      inp.close();
      myoutputfile.close();
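Putting the four steps above together, here is a minimal sketch of a complete open/read/write/close cycle. The function name `copyWords` and its return convention are assumptions chosen for this illustration; the `is_open()` check is an addition that the steps above do not mention, but it is a standard member of both file stream classes.

```cpp
#include <fstream>
#include <string>

// Copies every whitespace-delimited word from inPath to outPath, one per line.
// Returns the number of words copied, or -1 if either file could not be opened.
int copyWords(const std::string& inPath, const std::string& outPath)
{
    std::ifstream inp(inPath);
    std::ofstream myoutputfile(outPath);
    if (!inp.is_open() || !myoutputfile.is_open())
        return -1;

    int count = 0;
    std::string word;
    while (inp >> word)               // same syntax as reading from std::cin
    {
        myoutputfile << word << '\n'; // same syntax as writing to std::cout
        ++count;
    }

    inp.close();
    myoutputfile.close();             // flushes buffered output to disk
    return count;
}
```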
  2. Relationships among istream-ifstream; ostream-ofstream
    The fact that the syntax of input operations is the same regardless of whether the file variable is of type istream (e.g., std::cin) or ifstream is due to the fact that an ifstream is a special type of istream. Similarly, ofstream is a special type of ostream. In object-oriented terms, we say ifstream is a subclass (or derived class) of istream. Similarly, ofstream is a subclass (or derived class) of ostream. Any operation that can be performed on an istream (ostream) can also be applied to an ifstream (ofstream). Moreover, additional operations that are specific to the fact that the subclasses are associated with external files can be used. We have already seen explicit open and close methods, for example.
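The subclass relationship can be illustrated on the output side with a short sketch. All three function names below are hypothetical, and `std::ostringstream` (an in-memory ostream from `<sstream>`) is used only as a convenient second subclass; it is not otherwise discussed in these notes.

```cpp
#include <fstream>
#include <ostream>
#include <sstream>
#include <string>

// Works with any ostream: std::cout, an ofstream, an ostringstream, ...
void writeReport(std::ostream& out, int n)
{
    out << "value = " << n << '\n';
}

// An ofstream can be passed wherever an ostream is expected.
bool writeReportToFile(const std::string& path, int n)
{
    std::ofstream file(path);
    if (!file.is_open())      // is_open() exists only in the file stream subclasses
        return false;
    writeReport(file, n);
    return true;              // the destructor closes the file
}

// The same function also writes into an in-memory stream.
std::string reportAsString(int n)
{
    std::ostringstream buffer;
    writeReport(buffer, n);
    return buffer.str();
}
```

A call such as `writeReport(std::cout, 7)` uses the same function to write to the console.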
  3. Two general reading strategies
    When reading ASCII files, one of two general approaches is typically used, depending on the file structure and/or the application requirements: reading one byte (character) at a time, or reading one whitespace-delimited object at a time.

    What is whitespace?

    Definition: Whitespace is a contiguous sequence of one or more characters, each character of which is a blank, tab, carriage return, or line feed (in the default C++ locale, vertical tab and form feed also count as whitespace).

    The input operator (>>) skips over all whitespace before trying to read and parse characters from the input file to be placed into the operand on the right-hand side of the ">>". Once it starts to parse characters from the input stream to create a value for the variable on the right-hand side of the ">>", it stops parsing when it encounters either another whitespace character, or a character that cannot be a part of the variable being read. (For example, if it is trying to read an integer, and the sequence of characters 127GHI is encountered on the input stream, the integer will be set to 127, and the next byte to be read will be the 'G'.)
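The 127GHI behavior described above can be verified with a small sketch. It reads from a `std::istringstream` rather than a file so that it is self-contained; the `ParseResult` struct and the function name are invented for this illustration.

```cpp
#include <sstream>
#include <string>

struct ParseResult
{
    bool ok;        // did the >> succeed?
    int value;      // the integer that was read (meaningful only if ok)
    int nextChar;   // the next unread character, as reported by peek()
};

// Reads one int from the front of 'text' and reports what follows it.
ParseResult readLeadingInt(const std::string& text)
{
    std::istringstream inp(text);
    ParseResult r{false, 0, 0};
    if (inp >> r.value)          // skips leading whitespace, then parses digits
    {
        r.ok = true;
        r.nextChar = inp.peek(); // looks at the next character without consuming it
    }
    return r;
}
```

For the input `"  \t\n127GHI"`, the result has `value` equal to 127 and `nextChar` equal to `'G'`: the leading whitespace is skipped, and parsing stops at the first character that cannot be part of an integer.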

  4. The input operator (>>) can be tested as a Boolean value
    An expression of the form "inp >> x" attempts to read "x" from the input stream "inp". Strictly speaking, the expression yields a reference to the stream itself, but when it is used as a condition (e.g., in an if or while statement) the stream is converted to a Boolean value that is true if the input operation succeeded, or false if either (i) EOF is encountered before the input is complete, or (ii) a parsing error (or some other type of file error) occurred. If an error occurs, the input stream is placed into an error state, and future input operations on that stream will also fail. However, this error state can be cleared as illustrated in the following example:
    bool readAndReturnAnInt(std::istream& inp, int& theInt)
    {
        while (true)
        {
            if (inp >> theInt)
                return true;
            if (inp.eof()) // did it fail because EOF?
            {
                std::cerr << "Unexpected EOF\n";
                return false;
            }
            // Must be some sort of parsing error:
            inp.clear(); // clears the error state, re-enabling the input stream
            char badChar = inp.get(); // skips over the offending character
            std::cerr << "An integer cannot start with the character '"
                      << badChar << "'. Try again: ";
        }
    }
    

    NOTE: Any file stream passed as a parameter must be passed by reference as illustrated above. Moreover, unless the function/method being called needs to use methods unique to ifstream or ofstream, it is most common to declare the parameter using the base class type (istream or ostream) as shown in the example above so that it can be called in either of the two ways shown next:

    void someCaller()
    {
        int myVal;
        if (readAndReturnAnInt(std::cin, myVal))
            std::cout << "Read " << myVal << " from the console.\n";
            
        std::ifstream myInputFile("InputFile.txt");
        if (readAndReturnAnInt(myInputFile, myVal))
            std::cout << "Read " << myVal << " from InputFile.txt.\n";
        …
    }
  5. Reading to EOF
    Reading the entirety of a file whose length is unknown beforehand can be trickier than you might think. If you don't understand the issues, your implementations can also exhibit some platform dependencies caused by end-of-file (EOF) flags being set at slightly different times by different runtime system implementations. The table below shows two reliable and completely platform-independent program structures for reading a file to its EOF. Which of the two approaches you use depends on whether you are reading the file one byte at a time or one whitespace-delimited object at a time.

    (You may encounter a situation in which you need to switch back and forth between the "read one byte at a time" and "read one whitespace-delimited object at a time" modes. Once you fully understand the two approaches, you will be able to craft an appropriate hybrid file reading scheme – including reliably reading the file to EOF – should the need arise.)

    One byte (char) at a time:

    #include <iostream>
    #include <fstream>

    …

    std::ifstream inp("sample.txt");
    char ch = inp.get();
    while (!inp.eof()) // while the LAST 'get' into 'ch' succeeded
    {
        … use ch as desired …
        ch = inp.get();
    }

    Every character (including whitespace like spaces and new lines) is received.

    One whitespace-delimited object at a time:

    #include <iostream>
    #include <fstream>

    …

    std::ifstream inp("sample.txt");
    some-type thing;
    while (inp >> thing) // implicitly assume no parse errors
    {
        … use thing as desired …
    }

    All whitespace is discarded. The loop is insensitive to line breaks, for example.

    Study this example.

    Depending on the contents of the file and the task at hand, some-type can be any data type for which the >> operator is defined (e.g., double, std::string, int, etc.).
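To see the difference between the two loop structures, the following sketch runs both over the same input. The function names are hypothetical, and the functions take a `std::istream&` so they work with files, `std::cin`, or an in-memory `std::istringstream`. The byte-at-a-time loop here uses the `get(ch)` overload as its loop condition, a common variation on the `eof()` test; one practical difference is that it also terminates if the stream is already in a failed state (for example, because the file could not be opened).

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Counts every character, whitespace included, one byte at a time.
std::size_t countChars(std::istream& inp)
{
    std::size_t n = 0;
    char ch;
    while (inp.get(ch))     // the condition is false once a get fails (e.g., at EOF)
        ++n;
    return n;
}

// Counts whitespace-delimited words; all whitespace is skipped by >>.
std::size_t countWords(std::istream& inp)
{
    std::size_t n = 0;
    std::string word;
    while (inp >> word)
        ++n;
    return n;
}
```

Given the 15-character text `"one two\n three\n"`, `countChars` returns 15 (spaces and newlines included) while `countWords` returns 3.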

    Note that some-type can also be a character array, but you must ensure that the array is sufficiently large to hold the next whitespace-delimited set of characters along with the '\0' byte that will be automatically appended. If not, the input operation can write beyond the bounds of the array (prior to C++20, >> does not check the size of the array), causing problems like those mentioned in the Arrays section of these notes. Consider:

    char thing[10];
    while (inp >> thing)
        std::cout << thing << '\n';

    Everything will be fine as long as each word read is no more than 9 characters long. You would be in trouble, however, if one of the words read was "crystallography", or even a ten-character word like "accidental", which needs 11 bytes once the terminating '\0' is included.

    An advantage to using "std::string thing" instead of "char thing[10]" is that, when declared as a std::string, "thing" will automatically grow as needed to accommodate strings of whatever size are read.
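The contrast can be sketched as follows. The function names are hypothetical, and the second function uses `std::setw` from `<iomanip>`, which is not mentioned in these notes: applied before `>>` into a character array, it limits the read to one less than the given width, leaving room for the `'\0'`. The width resets after every extraction, so it must be reapplied each time.

```cpp
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads words into a std::string, which grows to fit each one.
std::vector<std::string> readWords(std::istream& inp)
{
    std::vector<std::string> words;
    std::string thing;
    while (inp >> thing)
        words.push_back(thing);
    return words;
}

// Reads into a fixed-size char array; setw caps each read at 9 chars plus '\0',
// so a longer word is split across several reads instead of overflowing.
std::vector<std::string> readChunks(std::istream& inp)
{
    const int kSize = 10;
    std::vector<std::string> chunks;
    char thing[kSize];
    while (inp >> std::setw(kSize) >> thing)
        chunks.push_back(thing);
    return chunks;
}
```

With the input `"crystallography accidental cat"`, `readWords` yields three intact words, whereas `readChunks` yields five pieces: "crystallo", "graphy", "accidenta", "l", and "cat". The array version is safe but splits long words, which is one more reason to prefer `std::string`.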