Recall that there is a difference between the ASCII character '1' and the internal binary representation of the number 1 as, say, a C++ or Java int. The former is represented in eight bits as 00110001; the latter would be represented in 32 bits as 00000000000000000000000000000001.
Similarly, files may consist of sequences of ASCII characters or sequences of internal binary representations of data of various types (e.g., intermixed sequences of int, float, double, etc.). A file containing the three ASCII characters "123" would be three bytes long:
00110001 00110010 00110011
A binary file containing the three integers 1, 2, and 3 would be 12 bytes long:
00000000000000000000000000000001 00000000000000000000000000000010 00000000000000000000000000000011
A binary file containing the single integer 123 would be 4 bytes long:
00000000000000000000000001111011
(For a variety of reasons, you should not try to conclude anything from these examples in terms of typical relative sizes of files written in these two formats. The main point is simply to illustrate that different bit encodings are used for ASCII as opposed to binary file representations.)
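The bit patterns quoted above can be checked with a short sketch. The helper names `bitsOfChar` and `bitsOfInt` are made up for this illustration, and the sketch assumes the compiler uses ASCII for character literals (true on essentially all current platforms, but not guaranteed by the language).

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// Bit pattern of a single character, as an 8-character string of '0'/'1'.
// (Hypothetical helper; assumes an ASCII execution character set.)
std::string bitsOfChar(char c)
{
    return std::bitset<8>(static_cast<unsigned char>(c)).to_string();
}

// Bit pattern of a 32-bit integer, as a 32-character string of '0'/'1'.
// (Hypothetical helper; shows the value's bits, not its byte order on disk.)
std::string bitsOfInt(std::int32_t n)
{
    return std::bitset<32>(static_cast<std::uint32_t>(n)).to_string();
}
```

Calling `bitsOfChar('1')` gives the eight-bit ASCII pattern, while `bitsOfInt(1)` and `bitsOfInt(123)` give the 32-bit patterns shown for the binary files.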
Both C++ and Java have facilities for reading and writing both ASCII and binary files. We will consider only C++ facilities for reading and writing ASCII files in these web pages.
#include <iostream> | contains classes istream and ostream as well as objects cin, cout, and cerr (all in namespace std) |
#include <fstream> | contains classes ifstream and ofstream (in namespace std) |
    std::ifstream inp("someFile.txt");
    std::ofstream myoutputfile("someOtherFile.txt");

or equivalently:

    std::ifstream inp;
    inp.open("someFile.txt");
    std::ofstream myoutputfile;
    myoutputfile.open("someOtherFile.txt");

    inp.close();
    myoutputfile.close();
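As a minimal sketch of how opening and closing fit together, the two functions below write a line to a file and read the first word back. The function names and the `is_open()` checks are additions for illustration; they are not part of the notes above.

```cpp
#include <fstream>
#include <string>

// Writes one line of text to the named file.
// Returns false if the file could not be opened or the write failed.
bool writeLine(const std::string& fileName, const std::string& text)
{
    std::ofstream out(fileName);   // open via the constructor
    if (!out.is_open())
        return false;
    out << text << '\n';
    out.close();
    return !out.fail();
}

// Reads the first whitespace-delimited word of the named file into 'word'.
// Returns false if the file could not be opened or contained no word.
bool readFirstWord(const std::string& fileName, std::string& word)
{
    std::ifstream inp;
    inp.open(fileName);            // open via the open() method
    if (!inp.is_open())
        return false;
    bool ok = static_cast<bool>(inp >> word);
    inp.close();
    return ok;
}
```

Checking `is_open()` right after opening is a common precaution, since a missing file or a bad path does not raise an exception by default.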
What is whitespace?
Definition: Whitespace is a contiguous sequence of one or more characters, each character of which is a blank, tab, carriage return, or line feed (in the default "C" locale, vertical tab and form feed also count as whitespace).
The input operator (>>) skips over all whitespace before trying to read and parse characters from the input file to be placed into the operand on the right-hand side of the ">>". Once it starts to parse characters from the input stream to create a value for the variable on the right-hand side of the ">>", it stops parsing when it encounters either another whitespace character, or a character that cannot be a part of the variable being read. (For example, if it is trying to read an integer, and the sequence of characters 127GHI is encountered on the input stream, the integer will be set to 127, and the next byte to be read will be the 'G'.)
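The 127GHI example can be reproduced with an in-memory stream. This is a sketch only: `ParseResult` and `readIntThenPeek` are hypothetical names, and `std::istringstream` stands in for a file.

```cpp
#include <sstream>
#include <string>

// Holds the integer that was parsed and the very next byte left on the stream.
struct ParseResult
{
    int value;
    char nextChar;
};

// Reads one int from 'text' with >>, then fetches the next raw byte with get().
// (Hypothetical helper; assumes 'text' really does start with an integer.)
ParseResult readIntThenPeek(const std::string& text)
{
    std::istringstream inp(text);
    ParseResult r{0, '\0'};
    inp >> r.value;                               // skips leading whitespace, stops at first non-digit
    r.nextChar = static_cast<char>(inp.get());    // the character that stopped the parse
    return r;
}
```

For the input "  127GHI" the value is 127 and the next character is 'G'. For "42 7" the next character is the blank, because >> skips whitespace only before a value, not after it.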
The input operator (>>) returns a Boolean value

"inp >> x" attempts to read "x" from the input stream "inp", and it yields a value that can be tested as a Boolean: true if the input operation succeeded, or false if either (i) EOF is encountered before the input is complete, or (ii) a parsing error (or some other type of file error) occurred. (Strictly speaking, the expression returns the stream itself, which converts to true or false when used as a condition.) If an error occurs, the input stream is placed into an error state, and future input operations on that stream will also fail. However, this error state can be cleared as illustrated in the following example:
    #include <iostream>

    bool readAndReturnAnInt(std::istream& inp, int& theInt)
    {
        while (true)
        {
            if (inp >> theInt)
                return true;
            if (inp.eof()) // did it fail because EOF?
            {
                std::cerr << "Unexpected EOF\n";
                return false;
            }
            // Must be some sort of parsing error:
            inp.clear(); // clears the error state, re-enabling the input stream
            char badChar = inp.get(); // skips over the offending character
            std::cerr << "An integer cannot start with the character '" << badChar << "'. Try again: ";
        }
    }
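The statement that a failed stream keeps failing until it is cleared can be checked with a small experiment. The function name `secondReadSucceeds` and the fixed input "x 5" are assumptions made for this sketch.

```cpp
#include <sstream>

// Tries to read an int from the text "x 5" twice.
// The first attempt always fails, because 'x' cannot start an integer.
// If clearFirst is true, the error state is cleared and the 'x' is skipped
// before the second attempt; otherwise the stream is left in its error state.
// Returns whether the second attempt succeeded. (Hypothetical helper.)
bool secondReadSucceeds(bool clearFirst, int& value)
{
    std::istringstream inp("x 5");
    int ignored = 0;
    inp >> ignored;            // fails and puts the stream into an error state
    if (clearFirst)
    {
        inp.clear();           // re-enable the stream
        inp.get();             // skip the offending 'x'
    }
    return static_cast<bool>(inp >> value);
}
```

With `clearFirst` set to false the second read fails even though a valid 5 is waiting on the stream; with it set to true the second read succeeds and stores 5.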
NOTE: Any file stream passed as a parameter must be passed by reference as illustrated above. Moreover, unless the function/method being called needs to use methods unique to ifstream or ofstream, it is most common to declare the parameter using the base class type (istream or ostream) as shown in the example above so that it can be called in either of the two ways shown next:
    void someCaller()
    {
        int myVal;
        if (readAndReturnAnInt(std::cin, myVal))
            std::cout << "Read " << myVal << " from the console.\n";
        std::ifstream myInputFile("InputFile.txt");
        if (readAndReturnAnInt(myInputFile, myVal))
            std::cout << "Read " << myVal << " from InputFile.txt.\n";
        …
    }
(You may encounter a situation in which you need to switch back and forth between the "read one byte at a time" and "read one whitespace-delimited object at a time" modes. Once you fully understand the two approaches, you will be able to craft an appropriate hybrid file reading scheme – including reliably reading the file to EOF – should the need arise.)
One byte (char) at a time:

    #include <iostream>
    #include <fstream>
    …
    std::ifstream inp("sample.txt");
    char ch = inp.get();
    while (!inp.eof()) // while the LAST 'get' into 'ch' succeeded
    {
        … use ch as desired …
        ch = inp.get();
    }

Every character (including whitespace like spaces and new lines) is received.

One whitespace-delimited object at a time:

    #include <iostream>
    #include <fstream>
    …
    std::ifstream inp("sample.txt");
    some-type thing;
    while (inp >> thing) // implicitly assume no parse errors
    {
        … use thing as desired …
    }

All whitespace is discarded. The loop is insensitive to line breaks, for example.
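To see the difference between the two loops concretely, the sketch below runs each one over the same text and collects what it receives. The function names are hypothetical, and the parameter is a `std::istream&` so that either a file or an in-memory `std::istringstream` can be passed.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Byte-at-a-time loop: returns every character read, whitespace included.
std::string readAllBytes(std::istream& inp)
{
    std::string result;
    char ch = static_cast<char>(inp.get());
    while (!inp.eof())          // while the LAST 'get' into 'ch' succeeded
    {
        result += ch;
        ch = static_cast<char>(inp.get());
    }
    return result;
}

// Object-at-a-time loop: returns each whitespace-delimited word.
std::vector<std::string> readAllWords(std::istream& inp)
{
    std::vector<std::string> words;
    std::string thing;
    while (inp >> thing)
        words.push_back(thing);
    return words;
}
```

Given the text "one  two\nthree\n", the first function returns the text unchanged (all 15 characters), while the second returns just the three words, with the double blank and the line breaks gone.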
Depending on the contents of the file and the task at hand, some-type can be any data type for which the >> operator is defined (e.g., double, std::string, int, etc.). Note that some-type can also be a character array, but you must ensure that the array is sufficiently large to hold the next whitespace-delimited set of characters along with the '\0' byte that will be automatically appended. If not, the input operation will write beyond the bounds of the array, causing problems like those mentioned in the Arrays section of these notes. Consider:
    char thing[10];
    while (inp >> thing)
        std::cout << thing << '\n';
Everything will be fine as long as each string read is no more than 9 characters long. You would be in trouble, however, if one of the strings read was "crystallography"! Or even a ten-character word like "accidental", since ten characters plus the trailing '\0' need 11 bytes.
An advantage to using "std::string thing" instead of "char thing[10]" is that, when declared as a std::string, "thing" will automatically grow as needed to accommodate strings of whatever size are read.
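The contrast can be sketched as follows. The `std::string` version needs no size limit. The array version uses `std::setw`, a standard manipulator not mentioned in these notes, as one possible way to cap how many characters >> stores; the function names are hypothetical.

```cpp
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Reads the next word into a std::string, which grows as needed.
std::string readWordAsString(std::istream& inp)
{
    std::string thing;
    inp >> thing;
    return thing;
}

// Reads the next word into a 10-byte array, storing at most 9 characters
// plus the terminating '\0'. Any remaining characters stay on the stream.
std::string readWordBounded(std::istream& inp)
{
    char thing[10];
    thing[0] = '\0';                                          // defined result even if the read fails
    inp >> std::setw(static_cast<int>(sizeof thing)) >> thing;
    return thing;
}
```

With the input "crystallography", `readWordBounded` returns "crystallo" and leaves "graphy" to be picked up by the next read, whereas `readWordAsString` returns the whole word in one step.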