Inside The System.IO Namespace Part 2

October 19, 2010

Part 1
Part 2
Part 3

BufferedStream

This is intended to improve performance of (file) read/write operations by storing the bytes in memory as a cache. The BufferedStream is an example of the Decorator pattern. You wrap a stream inside a BufferedStream in order to benefit from its functionality.

No new methods or properties can be found in BufferedStream; however, it overrides the Stream class’ methods and properties to implement its own cache.
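As a rough sketch of the wrapping described above: the class name BufferedStreamSketch, the method name and the 4096 byte buffer size are made up for illustration, and a MemoryStream stands in for a slower underlying stream such as a file or socket.

```csharp
using System;
using System.IO;

public static class BufferedStreamSketch
{
    // Writes the payload one byte at a time through a BufferedStream wrapper.
    // The small writes collect in the wrapper's buffer and reach the inner
    // stream in larger blocks.
    public static byte[] WriteThroughBuffer(byte[] payload)
    {
        // Assumption: a MemoryStream stands in for a slow stream here.
        using (MemoryStream inner = new MemoryStream())
        {
            using (BufferedStream buffered = new BufferedStream(inner, 4096))
            {
                foreach (byte value in payload)
                {
                    buffered.WriteByte(value);
                }
            } // Disposing the wrapper flushes its buffer and closes the inner stream.

            // ToArray() still works on a closed MemoryStream.
            return inner.ToArray();
        }
    }
}
```

The calling code only ever sees a Stream, which is the point of the Decorator pattern: the buffering can be added or removed without changing the code that reads or writes.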

Whilst it’s recommended you use a BufferedStream for large files on disk, you can also just set a large buffer on your FileStream and the same benefit will be had. To quote a Microsoft developer who worked on the System.IO namespace:

“..there is zero benefit from wrapping a BufferedStream around a FileStream. We copied BufferedStream’s buffering logic into FileStream about 4 years ago to encourage better default performance”

When I was writing the file reading logic in Statmagic (an open source project for parsing web log files), my strategy was to use a large buffer and skip using the BufferedStream. I assumed that my application would be running on a server with 4GB+ of RAM, or a desktop machine where the size of RAM far exceeds the size of the log file. Of course, running it on a mobile device would require a different approach. Many websites have 100MB or even gigabyte-sized log files per day, but from tests I ran, it tears through even 2GB log files. The default size of the buffer in Statmagic is 16MB, which I arrived at after some experimentation and reading of this discussion. In tests, this runs fine in both the single-threaded reads and the multi-threaded reads, where more than one log file is read at the same time. The single-threaded reads run slightly faster on a SATA (RAID 0) hard drive, though I haven’t tried it on a server setup yet.
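A minimal sketch of the large-buffer approach, not Statmagic's actual code: the class LargeBufferSketch and the CountLines method are hypothetical, and only the 16MB figure comes from the text above. The buffer size is passed straight to the FileStream constructor.

```csharp
using System;
using System.IO;

public static class LargeBufferSketch
{
    // 16MB, the default buffer size mentioned above.
    public const int BufferSize = 16 * 1024 * 1024;

    // Counts the lines in a text file, reading through a FileStream that
    // has been given a large internal buffer instead of a BufferedStream.
    public static int CountLines(string path)
    {
        int lines = 0;
        using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, BufferSize))
        using (StreamReader reader = new StreamReader(stream))
        {
            while (reader.ReadLine() != null)
            {
                lines++;
            }
        }
        return lines;
    }
}
```

Whether 16MB is a sensible size depends on the machine and the files involved, so it is worth measuring before settling on a figure.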

FileStream

As the name implies, this is for reading and writing files. Some of the FileStream’s functionality is available via the File class (which just uses a FileStream under the hood).

Stream stream;
stream = new FileStream("file.txt", FileMode.Open);
stream = File.Open("file.txt", FileMode.Open); // this in fact simply uses the line above with FileAccess.ReadWrite (or FileAccess.Write for FileMode.Append) and FileShare.None
stream = File.OpenRead("file.txt"); // equivalent to new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);

// How you write to or read from a FileStream depends on the type of file.
// Basic text files can be read a number of ways; possibly the easiest is
using (StreamReader sr = new StreamReader(File.OpenRead("file.txt")))
{
	string contents = sr.ReadToEnd();
}

NetworkStream

NetworkStream is used for reading and writing binary data over a socket. It needs a connected, stream-based socket, which in practice means TCP. I spent some of my spare time many years ago writing C# libraries to read UDP packets from game servers for Quake 3, Half Life and Unreal. The main project went the way of a lot of open source projects and remained unfinished; however, the bulk of the UDP reading logic was complete. Unfortunately there is no NetworkStream implementation for the UdpClient class that I used; instead you are fed the packet data as a byte array.
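To illustrate the point about UdpClient handing back a byte array rather than a stream, here is a small sketch that sends one datagram to itself over the loopback interface. The class UdpSketch, the method name and the 2 second timeout are assumptions made for the example; a real game server query would use the server's address and its own packet format.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

public static class UdpSketch
{
    // Sends one datagram over loopback and returns what was received.
    public static string SendAndReceive(string message)
    {
        // Port 0 asks the operating system for any free port.
        using (UdpClient receiver = new UdpClient(new IPEndPoint(IPAddress.Loopback, 0)))
        using (UdpClient sender = new UdpClient())
        {
            // Give up after 2 seconds instead of blocking forever.
            receiver.Client.ReceiveTimeout = 2000;
            int port = ((IPEndPoint)receiver.Client.LocalEndPoint).Port;

            byte[] payload = Encoding.ASCII.GetBytes(message);
            sender.Send(payload, payload.Length, new IPEndPoint(IPAddress.Loopback, port));

            // Receive() returns the whole packet as a byte array; there is no
            // stream to read from. It also fills in the sender's end point.
            IPEndPoint remote = new IPEndPoint(IPAddress.Any, 0);
            byte[] packet = receiver.Receive(ref remote);
            return Encoding.ASCII.GetString(packet);
        }
    }
}
```

If stream-style reading is wanted, the returned array can be wrapped in a MemoryStream and read with a BinaryReader.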

The code below is an example of doing an HTTP GET with a NetworkStream. There are of course easier ways of doing this with the WebClient class, but this demonstrates how you might use the NetworkStream class. Attempting to use the random access methods such as Seek() with the NetworkStream class will throw a NotSupportedException: you never have the whole data to work with, so seeking makes no sense.

	string host = "google.com";
	string page = "/search?q=networkstream";

	// This is the simplest way of getting a NetworkStream back. It can also be
	// done at a lower level using your own Socket class and the options you
	// require.
	using (TcpClient tcp = new TcpClient())
	{
		tcp.Connect(host, 80);
		NetworkStream stream = tcp.GetStream();

		// Send an HTTP request. The request line must be sent as ASCII, not UTF-16.
		byte[] data = Encoding.ASCII.GetBytes(string.Format("GET {0} HTTP/1.0{1}{1}", page, "\r\n"));
		stream.Write(data, 0, data.Length);

		// Get the returned data in 1k chunks. Read() returns 0 once the
		// server has closed the connection.
		byte[] buffer = new byte[1024];
		StringBuilder builder = new StringBuilder();
		int i = stream.Read(buffer, 0, buffer.Length);
		while (i > 0)
		{
			// Only decode the bytes that were read into the buffer this time.
			// ASCII keeps the example simple; non-ASCII bytes come out as '?'.
			builder.Append(Encoding.ASCII.GetString(buffer, 0, i));
			i = stream.Read(buffer, 0, buffer.Length);
		}

		// Write out to a file. NB the server header will be contained in this file too.
		using (StreamWriter writer = new StreamWriter(@"c:\out.html"))
			writer.Write(builder.ToString());
	}


MemoryStream

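The MemoryStream class adds only a few members of its own: WriteTo(), which copies the contents of the stream to another stream; ToArray() and GetBuffer(), which expose the underlying bytes; and a Capacity property, which is the number of bytes currently allocated for the stream in memory.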

MemoryStream always deals with a byte array, which means if you want to manipulate string data you’ll be working with the Encoding class (or possibly the Convert class too). Once you’ve created a MemoryStream over an existing byte array you can’t change its capacity; one created with the parameterless or capacity constructor grows as needed. One gotcha with the class is the Write() method.

Write(byte[] buffer, int offset, int count);

The offset parameter is actually the offset you want it to start from in your byte array, not the offset in the Stream.
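A short sketch of the gotcha, using a made-up class MemoryStreamWriteSketch: the offset picks the starting point in the source array, while the place written to in the stream is controlled by the Position property.

```csharp
using System;
using System.IO;
using System.Text;

public static class MemoryStreamWriteSketch
{
    public static string Run()
    {
        byte[] source = Encoding.ASCII.GetBytes("0123456789");

        using (MemoryStream stream = new MemoryStream())
        {
            // Writes "012" at the start of the stream.
            stream.Write(source, 0, 3);

            // The offset of 5 indexes into source, not the stream, so this
            // appends "567" at the stream's current position.
            stream.Write(source, 5, 3);

            // To choose where in the stream to write, set Position instead.
            stream.Position = 1;
            stream.Write(source, 9, 1); // Overwrites '1' with '9'.

            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }
}
```

Run() returns "092567": the second write was appended rather than placed at index 5 of the stream, and only the write after setting Position landed at a chosen location.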

An aside about Encodings and Unicode in .NET
One thing that can trip you up when reading character streams in .NET is using the wrong encoding to read byte representations of text. This really only happens if you are on a Western computer using the default encoding or ASCII. Below is some example code. Some characters might appear as ‘?’ in your browser; use a Unicode text editor like Metapad to view the code or the solution file.

	// ASCII/Latin reference:
	// 97 - a
	// 98 - b
	// 99 - c
	// 100 - d
	// 101 - e
	// 63 - ?
	// 235 - ë
	// '\u0A86' is from Gujarati (Gujarati in Unicode is from 0A80 - 0AFF). The symbol is U+0A86.
	// 'ë' is from Windows 1252, which is a superset of ISO 8859-1 Latin Alphabet 1 (extended ASCII to include accented symbols,
	// special letters and symbols). Windows 1252 is a legacy codepage used by previous versions of Windows and isn't a Unicode standard.
	// However, 'ë' has the same value in Unicode (U+00EB), as the first 256 Unicode code points match ISO 8859-1.
	// 'ë' is Symbol #235.
	string s = "abcde\u0A86ë";

	// This will try to translate the Gujarati symbol to ASCII, fail and convert
	// it to ASCII character 63 (a question mark symbol '?').
	// It will do the same with 'ë' as this isn't an ASCII character.
	//
	// ASCII is 7 bit, stored as 1 byte per character.
	byte[] b = Encoding.ASCII.GetBytes(s);

	// Windows 1252 is Encoding.Default for most Western users on the .NET Framework. So this will successfully get the right
	// number (235) for 'ë'; however, the Gujarati symbol will still fail.
	//
	// Windows 1252 is 8 bit/1 byte per character.
	b = Encoding.Default.GetBytes(s);

	// UTF8 is the commonest Unicode encoding, and can encode any Unicode characters.
	// It uses 1 byte to represent ascii characters, but up to 4 for others.
	b = Encoding.UTF8.GetBytes(s);

	// Now it's 16 bit/2 bytes per char and will pickup the Gujarati symbol correctly.
	// The byte array will now be 14 in length. This is because every character has a numerical
	// 2 byte value (16 bit, 65536 possibilities). Index 10+11 in the array hold the value of 0A86
	// [10] 134 (86).
	// [11] 10 (0A)
	// (Little Endian format so LSB first).
	//
	// UTF16 can go up to 4 bytes per character for symbols higher than U+FFFF.
	b = Encoding.Unicode.GetBytes(s);

	// UTF32 translates it to 4 bytes per character, so the array is 28 bytes now (7 characters x 4).
	b = Encoding.UTF32.GetBytes(s);
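The code above covers turning a string into bytes. The reading side of the problem can be sketched as follows: the same bytes produce different text depending on the encoding used to decode them. The class and method names are invented for the example, and the characters are written as escapes so the source file's own encoding doesn't matter.

```csharp
using System;
using System.Text;

public static class WrongEncodingSketch
{
    // Encodes 'ë' (U+00EB) as UTF8, which gives the two bytes 0xC3 0xAB,
    // then decodes those bytes with whichever encoding is passed in.
    public static string DecodeUtf8BytesAs(Encoding encoding)
    {
        byte[] bytes = Encoding.UTF8.GetBytes("\u00EB");
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // Decoding with the encoding that wrote the bytes round-trips: 1 character.
        Console.WriteLine(DecodeUtf8BytesAs(Encoding.UTF8).Length);

        // ISO 8859-1 treats each byte as its own character, so the result is
        // the 2 characters U+00C3 and U+00AB instead of the original one.
        Console.WriteLine(DecodeUtf8BytesAs(Encoding.GetEncoding("iso-8859-1")).Length);
    }
}
```

This is the familiar mojibake effect: nothing throws, the text is just quietly wrong, which is why it pays to know which encoding a file or network stream was written with.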


GZipStream/DeflateStream

These were added in .NET 2 in the new System.IO.Compression namespace to provide compression and decompression of streams using the Deflate algorithm (GZipStream wraps the same data in the gzip format). There are no helper readers or writers for the two classes, so common tasks like zipping a folder are quite cumbersome. The examples below don’t stray much from the MSDN documentation; I’ve chunked the functionality to make it a bit clearer and more concise.

NB The GZipStream doesn’t support adding files to an archive, as MSDN states:

“…however, this class does not inherently provide functionality for adding files to or extracting files from .zip archives”

The GZipStream is purely for compressing a stream of bytes; it’s not intended to act as a zipping library like SharpZipLib.

Writing

	byte[] buffer;
	using (FileStream fileStream = new FileStream(@"c:\out.html", FileMode.Open))
	{
		// Read the file's contents into a byte array. NB a single Read() call
		// isn't guaranteed to fill the buffer; File.ReadAllBytes() is the safer option.
		buffer = new byte[fileStream.Length];
		fileStream.Read(buffer, 0, buffer.Length);
	}

	using (MemoryStream memoryStream = new MemoryStream())
	{
		// The GZipStream uses the MemoryStream to write its compressed version
		// of the byte array. Passing true leaves the MemoryStream open once the
		// GZipStream is disposed, which is also when the last of the data is flushed.
		using (GZipStream gzipStream = new GZipStream(memoryStream, CompressionMode.Compress, true))
		{
			gzipStream.Write(buffer, 0, buffer.Length);
		}

		// Write back to a file. The output is gzip data rather than a .zip archive.
		using (FileStream fileStreamOut = new FileStream(@"c:\out.html.gz", FileMode.Create))
			memoryStream.WriteTo(fileStreamOut);
	}
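Going the other way works much the same. Below is a minimal round-trip sketch done entirely in memory; the class GZipSketch and its two method names are assumptions for the example rather than anything from MSDN or the code above. It reads the decompressed data in chunks because the uncompressed length isn't known up front.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GZipSketch
{
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // leaveOpen = true so the MemoryStream survives the GZipStream
            // being disposed, which is when the final block is written.
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            // Read in 4k chunks until Read() reports the end of the data.
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            return output.ToArray();
        }
    }
}
```

The compressed output always begins with the two gzip magic bytes 0x1F 0x8B, which is a quick way to check that a file is gzip data and not a .zip archive.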



I'm Chris Small, a software engineer working in London. This is my tech blog. Find out more about me via Github, Stackoverflow, Resume