Hashing Data In Chunks Using .NET

I used to think that you could not use the .NET managed classes to hash a large download while the file was still being downloaded. I thought that if I wanted to stay in the managed world, i.e. no Win API interop calls through P/Invoke etc., I would have to hash the whole file after the download finished. Yuck, especially for load testing applications that only want to stress test a web server and make sure that what they download during the test is not corrupted. In my load testing app, I only wanted to hash a large file as it was being downloaded and compare the final hash against a known value. Fifty threads concurrently downloading the same file and saving it to disk in 50 different places, just to hash it and delete it afterwards, did not sound like an attractive idea.

For the load testing app, I managed to figure out how to use P/Invoke and the Win API calls CryptCreateHash, CryptHashData, etc. so I could hash the file in memory during the download. As each chunk was read from the HTTP response stream, it was hashed and thrown away before the next one was fetched. Oh, but did I waste a lot of time with the Win API and interop. Hey, I got it to work, but you can use a purely managed approach without all of the interop hassle.

I had always suspected that the .NET classes could hash chunk by chunk; I just did not know how and could not find any examples. Recently, I came across some posts that demonstrated it is indeed possible to hash a download in chunks using only the existing .NET Framework classes. It's not hard, as the following code shows. This demo uses a file opened as a FileStream to prove the concept, but the input could just as easily be a response stream retrieved from a web request.

using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;


namespace ConTest
{

    class Program
    {
        static void Main(string[] args)
        {
            string sFile = @"C:\Temp\somebigfile.dat";

            // Use a file as an input stream, but we can pretend that it is something like a response stream from 
            // a web request.
            using (MD5 md5 = MD5.Create())
            using (Stream input = File.OpenRead(sFile))
            {
                // small enough buffer (4096 bytes) to make us read the file in chunks
                byte[] buffer = new byte[4096];
                int bytesRead;

                // read the stream until Read returns 0 (end of stream)
                while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
                {
                    // hash one chunk at a time; the null output buffer means nothing is copied out
                    md5.TransformBlock(buffer, 0, bytesRead, null, 0);
                }

                // Finalize the hash. This call is required even though it hashes
                // zero bytes. Do not pass null for the input buffer.
                md5.TransformFinalBlock(buffer, 0, 0);

                byte[] hashVal = md5.Hash;

                // convert the hash value to a nice hex string, like the ones you would see on a download site or
                // get from the Microsoft hashing utility, fciv.exe.
                StringBuilder sbHash = new StringBuilder(hashVal.Length * 2);
                int idxMax = hashVal.Length;
                for (int idx = 0; idx < idxMax; ++idx)
                {
                    sbHash.Append(hashVal[idx].ToString("x2"));
                }

                Console.WriteLine(sbHash.ToString());
            }

            Console.ReadLine();
        }
    }
}
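
The load testing scenario described at the start needs two more things: a loop that works on any Stream, and a comparison against a known hash. Here is a minimal sketch of that, assuming a hypothetical helper named HashInChunks (not part of the framework) and using a MemoryStream as a stand-in for a real response stream. The expected value is the well-known MD5 test vector for the ASCII string "abc".

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class ChunkHasher
{
    // Hypothetical helper: hashes any readable stream chunk by chunk
    // and returns the result as a lowercase hex string.
    public static string HashInChunks(Stream input, HashAlgorithm algorithm, int chunkSize = 4096)
    {
        byte[] buffer = new byte[chunkSize];
        int bytesRead;

        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // null output buffer: update the hash state, copy nothing
            algorithm.TransformBlock(buffer, 0, bytesRead, null, 0);
        }

        // required to finalize the hash, even with zero bytes of input
        algorithm.TransformFinalBlock(buffer, 0, 0);

        byte[] hash = algorithm.Hash;
        StringBuilder hex = new StringBuilder(hash.Length * 2);
        foreach (byte b in hash)
        {
            hex.Append(b.ToString("x2"));
        }
        return hex.ToString();
    }
}

class ChunkHasherDemo
{
    static void Main()
    {
        // Stand-in for a downloaded payload; a real test would pass the response stream.
        byte[] payload = Encoding.ASCII.GetBytes("abc");
        const string expected = "900150983cd24fb0d6963f7d28e17f72"; // MD5("abc")

        using (MD5 md5 = MD5.Create())
        using (Stream input = new MemoryStream(payload))
        {
            // A tiny chunk size forces several TransformBlock calls.
            string actual = ChunkHasher.HashInChunks(input, md5, 2);
            bool ok = string.Equals(actual, expected, StringComparison.OrdinalIgnoreCase);
            Console.WriteLine(ok ? "hash OK" : "hash MISMATCH");
        }
    }
}
```

Because the helper takes a HashAlgorithm, the same loop should work with SHA256.Create() or any other algorithm derived from that class. Each concurrent download needs its own hash instance, since the instance carries the running state.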

The hash produced by the demo was checked against the output of fciv.exe, the Microsoft command line hashing utility, to prove that this code works. The following posts led me to figure out that I could hash by chunks in purely managed code:

http://blogs.msdn.com/b/shawnfa/archive/2004/02/20/77431.aspx
http://stackoverflow.com/questions/623159/i-have-the-wrong-hash-values-c-cryptography

In the first post, it was not clear that you could pass null as the output buffer (the fourth parameter) to TransformBlock. The second post made it clear that you could. I merely provide a similar example in this post that you can copy and paste for your own work.
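
If fciv.exe is not at hand, a quick in-process sanity check is to compare the chunked result with a one-shot ComputeHash over the same bytes. The sketch below does that; the class and method names and the 8-byte chunk size are illustrative assumptions, chosen only to force several TransformBlock calls with a null output buffer.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

class NullOutputBufferCheck
{
    // Hashes the whole array in a single call.
    public static byte[] OneShot(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            return md5.ComputeHash(data);
        }
    }

    // Hashes the same array in chunkSize-byte pieces via TransformBlock.
    public static byte[] Chunked(byte[] data, int chunkSize)
    {
        using (MD5 md5 = MD5.Create())
        {
            for (int offset = 0; offset < data.Length; offset += chunkSize)
            {
                int count = Math.Min(chunkSize, data.Length - offset);
                // null output buffer: only the internal hash state is updated
                md5.TransformBlock(data, offset, count, null, 0);
            }

            // zero-length final block; the input buffer itself must not be null
            md5.TransformFinalBlock(data, 0, 0);
            return md5.Hash;
        }
    }

    static void Main()
    {
        byte[] data = Encoding.ASCII.GetBytes("The quick brown fox jumps over the lazy dog");

        string a = BitConverter.ToString(OneShot(data));
        string b = BitConverter.ToString(Chunked(data, 8));

        // Both approaches should produce the same digest.
        Console.WriteLine(a == b ? "chunked hash matches ComputeHash" : "hashes differ");
    }
}
```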
