Wednesday, December 1, 2010

Stream Improvements

When you do a lot of file or memory reads and writes, it quickly becomes apparent that the default stream implementation is not very flexible: it has no methods for reading and writing individual value types. This is even more apparent when building custom serialization, where the value-type fields of objects have to be written out one by one.

So I made my own stream wrapper to help with more advanced stream handling. The code for the class is simply too big to fit into this post, so the whole thing, along with some additional code I'm writing, will land in the samples. Note that comments have been removed from the code, because the code is self-explanatory and in the post they would obscure the implementation details.

I use this class in my database engine project for custom object storage and for reading and writing headers and indexes; normal streams and serializers just couldn't be used in such a scenario.

The Class



The class is a wrapper around a plain stream and also inherits from it, in the style of the decorator pattern, so that it can be decorated further if needed.

Each method has a write and a read counterpart for a given value type. This is nothing new really, but it is very helpful in a class such as a stream. The write methods all return the size of the written value. (A string isn't really a value type, but it behaves like one here, so it is handled alongside the others rather than getting special treatment.)

        public int Write(string data)
        {
            int length = 0;

            if (data == null)
            {
                return WriteNullString();
            }

            int byteCount = encoding.GetByteCount(data);
            length += WriteStringLength(byteCount);

            byte[] bytes = encoding.GetBytes(data);
            stream.Write(bytes, 0, bytes.Length);

            length += bytes.Length;

            return length;
        }

        public int Write(UInt32 value)
        {
            stream.Write(BitConverter.GetBytes(value), 0, 4);
            return intSize;
        }

        public int Write(UInt64 value)
        {
            stream.Write(BitConverter.GetBytes(value), 0, 8);
            return longSize;
        }

        public int Write(long value)
        {
            stream.Write(BitConverter.GetBytes(value), 0, 8);
            return longSize;
        }

Looking at the code, one might wonder why those methods return read-only int fields that define the size of the value type. The reason is that VMs aren't strictly bound to the processor architecture and implementations may differ. I know this is unlikely, but it is possible, and practice shows that in .NET even things that seem guaranteed sometimes aren't. Take, for example, typeof(T) and GetType() called on a value of type T: I will not go into the details here, but know that the two are not the same.
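To make the typeof(T) versus GetType() difference concrete, here is a minimal sketch. The class and method names are made up for illustration and are not part of the stream class. It shows that typeof(T) reflects the generic argument chosen at the call site, while GetType() reports the runtime type of the instance.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class TypeOfDemo
{
    // typeof(T) is resolved from the generic argument at the call site.
    public static Type StaticType<T>(T value)
    {
        return typeof(T);
    }

    // GetType() asks the instance itself for its runtime type.
    // It throws NullReferenceException when value is null.
    public static Type RuntimeType<T>(T value)
    {
        return value.GetType();
    }
}
```

For a variable declared as object that holds a boxed int, StaticType returns typeof(object) and RuntimeType returns typeof(int). A dispatcher that accepts object, like the WriteValueType method shown in this post, therefore has to rely on GetType().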

These are just a few of the many methods. Perhaps the most flexible and important are the value-type write and read methods, which use the underlying methods to perform type-safe reads and writes.

        public int WriteValue<T>(T data)
        {
            return WriteValueType(data);
        }

        public int WriteValueType(object data)
        {
            Type dataType = data.GetType();
            int length = 0;

            if (dataType == typeof(int))
                length = Write((int)data);
            else if (dataType == typeof(long))
                length = Write((long)data);
            else if (dataType == typeof(double))
                length = Write((double)data);
            else if (dataType == typeof(float))
                length = Write((float)data);
            else if (dataType == typeof(byte))
                length = Write((byte)data);
            else if (dataType == typeof(decimal))
                length = Write((decimal)data);
            else if (dataType == typeof(uint))
                length = Write((uint)data);
            else if (dataType == typeof(char))
                length = Write((char)data);
            else if (dataType == typeof(TimeSpan))
                length = Write((TimeSpan)data);
            else if (dataType == typeof(Guid))
                length = Write((Guid)data);
            else if (dataType == typeof(DateTime))
                length = Write((DateTime)data);
            else if (dataType == typeof(byte[]))
                length = Write((byte[])data);
            else if (dataType == typeof(string))
                length = Write((string)data);
            else
                throw new DataStreamException("Error: element is not a value type.");

            return length;
        }

        public T ReadValueType<T>()
        {
            Type elemType = typeof(T);
            return (T)ReadValueType(elemType);
        }

        public object ReadValueType(Type element)
        {
            if (element == typeof(string))
                return ReadString();
            else if (element == typeof(Int32))
                return ReadInt32();
            else if (element == typeof(long))
                return ReadLong();
            else if (element == typeof(Guid))
                return ReadGuid();
            else if (element == typeof(DateTime))
                return ReadDateTime();
            else if (element == typeof(TimeSpan))
                return ReadTimeSpan();
            else if (element == typeof(float))
                return ReadFloat();
            else if (element == typeof(double))
                return ReadDouble();
            else if (element == typeof(decimal))
                return ReadDecimal();
            else if (element == typeof(char))
                return ReadChar();
            else if (element == typeof(uint))
                // uint is 32 bits wide; assumes a ReadUInt32 counterpart to Write(UInt32).
                return ReadUInt32();

            throw new DataStreamException("Error: element is not a value type.");
        }
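A round trip through generic entry points like these can be sketched with a small stand-in class. MiniDataStream below is hypothetical and is not the class from this post. It covers only int, long and string, and it assumes a 4-byte length prefix for strings, which may differ from what WriteStringLength actually does.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in for the wrapper; only int, long and string are covered.
public sealed class MiniDataStream
{
    private readonly Stream stream;
    private readonly Encoding encoding = Encoding.UTF8;

    public MiniDataStream(Stream stream) { this.stream = stream; }

    // Returns the number of bytes written, like the Write methods in the post.
    public int WriteValue<T>(T data)
    {
        object boxed = data;
        if (boxed is int) return WriteBytes(BitConverter.GetBytes((int)boxed));
        if (boxed is long) return WriteBytes(BitConverter.GetBytes((long)boxed));
        if (boxed is string)
        {
            byte[] bytes = encoding.GetBytes((string)boxed);
            // Assumed layout: 4-byte length prefix, then the encoded bytes.
            return WriteBytes(BitConverter.GetBytes(bytes.Length)) + WriteBytes(bytes);
        }
        throw new NotSupportedException("Unsupported type.");
    }

    public T ReadValue<T>()
    {
        object result;
        if (typeof(T) == typeof(int)) result = BitConverter.ToInt32(ReadBytes(4), 0);
        else if (typeof(T) == typeof(long)) result = BitConverter.ToInt64(ReadBytes(8), 0);
        else if (typeof(T) == typeof(string))
        {
            int length = BitConverter.ToInt32(ReadBytes(4), 0);
            result = encoding.GetString(ReadBytes(length));
        }
        else throw new NotSupportedException("Unsupported type: " + typeof(T));
        return (T)result;
    }

    private int WriteBytes(byte[] bytes)
    {
        stream.Write(bytes, 0, bytes.Length);
        return bytes.Length;
    }

    // Stream.Read may return fewer bytes than requested, so loop until done.
    private byte[] ReadBytes(int count)
    {
        byte[] buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            int read = stream.Read(buffer, total, count - total);
            if (read == 0) throw new EndOfStreamException();
            total += read;
        }
        return buffer;
    }
}
```

Writing an int, a long and a string to a MemoryStream, resetting Position to 0, and then calling ReadValue<int>(), ReadValue<long>() and ReadValue<string>() in the same order returns the original values. The reader has to know the order of the types, because nothing in this layout records them.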

Another part of the class covers reading the stream in sections and copying streams (the latter is built in since .NET 4.0 as Stream.CopyTo).

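        public int Write(Stream data)
        {
            data.Position = 0;
            byte[] streamData = new byte[data.Length];
            int total = 0, read;

            // Fill the buffer from the source; Read may return fewer bytes than asked for.
            while (total < streamData.Length &&
                   (read = data.Read(streamData, total, streamData.Length - total)) > 0)
                total += read;

            stream.Write(streamData, 0, total);
            return total;
        }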

        public IEnumerable<int> ReadSectionBytes(int sectionSize)
        {
            while(stream.Position < stream.Length)
                yield return stream.Read(new byte[sectionSize], 0, sectionSize);
        }

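        public IEnumerable<byte[]> ReadSectionData(int sectionSize)
        {
            while (stream.Position < stream.Length)
            {
                // A fresh buffer per section, so callers can keep earlier results.
                byte[] data = new byte[sectionSize];
                int read = stream.Read(data, 0, sectionSize);

                if (read == 0)
                    yield break;

                // The last section may be shorter than sectionSize.
                if (read < sectionSize)
                    Array.Resize(ref data, read);

                yield return data;
            }
        }

        public IEnumerable<T> ReadValueTypeSection<T>()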
        {
            while (stream.Position < stream.Length)
            {
                T obj = ReadValueType<T>();
                yield return obj;
            }
        }

Reading data by section is useful when the file has indexes or structures of a fixed size, or when a section should be read in a type-safe manner. It is most useful for serialized objects that have only int fields and the like; a common example is reading a header index that defines the layout of objects in a file.
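As a rough illustration of the header index case, the sketch below reads a run of 4-byte integers from a stream. SectionReader and ReadInt32Section are hypothetical names, and the code assumes a seekable stream whose remaining length is a multiple of 4.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the class from the post.
public static class SectionReader
{
    // Lazily yields consecutive 4-byte integers until the stream ends.
    // Assumes a seekable stream whose remaining length is a multiple of 4.
    public static IEnumerable<int> ReadInt32Section(Stream stream)
    {
        byte[] buffer = new byte[sizeof(int)];
        while (stream.Position < stream.Length)
        {
            int total = 0;
            while (total < buffer.Length)
            {
                int read = stream.Read(buffer, total, buffer.Length - total);
                if (read == 0) throw new EndOfStreamException("Truncated section.");
                total += read;
            }
            // The value is copied out of the buffer, so reusing the buffer is safe.
            yield return BitConverter.ToInt32(buffer, 0);
        }
    }
}
```

If the header stores object offsets as consecutive ints, iterating over ReadInt32Section(stream) yields those offsets one by one without loading the whole index up front.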

Improvements


This class could use some more improvements, such as:

Handling more types.
Reads and writes of custom value types.
Non-linear tree access to the data that stays transparent to the consumer (in progress).
Functional access and extension methods.
A Move/Append method. This one needs a few more words: the method is in progress and will definitely be part of this class. Its purpose is to insert data at a specified index of the stream, moving the data after that index; the hard part is doing it efficiently.
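To show what the Move/Append operation has to do, here is a deliberately naive sketch. It is not the in-progress implementation, and StreamInsert and InsertAt are hypothetical names. It buffers everything after the index in memory, which is exactly the cost an efficient version would want to avoid. It assumes a seekable, writable and expandable stream.

```csharp
using System;
using System.IO;

// Hypothetical helper; a naive baseline, not the planned implementation.
public static class StreamInsert
{
    // Buffers the tail after 'index', writes the new data, then writes the tail back.
    public static void InsertAt(Stream stream, long index, byte[] data)
    {
        if (index < 0 || index > stream.Length)
            throw new ArgumentOutOfRangeException("index");

        // Read everything from index to the end into memory.
        stream.Position = index;
        byte[] tail = new byte[stream.Length - index];
        int total = 0;
        while (total < tail.Length)
        {
            int read = stream.Read(tail, total, tail.Length - total);
            if (read == 0) break;
            total += read;
        }

        // Overwrite from index with the new data followed by the saved tail.
        stream.Position = index;
        stream.Write(data, 0, data.Length);
        stream.Write(tail, 0, total);
    }
}
```

The memory use grows with the amount of data after the index, so inserting near the start of a large file is the worst case. Copying the tail in fixed-size chunks from the end backwards would be one way to bound the memory, at the price of more seeks.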


Expanding upon this class


With a stream class like this, many cool scenarios become possible, as it gives the user much greater control over the stream and its operations, along with type-safe writing and reading. I felt that those operations were missing from the original implementation. The stream collection could be expanded even further with an indexed stream that provides more dictionary-like access to files, and in fact I'm in the process of writing one :-)

To Sum Up

I will be expanding these stream classes, and when I'm done I will probably release them on CodePlex or somewhere similar.
