
Saturday, March 14, 2009

Xml2Json

Third post... (wow)

There are quite a few xml2json generators on the net, in all shapes and sizes (well, not that many), but they all fail to deliver as soon as you need a slightly different conversion notation, so basically you end up implementing your own.

Since there is no standard for how the conversion should be done, and since different people need slightly different conversions, I will not get into code here. Instead I'll try to do some analysis of the parser and the converter from XML to JSON.

XML and JSON parsers are usually LL(1) parsers, so they are fast in terms of parsing time. Depending on what you are trying to do, a converter should usually also be LL(1), but it may need some LL(n) lookahead on top of that, e.g. if you are gathering metadata or collecting some special namespace tags into one JSON field. And since most of these converters will be used in web apps, they need to be fast.

Things we can do when designing such a converter, or any other parser, converter, compiler or interpreter:

1. Think outside the box about the features it will need in the future
This is kinda strange, but it may happen that you were given the task of writing a simple parser/converter and it worked well for months doing its job. Then suddenly someone thought it would be fun to try some crazy XML construction, which is now the standard because it is easier to write from the author's point of view, and your converter failed. So you are given the task of fixing it ASAP, bolting the new stuff onto a converter that wasn't designed for it. After a few situations like that your code will look like it's duct-taped together and barely holding on.

2. Make it extendable
(At my current job, it's something I will be designing).
Let's say that you have a great converter, but let's also say that the XML it's converting has 100K lines. That's crazy in terms of lines of code for XML, so someone decides that this is way too big and too hard to manage, so XPath will do the job and the XML will be split. Your converter will then also need to be extended to support this feature, and if someone later decides that XQuery would be nice, you need to implement that standard too (I think there is a C# version, but there wasn't in the past).
So a converter and a parser should be extendable. In my opinion, at a basic level it is always good to define a Machine & Policy pattern.
The code is designed in such a way that if the rules of conversion or parsing change, the machine (converter, parser) simply follows the new rules (rule engine, anyone? ;-) ). Well, I'm not saying anything new, but a lot of people forget about this or simply don't have the time to implement code in such a way. (This pattern is especially needed in big and complex algorithms.)
My new idea, which I'm working on right now, is to pass the converter a set of language and behavioral rules that can be interpreted, e.g. in EBNF notation (actually a modification of EBNF would be more suitable, as we also need to add some meaning and a set of semantic rules).
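
To make the idea of separating the machine from its rules a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the `Policy` class and its three hooks (`key_for`, `leaf_value`, `merge`) are hypothetical names invented for this example, and the default rules (repeated sibling tags become a list, empty elements become null, attributes and mixed content are ignored) are just one possible choice.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, replace
from typing import Any, Callable


@dataclass
class Policy:
    """The rules. All three hook names are hypothetical, chosen for this sketch."""
    key_for: Callable[[ET.Element], str]     # how a tag becomes a JSON key
    leaf_value: Callable[[ET.Element], Any]  # how a childless element becomes a value
    merge: Callable[[dict, str, Any], None]  # how a converted child is stored in its parent


def merge_repeats_into_list(obj, key, value):
    """Default merge rule: a repeated sibling tag turns the entry into a list."""
    if key not in obj:
        obj[key] = value
    elif isinstance(obj[key], list):
        obj[key].append(value)
    else:
        obj[key] = [obj[key], value]


def convert(element, policy):
    """The machine: walks the tree and asks the policy what to do at each step.

    Attributes and text mixed in between child elements are ignored here.
    """
    children = list(element)
    if not children:
        return policy.leaf_value(element)
    obj = {}
    for child in children:
        policy.merge(obj, policy.key_for(child), convert(child, policy))
    return obj


def xml_to_json(xml_text, policy):
    root = ET.fromstring(xml_text)
    return json.dumps({policy.key_for(root): convert(root, policy)})


default_policy = Policy(
    key_for=lambda e: e.tag,
    leaf_value=lambda e: (e.text or "").strip() or None,
    merge=merge_repeats_into_list,
)

# Changing a rule does not touch the machine at all:
upper_policy = replace(default_policy, key_for=lambda e: e.tag.upper())

print(xml_to_json("<a><b>1</b><b>2</b><c/></a>", default_policy))
# prints {"a": {"b": ["1", "2"], "c": null}}
```

The point of the sketch is that `convert` never changes; a new conversion notation only means a new `Policy` value.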

3. Make it as fast as possible
Again nothing new: algorithms always need to be fast. As this article is about LL parsing and converting, we use recursion rather than iteration. Since we are converting one data language into another, recursion can be the better choice here; it would actually make a lot more sense to avoid it if we had some contextual parser/converter.
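
For comparison, here is a rough sketch of what the iterative alternative could look like in Python: the call stack is replaced by an explicit list, and the XML is consumed as a stream of start/end events from the standard library's `xml.etree.ElementTree.iterparse`. The function name `convert_iterative` and the conversion rules (repeated siblings become a list, empty elements become null, attributes ignored) are assumptions made for this example, not a claim about which variant is faster in practice.

```python
import io
import xml.etree.ElementTree as ET


def convert_iterative(xml_text):
    """Convert XML to nested dicts/lists without recursion.

    stack[-1] is always the dict collecting the children of the element
    that is currently open; stack[0] collects the root element itself.
    """
    stack = [{}]
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append({})          # new scope for this element's children
            continue
        children = stack.pop()        # "end": the element is complete
        if children:
            value = children
        else:
            value = (elem.text or "").strip() or None
        parent = stack[-1]
        if elem.tag not in parent:
            parent[elem.tag] = value
        elif isinstance(parent[elem.tag], list):
            parent[elem.tag].append(value)
        else:
            parent[elem.tag] = [parent[elem.tag], value]
    return stack[0]
```

The recursive version is usually shorter and mirrors the grammar more directly; the explicit stack mainly helps when the input can be nested more deeply than the language's recursion limit allows.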

I made an EBNF spec today for converting from XML to JSON that might help others implementing the code, but note that it is not a strict and valid language in EBNF; it's only informational (it has flaws).

EBNF:



x2jExpression = Xml (XmlNode+ (xjObject | xjArray) | XmlNode ("null" | xjTerm)) Json .

xjArray = "[" (xjObject("," xjObject)* | XmlValue("," XmlValue)*) "]".

xjObject = "{" [(xjTerm("," xjTerm)*)] "}".
xjTerm = "'" XmlTagName "'" ":" "'" XmlValue "'".

Json = String.

Xml = XmlNode+.
XmlNode = XmlTag | (XmlTag (XmlValue| XmlNode) XmlTag).
XmlTag = "<" XmlTagName ("/>" | ">" ).
XmlTagName = String.
XmlValue = String.
String = anystring.

The x2jExpression rule wasn't really meant to be EBNF; its purpose is to show what converts to what (but as I am lazy I left it as it is, hope it's not very confusing).
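
As a hedged illustration of how the XML side of the grammar above could be turned into code, here is a small recursive-descent sketch in Python. It only handles the subset the grammar describes (tags without attributes, text values, self-closing tags), and it decides what to do next by peeking at a single token, which is the LL(1) idea. The names `tokenize`, `Parser` and `xml_to_json` are made up for this example, and the mapping to JSON (repeated sibling tags become an array, an empty node becomes null) is one reading of the informal rules, not a definitive implementation. It also emits standard double-quoted JSON rather than the single quotes used in xjTerm.

```python
import json
import re

# One token is either a tag (<name>, </name>, <name/>) or a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*(/?)>|([^<]+)")


def tokenize(xml_text):
    tokens, pos = [], 0
    while pos < len(xml_text):
        m = TOKEN.match(xml_text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = m.end()
        closing, name, self_closing, text = m.groups()
        if text is not None:
            if text.strip():                  # skip whitespace between tags
                tokens.append(("text", text.strip()))
        elif closing and self_closing:
            raise ValueError(f"malformed tag </{name}/>")
        elif self_closing:
            tokens.append(("empty", name))
        elif closing:
            tokens.append(("close", name))
        else:
            tokens.append(("open", name))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        # The single token of lookahead.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("eof", None)

    def take(self, kind):
        tok = self.peek()
        if tok[0] != kind:
            raise ValueError(f"expected {kind}, got {tok}")
        self.pos += 1
        return tok[1]

    def node(self):
        """XmlNode = XmlTag | XmlTag (XmlValue | XmlNode) XmlTag."""
        if self.peek()[0] == "empty":
            return self.take("empty"), None
        name = self.take("open")
        kind = self.peek()[0]
        if kind == "text":
            value = self.take("text")
        elif kind == "close":
            value = None
        else:
            value = self.nodes()
        if self.take("close") != name:
            raise ValueError(f"mismatched closing tag for <{name}>")
        return name, value

    def nodes(self):
        """XmlNode+ : siblings; a repeated tag name is collected into a list."""
        obj = {}
        while self.peek()[0] in ("open", "empty"):
            name, value = self.node()
            if name not in obj:
                obj[name] = value
            elif isinstance(obj[name], list):
                obj[name].append(value)
            else:
                obj[name] = [obj[name], value]
        return obj


def xml_to_json(xml_text):
    parser = Parser(tokenize(xml_text))
    result = parser.nodes()
    if parser.peek()[0] != "eof":
        raise ValueError(f"unexpected token {parser.peek()}")
    return json.dumps(result)
```

For example, `xml_to_json("<person><name>Ann</name><tag>a</tag><tag>b</tag><note/></person>")` gives an object with a string for `name`, an array for the repeated `tag`, and null for the empty `note`.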

and also a nice picture generated from this site



If you cannot understand it, then I encourage you to read a little about EBNF and language grammars, as they are very, very cool.

Later on I'll try to put up some code (I have the code, but as it is my task from work it's a bit harder to make it public, so I'll have to write a new one).

That is all, hope you liked it.

P.S. Sorry for my English.
