Weather data analysis and visualization – Big data tutorial Part 8/9 – Visualizing: HTML charts

Tutorial big data analysis: Weather changes in the Carpathian-Basin from 1900 to 2014 – Part 8/9

Data visualization – Interactive HTML5 charts with Flot JS Library

The two best tools for HTML5 charting I have checked out are

I've chosen Flot's interactive chart with switchable diagrams as my example, and it occurred to me that it would be nice to have trend lines for the charts as well.

Both tools use proprietary data formats. Instead of modifying their example parsers, I've chosen to alter my data to comply with the provided examples.

After altering the files with SED, as above, I ran into a problem. Hadoop is a parallel system: every process analyzes a portion of the dataset, so the resulting dataset is unordered, and if it is complex, it can't be ordered with Pig. So I had data like:

Weather station, (1950, 55), (1940, 50), (1960, 67)

Unordered key-value pairs were not a problem for the map tutorial, but they are a problem for Flot, because it plots the graph by iterating through the dataset in the order given. The dataset has to be ordered by its keys, which represent decades in the above example.
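
The ordering requirement can be illustrated with the three pairs from the example above. This is only a minimal Python 3 sketch, not the script used in this tutorial; the function name to_flot_data is made up for illustration.

```python
def to_flot_data(pairs):
    """Return the pairs as a list of [x, y] lists, ordered by x."""
    # sorted() compares tuples by their first element first, i.e. by the key
    return [[key, value] for key, value in sorted(pairs)]


# The unordered pairs from the example above
pairs = [(1950, 55), (1940, 50), (1960, 67)]
print(to_flot_data(pairs))
# prints [[1940, 50], [1950, 55], [1960, 67]]
```

With the points in this order, a line chart connects 1940 to 1950 to 1960 instead of jumping back and forth along the x axis.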

I've dropped SED and written a Python script instead, which makes the replacements in the datasets and also orders the key-value pairs so that Flot displays them properly.

The input file, pythonsort.txt, is the output of the last SED command, i.e. rain_new3.csv.

import collections

with open("pythonsort.txt", "r") as myfile:
    # For each line of the input file
    for line in myfile.readlines():
        # If the line contains '": {' (the opening of a series)
        if line.find('": {') > 0:
            # Replace '": {' with ' - Rain": {'
            string = line.replace('": {', ' - Rain": {')
            # Print the string to the output
            print string
        # If the line holds the label, append ' - Rain' to it as well
        if line.find('label') > 0:
            string = line.replace('",', ' - Rain",')
            print string
        # Print every other line (neither data, label nor series opening) unchanged
        if line.find('data') == -1 and line.find('label') == -1 and line.find('": {') == -1:
            print line
        # Print this line anyway, with Flot's markup for showing both lines and data points
        print 'lines: { show: true },points: { show: true },'
        if line.find('data') > 0:
            # Do multiple replacements to turn the line into a Python dict literal
            string = line.replace('data: [[', '{')
            string = string.replace(']]},', '}')
            string = string.replace(',', ':')
            string = string.replace(': ', ', ')
            string = string.replace('[', '')
            string = string.replace(']', '')
            # Convert the string to a Python dict
            tempco = eval(string)
            # Order the key-value pairs by their keys
            od = collections.OrderedDict(sorted(tempco.items()))
            # Convert the ordered collection to a string
            string = str(od)
            # Make multiple replacements to comply with Flot's input format
            string = string.replace('OrderedDict([(', 'data: [[')
            string = string.replace(')])', ']]},')
            string = string.replace('(', '[')
            string = string.replace(')', ']')
            print string
# Print trendline charts in the input format of Flot
with open("pythonsort.txt", "r") as myfile:
    for line in myfile.readlines():
        if line.find('": {') > 0:
            string = line.replace('": {', ' - Rain trend": {')
            print string
        if line.find('label') > 0:
            string = line.replace('",', ' - Rain trend",')
            print string
        if line.find('data') == -1 and line.find('label') == -1 and line.find('": {') == -1:
            print line
        print 'lines: { show: true },points: { show: true },'
        if line.find('data') > 0:
            # Turn the line into a Python dict literal, as in the first pass
            string = line.replace('data: [[', '{')
            string = string.replace(']]},', '}')
            string = string.replace(',', ':')
            string = string.replace(': ', ', ')
            string = string.replace('[', '')
            string = string.replace(']', '')
            tempco = eval(string)
            od = collections.OrderedDict(sorted(tempco.items()))
            # Fit a least-squares line y = A + B*x to the points. x holds the keys
            # (not their positions), because the line is evaluated at the keys below.
            x = od.keys()
            y = od.values()
            N = len(y)
            B = (sum(x[i] * y[i] for i in xrange(N)) - 1./N*sum(x)*sum(y)) / (sum(x[i]**2 for i in xrange(N)) - 1./N*sum(x)**2)
            A = 1.*sum(y)/N - B * 1.*sum(x)/N
            # Emit one [key, trend value] pair per key in Flot's input format
            string = 'data: ['
            for key, value in od.iteritems():
                string += "[%s, %s], " % (key, A + B * key)
            string += ']},'
            string = string.replace(', ]},', ']},')
            print string
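
The trend line at the end of the script is an ordinary least-squares fit of a straight line, y = A + B*x. As a hedged sketch in Python 3 (the script above is Python 2), the same calculation can be isolated in a small function. The name linear_trend is chosen for illustration only, and here x is taken to be the keys themselves (the years), so that the fitted line can be evaluated at those keys.

```python
def linear_trend(points):
    """Fit y = A + B*x by least squares and return [[x, A + B*x], ...].

    points is a list of (x, y) pairs with at least two distinct x values.
    """
    points = sorted(points)
    n = len(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope B: covariance of x and y divided by the variance of x
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x in xs))
    # Intercept A: the fitted line passes through the point of means
    intercept = mean_y - slope * mean_x
    return [[x, intercept + slope * x] for x in xs]


# The pairs from the earlier example; the slope works out to 0.85 per year
trend = linear_trend([(1950, 55), (1940, 50), (1960, 67)])
print(trend)
```

The mean-centred form used here is algebraically the same as the sums in the script, but it is a little easier to read and less prone to rounding problems when the x values are large numbers such as years.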

Be careful with the opening and closing marks of the dataset ( []{} ); those have to be set manually.
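
One way to avoid setting the brackets by hand would be to build the whole structure as a Python dictionary and let the json module write it out, since JSON output is also a valid JavaScript object literal. This is only a sketch in Python 3 under assumptions: the function name build_datasets, the station name and the values are hypothetical, and it assumes the example page reads its series from a variable called datasets.

```python
import json


def build_datasets(series):
    """Return a JavaScript assignment holding all series.

    series maps a series name to a dict of unordered {x: y} values.
    """
    datasets = {}
    for name, values in series.items():
        datasets[name] = {
            "label": name,
            "lines": {"show": True},
            "points": {"show": True},
            # Flot needs the points ordered by x
            "data": [[x, y] for x, y in sorted(values.items())],
        }
    # json.dumps emits every opening and closing bracket and brace itself
    return "var datasets = " + json.dumps(datasets, indent=2) + ";"


# Hypothetical station name and values, for illustration only
print(build_datasets({"Station A - Rain": {1950: 55, 1940: 50, 1960: 67}}))
```

Because the output is generated from a data structure rather than assembled from text fragments, an unbalanced bracket cannot occur.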

I have made some minor, cross-browser compatible modifications to the CSS so that many series can be shown in columns below the diagram.

After adding the resulting dataset to Flot's minimally modified example, it became possible to toggle the PRCP chart of each weather station as well as its trend line.

Download the example Flot diagram and Python script: Flot_Diagram.zip