The Python Quants

blaze – Data Blending and Analysis

Dr. Yves J. Hilpisch

The Python Quants GmbH

analytics@pythonquants.com

www.pythonquants.com

blaze gives Python users a familiar interface for querying data that lives in diverse data storage systems. Cf. http://blaze.pydata.org/.
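To make the idea of one query interface over several storage systems concrete, here is a minimal sketch using only the standard library. It is not how blaze works internally; the helper `older_than` and the `persons` table are hypothetical. `functools.singledispatch` picks an implementation based on the type of the data source, so the same call works on an in-memory list and on a SQLite database.

```python
import sqlite3
from functools import singledispatch


@singledispatch
def older_than(source, age):
    """Names of all persons older than `age` (hypothetical helper)."""
    raise NotImplementedError(f'unsupported source: {type(source).__name__}')


@older_than.register
def _(source: list, age):
    # in-memory backend: filter (name, gender, age) tuples in Python
    return [name for name, _gender, a in source if a > age]


@older_than.register
def _(source: sqlite3.Connection, age):
    # SQL backend: the filter runs inside the database as a WHERE clause
    cur = source.execute(
        'SELECT name FROM persons WHERE age > ? ORDER BY rowid', (age,))
    return [name for (name,) in cur]


records = [('Henry', 'boy', 8), ('Lilli', 'girl', 14)]

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE persons (name TEXT, gender TEXT, age INTEGER)')
con.executemany('INSERT INTO persons VALUES (?, ?, ?)', records)

in_memory = older_than(records, 10)
from_sql = older_than(con, 10)
con.close()

print(in_memory)  # ['Lilli']
print(from_sql)   # ['Lilli']
```

The caller never sees which backend answered; that separation of the query from the storage is the point blaze generalizes.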

In [1]:
import blaze as bz

Simple Example

The first example constructs a blaze.Table object from native Python objects.

In [2]:
t = bz.Table([('Henry', 'boy', 8),
              ('Lilli', 'girl', 14)],
            columns=['name', 'gender', 'age'])
In [3]:
t
Out[3]:
    name gender  age
0  Henry    boy    8
1  Lilli   girl   14
In [4]:
t[t.age > 10]
Out[4]:
    name gender  age
0  Lilli   girl   14
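The tabular output and the boolean-indexing syntax will look familiar to pandas users. For comparison, a minimal sketch of the same selection done directly in pandas (assuming pandas is installed); it only illustrates the analogy and says nothing about how blaze evaluates the expression.

```python
import pandas as pd

# the same records as above, now as a pandas DataFrame
people = pd.DataFrame([('Henry', 'boy', 8),
                       ('Lilli', 'girl', 14)],
                      columns=['name', 'gender', 'age'])

# boolean indexing: keep only the rows where the condition holds
older = people[people.age > 10]
print(older)
```

Unlike the blaze output above, pandas keeps the original row label (1) for Lilli; `older.reset_index(drop=True)` renumbers the rows from 0.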

Data from NumPy Array

Let us read data from an in-memory NumPy ndarray object.

In [5]:
import numpy as np
In [6]:
t1 = bz.Table(np.random.standard_normal((1000, 5)),
             columns=['f0', 'f1', 'f2', 'f3', 'f4'])
In [7]:
t1
Out[7]:
          f0        f1        f2        f3        f4
0  -0.716963 -0.995862 -0.637685  0.765487 -0.905161
1  -1.030450  0.364067 -0.334799  1.043347 -0.192704
2   0.278630 -1.595391 -1.130073  0.783550  0.304777
3  -0.846147  1.635864 -0.988581  1.274397 -0.288971
4  -0.998443 -0.572153 -0.591410 -0.176078 -0.221542
5  -0.175455  0.770488 -0.353788  0.193937  0.208759
6  -1.916318 -1.712488 -0.966032 -1.156982 -0.647174
7   0.730536 -0.179158  0.352907 -0.415861  0.917164
8   1.541456 -0.831997 -0.775657  0.551679  0.813129
9  -0.199075 -0.849079  0.808613 -0.630164  0.073952
10  1.111952  0.892264  0.312832  1.734101 -0.729951
In [8]:
t1.data
Out[8]:
array([[-0.71696308, -0.99586196, -0.63768528,  0.76548746, -0.90516139],
       [-1.03044978,  0.36406723, -0.3347985 ,  1.04334725, -0.19270414],
       [ 0.27862972, -1.59539147, -1.13007273,  0.78354955,  0.30477694],
       ..., 
       [-0.12551142, -0.11041556, -1.56344774,  0.98935175,  0.49422456],
       [ 0.83870173,  0.27242804, -0.42901784, -0.68074583, -0.83396272],
       [ 0.0316975 , -0.72234167,  0.07680095,  0.10765042, -0.06864667]])
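Because `t1.data` is the NumPy array itself, a selection analogous to `t[t.age > 10]`, applied to column f0, corresponds to a boolean mask on the first column of the array. A minimal NumPy sketch of that correspondence; the name-to-position mapping `col` is written out by hand for illustration (an assumption, not something blaze exposes in this form), and a seeded generator makes the result reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)             # seeded for reproducibility
data = rng.standard_normal((1000, 5))      # same shape as the array above
columns = ['f0', 'f1', 'f2', 'f3', 'f4']
col = {name: i for i, name in enumerate(columns)}  # column name -> position

# boolean mask on column f0, then select the matching rows (all columns)
mask = data[:, col['f0']] > 0
positive_f0 = data[mask]

print(positive_f0.shape)  # (number of matching rows, 5)
print(mask.sum())         # number of rows with f0 > 0
```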

Data from CSV File

We first generate a CSV file containing random data.

In [9]:
import pandas as pd
In [10]:
df = pd.DataFrame(np.random.standard_normal((1000, 5)),
                  columns=['f0', 'f1', 'f2', 'f3', 'f4'])
df.to_csv('data.csv', index=False)

Let us read the data with blaze.

In [11]:
c = bz.CSV('data.csv')
In [12]:
t2 = bz.Table(c)
t2.count()
Out[12]:
1000
In [13]:
t2
Out[13]:
          f0        f1        f2        f3        f4
0   1.216628  0.790671 -1.052875 -0.263273 -0.621701
1  -0.382281 -0.288016 -0.566362 -0.737692 -0.877545
2   0.393580  1.722334 -0.098932 -0.127776  1.886648
3  -0.284117  0.471595  0.136858  1.690559  0.398721
4  -0.606783  0.193614 -1.087111 -0.695679  0.940503
5  -0.803144 -0.710318 -0.500398 -1.044722  0.060911
6  -0.369835 -0.138077 -1.570774  2.596149 -0.604341
7  -0.717974  2.608524 -0.165592  0.736936  1.068432
8  -0.268796  1.331343 -0.319321 -0.132833 -1.837966
9  -1.823687  0.619458  0.255937 -0.687432 -0.073094
10  0.505356 -1.428319  0.295405 -0.269511 -0.496539
In [14]:
t2.data
Out[14]:
<blaze.data.csv.CSV at 0x7f4493bf2f50>
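As a cross-check of `t2.count()`, the number of data rows in a CSV file can be counted with the standard library alone. A minimal sketch, assuming a single header line as written by `df.to_csv('data.csv', index=False)`; it writes its own file into a temporary directory so it runs independently of the files above and leaves nothing behind.

```python
import csv
import os
import random
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.csv')

    # write a header plus 1000 rows of random numbers, like df.to_csv(..., index=False)
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['f0', 'f1', 'f2', 'f3', 'f4'])
        for _ in range(1000):
            writer.writerow([random.gauss(0, 1) for _ in range(5)])

    # count the data rows; the header line is consumed separately by next()
    with open(path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        n_rows = sum(1 for _ in reader)

print(header)  # ['f0', 'f1', 'f2', 'f3', 'f4']
print(n_rows)  # 1000
```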

Data from SQL

We now generate a SQLite3 table with dummy data.

In [15]:
import sqlite3 as sq3
In [16]:
con = sq3.connect('data.sql')
In [17]:
try:
    con.execute('DROP TABLE numbers')  # remove a table left over from an earlier run
except sq3.OperationalError:
    pass  # the table does not exist yet
In [18]:
con.execute('CREATE TABLE numbers (f0 real, f1 real, f2 real, f3 real, f4 real)')
Out[18]:
<sqlite3.Cursor at 0x7f4493c8b490>
In [19]:
con.executemany('INSERT INTO numbers VALUES (?, ?, ?, ?, ?)',
                np.random.standard_normal((1000, 5)))
Out[19]:
<sqlite3.Cursor at 0x7f4493c8b3b0>
In [20]:
con.commit()
In [21]:
con.close()
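The try/except around `DROP TABLE` only serves to remove a table left over from an earlier run; SQLite also supports `DROP TABLE IF EXISTS` for exactly this purpose. A minimal sketch of the same setup using that statement plus a `SELECT COUNT(*)` check; it uses an in-memory database instead of `data.sql` (an assumption for illustration) so no file is created. Using the connection as a context manager commits on success and rolls back on an exception; it does not close the connection.

```python
import sqlite3

import numpy as np

con = sqlite3.connect(':memory:')  # in-memory database for illustration
with con:  # commits on success, rolls back on an exception
    con.execute('DROP TABLE IF EXISTS numbers')
    con.execute('CREATE TABLE numbers '
                '(f0 REAL, f1 REAL, f2 REAL, f3 REAL, f4 REAL)')
    # tolist() turns each row into a list of plain Python floats
    con.executemany('INSERT INTO numbers VALUES (?, ?, ?, ?, ?)',
                    np.random.standard_normal((1000, 5)).tolist())

n_rows = con.execute('SELECT COUNT(*) FROM numbers').fetchone()[0]
print(n_rows)  # 1000
con.close()
```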

Now we read the data with blaze.

In [22]:
t3 = bz.Table('sqlite:///data.sql::numbers')
In [23]:
t3
Out[23]:
          f0        f1        f2        f3        f4
0   0.708976  0.410183 -0.696857  1.509420  0.490742
1  -0.279146  0.884721 -0.340103  0.774885  0.218360
2  -0.795093 -0.889350 -0.418793  0.976467  1.209364
3  -0.449158 -1.056430  0.182415 -0.001183 -0.339631
4  -0.515002  1.436926 -0.335134  0.867940 -0.877985
5   0.637213  0.648801 -0.452811 -0.493699  0.560406
6  -0.130063  1.131348 -2.214805  2.459480 -1.390185
7   0.086955  0.693880  1.580424 -0.057145 -1.597628
8  -0.121220  0.152641  0.279719 -0.191457  0.373112
9  -0.419203 -0.159744  0.625661 -0.631084  0.127261
10 -0.869142 -0.872415 -0.115861  0.352545  1.138114
In [24]:
t3.data
Out[24]:
<blaze.data.sql.SQL at 0x7f4493c01910>
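Since `t3.data` wraps a SQL table, expressions on `t3` are ultimately answered by the database. A minimal sketch of plain SQL that expresses roughly what a count and a selection on column f0 ask for, run directly via `sqlite3`; the SQL that blaze actually generates may look different. It builds a small in-memory table so it is self-contained.

```python
import random
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE numbers (f0 REAL, f1 REAL, f2 REAL, f3 REAL, f4 REAL)')
rows = [tuple(random.gauss(0, 1) for _ in range(5)) for _ in range(1000)]
con.executemany('INSERT INTO numbers VALUES (?, ?, ?, ?, ?)', rows)

# row count, computed inside the database
total = con.execute('SELECT COUNT(*) FROM numbers').fetchone()[0]

# selection: the filter runs in SQL, only matching rows come back
positive = con.execute('SELECT * FROM numbers WHERE f0 > 0').fetchall()

# preview of the first rows, similar to the 11-row display above
preview = con.execute('SELECT * FROM numbers LIMIT 11').fetchall()
con.close()

print(total)          # 1000
print(len(positive))  # number of rows with f0 > 0
```

Pushing the filter into the `WHERE` clause means only the matching rows travel from the database to Python.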

Cleaning Up

In [25]:
!ls data.* -n
-rw-r--r-- 1 1141 8 98162 Feb 21 17:16 data.csv
-rw-r--r-- 1 1141 8 56320 Feb 21 17:16 data.sql

In [26]:
# cleaning up
!rm data.*
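The `!ls` and `!rm` lines are IPython shell escapes and assume a Unix-like shell. A portable alternative in plain Python uses `glob` and `os.remove`; a minimal sketch, run here on a temporary directory with two placeholder files so it cannot delete anything else.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # create two placeholder files mirroring data.csv and data.sql
    for name in ('data.csv', 'data.sql'):
        with open(os.path.join(tmp, name), 'w') as f:
            f.write('placeholder\n')

    pattern = os.path.join(tmp, 'data.*')
    # list the files with their sizes in bytes, similar to `ls data.*`
    for path in sorted(glob.glob(pattern)):
        print(os.path.basename(path), os.path.getsize(path))

    # remove them, like `rm data.*`
    for path in glob.glob(pattern):
        os.remove(path)
    remaining = glob.glob(pattern)

print(remaining)  # []
```

In the notebook itself, the pattern would simply be `'data.*'` relative to the current working directory.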