Data Explorer

Introduction

The Explorer lets you perform various statistical analyses and data mining operations in an easy and intuitive way. As the name implies, the software aims at exploring data and getting a quick insight into the orders of magnitude of the observed objects. That is why it focuses on graphical representation and mouse-driven operations, unlike traditional statistical tools cluttered with numerous dialog boxes and lists of figures with five decimal places. You can, however, get the detailed numbers once your analysis is completed.

Videos

Overview
Contingency table
Weather data

Screenshots

explorer screenshot explorer screenshot

Installation and run

The Explorer is written in JavaScript and built with Electron.

OSX

Download the latest version for darwin from the release page.

Windows

Download the latest version corresponding to your system (32-bit or 64-bit) from the release page. The application is bundled into a single exe file, thanks to BoxedApp Packer.

Linux

Download Electron for Linux and the source of the Explorer from the release page. Copy the app folder into electron/resources, then run Electron.

Data loading

At launch time, the Explorer shows a window to choose the dataset to use. You can either drag and drop a file from your computer desktop, or click the clipboard button.

explorer screenshot

Various file formats are accepted:

Source | File extension | Remarks
Access | mdb, accdb | Access 2000 or higher
ARFF / KEEL | * | No comments at the beginning of the file. The first line must be @relation
BigQuery | * | A config file with content like this:
    BigQuery
    client_secret:/full/path/to/my_private_key.json
    query:select * from lookerdata:cdc.project_tycho_reports limit 1000
    timeout:60000
dBase | dbf |
Excel | xlsx | The field names are expected at the top of the columns
JMP | jmp |
JSON file | * | A JSON array of records
LIMDEP / NLOGIT | lpj |
MINITAB | mtw |
MLwiN | ws | Uncompressed format only
MongoDB | * | A config file with content like this:
    mongodb
    host:192.168.0.121:27017
    database:geo
    collection:countries
    query:{cont:{$eq:"EU"},pop:{$gt:50000000}}
MySQL | * | A config file with content like this:
    mysql
    host:192.168.0.2
    user:bob
    password:secret
    database:test
    query:select * from mytable
Postgres | * | A config file with content like this:
    postgres
    host:192.168.0.2
    user:bob
    password:secret
    database:test
    query:select * from mytable
  or:
    postgres
    connection:bob:secret@192.168.0.2/test
    query:select * from mytable
R | rdb | Binary format only
SAS | sas7bdat | Uncompressed format only
SPLUS | sdd |
SPSS | sav | Uncompressed format only
SQL Server | * | A config file with content like this:
    mssql
    host:192.168.0.121
    username:bob
    password:secret
    query:select * from mytable
Stata | dta | Stata 8 or higher
Tabular file | * | The field names are expected on the first line
Bzip2 file | bz2 | The uncompressed file must be in one of the previous formats
Gzip file | gz | The uncompressed file must be in one of the previous formats
Web file | * | Contains the URL of the data. The remote file must be in one of the previous formats
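The config files listed above share one shape: a first line naming the source, followed by key:value lines. The following Python sketch shows one way such a file could be read. It is an illustration only, not the Explorer's own parser; the function name parse_config is hypothetical, and splitting on the first colon is an assumption inferred from the examples (hosts and queries contain colons themselves).

```python
def parse_config(text):
    """Split a config file into (source_type, settings).

    Assumption: the first non-empty line names the source, and every
    following line has the form key:value.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    source_type = lines[0]
    settings = {}
    for line in lines[1:]:
        # Split on the first colon only, so values such as
        # "192.168.0.121:27017" or a query keep their own colons.
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return source_type, settings


example = """mongodb
host:192.168.0.121:27017
database:geo
collection:countries
query:{cont:{$eq:"EU"},pop:{$gt:50000000}}
"""

source_type, settings = parse_config(example)
print(source_type)
for key, value in settings.items():
    print(key, "=", value)
```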

If you click the clipboard button, the clipboard data must be in tabular form, with the field names on the first line.

Main window

Once the data have been successfully loaded, the main window is displayed:

workspace

Here are the elements of the interface:

  1. List of the categorical fields (aka "the pink zone"). By default only 10 fields are displayed. To resize the list, move the mouse just below the list and drag to shrink or extend the list. To scroll the list, move the mouse to the right of the list.

  2. Icons of the existing analyses (graphs). To run a new analysis, just drag its icon to the workspace.

  3. List of the numerical fields (aka "the blue zone"). By default only 10 fields are displayed. To resize the list, move the mouse just below the list and drag to shrink or extend the list. To scroll the list, move the mouse to the right of the list.

  4. Icons of the tools

  5. Status bar. This area shows, at any time, details about the object under the mouse or the action you are about to perform.

  6. Dock. This area is used to keep graphs that are temporarily removed from the workspace.

  7. Version number

  8. Memory usage

  9. Workspace. This area is where the graphs are created and arranged.

Graph

To create a new graph, drag its icon to the workspace. Alternatively, if you don't know which icon to use, you can right-click or control-click on the workspace to get a menu listing all the possible analyses.

A graph is displayed as an area with several distinct parts:

graph

  1. Close box. Click on this box to close the graph. All the computations done will be lost.

  2. Option menu. Some graphs have different ways of representing the results. In that case click on this sign to bring up the menu to choose from. Alternatively, right-click or control-click within the graph.

  3. Title bar. This area shows the current selection (see below). Click on this area to drag the graph around.

  4. Slots. These are the places where you define the parameters of the analysis. Depending on the graph, different combinations of slots are shown. You can drag a categorical field onto a pink slot, and a numerical field onto a blue slot. Parameters can be swapped by dragging from one slot to another (of the same graph and of the same color).

  5. Resize box. Click on this box and drag to resize the graph.

To change the type of a graph, drag the icon of the new type onto the graph. The new analysis will retain the parameters and selection of the previous one.

Selection

Every analysis can be restricted to a part of the data only. The set of observations (records) currently processed by a graph is called the selection, and is displayed in the title bar. Initially, the selection consists of all the observations, and the title is blank.

Selection based on a categorical field

Conversely, the selection of an existing graph can be changed by dragging a pie slice onto its title. This lets you run the same analysis successively on different parts of the data.

Selection based on a numerical field

Combining selections

Dragging a slice to the title of a graph which already has a selection will combine the two sets.

If the two variables are the same, the resulting selection will be the union of the two sets. Example: a pie graph splits the data into Apples, Pears, Peaches, and Apricots. If you drag the apple slice to the title of another graph, the selection will be Apples. If you then drag the peach slice to the title of that graph, the selection will be Apples + Peaches.

If the two variables are not the same, the resulting selection will be the intersection of the two sets. Example: a pie graph splits the data into Apples, Pears, Peaches, and Apricots. If you drag the apple slice to the title of another graph, the selection will be Apples. If you then change the variable defining the pie so that it splits the data into Organic and Non-Organic, and drag the Organic slice to the title of the second graph, the selection will be Apples AND Organic.
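The union and intersection rules can be sketched in a few lines of Python. This is only a model of the behavior described above, not the Explorer's code: a selection is represented as a dict mapping a field name to the set of accepted values, and the field names fruit and farming, like the functions combine and matches, are made up for the example.

```python
def combine(selection, field, value):
    """Return the selection obtained by dropping the slice field=value
    onto a graph title that already carries `selection`."""
    updated = {name: set(values) for name, values in selection.items()}
    # Same field: the value joins the accepted set (union).
    # New field: it becomes an extra condition (intersection).
    updated.setdefault(field, set()).add(value)
    return updated


def matches(record, selection):
    # A record is kept only if every field condition holds, and a field
    # condition holds when the record has any of the accepted values.
    return all(record[name] in values for name, values in selection.items())


records = [
    {"fruit": "Apples", "farming": "Organic"},
    {"fruit": "Apples", "farming": "Non-Organic"},
    {"fruit": "Peaches", "farming": "Organic"},
    {"fruit": "Pears", "farming": "Organic"},
]

# Same variable twice: Apples + Peaches.
same_variable = combine(combine({}, "fruit", "Apples"), "fruit", "Peaches")
apples_or_peaches = [r for r in records if matches(r, same_variable)]

# Two different variables: Apples AND Organic.
two_variables = combine(combine({}, "fruit", "Apples"), "farming", "Organic")
organic_apples = [r for r in records if matches(r, two_variables)]

print(len(apples_or_peaches), len(organic_apples))
```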

Conversions

When loading the data, the Explorer identifies fields containing only numbers as numerical, and all other fields as categorical. Sometimes it is desirable to change this. Several possibilities exist.
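As a rough illustration of the detection rule, the Python sketch below classifies each field of a small dataset. It assumes that "number" means anything Python's float() accepts; the Explorer's exact rule (for instance, how it treats empty cells) is not documented here, and the function names are hypothetical.

```python
def is_number(text):
    """Return True if the text can be read as a number by float()."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def classify_fields(records):
    """Label each field 'numerical' if all its values are numbers,
    and 'categorical' otherwise."""
    return {
        name: "numerical" if all(is_number(r[name]) for r in records) else "categorical"
        for name in records[0]
    }


rows = [
    {"ID": "1", "COLOR": "Blue", "HEIGHT": "142"},
    {"ID": "2", "COLOR": "Red", "HEIGHT": "175"},
    {"ID": "3", "COLOR": "Green", "HEIGHT": "109"},
]

kinds = classify_fields(rows)
print(kinds)
```

Note that ID is classified as numerical even though it is really an identifier, which is the kind of case where a manual conversion is useful.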

Original data:

ID | COLOR
1 | Blue
2 | Red
3 | Green
4 | Red

Data after the conversion:

ID | Blue | Red | Green
1 | 1 | 0 | 0
2 | 0 | 1 | 0
3 | 0 | 0 | 1
4 | 0 | 1 | 0
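The conversion shown in these two tables replaces a categorical field with one 0/1 column per category (often called indicator or one-hot encoding). A minimal Python sketch of the same transformation, with a hypothetical helper name and no claim about how the Explorer implements it:

```python
def to_indicators(records, field):
    """Replace `field` with one 0/1 column per distinct category."""
    # dict.fromkeys keeps categories in order of first appearance.
    categories = list(dict.fromkeys(r[field] for r in records))
    converted = []
    for r in records:
        row = {k: v for k, v in r.items() if k != field}
        for category in categories:
            row[category] = 1 if r[field] == category else 0
        converted.append(row)
    return converted


data = [
    {"ID": 1, "COLOR": "Blue"},
    {"ID": 2, "COLOR": "Red"},
    {"ID": 3, "COLOR": "Green"},
    {"ID": 4, "COLOR": "Red"},
]

converted = to_indicators(data, "COLOR")
for row in converted:
    print(row)
```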

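Original data:

ID | COLOR | HEIGHT | WIDTH | DEPTH
1 | Blue | 142 | 25 | 11
2 | Red | 175 | 12 | 16
3 | Green | 109 | 48 | 14

Data after the pivot:

ID | COLOR | PIVOT | COUNT
1 | Blue | HEIGHT | 142
1 | Blue | WIDTH | 25
1 | Blue | DEPTH | 11
2 | Red | HEIGHT | 175
2 | Red | WIDTH | 12
2 | Red | DEPTH | 16
3 | Green | HEIGHT | 109
3 | Green | WIDTH | 48
3 | Green | DEPTH | 14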
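The pivot turns each numerical column into its own row, with the column name stored in PIVOT and the value in COUNT. The Python sketch below reproduces the table above; the function name and its arguments are illustrative assumptions, not part of the Explorer.

```python
def pivot(records, id_fields, value_fields):
    """Turn one wide record into one long record per value field."""
    long_rows = []
    for r in records:
        for name in value_fields:
            row = {k: r[k] for k in id_fields}
            row["PIVOT"] = name      # which original column the value came from
            row["COUNT"] = r[name]   # the value itself
            long_rows.append(row)
    return long_rows


wide = [
    {"ID": 1, "COLOR": "Blue", "HEIGHT": 142, "WIDTH": 25, "DEPTH": 11},
    {"ID": 2, "COLOR": "Red", "HEIGHT": 175, "WIDTH": 12, "DEPTH": 16},
    {"ID": 3, "COLOR": "Green", "HEIGHT": 109, "WIDTH": 48, "DEPTH": 14},
]

long_rows = pivot(wide, ["ID", "COLOR"], ["HEIGHT", "WIDTH", "DEPTH"])
for row in long_rows:
    print(row)
```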

Units

balloons

Tools

Here are the tools available in the toolbar at the bottom of the screen:

graph

Types of analysis

In the browser

The Explorer can also be executed in any modern browser. Open app/index.html, paste the data from the clipboard, and click OK.

Contact

jfbouzereau@netcourrier.com