Copyright (c) 2016, Thom Fruehwirth, Ulm University, Germany

Programmed in SWI-Prolog using Constraint Handling Rules.

This service comes with absolutely no warranty.

Query your spreadsheet files like a database. The tool returns exemplary exact and approximate matches. Our method does not make up new data values or new rows. You can use these results to find dependencies in your data and to make predictions from your data on the fly. This is the easy way into data mining.

To start, upload a spreadsheet data file (up to 2 MB). The file must be a plain comma-separated CSV file (no semicolons, no xls). Each line (row) consists of one or more fields, separated by commas. The first line must be the header describing the field in each column. There must be no empty lines. Numbers must not be in quotes and must use a dot as the decimal point. For dates, put the year first, e.g. 20160601 or "2016/06/01". If you think your data depends on, say, the month (or weekday), introduce a separate field for the month (or weekday).
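
The format rules above can be checked before uploading. The following is a minimal Python sketch, not part of the service; the function name check_csv_text and the wording of its messages are made up for illustration. It only looks for empty lines, semicolons and quoted numbers, and it does not check the 2 MB size limit.

```python
import re

# A number (with a dot or comma as decimal mark) wrapped in double quotes.
QUOTED_NUMBER = re.compile(r'"-?\d+(?:[.,]\d+)?"')

def check_csv_text(text):
    """Return a list of problems found in the CSV text (hypothetical helper)."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a single trailing newline is not an empty line
    if not lines:
        return ["file is empty (a header line is required)"]
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            problems.append(f"line {number}: empty line")
        if ";" in line:
            problems.append(f"line {number}: contains a semicolon")
        if QUOTED_NUMBER.search(line):
            problems.append(f"line {number}: number in quotes")
    return problems

if __name__ == "__main__":
    for problem in check_csv_text('name;price\n\n"3,5"\n'):
        print(problem)
```

An empty result list means that none of these three problems was found; it does not guarantee that the upload will succeed.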

If the upload is successful, some initial information about your tabular data will be displayed: the field descriptions from the first line and, for each field, its minimum, median and maximum values and the number of values.
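
To get a feeling for this summary, it can be approximated offline. The Python sketch below is an illustration under assumptions, not the code of the service: the function name summarize is hypothetical, columns that are entirely numeric are compared as numbers and all others as text, the median is taken as the lower middle value of the sorted column so that it is always an actual data value, and the file is assumed to have at least one data row.

```python
import csv
import io

def to_number(field):
    """Return the field as a float, or None if it is not a number."""
    try:
        return float(field)
    except ValueError:
        return None

def summarize(text):
    """Per field: minimum, median, maximum and number of values (sketch)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    summary = {}
    for index, name in enumerate(header):
        column = [row[index] for row in data]
        numbers = [to_number(field) for field in column]
        # Assumption: a column is numeric only if every value parses.
        values = numbers if None not in numbers else column
        ordered = sorted(values)
        summary[name] = {
            "min": ordered[0],
            "median": ordered[(len(ordered) - 1) // 2],  # lower median
            "max": ordered[-1],
            "count": len(ordered),
        }
    return summary

if __name__ == "__main__":
    print(summarize("city,temp\nUlm,3.5\nBonn,1.0\nKiel,2.0\n"))
```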

To query the uploaded CSV file, specify some of the given fields and hit the return key. To specify a field, use either a concrete value, a question mark (?), which denotes a value to find, or a hyphen (-), which denotes a value you do not care about. The default for all fields is "don't care".
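
The three kinds of field entries can be pictured with a small Python sketch. It is only an illustration: the names parse_query and exact_matches and the tags "value", "find" and "any" are hypothetical, and the sketch covers exact matches only, not the approximate matching done by the service.

```python
def parse_query(header, entries):
    """Classify each field entry; fields left out default to "don't care"."""
    query = {}
    for name in header:
        entry = entries.get(name, "-").strip()
        if entry == "?":
            query[name] = ("find", None)    # value to be found
        elif entry in ("-", ""):
            query[name] = ("any", None)     # don't care
        else:
            query[name] = ("value", entry)  # concrete value
    return query

def exact_matches(rows, query):
    """Keep the rows that agree with every concrete value in the query."""
    return [
        row for row in rows
        if all(kind != "value" or row[name] == wanted
               for name, (kind, wanted) in query.items())
    ]

if __name__ == "__main__":
    rows = [{"city": "Ulm", "temp": "3.5"}, {"city": "Bonn", "temp": "1.0"}]
    query = parse_query(["city", "temp"], {"city": "Ulm", "temp": "?"})
    print(exact_matches(rows, query))
```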

The answer will show rows from your data with exact and approximate matches. Answer rows will be sorted in lexical order as given, except that the values for the "?"-query fields will come first. There are several types of matches (ordered by precision, i.e. degree of approximation): exact, close, similar and related matches. There may be no exact matches, but there will always be some approximate matches. Values that are close to each other in standard sorting order are considered similar.

For each type of match, several exemplary answer rows will be shown. These are the median of all rows and the smallest and largest row that covers at least 75% of the rows found, as well as the most frequent ones (with a bar indicating their relative frequency in the answer rows).

With our symbolic approach, data will be analysed row by row. To analyse data that spans several rows (such as time series), rewrite your data so that the related values are in one row (sliding window approach).
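
One possible way to do this rewriting is sketched below in Python. The function name sliding_windows, the window size parameter and the "_0", "_1", ... suffixes for the new field names are assumptions made for illustration; any naming scheme with unique field names should work just as well.

```python
def sliding_windows(header, rows, size):
    """Join every run of `size` consecutive rows into one wide row."""
    # Field names get a suffix for their position inside the window.
    new_header = [f"{name}_{offset}"
                  for offset in range(size)
                  for name in header]
    new_rows = [
        [field for row in rows[start:start + size] for field in row]
        for start in range(len(rows) - size + 1)
    ]
    return new_header, new_rows

if __name__ == "__main__":
    header = ["day", "temp"]
    rows = [["1", "10"], ["2", "12"], ["3", "11"]]
    wide_header, wide_rows = sliding_windows(header, rows, 2)
    print(",".join(wide_header))
    for row in wide_rows:
        print(",".join(row))
```

With a window of size 2, a query on temp_0 with a question mark for temp_1 then asks for the value that followed a given value in the original series.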

Your session expires after a time-out. Your data will not be stored. To restart, reload the initial web page.

Feedback is welcome; mail to thom.fruehwirth@uni-ulm.de