Interactive Use

Basic usage

To use antelop’s interactive mode, run the command antelop-python in the terminal, either in your conda environment, or with your singularity container alias set.

This command will prompt you for your database username and password. It will then establish a connection to the database, and will open an interactive IPython shell.

There are many objects that are available to you automatically to interact with the database and your analysis functions. First of all, we have the database tables. To see what tables are available, just type:

tables

All of these tables are available for usage automatically. For example, to query the Experimenter table, just type:

Experimenter

To see a description of any table, including its attributes, and available methods, just type:

Experimenter.help()

All the analysis functions, from the standard library, your GitHub repositories and your own analysis folders are also automatically available. To see what functions are available, just type:

functions

Functions are laid out in a folder structure, with the folder names separated by dots. For example, to see all functions in the standard library, type:

stdlib

To see all functions in the script hello_world in the standard library, type:

stdlib.hello_world

Each function is accessed via its location/folder structure, separated by dots, and its name. For example, to run the function greeting() from the script hello_world in the standard library, you would type:

stdlib.hello_world.greeting()

You can additionally get information about a function including its arguments, database dependencies and description by typing:

stdlib.hello_world.greeting.help()

DataJoint Queries

You can interact with the database using DataJoint syntax. You can query the database using the following operators:

DataJoint Query Operators (source: DataJoint documentation)

operator

notation

meaning

restriction

A & cond

The subset of entities from table A that meet condition cond

restriction

A - cond

The subset of entities from table A that do not meet condition cond

join

A * B

Combines all matching information from A and B

proj

A.proj(…)

Selects and renames attributes from A or computes new attributes

aggr

A.aggr(B, …)

Same as projection but allows computations based on matching information in B

union

A + B

All unique entities from both A and B

To perform a query, you just type your query and press enter.

For example, consider the following query:

Experiment * Experimenter & 'experimenter="rbedford"'

Which gives the result:

experimenter

experiment_id

full_name

group

institution

admin

experiment_name

experiment_notes

experiment_deleted

rbedford

1

Rory Bedford

Tripodi Lab

MRC LMB

True

experiment 1

Example notes, experiment was a terrible failure.

False

rbedford

2

Rory Bedford

Tripodi Lab

MRC LMB

True

experiment 2

Experiment was a great success, I won a Nobel prize for this.

False

Schemas

We list show here the tables available in the database within their schema structures. To see their attributes, you can run the .help() method.

Antelop metadata schema

Analysis functions

Analysis functions are laid out as described in the Basic usage section. They can be run automatically on the database. For example:

stdlib.hello_world.greeting()

Will run the function greeting() from the script stdlib.hello_world, on all the data in your database. This will return a list of dictionaries, each containing the primary keys and results of the analysis function. For our lab’s database, this returns:

experimenter

greeting

arueda

Hello, Ana Gonzalez-Rueda!

dmalmazet

Hello, Daniel de Malmazet!

dwelch

Hello, Daniel Welch!

ewilliams

Hello, Elena Williams!

fmorgese

Hello, Fabio Morgese!

mtripodi

Hello, Marco Tripodi!

rbedford

Hello, Rory Bedford!

srogers

Hello, Stefan Rogers-Coltman!

yyu

Hello, Yujiao Yu!

This can be easily converted to a pandas DataFrame for further analysis as follows:

import pandas as pd
df = pd.DataFrame(stdlib.hello_world.greeting())

Analysis functions accept as their first argument a DataJoint restriction. This allows you to run the analysis on a subset of the data. They then take additional custom arguments that modify their behaviour. For example:

stdlib.hello_world.greeting({'experimenter':'rbedford'}, excited=False)

This gives:

experimenter

greeting

rbedford

Hello, Rory Bedford.

To see what keyword arguments are available, check the args attribute as follows:

stdlib.hello_world.greeting.args

To see all functions available in antelop’s standard library, see Analysis Standard Library.

Note, if you are writing analysis functions, it is useful to be able to reload them on the fly, without having to close and reopen your antelop shell. To do this, just run:

reload()

Saving function runs

Antelop provides a comprehensive framework for saving the outputs of your analysis functions in a fully reproducable manner. For details on how this works, see Reproducibility Framework. To use this framework, however, you just need to use the following function methods.

To run a function and save the output to disk, use the following method:

stdlib.hello_world.greeting.save_result(filepath='./result', format='pkl', restriction={'experimenter':'rbedford'}, excited=False)

This will save the output the output to whatever filepath you specify as a pickle file. All additional arguments and the restriction can be left blank, and the values for filepath and format we show are the defaults. A pickle file is a very fast way of saving and loading data of arbitrary types, including figures, numpy arrays, etc. To load this data at a later point, you can run:

import pandas as pd
result = pd.read_pickle('./result.pkl')

Another format you can save your data as is a csv. This is a human-readable file, so can be more useful for simple function runs. However, this method is limited by you not being able to save complex data such as figures and arrays. To save and load a csv file, you can run:

import pandas as pd
stdlib.hello_world.greeting.save_result(filepath='./result', format='csv', restriction={'experimenter':'rbedford'}, excited=False)
result = pd.read_csv('./result.csv')

All of these methods will save the output of the function, as well as a metadata file that contains all the information needed to reproduce the function run exactly. This includes the location of the function, the restriction, the arguments, and a hash of the data in the database that the function could have used. This hash is used to check that the data hasn’t changed since the function was run. For more information on how this works, see Reproducibility Framework. This metadata file is saved alongside the output file, with the same name but with the extension .json. To rerun a function from this metadata file, use:

stdlib.hello_world.greeting.reproduce(json_path='./result.json', result_path='./result.pkl')

Rerunning a function

To rerun a function from the metadata file, use the following method:

stdlib.hello_world.greeting.rerun(json_path='./result.json')

This performs all the necessary immutability checks first, and returns a warning if these fail. It then runs the function and returns the result.

Validating a function run

To manually validate the integrity of all the above factors before rerunning an analysis function, we provide the check_hash method. This takes as input just the reproducibilty json file. It checks the data_hash and code_hash against the database and function definition, and returns a message describing what’s changed, or whether you can rerun the function. For example:

hello_world.greeting.check_hash('./result.json')

Returns:

Reproducibility checks passed

Reproducing a script

A similar method to the one above reruns the analysis function and saves the results, after checking the hashes as above. This method takes in both the saved json, and an output to save results to. Execute as follows:

hello_world.greeting.reproduce('./result.json', './result.pkl')

This allows you to reproduce the results of an analysis.