Prepare metadata

Before running an acquisition, you are responsible for ensuring that your project metadata, instrument, and procedures are valid and accessible through the metadata-service.

You are ready to generate data when:

  • Your project metadata is accessible by project name

  • Your instrument.json is ready to be copied to each data asset OR has been uploaded to the metadata-service

  • Your procedures.json is ready to be copied to each data asset OR all your procedures were performed by NSB

Project name

Your project and subproject (if applicable) need to be accurate. The full project name <project_name> - <subproject_name> is tied directly to the funding and investigator metadata. The list of project names can be viewed at the metadata-service project_names/ endpoint. Projects that do not have metadata in the metadata-service must upload their own data_description.json; reach out to Scientific Computing for help.
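
As a minimal sketch of the naming convention above, the helper below composes the full project name and checks it against a list of known names. The function names and the sample list are hypothetical, and exact, case-sensitive matching is an assumption; in practice the list would come from the metadata-service project_names/ endpoint.

```python
from typing import Iterable, Optional


def full_project_name(project_name: str, subproject_name: Optional[str] = None) -> str:
    """Compose '<project_name> - <subproject_name>', or just the project name."""
    if subproject_name:
        return f"{project_name.strip()} - {subproject_name.strip()}"
    return project_name.strip()


def is_known_project(name: str, known_names: Iterable[str]) -> bool:
    """Exact, case-sensitive membership check (matching rule is an assumption)."""
    return name in set(known_names)


# Hypothetical sample; the real list is served by the project_names/ endpoint.
known = ["Example Project - Example Subproject", "Another Project"]
candidate = full_project_name("Example Project", "Example Subproject")
print(candidate, is_known_project(candidate, known))
```

Running a check like this before acquisition can catch a misspelled project or subproject name early, before it reaches the upload step.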

If you need a new project name, please request that it be added with the project name and funding intake form.

Funding

The funding endpoint will be used during data upload to populate your data description with funding information. You can check that your project_name is linked to the correct funding through this tool. Note that changes must be made through the intake form; you cannot modify these fields manually.

Investigators

The investigators endpoint will be used during data upload to populate your data description with investigator information. You can check that your project_name is linked to the correct list of investigators through this tool. Note that changes must be made through the intake form; you cannot modify these fields manually.

Subject

Subject metadata is populated by lab animal services (LAS) without your involvement. You can fetch subject metadata from the metadata-service to verify that the subject information is accurate.
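
A minimal sketch of such a fetch, using only the Python standard library, might look like the following. The base URL and the subject/<id> route are assumptions, not confirmed by this page; substitute the address and route of your metadata-service deployment. The function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: replace with the base URL of your metadata-service deployment.
BASE_URL = "http://metadata-service.example"


def subject_url(subject_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL for a subject record (the 'subject/<id>' route is assumed)."""
    return f"{base_url.rstrip('/')}/subject/{quote(str(subject_id), safe='')}"


def fetch_subject(subject_id: str, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """GET the subject record and parse the response body as JSON."""
    with urlopen(subject_url(subject_id, base_url), timeout=timeout) as response:
        return json.load(response)
```

The shape of the response is not described here, so inspect the returned dictionary before relying on specific keys.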

Instrument

Instrument metadata should be prepared in advance of data acquisition.

ID

The instrument_id for AIND should be the SIPE ID for an instrument. If an instrument is not tracked by SIPE, any string will be accepted.

Other details

Multiple instruments

Multiple instrument.json files can be provided when multiple separate instruments are used simultaneously to acquire a data asset. The combined instrument metadata stored with the associated data asset will have an instrument_id that is the combined names of the individual instruments, joined with the '_' character. See metadata merging rules for information about how metadata files are merged during data upload.
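
The naming rule above can be sketched as a one-line join. The function name and the example IDs are hypothetical, and the order in which the upload process joins the IDs is not stated here; this sketch simply keeps the order given.

```python
from typing import Sequence


def combined_instrument_id(instrument_ids: Sequence[str]) -> str:
    """Join individual instrument IDs with '_' in the order given."""
    if not instrument_ids:
        raise ValueError("at least one instrument_id is required")
    return "_".join(instrument_ids)


# Hypothetical IDs, for illustration only.
print(combined_instrument_id(["BEH-RIG-1", "EPHYS-RIG-2"]))
```

Because '_' is also the separator, individual IDs that themselves contain underscores cannot be split back apart unambiguously from the combined ID.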

Upload options

Users have two options for providing instrument metadata files:

  1. Files can be provided at upload time in the data folder. In this case, it is up to users to ensure that the instrument file(s) are in the data folder when upload is triggered. Users are free to set this up however they choose. Two patterns that have been used are:

    • A static instrument metadata file is saved somewhere on the data acquisition machine and is copied into the data folder prior to upload.

    • A script is run that dynamically generates an instrument metadata file before upload.

  2. A static version of the instrument metadata is uploaded to a database in advance. See details below. In this case, users must specify the instrument_id as part of the job parameters in the gather_preliminary_metadata job type settings as follows:

    {
       "skip_task": false,
       "job_settings": {
          "instrument_settings": {
             "instrument_id": INSTRUMENT_ID # a string containing a valid instrument ID
          }
       },
       ...
    }
    

    The data transfer service will then pull the instrument metadata from the database during upload.
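
If the job settings are assembled in Python, the fragment from option 2 could be built as in the sketch below. Only the keys shown in the snippet above are included; the settings elided there ("...") are not filled in. The function name and the example ID are hypothetical.

```python
import json


def gather_preliminary_metadata_settings(instrument_id: str) -> dict:
    """Build the job type settings fragment that names an instrument by ID."""
    if not isinstance(instrument_id, str) or not instrument_id:
        raise ValueError("instrument_id must be a non-empty string")
    return {
        "skip_task": False,
        "job_settings": {
            "instrument_settings": {"instrument_id": instrument_id},
        },
    }


# "RIG_ID_EXAMPLE" is a placeholder, not a real instrument ID.
print(json.dumps(gather_preliminary_metadata_settings("RIG_ID_EXAMPLE"), indent=2))
```

Serializing with json.dumps produces valid JSON, with the instrument ID quoted as a string and the boolean written as false.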

Note that it is possible to combine these methods. For example, a user could pass the instrument JSON for the behavior instrument in the data directory (named something like instrument_behavior.json) and also specify a physiology rig by instrument ID in the gather_preliminary_metadata job type settings. The two instrument files would be merged by the data transfer service. See metadata merging rules.
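
For the file-based half of this setup (the static-file pattern under option 1), a sketch of staging an instrument file into the data folder is shown below. The function name and the paths in the usage comment are hypothetical; how and when you run such a step is up to you.

```python
import shutil
from pathlib import Path


def stage_instrument_file(source: Path, data_folder: Path, name: str = "instrument.json") -> Path:
    """Copy a static instrument file into the data folder before upload."""
    if not source.is_file():
        raise FileNotFoundError(f"instrument file not found: {source}")
    data_folder.mkdir(parents=True, exist_ok=True)
    destination = data_folder / name
    shutil.copy2(source, destination)  # copy2 also preserves file timestamps
    return destination


# Example call with hypothetical paths:
# stage_instrument_file(
#     Path("C:/metadata/instrument_behavior.json"),
#     Path("D:/data/session_folder"),
#     name="instrument_behavior.json",
# )
```

Failing loudly when the source file is missing is a deliberate choice here: it is easier to fix a missing instrument file before upload is triggered than after.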

Also note that we require all devices in the database to have a unique instrument_id.

Maintenance responsibility

While it is ultimately the responsibility of the scientist collecting data to ensure that all metadata is correct, it is the responsibility of the person who modifies an instrument to update instrument metadata to reflect the changes they made.

How to

The following sections describe use cases for saving, fetching, editing, and creating instrument metadata files.

I want to write an instrument.json

Instrument JSON files should be created by a Python script using models from the aind-data-schema library, rather than by writing JSON directly, to ensure the output file is valid according to the schema. There are multiple examples of Python scripts for generating instrument JSON files in the data schema examples folder.

We recommend that basic maintenance changes, e.g. replacing a device with an identical one but with a different serial number, be done by modifying the Python script and updating the Instrument.modification_date.

I’m ready to upload my instrument JSON file to the database

If you want to store your Instrument metadata file in the Scientific Computing managed database (only 2.0 schema instrument files are supported) you can use the aind-metadata-mapper package. Install it with pip install aind-metadata-mapper.

Then run the following upload code:

from aind_metadata_mapper import utils
from aind_data_schema.core.instrument import Instrument

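# Path to the instrument JSON file (placeholder; point this at your own file)
instrument_path = "instrument.json"

# Load the JSON as an Instrument object
with open(instrument_path, "r") as f:
    instrument_object = Instrument.model_validate_json(f.read())

# Save the instrument to the database
utils.save_instrument(instrument_object)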

The “modification_date” field is automatically updated to the current date when the instrument file is uploaded. There is currently a uniqueness check by date, so uploading more than one instrument.json per day (for example, if you make a mistake and try to upload a second time) will result in an error. If you do need to upload a second time in a day, you’ll need to overwrite the previous instrument by passing the replace=True argument to the utils.save_instrument() function.
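
A small wrapper can make the overwrite an explicit, opt-in choice. This is only a sketch: upload_instrument and its save_fn parameter are hypothetical names, and because the type of error raised on a same-day upload is not stated here, the sketch does not try to catch it and retry automatically.

```python
from typing import Any, Callable, Optional


def upload_instrument(
    instrument_object: Any,
    replace: bool = False,
    save_fn: Optional[Callable[..., Any]] = None,
) -> Any:
    """Save an instrument, passing replace=True only when explicitly requested."""
    if save_fn is None:
        # Imported lazily so the helper can be exercised without the package installed.
        from aind_metadata_mapper import utils

        save_fn = utils.save_instrument
    if replace:
        return save_fn(instrument_object, replace=True)
    return save_fn(instrument_object)
```

A first upload of the day would call upload_instrument(instrument_object); a deliberate same-day correction would call upload_instrument(instrument_object, replace=True).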

I want to get an instrument from the database

During data upload, the GatherMetadataJob can fetch your instrument.json automatically. If you need to view the uploaded file locally, you can fetch the most recent instrument.json, sorted by Instrument.modification_date.

from aind_metadata_mapper import utils

# fetch the instrument, where `INSTRUMENT_ID` is a string containing the instrument ID
instrument_data = utils.get_instrument(INSTRUMENT_ID)

If you need access to an older version of an instrument metadata file from the database, please reach out to someone in Scientific Computing for assistance.

Procedures

Procedures metadata should be prepared in advance. Our goal with procedures metadata is to capture the date, time, and critical parameters of a published Protocol on our protocols.io page.

Currently, only NSB procedures are automatically attached to data assets during upload, while custom procedures require a procedures.json file to be uploaded with each data asset. With the rollout of Power Platform / Dataverse, all procedures will need to be uploaded to the metadata-service as they are performed.

Custom procedures

Custom procedures require you to generate a procedures.json file manually. Note that the data-transfer-service will NOT merge your procedures with any stored in NSB. You must pull the NSB procedures and manually merge them ahead of time. Please reach out to Scientific Computing for help with this process.
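
To illustrate what a manual merge involves, the sketch below combines two procedures records that have already been loaded as dictionaries. It assumes both records carry a subject_id and the list fields subject_procedures and specimen_procedures; these field names and the function name are assumptions, so check them against your schema version and validate the merged result with aind-data-schema before using it.

```python
import copy
from typing import Any, Dict

# Assumed names of the list fields to concatenate.
LIST_FIELDS = ("subject_procedures", "specimen_procedures")


def merge_procedures(nsb: Dict[str, Any], custom: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new procedures dict with NSB entries first, then custom entries."""
    if str(nsb.get("subject_id")) != str(custom.get("subject_id")):
        raise ValueError("subject_id differs between the two procedures records")
    merged = copy.deepcopy(nsb)  # leave both inputs unmodified
    for field in LIST_FIELDS:
        merged[field] = list(merged.get(field) or []) + copy.deepcopy(custom.get(field) or [])
    return merged
```

Refusing to merge records with different subject IDs guards against attaching one animal's procedures to another animal's data asset.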

NSB procedures

Standardized procedures that are performed by NSB are uploaded and accessible through the metadata-service. You can see the available procedures for a mouse by passing its subject_id here: