Database Representation Working Group

Purpose:

This working group will be responsible for developing standardized names for data fields which can be understood and interpreted by all software tools, allowing interoperability between pipelines from different developers. Close collaboration with the Minimal Standards Working Group is expected.

The formats working group is focused on developing standard file formats and schemas to represent annotated antibody and T cell receptor sequences and any downstream data representations. The proliferation of tools for processing raw AIRR data is making it more difficult to compare results between tools and to build modular data pipelines. We have been developing a CSV-like file format for representing annotated reads and clones, with the goal of having it implemented in multiple common AIRR pipelines (e.g., Immcantation).

  • Multiple WGs are designing implementation standards and could use technical input on data representation.
  • Coordination with AIRR Working Groups to specify data models, e.g.,
    • Common Repo defining minimal APIs for repositories and REST resources
    • MinStd choosing ontologies for their fields
    • Germline defining new germlines and annotations
  • Ensure all AIRR groups are working in mutually compatible ways (in terms of data)
    • Ensure we have liaisons on all other relevant working groups
  • Work on representation of provenance of data sets

 

Goals for 2018:

  • Submit manuscript to publicize format
  • Develop format for representing clones
  • Finish integration of GitHub repository with MiAIRR
  • Finish specifying metadata file format
  • Publicly release a reference library to read, write, and validate AIRR format files
    • Initially targeting Python and R
  • Release documentation, including example output and usage

 

Members

Co-leaders: Uri Laserson & Scott Christley

Members: Aaron Rosenfeld, Anna Fowler, Ahmad Chan, Brian Corrie, Bojan Zimonja, Chaim Schramm, Corey Watson, Daniel Gadala-Maria, Duncan Ralph, Felix Breden, Jason Vander Heiden, Jerome Jaglale, Jessica Finn, Nishanth Marthandan, Richard Bruskiewich, Scott Christley, Steve Kleinstein, Susanna Marquez

 

Resources

Visit the Database Representation Working Group Project on the B-T.CR Wiki.

Visit the Database Representation Working Group Project on GitHub.

Visit the AIRR Standards Documentation.