A Critical Guide to the UniProtKB Flat-file Format

Attwood TK
F1000Research Bioinformatics Education and Training Collection. 7 2018

Abstract

This Critical Guide briefly presents the need for biological databases and for a standard format for storing and organising biological data. Web-based interfaces have made databases more user-friendly, but knowledge of the underlying file format offers a deeper understanding of how to navigate and mine the information they contain, so that humans and machines can get the most out of them. This Guide explores the file format that underpins one of today’s most popular protein sequence databases – UniProtKB.

Specifically, this Guide introduces the concept of database ‘flat-files’, and examines features of the UniProtKB flat-file format. On reading this Guide, users will be able to:

  • identify key fields within UniProtKB/Swiss-Prot and /TrEMBL flat-files;
  • explain what these fields mean, what information they contain and what the information is used for;
  • analyse the information in different fields and infer structural and functional features of a sequence;
  • examine and investigate the provenance of annotations; and
  • compare annotations at different time-points and evaluate the likely impact of annotation changes.
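As a rough illustration of what identifying fields in a flat-file involves, the sketch below groups the lines of a single entry by line code. It relies on the general layout of the UniProtKB flat-file format: a two-character line code (for example ID, AC, DE, OS, SQ), data starting at column 6, sequence lines that begin with blanks, and a closing `//` line. The entry text is a hypothetical, heavily truncated record invented for this example, and the function names are assumptions rather than part of any UniProt tool.

```python
from collections import defaultdict

# Hypothetical, heavily truncated entry used only for illustration;
# it is not a real UniProtKB record.
SAMPLE_ENTRY = """\
ID   EXAMPLE_HUMAN           Reviewed;          12 AA.
AC   X00001;
DE   RecName: Full=Example protein;
OS   Homo sapiens (Human).
SQ   SEQUENCE   12 AA;  0 MW;  0000000000000000 CRC64;
     MKTAYIAKQR QI
//
"""


def group_by_line_code(entry_text):
    """Group the lines of one flat-file entry by their two-character line code."""
    fields = defaultdict(list)
    for line in entry_text.splitlines():
        if line.startswith("//"):  # '//' terminates the entry
            break
        code = line[:2]            # line code occupies the first two columns
        content = line[5:].rstrip()  # data starts at column 6
        fields[code].append(content)
    return dict(fields)


def sequence_of(fields):
    """Join the sequence lines (line code of two blanks) and drop the spacing."""
    return "".join("".join(fields.get("  ", [])).split())


fields = group_by_line_code(SAMPLE_ENTRY)
print(fields["ID"][0].split()[0])  # entry name taken from the ID line
print(sequence_of(fields))         # residues with the column spacing removed
```

A production parser would also need to handle multi-entry files, continuation lines and the internal structure of fields such as CC and FT. The sketch only shows that line codes make the format straightforward to split into fields.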

Title: A Critical Guide to the UniProtKB Flat-file Format
Authors: Attwood TK
Publication Type: Miscellaneous
Series title: F1000Research Bioinformatics Education and Training Collection. 7
Year of Publication: 2018
URL: https://f1000research.com/documents/7-1433
DOI: https://doi.org/10.7490/f1000research.1116054.1

Topics

Introduction to bioinformatics, The UniProtKB flat-file format, flat-file databases, flat-files, training material
