Projects and Data

Table of Contents

BMRB NMR-STAR Files
Projects

This chapter discusses managing projects, storing the state of NMRViewJ between sessions, and reading and writing what we'll refer to as derived data: information such as peak lists, assignments, polymer sequences, and constraints.

NMRView can export derived data to, and import it from, simple tabular files. Many users rely on this method for maintaining a persistent copy of their data between NMRView sessions. Import and export of peak lists is invoked via the File menu of the Peak Analysis panel (Read Peaks and Write Peaks), of assignments via the File menu of the Atom Assignments panel (Read PPM and Write PPM), and of sequences via the Molecule->Read_Topology->Sequence menu item on the main control panel.

While supported, the above protocol is not the most appropriate method for maintaining a persistent copy of the data. The various read/write methods were actually designed as a means of transferring data between NMRView and other programs. Using an external program for automated assignments, for example, could be done by exporting the peak list from NMRView and then using a tool such as awk, Tcl, or Perl to translate the list into the native format of the external program. It is particularly important to note that not all of the information NMRView holds internally about objects such as peak lists is saved to the peak list text files. If you are working with an NMRViewJ module such as RunAbout, you will lose information that is essential to the module if you use lists (instead of the STAR files discussed below) to save your data. So don't do it.
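The translation step described above can be sketched in a few lines of a scripting language. The following Python example is only an illustration: it assumes a hypothetical whitespace-delimited table with a single header row, and the column names shown are invented for the example rather than taken from an actual NMRView export. The function name table_to_csv is likewise hypothetical.

```python
import csv
import io


def table_to_csv(text):
    """Convert a whitespace-delimited table (header row first) to CSV text."""
    # Split every non-blank line on runs of whitespace.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical exported table; the layout and column names are assumptions.
example = """\
id  H1.ppm  N15.ppm  intensity
0   8.312   120.45   1.25e6
1   7.954   118.02   9.80e5
"""

converted = table_to_csv(example)
print(converted)
```

A real exported peak list may carry additional header lines or quoted fields, so a converter for actual use would need to be adapted to the file you export.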

The preferred method for persistent data storage in NMRView is to write and read files in the NMR-STAR format (http://www.bmrb.wisc.edu). This format was chosen for NMRView because it is platform independent and conforms to a standard. Because it is the format developed and used by the BioMagResBank for archiving NMR data, the end result of a user's NMR assignment process is already in the format necessary for uploading to the BMRB. Hopefully, this feature will encourage users to upload their data. Finally, NMRView is one of the few programs that can actually read and display BMRB NMR-STAR files. This means it is possible to download files from the BMRB and load them directly into NMRView. The large quantity of data held by the BMRB is thus available to NMR users to aid in the assignment of homologous proteins or for statistical analysis of protein chemical shift assignments.

NMRViewJ now also includes a comprehensive project management tool that relieves you of the need to explicitly manage your STAR files and window state files. When beginning to work with a new project you will find it very advantageous to set up an NMRViewJ project right at the start.

BMRB NMR-STAR Files

STAR Format

As of version 4.0.0, NMRView stores derived information in a text file that uses a STAR format. This is the same type of format used by the BioMagResBank. NMR-STAR files from the BMRB can be read into NMRView. The files created by NMRView are largely conformant with the NMR-STAR format. Minor changes have been made in order to ensure that all the relevant information is stored. In particular the format used for peak lists has been modified so as to store information about ambiguous assignments. Earlier versions of NMRViewJ used the STAR 2.x format. Starting with NMRViewJ 8.0 we support the new STAR 3.0 format which allows us to support additional information in the STAR files. For example, the RunAbout module makes extensive use of Resonances, which are newly supported in STAR 3.0.

The data for each category of NMR information (assignments, 3D structures, peak lists, etc.) are grouped together within the STAR file in a so-called saveframe. Within each saveframe the data is described by a series of keyword-data pairs. For example, the following are some of the keyword-data pairs describing a protein (ubiquitin).

                
save_ubiq
_Entity.Sf_category                      entity
_Entity.framecode                        ubiq
_Entity.ID                               1
_Entity.Name                             ubiq
_Entity.Type                             polymer
_Entity.Polymer_type                     polypeptide(L)
_Entity.Polymer_strand_ID                A
_Entity.Polymer_seq_one_letter_code_can  ?
_Entity.Polymer_seq_one_letter_code
;
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ
QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
;
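To make the keyword-data structure concrete, here is a minimal Python sketch that reads the free tags of a single saveframe into a dictionary. It is not how NMRView itself reads STAR files, and the function name parse_saveframe_tags is hypothetical. It assumes one tag per line, unquoted single-line values, and multi-line values enclosed between lines that begin with a semicolon; quoted strings and loops are not handled.

```python
def parse_saveframe_tags(text):
    """Return (frame_name, {tag: value}) for the free tags of one saveframe."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln]  # drop blank lines
    name, tags = None, {}
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if line.startswith("save_") and len(line) > len("save_"):
            name = line[len("save_"):]  # text after "save_" is the frame name
        elif line.startswith("_"):
            parts = line.split(None, 1)
            if len(parts) == 2:
                tags[parts[0]] = parts[1]  # tag and value on the same line
            elif i < len(lines) and lines[i].startswith(";"):
                # Multi-line value delimited by lines starting with ';'.
                block = [lines[i][1:]]
                i += 1
                while i < len(lines) and not lines[i].startswith(";"):
                    block.append(lines[i])
                    i += 1
                i += 1  # step over the closing ';'
                tags[parts[0]] = "\n".join(b for b in block if b)
    return name, tags


example = """\
save_ubiq
_Entity.Sf_category   entity
_Entity.ID            1
_Entity.Name          ubiq
_Entity.Polymer_seq_one_letter_code
;
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ
QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
;
"""

name, tags = parse_saveframe_tags(example)
# Join the two sequence lines into one string of one-letter codes.
sequence = tags["_Entity.Polymer_seq_one_letter_code"].replace("\n", "")
print(name, tags["_Entity.Name"], len(sequence))
```

The sequence is stored as a semicolon-delimited text block because it spans more than one line; single-line values simply follow their keyword.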

Repetitive information is stored in the form of loops. For example, the amino-acid sequence of the above polymer is stored as:

    
loop_
_Entity_comp_index.ID
_Entity_comp_index.Auth_seq_ID
_Entity_comp_index.Comp_ID
_Entity_comp_index.Comp_label
_Entity_comp_index.Entity_ID

1 1 MET . 1
2 2 GLN . 1
3 3 ILE . 1
4 4 PHE . 1
5 5 VAL . 1
...
stop_
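A loop can be thought of as a table: the tag names are the column headings and the values that follow fill the rows in order. The Python sketch below illustrates this reading of the format. It is an assumption-laden illustration rather than NMRView's own reader: the function name parse_loop is hypothetical, values are assumed to contain no whitespace or quoting, and nested loops are not handled.

```python
def parse_loop(text):
    """Read one simple loop_ ... stop_ block into a list of row dictionaries."""
    tags, values = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "loop_":
            continue
        if line == "stop_":
            break
        if line.startswith("_") and not values:
            tags.append(line)  # column names come first
        else:
            values.extend(line.split())  # then the data values, row by row
    n = len(tags)
    if n == 0:
        return []
    # Group the flat list of values into rows of n columns each.
    return [dict(zip(tags, values[i:i + n]))
            for i in range(0, len(values) - n + 1, n)]


example = """\
loop_
_Entity_comp_index.ID
_Entity_comp_index.Auth_seq_ID
_Entity_comp_index.Comp_ID
_Entity_comp_index.Comp_label
_Entity_comp_index.Entity_ID

1 1 MET . 1
2 2 GLN . 1
3 3 ILE . 1
4 4 PHE . 1
5 5 VAL . 1
stop_
"""

rows = parse_loop(example)
residues = [row["_Entity_comp_index.Comp_ID"] for row in rows]
print(residues)
```

Because the values are grouped purely by count, the number of values in a loop must be a multiple of the number of tags; the "." entries are kept as ordinary values here.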
            

The scientists at the BMRB have done a particularly thorough job of defining saveframes and the corresponding keyword-data pairs that can be used to define the types of NMR data that are typically measured on biological macromolecules. Included in the NMRView database are the peaklists, the chemical shift assignments, the molecular topology of any sequence currently in use, and the coordinates of any structures that have been read into NMRView.

NMRView reads some of the data in the database in a way that is optimized for speed. For example, chemical shifts, assignments, peak lists, and coordinates are read by special routines and stored in optimized data structures. All other data are read and stored in Tcl variables. Any information in the database will be read as long as it conforms to the STAR format (one exception is that, at present, nested loops are not read). Thus the end user can add new saveframes to store any information and know that NMRView will read it and store it in an accessible manner.

Describe accessing STAR data (to be written).

Reading/Writing STAR Files

If you're working with a Project (see next section), all reading and writing of STAR files will be done via the project tools. But if you want to work directly with STAR files without a project, use the entries in the File->STAR menus. When you first create a STAR file it is a good idea to name the file something like myfile1.str, where the name ends with a non-integer character followed by an integer value. Then, as you're working, you can periodically use the File->STAR->Save STAR3 (Autoincrement) menu entry. This will generate a name for the STAR file that increments the integer portion of the name (myfile2.str, myfile3.str, etc.).
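The naming rule used by the autoincrement entry can be illustrated with a short Python sketch. This is not NMRViewJ's implementation; the function name next_star_name is hypothetical, and the sketch assumes the integer sits immediately before the file extension and is preceded by a non-digit character, as in myfile1.str.

```python
import re


def next_star_name(filename):
    """Return the file name with its trailing integer increased by one."""
    # Group 1: everything up to and including the last non-digit before the number.
    # Group 2: the integer. Group 3: the extension, such as ".str".
    match = re.fullmatch(r"(.*\D)(\d+)(\.[^.]+)", filename)
    if match is None:
        raise ValueError("expected a name like myfile1.str: " + filename)
    stem, number, ext = match.groups()
    return f"{stem}{int(number) + 1}{ext}"


first = next_star_name("myfile1.str")
rollover = next_star_name("myfile9.str")
print(first, rollover)
```

Note that this sketch does not preserve zero padding (myfile09.str would become myfile10.str, but myfile001.str would become myfile2.str); whether NMRViewJ does so is not stated here.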

For new projects you should be working with STAR version 3.x format files, but if you need to work with existing files in the version 2.x format you can use the entries in the File->STAR2 (deprecated) menu. To upgrade a file, load the version 2.x file and then write it out as a version 3.x file.

Fetch STAR Files from the BMRB

NMRViewJ can directly load STAR 3 files from the BMRB. Note that the BMRB has not updated all of its entries to the new STAR 3.x format, so there may be entries that you want but that are unavailable through this Fetch method. Available entries can be downloaded within NMRViewJ by choosing the File->STAR->Fetch from BMRB menu item. In the dialog that appears, enter the accession number (e.g. 15430) and click the Fetch button. The selected entry will be downloaded to a file on your computer's disk and loaded into NMRView. Before downloading another entry, use the Clear button in the Project browser to clear out the existing entry. Tip: if you are behind a firewall you may need to enter a proxy name and port number in order to access the BMRB.