Sequences list file and zipped sequences file

Note: The content on this page is not complete, and (for the time being) is merely meant to supplement this deprecated j5 manual page.

With the implementation of support for overriding the name (i.e. display ID) of an input sequence in j5 v3.5.0, there are now two additional columns in the sequences list file, namely the Sequence Name Override column and the Sequence URL column.

The Sequence Name Override column enables the user to specify, for any sequence, an alternative sequence name (i.e. display ID) to the one found in the sequence file itself. j5 will use this name to refer to the sequence in all new output files, but will not change the original input sequence file. If a sequence name is overridden here, it is important that corresponding changes are also made to the Part Source (Sequence Display ID) column in the parts list input file (refer to this manual page). Whenever a sequence name is overridden (to something different from that in the sequence file itself), j5 will output a note to that effect in the assembly output files.

Why might a user want to override the name (i.e. display ID) of a sequence? Some sequence file formats limit the number of characters in the display ID (e.g. GenBank's LOCUS field, which is effectively the sequence's display ID, at least as earlier specified and implemented). These limits become inconvenient when two or more sequence names begin with the same characters, because the truncated display IDs are then no longer distinct in that file format. For example, the sequence names "Alligators_Are_Crazy" and "Alligators_Are_Cool" might both be truncated to "Alligators_Are_C" in a GenBank file. A j5 user may then use the Sequence Name Override column to override "Alligators_Are_C" with "Alligators_Are_Crazy" for the first sequence and with "Alligators_Are_Cool" for the second. The display IDs are then distinct, so both sequences can be used in the same j5 design (j5 does not allow two different sequences to have the same display ID).
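The truncation problem described above can be checked for ahead of time. The following is a minimal Python sketch, not part of j5 itself: the function name find_truncation_collisions is hypothetical, and the 16-character limit is an assumption inferred from the "Alligators_Are_C" example rather than a value taken from j5.

```python
from collections import defaultdict

# Assumption: display IDs are cut to 16 characters, as in the
# "Alligators_Are_C" example. Adjust for the file format in use.
LOCUS_NAME_LIMIT = 16


def find_truncation_collisions(names, limit=LOCUS_NAME_LIMIT):
    """Group full names by truncated form; keep groups with several names."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:limit]].append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}


collisions = find_truncation_collisions(
    ["Alligators_Are_Crazy", "Alligators_Are_Cool", "pj5_00001"]
)
print(collisions)
# {'Alligators_Are_C': ['Alligators_Are_Crazy', 'Alligators_Are_Cool']}
```

Any name reported here is a candidate for an entry in the Sequence Name Override column.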

The Sequence URL column is ignored by j5.

Why might a user, then, want to specify a URL to the sequence? In some workflows, it might be very convenient for a user to be able to go from a sequence name (i.e. display ID) referred to in j5 output, back to (for example) a DNA sequence repository entry page, where more information could be found relating to the sequence, or a physical copy of the sequence could be requested from an archive.

Here is an example CSV file for the sequences list file. It has one sequence whose name (i.e. display ID; "pj5_00001" in the source GenBank sequence file) is being overridden (to "pNJH00010") and for which a URL is provided ("https://public-registry.jbei.org/entry/226"). Here is the corresponding example zip file for the zipped sequences file.
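As a rough illustration of how such a file could be read outside of j5, the Python sketch below parses a one-row sequences list and maps each overridden name to its URL. The "Sequence Name Override" and "Sequence URL" column names come from the description above; the first two header names, the file name "pj5_00001.gb", and the format value are assumptions made for this sketch and may differ from the actual example file.

```python
import csv
import io

# Hypothetical file content. Only the last two column names and the
# override/URL values are taken from the text; the rest is assumed.
SEQUENCES_LIST_CSV = """\
Sequence File Name,Format,Sequence Name Override,Sequence URL
pj5_00001.gb,Genbank,pNJH00010,https://public-registry.jbei.org/entry/226
"""


def read_overrides(csv_text):
    """Return {override name: URL} for rows that specify an override."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        override = (row.get("Sequence Name Override") or "").strip()
        if override:
            result[override] = (row.get("Sequence URL") or "").strip()
    return result


overrides = read_overrides(SEQUENCES_LIST_CSV)
print(overrides)
# {'pNJH00010': 'https://public-registry.jbei.org/entry/226'}
```

A lookup like this lets a user go from a display ID seen in j5 output back to the repository entry, which is the workflow the Sequence URL column is meant to support.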