João Rodrigues | 1 Dec 13:12 2010

Re: Features of the GSOC branch ready to be merged

OK, I managed to branch it. Several files besides Atom.py and IUPACData.py
needed attention, so it took a while to pinpoint them all. Lesson learned:
be careful with commits :)

If you want to test it yourselves, here it is:

https://github.com/JoaoRodrigues/biopython/tree/atom-element/

Best! And thanks for the help :)

João

_______________________________________________
Biopython-dev mailing list
Biopython-dev <at> lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython-dev
João Rodrigues | 1 Dec 18:01 2010

Re: Features of the GSOC branch ready to be merged

Following Peter's comments I changed some stuff.

I also noticed one thing: metal ions like CA and CL have their names
starting one character before regular C and N atoms. That allows some
discrimination between CA (alpha carbon) and CA (calcium) for example. I'd
never noticed this before, so I had been relying on the hetero_flag to
exclude metal ions (HETATM), since the guess would likely be wrong in such
an ambiguous case. I have now removed the hetero_flag I'd added to Atom
objects and extended the element guessing logic to all atoms.
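
As a rough illustration of the column convention described above (not the
code in the atom-element branch), here is a minimal Python sketch. The
function name guess_element and the small KNOWN_ELEMENTS set are hypothetical,
and it assumes the unstripped 4-character atom name field (columns 13-16) is
passed in:

```python
# Small illustrative subset of element symbols; a real table would be larger.
KNOWN_ELEMENTS = {"H", "C", "N", "O", "S", "P",
                  "CA", "CL", "FE", "ZN", "MG", "MN", "CU"}


def guess_element(fullname):
    """Guess an element from the 4-character PDB atom name field.

    Two-letter symbols start in the first column of the field ("CA  "),
    one-letter symbols in the second (" CA "), so the padding matters.
    """
    if len(fullname) != 4:
        raise ValueError("expected the unstripped 4-character atom name")
    prefix = fullname[:2]
    # Name starts in the first column: try a two-letter symbol first.
    if prefix.isalpha() and prefix.upper() in KNOWN_ELEMENTS:
        return prefix.upper()
    # Otherwise fall back to the first letter in the field, which also
    # covers 4-character hydrogen names such as "HD11" or "1HD1".
    for char in fullname:
        if char.isalpha():
            symbol = char.upper()
            return symbol if symbol in KNOWN_ELEMENTS else None
    return None


print(guess_element(" CA "))  # alpha carbon
print(guess_element("CA  "))  # calcium
```

The fallback to a single letter is a design choice for this sketch only;
names that collide with a two-letter symbol missing from the table would
still be guessed as their first letter.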

I also changed the tests in test_PDB.py to reflect this.

Best! And thanks Peter for the comments!
Peter | 1 Dec 18:15 2010

Re: Features of the GSOC branch ready to be merged

On Wed, Dec 1, 2010 at 5:01 PM, João Rodrigues <anaryin <at> gmail.com> wrote:
> I also noticed one thing: metal ions like CA and CL have their names
> starting one character before regular C and N atoms. That allows some
> discrimination between CA (alpha carbon) and CA (calcium) for example. I'd
> never noticed this before, ...

Is this documented in the PDB format definition? More importantly,
do third party tools follow this rule? They are the only reason we
need the code to guess the element in the first place, right? (Since
the PDB provided files should all have the element column).
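
For reference, the element symbol sits in columns 77-78 of ATOM and HETATM
records in the PDB format. A minimal sketch of reading it, where the function
name element_from_record is hypothetical and a blank or missing column is
reported as None so that a caller could fall back to guessing:

```python
def element_from_record(line):
    """Return the element symbol from columns 77-78, or None if blank."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    # Columns are 1-based in the format description, so 77-78 is [76:78].
    # Slicing a short line simply yields an empty string.
    symbol = line[76:78].strip()
    return symbol.upper() if symbol else None
```

Files written by third party tools are often truncated before column 77,
which is the case the None return value is meant to cover.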

Peter
Eric Talevich | 1 Dec 18:29 2010

Re: Features of the GSOC branch ready to be merged

On Wed, Dec 1, 2010 at 12:15 PM, Peter <biopython <at> maubp.freeserve.co.uk> wrote:

> On Wed, Dec 1, 2010 at 5:01 PM, João Rodrigues <anaryin <at> gmail.com> wrote:
> > I also noticed one thing: metal ions like CA and CL have their names
> > starting one character before regular C and N atoms. That allows some
> > discrimination between CA (alpha carbon) and CA (calcium) for example. I'd
> > never noticed this before, ...
>
> Is this documented in the PDB format definition? More importantly,
> do third party tools follow this rule? They are the only reason we
> need the code to guess the element in the first place, right? (Since
> the PDB provided files should all have the element column).
>
>
I think we can rely on this convention. I'd read about it somewhere else
(maybe on one of Andrew Dalke's pages) but didn't think to apply it to
João's problem.

Here's a reference:
http://bmerc-www.bu.edu/needle-doc/latest/atom-format.html#pdb-atom-name-anomalies

-Eric
João Rodrigues | 1 Dec 19:34 2010

Re: Features of the GSOC branch ready to be merged

http://www.wwpdb.org/documentation/format32/sect9.html

Well, there doesn't seem to be a written rule, but it is shown in the
documentation of the format.

Also, do you think it's worthwhile to include a sanity check for the
elements that have been assigned? For example, when parsing a file, we could
check whether the assigned element matches what it should be and issue a
warning, or even raise an exception, if it does not.
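
Just to make the idea concrete, something along these lines. It is only a
sketch: check_element, ElementMismatchWarning and the strict flag are
hypothetical names, and the guessed element is passed in by the caller
rather than computed here:

```python
import warnings


class ElementMismatchWarning(UserWarning):
    """Hypothetical warning category used only in this sketch."""


def check_element(atom_name, assigned, guessed, strict=False):
    """Compare the element from the file with the one guessed from the name.

    Returns True when they agree or when no guess is available. Otherwise
    warns and returns False, or raises ValueError if strict is set.
    """
    if guessed is None or assigned.strip().upper() == guessed.upper():
        return True
    message = "Atom %r: file says element %r but the name suggests %r" % (
        atom_name, assigned.strip(), guessed)
    if strict:
        raise ValueError(message)
    warnings.warn(message, ElementMismatchWarning)
    return False
```

A warning by default with an opt-in exception would keep the parser usable
on slightly broken files while still flagging them.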
