Add remaining symmetry group fields (Hall, HM symbols, IT number) #2240

Merged 3 commits on Mar 3, 2025
62 changes: 62 additions & 0 deletions openapi/openapi.json
@@ -4371,6 +4371,68 @@
"x-optimade-queryable": "optional",
"x-optimade-support": "optional"
},
"space_group_symbol_hall": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Space Group Symbol Hall",
"description": "A Hall space group symbol representing the symmetry of the structure as defined in (Hall, 1981, 1981a).\n\n- **Type**: string\n\n- **Requirements/Conventions**:\n - **Support**: OPTIONAL support in implementations, i.e., MAY be `null`.\n - **Query**: Support for queries on this property is OPTIONAL.\n - The change-of-basis operations are used as defined in the International Tables for Crystallography (ITC) Vol. B, Sect. 1.4, Appendix A1.4.2 (IUCr, 2001).\n - Each component of the Hall symbol MUST be separated by a single space symbol.\n - If there exists a standard Hall symbol which represents the symmetry it SHOULD be used.\n - MUST be `null` if `nperiodic_dimensions` is not equal to 3.\n\n- **Examples**:\n - Space group symbols with explicit origin (the Hall symbols):\n - `P 2c -2ac`\n - `I 4bd 2ab 3`\n - Space group symbols with change-of-basis operations:\n - `P 2yb (-1/2*x+z,1/2*x,y)`\n - `-I 4 2 (1/2*x+1/2*y,-1/2*x+1/2*y,z)`\n\n- **Bibliographic References**:\n - Hall, S. R. (1981) Space-group notation with an explicit origin. Acta Crystallographica Section A, 37, 517-525, International Union of Crystallography (IUCr), DOI: https://doi.org/10.1107/s0567739481001228\n - Hall, S. R. (1981a) Space-group notation with an explicit origin; erratum. Acta Crystallographica Section A, 37, 921-921, International Union of Crystallography (IUCr), DOI: https://doi.org/10.1107/s0567739481001976\n - IUCr (2001). International Tables for Crystallography vol. B. Reciprocal Space. Ed. U. Shmueli. 2nd edition. Dordrecht/Boston/London, Kluwer Academic Publishers.",
"x-optimade-queryable": "optional",
"x-optimade-support": "optional"
},
"space_group_symbol_hermann_mauguin": {
"anyOf": [
{
"type": "string",
"pattern": "^(P|I|F|A|B|C|R)(\\s+\\d+|\\s+[a-z]+|\\s+\\d+/[a-z]+|\\s+\\d+/\\d+|\\s+-\\d*|\\s+\\d+/m|\\s+[a-z]+/m)*$"
},
{
"type": "null"
}
],
"pattern": "^(P|I|F|A|B|C|R)(\\s+\\d+|\\s+[a-z]+|\\s+\\d+/[a-z]+|\\s+\\d+/\\d+|\\s+-\\d*|\\s+\\d+/m|\\s+[a-z]+/m)*$",
"title": "Space Group Symbol Hermann Mauguin",
"description": "A human- and machine-readable string containing the short Hermann-Mauguin (H-M) symbol which specifies the space group of the structure in the response.\n\n- **Type**: string\n\n- **Requirements/Conventions**:\n - **Support**: OPTIONAL support in implementations, i.e., MAY be `null`.\n - **Query**: Support for queries on this property is OPTIONAL.\n - The H-M symbol SHOULD aim to convey the closest representation of the symmetry information that can be specified using the short format used in the International Tables for Crystallography vol. A (IUCr, 2005), Table 4.3.2.1 as described in the accompanying text.\n - The symbol MAY be a non-standard short H-M symbol.\n - The H-M symbol does not unambiguously communicate the axis, cell, and origin choice, and the given symbol SHOULD NOT be amended to convey this information.\n - To encode as character strings, the following adaptations MUST be made when representing H-M symbols given in their typeset form:\n - the overbar above the numbers MUST be changed to the minus sign in front of the digit (e.g. '-2');\n - subscripts that denote screw axes are written as digits immediately after the axis designator without a space (e.g. 'P 32');\n - the space group generators MUST be separated by a single space (e.g. 'P 21 21 2');\n - there MUST be no spaces in the space group generator designation (i.e. use 'P 21/m', not 'P 21 / m').\n\n- **Examples**:\n - `C 2`\n - `P 21 21 21`\n\n- **Bibliographic References**:\n - IUCr (2005). International Tables for Crystallography vol. A. Space-Group Symmetry. Ed. Theo Hahn. 5th edition. Dordrecht, Springer.\n",
"x-optimade-queryable": "optional",
"x-optimade-support": "optional"
},
"space_group_symbol_hermann_mauguin_extended": {
"anyOf": [
{
"type": "string",
"pattern": "^(P|I|F|A|B|C|R)(\\s+\\d+|\\s+[a-z]+|\\s+\\d+/[a-z]+|\\s+\\d+/\\d+|\\s+-\\d*|\\s+\\d+/m|\\s+[a-z]+/m)*$"
},
{
"type": "null"
}
],
"pattern": "^(P|I|F|A|B|C|R)(\\s+\\d+|\\s+[a-z]+|\\s+\\d+/[a-z]+|\\s+\\d+/\\d+|\\s+-\\d*|\\s+\\d+/m|\\s+[a-z]+/m)*$",
"title": "Space Group Symbol Hermann Mauguin Extended",
"description": "A human- and machine-readable string containing the extended Hermann-Mauguin (H-M) symbol which specifies the space group of the structure in the response.\n\n- **Type**: string\n\n- **Requirements/Conventions**:\n - **Support**: OPTIONAL support in implementations, i.e., MAY be `null`.\n - **Query**: Support for queries on this property is OPTIONAL.\n - The H-M symbols SHOULD be given as specified in the International Tables for Crystallography vol. A (IUCr, 2005), Table 4.3.2.1.\n - The change-of-basis operation SHOULD be provided for the non-standard axis and cell choices.\n - The extended H-M symbol does not unambiguously communicate the origin choice, and the given symbol SHOULD NOT be amended to convey this information.\n - The description of the change-of-basis SHOULD follow conventions of the ITC Vol. B, Sect. 1.4, Appendix A1.4.2 (IUCr, 2001).\n - The same character string encoding conventions MUST be used as for the specification of the `space_group_symbol_hermann_mauguin` property.\n\n- **Examples**:\n - `C 1 2 1`\n\n- **Bibliographic References**:\n - IUCr (2001). International Tables for Crystallography vol. B. Reciprocal Space. Ed. U. Shmueli. 2nd edition. Dordrecht/Boston/London, Kluwer Academic Publishers.\n - IUCr (2005). International Tables for Crystallography vol. A. Space-Group Symmetry. Ed. Theo Hahn. 5th edition. Dordrecht, Springer.\n\n",
"x-optimade-queryable": "optional",
"x-optimade-support": "optional"
},
"space_group_it_number": {
"anyOf": [
{
"type": "integer",
"maximum": 230.0,
"minimum": 1.0
},
{
"type": "null"
}
],
"title": "Space Group It Number",
"description": "Space group number which specifies the space group of the structure as defined in the International Tables for Crystallography Vol. A (IUCr, 2005).\n\n- **Type**: integer\n\n- **Requirements/Conventions**:\n - **Support**: OPTIONAL support in implementations, i.e., MAY be `null`.\n - **Query**: Support for queries on this property is OPTIONAL.\n - The integer value MUST be between 1 and 230.\n - MUST be `null` if `nperiodic_dimensions` is not equal to 3.",
"x-optimade-queryable": "optional",
"x-optimade-support": "optional"
},
"cartesian_site_positions": {
"anyOf": [
{
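Review note: the schema entries above enforce the H-M `pattern` and the 1 to 230 range, but the rule that `space_group_symbol_hall` and `space_group_it_number` MUST be `null` when `nperiodic_dimensions` is not 3 appears only in the description text. A minimal client-side sketch of checking all three, using only the standard library. `check_space_group_fields` is a hypothetical helper, not part of optimade-python-tools; the regex is copied from the diff.

```python
import re

# Pattern copied from the `space_group_symbol_hermann_mauguin` schema entry.
HM_PATTERN = re.compile(
    r"^(P|I|F|A|B|C|R)(\s+\d+|\s+[a-z]+|\s+\d+/[a-z]+|\s+\d+/\d+"
    r"|\s+-\d*|\s+\d+/m|\s+[a-z]+/m)*$"
)

HM_KEYS = (
    "space_group_symbol_hermann_mauguin",
    "space_group_symbol_hermann_mauguin_extended",
)
# Fields whose descriptions say "MUST be null if nperiodic_dimensions != 3".
THREE_D_ONLY_KEYS = ("space_group_symbol_hall", "space_group_it_number")


def check_space_group_fields(attrs: dict) -> list:
    """Hypothetical helper: return a list of problems found in `attrs`."""
    problems = []

    for key in HM_KEYS:
        value = attrs.get(key)
        if value is None:
            continue  # null is always allowed (OPTIONAL support)
        if not isinstance(value, str) or HM_PATTERN.match(value) is None:
            problems.append(f"{key}: does not match the H-M pattern")

    number = attrs.get("space_group_it_number")
    if number is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(number, int) and not isinstance(number, bool)
        if not is_int or not 1 <= number <= 230:
            problems.append("space_group_it_number: must be an integer in 1..230")

    # Prose-only rule: only checked when the dimensionality is actually given.
    nperiodic = attrs.get("nperiodic_dimensions")
    if nperiodic is not None and nperiodic != 3:
        for key in THREE_D_ONLY_KEYS:
            if attrs.get(key) is not None:
                problems.append(f"{key}: must be null when nperiodic_dimensions != 3")

    return problems
```

Returning a list of messages instead of raising keeps the sketch usable for reporting every problem in a record at once.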
114 changes: 114 additions & 0 deletions optimade/models/structures.py
@@ -11,6 +11,7 @@
ANONYMOUS_ELEMENTS,
CHEMICAL_FORMULA_REGEXP,
CHEMICAL_SYMBOLS,
HM_SYMBOL_REGEXP,
OptimadeField,
StrictField,
SupportLevel,
@@ -601,6 +602,119 @@ class StructureResourceAttributes(EntryResourceAttributes):
),
] = None

space_group_symbol_hall: Annotated[
str | None,
OptimadeField(
description="""A Hall space group symbol representing the symmetry of the structure as defined in (Hall, 1981, 1981a).
- **Type**: string
- **Requirements/Conventions**:
- **Support**: OPTIONAL support in implementations, i.e., MAY be `null`.
- **Query**: Support for queries on this property is OPTIONAL.
- The change-of-basis operations are used as defined in the International Tables for Crystallography (ITC) Vol. B, Sect. 1.4, Appendix A1.4.2 (IUCr, 2001).
- Each component of the Hall symbol MUST be separated by a single space symbol.
- If there exists a standard Hall symbol which represents the symmetry it SHOULD be used.
- MUST be `null` if `nperiodic_dimensions` is not equal to 3.
- **Examples**:
- Space group symbols with explicit origin (the Hall symbols):
- `P 2c -2ac`
- `I 4bd 2ab 3`
- Space group symbols with change-of-basis operations:
- `P 2yb (-1/2*x+z,1/2*x,y)`
- `-I 4 2 (1/2*x+1/2*y,-1/2*x+1/2*y,z)`
- **Bibliographic References**:
- Hall, S. R. (1981) Space-group notation with an explicit origin. Acta Crystallographica Section A, 37, 517-525, International Union of Crystallography (IUCr), DOI: https://doi.org/10.1107/s0567739481001228
- Hall, S. R. (1981a) Space-group notation with an explicit origin; erratum. Acta Crystallographica Section A, 37, 921-921, International Union of Crystallography (IUCr), DOI: https://doi.org/10.1107/s0567739481001976
- IUCr (2001). International Tables for Crystallography vol. B. Reciprocal Space. Ed. U. Shmueli. 2nd edition. Dordrecht/Boston/London, Kluwer Academic Publishers.""",
support=SupportLevel.OPTIONAL,
queryable=SupportLevel.OPTIONAL,
),
] = None

space_group_symbol_hermann_mauguin: Annotated[
str | None,
OptimadeField(
description="""A human- and machine-readable string containing the short Hermann-Mauguin (H-M) symbol which specifies the space group of the structure in the response.
- **Type**: string
- **Requirements/Conventions**:
- **Support**: OPTIONAL support in implementations, i.e., MAY be `null`.
- **Query**: Support for queries on this property is OPTIONAL.
- The H-M symbol SHOULD aim to convey the closest representation of the symmetry information that can be specified using the short format used in the International Tables for Crystallography vol. A (IUCr, 2005), Table 4.3.2.1 as described in the accompanying text.
- The symbol MAY be a non-standard short H-M symbol.
- The H-M symbol does not unambiguously communicate the axis, cell, and origin choice, and the given symbol SHOULD NOT be amended to convey this information.
- To encode as character strings, the following adaptations MUST be made when representing H-M symbols given in their typeset form:
- the overbar above the numbers MUST be changed to the minus sign in front of the digit (e.g. '-2');
- subscripts that denote screw axes are written as digits immediately after the axis designator without a space (e.g. 'P 32');
- the space group generators MUST be separated by a single space (e.g. 'P 21 21 2');
- there MUST be no spaces in the space group generator designation (i.e. use 'P 21/m', not 'P 21 / m').
- **Examples**:
- `C 2`
- `P 21 21 21`
- **Bibliographic References**:
- IUCr (2005). International Tables for Crystallography vol. A. Space-Group Symmetry. Ed. Theo Hahn. 5th edition. Dordrecht, Springer.
""",
support=SupportLevel.OPTIONAL,
queryable=SupportLevel.OPTIONAL,
pattern=HM_SYMBOL_REGEXP,
),
] = None

space_group_symbol_hermann_mauguin_extended: Annotated[
str | None,
OptimadeField(
description="""A human- and machine-readable string containing the extended Hermann-Mauguin (H-M) symbol which specifies the space group of the structure in the response.
- **Type**: string
- **Requirements/Conventions**:
- **Support**: OPTIONAL support in implementations, i.e., MAY be `null`.
- **Query**: Support for queries on this property is OPTIONAL.
- The H-M symbols SHOULD be given as specified in the International Tables for Crystallography vol. A (IUCr, 2005), Table 4.3.2.1.
- The change-of-basis operation SHOULD be provided for the non-standard axis and cell choices.
- The extended H-M symbol does not unambiguously communicate the origin choice, and the given symbol SHOULD NOT be amended to convey this information.
- The description of the change-of-basis SHOULD follow conventions of the ITC Vol. B, Sect. 1.4, Appendix A1.4.2 (IUCr, 2001).
- The same character string encoding conventions MUST be used as for the specification of the `space_group_symbol_hermann_mauguin` property.
- **Examples**:
- `C 1 2 1`
- **Bibliographic References**:
- IUCr (2001). International Tables for Crystallography vol. B. Reciprocal Space. Ed. U. Shmueli. 2nd edition. Dordrecht/Boston/London, Kluwer Academic Publishers.
- IUCr (2005). International Tables for Crystallography vol. A. Space-Group Symmetry. Ed. Theo Hahn. 5th edition. Dordrecht, Springer.
""",
support=SupportLevel.OPTIONAL,
queryable=SupportLevel.OPTIONAL,
pattern=HM_SYMBOL_REGEXP,
),
] = None

space_group_it_number: Annotated[
int | None,
OptimadeField(
description="""Space group number which specifies the space group of the structure as defined in the International Tables for Crystallography Vol. A (IUCr, 2005).
- **Type**: integer
- **Requirements/Conventions**:
- **Support**: OPTIONAL support in implementations, i.e., MAY be `null`.
- **Query**: Support for queries on this property is OPTIONAL.
- The integer value MUST be between 1 and 230.
- MUST be `null` if `nperiodic_dimensions` is not equal to 3.""",
support=SupportLevel.OPTIONAL,
queryable=SupportLevel.OPTIONAL,
ge=1,
le=230,
),
] = None

cartesian_site_positions: Annotated[
list[Vector3D] | None,
OptimadeField(
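Review note: unlike the two H-M fields, `space_group_symbol_hall` is declared without a `pattern`, so the "single space" rule in its description is not machine-checked. A minimal sketch of such a check follows. `split_hall_symbol` is a hypothetical helper, and it assumes that a change-of-basis operation, when present, is the trailing parenthesised part of the symbol, as in the examples in the description.

```python
from typing import List, Optional, Tuple


def split_hall_symbol(symbol: str) -> Tuple[List[str], Optional[str]]:
    """Hypothetical helper: split a Hall symbol into its space-separated
    components and an optional trailing parenthesised suffix."""
    suffix = None
    head = symbol
    # Assumption: the change-of-basis part is introduced by " (" and
    # closed by the final character of the string.
    if symbol.endswith(")") and " (" in symbol:
        head, _, tail = symbol.partition(" (")
        suffix = tail[:-1]  # drop the closing parenthesis

    components = head.split(" ")
    # An empty component means a leading, trailing, or doubled space;
    # whitespace inside a component means a tab or similar was used.
    for part in components:
        if part == "" or any(ch.isspace() for ch in part):
            raise ValueError(
                f"components must be separated by a single space: {symbol!r}"
            )
    return components, suffix


if __name__ == "__main__":
    for example in ("P 2c -2ac", "I 4bd 2ab 3", "P 2yb (-1/2*x+z,1/2*x,y)"):
        print(example, "->", split_hall_symbol(example))
```

The helper checks spacing only; it does not verify that the components form a meaningful Hall symbol.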
12 changes: 12 additions & 0 deletions optimade/models/utils.py
@@ -234,6 +234,18 @@ def reduce_formula(formula: str) -> str:

CHEMICAL_FORMULA_REGEXP = r"(^$)|^([A-Z][a-z]?([2-9]|[1-9]\d+)?)+$"
SYMMETRY_OPERATION_REGEXP = r"^([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?),([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?),([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?)$"
HM_SYMBOL_REGEXP = r"^(P|I|F|A|B|C|R)(\s+\d+|\s+[a-z]+|\s+\d+/[a-z]+|\s+\d+/\d+|\s+-\d*|\s+\d+/m|\s+[a-z]+/m)*$"


def _generate_symmetry_operation_regex():
    translation = "1/2|[12]/3|[1-3]/4|[1-5]/6"
    translation_appended = f"[-+]? [xyz] ([-+][xyz])? ([-+] ({translation}) )?"
    translation_prepended = f"[-+]? ({translation}) ([-+] [xyz] ([-+][xyz])? )?"
    symop = f"({translation_appended}|{translation_prepended})".replace(" ", "")
    return f"^{symop},{symop},{symop}$"


SPACE_GROUP_SYMMETRY_OPERATION_REGEX = _generate_symmetry_operation_regex()

EXTRA_SYMBOLS = ["X", "vacancy"]
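
Review note: the generator in this hunk builds the symmetry-operation regex from readable, space-padded fragments and then strips the spaces, which should expand to the same string as the `SYMMETRY_OPERATION_REGEXP` literal. A standalone sketch of how the generated pattern behaves follows. The generator body is copied from the diff; `is_symmetry_operation` is a hypothetical wrapper added for illustration.

```python
import re


def generate_symmetry_operation_regex() -> str:
    # Same construction as `_generate_symmetry_operation_regex` in the diff:
    # the spaces only aid readability and are removed before use.
    translation = "1/2|[12]/3|[1-3]/4|[1-5]/6"
    translation_appended = f"[-+]? [xyz] ([-+][xyz])? ([-+] ({translation}) )?"
    translation_prepended = f"[-+]? ({translation}) ([-+] [xyz] ([-+][xyz])? )?"
    symop = f"({translation_appended}|{translation_prepended})".replace(" ", "")
    # A full operation is three comma-separated components (x, y and z rows).
    return f"^{symop},{symop},{symop}$"


SYMOP = re.compile(generate_symmetry_operation_regex())


def is_symmetry_operation(text: str) -> bool:
    """Hypothetical wrapper: True if `text` matches the generated pattern."""
    return SYMOP.match(text) is not None


if __name__ == "__main__":
    # Values taken from the test data and the bad-structure test in this PR.
    for op in ("x,y,z", "-x+1/2,y+1/2,-z", "1/2+x,y,z", "xy,z"):
        print(f"{op!r}: {is_symmetry_operation(op)}")
```

Each component may carry its fractional translation either after the axis letters (`x+1/2`) or before them (`1/2+x`), which is why the pattern has two alternatives per component.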

4 changes: 4 additions & 0 deletions tests/adapters/structures/test_structures.py
@@ -192,6 +192,10 @@ def compare_lossy_conversion(
"species",
"fractional_site_positions",
"space_group_symmetry_operations_xyz",
"space_group_symbol_hall",
"space_group_symbol_hermann_mauguin",
"space_group_symbol_hermann_mauguin_extended",
"space_group_it_number",
)
array_keys = ("cartesian_site_positions", "lattice_vectors")

11 changes: 9 additions & 2 deletions tests/models/test_data/test_good_structures.json
@@ -191,7 +191,10 @@
{"name": "P", "chemical_symbols": ["P"], "concentration": [1.0] }
],
"structure_features": ["site_attachments"],
- "space_group_symmetry_operations_xyz": ["x,y,z", "-x,y,-z", "x+1/2,y+1/2,z", "-x+1/2,y+1/2,-z"]
+ "space_group_symmetry_operations_xyz": ["x,y,z", "-x,y,-z", "x+1/2,y+1/2,z", "-x+1/2,y+1/2,-z"],
+ "space_group_symbol_hermann_mauguin": "R -3 m",
+ "space_group_symbol_hermann_mauguin_extended": "R -3 m",
+ "space_group_symbol_hall": "I 4bd 2ab 3"
},
{
"task_id": "db/1234567",
@@ -224,6 +227,10 @@
{"name": "P", "chemical_symbols": ["P"], "concentration": [1.0] }
],
"structure_features": ["disorder", "site_attachments"],
- "space_group_symmetry_operations_xyz": ["x,y,z"]
+ "space_group_symmetry_operations_xyz": ["x,y,z"],
+ "space_group_symbol_hall": "P 2yb (-1/2*x+z,1/2*x,y)",
+ "space_group_symbol_hermann_mauguin": "P 1",
+ "space_group_symbol_hermann_mauguin_extended": "P 1",
+ "space_group_it_number": 122
}
]
4 changes: 4 additions & 0 deletions tests/models/test_structures.py
@@ -211,6 +211,10 @@ def test_bad_structures(
{"space_group_symmetry_operations_xyz": ["xy,z"]},
"String should match pattern",
),
(
{"space_group_symbol_hermann_mauguin": "P1"},
"String should match pattern",
),
)


63 changes: 62 additions & 1 deletion tests/models/test_utils.py
@@ -1,9 +1,15 @@
import re
from collections.abc import Callable

import pytest
from pydantic import BaseModel, Field, ValidationError

- from optimade.models.utils import OptimadeField, StrictField, SupportLevel
+ from optimade.models.utils import (
+     HM_SYMBOL_REGEXP,
+     OptimadeField,
+     StrictField,
+     SupportLevel,
+ )


def make_bad_models(field: Callable):
@@ -159,3 +165,58 @@ def test_anonymize_formula():
assert anonymize_formula("Si1 O2") == "A2B"
assert anonymize_formula("Si11 O2") == "A11B2"
assert anonymize_formula("Si10 O2C4") == "A5B2C"


VALID_HM_SYMBOLS = [
"P 1", # Triclinic
"P -1",
"P 2", # Monoclinic
"P 21",
"P m",
"P c",
"P 2/m",
"P 21/c",
"P 21/n",
"C 2/c",
"P 2 2 2", # Orthorhombic
"P 21 21 21",
"P n n n",
"P m m a",
"F d d d",
"I m m a",
"P 4", # Tetragonal
"P 41",
"P 42",
"P 43",
"I 4/m m m",
"P 3", # Trigonal
"R 3",
"P 31",
"R -3 m",
"P 6", # Hexagonal
"P 63/m m c",
"P m -3", # Cubic
"F m -3 m",
"I a -3 d",
]

INVALID_HM_SYMBOLS = [
"", # Empty string
"p 1", # Lowercase lattice
"Q 1", # Invalid lattice
"P1", # No space
"1 P", # Wrong order
"P 2/c/m", # Invalid combination
"PP 2", # Double letter
"X -3 m", # Invalid lattice
]


@pytest.mark.parametrize("hm_symbol", VALID_HM_SYMBOLS)
def test_hm_symbol_regexp(hm_symbol):
    assert re.match(HM_SYMBOL_REGEXP, hm_symbol)


@pytest.mark.parametrize("hm_symbol", INVALID_HM_SYMBOLS)
def test_invalid_space_groups(hm_symbol):
    assert re.match(HM_SYMBOL_REGEXP, hm_symbol) is None
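
Review note: the two lists above cover the common cases. A few boundary inputs show how permissive the pattern is: the generator group is repeated with `*` and separated by `\s+`, so a bare lattice letter, a bare minus sign, and runs of several spaces all match, even though the field description asks for single spaces. The sketch below probes those inputs. The regex is copied from the diff; `is_single_spaced_hm` is a hypothetical stricter check, not something this PR adds.

```python
import re

# Copied from optimade/models/utils.py in this PR.
HM_SYMBOL_REGEXP = r"^(P|I|F|A|B|C|R)(\s+\d+|\s+[a-z]+|\s+\d+/[a-z]+|\s+\d+/\d+|\s+-\d*|\s+\d+/m|\s+[a-z]+/m)*$"


def matches_hm(symbol: str) -> bool:
    """True if `symbol` matches the pattern used by the model."""
    return re.match(HM_SYMBOL_REGEXP, symbol) is not None


def is_single_spaced_hm(symbol: str) -> bool:
    """Hypothetical stricter check: the pattern must match, there must be at
    least one generator, and parts must be separated by exactly one space."""
    parts = symbol.split(" ")
    # Splitting on a single space yields empty strings for doubled spaces and
    # leaves tabs or newlines embedded inside a part.
    clean = all(p != "" and not any(c.isspace() for c in p) for p in parts)
    return matches_hm(symbol) and len(parts) >= 2 and clean


if __name__ == "__main__":
    for s in ("P", "P -", "P  21  21  21", "P 21 / m", "P 21/m"):
        print(f"{s!r}: pattern={matches_hm(s)} single_spaced={is_single_spaced_hm(s)}")
```

The stricter helper tightens only the spacing and the minimum length; it still accepts `P -`, so rejecting that would need a change to the pattern itself.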
2 changes: 1 addition & 1 deletion tests/server/test_client.py
@@ -509,7 +509,7 @@ def test_list_properties(

results = cli.list_properties("structures")
for database in results:
-     assert len(results[database]) == 23, str(results[database])
+     assert len(results[database]) == 27, str(results[database])

results = cli.search_property("structures", "site")
for database in results: