Feature Request
Problem
Currently, Diagram supports construction from a single source: Table, Schema, or module with a schema attribute.
The use case is: when there are multiple connected (with foreign key references between their tables) schemas, and the user wishes to resolve (e.g. to view) these references. However, a user cannot construct a single Diagram from multiple schemas. The workaround
diagrams = [dj.Diagram(schema) for schema in schema_list]
full_diagram: dj.Diagram = sum(diagrams[1::], diagrams[0])
Is not efficient, since it scales linearly with respect the number of schemas (let this be $n$). This is because each Diagram constructor call loads Dependencies:
|
connection.dependencies.load() |
which makes 2 queries to INFORMATION_SCHEMA tables to fetch primary and foreign keys:
|
# load primary key info |
|
keys = self._conn.query( |
|
""" |
|
SELECT |
|
concat('`', table_schema, '`.`', table_name, '`') as tab, column_name |
|
FROM information_schema.key_column_usage |
|
WHERE table_name not LIKE "~%%" AND table_schema in ('{schemas}') AND constraint_name="PRIMARY" |
|
""".format( |
|
schemas="','".join(self._conn.schemas) |
|
) |
|
) |
|
pks = defaultdict(set) |
|
for key in keys: |
|
pks[key[0]].add(key[1]) |
|
|
|
# add nodes to the graph |
|
for n, pk in pks.items(): |
|
self.add_node(n, primary_key=pk) |
|
|
|
# load foreign keys |
|
keys = ( |
|
{k.lower(): v for k, v in elem.items()} |
|
for elem in self._conn.query( |
|
""" |
|
SELECT constraint_name, |
|
concat('`', table_schema, '`.`', table_name, '`') as referencing_table, |
|
concat('`', referenced_table_schema, '`.`', referenced_table_name, '`') as referenced_table, |
|
column_name, referenced_column_name |
|
FROM information_schema.key_column_usage |
|
WHERE referenced_table_name NOT LIKE "~%%" AND (referenced_table_schema in ('{schemas}') OR |
|
referenced_table_schema is not NULL AND table_schema in ('{schemas}')) |
|
""".format( |
|
schemas="','".join(self._conn.schemas) |
|
), |
|
as_dict=True, |
|
) |
|
) |
So $2n$ queries to INFORMATION_SCHEMA are made.
Requirements
Allow construction of a Diagram from multiple schema names. Construction from multiple schemas should be efficient, making $2 = O(1)$ queries (primary key and foreign key query) to INFORMATION_SCHEMA.
Justification
Adding support for this feature would enable users to efficiently view inter-schema references, even for large numbers of schemas.
Alternative Considerations
In my application, I workaround this by creating a mock class that we can pass to the Diagram constructor to get this behavior:
class AbstractDependencies(object):
"""
Lightweight subclass of dj.connection.Dependencies that is meant
to be passed to dj.Diagram constructor.
"""
def __init__(self, schema_names: list[str], deps: dj.connection.Dependencies):
super().__init__()
if not schema_names:
raise NotImplementedError("No schemas passed")
# Set attributes for dj.Diagram.__init__
self.schema_names = schema_names
self.connection = deps._conn
self.context = dict()
# Edit Dependencies object in-place
deps.clear()
schema_dict: dict[str, dj.VirtualModule] = {
schema: dj.VirtualModule(schema, schema, connection=self.connection)
for schema in self.schema_names
}
deps._conn.schemas = schema_dict
deps.load(force=True)
assert deps._loaded, "Dependencies not loaded"
self.nodes = deps.nodes
self.database = self.schema_names[0]
def to_diagram(self) -> dj.Diagram:
diagram = dj.Diagram(self, self.context)
for node in self.nodes:
if any(
node.startswith("`%s`" % database) for database in self.schema_names
):
diagram.nodes_to_show.add(node)
return diagram
def get_diagram_from_schemas(schemas: list[str]) -> dj.Diagram:
deps: dj.connection.Dependencies = connection.dependencies
abs_deps = AbstractDependencies(schemas, deps)
return abs_deps.to_diagram()
This is $O(1)$ but hacky.
Environment
- OS: Linux (relevant for all)
- Python Version: 3.9 (relevant for all)
- MySQL Version: 8.0 (relevant for all)
- MySQL Deployment Strategy: remote (percona-qa.datajoint.io)
- DataJoint Version: 0.14.3
Feature Request
Problem
Currently,
Diagramsupports construction from a singlesource:Table,Schema, or module with aschemaattribute.The use case is: when there are multiple connected (with foreign key references between their tables) schemas, and the user wishes to resolve (e.g. to view) these references. However, a user cannot construct a single
Diagramfrom multiple schemas. The workaroundIs not efficient, since it scales linearly with respect the number of schemas (let this be$n$ ). This is because each
Diagramconstructor call loadsDependencies:datajoint-python/datajoint/diagram.py
Line 103 in 77b75e9
which makes 2 queries to
INFORMATION_SCHEMAtables to fetch primary and foreign keys:datajoint-python/datajoint/dependencies.py
Lines 99 to 135 in 77b75e9
So$2n$ queries to
INFORMATION_SCHEMAare made.Requirements
Allow construction of a$2 = O(1)$ queries (primary key and foreign key query) to
Diagramfrom multiple schema names. Construction from multiple schemas should be efficient, makingINFORMATION_SCHEMA.Justification
Adding support for this feature would enable users to efficiently view inter-schema references, even for large numbers of schemas.
Alternative Considerations
In my application, I workaround this by creating a mock class that we can pass to the
Diagramconstructor to get this behavior:This is$O(1)$ but hacky.
Environment