lamindb.models

Models library.

Feature and label managers

class lamindb.models.FeatureManager(host)

Feature manager.

property slots: dict[str, Schema]

Features by schema slot.

Example:

artifact.features.slots
#> {'var': <Schema: var>, 'obs': <Schema: obs>}
describe(return_str=False)

Pretty print features.

This is what artifact.describe() calls under the hood.

Return type:

str | None

get_values(external_only=False)

Get features as a dictionary.

Includes annotation with internal and external feature values.

Parameters:

external_only (bool, default: False) – If True, only return external feature annotations.

Return type:

dict[str, Any]
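An illustrative sketch of the returned dictionary (the feature names and values below are hypothetical, not part of the API):

```python
# hypothetical return value of artifact.features.get_values()
values = {
    "experiment": "EXP-001",   # a string feature value
    "temperature": 21.5,       # a numerical feature value
    "is_validated": True,      # a boolean feature value
}

# keys are feature names, values are the annotated feature values
assert all(isinstance(k, str) for k in values)
```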

add_values(values, feature_field=FieldAttr(Feature.name), schema=None)

Add values for features.

Parameters:
  • values (dict[str, str | int | float | bool]) – A dictionary of keys (features) & values (labels, strings, numbers, booleans, datetimes, etc.). If a value is None, it will be skipped.

  • feature_field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a registry to map the keys of the values dictionary.

  • schema (Schema, default: None) – Schema to validate against.

Return type:

None
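A minimal sketch of a values dictionary (the feature names are hypothetical; the commented call assumes a saved artifact whose keys exist as Feature records):

```python
import datetime

# keys map to Feature names; values may be labels, strings, numbers,
# booleans, or datetimes -- None values are skipped
values = {
    "experiment": "EXP-001",
    "temperature": 21.5,
    "is_validated": True,
    "date_of_study": datetime.date(2024, 12, 1),
    "operator": None,  # skipped
}

# artifact.features.add_values(values)  # requires a LaminDB instance

# only non-None entries are annotated
non_skipped = {k: v for k, v in values.items() if v is not None}
assert len(non_skipped) == 4
```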

set_values(values, feature_field=FieldAttr(Feature.name), schema=None)

Set values for features.

Like add_values, but first removes all existing external feature annotations.

Parameters:
  • values (dict[str, str | int | float | bool]) – A dictionary of keys (features) & values (labels, strings, numbers, booleans, datetimes, etc.). If a value is None, it will be skipped.

  • feature_field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a registry to map the keys of the values dictionary.

  • schema (Schema, default: None) – Schema to validate against.

Return type:

None

remove_values(feature=None, *, value=None)

Remove values for features.

Parameters:
  • feature (str | Feature | list[str | Feature], default: None) – Indicate one or several features for which to remove values. If None, values for all external features will be removed.

  • value (Any | None, default: None) – An optional value to restrict removal to a single value.

Return type:

None

make_external(feature)

Make a feature external.

This removes a feature from artifact.feature_sets and thereby no longer marks it as a dataset feature.

Parameters:

feature (Feature) – A feature.

Return type:

None

class lamindb.models.LabelManager(host)

Label manager.

This allows managing untyped labels (ULabel) and arbitrary typed labels (e.g., CellLine), and associating labels with features.

describe(return_str=True)

Describe the labels.

Return type:

str

add(records, feature=None)

Add one or several labels and associate them with a feature.

Parameters:
  • records – One or several label records to add.

  • feature (Feature | None, default: None) – The feature under which to associate the labels.

Return type:

None

get(feature, mute=False, flat_names=False)

Get labels given a feature.

Parameters:
  • feature (Feature) – Feature under which labels are grouped.

  • mute (bool, default: False) – Show no logging.

  • flat_names (bool, default: False) – Flatten list to names rather than returning records.

Return type:

QuerySet | dict[str, QuerySet] | list

add_from(data, transfer_logs=None)

Add labels from an artifact or collection to another artifact or collection.

Return type:

None

Examples

artifact1 = ln.Artifact(pd.DataFrame(index=[0, 1])).save()
artifact2 = ln.Artifact(pd.DataFrame(index=[2, 3])).save()
records = ln.Record.from_values(["Label1", "Label2"], field="name").save()
labels = ln.Record.filter(name__icontains="label")
artifact1.records.set(labels)
artifact2.labels.add_from(artifact1)
make_external(label)

Make a label external, aka dissociate label from internal features.

Parameters:

label (SQLRecord) – Label record to make external.

Return type:

None

Registry base classes

class lamindb.models.BaseSQLRecord(*args, **kwargs)

Basic metadata record.

It has the same methods as SQLRecord, but doesn’t have the additional fields.

It’s mainly used for link models (IsLink) and similar helper classes.

classmethod filter(*queries, **expressions)

Query records.

Parameters:
  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

See also

Examples

>>> ln.Project(name="my label").save()
>>> ln.Project.filter(name__startswith="my").to_dataframe()
classmethod get(idlike=None, **expressions)

Get a single record.

Parameters:
  • idlike (int | str | None, default: None) – Either a uid stub, a full uid, or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

Return type:

SQLRecord

See also

Examples

record = ln.Record.get("FvtpPJLJ")
record = ln.Record.get(name="my-label")
classmethod to_dataframe(include=None, features=False, limit=100, order_by='-id')

Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries

Parameters:
  • include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).

  • features (bool | list[str], default: False) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.

  • limit (int, default: 100) – Maximum number of rows to display. If None, includes all results.

  • order_by (str | None, default: '-id') – Field name to order the records by. Prefix with ‘-’ for descending order. Defaults to ‘-id’ to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.

Return type:

DataFrame

Examples

Include the name of the creator:

ln.Record.to_dataframe(include="created_by__name")

Include features:

ln.Artifact.to_dataframe(include="features")

Include selected features:

ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
classmethod search(string, *, field=None, limit=20, case_sensitive=False)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum amount of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A QuerySet of matching records, sorted by relevance.

See also

filter() lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
classmethod lookup(field=None, return_field=None, keep='first')

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

  • keep (Literal['first', 'last', False], default: 'first') – When multiple records are found for a lookup, how to return the records. - "first": return the first record. - "last": return the last record. - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

Look up via auto-complete on the returned object:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
classmethod connect(instance)

Query a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

ln.Record.connect("account_handle/instance_name").search("label7", field="name")
save(*args, **kwargs)

Save.

Always saves to the default database.

Return type:

TypeVar(T, bound= SQLRecord)

delete(permanent=None)

Delete.

Parameters:

permanent (bool | None, default: None) – Passing False raises an error: basic records have no trash, so soft delete is impossible.

Return type:

None

refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

async arefresh_from_db(using=None, fields=None, from_queryset=None)

Asynchronous version of refresh_from_db().
class lamindb.models.SQLRecord(*args, **kwargs)

Metadata record.

Every SQLRecord is a data model that comes with a registry in form of a SQL table in your database.

Sub-classing SQLRecord creates a new registry while instantiating a SQLRecord creates a new record.

Example:

from lamindb import SQLRecord, fields

# sub-classing `SQLRecord` creates a new registry
class Experiment(SQLRecord):
    name: str = fields.CharField()

# instantiating `Experiment` creates a record `experiment`
experiment = Experiment(name="my experiment")

# you can save the record to the database
experiment.save()

# `Experiment` refers to the registry, which you can query
df = Experiment.filter(name__startswith="my ").to_dataframe()

SQLRecord’s metaclass is Registry.

SQLRecord inherits from Django’s Model class. Why does LaminDB call it SQLRecord and not Model? Because the term SQLRecord can’t be confused with statistical, machine-learning, or biological models.

is_locked: bool

Whether the record is locked for edits.

branch: Branch

Life cycle state of record.

branch.name can be "main" (the default branch), "trash" (trashed), "archive" (archived), or any other user-created branch, typically intended for merging onto main after review.

space: Space

The space in which the record lives.

classmethod filter(*queries, **expressions)

Query records.

Parameters:
  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

See also

Examples

>>> ln.Project(name="my label").save()
>>> ln.Project.filter(name__startswith="my").to_dataframe()
classmethod get(idlike=None, **expressions)

Get a single record.

Parameters:
  • idlike (int | str | None, default: None) – Either a uid stub, a full uid, or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

Return type:

SQLRecord

See also

Examples

record = ln.Record.get("FvtpPJLJ")
record = ln.Record.get(name="my-label")
classmethod to_dataframe(include=None, features=False, limit=100, order_by='-id')

Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries

Parameters:
  • include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).

  • features (bool | list[str], default: False) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.

  • limit (int, default: 100) – Maximum number of rows to display. If None, includes all results.

  • order_by (str | None, default: '-id') – Field name to order the records by. Prefix with ‘-’ for descending order. Defaults to ‘-id’ to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.

Return type:

DataFrame

Examples

Include the name of the creator:

ln.Record.to_dataframe(include="created_by__name")

Include features:

ln.Artifact.to_dataframe(include="features")

Include selected features:

ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
classmethod search(string, *, field=None, limit=20, case_sensitive=False)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum amount of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A QuerySet of matching records, sorted by relevance.

See also

filter() lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
classmethod lookup(field=None, return_field=None, keep='first')

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

  • keep (Literal['first', 'last', False], default: 'first') – When multiple records are found for a lookup, how to return the records. - "first": return the first record. - "last": return the last record. - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

Look up via auto-complete on the returned object:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
classmethod connect(instance)

Query a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

ln.Record.connect("account_handle/instance_name").search("label7", field="name")
restore()

Restore from trash onto the main branch.

Does not restore descendant records if the record is HasType with is_type = True.

Return type:

None

delete(permanent=None, **kwargs)

Delete record.

If record is HasType with is_type = True, deletes all descendant records, too.

Parameters:

permanent (bool | None, default: None) – Whether to permanently delete the record (skips trash). If None, performs soft delete if the record is not already in the trash.

Return type:

None

Examples

For any SQLRecord object record, call:

>>> record.delete()
save(*args, **kwargs)

Save.

Always saves to the default database.

Return type:

TypeVar(T, bound= SQLRecord)

refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

async arefresh_from_db(using=None, fields=None, from_queryset=None)

Asynchronous version of refresh_from_db().
class lamindb.models.Registry(name, bases, attrs, **kwargs)

Metaclass for SQLRecord.

Each Registry object is a SQLRecord class and corresponds to a table in the metadata SQL database.

You work with Registry objects whenever you use class methods of SQLRecord.

You call any subclass of SQLRecord a “registry” and their objects “records”. A SQLRecord object corresponds to a row in the SQL table.

If you want to create a new registry, you sub-class SQLRecord.

Example:

from lamindb import SQLRecord, fields

# sub-classing `SQLRecord` creates a new registry
class Experiment(SQLRecord):
    name: str = fields.CharField()

# instantiating `Experiment` creates a record `experiment`
experiment = Experiment(name="my experiment")

# you can save the record to the database
experiment.save()

# `Experiment` refers to the registry, which you can query
df = Experiment.filter(name__startswith="my ").to_dataframe()

Note: Registry inherits from Django’s ModelBase.

lookup(field=None, return_field=None, keep='first')

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

  • keep (Literal['first', 'last', False], default: 'first') – When multiple records are found for a lookup, how to return the records. - "first": return the first record. - "last": return the last record. - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

Look up via auto-complete on the returned object:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
filter(*queries, **expressions)

Query records.

Parameters:
  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

See also

Examples

>>> ln.Project(name="my label").save()
>>> ln.Project.filter(name__startswith="my").to_dataframe()
get(idlike=None, **expressions)

Get a single record.

Parameters:
  • idlike (int | str | None, default: None) – Either a uid stub, a full uid, or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

Return type:

TypeVar(T, bound= SQLRecord)

See also

Examples

record = ln.Record.get("FvtpPJLJ")
record = ln.Record.get(name="my-label")
to_dataframe(*, include=None, features=None, limit=100, order_by='-id')

Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries

Parameters:
  • include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).

  • features (str | list[str] | None, default: None) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.

  • limit (int | None, default: 100) – Maximum number of rows to display. If None, includes all results.

  • order_by (str | None, default: '-id') – Field name to order the records by. Prefix with ‘-’ for descending order. Defaults to ‘-id’ to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.

Return type:

DataFrame

Examples

Include the name of the creator:

ln.Record.to_dataframe(include="created_by__name")

Include features:

ln.Artifact.to_dataframe(include="features")

Include selected features:

ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
search(string, *, field=None, limit=20, case_sensitive=False)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum amount of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A QuerySet of matching records, sorted by relevance.

See also

filter() lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
connect(instance)

Query a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

ln.Record.connect("account_handle/instance_name").search("label7", field="name")

Mixins for registries

class lamindb.models.IsVersioned(*db_args)

Base class for versioned models.

property pk
property stem_uid: str

Universal id characterizing the version family.

The full uid of a record is obtained via concatenating the stem uid and version information:

stem_uid = random_base62(n_char)  # a random base62 sequence of length 12 (transform) or 16 (artifact, collection)
version_uid = "0000"  # an auto-incrementing 4-digit base62 number
uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
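The uid scheme above can be sketched in plain Python (random_base62 here is a stand-in illustration, not the LaminDB implementation):

```python
import secrets
import string

BASE62 = string.digits + string.ascii_letters  # 62 characters

def random_base62(n_char: int) -> str:
    """Return a random base62 string of length n_char."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))

stem_uid = random_base62(16)  # 16 chars for artifact/collection, 12 for transform
version_uid = "0000"          # auto-incrementing 4-digit base62 counter
uid = f"{stem_uid}{version_uid}"

assert len(uid) == 20
assert all(c in BASE62 for c in uid)
```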
property versions: QuerySet

Lists all records of the same version family.

>>> new_artifact = ln.Artifact(df2, revises=artifact).save()
>>> new_artifact.versions
refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

async arefresh_from_db(using=None, fields=None, from_queryset=None)

Asynchronous version of refresh_from_db().
save(*args, force_insert=False, force_update=False, using=None, update_fields=None)

Save the current instance. Override this in a subclass if you want to control the saving process.

The ‘force_insert’ and ‘force_update’ parameters can be used to insist that the “save” must be an SQL insert or update (or equivalent for non-SQL backends), respectively. Normally, they should not be set.

delete(using=None, keep_parents=False)
class lamindb.models.HasType

Mixin for registries that have a hierarchical type assigned.

Such registries have a .type foreign key pointing to themselves.

A type hence allows hierarchically grouping records under types.

For instance, using the example of ln.Record:

experiment_type = ln.Record(name="Experiment", is_type=True).save()
experiment1 = ln.Record(name="Experiment 1", type=experiment_type).save()
experiment2 = ln.Record(name="Experiment 2", type=experiment_type).save()
query_types()

Query types of a record recursively.

While .type retrieves the type, this method retrieves all super types of that type:

# create a type hierarchy
type1 = ln.Record(name="Type1", is_type=True).save()
type2 = ln.Record(name="Type2", is_type=True, type=type1).save()
type3 = ln.Record(name="Type3", is_type=True, type=type2).save()

# create a record with type3
record = ln.Record(name="My record", type=type3).save()

# Query super types
super_types = record.query_types()
assert super_types[0] == type3
assert super_types[1] == type2
assert super_types[2] == type1
Return type:

SQLRecordList

class lamindb.models.HasParents

Base class for hierarchical registries (ontologies).

view_parents(field=None, with_children=False, distance=5)

View parents in an ontology.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – Field to display on graph

  • with_children (bool, default: False) – Whether to also show children.

  • distance (int, default: 5) – Maximum distance still shown.

Ontological hierarchies: ULabel (project & sub-project), CellType (cell type & subtype).

Examples

>>> import bionty as bt
>>> bt.Tissue.from_source(name="subsegmental bronchus").save()
>>> record = bt.Tissue.get(name="respiratory tube")
>>> record.view_parents()
>>> record.view_parents(with_children=True)
view_children(field=None, distance=5)

View children in an ontology.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – Field to display on graph

  • distance (int, default: 5) – Maximum distance still shown.

Ontological hierarchies: ULabel (project & sub-project), CellType (cell type & subtype).

Examples

>>> import bionty as bt
>>> bt.Tissue.from_source(name="subsegmental bronchus").save()
>>> record = bt.Tissue.get(name="respiratory tube")
>>> record.view_children()
query_parents()

Query parents in an ontology.

Return type:

QuerySet

query_children()

Query children in an ontology.

Return type:

QuerySet

class lamindb.models.CanCurate

Base class providing SQLRecord-based validation.

classmethod inspect(values, field=None, *, mute=False, organism=None, source=None, from_source=True, strict_source=False)

Inspect if values are mappable to a field.

Being mappable means that an exact match exists.

Parameters:
  • values (list[str] | Series | array) – Values that will be checked against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.

  • mute (bool, default: False) – Whether to mute logging.

  • organism (str | SQLRecord | None, default: None) – An Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record that specifies the version to inspect against.

  • strict_source (bool, default: False) – Determines the validation behavior against records in the registry. - If False, validation will include all records in the registry, ignoring the specified source. - If True, validation will only include records in the registry that are linked to the specified source. Note: this parameter won’t affect validation against public sources.

Return type:

bionty.base.dev.InspectResult

See also

validate()

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# inspect gene symbols
gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol, organism="human")
assert result.validated == ["A1CF", "A1BG"]
assert result.non_validated == ["FANCD1", "FANCD20"]
classmethod validate(values, field=None, *, mute=False, organism=None, source=None, strict_source=False)

Validate values against existing values of a string field.

Note that this is strict validation: only exact matches count as validated.

Parameters:
  • values (list[str] | Series | array) – Values that will be validated against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.

  • mute (bool, default: False) – Whether to mute logging.

  • organism (str | SQLRecord | None, default: None) – An Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record that specifies the version to validate against.

  • strict_source (bool, default: False) – Determines the validation behavior against records in the registry. - If False, validation will include all records in the registry, ignoring the specified source. - If True, validation will only include records in the registry that are linked to the specified source. Note: this parameter won’t affect validation against public sources.

Return type:

ndarray

Returns:

A vector of booleans indicating if an element is validated.

See also

inspect()

Example:

import bionty as bt

bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.validate(gene_symbols, field=bt.Gene.symbol, organism="human")
#> array([ True,  True, False, False])
classmethod from_values(values, field=None, create=False, organism=None, source=None, mute=False)

Bulk create validated records by parsing values for an identifier such as a name or an ID.

Parameters:
  • values (list[str] | Series | array) – A list of values for an identifier, e.g. ["name1", "name2"].

  • field (str | DeferredAttribute | None, default: None) – A SQLRecord field to look up, e.g., bt.CellMarker.name.

  • create (bool, default: False) – Whether to create records if they don’t exist.

  • organism (SQLRecord | str | None, default: None) – A bionty.Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record to validate against to create records for.

  • mute (bool, default: False) – Whether to mute logging.

Return type:

SQLRecordList

Returns:

A list of validated records. For bionty registries, knowledge-coupled records are returned as well.

Notes

For more info, see tutorial: Manage biological ontologies.

Example:

import bionty as bt

# bulk creating from non-validated values logs warnings & returns an empty list
ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"])
assert len(ulabels) == 0

# with create=True, records are created for the non-validated values
ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"], create=True).save()
assert len(ulabels) == 3

# Bulk create records from public reference
bt.CellType.from_values(["T cell", "B cell"]).save()
classmethod standardize(values, field=None, *, return_field=None, return_mapper=False, case_sensitive=False, mute=False, source_aware=True, keep='first', synonyms_field='synonyms', organism=None, source=None, strict_source=False)

Maps input synonyms to standardized names.

Parameters:
  • values (Iterable) – Identifiers that will be standardized.

  • field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. Defaults to field.

  • return_mapper (bool, default: False) – If True, returns {input_value: standardized_name}.

  • case_sensitive (bool, default: False) – Whether the mapping is case sensitive.

  • mute (bool, default: False) – Whether to mute logging.

  • source_aware (bool, default: True) – Whether to standardize from public source. Defaults to True for BioRecord registries.

  • keep (Literal['first', 'last', False], default: 'first') –

    When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated:
    - "first": returns the first mapped standardized name
    - "last": returns the last mapped standardized name
    - False: returns all mapped standardized names.

    When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.

    When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.

  • synonyms_field (str, default: 'synonyms') – A field containing the concatenated synonyms.

  • organism (str | SQLRecord | None, default: None) – An Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record that specifies the version to validate against.

  • strict_source (bool, default: False) – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.

Return type:

list[str] | dict[str, str]

Returns:

If return_mapper is False – a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.

See also

add_synonym()

Add synonyms.

remove_synonym()

Remove synonyms.

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# standardize gene synonyms
gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.standardize(gene_synonyms)
#> ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
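The keep semantics above can be sketched in plain Python, independent of any registry (hypothetical mapping data; `standardize_sketch` is an illustrative helper, not part of lamindb):

```python
def standardize_sketch(values, mapping, keep="first"):
    # mapping: synonym -> list of candidate standardized names
    result = []
    for value in values:
        candidates = mapping.get(value, [value])  # unmapped values pass through
        if keep == "first":
            result.append(candidates[0])
        elif keep == "last":
            result.append(candidates[-1])
        else:  # keep=False: keep all matches, nested when ambiguous
            result.append(candidates if len(candidates) > 1 else candidates[0])
    return result

mapping = {"FANCD1": ["BRCA2"], "AMBIG": ["GENE1", "GENE2"]}
print(standardize_sketch(["A1CF", "FANCD1", "AMBIG"], mapping))
#> ['A1CF', 'BRCA2', 'GENE1']
print(standardize_sketch(["AMBIG"], mapping, keep=False))
#> [['GENE1', 'GENE2']]
```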
add_synonym(synonym, force=False, save=None)

Add synonyms to a record.

Parameters:
  • synonym (str | list[str] | Series | array) – The synonyms to add to the record.

  • force (bool, default: False) – Whether to add synonyms even if they are already synonyms of other records.

  • save (bool | None, default: None) – Whether to save the record to the database.

See also

remove_synonym()

Remove synonyms.

Example:

import bionty as bt

# save "T cell" record
record = bt.CellType.from_source(name="T cell").save()
record.synonyms
#> "T-cell|T lymphocyte|T-lymphocyte"

# add a synonym
record.add_synonym("T cells")
record.synonyms
#> "T cells|T-cell|T-lymphocyte|T lymphocyte"
remove_synonym(synonym)

Remove synonyms from a record.

Parameters:

synonym (str | list[str] | Series | array) – The synonym values to remove.

See also

add_synonym()

Add synonyms.

Example:

import bionty as bt

# save "T cell" record
record = bt.CellType.from_source(name="T cell").save()
record.synonyms
#> "T-cell|T lymphocyte|T-lymphocyte"

# remove a synonym
record.remove_synonym("T-cell")
record.synonyms
#> "T lymphocyte|T-lymphocyte"
set_abbr(value)

Set value for abbr field and add to synonyms.

Parameters:

value (str) – A value for an abbreviation.

See also

add_synonym()

Example:

import bionty as bt

# save an experimental factor record
scrna = bt.ExperimentalFactor.from_source(name="single-cell RNA sequencing").save()
assert scrna.abbr is None
assert scrna.synonyms == "single-cell RNA-seq|single-cell transcriptome sequencing|scRNA-seq|single cell RNA sequencing"

# set abbreviation
scrna.set_abbr("scRNA")
assert scrna.abbr == "scRNA"
# synonyms are updated
assert scrna.synonyms == "scRNA|single-cell RNA-seq|single cell RNA sequencing|single-cell transcriptome sequencing|scRNA-seq"
class lamindb.models.TracksRun
class lamindb.models.TracksRun(*db_args)

Base class tracking latest run, creating user, and created_at timestamp.

Meta = <class 'lamindb.models.run.TracksRun.Meta'>
created_by: User

Creator of record.

created_by_id
property pk
run: Run | None

Run that created record.

run_id
refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

async arefresh_from_db(using=None, fields=None, from_queryset=None)
save(*args, force_insert=False, force_update=False, using=None, update_fields=None)

Save the current instance. Override this in a subclass if you want to control the saving process.

The ‘force_insert’ and ‘force_update’ parameters can be used to insist that the “save” must be an SQL insert or update (or equivalent for non-SQL backends), respectively. Normally, they should not be set.

delete(using=None, keep_parents=False)
class lamindb.models.TracksUpdates
class lamindb.models.TracksUpdates(*db_args)

Base class tracking previous runs and updated_at timestamp.

Meta = <class 'lamindb.models.run.TracksUpdates.Meta'>
property pk
refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

async arefresh_from_db(using=None, fields=None, from_queryset=None)
save(*args, force_insert=False, force_update=False, using=None, update_fields=None)

Save the current instance. Override this in a subclass if you want to control the saving process.

The ‘force_insert’ and ‘force_update’ parameters can be used to insist that the “save” must be an SQL insert or update (or equivalent for non-SQL backends), respectively. Normally, they should not be set.

delete(using=None, keep_parents=False)

Query sets & managers

class lamindb.models.BasicQuerySet(model=None, query=None, using=None, hints=None)

Sets of records returned by queries.

See also

django QuerySet

Examples

Any filter statement produces a query set:

queryset = Registry.filter(name__startswith="keyword")
property db

Return the database used if this query is executed now.

property ordered

Return True if the QuerySet is ordered – i.e. has an order_by() clause or a default ordering on the model (or is empty).

property query
classmethod as_manager()
to_dataframe(*, include=None, features=None, limit=100, order_by='-id')

Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries

Parameters:
  • include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).

  • features (str | list[str] | None, default: None) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.

  • limit (int | None, default: 100) – Maximum number of rows to display. If None, includes all results.

  • order_by (str | None, default: '-id') – Field name to order the records by. Prefix with ‘-’ for descending order. Defaults to ‘-id’ to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.

Return type:

DataFrame

Examples

Include the name of the creator:

ln.Record.to_dataframe(include="created_by__name")

Include features:

ln.Artifact.to_dataframe(include="features")

Include selected features:

ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
delete(*args, permanent=None, **kwargs)

Delete all records in the query set.

Parameters:

permanent (bool | None, default: None) – Whether to permanently delete the record (skips trash). Is only relevant for records that have the branch field. If None, uses soft delete for records that have the branch field, hard delete otherwise.

Note

Calling delete() twice on the same queryset does NOT permanently delete in bulk operations. Use permanent=True for actual deletion.

Examples

For any QuerySet object qs, call:

>>> qs.delete()
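How the permanent default resolves can be sketched as follows (plain Python, no database required; `resolve_permanent` is an illustrative helper, not part of lamindb):

```python
def resolve_permanent(permanent, has_branch_field):
    # permanent=None resolves per the docstring: soft delete (trash) for
    # records with a `branch` field, hard delete otherwise
    if permanent is not None:
        return permanent
    return not has_branch_field

assert resolve_permanent(None, has_branch_field=True) is False   # soft delete
assert resolve_permanent(None, has_branch_field=False) is True   # hard delete
assert resolve_permanent(True, has_branch_field=True) is True    # forced hard delete
```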
to_list(field=None)

Populate an (unordered) list with the results.

Note that the order in this list is only meaningful if you ordered the underlying query set with .order_by().

Return type:

list[SQLRecord] | list[str]

Examples

>>> queryset.to_list()  # list of records
>>> queryset.to_list("name")  # list of values
first()

If non-empty, the first result in the query set, otherwise None.

Return type:

SQLRecord | None

Examples

>>> queryset.first()
one()

Exactly one result. Raises error if there are more or none.

Return type:

SQLRecord

one_or_none()

At most one result. Returns it if there is one, otherwise returns None.

Return type:

SQLRecord | None

Examples

>>> ULabel.filter(name="benchmark").one_or_none()
>>> ULabel.filter(name="non existing label").one_or_none()
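The contracts of one() and one_or_none() reduce to the following (plain-Python sketch over an already-evaluated result list; the helper names are illustrative, not lamindb API):

```python
def one(results):
    # Exactly one result; raise otherwise
    if len(results) != 1:
        raise ValueError(f"expected exactly one result, got {len(results)}")
    return results[0]

def one_or_none(results):
    # At most one result; None when empty, raise when ambiguous
    if len(results) > 1:
        raise ValueError(f"expected at most one result, got {len(results)}")
    return results[0] if results else None

assert one(["benchmark"]) == "benchmark"
assert one_or_none([]) is None
```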
latest_version()

Filter every version family by latest version.

Return type:

QuerySet

search(string, **kwargs)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field – The field or fields to search. Search all string fields by default.

  • limit – Maximum amount of top results to return.

  • case_sensitive – Whether the match is case sensitive.

Returns:

A sorted DataFrame of search results with a score in column score, or a QuerySet if return_queryset is True.

See also

filter() lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
lookup(field=None, **kwargs)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field – The field to return. If None, returns the whole record.

  • keep – When multiple records are found for a lookup, how to return the records:
    - "first": return the first record.
    - "last": return the last record.
    - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

Look up via auto-complete on the returned object:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
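The attribute-access behavior of a lookup object can be mimicked in plain Python (a simplified sketch; `make_lookup` is a hypothetical helper returning strings rather than records, and the identifier mangling is simplified relative to the real implementation):

```python
import re
from types import SimpleNamespace

def make_lookup(values):
    # Sketch: derive a python-identifier attribute per value, as lookup() does
    ns = SimpleNamespace(**{re.sub(r"\W", "_", v).lower(): v for v in values})
    ns.dict = lambda: {v: v for v in values}  # dictionary converter, keyed by original value
    return ns

lookup = make_lookup(["ADGB-DT", "BRCA2"])
lookup.adgb_dt          # -> 'ADGB-DT'
lookup.dict()["BRCA2"]  # -> 'BRCA2'
```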
validate(values, field=None, **kwargs)

Validate values against existing values of a string field.

Note that this is strict validation: only exact matches are asserted.

Parameters:
  • values (list[str] | Series | array) – Values that will be validated against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against ontology names.

  • mute – Whether to mute logging.

  • organism – An Organism name or record.

  • source – A bionty.Source record that specifies the version to validate against.

  • strict_source – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.

Returns:

A vector of booleans indicating if an element is validated.

See also

inspect()

Example:

import bionty as bt

bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.validate(gene_symbols, field=bt.Gene.symbol, organism="human")
#> array([ True,  True, False, False])
inspect(values, field=None, **kwargs)

Inspect if values are mappable to a field.

Being mappable means that an exact match exists.

Parameters:
  • values (list[str] | Series | array) – Values that will be checked against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against ontology names.

  • mute – Whether to mute logging.

  • organism – An Organism name or record.

  • source – A bionty.Source record that specifies the version to inspect against.

  • strict_source – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.

See also

validate()

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# inspect gene symbols
gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol, organism="human")
assert result.validated == ["A1CF", "A1BG"]
assert result.non_validated == ["FANCD1", "FANCD20"]
standardize(values, field=None, **kwargs)

Maps input synonyms to standardized names.

Parameters:
  • values (Iterable) – Identifiers that will be standardized.

  • field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.

  • return_field – The field to return. Defaults to field.

  • return_mapper – If True, returns {input_value: standardized_name}.

  • case_sensitive – Whether the mapping is case sensitive.

  • mute – Whether to mute logging.

  • source_aware – Whether to standardize from public source. Defaults to True for BioRecord registries.

  • keep

    When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated:
    - "first": returns the first mapped standardized name
    - "last": returns the last mapped standardized name
    - False: returns all mapped standardized names.

    When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.

    When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.

  • synonyms_field – A field containing the concatenated synonyms.

  • organism – An Organism name or record.

  • source – A bionty.Source record that specifies the version to validate against.

  • strict_source – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.

Returns:

If return_mapper is False – a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.

See also

add_synonym()

Add synonyms.

remove_synonym()

Remove synonyms.

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# standardize gene synonyms
gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.standardize(gene_synonyms)
#> ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
iterator(chunk_size=None)

An iterator over the results from applying this QuerySet to the database. chunk_size must be provided for QuerySets that prefetch related objects. Otherwise, a default chunk_size of 2000 is supplied.
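Chunked fetching can be sketched in plain Python (an illustrative stand-in for the server-side behavior; `chunked` is a hypothetical helper, not lamindb API):

```python
from itertools import islice

def chunked(iterable, chunk_size=2000):
    # Yield successive lists of up to chunk_size items, mirroring how
    # iterator() fetches rows in chunks rather than all at once
    it = iter(iterable)
    while chunk := list(islice(it, chunk_size)):
        yield chunk

sizes = [len(c) for c in chunked(range(4500), chunk_size=2000)]
# -> [2000, 2000, 500]
```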

async aiterator(chunk_size=2000)

An asynchronous iterator over the results from applying this QuerySet to the database.

aggregate(*args, **kwargs)

Return a dictionary containing the calculations (aggregation) over the current queryset.

If args is present the expression is passed as a kwarg using the Aggregate object’s default alias.

async aaggregate(*args, **kwargs)
count()

Perform a SELECT COUNT() and return the number of records as an integer.

If the QuerySet is already fully cached, return the length of the cached results set to avoid multiple SELECT COUNT(*) calls.

async acount()
get(*args, **kwargs)

Perform the query and return a single object matching the given keyword arguments.

async aget(*args, **kwargs)
create(**kwargs)

Create a new object with the given kwargs, saving it to the database and returning the created object.

async acreate(**kwargs)
bulk_create(objs, batch_size=None, ignore_conflicts=False, update_conflicts=False, update_fields=None, unique_fields=None)

Insert each of the instances into the database. Do not call save() on each of the instances, do not send any pre/post_save signals, and do not set the primary key attribute if it is an autoincrement field (except if features.can_return_rows_from_bulk_insert=True). Multi-table models are not supported.

async abulk_create(objs, batch_size=None, ignore_conflicts=False, update_conflicts=False, update_fields=None, unique_fields=None)
bulk_update(objs, fields, batch_size=None)

Update the given fields in each of the given objects in the database.

async abulk_update(objs, fields, batch_size=None)
get_or_create(defaults=None, **kwargs)

Look up an object with the given kwargs, creating one if necessary. Return a tuple of (object, created), where created is a boolean specifying whether an object was created.

async aget_or_create(defaults=None, **kwargs)
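The (object, created) contract can be sketched with a plain dict standing in for a table (illustrative helper, not a real implementation):

```python
def get_or_create(store, defaults=None, **kwargs):
    # store: dict standing in for a table; the lookup kwargs form the key
    key = tuple(sorted(kwargs.items()))
    if key in store:
        return store[key], False
    obj = {**kwargs, **(defaults or {})}  # defaults only apply on creation
    store[key] = obj
    return obj, True

table = {}
obj, created = get_or_create(table, defaults={"score": 0}, name="run-1")
# created is True; a second call with the same kwargs returns the same object
obj2, created2 = get_or_create(table, name="run-1")
```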
update_or_create(defaults=None, create_defaults=None, **kwargs)

Look up an object with the given kwargs, updating one with defaults if it exists, otherwise create a new one. Optionally, an object can be created with different values than defaults by using create_defaults. Return a tuple (object, created), where created is a boolean specifying whether an object was created.

async aupdate_or_create(defaults=None, create_defaults=None, **kwargs)
earliest(*fields)
async aearliest(*fields)
latest(*fields)

Return the latest object according to fields (if given) or by the model’s Meta.get_latest_by.

async alatest(*fields)
async afirst()
last()

Return the last object of a query or None if no match is found.

async alast()
in_bulk(id_list=None, *, field_name='pk')

Return a dictionary mapping each of the given IDs to the object with that ID. If id_list isn’t provided, evaluate the entire QuerySet.

async ain_bulk(id_list=None, *, field_name='pk')
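The ID-to-object mapping that in_bulk returns can be sketched as follows (plain-Python stand-in with a hypothetical Row type):

```python
from dataclasses import dataclass

@dataclass
class Row:
    pk: int
    name: str

def in_bulk(rows, id_list=None, field_name="pk"):
    # Index rows by field_name; restrict to id_list when given
    index = {getattr(r, field_name): r for r in rows}
    if id_list is None:
        return index
    return {i: index[i] for i in id_list if i in index}

rows = [Row(1, "a"), Row(2, "b"), Row(3, "c")]
in_bulk(rows, [1, 3])             # only keys 1 and 3
in_bulk(rows, field_name="name")  # keyed by name instead of pk
```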
update(**kwargs)

Update all elements in the current QuerySet, setting all the given fields to the appropriate values.

async aupdate(**kwargs)
exists()

Return True if the QuerySet would have any results, False otherwise.

async aexists()
contains(obj)

Return True if the QuerySet contains the provided obj, False otherwise.

async acontains(obj)
explain(*, format=None, **options)

Runs an EXPLAIN on the SQL query this QuerySet would perform, and returns the results.

async aexplain(*, format=None, **options)
raw(raw_query, params=(), translations=None, using=None)
values(*fields, **expressions)
values_list(*fields, flat=False, named=False)
dates(field_name, kind, order='ASC')

Return a list of date objects representing all available dates for the given field_name, scoped to ‘kind’.

datetimes(field_name, kind, order='ASC', tzinfo=None)

Return a list of datetime objects representing all available datetimes for the given field_name, scoped to ‘kind’.

none()

Return an empty QuerySet.

all()

Return a new QuerySet that is a copy of the current one. This allows a QuerySet to proxy for a model manager in some cases.

filter(*args, **kwargs)

Return a new QuerySet instance with the args ANDed to the existing set.

exclude(*args, **kwargs)

Return a new QuerySet instance with NOT (args) ANDed to the existing set.

complex_filter(filter_obj)

Return a new QuerySet instance with filter_obj added to the filters.

filter_obj can be a Q object or a dictionary of keyword lookup arguments.

This exists to support framework features such as ‘limit_choices_to’, and usually it will be more natural to use other methods.

union(*other_qs, all=False)
intersection(*other_qs)
difference(*other_qs)
select_for_update(nowait=False, skip_locked=False, of=(), no_key=False)

Return a new QuerySet instance that will select objects with a FOR UPDATE lock.

select_related(*fields)

Return a new QuerySet instance that will select related objects.

If fields are specified, they must be ForeignKey fields and only those related objects are included in the selection.

If select_related(None) is called, clear the list.

prefetch_related(*lookups)

Return a new QuerySet instance that will prefetch the specified Many-To-One and Many-To-Many related objects when the QuerySet is evaluated.

When prefetch_related() is called more than once, append to the list of prefetch lookups. If prefetch_related(None) is called, clear the list.

annotate(*args, **kwargs)

Return a query set in which the returned objects have been annotated with extra data or aggregations.

alias(*args, **kwargs)

Return a query set with added aliases for extra data or aggregations.

order_by(*field_names)

Return a new QuerySet instance with the ordering changed.

distinct(*field_names)

Return a new QuerySet instance that will select only distinct results.

extra(select=None, where=None, params=None, tables=None, order_by=None, select_params=None)

Add extra SQL fragments to the query.

reverse()

Reverse the ordering of the QuerySet.

defer(*fields)

Defer the loading of data for certain fields until they are accessed. Add the set of deferred fields to any existing set of deferred fields. The only exception to this is if None is passed in as the only parameter, in which case remove all deferrals.

only(*fields)

Essentially, the opposite of defer(). Only the fields passed into this method and that are not already specified as deferred are loaded immediately when the queryset is evaluated.

using(alias)

Select which database this QuerySet should execute against.

resolve_expression(*args, **kwargs)
class lamindb.models.QuerySet(model=None, query=None, using=None, hints=None)

Sets of records returned by queries.

Implements additional filtering capabilities.

See also

django QuerySet

Examples

>>> ULabel(name="my label").save()
>>> queryset = ULabel.filter(name="my label")
>>> queryset # an instance of QuerySet
property db

Return the database used if this query is executed now.

property ordered

Return True if the QuerySet is ordered – i.e. has an order_by() clause or a default ordering on the model (or is empty).

property query
classmethod as_manager()
get(idlike=None, **expressions)

Query a single record. Raises error if there are more or none.

Return type:

SQLRecord

filter(*queries, **expressions)

Query a set of records.

Return type:

QuerySet

to_dataframe(*, include=None, features=None, limit=100, order_by='-id')

Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries

Parameters:
  • include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).

  • features (str | list[str] | None, default: None) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.

  • limit (int | None, default: 100) – Maximum number of rows to display. If None, includes all results.

  • order_by (str | None, default: '-id') – Field name to order the records by. Prefix with ‘-’ for descending order. Defaults to ‘-id’ to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.

Return type:

DataFrame

Examples

Include the name of the creator:

ln.Record.to_dataframe(include="created_by__name")

Include features:

ln.Artifact.to_dataframe(include="features")

Include selected features:

ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
delete(*args, permanent=None, **kwargs)

Delete all records in the query set.

Parameters:

permanent (bool | None, default: None) – Whether to permanently delete the record (skips trash). Is only relevant for records that have the branch field. If None, uses soft delete for records that have the branch field, hard delete otherwise.

Note

Calling delete() twice on the same queryset does NOT permanently delete in bulk operations. Use permanent=True for actual deletion.

Examples

For any QuerySet object qs, call:

>>> qs.delete()
to_list(field=None)

Populate an (unordered) list with the results.

Note that the order in this list is only meaningful if you ordered the underlying query set with .order_by().

Return type:

list[SQLRecord] | list[str]

Examples

>>> queryset.to_list()  # list of records
>>> queryset.to_list("name")  # list of values
first()

If non-empty, the first result in the query set, otherwise None.

Return type:

SQLRecord | None

Examples

>>> queryset.first()
one()

Exactly one result. Raises error if there are more or none.

Return type:

SQLRecord

one_or_none()

At most one result. Returns it if there is one, otherwise returns None.

Return type:

SQLRecord | None

Examples

>>> ULabel.filter(name="benchmark").one_or_none()
>>> ULabel.filter(name="non existing label").one_or_none()
latest_version()

Filter every version family by latest version.

Return type:

QuerySet

search(string, **kwargs)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field – The field or fields to search. Search all string fields by default.

  • limit – Maximum amount of top results to return.

  • case_sensitive – Whether the match is case sensitive.

Returns:

A sorted DataFrame of search results with a score in column score, or a QuerySet if return_queryset is True.

See also

filter() lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
lookup(field=None, **kwargs)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field – The field to return. If None, returns the whole record.

  • keep – When multiple records are found for a lookup, how to return the records:
    - "first": return the first record.
    - "last": return the last record.
    - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

Look up via auto-complete on the returned object:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
validate(values, field=None, **kwargs)

Validate values against existing values of a string field.

Note that this is strict validation: only exact matches are asserted.

Parameters:
  • values (list[str] | Series | array) – Values that will be validated against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against ontology names.

  • mute – Whether to mute logging.

  • organism – An Organism name or record.

  • source – A bionty.Source record that specifies the version to validate against.

  • strict_source – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.

Returns:

A vector of booleans indicating if an element is validated.

See also

inspect()

Example:

import bionty as bt

bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.validate(gene_symbols, field=bt.Gene.symbol, organism="human")
#> array([ True,  True, False, False])
inspect(values, field=None, **kwargs)

Inspect if values are mappable to a field.

Being mappable means that an exact match exists.

Parameters:
  • values (list[str] | Series | array) – Values that will be checked against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against ontology names.

  • mute – Whether to mute logging.

  • organism – An Organism name or record.

  • source – A bionty.Source record that specifies the version to inspect against.

  • strict_source – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.

See also

validate()

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# inspect gene symbols
gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol, organism="human")
assert result.validated == ["A1CF", "A1BG"]
assert result.non_validated == ["FANCD1", "FANCD20"]
standardize(values, field=None, **kwargs)

Maps input synonyms to standardized names.

Parameters:
  • values (Iterable) – Identifiers that will be standardized.

  • field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.

  • return_field – The field to return. Defaults to field.

  • return_mapper – If True, returns {input_value: standardized_name}.

  • case_sensitive – Whether the mapping is case sensitive.

  • mute – Whether to mute logging.

  • source_aware – Whether to standardize from public source. Defaults to True for BioRecord registries.

  • keep

    When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated: - "first": returns the first mapped standardized name - "last": returns the last mapped standardized name - False: returns all mapped standardized names.

    When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.

    When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.

  • synonyms_field – A field containing the concatenated synonyms.

  • organism – An Organism name or record.

  • source – A bionty.Source record that specifies the version to validate against.

  • strict_source – Determines the validation behavior against records in the registry. - If False, validation will include all records in the registry, ignoring the specified source. - If True, validation will only include records in the registry that are linked to the specified source. Note: this parameter won’t affect validation against public sources.

Returns:

If return_mapper is False – a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.

See also

add_synonym()

Add synonyms.

remove_synonym()

Remove synonyms.

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# standardize gene synonyms
gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.standardize(gene_synonyms)
#> ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
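
The keep semantics described above mirror pandas' duplicate handling. A quick stand-alone illustration using pd.Series.duplicated directly (which the parameter description references; no lamindb call is made here):

```python
import pandas as pd

# Mirror of the `keep` options via pandas' duplicated() directly:
# "first" keeps the first occurrence, "last" keeps the last, and
# False marks every duplicated value.
s = pd.Series(["A1CF", "A1CF", "BRCA2"])
dup_first = s.duplicated(keep="first").tolist()  # [False, True, False]
dup_last = s.duplicated(keep="last").tolist()    # [True, False, False]
dup_all = s.duplicated(keep=False).tolist()      # [True, True, False]
```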
iterator(chunk_size=None)

An iterator over the results from applying this QuerySet to the database. chunk_size must be provided for QuerySets that prefetch related objects. Otherwise, a default chunk_size of 2000 is supplied.
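
Conceptually, chunked iteration works like the pure-Python sketch below (an illustration of the chunk_size behavior, not Django's implementation, which fetches rows from a database cursor):

```python
from itertools import islice

# Yield results in fixed-size chunks, as a QuerySet iterator does
# conceptually with chunk_size.
def iter_chunks(iterable, chunk_size=2000):
    it = iter(iterable)
    while chunk := list(islice(it, chunk_size)):
        yield chunk

chunks = list(iter_chunks(range(5), chunk_size=2))
# chunks == [[0, 1], [2, 3], [4]]
```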

async aiterator(chunk_size=2000)

An asynchronous iterator over the results from applying this QuerySet to the database.

aggregate(*args, **kwargs)

Return a dictionary containing the calculations (aggregation) over the current queryset.

If args is present the expression is passed as a kwarg using the Aggregate object’s default alias.

async aaggregate(*args, **kwargs)
count()

Perform a SELECT COUNT() and return the number of records as an integer.

If the QuerySet is already fully cached, return the length of the cached results set to avoid multiple SELECT COUNT(*) calls.

async acount()
async aget(*args, **kwargs)
create(**kwargs)

Create a new object with the given kwargs, saving it to the database and returning the created object.

async acreate(**kwargs)
bulk_create(objs, batch_size=None, ignore_conflicts=False, update_conflicts=False, update_fields=None, unique_fields=None)

Insert each of the instances into the database. Do not call save() on each of the instances, do not send any pre/post_save signals, and do not set the primary key attribute if it is an autoincrement field (except if features.can_return_rows_from_bulk_insert=True). Multi-table models are not supported.

async abulk_create(objs, batch_size=None, ignore_conflicts=False, update_conflicts=False, update_fields=None, unique_fields=None)
bulk_update(objs, fields, batch_size=None)

Update the given fields in each of the given objects in the database.

async abulk_update(objs, fields, batch_size=None)
get_or_create(defaults=None, **kwargs)

Look up an object with the given kwargs, creating one if necessary. Return a tuple of (object, created), where created is a boolean specifying whether an object was created.

async aget_or_create(defaults=None, **kwargs)
update_or_create(defaults=None, create_defaults=None, **kwargs)

Look up an object with the given kwargs, updating one with defaults if it exists, otherwise create a new one. Optionally, an object can be created with different values than defaults by using create_defaults. Return a tuple (object, created), where created is a boolean specifying whether an object was created.
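
The (object, created) contract shared by get_or_create and update_or_create can be sketched in pure Python, with a dict standing in for the database table (an illustration of the return semantics only, not Django's implementation):

```python
# A dict stands in for the table; keys play the role of a unique field.
store = {}

def get_or_create(name, defaults=None):
    if name in store:
        return store[name], False  # existing object, created=False
    obj = {"name": name, **(defaults or {})}
    store[name] = obj
    return obj, True  # new object, created=True

obj1, created1 = get_or_create("Label1", defaults={"color": "red"})
obj2, created2 = get_or_create("Label1")
# created1 is True, created2 is False, and obj1 is obj2
```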

async aupdate_or_create(defaults=None, create_defaults=None, **kwargs)
earliest(*fields)
async aearliest(*fields)
latest(*fields)

Return the latest object according to fields (if given) or by the model’s Meta.get_latest_by.

async alatest(*fields)
async afirst()
last()

Return the last object of a query or None if no match is found.

async alast()
in_bulk(id_list=None, *, field_name='pk')

Return a dictionary mapping each of the given IDs to the object with that ID. If id_list isn’t provided, evaluate the entire QuerySet.
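
The in_bulk contract, sketched with a list of dicts standing in for the table (illustrative only; field_name plays the role of the pk lookup field):

```python
# A list of dicts stands in for the table rows.
rows = [{"pk": 1, "name": "a"}, {"pk": 2, "name": "b"}, {"pk": 3, "name": "c"}]

def in_bulk(id_list=None, field_name="pk"):
    # With no id_list, map every row; otherwise only the requested ids.
    if id_list is None:
        return {r[field_name]: r for r in rows}
    wanted = set(id_list)
    return {r[field_name]: r for r in rows if r[field_name] in wanted}

result = in_bulk([1, 3])
# result maps 1 and 3 to their rows; 2 is absent
```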

async ain_bulk(id_list=None, *, field_name='pk')
update(**kwargs)

Update all elements in the current QuerySet, setting all the given fields to the appropriate values.

async aupdate(**kwargs)
exists()

Return True if the QuerySet would have any results, False otherwise.

async aexists()
contains(obj)

Return True if the QuerySet contains the provided obj, False otherwise.

async acontains(obj)
explain(*, format=None, **options)

Runs an EXPLAIN on the SQL query this QuerySet would perform, and returns the results.

async aexplain(*, format=None, **options)
raw(raw_query, params=(), translations=None, using=None)
values(*fields, **expressions)
values_list(*fields, flat=False, named=False)
dates(field_name, kind, order='ASC')

Return a list of date objects representing all available dates for the given field_name, scoped to ‘kind’.

datetimes(field_name, kind, order='ASC', tzinfo=None)

Return a list of datetime objects representing all available datetimes for the given field_name, scoped to ‘kind’.

none()

Return an empty QuerySet.

all()

Return a new QuerySet that is a copy of the current one. This allows a QuerySet to proxy for a model manager in some cases.

exclude(*args, **kwargs)

Return a new QuerySet instance with NOT (args) ANDed to the existing set.

complex_filter(filter_obj)

Return a new QuerySet instance with filter_obj added to the filters.

filter_obj can be a Q object or a dictionary of keyword lookup arguments.

This exists to support framework features such as ‘limit_choices_to’, and usually it will be more natural to use other methods.

union(*other_qs, all=False)
intersection(*other_qs)
difference(*other_qs)
select_for_update(nowait=False, skip_locked=False, of=(), no_key=False)

Return a new QuerySet instance that will select objects with a FOR UPDATE lock.

select_related(*fields)

Return a new QuerySet instance that will select related objects.

If fields are specified, they must be ForeignKey fields and only those related objects are included in the selection.

If select_related(None) is called, clear the list.

prefetch_related(*lookups)

Return a new QuerySet instance that will prefetch the specified Many-To-One and Many-To-Many related objects when the QuerySet is evaluated.

When prefetch_related() is called more than once, append to the list of prefetch lookups. If prefetch_related(None) is called, clear the list.

annotate(*args, **kwargs)

Return a query set in which the returned objects have been annotated with extra data or aggregations.

alias(*args, **kwargs)

Return a query set with added aliases for extra data or aggregations.

order_by(*field_names)

Return a new QuerySet instance with the ordering changed.

distinct(*field_names)

Return a new QuerySet instance that will select only distinct results.

extra(select=None, where=None, params=None, tables=None, order_by=None, select_params=None)

Add extra SQL fragments to the query.

reverse()

Reverse the ordering of the QuerySet.

defer(*fields)

Defer the loading of data for certain fields until they are accessed. Add the set of deferred fields to any existing set of deferred fields. The only exception to this is if None is passed in as the only parameter, in which case remove all deferrals.

only(*fields)

Essentially, the opposite of defer(). Only the fields passed into this method and that are not already specified as deferred are loaded immediately when the queryset is evaluated.

using(alias)

Select which database this QuerySet should execute against.

resolve_expression(*args, **kwargs)
class lamindb.models.QueryDB(instance)

Convenient access to QuerySets for every entity in a LaminDB instance.

Parameters:

instance (str) – Instance identifier in format “account/instance” or full instance string.

Examples

Query records from a remote instance:

cellxgene = ln.QueryDB("laminlabs/cellxgene")
artifacts = cellxgene.artifacts.filter(suffix=".h5ad")
records = cellxgene.records.filter(name__startswith="cell")
class lamindb.models.ArtifactSet

Abstract class representing sets of artifacts returned by queries.

This class automatically extends BasicQuerySet and QuerySet when the base model is Artifact.

Examples

>>> artifacts = ln.Artifact.filter(otype="AnnData")
>>> artifacts # an instance of ArtifactQuerySet inheriting from ArtifactSet
load(join='outer', is_run_input=None, **kwargs)

Cache and load to memory.

Returns an in-memory concatenated DataFrame or AnnData object.

Return type:

DataFrame | AnnData

open(engine='pyarrow', is_run_input=None, **kwargs)

Open a dataset for streaming.

Works for pyarrow and polars compatible formats (.parquet, .csv, .ipc etc. files or directories with such files).

Parameters:
  • engine (Literal['pyarrow', 'polars'], default: 'pyarrow') – Which module to use for lazy loading of a dataframe from pyarrow or polars compatible formats.

  • is_run_input (bool | None, default: None) – Whether to track this artifact as run input.

  • **kwargs – Keyword arguments for pyarrow.dataset.dataset or polars.scan_* functions.

Return type:

Dataset | Iterator[LazyFrame]

Notes

For more info, see guide: Slice & stream arrays.

mapped(layers_keys=None, obs_keys=None, obsm_keys=None, obs_filter=None, join='inner', encode_labels=True, unknown_label=None, cache_categories=True, parallel=False, dtype=None, stream=False, is_run_input=None)

Return a map-style dataset.

Returns a pytorch map-style dataset by virtually concatenating AnnData arrays.

By default (stream=False) AnnData arrays are moved into a local cache first.

__getitem__ of the MappedCollection object takes a single integer index and returns a dictionary with the observation data sample for this index from the AnnData objects in the collection. The dictionary has keys for layers_keys (.X is in "X"), obs_keys, obsm_keys (under f"obsm_{key}") and also "_store_idx" for the index of the AnnData object containing this observation sample.

Note

For a guide, see Train a machine learning model on a collection.

This method currently only works for collections or query sets of AnnData artifacts.

Parameters:
  • layers_keys (str | list[str] | None, default: None) – Keys from the .layers slot. layers_keys=None or "X" in the list retrieves .X.

  • obs_keys (str | list[str] | None, default: None) – Keys from the .obs slots.

  • obsm_keys (str | list[str] | None, default: None) – Keys from the .obsm slots.

  • obs_filter (dict[str, str | list[str]] | None, default: None) – Select only observations with these values for the given obs columns. Should be a dictionary with obs column names as keys and filtering values (a string or a list of strings) as values.

  • join (Literal['inner', 'outer'] | None, default: 'inner') – "inner" or "outer" virtual joins. If None is passed, does not join.

  • encode_labels (bool | list[str], default: True) – Encode labels into integers. Can be a list with elements from obs_keys.

  • unknown_label (str | dict[str, str] | None, default: None) – Encode this label to -1. Can be a dictionary with keys from obs_keys if encode_labels=True or from encode_labels if it is a list.

  • cache_categories (bool, default: True) – Enable caching categories of obs_keys for faster access.

  • parallel (bool, default: False) – Enable sampling with multiple processes.

  • dtype (str | None, default: None) – Convert numpy arrays from .X, .layers and .obsm

  • stream (bool, default: False) – Whether to stream data from the array backend.

  • is_run_input (bool | None, default: None) – Whether to track this collection as run input.

Return type:

MappedCollection

Examples

>>> import lamindb as ln
>>> from torch.utils.data import DataLoader
>>> collection = ln.Collection.get(description="my collection")
>>> mapped = collection.mapped(obs_keys=["cell_type", "batch"])
>>> dl = DataLoader(mapped, batch_size=128, shuffle=True)
>>> # also works for query sets of artifacts, '...' represents some filtering condition
>>> # additional filtering on artifacts of the collection
>>> mapped = collection.artifacts.all().filter(...).order_by("-created_at").mapped()
>>> # or directly from a query set of artifacts
>>> mapped = ln.Artifact.filter(..., otype="AnnData").order_by("-created_at").mapped()
class lamindb.models.QueryManager(*args, **kwargs)

Manage queries through fields.

Examples

Populate the .parents ManyToMany relationship (a QueryManager):

ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
labels = ln.Record.filter(name__icontains="label")
label1 = ln.Record.get(name="Label1")
label1.parents.set(labels)

Convert all linked parents to a DataFrame:

label1.parents.to_dataframe()
auto_created = False
creation_counter = 43
property db
use_in_migrations = False

If set to True the manager will be serialized into migrations and will thus be available in e.g. RunPython operations.

classmethod from_queryset(queryset_class, class_name=None)
track_run_input_manager()
to_list(field=None)

Populate a list.

to_dataframe(**kwargs)

Convert to DataFrame.

For **kwargs, see lamindb.models.QuerySet.to_dataframe().

all()

Return a QuerySet of all records.

search(string, **kwargs)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field – The field or fields to search. Search all string fields by default.

  • limit – Maximum amount of top results to return.

  • case_sensitive – Whether the match is case sensitive.

Returns:

A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.

See also

filter() lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
lookup(field=None, **kwargs)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field – The field to return. If None, returns the whole record.

  • keep – When multiple records are found for a lookup, how to return the records. - "first": return the first record. - "last": return the last record. - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

Look up via auto-complete:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
get_queryset()
aaggregate(*args, **kwargs)
abulk_create(objs, batch_size=None, ignore_conflicts=False, update_conflicts=False, update_fields=None, unique_fields=None)
abulk_update(objs, fields, batch_size=None)
acontains(obj)
acount()
acreate(**kwargs)
aearliest(*fields)
aexists()
aexplain(*, format=None, **options)
afirst()
aget(*args, **kwargs)
aget_or_create(defaults=None, **kwargs)
aggregate(*args, **kwargs)

Return a dictionary containing the calculations (aggregation) over the current queryset.

If args is present the expression is passed as a kwarg using the Aggregate object’s default alias.

ain_bulk(id_list=None, *, field_name='pk')
aiterator(chunk_size=2000)

An asynchronous iterator over the results from applying this QuerySet to the database.

alast()
alatest(*fields)
alias(*args, **kwargs)

Return a query set with added aliases for extra data or aggregations.

annotate(*args, **kwargs)

Return a query set in which the returned objects have been annotated with extra data or aggregations.

aupdate(**kwargs)
aupdate_or_create(defaults=None, create_defaults=None, **kwargs)
bulk_create(objs, batch_size=None, ignore_conflicts=False, update_conflicts=False, update_fields=None, unique_fields=None)

Insert each of the instances into the database. Do not call save() on each of the instances, do not send any pre/post_save signals, and do not set the primary key attribute if it is an autoincrement field (except if features.can_return_rows_from_bulk_insert=True). Multi-table models are not supported.

bulk_update(objs, fields, batch_size=None)

Update the given fields in each of the given objects in the database.

complex_filter(filter_obj)

Return a new QuerySet instance with filter_obj added to the filters.

filter_obj can be a Q object or a dictionary of keyword lookup arguments.

This exists to support framework features such as ‘limit_choices_to’, and usually it will be more natural to use other methods.

contains(obj)

Return True if the QuerySet contains the provided obj, False otherwise.

count()

Perform a SELECT COUNT() and return the number of records as an integer.

If the QuerySet is already fully cached, return the length of the cached results set to avoid multiple SELECT COUNT(*) calls.

create(**kwargs)

Create a new object with the given kwargs, saving it to the database and returning the created object.

dates(field_name, kind, order='ASC')

Return a list of date objects representing all available dates for the given field_name, scoped to ‘kind’.

datetimes(field_name, kind, order='ASC', tzinfo=None)

Return a list of datetime objects representing all available datetimes for the given field_name, scoped to ‘kind’.

defer(*fields)

Defer the loading of data for certain fields until they are accessed. Add the set of deferred fields to any existing set of deferred fields. The only exception to this is if None is passed in as the only parameter, in which case remove all deferrals.

difference(*other_qs)
distinct(*field_names)

Return a new QuerySet instance that will select only distinct results.

earliest(*fields)
exclude(*args, **kwargs)

Return a new QuerySet instance with NOT (args) ANDed to the existing set.

exists()

Return True if the QuerySet would have any results, False otherwise.

explain(*, format=None, **options)

Runs an EXPLAIN on the SQL query this QuerySet would perform, and returns the results.

extra(select=None, where=None, params=None, tables=None, order_by=None, select_params=None)

Add extra SQL fragments to the query.

filter(*args, **kwargs)

Return a new QuerySet instance with the args ANDed to the existing set.

first()

Return the first object of a query or None if no match is found.

get(*args, **kwargs)

Perform the query and return a single object matching the given keyword arguments.

get_or_create(defaults=None, **kwargs)

Look up an object with the given kwargs, creating one if necessary. Return a tuple of (object, created), where created is a boolean specifying whether an object was created.

in_bulk(id_list=None, *, field_name='pk')

Return a dictionary mapping each of the given IDs to the object with that ID. If id_list isn’t provided, evaluate the entire QuerySet.

intersection(*other_qs)
iterator(chunk_size=None)

An iterator over the results from applying this QuerySet to the database. chunk_size must be provided for QuerySets that prefetch related objects. Otherwise, a default chunk_size of 2000 is supplied.

last()

Return the last object of a query or None if no match is found.

latest(*fields)

Return the latest object according to fields (if given) or by the model’s Meta.get_latest_by.

none()

Return an empty QuerySet.

only(*fields)

Essentially, the opposite of defer(). Only the fields passed into this method and that are not already specified as deferred are loaded immediately when the queryset is evaluated.

order_by(*field_names)

Return a new QuerySet instance with the ordering changed.

prefetch_related(*lookups)

Return a new QuerySet instance that will prefetch the specified Many-To-One and Many-To-Many related objects when the QuerySet is evaluated.

When prefetch_related() is called more than once, append to the list of prefetch lookups. If prefetch_related(None) is called, clear the list.

raw(raw_query, params=(), translations=None, using=None)
reverse()

Reverse the ordering of the QuerySet.

select_for_update(nowait=False, skip_locked=False, of=(), no_key=False)

Return a new QuerySet instance that will select objects with a FOR UPDATE lock.

select_related(*fields)

Return a new QuerySet instance that will select related objects.

If fields are specified, they must be ForeignKey fields and only those related objects are included in the selection.

If select_related(None) is called, clear the list.

union(*other_qs, all=False)
update(**kwargs)

Update all elements in the current QuerySet, setting all the given fields to the appropriate values.

update_or_create(defaults=None, create_defaults=None, **kwargs)

Look up an object with the given kwargs, updating one with defaults if it exists, otherwise create a new one. Optionally, an object can be created with different values than defaults by using create_defaults. Return a tuple (object, created), where created is a boolean specifying whether an object was created.

using(alias)

Select which database this QuerySet should execute against.

values(*fields, **expressions)
values_list(*fields, flat=False, named=False)
deconstruct()

Return a 5-tuple of the form (as_manager (True), manager_class, queryset_class, args, kwargs).

Raise a ValueError if the manager is dynamically generated.

check(**kwargs)
contribute_to_class(cls, name)
db_manager(using=None, hints=None)

Storage of feature values

class lamindb.models.FeatureValue(*args, **kwargs)

Non-categorical feature values.

Categorical feature values are stored in their respective registries: ULabel, CellType, etc.

Unlike ULabel records, FeatureValue records are grouped by feature rather than by an ontological hierarchy.

Simple fields

value: Any

The JSON-like value.

hash: str

Value hash.

is_locked: bool

Whether the record is locked for edits.

created_at: datetime

Time of creation of record.

Relational fields

branch: Branch

Life cycle state of record.

branch.name can be "main" (the default branch), "trash" (trashed), "archive" (archived), or any other user-created branch, typically planned for merging onto main after review.

space: Space

The space in which the record lives.

created_by: User

Creator of record.

run: Run | None

Run that created record.

feature: Feature | None

The dimension metadata.

runs: Run

Runs annotated with this feature value.

artifacts: Artifact

Artifacts annotated with this feature value.

Class methods

classmethod get_or_create(feature, value)
classmethod filter(*queries, **expressions)

Query records.

Parameters:
  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

See also

Examples

>>> ln.Project(name="my label").save()
>>> ln.Project.filter(name__startswith="my").to_dataframe()
classmethod get(idlike=None, **expressions)

Get a single record.

Parameters:
  • idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

Return type:

SQLRecord

See also

Examples

record = ln.Record.get("FvtpPJLJ")
record = ln.Record.get(name="my-label")
classmethod to_dataframe(include=None, features=False, limit=100)

Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries

Parameters:
  • include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).

  • features (bool | list[str], default: False) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.

  • limit (int, default: 100) – Maximum number of rows to display. If None, includes all results.

  • order_by – Field name to order the records by. Prefix with ‘-’ for descending order. Defaults to ‘-id’ to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.

Return type:

DataFrame

Examples

Include the name of the creator:

ln.Record.to_dataframe(include="created_by__name")

Include features:

ln.Artifact.to_dataframe(include="features")

Include selected features:

ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
classmethod search(string, *, field=None, limit=20, case_sensitive=False)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum amount of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.

See also

filter() lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
classmethod lookup(field=None, return_field=None)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

  • keep – When multiple records are found for a lookup, how to return the records. - "first": return the first record. - "last": return the last record. - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

Look up via auto-complete:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
classmethod connect(instance)

Query a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

ln.Record.connect("account_handle/instance_name").search("label7", field="name")

Methods

restore()

Restore from trash onto the main branch.

Does not restore descendant records if the record is HasType with is_type = True.

Return type:

None

delete(permanent=None, **kwargs)

Delete record.

If record is HasType with is_type = True, deletes all descendant records, too.

Parameters:

permanent (bool | None, default: None) – Whether to permanently delete the record (skips trash). If None, performs soft delete if the record is not already in the trash.

Return type:

None

Examples

For any SQLRecord object record, call:

>>> record.delete()
save(*args, **kwargs)

Save.

Always saves to the default database.

Return type:

TypeVar(T, bound= SQLRecord)

refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

async arefresh_from_db(using=None, fields=None, from_queryset=None)

Utility classes

class lamindb.models.LazyArtifact(suffix, overwrite_versions, **kwargs)

Lazy artifact for streaming to auto-generated internal paths.

This is needed when it is desirable to stream to a lamindb auto-generated internal path and register the path as an artifact (see Artifact).

This object creates a real artifact on .save() with the provided arguments.

Parameters:
  • suffix (str) – The suffix for the auto-generated internal path

  • overwrite_versions (bool) – Whether to overwrite versions.

  • **kwargs – Keyword arguments for the artifact to be created.

Examples

Create a lazy artifact, write to the path and save to get a real artifact:

lazy = ln.Artifact.from_lazy(suffix=".zarr", overwrite_versions=True, key="mydata.zarr")
zarr.open(lazy.path, mode="w")["test"] = np.array(["test"]) # stream to the path
artifact = lazy.save()
property path: UPath
save(upload=None, **kwargs)
Return type:

Artifact

class lamindb.models.SQLRecordList(records)

Is ordered, can’t be queried, but has .to_dataframe().

to_dataframe()
Return type:

DataFrame

to_list(field)
Return type:

list[str]

one()

Exactly one result. Throws error if there are more or none.

Return type:

TypeVar(T)

save()

Save all records to the database.

Return type:

SQLRecordList[TypeVar(T)]

append(item)
insert(i, item)
pop(i=-1)
remove(item)
clear()
copy()
count(item)
index(item, *args)
reverse()
sort(*args, **kwds)
extend(other)
class lamindb.models.InspectResult(validated_df, validated, nonvalidated, frac_validated, n_empty, n_unique)

Result of inspect.

An InspectResult object of calls such as inspect().

property df: DataFrame

A DataFrame indexed by values with a boolean __validated__ column.

property frac_validated: float

Fraction of items that were validated.

property n_empty: int

Number of empty items.

property n_unique: int

Number of unique items.

property non_validated: list[str]

List of items that did not pass validate().

This list can be used to remove any non-validated values such as genes that do not map against the specified source.

property synonyms_mapper: dict

Synonyms mapper dictionary.

Such a dictionary maps the actual values to their synonyms which can be used to rename values accordingly.

Examples

>>> markers = pd.DataFrame(index=["KI67","CCR7"])
>>> synonyms_mapper = bt.CellMarker.standardize(markers.index, return_mapper=True)

#> {'KI67': 'Ki67', 'CCR7': 'Ccr7'}
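
As the description notes, such a mapper can be used to rename values accordingly. A sketch with pandas, hard-coding the mapper shown above rather than querying bionty:

```python
import pandas as pd

# Apply a synonyms mapper to rename index values; the mapper here is
# hard-coded for illustration rather than produced by standardize().
mapper = {"KI67": "Ki67", "CCR7": "Ccr7"}
markers = pd.DataFrame(index=["KI67", "CCR7"])
markers = markers.rename(index=mapper)
# markers.index is now ['Ki67', 'Ccr7']
```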

property validated: list[str]

List of items that passed validate().
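
One plausible reading of how these summary properties relate, sketched in pure Python (illustrative only, not lamindb's implementation; the exact empty-value and deduplication handling may differ):

```python
# Toy input: one empty string and one duplicate.
values = ["A1CF", "A1BG", "", "FANCD20", "A1CF"]
validated_set = {"A1CF", "A1BG"}  # assume these passed validation

nonempty = [v for v in values if v]
n_empty = len(values) - len(nonempty)       # empty items
unique = set(nonempty)
n_unique = len(unique)                      # unique non-empty items
validated = sorted(v for v in unique if v in validated_set)
non_validated = sorted(v for v in unique if v not in validated_set)
frac_validated = len(validated) / n_unique  # fraction validated
```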

class lamindb.models.ValidateFields
class lamindb.models.SchemaOptionals(schema)

Manage and access optional features in a schema.

get_uids()

Get the uids of the optional features.

Does not need an additional query to the database, while get() does.

Return type:

list[str]

get()

Get the optional features.

Return type:

QuerySet

set(features)

Set the optional features (overwrites whichever schemas are currently optional).

Return type:

None

remove(features)

Make one or multiple features required by removing them from the set of optional features.

Return type:

None

add(features)

Make one or multiple features optional by adding them to the set of optional features.

Return type:

None
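
The three mutating methods have set semantics, sketched here with a plain Python set (illustrative only; the feature names are hypothetical, and the real methods operate on Feature records in the database):

```python
# set() overwrites the current optionals, add() marks features optional,
# and remove() makes them required again.
optionals = set()
optionals = {"donor_age"}        # set(["donor_age"])
optionals |= {"perturbation"}    # add(["perturbation"])
optionals -= {"donor_age"}       # remove(["donor_age"]): required again
```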