lamindb.models

Models library.

Feature and label managers

class lamindb.models.FeatureManager(host)

Feature manager.

property slots: dict[str, Schema]

Features by schema slot.

Example:

artifact.features.slots
#> {'var': <Schema: var>, 'obs': <Schema: obs>}
describe(return_str=False)

Pretty print features.

This is what artifact.describe() calls under the hood.

Return type:

str | None

get_values(external_only=False)

Get features as a dictionary.

Includes annotation with internal and external feature values.

Parameters:

external_only (bool, default: False) – If True, only return external feature annotations.

Return type:

dict[str, Any]
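An illustrative sketch of the returned dictionary (the feature names and values below are hypothetical, not part of the API):

```python
# hypothetical return value of artifact.features.get_values()
values = {
    "experiment": "EXP-001",   # a string feature value
    "temperature": 21.5,       # a numerical feature value
    "is_validated": True,      # a boolean feature value
}

# keys are feature names, values are the annotated feature values
assert all(isinstance(k, str) for k in values)
```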

add_values(values, feature_field=FieldAttr(Feature.name), schema=None)

Add values for features.

Parameters:
  • values (dict[str, str | int | float | bool]) – A dictionary of keys (features) & values (labels, strings, numbers, booleans, datetimes, etc.). If a value is None, it will be skipped.

  • feature_field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a registry to map the keys of the values dictionary.

  • schema (Schema, default: None) – Schema to validate against.

Return type:

None
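A minimal sketch of a values dictionary (the feature names are hypothetical; the commented call assumes a saved artifact whose keys exist as Feature records):

```python
import datetime

# keys map to Feature names; values may be labels, strings, numbers,
# booleans, or datetimes -- None values are skipped
values = {
    "experiment": "EXP-001",
    "temperature": 21.5,
    "is_validated": True,
    "date_of_study": datetime.date(2024, 12, 1),
    "operator": None,  # skipped
}

# artifact.features.add_values(values)  # requires a LaminDB instance

# only non-None entries are annotated
non_skipped = {k: v for k, v in values.items() if v is not None}
assert len(non_skipped) == 4
```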

set_values(values, feature_field=FieldAttr(Feature.name), schema=None)

Set values for features.

Like add_values, but first removes all existing external feature annotations.

Parameters:
  • values (dict[str, str | int | float | bool]) – A dictionary of keys (features) & values (labels, strings, numbers, booleans, datetimes, etc.). If a value is None, it will be skipped.

  • feature_field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a registry to map the keys of the values dictionary.

  • schema (Schema, default: None) – Schema to validate against.

Return type:

None

remove_values(feature=None, *, value=None)

Remove values for features.

Parameters:
  • feature (str | Feature | list[str | Feature], default: None) – Indicate one or several features for which to remove values. If None, values for all external features will be removed.

  • value (Any | None, default: None) – An optional value to restrict removal to a single value.

Return type:

None

make_external(feature)

Make a feature external.

This removes a feature from artifact.feature_sets and thereby no longer marks it as a dataset feature.

Parameters:

feature (Feature) – A feature.

Return type:

None

class lamindb.models.LabelManager(host)

Label manager.

This allows managing untyped labels (ULabel) and arbitrary typed labels (e.g., CellLine), and associating labels with features.

describe(return_str=True)

Describe the labels.

Return type:

str

add(records, feature=None)

Add one or several labels and associate them with a feature.

Parameters:
  • records – One or several label records to add.

  • feature (Feature | None, default: None) – The feature under which to associate the labels.

Return type:

None

get(feature, mute=False, flat_names=False)

Get labels given a feature.

Parameters:
  • feature (Feature) – Feature under which labels are grouped.

  • mute (bool, default: False) – Show no logging.

  • flat_names (bool, default: False) – Flatten list to names rather than returning records.

Return type:

QuerySet | dict[str, QuerySet] | list

add_from(data, transfer_logs=None)

Add labels from an artifact or collection to another artifact or collection.

Return type:

None

Examples

artifact1 = ln.Artifact(pd.DataFrame(index=[0, 1])).save()
artifact2 = ln.Artifact(pd.DataFrame(index=[2, 3])).save()
records = ln.Record.from_values(["Label1", "Label2"], field="name").save()
labels = ln.Record.filter(name__icontains="label")
artifact1.records.set(labels)
artifact2.labels.add_from(artifact1)
make_external(label)

Make a label external, aka dissociate label from internal features.

Parameters:

label (SQLRecord) – Label record to make external.

Return type:

None

Registry base classes

class lamindb.models.BaseSQLRecord(*args, **kwargs)

Basic metadata record.

It has the same methods as SQLRecord, but doesn’t have the additional fields.

It’s mainly used for link models (IsLink) and similar helper classes.

classmethod filter(*queries, **expressions)

Query records.

Parameters:
  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

See also

Examples

>>> ln.Project(name="my label").save()
>>> ln.Project.filter(name__startswith="my").to_dataframe()
classmethod get(idlike=None, **expressions)

Get a single record.

Parameters:
  • idlike (int | str | None, default: None) – Either a uid stub, a full uid, or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

Return type:

SQLRecord

See also

Examples

record = ln.Record.get("FvtpPJLJ")
record = ln.Record.get(name="my-label")
classmethod to_dataframe(include=None, features=False, limit=100, order_by='-id')

Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries

Parameters:
  • include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).

  • features (bool | list[str], default: False) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.

  • limit (int, default: 100) – Maximum number of rows to display. If None, includes all results.

  • order_by (str | None, default: '-id') – Field name to order the records by. Prefix with ‘-’ for descending order. Defaults to ‘-id’ to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.

Return type:

DataFrame

Examples

Include the name of the creator:

ln.Record.to_dataframe(include="created_by__name")

Include features:

ln.Artifact.to_dataframe(include="features")

Include selected features:

ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
classmethod search(string, *, field=None, limit=20, case_sensitive=False)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum amount of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A QuerySet of matching records, sorted by relevance.

See also

filter() lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
classmethod lookup(field=None, return_field=None, keep='first')

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

  • keep (Literal['first', 'last', False], default: 'first') – When multiple records are found for a lookup, how to return the records. - "first": return the first record. - "last": return the last record. - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

Look up via auto-complete on the returned object:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
classmethod connect(instance)

Query a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

ln.Record.connect("account_handle/instance_name").search("label7", field="name")
save(*args, **kwargs)

Save.

Always saves to the default database.

Return type:

TypeVar(T, bound= SQLRecord)

delete(permanent=None)

Delete.

Parameters:

permanent (bool | None, default: None) – Passing False raises an error: basic records have no trash, so soft delete is impossible.

Return type:

None

refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

async arefresh_from_db(using=None, fields=None, from_queryset=None)

Asynchronous version of refresh_from_db().
class lamindb.models.SQLRecord(*args, **kwargs)

Metadata record.

Every SQLRecord is a data model that comes with a registry in form of a SQL table in your database.

Sub-classing SQLRecord creates a new registry while instantiating a SQLRecord creates a new record.

Example:

from lamindb import SQLRecord, fields

# sub-classing `SQLRecord` creates a new registry
class Experiment(SQLRecord):
    name: str = fields.CharField()

# instantiating `Experiment` creates a record `experiment`
experiment = Experiment(name="my experiment")

# you can save the record to the database
experiment.save()

# `Experiment` refers to the registry, which you can query
df = Experiment.filter(name__startswith="my ").to_dataframe()

SQLRecord’s metaclass is Registry.

SQLRecord inherits from Django’s Model class. Why does LaminDB call it SQLRecord and not Model? Because the term SQLRecord can’t be confused with statistical, machine-learning, or biological models.

is_locked: bool

Whether the record is locked for edits.

branch: Branch

Life cycle state of record.

branch.name can be "main" (the default branch), "trash" (trashed), "archive" (archived), or any other user-created branch, typically intended for merging onto main after review.

space: Space

The space in which the record lives.

classmethod filter(*queries, **expressions)

Query records.

Parameters:
  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

See also

Examples

>>> ln.Project(name="my label").save()
>>> ln.Project.filter(name__startswith="my").to_dataframe()
classmethod get(idlike=None, **expressions)

Get a single record.

Parameters:
  • idlike (int | str | None, default: None) – Either a uid stub, a full uid, or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

Return type:

SQLRecord

See also

Examples

record = ln.Record.get("FvtpPJLJ")
record = ln.Record.get(name="my-label")
classmethod to_dataframe(include=None, features=False, limit=100, order_by='-id')

Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries

Parameters:
  • include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).

  • features (bool | list[str], default: False) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.

  • limit (int, default: 100) – Maximum number of rows to display. If None, includes all results.

  • order_by (str | None, default: '-id') – Field name to order the records by. Prefix with ‘-’ for descending order. Defaults to ‘-id’ to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.

Return type:

DataFrame

Examples

Include the name of the creator:

ln.Record.to_dataframe(include="created_by__name")

Include features:

ln.Artifact.to_dataframe(include="features")

Include selected features:

ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
classmethod search(string, *, field=None, limit=20, case_sensitive=False)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum amount of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A QuerySet of matching records, sorted by relevance.

See also

filter() lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
classmethod lookup(field=None, return_field=None, keep='first')

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

  • keep (Literal['first', 'last', False], default: 'first') – When multiple records are found for a lookup, how to return the records. - "first": return the first record. - "last": return the last record. - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

Look up via auto-complete on the returned object:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
classmethod connect(instance)

Query a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

ln.Record.connect("account_handle/instance_name").search("label7", field="name")
restore()

Restore from trash onto the main branch.

Does not restore descendant records if the record is HasType with is_type = True.

Return type:

None

delete(permanent=None, **kwargs)

Delete record.

If record is HasType with is_type = True, deletes all descendant records, too.

Parameters:

permanent (bool | None, default: None) – Whether to permanently delete the record (skips trash). If None, performs soft delete if the record is not already in the trash.

Return type:

None

Examples

For any SQLRecord object record, call:

>>> record.delete()
save(*args, **kwargs)

Save.

Always saves to the default database.

Return type:

TypeVar(T, bound= SQLRecord)

refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

async arefresh_from_db(using=None, fields=None, from_queryset=None)

Asynchronous version of refresh_from_db().
class lamindb.models.Registry(name, bases, attrs, **kwargs)

Metaclass for SQLRecord.

Each Registry object is a SQLRecord class and corresponds to a table in the metadata SQL database.

You work with Registry objects whenever you use class methods of SQLRecord.

You call any subclass of SQLRecord a “registry” and their objects “records”. A SQLRecord object corresponds to a row in the SQL table.

If you want to create a new registry, you sub-class SQLRecord.

Example:

from lamindb import SQLRecord, fields

# sub-classing `SQLRecord` creates a new registry
class Experiment(SQLRecord):
    name: str = fields.CharField()

# instantiating `Experiment` creates a record `experiment`
experiment = Experiment(name="my experiment")

# you can save the record to the database
experiment.save()

# `Experiment` refers to the registry, which you can query
df = Experiment.filter(name__startswith="my ").to_dataframe()

Note: Registry inherits from Django’s ModelBase.

lookup(field=None, return_field=None, keep='first')

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

  • keep (Literal['first', 'last', False], default: 'first') – When multiple records are found for a lookup, how to return the records. - "first": return the first record. - "last": return the last record. - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

Look up via auto-complete on the returned object:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
filter(*queries, **expressions)

Query records.

Parameters:
  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

See also

Examples

>>> ln.Project(name="my label").save()
>>> ln.Project.filter(name__startswith="my").to_dataframe()
get(idlike=None, **expressions)

Get a single record.

Parameters:
  • idlike (int | str | None, default: None) – Either a uid stub, a full uid, or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

Return type:

TypeVar(T, bound= SQLRecord)

See also

Examples

record = ln.Record.get("FvtpPJLJ")
record = ln.Record.get(name="my-label")
to_dataframe(*, include=None, features=None, limit=100, order_by='-id')

Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries

Parameters:
  • include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).

  • features (str | list[str] | None, default: None) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.

  • limit (int | None, default: 100) – Maximum number of rows to display. If None, includes all results.

  • order_by (str | None, default: '-id') – Field name to order the records by. Prefix with ‘-’ for descending order. Defaults to ‘-id’ to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.

Return type:

DataFrame

Examples

Include the name of the creator:

ln.Record.to_dataframe(include="created_by__name")

Include features:

ln.Artifact.to_dataframe(include="features")

Include selected features:

ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
search(string, *, field=None, limit=20, case_sensitive=False)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum amount of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A QuerySet of matching records, sorted by relevance.

See also

filter() lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
connect(instance)

Query a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

ln.Record.connect("account_handle/instance_name").search("label7", field="name")

Mixins for registries

class lamindb.models.IsVersioned(*db_args)

Base class for versioned models.

property pk
property stem_uid: str

Universal id characterizing the version family.

The full uid of a record is obtained via concatenating the stem uid and version information:

stem_uid = random_base62(n_char)  # a random base62 sequence of length 12 (transform) or 16 (artifact, collection)
version_uid = "0000"  # an auto-incrementing 4-digit base62 number
uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
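The uid scheme above can be sketched in plain Python (random_base62 here is a stand-in illustration, not the LaminDB implementation):

```python
import secrets
import string

BASE62 = string.digits + string.ascii_letters  # 62 characters

def random_base62(n_char: int) -> str:
    """Return a random base62 string of length n_char."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))

stem_uid = random_base62(16)  # 16 chars for artifact/collection, 12 for transform
version_uid = "0000"          # auto-incrementing 4-digit base62 counter
uid = f"{stem_uid}{version_uid}"

assert len(uid) == 20
assert all(c in BASE62 for c in uid)
```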
property versions: QuerySet

Lists all records of the same version family.

>>> new_artifact = ln.Artifact(df2, revises=artifact).save()
>>> new_artifact.versions
refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

async arefresh_from_db(using=None, fields=None, from_queryset=None)

Asynchronous version of refresh_from_db().
save(*args, force_insert=False, force_update=False, using=None, update_fields=None)

Save the current instance. Override this in a subclass if you want to control the saving process.

The ‘force_insert’ and ‘force_update’ parameters can be used to insist that the “save” must be an SQL insert or update (or equivalent for non-SQL backends), respectively. Normally, they should not be set.

delete(using=None, keep_parents=False)
class lamindb.models.HasType

Mixin for registries that have a hierarchical type assigned.

Such registries have a .type foreign key pointing to themselves.

A type hence allows hierarchically grouping records under types.

For instance, using the example of ln.Record:

experiment_type = ln.Record(name="Experiment", is_type=True).save()
experiment1 = ln.Record(name="Experiment 1", type=experiment_type).save()
experiment2 = ln.Record(name="Experiment 2", type=experiment_type).save()
query_types()

Query types of a record recursively.

While .type retrieves the type, this method retrieves all super types of that type:

# create a type hierarchy
type1 = ln.Record(name="Type1", is_type=True).save()
type2 = ln.Record(name="Type2", is_type=True, type=type1).save()
type3 = ln.Record(name="Type3", is_type=True, type=type2).save()

# create a record with type3
record = ln.Record(name="My record", type=type3).save()

# Query super types
super_types = record.query_types()
assert super_types[0] == type3
assert super_types[1] == type2
assert super_types[2] == type1
Return type:

SQLRecordList

class lamindb.models.HasParents

Base class for hierarchical registries (ontologies).

view_parents(field=None, with_children=False, distance=5)

View parents in an ontology.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – Field to display on graph

  • with_children (bool, default: False) – Whether to also show children.

  • distance (int, default: 5) – Maximum distance still shown.

Ontological hierarchies: ULabel (project & sub-project), CellType (cell type & subtype).

Examples

>>> import bionty as bt
>>> bt.Tissue.from_source(name="subsegmental bronchus").save()
>>> record = bt.Tissue.get(name="respiratory tube")
>>> record.view_parents()
>>> record.view_parents(with_children=True)
view_children(field=None, distance=5)

View children in an ontology.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – Field to display on graph

  • distance (int, default: 5) – Maximum distance still shown.

Ontological hierarchies: ULabel (project & sub-project), CellType (cell type & subtype).

Examples

>>> import bionty as bt
>>> bt.Tissue.from_source(name="subsegmental bronchus").save()
>>> record = bt.Tissue.get(name="respiratory tube")
>>> record.view_children()
query_parents()

Query parents in an ontology.

Return type:

QuerySet

query_children()

Query children in an ontology.

Return type:

QuerySet

class lamindb.models.CanCurate

Base class providing SQLRecord-based validation.

classmethod inspect(values, field=None, *, mute=False, organism=None, source=None, from_source=True, strict_source=False)

Inspect if values are mappable to a field.

Being mappable means that an exact match exists.

Parameters:
  • values (list[str] | Series | array) – Values that will be checked against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.

  • mute (bool, default: False) – Whether to mute logging.

  • organism (str | SQLRecord | None, default: None) – An Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record that specifies the version to inspect against.

  • strict_source (bool, default: False) – Determines the validation behavior against records in the registry. - If False, validation will include all records in the registry, ignoring the specified source. - If True, validation will only include records in the registry that are linked to the specified source. Note: this parameter won’t affect validation against public sources.

Return type:

bionty.base.dev.InspectResult

See also

validate()

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# inspect gene symbols
gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol, organism="human")
assert result.validated == ["A1CF", "A1BG"]
assert result.non_validated == ["FANCD1", "FANCD20"]
classmethod validate(values, field=None, *, mute=False, organism=None, source=None, strict_source=False)

Validate values against existing values of a string field.

Note that this is strict validation: only exact matches count as validated.

Parameters:
  • values (list[str] | Series | array) – Values that will be validated against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.

  • mute (bool, default: False) – Whether to mute logging.

  • organism (str | SQLRecord | None, default: None) – An Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record that specifies the version to validate against.

  • strict_source (bool, default: False) – Determines the validation behavior against records in the registry. - If False, validation will include all records in the registry, ignoring the specified source. - If True, validation will only include records in the registry that are linked to the specified source. Note: this parameter won’t affect validation against public sources.

Return type:

ndarray

Returns:

A vector of booleans indicating if an element is validated.

See also

inspect()

Example:

import bionty as bt

bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.validate(gene_symbols, field=bt.Gene.symbol, organism="human")
#> array([ True,  True, False, False])
classmethod from_values(values, field=None, create=False, organism=None, source=None, mute=False)

Bulk create validated records by parsing values for an identifier such as a name or an ID.

Parameters:
  • values (list[str] | Series | array) – A list of values for an identifier, e.g. ["name1", "name2"].

  • field (str | DeferredAttribute | None, default: None) – A SQLRecord field to look up, e.g., bt.CellMarker.name.

  • create (bool, default: False) – Whether to create records if they don’t exist.

  • organism (SQLRecord | str | None, default: None) – A bionty.Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record to validate against to create records for.

  • mute (bool, default: False) – Whether to mute logging.

Return type:

SQLRecordList

Returns:

A list of validated records. For bionty registries, knowledge-coupled records are returned as well.

Notes

For more info, see tutorial: Manage biological ontologies.

Example:

import bionty as bt

# bulk creating from non-validated values logs warnings & returns an empty list
ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"])
assert len(ulabels) == 0

# with create=True, records are created for the non-validated values
ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"], create=True).save()
assert len(ulabels) == 3

# Bulk create records from public reference
bt.CellType.from_values(["T cell", "B cell"]).save()
classmethod standardize(values, field=None, *, return_field=None, return_mapper=False, case_sensitive=False, mute=False, source_aware=True, keep='first', synonyms_field='synonyms', organism=None, source=None, strict_source=False)

Maps input synonyms to standardized names.

Parameters:
  • values (Iterable) – Identifiers that will be standardized.

  • field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. Defaults to field.

  • return_mapper (bool, default: False) – If True, returns {input_value: standardized_name}.

  • case_sensitive (bool, default: False) – Whether the mapping is case sensitive.

  • mute (bool, default: False) – Whether to mute logging.

  • source_aware (bool, default: True) – Whether to standardize from public source. Defaults to True for BioRecord registries.

  • keep (Literal['first', 'last', False], default: 'first') –

    When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated:
    - "first": returns the first mapped standardized name
    - "last": returns the last mapped standardized name
    - False: returns all mapped standardized names.

    When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.

    When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.

  • synonyms_field (str, default: 'synonyms') – A field containing the concatenated synonyms.

  • organism (str | SQLRecord | None, default: None) – An Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record that specifies the version to validate against.

  • strict_source (bool, default: False) – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.

Return type:

list[str] | dict[str, str]

Returns:

If return_mapper is False – a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.

See also

add_synonym()

Add synonyms.

remove_synonym()

Remove synonyms.

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# standardize gene synonyms
gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.standardize(gene_synonyms)
#> ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
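The keep semantics above can be sketched in plain Python, independent of any registry (hypothetical mapping data; `standardize_sketch` is an illustrative helper, not part of lamindb):

```python
def standardize_sketch(values, mapping, keep="first"):
    # mapping: synonym -> list of candidate standardized names
    result = []
    for value in values:
        candidates = mapping.get(value, [value])  # unmapped values pass through
        if keep == "first":
            result.append(candidates[0])
        elif keep == "last":
            result.append(candidates[-1])
        else:  # keep=False: keep all matches, nested when ambiguous
            result.append(candidates if len(candidates) > 1 else candidates[0])
    return result

mapping = {"FANCD1": ["BRCA2"], "AMBIG": ["GENE1", "GENE2"]}
print(standardize_sketch(["A1CF", "FANCD1", "AMBIG"], mapping))
#> ['A1CF', 'BRCA2', 'GENE1']
print(standardize_sketch(["AMBIG"], mapping, keep=False))
#> [['GENE1', 'GENE2']]
```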
add_synonym(synonym, force=False, save=None)

Add synonyms to a record.

Parameters:
  • synonym (str | list[str] | Series | array) – The synonyms to add to the record.

  • force (bool, default: False) – Whether to add synonyms even if they are already synonyms of other records.

  • save (bool | None, default: None) – Whether to save the record to the database.

See also

remove_synonym()

Remove synonyms.

Example:

import bionty as bt

# save "T cell" record
record = bt.CellType.from_source(name="T cell").save()
record.synonyms
#> "T-cell|T lymphocyte|T-lymphocyte"

# add a synonym
record.add_synonym("T cells")
record.synonyms
#> "T cells|T-cell|T-lymphocyte|T lymphocyte"
remove_synonym(synonym)

Remove synonyms from a record.

Parameters:

synonym (str | list[str] | Series | array) – The synonym values to remove.

See also

add_synonym()

Add synonyms.

Example:

import bionty as bt

# save "T cell" record
record = bt.CellType.from_source(name="T cell").save()
record.synonyms
#> "T-cell|T lymphocyte|T-lymphocyte"

# remove a synonym
record.remove_synonym("T-cell")
record.synonyms
#> "T lymphocyte|T-lymphocyte"
set_abbr(value)

Set value for abbr field and add to synonyms.

Parameters:

value (str) – A value for an abbreviation.

See also

add_synonym()

Example:

import bionty as bt

# save an experimental factor record
scrna = bt.ExperimentalFactor.from_source(name="single-cell RNA sequencing").save()
assert scrna.abbr is None
assert scrna.synonyms == "single-cell RNA-seq|single-cell transcriptome sequencing|scRNA-seq|single cell RNA sequencing"

# set abbreviation
scrna.set_abbr("scRNA")
assert scrna.abbr == "scRNA"
# synonyms are updated
assert scrna.synonyms == "scRNA|single-cell RNA-seq|single cell RNA sequencing|single-cell transcriptome sequencing|scRNA-seq"
class lamindb.models.TracksRun
class lamindb.models.TracksRun(*db_args)

Base class tracking latest run, creating user, and created_at timestamp.

Meta = <class 'lamindb.models.run.TracksRun.Meta'>
created_by: User

Creator of record.

created_by_id
property pk
run: Run | None

Run that created record.

run_id
refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

async arefresh_from_db(using=None, fields=None, from_queryset=None)
save(*args, force_insert=False, force_update=False, using=None, update_fields=None)

Save the current instance. Override this in a subclass if you want to control the saving process.

The ‘force_insert’ and ‘force_update’ parameters can be used to insist that the “save” must be an SQL insert or update (or equivalent for non-SQL backends), respectively. Normally, they should not be set.

delete(using=None, keep_parents=False)
class lamindb.models.TracksUpdates
class lamindb.models.TracksUpdates(*db_args)

Base class tracking previous runs and updated_at timestamp.

Meta = <class 'lamindb.models.run.TracksUpdates.Meta'>
property pk
refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

async arefresh_from_db(using=None, fields=None, from_queryset=None)
save(*args, force_insert=False, force_update=False, using=None, update_fields=None)

Save the current instance. Override this in a subclass if you want to control the saving process.

The ‘force_insert’ and ‘force_update’ parameters can be used to insist that the “save” must be an SQL insert or update (or equivalent for non-SQL backends), respectively. Normally, they should not be set.

delete(using=None, keep_parents=False)

Query sets & managers

class lamindb.models.BasicQuerySet(model=None, query=None, using=None, hints=None)

Sets of records returned by queries.

See also

django QuerySet

Examples

Any filter statement produces a query set:

queryset = Registry.filter(name__startswith="keyword")
property db

Return the database used if this query is executed now.

property ordered

Return True if the QuerySet is ordered – i.e. has an order_by() clause or a default ordering on the model (or is empty).

property query
classmethod as_manager()
to_dataframe(*, include=None, features=None, limit=100, order_by='-id')

Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries

Parameters:
  • include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).

  • features (str | list[str] | None, default: None) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.

  • limit (int | None, default: 100) – Maximum number of rows to display. If None, includes all results.

  • order_by (str | None, default: '-id') – Field name to order the records by. Prefix with ‘-’ for descending order. Defaults to ‘-id’ to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.

Return type:

DataFrame

Examples

Include the name of the creator:

ln.Record.to_dataframe(include="created_by__name")

Include features:

ln.Artifact.to_dataframe(include="features")

Include selected features:

ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
delete(*args, permanent=None, **kwargs)

Delete all records in the query set.

Parameters:

permanent (bool | None, default: None) – Whether to permanently delete the record (skips trash). Is only relevant for records that have the branch field. If None, uses soft delete for records that have the branch field, hard delete otherwise.

Note

Calling delete() twice on the same queryset does NOT permanently delete in bulk operations. Use permanent=True for actual deletion.

Examples

For any QuerySet object qs, call:

>>> qs.delete()
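How the permanent default resolves can be sketched as follows (plain Python, no database required; `resolve_permanent` is an illustrative helper, not part of lamindb):

```python
def resolve_permanent(permanent, has_branch_field):
    # permanent=None resolves per the docstring: soft delete (trash) for
    # records with a `branch` field, hard delete otherwise
    if permanent is not None:
        return permanent
    return not has_branch_field

assert resolve_permanent(None, has_branch_field=True) is False   # soft delete
assert resolve_permanent(None, has_branch_field=False) is True   # hard delete
assert resolve_permanent(True, has_branch_field=True) is True    # forced hard delete
```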
to_list(field=None)

Populate an (unordered) list with the results.

Note that the order in this list is only meaningful if you ordered the underlying query set with .order_by().

Return type:

list[SQLRecord] | list[str]

Examples

>>> queryset.to_list()  # list of records
>>> queryset.to_list("name")  # list of values
first()

If non-empty, the first result in the query set, otherwise None.

Return type:

SQLRecord | None

Examples

>>> queryset.first()
one()

Exactly one result. Raises error if there are more or none.

Return type:

SQLRecord

one_or_none()

At most one result. Returns it if there is one, otherwise returns None.

Return type:

SQLRecord | None

Examples

>>> ULabel.filter(name="benchmark").one_or_none()
>>> ULabel.filter(name="non existing label").one_or_none()
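The contracts of one() and one_or_none() reduce to the following (plain-Python sketch over an already-evaluated result list; the helper names are illustrative, not lamindb API):

```python
def one(results):
    # Exactly one result; raise otherwise
    if len(results) != 1:
        raise ValueError(f"expected exactly one result, got {len(results)}")
    return results[0]

def one_or_none(results):
    # At most one result; None when empty, raise when ambiguous
    if len(results) > 1:
        raise ValueError(f"expected at most one result, got {len(results)}")
    return results[0] if results else None

assert one(["benchmark"]) == "benchmark"
assert one_or_none([]) is None
```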
latest_version()

Filter every version family by latest version.

Return type:

QuerySet

search(string, **kwargs)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field – The field or fields to search. Search all string fields by default.

  • limit – Maximum amount of top results to return.

  • case_sensitive – Whether the match is case sensitive.

Returns:

A sorted DataFrame of search results with a score in column score, or a QuerySet if return_queryset is True.

See also

filter() lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
lookup(field=None, **kwargs)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field – The field to return. If None, returns the whole record.

  • keep – When multiple records are found for a lookup, how to return the records:
    - "first": return the first record.
    - "last": return the last record.
    - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

Look up via auto-complete on the returned object:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
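The attribute-access behavior of a lookup object can be mimicked in plain Python (a simplified sketch; `make_lookup` is a hypothetical helper returning strings rather than records, and the identifier mangling is simplified relative to the real implementation):

```python
import re
from types import SimpleNamespace

def make_lookup(values):
    # Sketch: derive a python-identifier attribute per value, as lookup() does
    ns = SimpleNamespace(**{re.sub(r"\W", "_", v).lower(): v for v in values})
    ns.dict = lambda: {v: v for v in values}  # dictionary converter, keyed by original value
    return ns

lookup = make_lookup(["ADGB-DT", "BRCA2"])
lookup.adgb_dt          # -> 'ADGB-DT'
lookup.dict()["BRCA2"]  # -> 'BRCA2'
```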
validate(values, field=None, **kwargs)

Validate values against existing values of a string field.

Note that this is strict validation: only exact matches are asserted.

Parameters:
  • values (list[str] | Series | array) – Values that will be validated against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against ontology names.

  • mute – Whether to mute logging.

  • organism – An Organism name or record.

  • source – A bionty.Source record that specifies the version to validate against.

  • strict_source – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.

Returns:

A vector of booleans indicating if an element is validated.

See also

inspect()

Example:

import bionty as bt

bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.validate(gene_symbols, field=bt.Gene.symbol, organism="human")
#> array([ True,  True, False, False])
inspect(values, field=None, **kwargs)

Inspect if values are mappable to a field.

Being mappable means that an exact match exists.

Parameters:
  • values (list[str] | Series | array) – Values that will be checked against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against ontology names.

  • mute – Whether to mute logging.

  • organism – An Organism name or record.

  • source – A bionty.Source record that specifies the version to inspect against.

  • strict_source – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.

See also

validate()

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# inspect gene symbols
gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol, organism="human")
assert result.validated == ["A1CF", "A1BG"]
assert result.non_validated == ["FANCD1", "FANCD20"]
standardize(values, field=None, **kwargs)

Maps input synonyms to standardized names.

Parameters:
  • values (Iterable) – Identifiers that will be standardized.

  • field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.

  • return_field – The field to return. Defaults to field.

  • return_mapper – If True, returns {input_value: standardized_name}.

  • case_sensitive – Whether the mapping is case sensitive.

  • mute – Whether to mute logging.

  • source_aware – Whether to standardize from public source. Defaults to True for BioRecord registries.

  • keep

    When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated:
    - "first": returns the first mapped standardized name
    - "last": returns the last mapped standardized name
    - False: returns all mapped standardized names.

    When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.

    When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.

  • synonyms_field – A field containing the concatenated synonyms.

  • organism – An Organism name or record.

  • source – A bionty.Source record that specifies the version to validate against.

  • strict_source – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.

Returns:

If return_mapper is False – a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.

See also

add_synonym()

Add synonyms.

remove_synonym()

Remove synonyms.

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# standardize gene synonyms
gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.standardize(gene_synonyms)
#> ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
iterator(chunk_size=None)

An iterator over the results from applying this QuerySet to the database. chunk_size must be provided for QuerySets that prefetch related objects. Otherwise, a default chunk_size of 2000 is supplied.
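Chunked fetching can be sketched in plain Python (an illustrative stand-in for the server-side behavior; `chunked` is a hypothetical helper, not lamindb API):

```python
from itertools import islice

def chunked(iterable, chunk_size=2000):
    # Yield successive lists of up to chunk_size items, mirroring how
    # iterator() fetches rows in chunks rather than all at once
    it = iter(iterable)
    while chunk := list(islice(it, chunk_size)):
        yield chunk

sizes = [len(c) for c in chunked(range(4500), chunk_size=2000)]
# -> [2000, 2000, 500]
```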

async aiterator(chunk_size=2000)

An asynchronous iterator over the results from applying this QuerySet to the database.

aggregate(*args, **kwargs)

Return a dictionary containing the calculations (aggregation) over the current queryset.

If args is present the expression is passed as a kwarg using the Aggregate object’s default alias.

async aaggregate(*args, **kwargs)
count()

Perform a SELECT COUNT() and return the number of records as an integer.

If the QuerySet is already fully cached, return the length of the cached results set to avoid multiple SELECT COUNT(*) calls.

async acount()
get(*args, **kwargs)

Perform the query and return a single object matching the given keyword arguments.

async aget(*args, **kwargs)
create(**kwargs)

Create a new object with the given kwargs, saving it to the database and returning the created object.

async acreate(**kwargs)
bulk_create(objs, batch_size=None, ignore_conflicts=False, update_conflicts=False, update_fields=None, unique_fields=None)

Insert each of the instances into the database. Do not call save() on each of the instances, do not send any pre/post_save signals, and do not set the primary key attribute if it is an autoincrement field (except if features.can_return_rows_from_bulk_insert=True). Multi-table models are not supported.

async abulk_create(objs, batch_size=None, ignore_conflicts=False, update_conflicts=False, update_fields=None, unique_fields=None)
bulk_update(objs, fields, batch_size=None)

Update the given fields in each of the given objects in the database.

async abulk_update(objs, fields, batch_size=None)
get_or_create(defaults=None, **kwargs)

Look up an object with the given kwargs, creating one if necessary. Return a tuple of (object, created), where created is a boolean specifying whether an object was created.

async aget_or_create(defaults=None, **kwargs)
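The (object, created) contract can be sketched with a plain dict standing in for a table (illustrative helper, not a real implementation):

```python
def get_or_create(store, defaults=None, **kwargs):
    # store: dict standing in for a table; the lookup kwargs form the key
    key = tuple(sorted(kwargs.items()))
    if key in store:
        return store[key], False
    obj = {**kwargs, **(defaults or {})}  # defaults only apply on creation
    store[key] = obj
    return obj, True

table = {}
obj, created = get_or_create(table, defaults={"score": 0}, name="run-1")
# created is True; a second call with the same kwargs returns the same object
obj2, created2 = get_or_create(table, name="run-1")
```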
update_or_create(defaults=None, create_defaults=None, **kwargs)

Look up an object with the given kwargs, updating one with defaults if it exists, otherwise create a new one. Optionally, an object can be created with different values than defaults by using create_defaults. Return a tuple (object, created), where created is a boolean specifying whether an object was created.

async aupdate_or_create(defaults=None, create_defaults=None, **kwargs)
earliest(*fields)
async aearliest(*fields)
latest(*fields)

Return the latest object according to fields (if given) or by the model’s Meta.get_latest_by.

async alatest(*fields)
async afirst()
last()

Return the last object of a query or None if no match is found.

async alast()
in_bulk(id_list=None, *, field_name='pk')

Return a dictionary mapping each of the given IDs to the object with that ID. If id_list isn’t provided, evaluate the entire QuerySet.

async ain_bulk(id_list=None, *, field_name='pk')
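The ID-to-object mapping that in_bulk returns can be sketched as follows (plain-Python stand-in with a hypothetical Row type):

```python
from dataclasses import dataclass

@dataclass
class Row:
    pk: int
    name: str

def in_bulk(rows, id_list=None, field_name="pk"):
    # Index rows by field_name; restrict to id_list when given
    index = {getattr(r, field_name): r for r in rows}
    if id_list is None:
        return index
    return {i: index[i] for i in id_list if i in index}

rows = [Row(1, "a"), Row(2, "b"), Row(3, "c")]
in_bulk(rows, [1, 3])             # only keys 1 and 3
in_bulk(rows, field_name="name")  # keyed by name instead of pk
```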
update(**kwargs)

Update all elements in the current QuerySet, setting all the given fields to the appropriate values.

async aupdate(**kwargs)
exists()

Return True if the QuerySet would have any results, False otherwise.

async aexists()
contains(obj)

Return True if the QuerySet contains the provided obj, False otherwise.

async acontains(obj)
explain(*, format=None, **options)

Runs an EXPLAIN on the SQL query this QuerySet would perform, and returns the results.

async aexplain(*, format=None, **options)
raw(raw_query, params=(), translations=None, using=None)
values(*fields, **expressions)
values_list(*fields, flat=False, named=False)
dates(field_name, kind, order='ASC')

Return a list of date objects representing all available dates for the given field_name, scoped to ‘kind’.

datetimes(field_name, kind, order='ASC', tzinfo=None)

Return a list of datetime objects representing all available datetimes for the given field_name, scoped to ‘kind’.

none()

Return an empty QuerySet.

all()

Return a new QuerySet that is a copy of the current one. This allows a QuerySet to proxy for a model manager in some cases.

filter(*args, **kwargs)

Return a new QuerySet instance with the args ANDed to the existing set.

exclude(*args, **kwargs)

Return a new QuerySet instance with NOT (args) ANDed to the existing set.

complex_filter(filter_obj)

Return a new QuerySet instance with filter_obj added to the filters.

filter_obj can be a Q object or a dictionary of keyword lookup arguments.

This exists to support framework features such as ‘limit_choices_to’, and usually it will be more natural to use other methods.

union(*other_qs, all=False)
intersection(*other_qs)
difference(*other_qs)
select_for_update(nowait=False, skip_locked=False, of=(), no_key=False)

Return a new QuerySet instance that will select objects with a FOR UPDATE lock.

select_related(*fields)

Return a new QuerySet instance that will select related objects.

If fields are specified, they must be ForeignKey fields and only those related objects are included in the selection.

If select_related(None) is called, clear the list.

prefetch_related(*lookups)

Return a new QuerySet instance that will prefetch the specified Many-To-One and Many-To-Many related objects when the QuerySet is evaluated.

When prefetch_related() is called more than once, append to the list of prefetch lookups. If prefetch_related(None) is called, clear the list.

annotate(*args, **kwargs)

Return a query set in which the returned objects have been annotated with extra data or aggregations.

alias(*args, **kwargs)

Return a query set with added aliases for extra data or aggregations.

order_by(*field_names)

Return a new QuerySet instance with the ordering changed.

distinct(*field_names)

Return a new QuerySet instance that will select only distinct results.

extra(select=None, where=None, params=None, tables=None, order_by=None, select_params=None)

Add extra SQL fragments to the query.

reverse()

Reverse the ordering of the QuerySet.

defer(*fields)

Defer the loading of data for certain fields until they are accessed. Add the set of deferred fields to any existing set of deferred fields. The only exception to this is if None is passed in as the only parameter, in which case remove all deferrals.

only(*fields)

Essentially, the opposite of defer(). Only the fields passed into this method and that are not already specified as deferred are loaded immediately when the queryset is evaluated.

using(alias)

Select which database this QuerySet should execute against.

resolve_expression(*args, **kwargs)
class lamindb.models.QuerySet(model=None, query=None, using=None, hints=None)

Sets of records returned by queries.

Implements additional filtering capabilities.

See also

django QuerySet

Examples

>>> ULabel(name="my label").save()
>>> queryset = ULabel.filter(name="my label")
>>> queryset # an instance of QuerySet
property db

Return the database used if this query is executed now.

property ordered

Return True if the QuerySet is ordered – i.e. has an order_by() clause or a default ordering on the model (or is empty).

property query
classmethod as_manager()
get(idlike=None, **expressions)

Query a single record. Raises error if there are more or none.

Return type:

SQLRecord

filter(*queries, **expressions)

Query a set of records.

Return type:

QuerySet

to_dataframe(*, include=None, features=None, limit=100, order_by='-id')

Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries

Parameters:
  • include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).

  • features (str | list[str] | None, default: None) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.

  • limit (int | None, default: 100) – Maximum number of rows to display. If None, includes all results.

  • order_by (str | None, default: '-id') – Field name to order the records by. Prefix with ‘-’ for descending order. Defaults to ‘-id’ to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.

Return type:

DataFrame

Examples

Include the name of the creator:

ln.Record.to_dataframe(include="created_by__name")

Include features:

ln.Artifact.to_dataframe(include="features")

Include selected features:

ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
delete(*args, permanent=None, **kwargs)

Delete all records in the query set.

Parameters:

permanent (bool | None, default: None) – Whether to permanently delete the record (skips trash). Is only relevant for records that have the branch field. If None, uses soft delete for records that have the branch field, hard delete otherwise.

Note

Calling delete() twice on the same queryset does NOT permanently delete in bulk operations. Use permanent=True for actual deletion.

Examples

For any QuerySet object qs, call:

>>> qs.delete()
to_list(field=None)

Populate an (unordered) list with the results.

Note that the order in this list is only meaningful if you ordered the underlying query set with .order_by().

Return type:

list[SQLRecord] | list[str]

Examples

>>> queryset.to_list()  # list of records
>>> queryset.to_list("name")  # list of values
first()

If non-empty, the first result in the query set, otherwise None.

Return type:

SQLRecord | None

Examples

>>> queryset.first()
one()

Exactly one result. Raises error if there are more or none.

Return type:

SQLRecord

one_or_none()

At most one result. Returns it if there is one, otherwise returns None.

Return type:

SQLRecord | None

Examples

>>> ULabel.filter(name="benchmark").one_or_none()
>>> ULabel.filter(name="non existing label").one_or_none()
latest_version()

Filter every version family by latest version.

Return type:

QuerySet

search(string, **kwargs)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field – The field or fields to search. Search all string fields by default.

  • limit – Maximum amount of top results to return.

  • case_sensitive – Whether the match is case sensitive.

Returns:

A sorted DataFrame of search results with a score in column score, or a QuerySet if return_queryset is True.

See also

filter() lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
lookup(field=None, **kwargs)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field – The field to return. If None, returns the whole record.

  • keep – When multiple records are found for a lookup, how to return the records:
    - "first": return the first record.
    - "last": return the last record.
    - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

Look up via auto-complete on the returned object:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
validate(values, field=None, **kwargs)

Validate values against existing values of a string field.

Note that this is strict validation: only exact matches are asserted.

Parameters:
  • values (list[str] | Series | array) – Values that will be validated against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against ontology names.

  • mute – Whether to mute logging.

  • organism – An Organism name or record.

  • source – A bionty.Source record that specifies the version to validate against.

  • strict_source – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.

Returns:

A vector of booleans indicating if an element is validated.

See also

inspect()

Example:

import bionty as bt

bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.validate(gene_symbols, field=bt.Gene.symbol, organism="human")
#> array([ True,  True, False, False])
inspect(values, field=None, **kwargs)

Inspect if values are mappable to a field.

Being mappable means that an exact match exists.

Parameters:
  • values (list[str] | Series | array) – Values that will be checked against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against ontology names.

  • mute – Whether to mute logging.

  • organism – An Organism name or record.

  • source – A bionty.Source record that specifies the version to inspect against.

  • strict_source – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.

See also

validate()

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# inspect gene symbols
gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol, organism="human")
assert result.validated == ["A1CF", "A1BG"]
assert result.non_validated == ["FANCD1", "FANCD20"]
standardize(values, field=None, **kwargs)

Maps input synonyms to standardized names.

Parameters:
  • values (Iterable) – Identifiers that will be standardized.

  • field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.

  • return_field – The field to return. Defaults to field.

  • return_mapper – If True, returns {input_value: standardized_name}.

  • case_sensitive – Whether the mapping is case sensitive.

  • mute – Whether to mute logging.

  • source_aware – Whether to standardize from public source. Defaults to True for BioRecord registries.

  • keep

    When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated: - "first": returns the first mapped standardized name - "last": returns the last mapped standardized name - False: returns all mapped standardized names.

    When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.

    When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.

  • synonyms_field – A field containing the concatenated synonyms.

  • organism – An Organism name or record.

  • source – A bionty.Source record that specifies the version to validate against.

  • strict_source – Determines the validation behavior against records in the registry. - If False, validation will include all records in the registry, ignoring the specified source. - If True, validation will only include records in the registry that are linked to the specified source. Note: this parameter won’t affect validation against public sources.

Returns:

If return_mapper is False – a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.

See also

add_synonym()

Add synonyms.

remove_synonym()

Remove synonyms.

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# standardize gene synonyms
gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.standardize(gene_synonyms)
#> ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
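
The keep semantics described above mirror pandas' duplicate handling. A quick stand-alone illustration using pd.Series.duplicated directly (which the parameter description references; no lamindb call is made here):

```python
import pandas as pd

# Mirror of the `keep` options via pandas' duplicated() directly:
# "first" keeps the first occurrence, "last" keeps the last, and
# False marks every duplicated value.
s = pd.Series(["A1CF", "A1CF", "BRCA2"])
dup_first = s.duplicated(keep="first").tolist()  # [False, True, False]
dup_last = s.duplicated(keep="last").tolist()    # [True, False, False]
dup_all = s.duplicated(keep=False).tolist()      # [True, True, False]
```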
iterator(chunk_size=None)

An iterator over the results from applying this QuerySet to the database. chunk_size must be provided for QuerySets that prefetch related objects. Otherwise, a default chunk_size of 2000 is supplied.
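
Conceptually, chunked iteration works like the pure-Python sketch below (an illustration of the chunk_size behavior, not Django's implementation, which fetches rows from a database cursor):

```python
from itertools import islice

# Yield results in fixed-size chunks, as a QuerySet iterator does
# conceptually with chunk_size.
def iter_chunks(iterable, chunk_size=2000):
    it = iter(iterable)
    while chunk := list(islice(it, chunk_size)):
        yield chunk

chunks = list(iter_chunks(range(5), chunk_size=2))
# chunks == [[0, 1], [2, 3], [4]]
```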

async aiterator(chunk_size=2000)

An asynchronous iterator over the results from applying this QuerySet to the database.

aggregate(*args, **kwargs)

Return a dictionary containing the calculations (aggregation) over the current queryset.

If args is present the expression is passed as a kwarg using the Aggregate object’s default alias.

async aaggregate(*args, **kwargs)
count()

Perform a SELECT COUNT() and return the number of records as an integer.

If the QuerySet is already fully cached, return the length of the cached results set to avoid multiple SELECT COUNT(*) calls.

async acount()
async aget(*args, **kwargs)
create(**kwargs)

Create a new object with the given kwargs, saving it to the database and returning the created object.

async acreate(**kwargs)
bulk_create(objs, batch_size=None, ignore_conflicts=False, update_conflicts=False, update_fields=None, unique_fields=None)

Insert each of the instances into the database. Do not call save() on each of the instances, do not send any pre/post_save signals, and do not set the primary key attribute if it is an autoincrement field (except if features.can_return_rows_from_bulk_insert=True). Multi-table models are not supported.

async abulk_create(objs, batch_size=None, ignore_conflicts=False, update_conflicts=False, update_fields=None, unique_fields=None)
bulk_update(objs, fields, batch_size=None)

Update the given fields in each of the given objects in the database.

async abulk_update(objs, fields, batch_size=None)
get_or_create(defaults=None, **kwargs)

Look up an object with the given kwargs, creating one if necessary. Return a tuple of (object, created), where created is a boolean specifying whether an object was created.

async aget_or_create(defaults=None, **kwargs)
update_or_create(defaults=None, create_defaults=None, **kwargs)

Look up an object with the given kwargs, updating one with defaults if it exists, otherwise create a new one. Optionally, an object can be created with different values than defaults by using create_defaults. Return a tuple (object, created), where created is a boolean specifying whether an object was created.
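
The (object, created) contract shared by get_or_create and update_or_create can be sketched in pure Python, with a dict standing in for the database table (an illustration of the return semantics only, not Django's implementation):

```python
# A dict stands in for the table; keys play the role of a unique field.
store = {}

def get_or_create(name, defaults=None):
    if name in store:
        return store[name], False  # existing object, created=False
    obj = {"name": name, **(defaults or {})}
    store[name] = obj
    return obj, True  # new object, created=True

obj1, created1 = get_or_create("Label1", defaults={"color": "red"})
obj2, created2 = get_or_create("Label1")
# created1 is True, created2 is False, and obj1 is obj2
```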

async aupdate_or_create(defaults=None, create_defaults=None, **kwargs)
earliest(*fields)
async aearliest(*fields)
latest(*fields)

Return the latest object according to fields (if given) or by the model’s Meta.get_latest_by.

async alatest(*fields)
async afirst()
last()

Return the last object of a query or None if no match is found.

async alast()
in_bulk(id_list=None, *, field_name='pk')

Return a dictionary mapping each of the given IDs to the object with that ID. If id_list isn’t provided, evaluate the entire QuerySet.
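
The in_bulk contract, sketched with a list of dicts standing in for the table (illustrative only; field_name plays the role of the pk lookup field):

```python
# A list of dicts stands in for the table rows.
rows = [{"pk": 1, "name": "a"}, {"pk": 2, "name": "b"}, {"pk": 3, "name": "c"}]

def in_bulk(id_list=None, field_name="pk"):
    # With no id_list, map every row; otherwise only the requested ids.
    if id_list is None:
        return {r[field_name]: r for r in rows}
    wanted = set(id_list)
    return {r[field_name]: r for r in rows if r[field_name] in wanted}

result = in_bulk([1, 3])
# result maps 1 and 3 to their rows; 2 is absent
```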

async ain_bulk(id_list=None, *, field_name='pk')
update(**kwargs)

Update all elements in the current QuerySet, setting all the given fields to the appropriate values.

async aupdate(**kwargs)
exists()

Return True if the QuerySet would have any results, False otherwise.

async aexists()
contains(obj)

Return True if the QuerySet contains the provided obj, False otherwise.

async acontains(obj)
explain(*, format=None, **options)

Runs an EXPLAIN on the SQL query this QuerySet would perform, and returns the results.

async aexplain(*, format=None, **options)
raw(raw_query, params=(), translations=None, using=None)
values(*fields, **expressions)
values_list(*fields, flat=False, named=False)
dates(field_name, kind, order='ASC')

Return a list of date objects representing all available dates for the given field_name, scoped to ‘kind’.

datetimes(field_name, kind, order='ASC', tzinfo=None)

Return a list of datetime objects representing all available datetimes for the given field_name, scoped to ‘kind’.

none()

Return an empty QuerySet.

all()

Return a new QuerySet that is a copy of the current one. This allows a QuerySet to proxy for a model manager in some cases.

exclude(*args, **kwargs)

Return a new QuerySet instance with NOT (args) ANDed to the existing set.

complex_filter(filter_obj)

Return a new QuerySet instance with filter_obj added to the filters.

filter_obj can be a Q object or a dictionary of keyword lookup arguments.

This exists to support framework features such as ‘limit_choices_to’, and usually it will be more natural to use other methods.

union(*other_qs, all=False)
intersection(*other_qs)
difference(*other_qs)
select_for_update(nowait=False, skip_locked=False, of=(), no_key=False)

Return a new QuerySet instance that will select objects with a FOR UPDATE lock.

select_related(*fields)

Return a new QuerySet instance that will select related objects.

If fields are specified, they must be ForeignKey fields and only those related objects are included in the selection.

If select_related(None) is called, clear the list.

prefetch_related(*lookups)

Return a new QuerySet instance that will prefetch the specified Many-To-One and Many-To-Many related objects when the QuerySet is evaluated.

When prefetch_related() is called more than once, append to the list of prefetch lookups. If prefetch_related(None) is called, clear the list.

annotate(*args, **kwargs)

Return a query set in which the returned objects have been annotated with extra data or aggregations.

alias(*args, **kwargs)

Return a query set with added aliases for extra data or aggregations.

order_by(*field_names)

Return a new QuerySet instance with the ordering changed.

distinct(*field_names)

Return a new QuerySet instance that will select only distinct results.

extra(select=None, where=None, params=None, tables=None, order_by=None, select_params=None)

Add extra SQL fragments to the query.

reverse()

Reverse the ordering of the QuerySet.

defer(*fields)

Defer the loading of data for certain fields until they are accessed. Add the set of deferred fields to any existing set of deferred fields. The only exception to this is if None is passed in as the only parameter, in which case remove all deferrals.

only(*fields)

Essentially, the opposite of defer(). Only the fields passed into this method and that are not already specified as deferred are loaded immediately when the queryset is evaluated.

using(alias)

Select which database this QuerySet should execute against.

resolve_expression(*args, **kwargs)
class lamindb.models.QueryDB(instance)

Convenient access to QuerySets for every entity in a LaminDB instance.

Parameters:

instance (str) – Instance identifier in format “account/instance” or full instance string.

Examples

Query records from a remote instance:

cellxgene = ln.QueryDB("laminlabs/cellxgene")
artifacts = cellxgene.artifacts.filter(suffix=".h5ad")
records = cellxgene.records.filter(name__startswith="cell")
class lamindb.models.ArtifactSet

Abstract class representing sets of artifacts returned by queries.

This class automatically extends BasicQuerySet and QuerySet when the base model is Artifact.

Examples

>>> artifacts = ln.Artifact.filter(otype="AnnData")
>>> artifacts # an instance of ArtifactQuerySet inheriting from ArtifactSet
load(join='outer', is_run_input=None, **kwargs)

Cache and load to memory.

Returns an in-memory concatenated DataFrame or AnnData object.

Return type:

DataFrame | AnnData

open(engine='pyarrow', is_run_input=None, **kwargs)

Open a dataset for streaming.

Works for pyarrow and polars compatible formats (.parquet, .csv, .ipc etc. files or directories with such files).

Parameters:
  • engine (Literal['pyarrow', 'polars'], default: 'pyarrow') – Which module to use for lazy loading of a dataframe from pyarrow or polars compatible formats.

  • is_run_input (bool | None, default: None) – Whether to track this artifact as run input.

  • **kwargs – Keyword arguments for pyarrow.dataset.dataset or polars.scan_* functions.

Return type:

Dataset | Iterator[LazyFrame]

Notes

For more info, see guide: Slice & stream arrays.

mapped(layers_keys=None, obs_keys=None, obsm_keys=None, obs_filter=None, join='inner', encode_labels=True, unknown_label=None, cache_categories=True, parallel=False, dtype=None, stream=False, is_run_input=None)

Return a map-style dataset.

Returns a pytorch map-style dataset by virtually concatenating AnnData arrays.

By default (stream=False) AnnData arrays are moved into a local cache first.

__getitem__ of the MappedCollection object takes a single integer index and returns a dictionary with the observation data sample for this index from the AnnData objects in the collection. The dictionary has keys for layers_keys (.X is in "X"), obs_keys, obsm_keys (under f"obsm_{key}") and also "_store_idx" for the index of the AnnData object containing this observation sample.

Note

For a guide, see Train a machine learning model on a collection.

This method currently only works for collections or query sets of AnnData artifacts.

Parameters:
  • layers_keys (str | list[str] | None, default: None) – Keys from the .layers slot. layers_keys=None or "X" in the list retrieves .X.

  • obs_keys (str | list[str] | None, default: None) – Keys from the .obs slots.

  • obsm_keys (str | list[str] | None, default: None) – Keys from the .obsm slots.

  • obs_filter (dict[str, str | list[str]] | None, default: None) – Select only observations with these values for the given obs columns. Should be a dictionary with obs column names as keys and filtering values (a string or a list of strings) as values.

  • join (Literal['inner', 'outer'] | None, default: 'inner') – "inner" or "outer" virtual joins. If None is passed, does not join.

  • encode_labels (bool | list[str], default: True) – Encode labels into integers. Can be a list with elements from obs_keys.

  • unknown_label (str | dict[str, str] | None, default: None) – Encode this label to -1. Can be a dictionary with keys from obs_keys if encode_labels=True or from encode_labels if it is a list.

  • cache_categories (bool, default: True) – Enable caching categories of obs_keys for faster access.

  • parallel (bool, default: False) – Enable sampling with multiple processes.

  • dtype (str | None, default: None) – Convert numpy arrays from .X, .layers and .obsm

  • stream (bool, default: False) – Whether to stream data from the array backend.

  • is_run_input (bool | None, default: None) – Whether to track this collection as run input.

Return type:

MappedCollection

Examples

>>> import lamindb as ln
>>> from torch.utils.data import DataLoader
>>> collection = ln.Collection.get(description="my collection")
>>> mapped = collection.mapped(obs_keys=["cell_type", "batch"])
>>> dl = DataLoader(mapped, batch_size=128, shuffle=True)
>>> # also works for query sets of artifacts, '...' represents some filtering condition
>>> # additional filtering on artifacts of the collection
>>> mapped = collection.artifacts.all().filter(...).order_by("-created_at").mapped()
>>> # or directly from a query set of artifacts
>>> mapped = ln.Artifact.filter(..., otype="AnnData").order_by("-created_at").mapped()
class lamindb.models.QueryManager(*args, **kwargs)

Manage queries through fields.

Examples

Populate the .parents ManyToMany relationship (a QueryManager):

ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
labels = ln.Record.filter(name__icontains="label")
label1 = ln.Record.get(name="Label1")
label1.parents.set(labels)

Convert all linked parents to a DataFrame:

label1.parents.to_dataframe()
auto_created = False
creation_counter = 43
property db
use_in_migrations = False

If set to True the manager will be serialized into migrations and will thus be available in e.g. RunPython operations.

classmethod from_queryset(queryset_class, class_name=None)
track_run_input_manager()
to_list(field=None)

Populate a list.

to_dataframe(**kwargs)

Convert to DataFrame.

For **kwargs, see lamindb.models.QuerySet.to_dataframe().

all()

Return a QuerySet of all records.

search(string, **kwargs)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field – The field or fields to search. Search all string fields by default.

  • limit – Maximum amount of top results to return.

  • case_sensitive – Whether the match is case sensitive.

Returns:

A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.

See also

filter() lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
lookup(field=None, **kwargs)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field – The field to return. If None, returns the whole record.

  • keep – When multiple records are found for a lookup, how to return the records. - "first": return the first record. - "last": return the last record. - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

Look up via auto-complete:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
get_queryset()
aaggregate(*args, **kwargs)
abulk_create(objs, batch_size=None, ignore_conflicts=False, update_conflicts=False, update_fields=None, unique_fields=None)
abulk_update(objs, fields, batch_size=None)
acontains(obj)
acount()
acreate(**kwargs)
aearliest(*fields)
aexists()
aexplain(*, format=None, **options)
afirst()
aget(*args, **kwargs)
aget_or_create(defaults=None, **kwargs)
aggregate(*args, **kwargs)

Return a dictionary containing the calculations (aggregation) over the current queryset.

If args is present the expression is passed as a kwarg using the Aggregate object’s default alias.

ain_bulk(id_list=None, *, field_name='pk')
aiterator(chunk_size=2000)

An asynchronous iterator over the results from applying this QuerySet to the database.

alast()
alatest(*fields)
alias(*args, **kwargs)

Return a query set with added aliases for extra data or aggregations.

annotate(*args, **kwargs)

Return a query set in which the returned objects have been annotated with extra data or aggregations.

aupdate(**kwargs)
aupdate_or_create(defaults=None, create_defaults=None, **kwargs)
bulk_create(objs, batch_size=None, ignore_conflicts=False, update_conflicts=False, update_fields=None, unique_fields=None)

Insert each of the instances into the database. Do not call save() on each of the instances, do not send any pre/post_save signals, and do not set the primary key attribute if it is an autoincrement field (except if features.can_return_rows_from_bulk_insert=True). Multi-table models are not supported.

bulk_update(objs, fields, batch_size=None)

Update the given fields in each of the given objects in the database.

complex_filter(filter_obj)

Return a new QuerySet instance with filter_obj added to the filters.

filter_obj can be a Q object or a dictionary of keyword lookup arguments.

This exists to support framework features such as ‘limit_choices_to’, and usually it will be more natural to use other methods.

contains(obj)

Return True if the QuerySet contains the provided obj, False otherwise.

count()

Perform a SELECT COUNT() and return the number of records as an integer.

If the QuerySet is already fully cached, return the length of the cached results set to avoid multiple SELECT COUNT(*) calls.

create(**kwargs)

Create a new object with the given kwargs, saving it to the database and returning the created object.

dates(field_name, kind, order='ASC')

Return a list of date objects representing all available dates for the given field_name, scoped to ‘kind’.

datetimes(field_name, kind, order='ASC', tzinfo=None)

Return a list of datetime objects representing all available datetimes for the given field_name, scoped to ‘kind’.

defer(*fields)

Defer the loading of data for certain fields until they are accessed. Add the set of deferred fields to any existing set of deferred fields. The only exception to this is if None is passed in as the only parameter, in which case remove all deferrals.

difference(*other_qs)
distinct(*field_names)

Return a new QuerySet instance that will select only distinct results.

earliest(*fields)
exclude(*args, **kwargs)

Return a new QuerySet instance with NOT (args) ANDed to the existing set.

exists()

Return True if the QuerySet would have any results, False otherwise.

explain(*, format=None, **options)

Runs an EXPLAIN on the SQL query this QuerySet would perform, and returns the results.

extra(select=None, where=None, params=None, tables=None, order_by=None, select_params=None)

Add extra SQL fragments to the query.

filter(*args, **kwargs)

Return a new QuerySet instance with the args ANDed to the existing set.

first()

Return the first object of a query or None if no match is found.

get(*args, **kwargs)

Perform the query and return a single object matching the given keyword arguments.

get_or_create(defaults=None, **kwargs)

Look up an object with the given kwargs, creating one if necessary. Return a tuple of (object, created), where created is a boolean specifying whether an object was created.

in_bulk(id_list=None, *, field_name='pk')

Return a dictionary mapping each of the given IDs to the object with that ID. If id_list isn’t provided, evaluate the entire QuerySet.

intersection(*other_qs)
iterator(chunk_size=None)

An iterator over the results from applying this QuerySet to the database. chunk_size must be provided for QuerySets that prefetch related objects. Otherwise, a default chunk_size of 2000 is supplied.

last()

Return the last object of a query or None if no match is found.

latest(*fields)

Return the latest object according to fields (if given) or by the model’s Meta.get_latest_by.

none()

Return an empty QuerySet.

only(*fields)

Essentially, the opposite of defer(). Only the fields passed into this method and that are not already specified as deferred are loaded immediately when the queryset is evaluated.

order_by(*field_names)

Return a new QuerySet instance with the ordering changed.

prefetch_related(*lookups)

Return a new QuerySet instance that will prefetch the specified Many-To-One and Many-To-Many related objects when the QuerySet is evaluated.

When prefetch_related() is called more than once, append to the list of prefetch lookups. If prefetch_related(None) is called, clear the list.

raw(raw_query, params=(), translations=None, using=None)
reverse()

Reverse the ordering of the QuerySet.

select_for_update(nowait=False, skip_locked=False, of=(), no_key=False)

Return a new QuerySet instance that will select objects with a FOR UPDATE lock.

select_related(*fields)

Return a new QuerySet instance that will select related objects.

If fields are specified, they must be ForeignKey fields and only those related objects are included in the selection.

If select_related(None) is called, clear the list.

union(*other_qs, all=False)
update(**kwargs)

Update all elements in the current QuerySet, setting all the given fields to the appropriate values.

update_or_create(defaults=None, create_defaults=None, **kwargs)

Look up an object with the given kwargs, updating one with defaults if it exists, otherwise create a new one. Optionally, an object can be created with different values than defaults by using create_defaults. Return a tuple (object, created), where created is a boolean specifying whether an object was created.

using(alias)

Select which database this QuerySet should execute against.

values(*fields, **expressions)
values_list(*fields, flat=False, named=False)
deconstruct()

Return a 5-tuple of the form (as_manager (True), manager_class, queryset_class, args, kwargs).

Raise a ValueError if the manager is dynamically generated.

check(**kwargs)
contribute_to_class(cls, name)
db_manager(using=None, hints=None)

Storage of feature values

class lamindb.models.FeatureValue(*args, **kwargs)

Non-categorical feature values.

Categorical feature values are stored in their respective registries: ULabel, CellType, etc.

Unlike ULabel records, FeatureValue records are grouped by feature rather than by an ontological hierarchy.

Simple fields

value: Any

The JSON-like value.

hash: str

Value hash.

is_locked: bool

Whether the record is locked for edits.

created_at: datetime

Time of creation of record.

Relational fields

branch: Branch

Life cycle state of record.

branch.name can be "main" (the default branch), "trash" (trashed), "archive" (archived), or any other user-created branch, typically planned for merging onto main after review.

space: Space

The space in which the record lives.

created_by: User

Creator of record.

run: Run | None

Run that created record.

feature: Feature | None

The dimension metadata.

runs: Run

Runs annotated with this feature value.

artifacts: Artifact

Artifacts annotated with this feature value.

Class methods

classmethod get_or_create(feature, value)
classmethod filter(*queries, **expressions)

Query records.

Parameters:
  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

See also

Examples

>>> ln.Project(name="my label").save()
>>> ln.Project.filter(name__startswith="my").to_dataframe()
classmethod get(idlike=None, **expressions)

Get a single record.

Parameters:
  • idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

Return type:

SQLRecord

See also

Examples

record = ln.Record.get("FvtpPJLJ")
record = ln.Record.get(name="my-label")
classmethod to_dataframe(include=None, features=False, limit=100)

Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries

Parameters:
  • include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).

  • features (bool | list[str], default: False) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.

  • limit (int, default: 100) – Maximum number of rows to display. If None, includes all results.

  • order_by – Field name to order the records by. Prefix with ‘-’ for descending order. Defaults to ‘-id’ to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.

Return type:

DataFrame

Examples

Include the name of the creator:

ln.Record.to_dataframe(include="created_by__name")

Include features:

ln.Artifact.to_dataframe(include="features")

Include selected features:

ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
classmethod search(string, *, field=None, limit=20, case_sensitive=False)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum amount of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.

See also

filter() lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
classmethod lookup(field=None, return_field=None)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

  • keep – When multiple records are found for a lookup, how to return the records. - "first": return the first record. - "last": return the last record. - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

Look up via auto-complete:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
classmethod connect(instance)

Query a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

ln.Record.connect("account_handle/instance_name").search("label7", field="name")

Methods

restore()

Restore from trash onto the main branch.

Does not restore descendant records if the record is HasType with is_type = True.

Return type:

None

delete(permanent=None, **kwargs)

Delete record.

If record is HasType with is_type = True, deletes all descendant records, too.

Parameters:

permanent (bool | None, default: None) – Whether to permanently delete the record (skips trash). If None, performs soft delete if the record is not already in the trash.

Return type:

None

Examples

For any SQLRecord object record, call:

>>> record.delete()
save(*args, **kwargs)

Save.

Always saves to the default database.

Return type:

TypeVar(T, bound= SQLRecord)

refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

async arefresh_from_db(using=None, fields=None, from_queryset=None)

Utility classes

class lamindb.models.LazyArtifact(suffix, overwrite_versions, **kwargs)

Lazy artifact for streaming to auto-generated internal paths.

This is needed when it is desirable to stream to a lamindb auto-generated internal path and register the path as an artifact (see Artifact).

This object creates a real artifact on .save() with the provided arguments.

Parameters:
  • suffix (str) – The suffix for the auto-generated internal path

  • overwrite_versions (bool) – Whether to overwrite versions.

  • **kwargs – Keyword arguments for the artifact to be created.

Examples

Create a lazy artifact, write to the path and save to get a real artifact:

lazy = ln.Artifact.from_lazy(suffix=".zarr", overwrite_versions=True, key="mydata.zarr")
zarr.open(lazy.path, mode="w")["test"] = np.array(["test"]) # stream to the path
artifact = lazy.save()
property path: UPath
save(upload=None, **kwargs)
Return type:

Artifact

class lamindb.models.SQLRecordList(records)

Is ordered, can’t be queried, but has .to_dataframe().

to_dataframe()
Return type:

DataFrame

to_list(field)
Return type:

list[str]

one()

Exactly one result. Throws error if there are more or none.

Return type:

TypeVar(T)

save()

Save all records to the database.

Return type:

SQLRecordList[TypeVar(T)]

append(item)
insert(i, item)
pop(i=-1)
remove(item)
clear()
copy()
count(item)
index(item, *args)
reverse()
sort(*args, **kwds)
extend(other)
class lamindb.models.InspectResult(validated_df, validated, nonvalidated, frac_validated, n_empty, n_unique)

Result of inspect.

An InspectResult object of calls such as inspect().

property df: DataFrame

A DataFrame indexed by values with a boolean __validated__ column.

property frac_validated: float

Fraction of items that were validated.

property n_empty: int

Number of empty items.

property n_unique: int

Number of unique items.

property non_validated: list[str]

List of items that did not pass validate().

This list can be used to remove any non-validated values such as genes that do not map against the specified source.

property synonyms_mapper: dict

Synonyms mapper dictionary.

Such a dictionary maps the actual values to their synonyms which can be used to rename values accordingly.

Examples

>>> markers = pd.DataFrame(index=["KI67","CCR7"])
>>> synonyms_mapper = bt.CellMarker.standardize(markers.index, return_mapper=True)

#> {'KI67': 'Ki67', 'CCR7': 'Ccr7'}
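
As the description notes, such a mapper can be used to rename values accordingly. A sketch with pandas, hard-coding the mapper shown above rather than querying bionty:

```python
import pandas as pd

# Apply a synonyms mapper to rename index values; the mapper here is
# hard-coded for illustration rather than produced by standardize().
mapper = {"KI67": "Ki67", "CCR7": "Ccr7"}
markers = pd.DataFrame(index=["KI67", "CCR7"])
markers = markers.rename(index=mapper)
# markers.index is now ['Ki67', 'Ccr7']
```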

property validated: list[str]

List of items that passed validate().
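
One plausible reading of how these summary properties relate, sketched in pure Python (illustrative only, not lamindb's implementation; the exact empty-value and deduplication handling may differ):

```python
# Toy input: one empty string and one duplicate.
values = ["A1CF", "A1BG", "", "FANCD20", "A1CF"]
validated_set = {"A1CF", "A1BG"}  # assume these passed validation

nonempty = [v for v in values if v]
n_empty = len(values) - len(nonempty)       # empty items
unique = set(nonempty)
n_unique = len(unique)                      # unique non-empty items
validated = sorted(v for v in unique if v in validated_set)
non_validated = sorted(v for v in unique if v not in validated_set)
frac_validated = len(validated) / n_unique  # fraction validated
```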

class lamindb.models.ValidateFields
class lamindb.models.SchemaOptionals(schema)

Manage and access optional features in a schema.

get_uids()

Get the uids of the optional features.

Does not need an additional query to the database, while get() does.

Return type:

list[str]

get()

Get the optional features.

Return type:

QuerySet

set(features)

Set the optional features (overwrites whichever schemas are currently optional).

Return type:

None

remove(features)

Make one or multiple features required by removing them from the set of optional features.

Return type:

None

add(features)

Make one or multiple features optional by adding them to the set of optional features.

Return type:

None
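
The three mutating methods have set semantics, sketched here with a plain Python set (illustrative only; the feature names are hypothetical, and the real methods operate on Feature records in the database):

```python
# set() overwrites the current optionals, add() marks features optional,
# and remove() makes them required again.
optionals = set()
optionals = {"donor_age"}        # set(["donor_age"])
optionals |= {"perturbation"}    # add(["perturbation"])
optionals -= {"donor_age"}       # remove(["donor_age"]): required again
```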