🦑 Tru

trulens_eval.tru.Tru
Bases: SingletonPerName
Tru is the main class that provides an entry point to trulens-eval.
Tru lets you:
- Log app prompts and outputs
- Log app metadata
- Run and log feedback functions
- Run a Streamlit dashboard to view experiment results
By default, all data is logged to "default.sqlite" in the current working directory. Data can instead be logged to any SQLAlchemy-compatible database referred to by database_url.
Supported App Types
TruChain: LangChain apps.
TruLlama: LlamaIndex apps.
TruRails: NeMo Guardrails apps.
TruBasicApp: Basic apps defined solely using a function from str to str.
TruCustomApp: Custom apps containing custom structures and methods. Requires annotation of methods to instrument.
TruVirtual: Virtual apps that do not have a real app to instrument but have a virtual structure and can log existing captured data as if they were trulens records.
PARAMETER | DESCRIPTION
---|---
database | Database to use. If not provided, an SQLAlchemyDB database will be initialized based on the other arguments.
database_url | Database URL. Defaults to a local SQLite database file at "default.sqlite" in the current working directory.
database_file | Path to a local SQLite database file. Deprecated: use database_url instead.
database_prefix | Prefix for table names for trulens_eval to use. May be useful in some databases hosting other apps.
database_redact_keys | Whether to redact secret keys in data to be written to the database.
database_args | Additional arguments to pass to the database constructor.
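For illustration, database_url follows standard SQLAlchemy URL syntax. A minimal sketch (the dialect strings and the commented Tru construction are assumptions based on SQLAlchemy conventions, not confirmed by this page):

```python
# Sketch of SQLAlchemy-style URLs usable for database_url (dialect names
# follow standard SQLAlchemy conventions; assumed, not confirmed here).
local_url = "sqlite:///default.sqlite"  # the documented local default file
hosted_url = "postgresql://user:password@db.example.com:5432/trulens"

# Actual construction requires trulens_eval to be installed:
# from trulens_eval import Tru
# tru = Tru(database_url=local_url, database_prefix="myapp_")
```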
Attributes

RETRY_RUNNING_SECONDS (class-attribute, instance-attribute)

RETRY_RUNNING_SECONDS: float = 60.0

How long to wait (in seconds) before restarting a feedback function that has already started. A feedback function execution that has started may have stalled or failed without recording the failure.
RETRY_FAILED_SECONDS (class-attribute, instance-attribute)
RETRY_FAILED_SECONDS: float = 5 * 60.0
How long to wait (in seconds) to retry a failed feedback function run.
DEFERRED_NUM_RUNS (class-attribute, instance-attribute)
DEFERRED_NUM_RUNS: int = 32
Number of futures to wait for when evaluating deferred feedback functions.
db (instance-attribute)

db: Union[DB, OpaqueWrapper[DB]]

Database supporting this workspace. Will be an opaque wrapper if it is not ready to use due to migration requirements.
Functions

Chain

Llama

Basic
Basic(
text_to_text: Callable[[str], str], **kwargs: dict
) -> TruBasicApp
Create a basic app recorder with database managed by self.
PARAMETER | DESCRIPTION
---|---
text_to_text | A function that takes a string and returns a string. The wrapped app's functionality is expected to be entirely in this function.
**kwargs | Additional keyword arguments to pass to TruBasicApp.
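Since the wrapped app must be a plain str-to-str function, a minimal sketch (echo_app and the app_id keyword are hypothetical; the commented call assumes a Tru instance named tru):

```python
# A stand-in str -> str app suitable for text_to_text; the whole "app"
# lives entirely in this one function.
def echo_app(prompt: str) -> str:
    return f"echo: {prompt}"

# Wrapping it with the recorder (requires trulens_eval; app_id is a
# hypothetical keyword forwarded via **kwargs to TruBasicApp):
# recorder = tru.Basic(text_to_text=echo_app, app_id="echo_v1")
```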
Custom
Custom(app: Any, **kwargs: dict) -> TruCustomApp
Create a custom app recorder with database managed by self.
PARAMETER | DESCRIPTION
---|---
app | The app to be instrumented. This can be any Python object.
**kwargs | Additional keyword arguments to pass to TruCustomApp.
Virtual
Virtual(
app: Union[VirtualApp, Dict], **kwargs: dict
) -> TruVirtual
Create a virtual app recorder with database managed by self.
PARAMETER | DESCRIPTION
---|---
app | The app to be instrumented. If not a VirtualApp, it is passed to the VirtualApp constructor to create one.
**kwargs | Additional keyword arguments to pass to TruVirtual.
migrate_database

Migrates the database.
This should be run whenever there are breaking changes in a database created with an older version of trulens_eval.

PARAMETER | DESCRIPTION
---|---
**kwargs | Keyword arguments to pass to migrate_database of the current database.

See DB.migrate_database.
add_record

run_feedback_functions
run_feedback_functions(
record: Record,
feedback_functions: Sequence[Feedback],
app: Optional[AppDefinition] = None,
wait: bool = True,
) -> Union[
Iterable[FeedbackResult],
Iterable[Future[FeedbackResult]],
]
Run a collection of feedback functions and report their results.
PARAMETER | DESCRIPTION
---|---
record | The record on which to evaluate the feedback functions.
app | The app that produced the given record. If not provided, it is looked up from the given database.
feedback_functions | A collection of feedback functions to evaluate.
wait | If set (default), will wait for results before returning.
YIELDS | DESCRIPTION
---|---
Union[Iterable[FeedbackResult], Iterable[Future[FeedbackResult]]] | One result for each element of feedback_functions.
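The wait flag switches the return between completed results and pending futures. A stdlib analogy of that pattern (not trulens code; run_all and the toy feedback functions below are invented for illustration):

```python
# Stdlib sketch of the wait semantics: the same submissions come back either
# as finished results (wait=True) or as raw futures (wait=False).
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Union

def run_all(fns: List[Callable[[str], int]], record: str,
            wait: bool = True) -> Union[List[int], List["Future[int]"]]:
    pool = ThreadPoolExecutor(max_workers=4)
    futures = [pool.submit(fn, record) for fn in fns]
    if wait:
        return [f.result() for f in futures]  # block until every result is in
    return futures                            # caller resolves them later

scores = run_all([len, lambda r: r.count("a")], "banana")  # -> [6, 3]
```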
add_app
add_app(app: AppDefinition) -> AppID
Add an app to the database and return its unique id.
PARAMETER | DESCRIPTION
---|---
app | The app to add to the database.

RETURNS | DESCRIPTION
---|---
AppID | A unique app identifier str.
delete_app
delete_app(app_id: AppID) -> None
Deletes an app from the database based on its app_id.
PARAMETER | DESCRIPTION
---|---
app_id | The unique identifier of the app to be deleted.
add_feedback
add_feedback(
feedback_result_or_future: Optional[
Union[FeedbackResult, Future[FeedbackResult]]
] = None,
**kwargs: dict
) -> FeedbackResultID
Add a single feedback result or future to the database and return its unique id.
PARAMETER | DESCRIPTION
---|---
feedback_result_or_future | If a Future is given, the call will wait for the result before adding it to the database.
**kwargs | Fields to add to the given feedback result or to create a new FeedbackResult with.

RETURNS | DESCRIPTION
---|---
FeedbackResultID | A unique result identifier str.
add_feedbacks
add_feedbacks(
feedback_results: Iterable[
Union[FeedbackResult, Future[FeedbackResult]]
]
) -> List[FeedbackResultID]
Add multiple feedback results to the database and return their unique ids.
PARAMETER | DESCRIPTION
---|---
feedback_results | An iterable with each iteration being a FeedbackResult or a Future of the same. Each given future will be waited on.

RETURNS | DESCRIPTION
---|---
List[FeedbackResultID] | A list of unique result identifier strs, in the same order as the input feedback_results.
get_app
get_app(app_id: AppID) -> JSONized[AppDefinition]
Look up an app from the database.
This method produces the JSON-ized version of the app. It can be deserialized back into an AppDefinition with model_validate:

Example

```python
from trulens_eval.schema.app import AppDefinition

app_json = tru.get_app(app_id="Custom Application v1")
app = AppDefinition.model_validate(app_json)
```
Warning
Do not rely on deserializing into App as its implementations feature attributes not meant to be deserialized.
PARAMETER | DESCRIPTION
---|---
app_id | The unique identifier str of the app to look up.

RETURNS | DESCRIPTION
---|---
JSONized[AppDefinition] | JSON-ized version of the app.
get_apps
get_apps() -> List[JSONized[AppDefinition]]
Look up all apps from the database.
RETURNS | DESCRIPTION
---|---
List[JSONized[AppDefinition]] | A list of JSON-ized versions of all apps in the database.

Warning

Same deserialization caveats as get_app.
get_records_and_feedback

Get records, their feedback results, and feedback names.
PARAMETER | DESCRIPTION
---|---
app_ids | A list of app ids to filter records by. If empty or not given, all apps' records will be returned.

RETURNS | DESCRIPTION
---|---
DataFrame | DataFrame of records with their feedback results.
List[str] | List of feedback names that are columns in the DataFrame.
get_leaderboard
Get a leaderboard for the given apps.
PARAMETER | DESCRIPTION
---|---
app_ids | A list of app ids to filter records by. If empty or not given, all apps will be included in the leaderboard.

RETURNS | DESCRIPTION
---|---
DataFrame | DataFrame of apps with their feedback results aggregated.
start_evaluator
Start a deferred feedback function evaluation thread or process.
PARAMETER | DESCRIPTION
---|---
restart | If set, will stop the existing evaluator before starting a new one.
fork | If set, will start the evaluator in a new process instead of a thread. NOT CURRENTLY SUPPORTED.

RETURNS | DESCRIPTION
---|---
Union[Process, Thread] | The started process or thread that is executing the deferred feedback evaluator.
run_dashboard
run_dashboard(
port: Optional[int] = 8501,
address: Optional[str] = None,
force: bool = False,
_dev: Optional[Path] = None,
) -> Process
Run a Streamlit dashboard to view logged results and apps.
PARAMETER | DESCRIPTION
---|---
port | Port number to pass to Streamlit.
address | Address to pass to Streamlit. Address cannot be set if running from a Colab notebook.
force | Stop existing dashboard(s) first. Defaults to False.
_dev | If given, run the dashboard in development mode from the given path.

RETURNS | DESCRIPTION
---|---
Process | The Process executing the Streamlit dashboard.

RAISES | DESCRIPTION
---|---
RuntimeError | Dashboard is already running. Can be avoided by setting force.
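A hypothetical launch sketch (the tru instance and keyword usage mirror the signature above; the commented calls require trulens_eval and streamlit to be installed):

```python
# Hypothetical dashboard launch; the commented calls need trulens_eval.
port = 8501  # the default port from the signature above

# proc = tru.run_dashboard(port=port, force=True)  # force=True stops any
#                                                  # running dashboard first,
#                                                  # avoiding the RuntimeError

dashboard_url = f"http://localhost:{port}"  # where the dashboard is served locally
```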
stop_dashboard
stop_dashboard(force: bool = False) -> None
Stop existing dashboard(s) if running.
PARAMETER | DESCRIPTION
---|---
force | Also try to find any other dashboard processes not started in this notebook and shut them down too. This option is not supported under Windows.

RAISES | DESCRIPTION
---|---
RuntimeError | Dashboard is not running in the current process. Can be avoided with force.