🦑 Tru

trulens_eval.tru.Tru
Bases: SingletonPerName
Tru is the main class that provides an entry point to trulens-eval.
Tru lets you:
- Log app prompts and outputs
- Log app metadata
- Run and log feedback functions
- Run a Streamlit dashboard to view experiment results
By default, all data is logged to "default.sqlite" in the current working directory. Data can instead be logged to any SQLAlchemy-compatible database referred to by database_url.
Supported App Types
TruChain: LangChain apps.
TruLlama: LlamaIndex apps.
TruRails: NeMo Guardrails apps.
TruBasicApp: Basic apps defined solely using a function from str to str.
TruCustomApp: Custom apps containing custom structures and methods. Requires annotation of methods to instrument.
TruVirtual: Virtual apps that do not have a real app to instrument but have a virtual structure and can log existing captured data as if they were trulens records.
PARAMETER | DESCRIPTION
---|---
database | Database to use. If not provided, an SQLAlchemyDB database will be initialized based on the other arguments.
database_url | Database URL. Defaults to a local SQLite database file at "default.sqlite" in the current working directory.
database_file | Path to a local SQLite database file. Deprecated: use database_url instead.
database_prefix | Prefix for table names for trulens_eval to use. May be useful in some databases hosting other apps.
database_redact_keys | Whether to redact secret keys in data to be written to the database.
database_args | Additional arguments to pass to the database constructor.
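For illustration, database_url follows standard SQLAlchemy URL syntax. A minimal sketch (the dialect strings and the commented Tru construction are assumptions based on SQLAlchemy conventions, not confirmed by this page):

```python
# Sketch of SQLAlchemy-style URLs usable for database_url (dialect names
# follow standard SQLAlchemy conventions; assumed, not confirmed here).
local_url = "sqlite:///default.sqlite"  # the documented local default file
hosted_url = "postgresql://user:password@db.example.com:5432/trulens"

# Actual construction requires trulens_eval to be installed:
# from trulens_eval import Tru
# tru = Tru(database_url=local_url, database_prefix="myapp_")
```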
Attributes

RETRY_RUNNING_SECONDS (class-attribute, instance-attribute)

RETRY_RUNNING_SECONDS: float = 60.0

How long to wait (in seconds) before restarting a feedback function that has already started. A feedback function execution that has started may have stalled or failed without recording the failure.
RETRY_FAILED_SECONDS (class-attribute, instance-attribute)
RETRY_FAILED_SECONDS: float = 5 * 60.0
How long to wait (in seconds) to retry a failed feedback function run.
DEFERRED_NUM_RUNS (class-attribute, instance-attribute)
DEFERRED_NUM_RUNS: int = 32
Number of futures to wait for when evaluating deferred feedback functions.
db (instance-attribute)

db: Union[DB, OpaqueWrapper[DB]]

Database supporting this workspace. Will be an opaque wrapper if it is not ready to use due to migration requirements.
Functions

Chain

Llama

Basic
Basic(
text_to_text: Callable[[str], str], **kwargs: dict
) -> TruBasicApp
Create a basic app recorder with database managed by self.
PARAMETER | DESCRIPTION
---|---
text_to_text | A function that takes a string and returns a string. The wrapped app's functionality is expected to be entirely in this function.
**kwargs | Additional keyword arguments to pass to TruBasicApp.
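Since the wrapped app must be a plain str-to-str function, a minimal sketch (echo_app and the app_id keyword are hypothetical; the commented call assumes a Tru instance named tru):

```python
# A stand-in str -> str app suitable for text_to_text; the whole "app"
# lives entirely in this one function.
def echo_app(prompt: str) -> str:
    return f"echo: {prompt}"

# Wrapping it with the recorder (requires trulens_eval; app_id is a
# hypothetical keyword forwarded via **kwargs to TruBasicApp):
# recorder = tru.Basic(text_to_text=echo_app, app_id="echo_v1")
```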
Custom
Custom(app: Any, **kwargs: dict) -> TruCustomApp
Create a custom app recorder with database managed by self.
PARAMETER | DESCRIPTION
---|---
app | The app to be instrumented. This can be any Python object.
**kwargs | Additional keyword arguments to pass to TruCustomApp.
Virtual
Virtual(
app: Union[VirtualApp, Dict], **kwargs: dict
) -> TruVirtual
Create a virtual app recorder with database managed by self.
PARAMETER | DESCRIPTION
---|---
app | The app to be instrumented. If not a VirtualApp, it is passed to the VirtualApp constructor to create one.
**kwargs | Additional keyword arguments to pass to TruVirtual.
migrate_database

Migrates the database.
This should be run whenever there are breaking changes in a database created with an older version of trulens_eval.

PARAMETER | DESCRIPTION
---|---
**kwargs | Keyword arguments to pass to migrate_database of the current database.

See DB.migrate_database.
add_record

run_feedback_functions
run_feedback_functions(
record: Record,
feedback_functions: Sequence[Feedback],
app: Optional[AppDefinition] = None,
wait: bool = True,
) -> Union[
Iterable[FeedbackResult],
Iterable[Future[FeedbackResult]],
]
Run a collection of feedback functions and report their results.
PARAMETER | DESCRIPTION
---|---
record | The record on which to evaluate the feedback functions.
app | The app that produced the given record. If not provided, it is looked up from the given database.
feedback_functions | A collection of feedback functions to evaluate.
wait | If set (default), will wait for results before returning.
YIELDS | DESCRIPTION
---|---
Union[Iterable[FeedbackResult], Iterable[Future[FeedbackResult]]] | One result for each element of feedback_functions.
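The wait flag switches the return between completed results and pending futures. A stdlib analogy of that pattern (not trulens code; run_all and the toy feedback functions below are invented for illustration):

```python
# Stdlib sketch of the wait semantics: the same submissions come back either
# as finished results (wait=True) or as raw futures (wait=False).
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Union

def run_all(fns: List[Callable[[str], int]], record: str,
            wait: bool = True) -> Union[List[int], List["Future[int]"]]:
    pool = ThreadPoolExecutor(max_workers=4)
    futures = [pool.submit(fn, record) for fn in fns]
    if wait:
        return [f.result() for f in futures]  # block until every result is in
    return futures                            # caller resolves them later

scores = run_all([len, lambda r: r.count("a")], "banana")  # -> [6, 3]
```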
add_app
add_app(app: AppDefinition) -> AppID
Add an app to the database and return its unique id.
PARAMETER | DESCRIPTION
---|---
app | The app to add to the database.

RETURNS | DESCRIPTION
---|---
AppID | A unique app identifier str.
delete_app
delete_app(app_id: AppID) -> None
Deletes an app from the database based on its app_id.
PARAMETER | DESCRIPTION
---|---
app_id | The unique identifier of the app to be deleted.
add_feedback
add_feedback(
feedback_result_or_future: Optional[
Union[FeedbackResult, Future[FeedbackResult]]
] = None,
**kwargs: dict
) -> FeedbackResultID
Add a single feedback result or future to the database and return its unique id.
PARAMETER | DESCRIPTION
---|---
feedback_result_or_future | If a Future is given, the call will wait for the result before adding it to the database.
**kwargs | Fields to add to the given feedback result or to create a new FeedbackResult with.

RETURNS | DESCRIPTION
---|---
FeedbackResultID | A unique result identifier str.
add_feedbacks
add_feedbacks(
feedback_results: Iterable[
Union[FeedbackResult, Future[FeedbackResult]]
]
) -> List[FeedbackResultID]
Add multiple feedback results to the database and return their unique ids.
PARAMETER | DESCRIPTION
---|---
feedback_results | An iterable with each iteration being a FeedbackResult or a Future of the same. Each given future will be waited on.

RETURNS | DESCRIPTION
---|---
List[FeedbackResultID] | A list of unique result identifier strs, in the same order as the input feedback_results.
get_app
get_app(app_id: AppID) -> JSONized[AppDefinition]
Look up an app from the database.
This method produces the JSON-ized version of the app. It can be deserialized back into an AppDefinition with model_validate:

Example

```python
from trulens_eval.schema.app import AppDefinition

app_json = tru.get_app(app_id="Custom Application v1")
app = AppDefinition.model_validate(app_json)
```
Warning
Do not rely on deserializing into App as its implementations feature attributes not meant to be deserialized.
PARAMETER | DESCRIPTION
---|---
app_id | The unique identifier str of the app to look up.

RETURNS | DESCRIPTION
---|---
JSONized[AppDefinition] | JSON-ized version of the app.
get_apps
get_apps() -> List[JSONized[AppDefinition]]
Look up all apps from the database.
RETURNS | DESCRIPTION
---|---
List[JSONized[AppDefinition]] | A list of JSON-ized versions of all apps in the database.

Warning

Same deserialization caveats as get_app.
get_records_and_feedback

Get records, their feedback results, and feedback names.
PARAMETER | DESCRIPTION
---|---
app_ids | A list of app ids to filter records by. If empty or not given, all apps' records will be returned.

RETURNS | DESCRIPTION
---|---
DataFrame | DataFrame of records with their feedback results.
List[str] | List of feedback names that are columns in the DataFrame.
get_leaderboard
Get a leaderboard for the given apps.
PARAMETER | DESCRIPTION
---|---
app_ids | A list of app ids to filter records by. If empty or not given, all apps will be included in the leaderboard.

RETURNS | DESCRIPTION
---|---
DataFrame | DataFrame of apps with their feedback results aggregated.
start_evaluator
Start a deferred feedback function evaluation thread or process.
PARAMETER | DESCRIPTION
---|---
restart | If set, will stop the existing evaluator before starting a new one.
fork | If set, will start the evaluator in a new process instead of a thread. NOT CURRENTLY SUPPORTED.

RETURNS | DESCRIPTION
---|---
Union[Process, Thread] | The started process or thread that is executing the deferred feedback evaluator.
run_dashboard
run_dashboard(
port: Optional[int] = 8501,
address: Optional[str] = None,
force: bool = False,
_dev: Optional[Path] = None,
) -> Process
Run a Streamlit dashboard to view logged results and apps.
PARAMETER | DESCRIPTION
---|---
port | Port number to pass to Streamlit.
address | Address to pass to Streamlit. Address cannot be set if running from a Colab notebook.
force | Stop existing dashboard(s) first. Defaults to False.
_dev | If given, run the dashboard in development mode from the given path.

RETURNS | DESCRIPTION
---|---
Process | The Process executing the Streamlit dashboard.

RAISES | DESCRIPTION
---|---
RuntimeError | Dashboard is already running. Can be avoided by setting force.
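A hypothetical launch sketch (the tru instance and keyword usage mirror the signature above; the commented calls require trulens_eval and streamlit to be installed):

```python
# Hypothetical dashboard launch; the commented calls need trulens_eval.
port = 8501  # the default port from the signature above

# proc = tru.run_dashboard(port=port, force=True)  # force=True stops any
#                                                  # running dashboard first,
#                                                  # avoiding the RuntimeError

dashboard_url = f"http://localhost:{port}"  # where the dashboard is served locally
```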
stop_dashboard
stop_dashboard(force: bool = False) -> None
Stop existing dashboard(s) if running.
PARAMETER | DESCRIPTION
---|---
force | Also try to find any other dashboard processes not started in this notebook and shut them down too. This option is not supported under Windows.

RAISES | DESCRIPTION
---|---
RuntimeError | Dashboard is not running in the current process. Can be avoided with force.