LLM Provider¶
trulens_eval.feedback.provider.base.LLMProvider¶
Bases: Provider
An LLM-based provider.
This is an abstract class and needs to be instantiated as one of these subclasses:
- OpenAI and its subclass AzureOpenAI.
- LiteLLM, which provides an interface to a wide range of models.
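For example, a concrete provider can be constructed directly. A minimal sketch (the OpenAI import path also appears in the groundedness example below; the LiteLLM path is assumed to be analogous):
from trulens_eval.feedback.provider.openai import OpenAI
provider = OpenAI()
# Assumed analogous import for LiteLLM-backed models:
# from trulens_eval.feedback.provider.litellm import LiteLLM
# provider = LiteLLM()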
Functions¶
generate_score¶
generate_score(
    system_prompt: str,
    user_prompt: Optional[str] = None,
    normalize: float = 10.0,
    temperature: float = 0.0,
) -> float
Base method to generate a score only, used for evaluation.

PARAMETER | TYPE | DESCRIPTION
---|---|---
system_prompt | str | A pre-formatted system prompt.
user_prompt | Optional[str] | An optional user prompt. Defaults to None.
normalize | float | The normalization factor for the score.
temperature | float | The temperature for the LLM response.

RETURNS | DESCRIPTION
---|---
float | The score on a 0-1 scale.
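A usage sketch for calling generate_score directly; the prompt text here is illustrative, not one of the library's templates:
from trulens_eval.feedback.provider.openai import OpenAI
provider = OpenAI()
score = provider.generate_score(
    system_prompt=(
        "Rate the politeness of the user's message on a scale of 0 to 10. "
        "Respond with only the number."
    ),
    user_prompt="Thanks so much for the quick reply!",
    normalize=10.0,  # the model's 0-10 answer is divided by this to reach the 0-1 scale
)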
generate_score_and_reasons¶
generate_score_and_reasons(
    system_prompt: str,
    user_prompt: Optional[str] = None,
    normalize: float = 10.0,
    temperature: float = 0.0,
) -> Tuple[float, Dict]
Base method to generate a score and reasons, used for evaluation.

PARAMETER | TYPE | DESCRIPTION
---|---|---
system_prompt | str | A pre-formatted system prompt.
user_prompt | Optional[str] | An optional user prompt. Defaults to None.
normalize | float | The normalization factor for the score.
temperature | float | The temperature for the LLM response.

RETURNS | DESCRIPTION
---|---
float | The score on a 0-1 scale.
Dict | Reason metadata if returned by the LLM.
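A usage sketch, assuming a system prompt that asks the model to score and to explain itself (the *_with_cot_reasons feedback functions build such prompts from internal templates; this one is illustrative):
score, reasons = provider.generate_score_and_reasons(
    system_prompt=(
        "Rate the relevance of the RESPONSE to the QUESTION on a scale of 0 to 10, "
        "then explain your reasoning."
    ),
    user_prompt="QUESTION: What is TruLens? RESPONSE: A library for evaluating LLM apps.",
)
# reasons is a dictionary of reason metadata when the LLM returns an explanation.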
context_relevance¶
Uses chat completion model. A function that completes a template to check the relevance of the context to the question.
Example
from trulens_eval.app import App
context = App.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

PARAMETER | TYPE | DESCRIPTION
---|---|---
question | str | A question being asked.
context | str | Context related to the question.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not relevant) and 1.0 (relevant).
qs_relevance¶
Question statement relevance is deprecated and will be removed in future versions. Please use context_relevance in its place.
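Migration is a rename of the feedback function; the selectors stay the same. A sketch, assuming an existing provider and context selector:
# Before (deprecated):
feedback = Feedback(provider.qs_relevance).on_input().on(context)
# After:
feedback = Feedback(provider.context_relevance).on_input().on(context)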
context_relevance_with_cot_reasons¶
context_relevance_with_cot_reasons(
    question: str, context: str, temperature: float = 0.0
) -> Tuple[float, Dict]
Uses chat completion model. A function that completes a template to check the relevance of the context to the question. Also uses chain of thought methodology and emits the reasons.
Example
from trulens_eval.app import App
context = App.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

PARAMETER | TYPE | DESCRIPTION
---|---|---
question | str | A question being asked.
context | str | Context related to the question.
temperature | float | The temperature for the LLM response.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (not relevant) and 1.0 (relevant) and a dictionary containing the reasons for the evaluation.
qs_relevance_with_cot_reasons¶
Question statement relevance is deprecated and will be removed in future versions. Please use context_relevance_with_cot_reasons in its place.
relevance¶
Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt.
Example
feedback = Feedback(provider.relevance).on_input_output()
Usage on RAG Contexts
feedback = Feedback(provider.relevance).on_input().on(
    TruLlama.select_source_nodes().node.text  # See note below
).aggregate(np.mean)

PARAMETER | TYPE | DESCRIPTION
---|---|---
prompt | str | A text prompt to an agent.
response | str | The agent's response to the prompt.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not relevant) and 1.0 (relevant).
relevance_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt. Also uses chain of thought methodology and emits the reasons.
Example
feedback = (
    Feedback(provider.relevance_with_cot_reasons)
    .on_input()
    .on_output()
)

PARAMETER | TYPE | DESCRIPTION
---|---|---
prompt | str | A text prompt to an agent.
response | str | The agent's response to the prompt.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (not relevant) and 1.0 (relevant) and a dictionary containing the reasons for the evaluation.
sentiment¶
Uses chat completion model. A function that completes a template to check the sentiment of some text.
Example
feedback = Feedback(provider.sentiment).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate sentiment of.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (negative sentiment) and 1.0 (positive sentiment).
sentiment_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the sentiment of some text. Also uses chain of thought methodology and emits the reasons.
Example
feedback = Feedback(provider.sentiment_with_cot_reasons).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | Text to evaluate.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (negative sentiment) and 1.0 (positive sentiment) and a dictionary containing the reasons for the evaluation.
model_agreement¶
Uses chat completion model. A function that gives a chat completion model the same prompt and gets a response, encouraging truthfulness. A second template is then given to the model with a prompt asserting that the original response is correct, and this function measures whether the two chat completion responses are similar.
Example
feedback = Feedback(provider.model_agreement).on_input_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
prompt | str | A text prompt to an agent.
response | str | The agent's response to the prompt.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not in agreement) and 1.0 (in agreement).
conciseness¶
Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.conciseness).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate the conciseness of.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not concise) and 1.0 (concise).
conciseness_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.
Example
feedback = Feedback(provider.conciseness_with_cot_reasons).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate the conciseness of.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (not concise) and 1.0 (concise) and a dictionary containing the reasons for the evaluation.
correctness¶
Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.correctness).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | A prompt to an agent.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not correct) and 1.0 (correct).
correctness_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.
Example
feedback = Feedback(provider.correctness_with_cot_reasons).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | Text to evaluate.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (not correct) and 1.0 (correct) and a dictionary containing the reasons for the evaluation.
coherence¶
Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.coherence).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not coherent) and 1.0 (coherent).
coherence_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.
Example
feedback = Feedback(provider.coherence_with_cot_reasons).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (not coherent) and 1.0 (coherent) and a dictionary containing the reasons for the evaluation.
harmfulness¶
Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.harmfulness).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not harmful) and 1.0 (harmful).
harmfulness_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.
Example
feedback = Feedback(provider.harmfulness_with_cot_reasons).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (not harmful) and 1.0 (harmful) and a dictionary containing the reasons for the evaluation.
maliciousness¶
Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.maliciousness).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not malicious) and 1.0 (malicious).
maliciousness_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.
Example
feedback = Feedback(provider.maliciousness_with_cot_reasons).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (not malicious) and 1.0 (malicious) and a dictionary containing the reasons for the evaluation.
helpfulness¶
Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.helpfulness).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not helpful) and 1.0 (helpful).
helpfulness_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.
Example
feedback = Feedback(provider.helpfulness_with_cot_reasons).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (not helpful) and 1.0 (helpful) and a dictionary containing the reasons for the evaluation.
controversiality¶
Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.controversiality).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not controversial) and 1.0 (controversial).
controversiality_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.
Example
feedback = Feedback(provider.controversiality_with_cot_reasons).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (not controversial) and 1.0 (controversial) and a dictionary containing the reasons for the evaluation.
misogyny¶
Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.misogyny).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not misogynistic) and 1.0 (misogynistic).
misogyny_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.
Example
feedback = Feedback(provider.misogyny_with_cot_reasons).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (not misogynistic) and 1.0 (misogynistic) and a dictionary containing the reasons for the evaluation.
criminality¶
Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.criminality).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not criminal) and 1.0 (criminal).
criminality_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.
Example
feedback = Feedback(provider.criminality_with_cot_reasons).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (not criminal) and 1.0 (criminal) and a dictionary containing the reasons for the evaluation.
insensitivity¶
Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval.
Example
feedback = Feedback(provider.insensitivity).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not insensitive) and 1.0 (insensitive).
insensitivity_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.
Example
feedback = Feedback(provider.insensitivity_with_cot_reasons).on_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
text | str | The text to evaluate.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (not insensitive) and 1.0 (insensitive) and a dictionary containing the reasons for the evaluation.
comprehensiveness_with_cot_reasons¶
Uses chat completion model. A function that tries to distill main points and compares a summary against those main points. This feedback function only has a chain of thought implementation, as the chain of thought is extremely important in this assessment.
Example
feedback = Feedback(provider.comprehensiveness_with_cot_reasons).on_input_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
source | str | Text corresponding to source material.
summary | str | Text corresponding to a summary.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (not comprehensive) and 1.0 (comprehensive) and a dictionary containing the reasons for the evaluation.
summarization_with_cot_reasons¶
Summarization is deprecated in favor of comprehensiveness. This method defaults to comprehensiveness_with_cot_reasons.
stereotypes¶
Uses chat completion model. A function that completes a template to check whether the response adds assumed stereotypes that are not present in the prompt.
Example
feedback = Feedback(provider.stereotypes).on_input_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
prompt | str | A text prompt to an agent.
response | str | The agent's response to the prompt.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed).
stereotypes_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check whether the response adds assumed stereotypes that are not present in the prompt. Also uses chain of thought methodology and emits the reasons.
Example
feedback = Feedback(provider.stereotypes_with_cot_reasons).on_input_output()

PARAMETER | TYPE | DESCRIPTION
---|---|---
prompt | str | A text prompt to an agent.
response | str | The agent's response to the prompt.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed) and a dictionary containing the reasons for the evaluation.
groundedness_measure_with_cot_reasons¶
A measure to track if the source material supports each sentence in the statement using an LLM provider.
The LLM will process the entire statement at once, using chain of thought methodology to emit the reasons.
Example
from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
provider = OpenAI()
f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons)
    .on(context.collect())
    .on_output()
)

PARAMETER | TYPE | DESCRIPTION
---|---|---
source | str | The source that should support the statement.
statement | str | The statement to check for groundedness.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation.