Titan Takeoff
TitanML helps businesses build and deploy better, smaller, cheaper, and faster NLP models through our training, compression, and inference optimization platform.
Our inference server, Titan Takeoff, enables deployment of LLMs locally on your hardware in a single command. Most generative model architectures are supported, such as Falcon, Llama 2, GPT-2, T5, and many more. If you experience trouble with a specific model, please let us know at hello@titanml.co.
Example usage
Here are some helpful examples to get you started with the Titan Takeoff Server. Make sure the Takeoff Server has been started in the background before running these commands. For more information, see the docs page for launching Takeoff.
# Note: importing TitanTakeoffPro instead of TitanTakeoff will also work; both use the same object under the hood
from langchain_community.llms import TitanTakeoff
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_core.prompts import PromptTemplate
Example 1
Basic use, assuming Takeoff is running on your machine using its default ports (i.e., localhost:3000).
llm = TitanTakeoff()
output = llm.invoke("What is the weather in London in August?")
print(output)
Example 2
Specifying a port and other generation parameters
llm = TitanTakeoff(port=3000)
# A comprehensive list of parameters can be found at https://docs.titanml.co/docs/next/apis/Takeoff%20inference_REST_API/generate#request
output = llm.invoke(
"What is the largest rainforest in the world?",
consumer_group="primary",
min_new_tokens=128,
max_new_tokens=512,
no_repeat_ngram_size=2,
sampling_topk=1,
sampling_topp=1.0,
sampling_temperature=1.0,
repetition_penalty=1.0,
regex_string="",
json_schema=None,
)
print(output)
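The regex_string and json_schema parameters can be used to constrain the model's output. As a minimal sketch, assuming json_schema accepts a standard JSON Schema dictionary (consult the Takeoff generate API docs linked above for the exact expected format):

# Hypothetical schema for illustration; the accepted json_schema format is an
# assumption here, so check the Takeoff docs before relying on it.
schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["city", "population"],
}
output = llm.invoke(
    "Give me the name and population of a large city as JSON.",
    json_schema=schema,
)
print(output)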
Example 3
Using generate for multiple inputs
llm = TitanTakeoff()
rich_output = llm.generate(["What is Deep Learning?", "What is Machine Learning?"])
print(rich_output.generations)
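Example 4
The streaming callback classes imported above are not used in the examples so far. As a sketch of how they fit together, assuming TitanTakeoff accepts the streaming and callback_manager parameters common to LangChain LLM integrations, tokens can be printed to stdout as they are generated:

# Stream tokens to stdout as they arrive, rather than waiting for the full response
llm = TitanTakeoff(
    streaming=True,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)
output = llm.invoke("What is the capital of France?")
print(output)

Example 5
Similarly, the PromptTemplate import can be combined with the LLM in a chain. This is a minimal sketch using the standard LCEL pipe syntax:

# Compose a prompt template with the LLM; the template fills in {topic} at invoke time
llm = TitanTakeoff()
prompt = PromptTemplate.from_template("Tell me about {topic}")
chain = prompt | llm
print(chain.invoke({"topic": "the universe"}))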