Class tasks
Stateful GPU services: load once, serve many. Lifecycle hooks, probes, and concurrency.
Define a class task
Decorate a class with @app.cls. Mark methods you want to call remotely with @gw.method(). The class is instantiated once per container; state persists across calls to the same instance.
1import gworker_client as gw2 3app = gw.App("inference-server")4 5@app.cls(gpu=gw.Gpu("A100"), min_workers=1)6class Embedder:7 @gw.on_enter()8 def setup(self):9 from sentence_transformers import SentenceTransformer10 self.model = SentenceTransformer("all-MiniLM-L6-v2")11 12 @gw.method()13 def embed(self, texts: list[str]) -> list[list[float]]:14 return self.model.encode(texts).tolist()Lifecycle hooks
@gw.on_enter() runs once at container startup before the first request. @gw.on_exit() runs on graceful shutdown. Use on_enter to load model weights so every request hits a warm model.
1@app.cls(gpu=gw.Gpu("H100"))2class LLMServer:3 @gw.on_enter()4 async def load(self):5 import torch6 self.model = load_model("/models/llama-3")7 self.model.cuda()8 9 @gw.on_exit()10 async def unload(self):11 del self.model12 torch.cuda.empty_cache()13 14 @gw.method()15 async def generate(self, prompt: str) -> str:16 return self.model.generate(prompt)Concurrent inputs
Let one container handle multiple inputs simultaneously. Good for I/O-bound tasks or inference servers where GPU utilisation is otherwise low.
1@app.cls(gpu=gw.Gpu("A100"))2@gw.concurrent(max_inputs=16)3class EmbedServer:4 @gw.on_enter()5 def load(self):6 self.model = load_embedding_model()7 8 @gw.method()9 async def embed(self, text: str) -> list[float]:10 return await self.model.aencode(text)Batching
Coalesce concurrent inputs into a single GPU call. max_batch_size caps the batch; wait_ms is the coalescing window. The method receives a list and must return a list of the same length.
1@app.cls(gpu=gw.Gpu("A100"))2@gw.concurrent(max_inputs=128)3class BatchEmbedder:4 @gw.method()5 @gw.batched(max_batch_size=64, wait_ms=20)6 def embed(self, texts: list[str]) -> list[list[float]]:7 return self.model.encode(texts).tolist()Health probes
Attach a liveness or readiness probe to a method. The platform calls it on a schedule; failures trigger a restart (liveness) or traffic drain (readiness).
1@app.cls(2 gpu=gw.Gpu("A100"),3 probes=[gw.Probe.on("health", kind="liveness", period_s=30, failures=3)],4)5class Model:6 @gw.method()7 def health(self) -> dict:8 return {"ok": self.model is not None}9 10 @gw.method()11 def predict(self, x: list[float]) -> float: ...Runtime parameters
Mark class attributes with @gw.parameter() to inject them at call time without redeploying. Pass defaults for local development.
1@app.cls(gpu=gw.Gpu("A100"))2class Classifier:3 threshold: float = gw.parameter(default=0.5)4 5 @gw.method()6 def classify(self, text: str) -> str:7 score = self.model.predict(text)8 return "positive" if score > self.threshold else "negative"9 10# Call with a custom threshold11result = Classifier(threshold=0.7).classify.remote("great product!")