Nikhil Sardana

Nikhil Sardana's posts

Serving Quantized LLMs on NVIDIA H100 Tensor Core GPUs

January 31, 2024 / less than 1 min read

LLM Inference Performance Engineering: Best Practices

October 12, 2023 / 2 min read
