Nikhil Sardana

Nikhil Sardana's posts

Serving Quantized LLMs on NVIDIA H100 Tensor Core GPUs

April 11, 2024 / Less than 1 min read

LLM Inference Performance Engineering: Best Practices

April 11, 2024 / 2 min read