GPU Optimization Specialist
Tasks:
Develop and implement methods for accelerating computations.
Configure containers for various accelerators (GPU, TPU, etc.).
Create strategies for balancing performance and quality.
Prepare specialized container images for specific architectures.
Design scalable systems for different computing protocols.
Adapt code for new versions of frameworks.
Integrate specialized hardware.
Requirements:
Experience in optimizing large language models.
Deep knowledge of transformers and attention mechanisms.
Proficiency with tools for accelerators and their optimization (CUDA, ROCm, etc.).
Experience with optimization frameworks (TensorRT, OpenVINO, Triton).
Understanding of containerization (Docker) and workload management (Kubernetes, Slurm).
Nice to have:
Experience with distributed networks.
Participation in ML competitions.
Knowledge of modern accelerator architectures.
Experience working with specialized devices (FPGA, ASIC).
Conditions:
Possibility of transitioning to a full-time position with a share in profits.
Bonuses tied to real performance improvements that you implement.
Fully remote work, flexible schedule.
-
14 days, 15 000 USD
We at SDEV have extensive experience optimizing GPU computations and configuring infrastructure for high-load workloads. We do this by profiling CUDA kernels and fine-tuning Docker containers to use hardware accelerators efficiently. We are ready to balance performance and quality for your models.
-
1 day, 50 USD
Hello. For this project, I will focus on low-level optimizations and on adapting architectural solutions to different types of accelerators, such as GPUs and TPUs, using CUDA/ROCm and frameworks like TensorRT or OpenVINO. I will build a scalable container infrastructure on Docker and Kubernetes for efficient deployment and management of computations, balancing performance and quality for large language models. I have already completed similar transformer optimization projects and have ready-made scripts and templates to speed up setup and benchmarking. I suggest discussing implementation details, the final budget, and timelines in private messages.