# Dineshkumar Bhaskaran

dineshkumarb@gmail.com | 778 893 8274 | dkbhaskaran.github.io | LinkedIn: dineshkumarb | Canada (OWP)

## Summary

- Expert in high-performance parallel computing, with a strong focus on algorithm parallelization, optimization, and benchmarking for AI/ML workloads, image processing pipelines, and distributed storage systems.
- Extensive experience in Linux kernel and systems programming, covering storage virtualization, device drivers, and board bring-ups for ARM-based architectures.

## Experience

**Arista Networks, Software Engineer** (Vancouver, Nov 2023 - Apr 2025)

- Refactored the Layer 3 unicast routing abstraction layer, removing approximately 7K lines of redundant code between the protocol and hardware interface layers.
- Unified portions of the IPv4 and IPv6 handling logic using C++ templates, resulting in a cleaner codebase and a 4% improvement in module build times.

**AMD India, Senior Member Technical Staff** (Bengaluru, Aug 2019 - Oct 2023)

- **RAPIDS (Accelerated Data Science for ROCm):** Owned and maintained RAPIDS cuDF sub-projects (rapids-cmake, RMM, nvCOMP) to support PyData libraries on AMD GPUs under the ROCm stack.
- **MLPerf Inferencing:** Implemented Python reference code for the ResNet50, YOLOv4, and BERT models on AMD Instinct GPUs across multiple backends: PyTorch, TensorFlow, Tensor Virtual Machine (TVM), and MIGraphX. Developed a C++ inference server on TVM for ResNet50, improving performance by 51.5%.
- **ROCm Clang Compiler:**
  - Maintainer from Aug 2019 to Sept 2021.
  - Implemented multithreading and in-memory compilation in AMD's Lightning compiler (based on LLVM), improving compile time by 29% on Windows and by 1.07x on Linux.

**Aricent (later Capgemini Engineering), Principal Engineer** (Bengaluru, Oct 2017 - Jul 2019)

- Developed GPU-accelerated erasure coding algorithms for Ceph. Presented the results at SNIA SDC 2018 (India and Santa Clara) under the title "Accelerated Erasure Coding: The New Frontiers of Software-Defined Storage".
- Implemented FFT offload in the OpenAirInterface 4G stack as part of an SDR solution, leveraging NVIDIA GPUs and Xilinx FPGAs to improve performance.

**Canon Inc, Principal Engineer** (Tokyo, Bengaluru, Mar 2010 - Oct 2017)

- Led a team that built an efficient medical image processing library for Canon medical apparatuses. Parallelized and optimized image registration components, including pre-processing algorithms, optimizers (Powell, LM, GD, SGD), metrics (MI, NMI, RIU, SSD), transformation algorithms, and the resampler.
- Managed a team that maintained and enhanced the Linux-based OS for Canon embedded products. Contributed to porting the Linux kernel and essential system applications to various ARM-based SoCs.

**Early Experience (Brocade Communications and Tata Elxsi), Software Engineer** (Bengaluru, Sep 2003 - Mar 2010)

- Worked on Brocade Storage Application Services (SAS), which include storage virtualization, online data migration, CDR, and CDP. Owned the virtualized initiator module in the SAS solution.
- Worked on a target-mode driver for LSI Logic FC HBAs, based on LSI Logic Fusion Message Passing Technology, that lets the host act as a virtualized storage box.

## Education

- Deep Learning Theory and Practice, IISc Bengaluru
- M.S. Software Systems, BITS Pilani, 2006-2009
- Bachelor of Technology, Computer Engineering, University of Calicut, 1999-2003

## Technical Writing & Talks

- Blog and assorted articles
- Accelerated Erasure Coding
- Why erasure coding is the future of data resiliency
- Writing a Network device driver

## Technologies

Languages: C, HIP, OpenCL, familiar with CUDA, C++, Python, PTX, HLSL, ARM and x86 assembly.

Protocol stacks: FC; familiar with SCSI, USB, and the OpenAirInterface 4G stack in the Linux kernel.

Tools and ASICs: ROCm and GNU toolchains, Xilinx ZC-702/706, TI AM437x, AMD Instinct GPUs (gfx90x series).