Job Description

We are seeking experienced Multimodal and Vision AI Engineers/Scientists to research, develop, optimize, and deploy Vision-Language Models (VLMs), multimodal generative models, diffusion models, and traditional computer vision techniques. You will work on foundational models that integrate vision, language, and audio, optimize AI architectures, and push the boundaries of multimodal AI research.

Responsibilities:

Research, design, and train multimodal vision-language models (VLMs), integrating deep learning, transformers, and attention mechanisms.
Develop and optimize small-scale distillation of VLMs for efficient deployment on resource-constrained devices.
Implement state-of-the-art object detection (YOLO, Faster R-CNN), segmentation (Panoptic Segmentation), classification (ResNets, Vision Transformers), and image generation (Stable Diffusion, Stable Cascade).
Train or fine-tune vision models for representation (e.g., Vision Transformers, Q-Former, CLIP, SigLIP), generation, and video representation (e.g., Video Swin Transformer).
Work with diffusion models and generative models for conditional image generation and multimodal applications.
Optimize CNN-based architectures for computer vision tasks like recognition, tracking, and feature extraction.
Implement and optimize audio models for representation (e.g., W2V-BERT) and generation (e.g., HiFi-GAN, SeamlessM4T).
Innovate with multimodal fusion techniques such as early fusion and deep fusion, along with Mixture-of-Experts (MoE), efficient attention variants (FlashAttention, MQA, GQA, MLA), and other transformer architectures.
Advance video analysis, video summarization, and video question-answering models to enhance multimedia understanding.
Integrate and tailor deep learning frameworks like PyTorch, TensorFlow, DeepSpeed, Lightning, Habana, and FSDP.
Deploy large-scale distributed AI models using MLOps frameworks such as Airflow, MosaicML, Anyscale, Kubeflow, and Terraform.
Publish research in top-tier conferences (NeurIPS, CVPR, ICCV, ICLR, ICML) and contribute to open-source AI projects.

Qualifications:

Ph.D. or Master’s degree with 2+ years of experience in Vision-Language Models (VLMs), multimodal AI, diffusion models, CNNs, ResNets, computer vision, and generative models.
Demonstrated expertise in high-performance computing and proficiency in Python, C/C++, CUDA, and kernel-level programming for AI applications.
Experience in optimizing training and inference of large-scale AI models, with knowledge of quantization, distillation, and LLMOps.
Hands-on experience with object detection (YOLO, Faster R-CNN), image segmentation (Panoptic Segmentation), and video understanding (Swin Transformer, TimeSformer).
Proficiency in AI toolkits such as PyTorch, TensorFlow, and OpenCV, and familiarity with MLOps frameworks.

Computer Vision Engineer Job at Tykhe Inc, Palo Alto, CA
