Ollama with multiple GPUs

If you have multiple AMD GPUs in your system and want to limit Ollama to a subset of them, you can set HIP_VISIBLE_DEVICES to a comma-separated list of GPUs. If you want to ignore the GPUs and force CPU usage, use an invalid GPU ID (e.g. "-1"). One report: we have several GPUs in our server and use SLURM to manage the resources; SLURM uses CUDA_VISIBLE_DEVICES to assign GPUs to jobs and processes, but when I run ollama directly from the command line, within a SLURM-managed context with one GPU assigned, it uses all available GPUs in the server and ignores CUDA_VISIBLE_DEVICES. It's possible the combination of the two settings prevents ollama from using the GPU at all.

May 6, 2024 · Even desktop GPUs can easily load more than one model, and models can use your CPU and regular RAM if the whole thing doesn't fit in your combined GPU memory. Now we can upload multiple types of files to an LLM and have them parsed. The new Ollama update adds Llama 3, the ability to ask multiple questions at once, and more.

Nov 25, 2023 · When running with 4 GPUs, the first GPU (device 0) shoots to 70% usage while the other three remain at around 15%.

May 15, 2024 · For our node groups, we landed on three node groups for our initial build: the Open WebUI services would run on a node group using m5a.large instances, since they don't need a GPU, and we built two ...

Ollama AI is an open-source framework that allows you to run large language models (LLMs) locally on your computer. It supports multiple LLM runners, and if you're a developer or a researcher, it lets you use the power of AI without relying on cloud-based platforms. Mar 17, 2024 · Given that LLMs typically demand robust GPUs due to their considerable size, the models supported by Ollama employ neural network quantization. Mar 14, 2024 · To get started with Ollama with support for AMD graphics cards, download Ollama for Linux or Windows.

With the Ollama Docker container up and running, the next step is to download the LLaMA 3 model: docker exec -it ollama ollama pull llama3. As a sanity check, make sure you've installed nvidia-container-toolkit and are passing in --gpus, otherwise the container will not have access to the GPU. To pin the container to a single GPU: sudo docker run -d --gpus=1 -v ollama:/root/.ollama ... You can even use this single-liner command: $ alias ollama='docker run -d -v ollama:/root/.ollama ...'

Dec 19, 2023 · In fact, a minimum of 16 GB is required to run a 7B model, which is a basic Llama 2 model provided by Meta. Sep 28, 2023 · A high-end consumer GPU, such as the NVIDIA RTX 3090 or 4090, has 24 GB of VRAM. Sep 27, 2023 · Running Llama 2 70B on your GPU with ExLlamaV2: with GPTQ quantization, we can further reduce the precision to 3-bit without losing much of the model's performance.

Running Ollama [cmd]: Ollama communicates via pop-up messages. Set parameter 'num_thread' to '16'; as commenters in this issue have pointed out, you can set this in the CLI: >>> /set parameter num_thread 16. How to reproduce: start the server by hand (ollama serve), then ollama run zephyr and ask ">>> why is the sky blue". My code is based on some very basic llama generation code. I've also included the relevant sections of my YAML configuration files. brev shell --host [instancename] ...
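As a hedged illustration of the GPU-selection variables above (the device indices are placeholders; check rocminfo or nvidia-smi for your own numbering), limiting a manually started server to specific GPUs can look like this:

  # AMD: restrict Ollama to the first two GPUs reported by rocminfo
  HIP_VISIBLE_DEVICES=0,1 ollama serve

  # NVIDIA: restrict Ollama to the first two GPUs reported by nvidia-smi
  CUDA_VISIBLE_DEVICES=0,1 ollama serve

  # Force CPU-only inference by passing an invalid GPU ID
  CUDA_VISIBLE_DEVICES=-1 ollama serve

On a systemd-managed install you would put the same variable on an Environment= line (for example via systemctl edit ollama.service) rather than exporting it in an interactive shell.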
I am running a headless server ... Jun 13, 2024 · Current setup: a 1 GPU server and a 4 GPU server, running the following models with ollama. The behavior changed between v0.1.32 and v0.1.33: before, when loading a model, it would be evenly distributed across all GPU cards to improve the use of the GPUs, whereas in 0.1.33 it was found that loading a model would automatically use one card. In this way, if Ollama turns on parallelism, parallel reasoning for the same model will only be performed on GPU 1 and the parallel speed is very slow; if evenly distributed across multiple GPU cards, it can improve utilization.

May 15, 2024 · I am running Ollama on a 4xA100 GPU server, but it looks like only one GPU is used for the LLaMa3:7b model. How can I use all 4 GPUs simultaneously? I am not using Docker, just ollama serve and ollama run.

Dec 14, 2023 · This is a demo of using the llava multimodal model via ollama 1.15, using ollama-webui on https://cocalc.com with a T4 GPU.

May 15, 2024 · Learn how to run Ollama, an open-source tool for machine learning models, with 4 GPUs simultaneously using the Llama 3 7B model; find out the memory requirements, the command-line options, and the references for Ollama and Llama.

May 23, 2024 · Deploying Ollama with GPU. Ollama supports multiple platforms, including Windows, Mac, and Linux, catering to a wide range of users from hobbyists to professional developers. By running LLMs locally, you can avoid the costs and privacy concerns associated with cloud-based services.

root@4cdbe351ed8b:/# ollama list
NAME            ID            SIZE    MODIFIED
gemma:7b        a72c7f4d0a15  5.0 GB  About a minute ago
mistral:latest  2ae6f6dd7a3d  4.1 GB  About a minute ago
starcoder2:7b   0679cedc1189  4.0 GB  About a minute ago

I happen to possess several AMD Radeon RX 580 8GB GPUs that are currently idle. I thought of utilizing these and running on Kubernetes. It would be much more efficient if there was a way to use them simultaneously.

Dec 5, 2023 · Currently what ollama does is unload the previously loaded model and load the last model you try to use (llama.cpp itself can only load one model), so I do not see any concurrency benefits at all. Although the load is reasonably fast if you intend to manually enter text, if you want to use it with AutoGen or similar, the loads and unloads put additional latency into the system when token generation can already be pretty slow.

May 29, 2024 · When CUDA_VISIBLE_DEVICES=1,2 is used and Ollama pulls a model, it only uses GPU number 1; GPU number 2 is not used unless ollama runs another model.

To set up the WebUI, I'm using the following command: docker compose -f docker-compose.yaml -f docker-compose.gpu.yaml up -d --build. I'm playing around with multiple GPUs and came across "This functionality enables LocalAI to ..." However, it's important to note that Chat with RTX relies on TensorRT-LLM, which is only supported on 30-series GPUs or newer. This was a major drawback, as the next level of graphics card, the RTX 4080 and 4090 with 16GB and 24GB, costs around $1.6K and $2K just for the card, which is a significant jump in price and a higher investment. However, none of my hardware is even slightly on the compatibility list, and the publicly posted thread reference results were from before that feature was released.

3 days ago · System has 2 discrete GPUs: AMD RX 7600 XT (16 GB) and NVIDIA 1050 Ti (4 GB); RAM: 48 GB; CPU: AMD 7600X. You can see the list of devices with rocminfo.

We recently introduced the gguf-split CLI and support for loading sharded GGUF models in llama.cpp: gguf-split: split and merge gguf per batch of tensors #6135; llama_model_loader: support multiple split/shard GGUFs #6187; common: add HF arg helpers #6234; common: llama_load_model_from_url split support #6192.

To run fine-tuning on multiple GPUs, we make use of two packages: PEFT methods, in particular the Hugging Face PEFT library, and FSDP, which helps us parallelize the training over multiple GPUs. Given the combination of PEFT and FSDP, we would be able to fine-tune a Meta Llama 3 8B model on multiple GPUs in one node or multi-node. Convert the checkpoints to your GPU configuration: since the converting tool splits the checkpoints into one file per GPU, you need to determine the number of GPUs (<num_gpus>) the model is supposed to run on; the downloaded checkpoints need to be converted before they can be used with the AIME LLaMa-Chat repository.

Jun 2, 2024 · count: this value determines how many Nvidia GPUs you want to reserve for Ollama; use all to utilize all available GPUs, or specify a specific number if you have multiple GPUs and want to dedicate a subset to Ollama. Persistent volume definition: the volumes section defines a persistent volume named ollama.

Nov 30, 2023 · A simple calculation: for the 70B model, the KV cache size is about 2 (K and V) * input_length * num_layers * num_kv_heads * head_dim * 2 bytes (fp16).
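Several of the reports above describe one GPU doing most of the work while the others sit idle. A quick, vendor-agnostic way to verify how load is actually spread while a prompt is generating is simply to watch the monitoring tools; this sketch assumes nothing beyond the standard driver utilities being installed:

  # NVIDIA: refresh per-GPU utilization and memory every second
  watch -n 1 nvidia-smi

  # AMD ROCm equivalent
  watch -n 1 rocm-smi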
Mar 13, 2024 · I would imagine that for anyone who has an Intel integrated GPU, the otherwise unused GPU would add an additional GPU to utilize. It is a 3GB GPU that is not utilized when a model is split between an Nvidia GPU and the CPU; even if it was limited to 3GB, that would be an additional 3GB GPU that could be utilized. I know that supporting GPUs in the first place was quite a feat, and I think an awesome future step would be to support multiple GPUs.

Running multiple GPUs won't offload to CPU like it does with a single GPU; I got errors when trying to run multiple GPUs and couldn't get ollama to offload to CPU.

To run ollama in the container, the command is: sudo docker exec -it ollama1 ollama run llama3.

MacOS gives the GPU access to 2/3 of system memory on Macs with 36GB or less and 3/4 on machines with 48GB or more; a 96GB Mac therefore has 72 GB available to the GPU. Some of that will be needed beyond the model data itself.

Using NVIDIA GPUs with WSL2: Docker Desktop for Windows supports WSL 2 GPU Paravirtualization (GPU-PV) on NVIDIA GPUs. To enable WSL 2 GPU Paravirtualization, you need the latest version of the WSL 2 Linux kernel; use wsl --update on the command line.

Mar 6, 2024 · For many, this issue is related to sleep/resume on a laptop; unloading and reloading the kernel module is not possible in some cases. I managed to fix this by adding a systemd service that applies: options nvidia NVreg_PreserveVideoMemoryAllocations=1 NVreg_TemporaryFilePath=/tmp

Package to install: ... Start serving Llama 2 on a 4-CPU instance with the following command: sky launch ollama.yaml -c ollama --detach-run. Example outputs: 💡 Tip: you can further reduce costs by using the --use-spot flag to run on spot instances. It cost about $0.02 in total.

[2024/04] You can now run Llama 3 on Intel GPU using llama.cpp and ollama with ipex-llm; see the quickstart here. Mar 4, 2024 · Intel Extension for PyTorch enables PyTorch XPU devices, which allows users to easily move PyTorch models and input data to the device to run on an Intel discrete GPU with GPU acceleration.

Apr 5, 2024 · Ollama now allows for GPU usage. Ollama is now available on Windows in preview, making it possible to pull, run and create large language models in a new native Windows experience.

As an app dev, we have 2 choices: (1) build our own support for LLMs, GPU/CPU execution, model downloading, inference optimizations, etc., or (2) just tell users "run Ollama" and have our app hit the Ollama API on localhost (or shell out to `ollama`). Obviously choice 2 is much, much simpler. There are some things in the middle, like less polished ...

Welcome to the Ollama Docker Compose Setup! This project simplifies the deployment of Ollama using Docker Compose, making it easy to run Ollama with all its dependencies in a containerized environment. Mar 7, 2024 · Now you are ready to run Ollama and download some models :)

Oct 9, 2023 · Hi, I've been looking this problem up all day, but I cannot find good practice for running multi-GPU LLM inference; the information about DP/DeepSpeed in the documentation is quite outdated. Jul 16, 2023 · Hi, I want to fine-tune llama with LoRA on multiple GPUs on my private dataset. I write the code following popular repositories on GitHub, and I successfully ran my code on one GPU. But when I tried to run it on multiple GPUs, I met the following problem (I used TORCH_DISTRIBUTED_DEBUG=DETAIL to debug): Parameter at index 127 with name base_model.model.layers.[...].self_attn.v_proj.lora_B ...

On multiple occasions I've been modifying llama.cpp code directly and recompiling for my own purposes. Ollama Libraries: https://ollama.com/blog/python-javascript-libraries · Ollama Vision models: https://ollama.com/blog/vision-models · Ollama OpenAI API: https://ol...
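One way to persist the NVreg module options mentioned in the sleep/resume fix above is a modprobe configuration file. This is a sketch under assumptions: the file name is arbitrary, and the initramfs rebuild step is Debian/Ubuntu-specific (other distros use different tools):

  # Hypothetical file name; any .conf under /etc/modprobe.d/ works
  echo "options nvidia NVreg_PreserveVideoMemoryAllocations=1 NVreg_TemporaryFilePath=/tmp" \
    | sudo tee /etc/modprobe.d/nvidia-power.conf
  sudo update-initramfs -u   # Debian/Ubuntu: rebuild so the options apply at boot
  sudo reboot

Recent NVIDIA driver packages also ship nvidia-suspend/nvidia-resume systemd units that rely on these options to preserve GPU memory across suspend.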
, "-1") Apr 19, 2024 · Open WebUI UI running LLaMA-3 model deployed with Ollama Introduction. [2024/04] ipex-llm now provides C++ interface, which can be used as an accelerated backend for running llama. The last parameter determines the number of layers offloaded to the GPU during processing. I'm sure many people have their old GPUs either still in their rig or lying around, and those GPUs could now have new purpose for accelerating the outputs. FSDP which helps us parallelize the training over multiple GPUs. Now that Ollama is up and running, execute the following command to run a model: docker exec -it ollama ollama run llama2. io’s powerful GPUs means you can use bigger models with more parameters and a larger context window. Mar 18, 2024 · TL; DR You can run inference with Smaug-72B 4 bit quantized at 5 tokens/second locally for under $800, using Ubuntu Linux, Ollama, two Nvidia Tesla P40 GPU's, a server motherboard, and suitable 💯 Ollama will run on cloud servers with multiple GPUs attached 🤖 Ollama will run on WSL 2 with GPU support 😍 Ollama maximizes the number of GPU layers to load to increase performance Nov 27, 2023 · Multi GPU inference (simple) The following is a simple, non-batched approach to inference. Let's check out. lora_B. Let’s run Feb 21, 2024 · The CUDA_VISIBLE_DEVICES=0 locks this container done to the first GPU. Dec 1, 2023 · ollama show --modelfile coder-16k # Modelfile generated by "ollama show" # To build a new Modelfile based on this one, replace the FROM line with: # FROM coder-16k:latest FROM deepseek-coder:6. Wait until the model command returns successfully. the machine has 4 x 3070 (8GB) and an older i5-7400, UBU 22. If you've tried distribute inference, share your knowledge. cpp with IPEX-LLM on Intel GPU Guide, and follow the instructions in section Prerequisites to setup and section Install IPEX-LLM cpp to install the IPEX-LLM with Ollama binaries. The -d flag ensures the container runs in the background. Blending natural language processing and computer vision, these models can interpret text, analyze images, and make recomendations. starcoder2:7b 0679cedc1189 4. It provides a user-friendly approach to Yes multi-GPU is supported. This will launch the respective model within a Docker container, allowing you to interact with it through a command-line interface. I tried to manipulate CUDA_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES envvars. The following table helps I'm a newcomer to the realm of AI for personal utilization. You can also simply test the model with test_inference. Setting either to -1 makes ollama run with GPU that's left. from_pretrained( llama_model_id Mar 5, 2024 · Many tools report the number of hyperthreads as the number of CPUs, so this can be a bit misleading. The project is mainly for Hi, My name is Sunny Solanki, and in this video, I provide a step-by-step guide to building a chatbot using Gradio and Ollama. yml in your desired directory. For example: % ollama run llama3. Unloading and reloading the kernel module is not possible in some cases. For Llama 3 70B: ollama run llama3-70b. Since our converting tool splits the checkpoints to one file per GPU, you need to determine the number of GPUs (<num_gpus>) the model is supposed to run on. Visit Run llama. I use "llama2" model with 7B p Right now Ollama is limited to one request and one model. OpenUI (formerly Open WebUI) is a user-friendly, self-hosted web interface for LLMs. If we quantize Llama 2 70B to 4-bit precision, we still need 35 GB of memory (70 billion * 0. 
If you have multiple GPUs, then the new default split_mode = "layer" option in the wrapped llama.cpp server may affect you a lot! (#2191)

If you do have multiple GPUs, you can use the GPU-vendor-specific GPU selection variable to ... The memory is combined. While you can still run multiple instances of ollama, fixing the issue at the core is better. To run a second instance published on a different host port, the command ends with: ... -v ollama:/root/.ollama -p 11435:11434 --name ollama1 ollama/ollama

$ ollama run llama3 "Summarize this file: $(cat README.md)"

Once the model download is complete, you can start running the Llama 3 models locally using ollama. For Llama 3 8B: ollama run llama3-8b. For Llama 3 70B: ollama run llama3-70b.

Using Ollama, users can easily personalize and create language models according to their preferences. If you're using ollama on the command line, I'd say having the option to easily do that is much more useful than saving a couple of commands at installation.
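When a second instance is published on another port, as in the example above, the CLI can be pointed at it with the OLLAMA_HOST variable; the address below assumes the 11435 mapping used in that command (the default is 127.0.0.1:11434):

  # Talk to the instance published on port 11435 instead of the default 11434
  OLLAMA_HOST=127.0.0.1:11435 ollama run llama3
  OLLAMA_HOST=127.0.0.1:11435 ollama list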
md)" Ollama is a lightweight, extensible framework for building and running language models on the local machine. Contemplating the idea of assembling a dedicated Linux-based system for LLMA localy, I'm curious whether it's feasible to locally deploy LLAMA with the support of multiple GPUs? If yes how and any tips 1 Install IPEX-LLM for Ollama #. docker run -d --restart always --device /dev/kfd --device /dev/dri -v ollama:/root/. If you look in the server log, you'll be able to see a log line that looks something like this: llm_load_tensors: offloaded 22/33 layers to GPU. cpp was created by Georgi Gerganov. Ollama now supports AMD graphics cards in preview on Windows and Linux. Conclusion . IPEX-LLM’s support for ollama now is available for Linux system and Windows system. There is a chat. To validate that everything works as expected, execute a docker run command with the --gpus Jan 8, 2024 · Hello Ollama-webui Community, I'm currently exploring the possibility of implementing parallel processing with multiple Ollama services for shared usage. Docker Desktop for Windows supports WSL 2 GPU Paravirtualization (GPU-PV) on NVIDIA GPUs. I am running two Tesla P40s. You specify which GPU the docker container to run on, and assign the port from 11434 to a new number. Dec 10, 2023 · . Ollama is a robust framework designed for local execution of large language models. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications. Currently, the implementation with Ollama requires constantly switching between models, which slows down the process. >>> /set parameter num_thread 16. After the installation, you should Ollama is an open-source platform that simplifies the process of running LLMs locally. /deviceQuery Starting CUDA Device Query (Runtime API) version (CUDART static linking) Detected 1 CUDA Capable device(s) Device 0: "NVIDIA GeForce RTX 3080 Ti" CUDA Driver Version / Runtime Version 12. Dec 6, 2023 · Ollama is a fantastic way to run large language models of your choice and the ability to use Fly. The test is simple, just run this singe line after the initial installation of Ollama and see the performance when using Mistral to ask a basic question: Mar 6, 2024 · For many this issue is related to sleep/resume on a laptop. unless ollama runs another model, GPU numbered 2 will be used. Mar 3, 2024 · Multi-GPU Support: Ollama can leverage multiple GPUs on your machine, resulting in faster inference and improved performance for resource-intensive tasks. On systems with enough RAM, the tool can execute models with up to 13B parameters. It cost about $0. This lets you make your assistants more lifelike, your conversations have more context, and your text generation more realistic. Ollama uses basic libraries to do the math directly. Logs: both. Now, you can run the following command to start Ollama with GPU support: docker-compose up -d. We would like to show you a description here but the site won’t allow us. Another tool that lets you execute LLMs locally on your CPU without a GPU is called Ollama. part of both. we have several GPUs in our server and use SLURM to manage the ressources. 
For instance, it suggests that 3B models need 8GB of RAM, 7B models need 16GB, and 13B models need ...

This installation method uses a single container image that bundles Open WebUI with Ollama, allowing for a streamlined setup via a single command. All the features of Ollama can now be accelerated by AMD graphics cards on Ollama for Linux and Windows.

Nvidia Ampere: GPU architecture published in 2020, focused on gaming and AI, with advanced ray tracing and AI capabilities. Nvidia Volta: previous GPU architecture (2017), optimized for high-performance computing (HPC) and AI, featuring Tensor Core technology for deep learning tasks.

Choose the appropriate command based on your hardware setup. With GPU support, utilize GPU resources by running the following command: docker run -d --restart always --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Updating every LLM in one command: this allows users to update all large language models (LLMs) and related models on their device.

How can Ollama use multiple GPUs for ...? Aug 16, 2023 · Each Ollama instance won't know about the others, so if you only have a single GPU, you may run into OOM problems if you don't sequence model loading to ensure they're not racing, or unless you control the number of layers they load by explicitly setting num_gpu. To launch a different model, use the MODEL_NAME environment variable: ...

Aug 2, 2023 · Here's what I did to get GPU acceleration working on my Linux machine: in ollama/api/types.go, set MainGPU: 0 and NumGPU: 32 (or 16, depending on your target model and your GPU).

Also, ollama would try to split loads evenly, so the M6000 and the 3060 Ti's smaller VRAM would cause the 3090 to use less of its available VRAM; the 8GB of VRAM on the 3060 Ti could be causing all the other GPUs to level down to 8GB. Maybe try only the 3090. When loading a small model on multiple GPUs, it produces garbage. There is a way to allocate more RAM to the GPU, but as of 0.1.22 Ollama doesn't take it into account.
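The "update every model in one command" snippet above does not include the command itself, so the following is a sketch of a common pattern rather than the original (or an official feature): re-pull every tag that ollama list reports.

  # Re-pull each installed model; skips the header row of `ollama list`
  ollama list | awk 'NR>1 {print $1}' | xargs -n1 ollama pull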
If your AMD GPU doesn't support ROCm but is strong enough, you can still ... May 25, 2024 · Running Ollama on an AMD GPU: if you have an AMD GPU that supports ROCm, you can simply run the ROCm version of the Ollama image:

docker run -d --restart always --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm

Getting access to extra GPUs is sometimes a challenge, but using Brev.dev combined with Tailscale makes it incredibly easy.

On Windows, Ollama inherits your user and system environment variables. First quit Ollama by clicking on it in the task bar, then start the Settings (Windows 11) or Control Panel (Windows 10) application and search for environment variables. Click on Edit environment variables for your account, and edit or create a new variable for your user account. Once Ollama is set up, you can open your cmd (command line) on Windows. Ollama on Windows includes built-in GPU acceleration, access to the full model library, and serves the Ollama API, including OpenAI compatibility.

CPU-only inference of a 30B model is painfully slow on a Ryzen 5 5600X with 64 GB of DDR4-3600, but it does provide answers (eval rate ~2 tokens/s).

I just want to do the most naive data parallelism with multi-GPU LLM inference (llama). Here are my main questions: how can a single-GPU machine perform better for one request than a machine with four ...? Oct 16, 2023 · @Syulin7: Both the GPU and CUDA drivers are older, from Aug 2022; if possible, you can try upgrading your drivers. Have tried to clean-slate the VM and start from the beginning with Ubuntu as well as CentOS.

Mar 21, 2024 · Intel's GPUs join hardware support for CPUs (x86 and ARM) and GPUs from other vendors. The latest release of Intel Extension for PyTorch (v2.1.10+xpu) officially supports Intel Arc A-series graphics on WSL2, built-in Windows, and native Linux. [2024/04] ipex-llm now supports Llama 3 on both Intel GPU and CPU.

In the beginning we typed in text and got a response; multimodal AI is changing how we interact with large language models. If you want to take advantage of the latest LLMs while keeping your data safe and private, you can use tools like GPT4All, LM Studio, Ollama, llama.cpp, or NVIDIA Chat with RTX. It is compatible with multiple models, such as GPT-J and LLaMA, and supports a wide range of models, including Llama 2, Mistral, and Gemma, allowing you to switch between them easily.

Jan 27, 2024 · In this tutorial, we will explore the efficient utilization of the llama.cpp library to run fine-tuned LLMs on multiple distributed GPUs, unlocking ultra-fast performance. ExLlamaV2 already provides all you need to run models quantized with mixed precision. Mar 18, 2024 · Since the GPU is much faster than the CPU, the GPU winds up being idle waiting for the CPU to keep up.

With an input length of 100, the 70B KV cache from the earlier calculation is roughly 2 * 100 * 80 * 8 * 128 * 2 bytes ≈ 30 MB of GPU memory.
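For the truncated note above about AMD GPUs without official ROCm support, Ollama's GPU documentation describes forcing a nearby LLVM target via an override variable. The value below is an assumption that fits many RDNA2-class cards and may need adjusting, or may not work at all, for yours:

  # Force a close-enough GFX target for an officially unsupported Radeon card
  HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve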