Beginning of a trip to the LLM world • Martin Lejko Blog

Introduction

Today, I started researching how to approach my thesis on private LLMs. I focused on the general picture first, as I have never dabbled in LLMs or AI before. Since I prefer visual content over text, I began by watching YouTube videos on private LLMs. This gave me a basic overview and introduced me to the concept of fine-tuning models. I then looked for more specific information, such as LLMs with business knowledge. A particularly helpful video was this one, which provided new perspectives and a clearer direction for my thesis. New terms such as knowledge embedding came up, which made me realize that fine-tuning will only be a final touch if time permits. The first major task is to gather information and feed it into my LLM.

Vision for the Thesis

I encountered several new concepts today. Fine-tuning involves customizing a pre-trained model for specific tasks through additional training, while knowledge embedding is about integrating specific knowledge into the LLM to improve its performance and relevance. These concepts will be crucial for my thesis, as it revolves around getting the private information we want into the LLM. Fine-tuning will be more of a cherry-on-top kind of thing. :D Right now, my vision is definitely not final, as I am only at the starting line. But I imagine knowledge embedding will play the main role in my thesis. I will try to find out what the best way to do knowledge embedding is, and how to do it so that the data is not leaked. I am still going the not-very-professional way and not reading any papers yet, because that would certainly slow me down and bore me to death. I think being effective and efficient is the key to success. For now, videos and blog posts are my best friends. They are short, to the point, and written so that non-professionals can understand them. That's why I went for them first.

Overleaf

Let's put the research aside. A more tangible thing is that I started my Overleaf project. I was surprised how well it works. I downloaded the repo from my university site and connected it to GitHub. So now when I make a change on Overleaf, it automatically syncs with GitHub and vice versa. Sooo nice, and the macros there are really helpful. They prevent mistakes and make the thesis more structured, with rules that are applied automatically. I just filled my information into the template and it filled in everything else. Only the content is missing, sigh. I am surprised and amazed right now, BUT I know for sure that the day will come when I will want to punch the monitor because of it. I HOPE that day never comes.

Privacy

Making an LLM with business knowledge is one thing, but making it without leaking the data will be another. That is a line I cannot cross. It is a bit worrying that in some cases I will be limited in the technology I can use, as many of the best tools want your data in order to get even better. This I cannot provide. It is still too early to tell whether that will be a problem; for now it is just a concern of mine, and I will have to see how it goes. Leaking a company’s data is a big no-no.

Until Next Time

Moving forward, I plan to continue researching methods for gathering and embedding business-specific information into an LLM. I will also identify more resources that discuss the privacy aspects of LLMs. Long-term, I aim to develop a comprehensive approach for embedding business knowledge into an LLM and plan the fine-tuning process as the concluding step of the thesis. I am excited to see where this journey takes me and how my thesis will evolve. Until next time, happy researching!