Wednesday, February 7, 2024

Set Up PrivateGPT on a Fresh Ubuntu 22.04 Install



TLDR; We'll set up PrivateGPT on a brand-new Ubuntu 22.04 install. 

I want to preface this with a warning: I am not a savvy user, much less an expert in Artificial Intelligence (AI) or Large Language Models (LLMs). I am pretty sure I am going to say/type some wrong things :). 

You've been warned. Here's my write-up:

With all of the hype around AI and ChatGPT, I figured I'd jump on the bandwagon. A co-worker of sorts pointed out an interesting GitHub project, 'PrivateGPT', that he has been using. His work is private in nature, and while he could benefit from the advantages that a toolset like ChatGPT brings, it is not feasible/permissible (or is at least frowned upon) to give OpenAI or any other company the data you are working with (which is usually your clients' data).

That's where PrivateGPT comes to the rescue. The GitHub repo describes it like this: "PrivateGPT is a production-ready AI project that allows you to ask questions about your documents using the power of Large Language Models (LLMs), even in scenarios without an Internet connection. 100% private, no data leaves your execution environment at any point." Please visit and support the repo here: https://github.com/imartinez/privateGPT. The README mentions that for the latest info, we should visit https://docs.privategpt.dev/.

What are my motivations? 
  • I briefly tried to set it up, failed, and gave up. Now I am back and forcing myself to get it working. 
  • AI is here to stay, and what better way to learn than to play around with it? 
  • You never know if your business might be able to use it. 
Use Cases:

  • Well, I'll leave that up to your imagination. Think of PrivateGPT as a ChatGPT alternative that you can feed your own documents (DOCX, TXT, PDF) and then interact with.
    • We'll go over some test scenarios. 
We'll be using the installation instructions here: https://docs.privategpt.dev/installation. If the instructions are there, why do you need to read this? Because I failed once, and I'll probably fail again, since I am writing this as I go through the steps. I failed so you can learn from my mistakes; the idea is to give you a better starting point. With all of that being said, let's get into the installation.

  • We'll start with a VM with a fresh install of Ubuntu 22.04. I had the Desktop Edition handy, which will do the job and lets us run a browser within the machine. 
    • I won't bore you with screenshots of this process. 
    • Make sure you update and upgrade your box. 
    • Take a snapshot before we begin; that way you can revert to a clean slate.
  • I ran into some issues with dependencies, so let's get these out of the way before we get started:
    • sudo apt install git curl gcc g++ pkg-config
    • sudo apt install build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev \
      libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
  • Let's create a directory, then clone the repo:
    • mkdir /home/MYUSER/PrivateGPT
      • cd /home/MYUSER/PrivateGPT
    • git clone https://github.com/imartinez/privateGPT 
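The update, dependency, and clone steps above can be bundled into a few re-runnable shell functions. This is a minimal sketch, not something from the PrivateGPT docs: the function names, the DRY_RUN switch, and apt's -y flag are my own additions, and $HOME/PrivateGPT stands in for /home/MYUSER/PrivateGPT. Save it as, say, setup-helpers.sh (a hypothetical name) and source it with `. ./setup-helpers.sh`; nothing runs until you call a function.

```sh
#!/bin/sh
# Hypothetical helper functions wrapping the setup steps from this post.
# Sourcing this file only defines functions; nothing runs on load.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Update and upgrade the box (do this before taking the snapshot).
update_system() {
  run sudo apt update
  run sudo apt upgrade -y
}

# Build dependencies from the post, with both apt lines combined into one.
install_build_deps() {
  run sudo apt install -y git curl gcc g++ pkg-config \
    build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev \
    libsqlite3-dev libncursesw5-dev xz-utils tk-dev libxml2-dev \
    libxmlsec1-dev libffi-dev liblzma-dev
}

# clone_privategpt [BASE_DIR]: clone the repo into BASE_DIR/privateGPT,
# skipping the clone if it is already there so the step can be re-run.
clone_privategpt() {
  base_dir="${1:-$HOME/PrivateGPT}"
  run mkdir -p "$base_dir"
  if [ -d "$base_dir/privateGPT/.git" ]; then
    echo "already cloned: $base_dir/privateGPT"
  else
    run git clone https://github.com/imartinez/privateGPT "$base_dir/privateGPT"
  fi
}
```

For example, `DRY_RUN=1 install_build_deps` only prints the apt command, while `update_system && install_build_deps && clone_privategpt` runs the three steps for real. The skip-if-cloned check is there so that re-running after a failure doesn't trip over an existing directory.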
Install pyenv and Python 3.11
  • Now we have to install Python 3.11 using a Python version manager.
    • We'll install pyenv
    • Let's use this writeup: https://medium.com/@therazmatrix/how-to-install-and-use-pyenv-in-ubuntu-22-04-fa7c28ca0b67
      • curl https://pyenv.run | bash
        • Then I added this to my /home/MYUSER/.bashrc file
        • # Pyenv
          export PYENV_ROOT="$HOME/.pyenv"
          command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"
          eval "$(pyenv init -)"
          eval "$(pyenv virtualenv-init -)"
      • You'll need to restart the shell (just close it and reopen it).
      • pyenv install 3.11
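        • It worked :) For me this installed 3.11.7 (check with pyenv versions), which is the version used below; adjust it if yours differs.
  • Now install Poetry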
    • https://python-poetry.org/docs/#installing-with-the-official-installer
      • curl -sSL https://install.python-poetry.org | python3 -
      • That worked fine the first time; we're on a roll. 
    • I did not add Poetry to the PATH in my .bashrc (because reasons), but I can run it by calling /home/MYUSER/.local/bin/poetry. Wherever you see poetry below, use that full path if it isn't on your PATH.
  • We need to create a virtual environment for our project to use the 3.11.7 Python install
  • in /home/MYUSER
    •  pyenv virtualenv 3.11.7 privategpt
  • Then go to the cloned repo, /home/MYUSER/PrivateGPT/privateGPT
    • pyenv local privategpt
    • pip install llama-cpp-python
    • poetry install --with ui
      • This will take a while
    • poetry install --with local
      • This will take another while
    • poetry run python scripts/setup
  • Finally
    • /home/MYUSER/.local/bin/poetry run python -m private_gpt
    • You'll see a message similar to:
    • You can now browse to http://127.0.0.1:8001
      • Upload a file and ask it some questions. 
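The server can take a little while to come up, so instead of refreshing the browser you can poll the address until it answers. A minimal sketch, assuming curl is installed (it was in our dependency list) and that the UI is served over plain HTTP at 127.0.0.1:8001 as above; wait_for_privategpt and its 60-try default are hypothetical choices of mine, not part of PrivateGPT.

```sh
#!/bin/sh
# wait_for_privategpt [URL] [TRIES]: poll URL once per second until it
# answers with a non-error HTTP status, or give up after TRIES attempts.
# Hypothetical helper; the default URL matches the address from the post.
wait_for_privategpt() {
  url="${1:-http://127.0.0.1:8001}"
  tries="${2:-60}"
  i=1
  while :; do
    # -f: treat HTTP errors as failures; -s: silent; -o: discard the body.
    if curl -fs -o /dev/null "$url"; then
      echo "PrivateGPT is up at $url"
      return 0
    fi
    if [ "$i" -ge "$tries" ]; then
      echo "gave up waiting for $url after $tries tries" >&2
      return 1
    fi
    i=$((i + 1))
    sleep 1
  done
}
```

Run it in a second terminal after starting the server; once it reports that PrivateGPT is up, open the address in the VM's browser.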

Friday, February 2, 2024

DNS: Why Can't I Have a TXT Record (or Any Other Record) Alongside My CNAME Record?

TLDR; Because that's how DNS works. https://www.ietf.org/rfc/rfc1912.txt


I've run across this issue various times in the last... well, I won't tell you how long, but it's been a long time. Every time this issue pops up, I scramble and relearn the same thing. In hopes that the lesson will stick this time, I have decided to write a blog post. 

We are all accustomed to nice domain names (e.g., google.com, facebook.com), and as end users, the backend inner workings are abstracted away from us. What we do know is that when we type a domain into the browser, some magic happens. While I don't understand the complete magic, I will do my best to explain why you can't have any other record alongside a CNAME record. 

What's a CNAME record? Fair question, so let's take a step back. Take store.mydomain.com as an example: a DNS server is responsible for translating that name into the address of the server hosting your desired store, so the browser knows where to connect. Some services have dedicated record types; for example, mail (email) servers get their own MX record. There are various other record types in DNS. We won't go through all of them, but I've selected a sample to go over:


  • A Record
    • This record points store.mysite.com to an IP address, e.g., 5.5.5.5. Browsers that look up your store get that address back and send their traffic to it. 
  • TXT Record
    • Think of this as a small text note attached to a domain, commonly used to confirm ownership or management of that domain. Anyone on the internet can read it, but only someone who controls the domain's DNS can write it. In essence, if you can publish a specific value there, we can infer that you own (or manage) the domain. 
  • CNAME Record
    • This stands for Canonical Name; the easiest way to think of it is as an alias or a nickname. www.store.mysite.com can be a nickname for store.mysite.com. Let's take it a step further: store.mysite.com can be a nickname for store.BIGCompany.server.hosted.com. That big company could be Google, AWS, Oracle, or any company offering you a hosted service.
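To make the TXT ownership idea concrete: a service hands you a token, you publish it as a TXT record, and the service checks that the record is there. Here is a minimal sketch of the checking side. It assumes the TXT values arrive one per line, wrapped in double quotes, the way `dig +short TXT <domain>` prints them; has_txt_token and the verify=abc123 token are hypothetical, and TXT records split into multiple quoted strings are not handled.

```sh
#!/bin/sh
# has_txt_token TOKEN: read TXT values on stdin, one per line and wrapped in
# double quotes (as `dig +short TXT <domain>` prints them), and succeed if
# any value equals TOKEN exactly. Hypothetical helper for illustration.
has_txt_token() {
  token=$1
  while IFS= read -r line; do
    # Strip the surrounding double quotes before comparing.
    value=$(printf '%s' "$line" | sed 's/^"//; s/"$//')
    if [ "$value" = "$token" ]; then
      return 0
    fi
  done
  return 1
}
```

Usage would look like `dig +short TXT mysite.com | has_txt_token "verify=abc123" && echo verified`.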
       
That's all great, but why use CNAME records instead of A records if they both point to the same place? As an administrator, I can change records for mydomain.com at will without waiting for anyone else; on the flip side, the administrators for BIGCompany can update their records whenever they feel like it. In the A record example above, let's say 5.5.5.5 needs to be updated to 7.7.7.7. If we used a CNAME pointing at BIGCompany's name, that change would be transparent to mydomain.com: BIGCompany updates its own A record and we're done. Since we are using an A record instead, BIGCompany needs to let mydomain.com know of the change so we can plan and update our record accordingly. For small mom-and-pop shops, it would be fine to coordinate and schedule a time, but when dealing with thousands and possibly millions of domains and/or DNS records, it does not scale well. 
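The A-vs-CNAME trade-off is easier to see with a toy resolver. This is a simplified model, not how a real DNS resolver works: records live in a ZONE variable (one "NAME TYPE VALUE" line each), resolve follows CNAMEs until it reaches a name with no CNAME and prints its A record, and the names and IPs are the hypothetical examples from this post.

```sh
#!/bin/sh
# Toy model of CNAME resolution; NOT a real DNS resolver.
# Hypothetical zone data, one "NAME TYPE VALUE" record per line.
ZONE='store.mysite.com CNAME store.BIGCompany.server.hosted.com
store.BIGCompany.server.hosted.com A 5.5.5.5'

# lookup NAME TYPE: print the value of every matching record.
# DNS names are case-insensitive, so compare them lowercased.
lookup() {
  printf '%s\n' "$ZONE" |
    awk -v n="$1" -v t="$2" \
      'tolower($1) == tolower(n) && toupper($2) == toupper(t) { print $3 }'
}

# resolve NAME: follow CNAMEs (at most 8 hops, to stop alias loops)
# until a name without a CNAME is reached, then print its A record(s).
resolve() {
  name=$1
  hops=0
  while [ "$hops" -lt 8 ]; do
    target=$(lookup "$name" CNAME)
    if [ -z "$target" ]; then
      lookup "$name" A
      return
    fi
    name=$target
    hops=$((hops + 1))
  done
  echo "too many CNAME hops for $1" >&2
  return 1
}
```

`resolve store.mysite.com` prints 5.5.5.5. If BIGCompany changes its A record to 7.7.7.7 (edit only the second ZONE line), the same call prints 7.7.7.7, and our CNAME line never changed. Had we used an A record of our own, we would have had to edit our line too.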

I figured giving a rundown of the various record types and why they are used was important to lay the foundation. Don't get upset, but the reason you can't mix CNAME records with any other records is that you can't :). DNS was built with this constraint in mind. Why? That goes beyond the scope of this article. Taking an excerpt from https://www.ietf.org/rfc/rfc1912.txt, section 2.4 states: "A CNAME record is not allowed to coexist with any other data." 
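If you keep your records in a simple text list, a quick lint for the RFC 1912 rule might look like the sketch below. It's a toy under stated assumptions: input is one "NAME TYPE VALUE" line per record (not real zone-file syntax), names are compared case-insensitively as DNS does, and it does not special-case the DNSSEC record types that later RFCs allow alongside a CNAME. check_cname_conflicts is a hypothetical name.

```sh
#!/bin/sh
# check_cname_conflicts: read "NAME TYPE VALUE" records on stdin and print
# every name that has a CNAME plus any other record (RFC 1912, section 2.4).
# Exits 1 if a conflict was found, 0 otherwise. Toy input format, not a
# real zone file; DNSSEC record types are not special-cased.
check_cname_conflicts() {
  awk '
    NF < 2 { next }                          # skip blank or malformed lines
    { name = tolower($1); rtype = toupper($2) }
    rtype == "CNAME" { has_cname[name] = 1; next }
    { other[name] = other[name] " " rtype }  # remember non-CNAME types
    END {
      status = 0
      for (name in has_cname)
        if (name in other) { print name ":" other[name]; status = 1 }
      exit status
    }'
}
```

A nonzero exit status flags the classic mistake from this post's title: adding a verification TXT record on a name that is already a CNAME, which the check reports as `store.mysite.com: TXT`.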

I passed my CISA Cert