Wednesday, August 28, 2024

Web Scraping With Python (To Display on an LED Matrix)

I had the opportunity to (semi) work with LED matrices at my place of employment, which led me to tackle this project. I have set up this page: https://techtucson.com/learning/scrape, which we'll use as a real-world example.

  • Download the page as an HTML file and save it to your computer as scrape.htm.
Our first task is to pull out the first row and display how many spots are available.


///
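import pandas as pd

# Local copy of the page saved in the previous step
url = 'file:///C:/Users/mariouribe/Downloads/scrape.htm'

# read_html returns a list of DataFrames, one per <table> on the page
tables = pd.read_html(url)
df = tables[0]

# Grab the 'Spots Available' cell from the first row as a string
first = str(df.loc[0, 'Spots Available'])

# Strip the label and the slash so only the two numbers remain
new_string = first.replace("spots available.", " ")
new_string2 = new_string.replace("/", " ")
firstnumber, secondnumber = new_string2.split()

# Subtract the first number from the second to get the count to display
subtract = int(secondnumber) - int(firstnumber)
print(subtract, "Spots Available")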
///
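To make the string handling easier to follow, here is the same cleanup wrapped in a small helper with a worked example. This is only a sketch: the name `spots_left` and the sample text "12 / 100 spots available." are made up for illustration, and the real cell on the page may be formatted differently.

```python
def spots_left(cell_text):
    """Turn text like '12 / 100 spots available.' into second number minus first."""
    # Same cleanup as above: drop the label and the slash, leaving two numbers
    cleaned = cell_text.replace("spots available.", " ").replace("/", " ")
    first_number, second_number = cleaned.split()
    return int(second_number) - int(first_number)


# Hypothetical sample cell text, not copied from the real page
print(spots_left("12 / 100 spots available."), "Spots Available")
```

With that sample text the two numbers are 12 and 100, so the helper returns 100 - 12 = 88.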

Great, we have a working POC, but there's one problem: the microcontrollers are not as powerful as my machine. While they have access to the internet, they are running MicroPython, which means I won't have access to Pandas or, better yet, BeautifulSoup. I'll need to use built-in libraries as much as possible. Back to the drawing board.

///
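import requests
import re

# requests cannot open file:// URLs, so fetch the live page instead
url = 'https://techtucson.com/learning/scrape'
response = requests.get(url)
html = response.text

# Flags belong in re.compile(); a compiled pattern's findall() takes
# start/end positions as its extra arguments, not flags
pattern = r'<td>(.*?)</td>'
regex = re.compile(pattern, re.IGNORECASE | re.DOTALL)
results = regex.findall(html)
garage = results[0]   # first cell: garage name
numbers = results[1]  # second cell: the spots text

# Strip the label and the slash, printing each step for debugging
numbers_new = numbers.replace("spots available.", " ")
print(numbers_new)
numbers_new2 = numbers_new.replace("/", " ")
print(numbers_new2)
firstnumber, secondnumber, thirdthing = numbers_new2.split()
print(firstnumber, "Spots Available at", garage)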
///

So I got this working with the requests and re (regular expressions) libraries. But now there are a couple more issues: some of the functions of the re library don't seem to exist in MicroPython :( , and I ran out of space while getting the response.text output.
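For what it's worth, here is a rough sketch of one way both problems could be sidestepped. It is not what ended up on the board. It uses only plain string methods instead of re, and it scans the page a chunk at a time so the whole response never has to sit in memory. The function name `first_cells`, the chunk boundaries, and the sample data ("Garage A" and so on) are all made up for illustration. Like the regex above, it assumes bare `<td>` tags with no attributes, and unlike the regex it is case-sensitive.

```python
def first_cells(chunks, count=2, open_tag="<td>", close_tag="</td>"):
    """Return the first `count` cell texts found in an iterable of text chunks."""
    cells = []
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            start = buffer.find(open_tag)
            if start == -1:
                # No cell yet: keep only a short tail in case a tag is
                # split across two chunks, and discard the rest
                buffer = buffer[-(len(open_tag) - 1):]
                break
            end = buffer.find(close_tag, start + len(open_tag))
            if end == -1:
                # Cell started but not finished: drop what came before it
                # and wait for the next chunk
                buffer = buffer[start:]
                break
            cells.append(buffer[start + len(open_tag):end].strip())
            buffer = buffer[end + len(close_tag):]
            if len(cells) == count:
                # Stop early so the rest of the page is never processed
                return cells
    return cells


# Hypothetical page split into arbitrary chunks (sample data, not the real page)
sample_chunks = [
    "<table><tr><t",
    "d>Garage A</td><td>12 / 100 spots ",
    "available.</td></tr>",
    "<tr><td>Garage B</td></tr>",
]
garage, numbers = first_cells(sample_chunks)
print(garage, "|", numbers)
```

On a board, the chunks would come from reading the HTTP response a little at a time. How to do that depends on the MicroPython HTTP library in use, so it is left out here.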

That's when I reached out to the developer and asked for a handout. An API was built that lets me request a specific garage and get back 20 lines of text, which I can now parse without issues.

More to Come Soon. 

2025 Certification Goals

Certified Information Systems Auditor (CISA) https://www.isaca.org/credentialing/cisa
Practical Web Pentest Associate (PWPA pka: PJPT) http...