Module 3 - Regex of the Correspondence

(I think I will try to vaguely keep the Star Wars theme going with the titles of these blog posts, because why not?) After the frustrations of Module 2, Module 3 was certainly a breath of fresh air. Once again, I did the first three exercises, mostly due to time constraints.

All of the exercise results for this module can be found on my GitHub here.

After the API exercise, the TEI exercise was kind of relaxing. Analyzing a site's usefulness and reliability is something that I've been practicing constantly since first year. I don't mind transcriptions, either, and we had already done a transcription exercise in one of my second year courses, so I was no stranger to it. Encoding was completely new to me, though, and it was a pretty neat experience. I don't know how practical it is to do all of the encoding by hand all the time, especially with large amounts of data (I'm sure there's a program for batch encoding, right? I mean, from what I get from this course, there's a program for everything, and if there isn't, it's not impossible to make your own), but I can see how encoding makes a text much easier to sort and analyze. I did hit a snag at one point, when the encoding just wouldn't work. After some staring at the code and double-checking the workbook instructions, I realised the workbook had a syntax typo in the place name tags. After I fixed that and mentioned it on Slack, everything worked out fine. My notebook entry for this exercise can be found here.

At first, Exercise 2, the Regex exercise, had me staring blankly at my computer screen. The syntax seemed really complicated and I could hardly follow the explanations. I am definitely not strong with equations, and the string of symbols reminded me a lot of high school math class and how easily I got lost there. I took the instructions slowly, one step at a time, and managed to work through it. I can definitely see myself using regex again in the future, considering how much more it can do than a regular search, but I will also definitely need to keep a cheat sheet open, because it would take a lot of practice for me to memorize how everything works.
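To show what I mean about regex versus a regular search, here is a minimal sketch in Python. It is not the workbook's exercise: the sample lines, the names in them, and the pattern are all made up for illustration, and I am assuming each line looks like "sender to recipient, date". A plain search could only find one fixed word, while a single pattern can describe the shape of the whole line and pull out its pieces.

```python
import re

# Hypothetical index lines, invented for illustration (not the workbook's data).
lines = [
    "Anna Smith to John Doe, 3 March 1840",
    "John Doe to Mary Lee, 12 April 1841",
    "Notes on the correspondence files",
]

# Assumed shape: a name, the word "to", another name, a comma,
# then a date written as day, month, year.
pattern = re.compile(r"^(.+?) to (.+?), (\d{1,2} \w+ \d{4})$")

rows = []
for line in lines:
    match = pattern.match(line)
    if match:  # lines that do not fit the shape are skipped
        rows.append(match.groups())

for sender, recipient, date in rows:
    print(sender, "|", recipient, "|", date)
```

Each pair of parentheses in the pattern is a capture group, so every matching line comes back as a sender, a recipient, and a date, and the third line is left out because it does not fit the shape.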

I think OpenRefine is magical. It was so nice to use in Exercise 3 for OCR text! It didn't catch everything, but it made cleaning up all that data and clumping it together properly so much easier. As I mentioned in my research notebook, I did have to go through some parts manually to correct them, as the different filter settings did not catch everything. I also ended up with more names in both the sender and recipient columns than the workbook instructions said I would, and I am not sure exactly why. Gephi and Palladio were interesting too, although of the two, I prefer Palladio. I found Gephi harder to figure out, whereas Palladio was very simple to use. They both provided visualizations for the data, and I liked having a visual representation of who sent letters to whom.

The notebook entry that discusses both Exercise 2 and Exercise 3 can be found here.

Written on April 3, 2016