I just built a Twitter archive by hand. Learn from my mistakes.

We had a really great conference here a little over a week ago. (That “little over a week” is an important bit, as it turns out. Stay tuned.) The outcome of the conference was the emergence of a provincial network on academic integrity. After talking with our keynote, Dr. Sarah Eaton, who notes that the lack of archival work around these networks (and indeed around the history of academic integrity) is an impediment to research, and with our organizing team here, I set about working on the digital archive of the day.

Except… I didn’t. I mean, I did. I set up a meeting with the librarian in charge of our institutional repository; I started filing away key communications, etc. But, like. I also had, you know, the rest of my job to do. SSHRC applications are due imminently, we’re planning the winter term offerings, I had a workshop to run last week, and in Moodle-land I’m building finals and helping with gradebook issues. It’s not chaos around here, but it is busy. So knowing I had my repository meeting this week, I shelved figuring out the Twitter archive of the day — arguably the most complicated part of the project — until yesterday.

If you know more about the Twitter API than I did until yesterday, you know where this story is going.

Thanks to the ProfHacker blog, a space that so frequently delivers me from evil, I found two tools for scraping Twitter data via hashtags. I tried the first method, TAGS, which works really well… but this is where I ran into the issue of the Twitter API, whose standard search will only serve you roughly the past seven days of tweets. GUESS WHO HAS TWO THUMBS AND WAITED NINE DAYS TO DO THIS WORK. GUESS WHO. THAT’S RIGHT, THIS GAL.
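
(For the curious: here’s roughly what that limit looks like if you hit the standard search API directly yourself. This is a minimal Python sketch, not what TAGS does under the hood exactly; the bearer token and hashtag are placeholders you’d swap for your own.)

```python
import requests

# Standard search endpoint (Twitter API v1.1); the free tier only
# indexes roughly the past seven days of tweets.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"
BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder: your own app credential

params = {
    "q": "#YourEventHashtag",  # hypothetical hashtag; substitute your own
    "count": 100,              # maximum tweets per page
    "result_type": "recent",
}
resp = requests.get(
    SEARCH_URL,
    headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
    params=params,
)
resp.raise_for_status()
tweets = resp.json()["statuses"]

# Anything older than about a week simply never comes back,
# no matter how you page through the results.
for t in tweets:
    print(t["created_at"], t["user"]["screen_name"], t["text"])
```

No amount of paging gets you past that window on the standard tier; older tweets live behind Twitter’s paid premium and enterprise search products.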

Can we talk about how absurd it is that you can be looking at the tweets in a Twitter search, but you can’t just download them? Give me my tweets, Twitter.

Anyway, so having learned the hard way that there is a hard limit on the easy way of grabbing tweets, I tried the other method mentioned in that ProfHacker blog post: a Chrome extension called Twitter Recorder. It, unfortunately, no longer works within the parameters of “New Twitter,” so while it looks like it’s chugging along, it’s… not. Not for the first time, I wished I knew enough about basically anything to build an update to this tool, but I don’t, so here we are.

Much Googling took me to Microsoft Flow, of all places, which professes to have a template to grab tweets and import them to an Excel spreadsheet, and several users described getting tweets older than the API should have allowed. Worth a try. I followed all the instructions, and then… nothing happened. Assuming I had done something wrong, I quickly reviewed a LinkedIn Learning course, did it all again, and then… nothing happened. (The status of every run is listed as “skipped,” but all the checks passed. I’ve got nothing. Let me know if you know more about Flow than me, which is probably everyone.)

So at this point I’ve burned the better part of a morning (luckily I was also half-heartedly in a webinar) and I just gave up. I built a very basic Excel table and cut and pasted each tweet, user, date, and URL into it. I couldn’t face also extracting timestamps by hand, so we have an imperfect archive from a citation perspective, but at least we have the information.
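
(If your event is still inside that seven-day window, the same four-column table can be generated instead of hand-built. A sketch continuing from the search example above; `hashtag_archive.csv` is just an illustrative filename.)

```python
import csv

# The same four columns I filled in by hand: tweet text, user, date, URL.
# `tweets` here is the list of statuses from the search sketch above.
with open("hashtag_archive.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["tweet", "user", "date", "url"])
    for t in tweets:
        screen_name = t["user"]["screen_name"]
        url = f"https://twitter.com/{screen_name}/status/{t['id_str']}"
        writer.writerow([t["text"], screen_name, t["created_at"], url])
```

And `created_at` comes with the full timestamp for free, which would have spared me the citation headache.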

And, now that I know a thing-or-two about a thing-or-two, I have a TAGS instance set up to regularly scrape the hashtag of our network — once a week or so — so that I don’t miss anything in the future. Because I never, ever, want to do this again.
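
(TAGS handles its own scheduling inside Google Sheets, but if you were rolling your own weekly scrape, the trick is the `since_id` parameter: remember the newest tweet ID from the last run and only ask for tweets newer than that. A sketch under the same assumptions as above; the state-file name and credentials are placeholders.)

```python
import json
import os

import requests

SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"
BEARER_TOKEN = "YOUR_BEARER_TOKEN"   # placeholder credential
STATE_FILE = "last_seen_id.json"     # illustrative filename

def fetch_new_tweets(hashtag: str) -> list[dict]:
    """Fetch only tweets newer than the last run (run weekly, e.g. via cron)."""
    params = {"q": hashtag, "count": 100, "result_type": "recent"}
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            params["since_id"] = json.load(f)["since_id"]
    resp = requests.get(
        SEARCH_URL,
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        params=params,
    )
    resp.raise_for_status()
    statuses = resp.json()["statuses"]
    if statuses:
        # Results come back newest-first; remember the newest ID so the
        # next run picks up exactly where this one left off.
        with open(STATE_FILE, "w") as f:
            json.dump({"since_id": statuses[0]["id_str"]}, f)
    return statuses
```

Run it more often than once a week and you never get near the seven-day cliff.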

So, here’s my advice: remember that the standard Twitter search API is, at best, going to let you access content from the last seven days, and act accordingly. If you know in advance that you’re going to want to scrape data from an event, set up TAGS ahead of time. And don’t waste an hour playing with Microsoft Flow unless you are simultaneously in a very business-speaky webinar.

But. You know. I learned a lot.
