Source: JJ Gouin, via Adobe Stock

Suppose we want to find out how discourse in The New York Times has evolved over a period of time. This is a timely question, since whenever we visit the front page these days we get firehosed with news about the coronavirus. How has the pandemic shaped the headlines of one of the most popular newspapers in the United States?

To answer a question like this, we would need to collect article metadata from The New York Times. Here I describe how to do this in Python.

Table of Contents:

  1. Where to find The New York Times article…

When you Google how to download a Twitter account's followers, plenty of companies pop up, offering to do it for you for a small fee. But Twitter gives you all the tools you need to do it yourself, for free. This short guide takes you through a Python script that helps you use those tools.

By the end of this guide, you will have a script that downloads a CSV containing all the followers and/or friends of any public Twitter handle. This file will also contain the following details about each person:

  • username
  • user id
  • handle
  • location
  • bio
  • url
  • followers count
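
As a rough illustration of the output format only, here is a minimal sketch of writing user records to CSV with the fields listed above. The column spellings, the `write_users_csv` helper, and the sample record are all assumptions made up for this example; fetching real data from Twitter is not shown.

```python
import csv
import io

# Column names mirror the fields listed above; the exact spellings are an assumption.
FIELDS = ["username", "user_id", "handle", "location", "bio", "url", "followers_count"]


def write_users_csv(users, out):
    """Write an iterable of user dicts to a file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for user in users:
        # Fields missing from a record become empty cells.
        writer.writerow({field: user.get(field, "") for field in FIELDS})


# Placeholder record, invented for illustration (not real Twitter data).
sample_users = [
    {
        "username": "Example User",
        "user_id": 12345,
        "handle": "@example",
        "location": "Nowhere",
        "bio": "A made-up bio, with a comma",
        "url": "https://example.com",
        "followers_count": 42,
    }
]

buffer = io.StringIO()
write_users_csv(sample_users, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open("followers.csv", "w", newline="")` in place of the in-memory buffer.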


Source: Sielan, via iStock

When classes moved online in 2020, academic institutions across the country watched as the rate of cheating soared. It’s never fun dealing with plagiarism, but detecting it matters, wherever we stand in the academic debate on how best to handle this type of cheating, during a pandemic or in general.

The best free tool for this is Stanford’s MOSS. This tutorial walks through a quick setup so you can hit the ground running.

Table of Contents:

  1. What is MOSS?
  2. How to register with MOSS
  3. How…

Source: Fuckology

A few days after the WHO formally declared the COVID-19 pandemic in March, some friends and I began collecting memes about it in a public Facebook album.

Initially, I shared the memes just for laughs—like, look at this absurd, unprecedented situation we’ve ended up in, we’re being told to stay home!—but as lockdown dragged on and on and on, the memes became a source of comic relief and distraction, helped to reduce the pains of social distancing, and often said aloud what we were all thinking.

Over the months, the album unexpectedly grew into a time capsule of a year…

Yesterday Netflix released Deaf U, which is not your average reality TV show. Immediately, you get dropped into rich and vibrant Deaf culture, where hands fly with sign language, and a controversial social hierarchy exists based on one’s degree of cultural Deafness.

But like almost every other reality TV show, Deaf U hypes up what sells — sex, partying, and drama. There isn’t a minute in an actual classroom.

Some in the Deaf community hope that despite this angle, presenting these topics through a Deaf lens will help redefine mainstream society’s perceptions about deafness, by giving them something familiar, relatable…

Source: charles taylor, via Adobe Stock

In this post, we write Python code to generate fake news headlines, using a Markov chain model trained on a corpus of real headlines from The New York Times over the past year.
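
To give a feel for the technique, here is a minimal sketch of a first-order, word-level Markov chain headline generator. It is not the post's actual code: the function names, the start and end markers, and the tiny made-up corpus are assumptions for illustration, and a real run would train on the collected New York Times headlines.

```python
import random
from collections import defaultdict

# Hypothetical sentinel tokens marking the start and end of a headline.
START, END = "<s>", "</s>"


def build_chain(headlines):
    """Map each word to the list of words observed directly after it."""
    chain = defaultdict(list)
    for headline in headlines:
        words = [START] + headline.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate_headline(chain, rng, max_words=20):
    """Walk the chain from START until END is drawn or max_words is reached."""
    word, output = START, []
    while len(output) < max_words:
        # Duplicates in the list make frequent successors more likely.
        word = rng.choice(chain[word])
        if word == END:
            break
        output.append(word)
    return " ".join(output)


# Made-up stand-in corpus; not real headlines.
corpus = [
    "City Council Votes on New Budget",
    "New Budget Cuts Worry City Teachers",
    "Teachers Vote on New Contract",
    "Council Worry Grows Over Budget Cuts",
]

chain = build_chain(corpus)
rng = random.Random(0)  # seeded so runs are repeatable
for _ in range(3):
    print(generate_headline(chain, rng))
```

Because each word depends only on the one before it, the output is locally plausible but often globally nonsensical, which is where the humor comes from.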

Some of these fake headlines:

I Used to Hold Hands?

‘We’re Going Down, Down, Down’

We All in This Picture? | Sept. 18, 2019

Zonked on Vicodin in the Presidential Race

Mike Pence Makes Clear There Is a New Constitution

Who Knew How to Clean Your Child’s DNA Information?

A.I. Is Learning That Liberals Eat Their Own Lawyers, Too

How New Yorkers Want Cheap Wine, and Lots…

Source: Sikov, via Adobe Stock

Since its inception in 1991, arXiv, the main database for scientific preprints, has received almost 1.3 million submissions. All of this data can be useful for analysis, so we may want to access the full texts in bulk. This post goes over how we can do this using Python 3 and the macOS command line.

Steps for bulk accessing full texts using the command line

Although the data is sitting right there on the server, it is not recommended to crawl arXiv directly due to limited server capacity. …

Are you interested in using the popular Python library Matplotlib to analyze text messages or any other conversational medium that includes emojis? You may have noticed some difficulties in visualizing those emojis.

This post investigates why Matplotlib cannot plot emojis from the Apple Color Emoji font, and how we can overcome this lack of support to get the results we want.

The problem

An attempt to plot emojis:

This graph has no meaning other than to demonstrate the appearance of plotted emojis.

Although we explicitly specified it as the font, these emojis are not Apple Color Emoji. …

Brienna Herold

Data analyst. I write about my fun projects, in addition to how-to guides that help you get data for your own fun projects!
