My Twitter word cloud

Inspiration

We both knew Java, and we both needed an idea, so we searched through a list of APIs to use. I (Langston) had also attended the Node.js lecture, which covered things like JSON.
Two APIs caught our eye: the Twitter API and a word cloud API. We decided to combine the two.

What it does

First, the app authenticates with Twitter by exchanging our keys for an OAuth 2 token. We obtained the keys when we registered our Twitter app on twitter.com.

Once we have the OAuth 2 token, we can use the Twitter search API. The app sends a request to the search endpoint with the OAuth 2 token as a header, and we get back a JSON response containing 80 tweets that match the search criteria.
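The token exchange described above can be sketched in Python (the original project used Java). The endpoint and header layout follow Twitter's documented application-only flow, but treat this as an illustration rather than the project's actual code:

```python
import base64
import json
import urllib.request

def bearer_credentials(consumer_key: str, consumer_secret: str) -> str:
    """Base64-encode "key:secret", as the app-only token request expects."""
    pair = f"{consumer_key}:{consumer_secret}".encode("utf-8")
    return base64.b64encode(pair).decode("ascii")

def fetch_bearer_token(consumer_key: str, consumer_secret: str) -> str:
    """POST the encoded credentials to the token endpoint; return the bearer token."""
    req = urllib.request.Request(
        "https://api.twitter.com/oauth2/token",
        data=b"grant_type=client_credentials",
        headers={
            "Authorization": "Basic " + bearer_credentials(consumer_key, consumer_secret),
            "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]
```

The returned token then goes into an `Authorization: Bearer ...` header on search requests.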

We then had to turn the JSON response into a list of the words in the tweets. We did this by extracting the "text" field from each tweet in the JSON array, then splitting each tweet on the space character to create an array of words.

Next, the app sanitizes the word list by removing any instances of the search query, any punctuation, any URLs, and any short words (fewer than 3 characters).
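The extraction and sanitization steps can be sketched as follows (a Python illustration, not the project's Java code; the "statuses" key is the one used by Twitter's search response format):

```python
import json
import string

def tweets_to_words(search_json: str, query: str, min_len: int = 3) -> list:
    """Extract each tweet's "text" field, split on spaces, and sanitize:
    drop URLs, strip punctuation, and skip the query and short words."""
    tweets = json.loads(search_json).get("statuses", [])
    words = []
    for tweet in tweets:
        for word in tweet["text"].split(" "):
            if word.lower().startswith(("http://", "https://")):  # drop URLs
                continue
            word = word.strip(string.punctuation)                 # drop punctuation
            if len(word) < min_len or word.lower() == query.lower():
                continue
            words.append(word)
    return words
```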

It then passes the word list, along with formatting options, in a POST request to the Word Cloud Generator API. The API returns JSON containing the URL of the word cloud image.

How I built it

We built it with Java and Eclipse, using the java-json library and a handful of miscellaneous HTTP classes to send POST requests. Originally we hard-coded each specific POST request, but eventually we generalized the code to work with any JSON body and any headers.
We also used the APIs previously mentioned.
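The generalized "any JSON body, any headers" request helper might look like this in Python (a sketch; the project's actual helper was written in Java):

```python
import json
import urllib.request

def build_request(url: str, payload: dict, headers: dict) -> urllib.request.Request:
    """Build a POST request carrying an arbitrary JSON body and extra headers."""
    body = json.dumps(payload).encode("utf-8")
    all_headers = {"Content-Type": "application/json", **headers}
    return urllib.request.Request(url, data=body, headers=all_headers, method="POST")

def post_json(url: str, payload: dict, headers: dict) -> dict:
    """Send the request and parse the JSON response."""
    with urllib.request.urlopen(build_request(url, payload, headers)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the helper easy to test without hitting the network.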

Challenges I ran into

We had an extremely difficult time with authentication; it was the single longest step even though it is the least impressive. We struggled to figure out even which strategy to use, much less how to implement it.
Eventually, however, we figured out how to send JSON requests from Java, and we settled on an application-only approach to the Twitter API. We also had to build an encoding mechanism for our keys.

Accomplishments that I’m proud of

I was very proud of getting the Twitter API working, because it was my first time working with HTTP or JSON. Also, it is very cool to be able to get data from a web server like that.

What I learned

We learned a lot about APIs and JSON (and a little about OAuth 2).
We also learned how difficult it is to work with raw POST requests in Java, so next time it might be better to use Node.js or Python.

What’s next for Twitter Word Cloud

Switching to a free and more powerful word cloud library, or hosting the app on a finished website.
If the code were made public, we would also need to secure our keys properly, because right now they are not protected.

Twitter Cloud

This is a Sinatra web app which generates a word cloud from a Twitter username. The app can be viewed online here.

Dependencies

  • Ruby 2.3.1
  • Git
  • Bundler gem
  • Twitter gem
  • Sinatra framework
  • jQCloud to generate the word cloud
  • Skeleton CSS framework

Features

  • Asks user for a Twitter handle
  • Makes request to Twitter API for 200 tweets
  • Calculates frequency of each word
  • Displays a word cloud generated from tweets by the given Twitter user
  • Stop words removed from word cloud
  • Works in a mobile browser
  • Accessible on the open internet

Installation instructions

Prerequisites:

  • Ruby
  • Git
  • Bundler gem

Set Up

Clone the repo:

$ git clone git@github.com:lsewilson/twitter-word-cloud.git

Install gem dependencies:

$ cd twitter-word-cloud
$ bundle

Copy example environment variables file to a new .env file:

$ cd app
$ cp .env.example .env

Open your new .env file and replace all the *** placeholders with your own Twitter API keys.

Running tests

This app has been tested using Rack-Test, RSpec, and Capybara.

Running the app

Using Rack:

$ rackup
Go to localhost:9292 in your browser.

Using Shotgun:

$ shotgun
Go to localhost:9393 in your browser.

Fill in the form with a public Twitter handle and click submit to generate a word cloud.

Screenshot

My Approach

Initial planning:

After researching various word cloud libraries and gems, I decided to use jQCloud. This meant I would need an API that returned a JSON object jQCloud could interpret and use to render a word cloud.

For the back-end I chose Sinatra because it is much more lightweight than Rails.

Building the app:

Once I had a basic index page with a form for a Twitter handle, I started integrating the Twitter API. My first goal was to make sure I could render the tweets of a given user on the page. When I was certain the API was working correctly, I started building a TweetParser class that would read the collection of tweets it was passed and return my word cloud JSON object.
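The TweetParser logic can be sketched like this (in Python here for illustration, although the app itself is Ruby). The {"text": ..., "weight": ...} shape is the word-list format jQCloud consumes; the stop-word list is just an illustrative subset:

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "it", "rt"}  # illustrative subset

def tweets_to_cloud(tweets: list) -> list:
    """Count word frequencies across tweets and emit jQCloud-style entries,
    skipping stop words, hyperlinks, and @-mentions."""
    counts = Counter()
    for tweet in tweets:
        for raw in tweet.lower().split():
            if raw.startswith(("http", "@")):      # skip links and user tags
                continue
            word = re.sub(r"[^\w#]", "", raw)      # keep letters, digits, hashtags
            if not word or word in STOP_WORDS:
                continue
            counts[word] += 1
    return [{"text": w, "weight": n} for w, n in counts.most_common()]
```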

With the basic app working, I deployed it to Heroku so that it was accessible on the open internet. This helped me notice bugs in the app, such as AJAX requests that were still directed at localhost.

My next task was removing stop words and ensuring that any hyperlinks or Twitter user tags were not included in the word cloud, so I adapted my TweetParser class.

This was all I managed to do in a day, and there are still features I would like to have added.

To do:

  • Replace jQCloud with a custom-built word cloud library.
  • Make the site more mobile-responsive.
  • Jazz up the styling.

Word Cloud for my Twitter Account (pbaumgartner)

In yesterday’s post I experimented with R packages for generating Twitter word clouds. In this post I will give some hints on how to proceed, and I will refer to my GitHub repository, where you can find the complete program code. I have added examples generating word clouds for every member of the IBM staff with a Twitter account, as well as for the department and university accounts.

Steps for generating Twitter word clouds

1. Generate Twitter API key

For authentication you need a Twitter API key, which you get by creating an application at https://apps.twitter.com/app/new. Creating a Twitter application is free, and you don’t need to know all the details of programming against the Twitter API; that part is handled by the R package twitteR.

There are several tutorials on how to get a Twitter API key: see for instance this YouTube video, or read the article on R-bloggers.

2. Install R and copy the R word cloud program

If you haven’t installed R yet, read one of the many tutorials, for instance How to Install R and A Brief Introduction to R. I also recommend installing RStudio as THE interactive integrated development environment (IDE) for R. (You must install R first, and RStudio after that.) If you want to do more with R than just produce word clouds, you should read the (in my opinion) best and still very gentle introductory book by Hadley Wickham: R for Data Science. It is freely available on the internet!

In the program, you have to fill in your authentication keys and the user account for the word cloud. For instance, the line for my account would be:

user = 'pbaumgartner'

3. Experiment with the different parameters

The last task before you run the program is to adapt the parameters for your word cloud.

Twitter Word clouds: Setting parameters

 # experiment with different settings of the parameters
 if (require(RColorBrewer)) {     # using color palette from RColorBrewer
     pal <- brewer.pal(9,"Blues")  # sequential color palettes
     pal <- pal[-(1:4)]            # for a one color (shaded) appearance
     wordcloud(                    # call the essential function
        words,                     # used words by this account
        freqs,                     # frequencies of every word in this account
        scale = c(4.5, .3),        # size of the wordcloud
        min.freq = 6,              # high (5+) if not many different words
        max.words = 200,           # use less (100) if the account is new 
                                   # (< 500 tweets)
        random.order = FALSE,      # most important words in the center
        random.color = FALSE,      # color shades provided by RColorBrewer
                                   # remove RcolorBrewer and set to TRUE
        rot.per = .15,             # proportion of words rotated 90 degrees
        colors = pal)              # use shaded color palette from RColorBrewer
 }

You see, this is a little bit complex, as there are many different parameters. The best and fastest way to experiment is to duplicate the program snippet above and run it as a separate program. For this it is essential that the hard work (text mining and transforming the data from the Twitter account) is already done and all the variables are still in R’s memory.

Examples of Twitter word clouds

You can see a big difference in comparison with the clouds I published yesterday. This time I adjusted the parameters so that all word clouds have a similar size and carry more or less the same amount of information. You can see the parameters I used on this page.

The Twitter account of Wolfgang Rauter is very new, so he does not have many tweets yet (21). Therefore I had to tweak the parameters: instead of a minimum frequency of 5 (yesterday) I had to use 1, and I limited the cloud to 100 words (yesterday: 200).

Adjusted word cloud for @wolfgangrauter
Word cloud (not adjusted) for @wolfgangrauter

Another interesting tweaking example is the timeline of @donau_uni. The word ‚presseaussendung‘ was very dominant yesterday (frequency = 240) and destroyed the appearance of the word cloud. I could have deleted this word from the list, or, as I chose, used a greater scale for the cloud, with the effect that the huge word ‚presseaussendung‘ could not be displayed within the predefined limits.

Word cloud adjusted: @donau_uni
Word cloud not adjusted: @donau_uni

Enjoy!
