MapD Tweetmap Installation and Quick Overview
Twitter is an excellent way to gain insight into a wide variety of social phenomena. Data can be sliced in many different ways, looking at user, geography or topic/hashtag as well as diving down to the level of individual tweets. In fact, MapD originated when Todd Mostak needed to build a better interactive analysis tool to understand the Arab Spring. Streaming data from Twitter was added later.
MapD recently released an open source Tweetmap Demo online showing roughly 390 million tweets. The code is available on GitHub. It installs easily on Mac OS X. You can get it running locally in 10 minutes and start modifying the code. You will learn how to easily download and install an interesting dataset (Twitter), search through the data, and get an understanding of the three main libraries that are used to display the data. It’s a great learning experience.
Installation
git clone https://github.com/mapd/mapd-tweetmap-2.git
cd mapd-tweetmap-2
npm install
You will see a few warnings. These can be safely ignored for your initial test.
npm start
You will see a message that says, webpack: Compiled successfully
You can now open a local browser to:
The local test is running a small static dataset of approximately 572,000 Tweets.
The dataset is designed to provide a concise overview of the features of the MapD platform. You will be able to run it on most common hardware. When you want to scale to the 1 billion rows of data that MapD is capable of, you can simply move the application to a GPU instance on your own server or a cloud hosting service such as AWS or IBM Bluemix.
To show how MapD Charting can surface data points that will excite your audience, let’s run a fun search: how many parties were people tweeting about across the world? The sample dataset captures Tweets from October 31st, 2014. What will we find?
At that one point in time, people tweeted the word party 142 times.
I also looked at people who went trick-or-treating.
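As a rough mental model of what such a keyword search counts, the sketch below tallies tweets that contain a word, ignoring case. It is only an illustration over a handful of made-up rows, not how Tweetmap or MapD Core actually runs the search; the field name tweet_text, the sample rows, the countKeyword helper, and the whole-word matching rule are all assumptions.

```javascript
// Hypothetical sample rows; the field name tweet_text is an assumption.
const sampleTweets = [
  { tweet_text: "Halloween party tonight!" },
  { tweet_text: "Trick or treating with the kids" },
  { tweet_text: "Best PARTY ever" },
  { tweet_text: "Partying is overrated" }
]

// Count tweets that contain the keyword as a whole word, ignoring case.
function countKeyword (tweets, keyword) {
  // Escape regex metacharacters so the keyword is matched literally.
  const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
  const pattern = new RegExp(`\\b${escaped}\\b`, "i")
  return tweets.filter(t => pattern.test(t.tweet_text)).length
}

console.log(countKeyword(sampleTweets, "party")) // 2
```

Whether "Partying" should count as a match for "party" is a design choice; this sketch treats it as a different word.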
By typing in the URL manually, you can see the actual Tweet or Instagram post. This will provide a brief glimpse into the fun and friendships that people share on Twitter.
This is a quick look at the search features of a dataset. MapD also provides a number of technology platforms to easily perform complex multi-dimensional searches in real-time, either from a dataset in a database or from a stream of real-time data.
Code
The data visualization magic on the front end is possible because of the connection to a live MapD database through three open source front-end libraries that you should look at:
- Mapd-connector - a JavaScript library for connecting to a MapD GPU database and running queries
- Mapd-crossfilter - a JavaScript library for exploring large multivariate datasets in the browser
- Mapd-charting - dimensional charting built to work natively with crossfilter rendered using d3.js
You can get a quick understanding of how to use MapD Connector and MapD Crossfilter by looking at the code here.
MapD Connector
In /src/services/connector.js, you can see the connection to the database.
This is a hosted MapD Core database with a small dataset for testing.
require("@mapd/connector/dist/browser-connector")
const Connector = window.MapdCon
const connection = new Connector()
  .protocol("https")
  .host("metis.mapd.com")
  .port("443")
  .dbName("mapd")
  .user("mapd")
  .password("HyperInteractive")

// log SQL queries
// connection.logging(true)

export function connect () {
  return new Promise((resolve, reject) =>
    connection.connect((error, result) => (error ? reject(error) : resolve(result)))
  )
}

export function getConnection () {
  return connection
}
Documentation for MapD Connector is available here. In the example above, the host, metis.mapd.com (http://metis.mapd.com), is a server that MapD makes available for testing. You can replace these settings with those of your own MapD Core server, running either on your local workstation or in a hosted cloud instance such as AWS or IBM Bluemix.
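One way to make that swap easier is to keep the server settings in a single object and apply them with the same chained setters used in connector.js. The sketch below is an assumption about how you might organize this, not code from the repository: the settings values are placeholders, and configure and connect are hypothetical helpers that expect a connector instance such as new window.MapdCon().

```javascript
// Placeholder settings for your own MapD Core server (all values are assumptions).
const settings = {
  protocol: "http",
  host: "localhost",
  port: "9092",
  dbName: "mapd",
  user: "mapd",
  password: "changeme"
}

// Apply the settings with the chained setters shown in connector.js.
function configure (connector, s) {
  return connector
    .protocol(s.protocol)
    .host(s.host)
    .port(s.port)
    .dbName(s.dbName)
    .user(s.user)
    .password(s.password)
}

// Wrap the callback-style connect() in a Promise, mirroring the original service.
function connect (connector) {
  return new Promise((resolve, reject) =>
    connector.connect((error, result) => (error ? reject(error) : resolve(result)))
  )
}
```

In the browser, this could be used as connect(configure(new window.MapdCon(), settings)), followed by .then() to start building charts once the session is open.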
MapD Crossfilter
MapD Crossfilter lets you apply a filter to one chart and automatically update the other charts. You can see how it is set up in /src/services/crossfilter.js:
import * as CrossFilter from "@mapd/crossfilter"
import {TABLE_NAME} from "../constants"
let crossfilter = null
export function createCf (con) {
  return CrossFilter.crossfilter(con, TABLE_NAME).then(cf => {
    crossfilter = cf
    // Promise.resolve uses only its first argument, so this resolves to cf
    return Promise.resolve(cf, con)
  })
}

export function getCf () {
  return crossfilter
}
mapd-crossfilter forms SQL queries that are used to retrieve data which will then be rendered by mapd-charting. It is based on Crossfilter from Square, but adds the ability to make asynchronous network requests to retrieve data. See their wiki for more information.
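To make the idea of filters turning into SQL more concrete, here is a rough sketch of how filters on several dimensions could be combined into one grouped query per chart. It is not mapd-crossfilter's implementation or its actual output; buildGroupQuery, the table name tweets, and the columns lang and tweet_time are hypothetical names chosen for illustration.

```javascript
// Filters currently applied, keyed by dimension (column) name.
// Each value is a SQL predicate string; all names here are hypothetical.
const filters = {
  lang: "lang = 'en'",
  tweet_time: "tweet_time >= '2014-10-31 00:00:00'"
}

// Build the query for one chart. As in Crossfilter, a chart's own dimension
// is left out of the WHERE clause so the chart still shows every category.
function buildGroupQuery (table, dimension, activeFilters) {
  const predicates = Object.keys(activeFilters)
    .filter(name => name !== dimension)
    .map(name => `(${activeFilters[name]})`)
  const where = predicates.length ? ` WHERE ${predicates.join(" AND ")}` : ""
  return `SELECT ${dimension} AS key0, COUNT(*) AS val FROM ${table}${where} GROUP BY key0`
}

console.log(buildGroupQuery("tweets", "lang", filters))
```

Applying a new filter on one chart changes the WHERE clause for every other chart, which is why the remaining charts refresh while the filtered chart keeps its full set of categories.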
MapD Charting
The GitHub repository for MapD Charting comes with a number of examples for you to browse and edit.
Streaming Data
The Tweetmap Demo on the MapD site has close to 400 million Tweets in it, far more than the roughly 0.6 million Tweets in the demo on GitHub. You can stream data with either the Twitter API or a service like Gnip, Twitter’s enterprise API platform. Gnip comes with a full set of well-documented APIs for both realtime and historical data. Their PowerTrack, Volume and Replay streams use the Streaming HTTP protocol to deliver data through their API.
MapD developers can insert this stream of data into MapD Core with StreamInsert.
Documentation on using StreamInsert is here.
StreamInsert can be attached onto the end of a data stream. The data stream could be another program printing to standard out, a Kafka endpoint, or any other real-time stream output. Users can specify the appropriate batch size according to the expected stream rates and desired insert frequency. The target table must already exist before attempting to stream data into the table.
Example:
cat file.tsv | /path/to/mapd/SampleCode/StreamInsert stream_example mapd --host localhost \
  --port 9091 -u mapd -p MapDRocks! --delim '\t' --batch 1000
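The effect of the batch size can be pictured with a small sketch: rows are buffered as they arrive and handed off in groups, with any remainder flushed when the stream ends. This illustrates the general batching idea only and is not how StreamInsert itself is implemented; createBatcher and the insertBatch callback are hypothetical names.

```javascript
// Group incoming rows into fixed-size batches.
// `insertBatch` is a hypothetical callback standing in for the database insert.
function createBatcher (batchSize, insertBatch) {
  let pending = []
  return {
    // Buffer one row and hand off the batch once it is full.
    push (row) {
      pending.push(row)
      if (pending.length >= batchSize) {
        insertBatch(pending)
        pending = []
      }
    },
    // Hand off whatever is left, for example when the stream ends.
    flush () {
      if (pending.length > 0) {
        insertBatch(pending)
        pending = []
      }
    }
  }
}

// Example: 5 tab-delimited rows with a batch size of 2 give batches of 2, 2 and 1.
const batchSizes = []
const batcher = createBatcher(2, batch => batchSizes.push(batch.length))
for (let i = 0; i < 5; i++) batcher.push(`user${i}\thello`)
batcher.flush()
console.log(batchSizes) // [ 2, 2, 1 ]
```

A larger batch size means fewer, bigger inserts at the cost of rows waiting longer before they become visible, which is the trade-off between stream rate and insert frequency mentioned above.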
The MapD table that receives the stream should be created with a row limit (the max_rows table option) so that it automatically keeps only the most recent records.
With almost 400 million Tweets, I can see that there are 38,000 Tweets about hockey.
You can see some of the hotspots and drill down into individual Tweets.
With a larger dataset, you can discover new facts about hockey and the culture of hockey.
Challenge Yourself!
Download MapD Tweetmap-2 from GitHub and connect to your own MapD Core server. Import your own data or connect to a data stream. Please let us know what you build. We’d love to hear from you. Thanks for your time!