Monday, July 12, 2010

Massively multi-player games: an epidemiologist's workshop

I've been engrossed in reading This Gaming Life by Jim Rossignol recently. A significant portion of the book discusses the oddities of the role video games (and games in general) play across world cultures. The book also presents the most comprehensive defense of recreational gaming I have read: that video games are not a pointless time-sink, and that players are not simply "blinking lizards, motionless, absorbed, only the twitching of their hands showing they are still conscious [B. Johnson]."

The most fascinating content of the book is its discussion of how people behave in multi-player online games. Numerous examples are given, ranging from piracy and espionage in EVE Online to collaborative meeting spaces in Second Life. These aspects of gaming hit directly on my own research interests in human behavior within technology-mediated social networks. One example of gaming behavior in particular highlights an unexpected application of online game worlds: modeling the spread of disease, or epidemiology.

At one point the game World of Warcraft had a gameplay mechanic in which players could contract a disease called "Corrupted Blood". The disease damaged players' characters and was communicable from one player to another. It was never intended to appear outside a tightly controlled raid encounter, but it escaped (carried out, among other routes, by players' in-game pets) and spread through the game's cities. Once the plague was loose, Blizzard had to run rolling restarts of their servers to eradicate it.


Epidemiologists use complicated statistical models and utility-function-driven multi-agent simulations to try to predict the spread of diseases. One problem with these models is that they are missing a human component: they are designed to approximate human behavior on a large scale, but that behavior can only truly be captured by measuring actual humans.
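To make the modeling side concrete, here is a minimal sketch of the classic SIR (susceptible-infected-recovered) compartment model, the kind of textbook abstraction these richer simulations build on. The parameters are illustrative, not fitted to any real disease.

```python
# Minimal SIR epidemic model, integrated with simple Euler steps.
# beta is the transmission rate, gamma the recovery rate; both are
# made-up values for illustration only.

def sir(population=10_000, infected=10, beta=0.3, gamma=0.1, days=120, dt=0.1):
    s, i, r = population - infected, infected, 0.0
    history = []
    for step in range(int(days / dt)):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        if step % int(1 / dt) == 0:      # record once per simulated day
            history.append((s, i, r))
    return history

for day, (s, i, r) in enumerate(sir()):
    if day % 20 == 0:
        print(f"day {day:3d}: susceptible={s:8.0f} infected={i:8.0f} recovered={r:8.0f}")
```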

I'm a little late to the party with this story, but Nina Fefferman has been featured extensively in the press for her 2007 paper with Eric Lofgren, "The Untapped Potential of Virtual Game Worlds to Shed Light on Real World Epidemics," published in The Lancet Infectious Diseases. Fefferman recognized that online game worlds contain a wealth of data about human social behavior, and that this data could be put to work in epidemiology.

According to Rossignol, these data can be harvested from games without affecting the gamers' experience in any way. An online community could be infected with a disease which exhibits no symptoms at all but merely spreads in a non-deterministic way. As long as the characters carrying the disease (or payload) can be identified, useful data will be available to researchers.
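As a toy illustration of that idea, here is a hedged sketch of a symptomless "payload" hopping across random in-game contacts. The player count, contact pattern, and transmission chance are all made up; a real study would log who met whom and when.

```python
import random

random.seed(42)
players = [f"player_{n}" for n in range(200)]
carriers = {players[0]}              # "patient zero" starts with the payload
transmission_chance = 0.5            # per contact, purely illustrative

for tick in range(50):
    # each tick, every player bumps into one random other player
    for a in players:
        b = random.choice(players)
        if a in carriers and random.random() < transmission_chance:
            carriers.add(b)

print(f"carriers after 50 ticks: {len(carriers)} of {len(players)}")
```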

Harvesting human behavior patterns for epidemiology is just one of the ways in which modern gaming can benefit society. Passionate gamers and radical scientists like Rossignol and Fefferman are helping to give gaming a good name.

Friday, June 4, 2010

More colors on a mobile device = bad?

Intuition says that more colors are better than fewer colors, right? Many mobile devices use the size of the color palette as a marketing point. The hotly anticipated new Android phone, the HTC Evo 4G, has come under criticism for supporting "only" 65,536 colors (16-bit) while the competition (the iPhone) supports four times as many (18-bit, or 262,144 colors).

Are these colors really necessary? Is there any reason why a new flagship model would have only a quarter of the colors of the reigning champ? Mike Calligaro from the Windows Mobile Team Blog has the answer.

First, it is very difficult to efficiently build the circuitry needed to drive 262,144 colors, because that many colors (2^18) requires 18 bits per pixel. These data bits are held in a specialized memory area called a frame buffer, which is fed by the CPU (or perhaps a GPU). Nearly all mobile devices run on a 32-bit CPU; modern processors are designed around powers of two, which is why we commonly see 32- and 64-bit processors today. One register of a 32-bit CPU can hold the data for two 16-bit pixels with no wasted space, so operations can be performed on two pixels in a single pass. With 18-bit pixels, only one pixel fits in a register and 14 bits (32 − 18) of each operation are wasted. One way to combat this is sophisticated programming and circuitry that spreads a portion of the next pixel into the remaining bits. The difficulty is explained more thoroughly in the blog post.
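To see why 16 bits per pixel is so convenient, here is a small sketch of packing pixels into 32-bit words. The RGB565 and 18-bit layouts are standard, but the helper functions themselves are just for illustration.

```python
# Two 16-bit (RGB565) pixels fill one 32-bit word exactly.
def pack_two_rgb565(p1, p2):
    return (p1 & 0xFFFF) | ((p2 & 0xFFFF) << 16)

# An 18-bit pixel in the same word leaves 14 bits idle, unless extra
# logic spills part of the next pixel into them.
def pack_one_rgb666(p):
    return p & 0x3FFFF               # bits 18..31 carry nothing useful

word = pack_two_rgb565(0xF800, 0x07E0)   # a pure red and a pure green pixel
print(hex(word))                          # -> 0x7e0f800
```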

So what does this mean? It means the CPU has to do significantly more work to drive the same number of pixels for these extra colors. Mobile devices in particular are often CPU-constrained to limit energy usage and meet a target cost, so wasting CPU power on extra colors could result in sluggish performance.

Secondly, past a certain point the number of colors means little more than a line of marketing propaganda. The screens on mobile devices typically measure 3.5" to 4.3" at best, and even at larger sizes it is often impossible to distinguish between the higher color depths. The Windows Mobile Team Blog provides a very simple app which encodes a 24-bit source image into 18-, 16-, and 12-bit color depths. The 12-bit version is noticeably poorer in color fidelity, but the remaining images are difficult to tell apart, and the extra depth is almost certainly not worth the performance hit it requires. So remember when comparing the specs on new devices: the numbers alone do not tell the whole story.
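For the curious, the depth reduction itself is simple. Here is a hedged sketch of quantizing a single 24-bit pixel down to fewer bits per channel; the blog's app presumably does something similar across a whole image. (Real 16-bit RGB565 gives green an extra bit, which this even-per-channel sketch ignores.)

```python
# Drop the low-order bits of each 8-bit channel, then shift back
# so the result is still displayable as a 24-bit value.
def quantize(rgb, bits_per_channel):
    drop = 8 - bits_per_channel
    return tuple((c >> drop) << drop for c in rgb)

pixel = (201, 154, 87)                   # an arbitrary 24-bit color
for bits in (6, 5, 4):                   # ~18-, ~15-, and 12-bit color
    print(f"{3 * bits:2d}-bit:", quantize(pixel, bits))
```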

Monday, May 17, 2010

Image Error Level Analysis

Image Error Level Analysis is a great online tool for detecting digitally manipulated images. Digital images are ubiquitous: most cell phones now have a built-in digital camera, and popular internet destinations such as Facebook, eBay, and dating sites rely almost entirely on digital images. Digital images drive e-commerce, social networking, advertising, dating, and many other web enterprises. They are even admissible in federal and state court in the US. So how do we know if they have been tampered with?

Photo editing software has been available for decades and continues to gain more sophisticated capabilities. Everybody knows image editors can be used to tweak the colors of prepared dishes on a recipe web site or enhance photos on dating sites. Adobe Photoshop CS5 has a powerful new tool called "content-aware fill" which can erase portions of a picture and generate new content to seamlessly patch the missing pieces. One fascinating example can be seen below (images thanks to osici on SomethingAwful). The first image shows the surface of Mars as captured in a series of photographs from a Mars rover; four rectangular photographs have been stitched together to form a panorama. Because the camera was not perfectly horizontal, stitching the images together creates a black sawtooth border. The second image has been modified using the content-aware fill tool to generate content covering the black border. The rocks, dirt, and sky shown in the second photo, where only black exists in the first, were generated entirely algorithmically; those portions were never part of any picture sent back by the rover.



This is very impressive technology which can be put to many good or bad uses. So how do we know if an image has been tampered with? Aside from noticing small details like a missing shadow or illogical lighting and reflections, the digital encoding of the image file itself can leave many clues. Image Error Level Analysis uses an ingenious method for detecting tampered images.

Photographs on the internet are almost always encoded using JPEG compression, a lossy technique which leaves behind visual artifacts that act as a signature. Sometimes these artifacts are too subtle to be seen with the naked eye, but image analysis software can detect them. When images are manipulated, source material from other photos is often added in some way; those photos were also JPEG-compressed, with a slightly different signature. Even edits made directly to a single source image often result in the recompression of a portion of that image. This variation in compression level across a single image is a tell-tale indicator that an image has been altered. The user simply supplies a URL to an image, and the tool generates a corresponding image mapping the quality-level differences.
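The general resave-and-compare technique is easy to sketch with the Pillow imaging library. This is a rough approximation of the idea, not the site's actual implementation; "photo.jpg" is a placeholder filename and the resave quality is an arbitrary choice.

```python
from PIL import Image, ImageChops

# Resave the image at a known JPEG quality, then amplify the
# per-pixel differences so regions with unusual error levels stand out.
original = Image.open("photo.jpg").convert("RGB")
original.save("resaved.jpg", "JPEG", quality=95)
resaved = Image.open("resaved.jpg")

diff = ImageChops.difference(original, resaved)
max_diff = max(hi for _, hi in diff.getextrema()) or 1
ela = diff.point(lambda px: px * (255.0 / max_diff))   # stretch to full range
ela.save("ela.png")
```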

In the author's words:
"It works by resaving an image at a known quality, and comparing that to the original image. As a jpeg image is resaved over and over again, its image quality decreases. When we resave an image and compare it to the original, we can guess just how many times the image has been resaved. If an image has not been manipulated, all parts of the image should have been saved an equal amount of times. If parts of the image are from different source files, they may have been saved a number of different times, and thus they will stand out as a different colour in the ELA test.

Error level analysis allows you to see the difference in quality level, represented by brightness. Things which are very bright have been edited most recently, whilst duller parts have been resaved multiple times. This allows you to see not just which parts of the image have been changed, but in what order the changes occurred."


The above image clearly shows where the original photo was digitally manipulated, indicating that it is not the original image. The tool does not guarantee conclusive findings, but it can certainly be helpful in detecting manipulated images. Note also that an image creator could increase the likelihood of a false negative by monitoring and altering the compression levels. This is a fascinating and well-implemented online tool attacking a very interesting problem.

Sunday, May 16, 2010

Copenhagen Suborbitals

Copenhagen Suborbitals is a nonprofit group of Danish hobbyists who have embarked on the lofty mission of sending a human to space. Even more impressive, the team is a varied group of engineers and hobbyists working on a volunteer basis in their spare time. Their projects run on a shoestring budget provided by both corporate and private sponsors.

It's easy to write this crew off as ambitious dreamers who will never get off the ground, but they have the expertise and experience to make their manned spaceflight mission a success. Within the past few years they created a functional 4-8 person submarine, the UC3 Nautilus, which is capable of diving to 500 meters (about half the depth of the Danish military submarines). The real kicker is that it was created on a razor-thin budget of approximately $200,000 USD. In line with the explorer/hobbyist agenda, the UC3 has no torpedo tubes; in their place are small observation windows. The construction group isn't all business and no fun, either: the UC3 serves as a primary attraction of the Illutron Collaborative Interactive Art Studio. A snapshot of the team's impressive qualifications as engineers, scientists, designers, and artists can be found here.

So does Copenhagen Suborbitals have what it takes to send a man to space? They are well past the paper planning stage: in February 2010 they successfully test-fired a full-size rocket, the HEAT 1X, generating 210,000 horsepower.



By June they plan to use this rocket to launch their own spacecraft, named after the Danish astronomer Tycho Brahe. The spacecraft will carry an experimental payload and return safely to Earth. While this summer's launch will be unmanned, the Tycho Brahe is designed to accommodate a single person for manned spaceflight. Out on a cold airfield in Denmark, this ambitious, dedicated, and talented group of volunteers is well on its way to achieving its dream.

Thursday, April 22, 2010

Bits are Bits

What do movies, music, video games, telephone conversations, textbooks, magazines, television shows, tweets, SMS, blogs, and the rest all have in common? They are increasingly created and distributed as bits. Each bit is either a 1 or a 0, and the bits in the latest video game are of course the same as those representing a math textbook. The real magic is the careful arrangement of bits, and communicating those bits to the right place at the right time.
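As a quick illustration of the point, the same four bytes can be read as text, as a number, or as image data; nothing in the bits themselves says which interpretation is "right".

```python
# The same four bytes, read three different ways.
data = bytes([0x42, 0x69, 0x74, 0x73])

print(data.decode("ascii"))            # as text: 'Bits'
print(int.from_bytes(data, "big"))     # as one integer: 1114207347
print(list(data))                      # as pixel intensities: [66, 105, 116, 115]
```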

Clearly a musician has a different set of skills than a painter or a programmer, but the ultimate output of their work is nearly always a sequence of carefully placed bits. The music industry has the specialized skills needed to transform an artist's expression into recorded sound. Historically it has also been responsible for publishing, distributing, and selling these recordings, and lately it has taken on the additional challenge of enforcing copyright. The very different skills of software developers necessitate their own industry for creating useful or entertaining software; this software industry likewise publishes, distributes, and sells its product, handling intellectual property concerns through DRM (digital rights management). Two wholly independent industries exist (music and software), yet a large portion of the effort, namely publishing, distribution, and sales, is completely redundant. So with this underlying digital representation common across all signals and media, why are we stuck with many different, competing channels for the distribution, sale, and protection of bits?

In a word: legacy. Historically, phone companies had very little in common with book publishers and television broadcasters. Lucrative industries formed organically around each technology, and each industry grew apart from the others. Public libraries sprang up to allow people to share printed materials: visitors could use any book in the building, and most were eligible to be taken home for a period of time, often free of charge. It is quite hard to imagine public libraries offering to loan computer software to take home (or download) and use for a specified period of time. These separate industries grew in different directions, and that divergence leads to the redundancy, hassle, and inefficiency we find today.

One obvious example of bit-gouging is the price which U.S. cellular telephone service providers charge for text messaging (SMS). A service plan of $59 per month will give you 1000 minutes of talk time and a free phone on T-Mobile, but text messages cost 10-20 cents each, and fees are levied for a sent message as well as a received one. At these rates, a minute of speech costs about 5.9 cents, while a text message costs 20-40 cents. Yet a text message holds far fewer bits: at a maximum of 160 seven-bit characters, it is only 1,120 bits. A minute of telephone-quality speech (8,000 samples per second at 8 bits each), on the other hand, is roughly 480 KB, or about 3.8 million bits, before compression. One minute of speech therefore contains over 3,000 times the bits of a text message, yet the cell providers charge three to seven times as much for the text message.
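A quick back-of-the-envelope calculation using the figures above shows just how lopsided the per-bit pricing is:

```python
# Price per bit: SMS vs. voice, using the plan figures quoted above.
sms_bits = 160 * 7                      # 160 seven-bit characters
sms_cost_cents = 20                     # one message at the low-end rate, both fees

voice_bits = 8000 * 8 * 60              # one minute of uncompressed telephone audio
voice_cost_cents = 5900 / 1000          # $59 plan spread over 1000 minutes

sms_rate = sms_cost_cents / sms_bits
voice_rate = voice_cost_cents / voice_bits
print(f"SMS:   {sms_rate:.5f} cents per bit")
print(f"Voice: {voice_rate:.8f} cents per bit")
print(f"SMS costs {sms_rate / voice_rate:,.0f}x more per bit")
```

Run it and the SMS premium works out to more than ten thousand times the per-bit price of voice.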

So how can the cell providers get away with this? Simple: they charge what people are willing to pay. With so many different bit channels available to us today and the growing adoption of smartphones, I suggest using email in place of text messaging (granted, this may not be a smart choice if your provider charges for data downloads by the bit). With growing support for push notification, email is becoming the seamless experience that SMS is praised for. The point is, the industries that served and sold us bits in the past are quickly getting in their own way.

Industrial protection of the bit channels leads to frustrating attempts at DRM. I certainly do not condone piracy: AAA games can cost development studios up to 50 million dollars to create, and they obviously rely on sales to stay in business. One of my first experiences with DRM was the game Where in Time Is Carmen Sandiego?, which included a roughly 1000-page printed encyclopedia in the game box. When starting the game (from a DOS prompt, no less), the program required the user to look up key words in the encyclopedia before playing. This was based on the premise that copying the game disks is easy, but copying a 1000-page printed book is difficult. At the time this seemed a huge hassle; today we would be thankful for such a transparent user experience.

Microsoft took a huge step in the wrong direction when Windows XP started the "activation" trend, which continues to this day. Users must purchase the software and a license; when the software is installed, the computer must connect over the internet (connectivity is now assumed, though some special-purpose computers still remain offline) to one of Microsoft's servers and exchange license information before the software will work properly. If something unforeseen happens and Microsoft disappears, the software becomes a shiny coaster. The most egregious violations of personal privacy are the Sony rootkit disaster and the MSN Music debacle: two of the biggest names in digital commerce are responsible for, respectively, sneaking software onto users' computers that opened root-level vulnerabilities, and the swift disappearance of entire libraries of legally purchased music. At best, such measures for controlling the sharing of bits are equivalent to being physically searched for stolen goods when leaving a shop. At worst, DRM measures invade personal privacy and render legally purchased products useless.

I am certainly not the first to suggest this, but it is time to adopt a new way of publishing, distributing, selling, and protecting bits. I propose that all digital media and communications adopt a unified model for all stages beyond the creative phase. The current industries are siloed, redundant, and cumbersome. People are willing to pay for valuable content, and they do not deserve to be treated like criminals unless there is reason to believe they are.

Thursday, April 8, 2010

What is Texas?

Obviously it's a state. Perhaps it's also a state of mind. Having been a Texan for most of the last six years, I can say that Texans strongly identify with Texas, and that no matter where you're from, Texas carries a lot of weight.

Much like New York City overshadows the state of New York, people seem to identify with the state of Texas perhaps more than with our country. A whole host of catchphrases has sprung up, including "Don't mess with Texas", "Texas is bigger than France", "Everything is bigger in Texas", and "Buckle up, Texas".

Texas has a rich history of being fiercely independent: at one point it was its own republic, and more recently Governor Perry refused federal stimulus funds. This independence has fostered a stronger sense of state identity than is found elsewhere. Everybody knows Texas is the "Lone Star State". After living in Ohio, I can say the only reason people know Ohio is the "Buckeye State" is the successful collegiate athletics program at The Ohio State University, whose mascot is a Buckeye. And four years in Pennsylvania taught me that nobody knows Pennsylvania is the "Keystone State".

The name Texas conjures strong imagery to Texans and non-Texans alike. Texans associate the word with freedom, friendliness, independence, and familiarity while non-Texans are likely to think of horses, cowboy hats, silly boots, and bola ties.

Whatever the reason for this state identity, major companies have certainly cashed in on its strength and influence. In this sense, Texas is a brand name: it signifies that something is bigger, better, or somehow more authentic. Such shameless branding has, of course, led to wonderful consumer goods: Texas-shaped tortilla chips (found earlier today in the local HEB grocery store), Lone Star beer (the national beer of Texas), countless other beer signs and Texas logos emblazoned on beer cans, and, last but not least, pickup trucks. The big three US automakers generally sell special-edition trucks specifically for the Texas market, such as the King Ranch edition Ford Super Duty (pictured).

This post is not intended to be a rant against the nebulous concept of Texas; it is intended to spark thought about a fascinating peculiarity. Many Texans are very proud of their state, as reflected in the words of the immortal Willie Nelson:

It's where I want to be
The only place for me
Where my spirit can be free
Texas


Monday, April 5, 2010

The Problem with iPad

The Apple iPad went on sale April 3rd, 2010 to much fanfare. Apple is billing it as a new type of device, and reviewers consider it a tablet media player/PC. Reputable reviewers are raving about the iPad with such superlatives as "a laptop killer", "a winner", "one of the best computers [sic] ever", and "the iPad beats even my most optimistic expectations. This is a new category of device. But it also will replace laptops for many people."

Now the iPad is definitely a slick piece of kit. Apple has created a beautiful hardware product, gorgeous in its simplicity and its solid metal-and-glass construction. The size is ideal for transport while being big enough to handle some real reading tasks. But the benefits end there: the iPad is saddled with a few crippling weaknesses.

Cory Doctorow has a great explanation here. The main argument is against the restrictive software model. Computers are meant to be examined, experimented with, tinkered with, and optimized. Even a casual user who wants only to check email and browse YouTube benefits greatly from those who are more inclined to take a closer look at the machinery in front of them. These "power users" are the ones who create free and open source software, who keep the big guys honest by discovering flaws in commercial software, and who evangelize the latest technology. That is why it is particularly disturbing to find technologically informed people praising and pushing the iPad.

The iPad is not a computer; it is a glorified media consumption device. A computer can run arbitrary code: it does what the user tells it to do. The iPad tells the user what he can do, and he should be happy to pay for the privilege. To run software on the iPad, a programmer must write a program to Apple's specs in a specific language (Objective-C) and submit it to the App Store, and Apple must approve the software before it is published. If the developer wishes to charge for the software, Apple automatically takes a fixed 30% cut of the proceeds. This means a developer must play by Apple's (and perhaps AT&T's) draconian rules, in addition to sacrificing 30%, just for the privilege of writing software for the iPad.

And because I'm a sucker for lists, here's a list of flaws with the iPad:
  • Relies on the app store for ALL software
  • Requires iTunes for loading files
  • No multi-tasking
  • XGA screen (1024x768) cannot display HD content
  • No user accessible file system
  • No replaceable battery
  • No expansion ports (arguably not a flaw)
The starting price of the iPad is $499, which is respectable given the hardware design. However, just as in the automobile industry, the hardware options affect the price considerably: adding a 3G radio and 48GB of flash memory (moving from the $499 base model to the $829 top model) increases the price by a whopping 66%.

Apple clearly creates different products with different customers in mind. I've been heavily using a Mac Pro with OS X Snow Leopard Server lately, and I am very satisfied with the experience. I'm no UNIX whiz yet (I'm probably still more comfortable with a DOS prompt), but having terminal access right on the dock is a fantastic feature, and like any respectable server, it lets me connect remotely via SSH and hack until I'm happy. Apple wasn't always closed, either: the Apple II+ even included schematics for its circuit boards. In contrast, Apple's first foray into the ultra-thin-and-light computing market, the MacBook Air, didn't even include an ethernet port.

In an attempt at streamlining (ahem, asserting control over the user), Apple has eliminated the concept of files on the iPad. Until cloud computing and services like Google Docs are ready for prime time, this makes the iPad useless as a content creation tool, leaving it as a media and internet consumption device. Such media players aren't necessarily bad, and the iPad's form factor is quite suitable. But Apple has dropped the ball again, this time by forcing iTunes usage. We all know how PlaysForSure™ turned out; even a juggernaut like Microsoft was unable to salvage the music libraries of paying customers. There is no place for DRM in today's technology climate, though that is a rant for another time. It is easy to fill up the iPad's memory with media content and applications remotely, but it is impossible to remove media content without docking to an iTunes workstation (with wires, no less, even though Wi-Fi is already built into the iPad and ubiquitous).

The critical praise of the iPad reeks of irresponsible, lazy journalism. Reviewers have been too lazy to truly identify what the iPad is and what it is not. It is not a computer; it is a handcuffed media player mashed together with a mediocre internet browser and wrapped in a sexy package. It is much easier to be seduced by Apple's rhetoric and entranced by the pretty hardware design than to truly evaluate the capabilities and merits of the device.

Monday, March 29, 2010

Chinese Internet Censorship

According to the official Google blog, as of March 22nd, 2010 Google has begun redirecting traffic from its Chinese site, google.cn, to its uncensored Hong Kong site, google.com.hk. This includes Google Search, Google News, and Google Images.

This was prompted by “Operation Aurora”, an attack in which the Gmail accounts of Chinese dissidents were hacked via a security hole in Internet Explorer, globally the most widely used browser. Google has also stated that some of its intellectual property was stolen. These attacks allegedly originated in China and have been traced to two Chinese schools with strong ties to Baidu, a popular Chinese internet company infamous for aping Google, albeit with strong censorship.

Google was already maintaining an uneasy balance between complying with Chinese government regulations and providing its services to Chinese users. The “Operation Aurora” attacks convinced Google to disregard the censorship restrictions mandated by the Chinese government and instead redirect all traffic from China to Hong Kong. This forces the Chinese government to make a decision: allow Google to operate uncensored in China via Hong Kong, or completely sever ties to Google.

As of this writing, the Chinese government appears to be choosing inaction: google.cn redirects to google.com.hk. But because Google holds roughly 30% of the search market in China, while Baidu holds about 60% and has the benefit of being homegrown (in a copycat sense), it is entirely possible that the Chinese government will simply cut off Google in the near future and tout Baidu as offering superior search technology.

Internet handcuffs

So what’s all the fuss about? Wikipedia has an excellent introduction to internet censorship in China. For reasons perhaps related to maintaining order and power, the Chinese government prefers to keep its citizens in the dark regarding its questionable activities, and it has done a remarkable job so far.

It is not just the most outrageous sites that are blocked; internet cornerstones including YouTube, Facebook, Twitter, and the Chinese-language BBC are all blocked by the Great Firewall. The core rationale is that the Chinese government fears any content which is critical of its actions or casts it in an unfavorable light. Its view is not that transparency makes the system stronger, but that critical arguments engender doubt about the government’s abilities. The primary censored content areas relate to:
  1. Falun Gong
  2. Tiananmen Square protests
  3. Unregulated media sites
  4. Taiwan
  5. Pornography
  6. Tibet
In a shockingly dystopian move, the Chinese government essentially shut down the Chinese internet for a period of four days around the 20th anniversary of the Tiananmen Square protests. Popular portals and discussion groups were forced to post a cryptic maintenance message.


"For reason which everyone knows, and to suppress our extremely unharmonious thoughts, this site is voluntarily closed for technical maintenance between 3 and 6 June 2009..." Dusanben.com (translation)

In addition to blocking many portals, media outlets, and social networks, the Chinese government has gone so far as to block certain words from its internet, making it virtually impossible to access restricted content without jumping through some hoops.

Even with all this effort and 30,000 internet police (as of 2005), it is still quite possible to sidestep the Great Firewall and compute on the free internet. SSH or VPN tunneling to outside machines, or routing traffic through the Tor onion-routing network, are all effective ways of accessing restricted content. However, these techniques rely on having direct access to a machine outside of China, and such access could be cut off at any moment by the government.

What does the Chinese internet look like today?

To satisfy my personal curiosity, I ran a few quick-and-dirty experiments to see what the internet looks like in China. I have no understanding of the Chinese language, written or spoken, so this informal exploration of the Chinese internet relies on image search results, with liberal use of Google Translate and Chinese character mappings.

Search #1:

First up, an innocuous search query: “Mars” or 火星

US based Google image search (GIS):

Google.cn now redirects to google.com.hk, and we see very similar (though not identical) results regardless of whether the query is in English or Chinese:


China’s leading search engine, Baidu, gives reasonable results when queried in Chinese:


When queried in English, Baidu gives:


These image results are much poorer, but keep in mind that Baidu is intended as a Chinese-language search engine and will primarily be used as such, so these results are not particularly surprising.


Search #2:

This time let’s try something a little more interesting: “Tiananmen square protest” or “六四事件”.

First up is a US-based GIS in English:


This image search clearly returns iconic photographs of lines of protesters and Chinese military tanks, along with pictures of violence.

Next, a Chinese Google search (via Hong Kong) and a US-based Google search on the query “六四事件” yield results similar to those above:



The exact same query in Chinese on Baidu results in:


These results are mostly posed archival photos of Communist leaders and the origins of the Communist party. A Baidu search for simply Tiananmen Square (or 天安門廣場) yields only beautifully colored sunlit tourist photos of Tiananmen Square:


Search #3:

To really see the effects of the Great Firewall, check out the results of a very simple forbidden query: “falun gong”


The Baidu search engine not only forbids such queries, it immediately resets the connection to the server and sets a timeout for the querying user. Rather than providing an error message stating that the search is forbidden, a query for “falun gong” is met with a strict refusal of service: for a period of several minutes following such a query, even the search engine's main page is unavailable, including for completely innocuous queries.
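Behavior like this is easy to probe for. Below is a hedged sketch that sends a raw search request and watches for an abrupt TCP reset; the host and query path follow Baidu's public search URL, but the details (and the results) will of course depend on where you run it from.

```python
import socket

def probe(host, path):
    """Send one HTTP GET and report whether the connection is reset."""
    try:
        with socket.create_connection((host, 80), timeout=10) as s:
            s.sendall(f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode())
            return s.recv(4096)[:80]        # first bytes of any response
    except ConnectionResetError:
        return b"<connection reset by peer>"
    except socket.timeout:
        return b"<timed out>"

print(probe("www.baidu.com", "/s?wd=falun+gong"))
```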


The Chinese government chooses to keep its citizens in the dark regarding its questionable practices by cutting off access to information. Even this very site (google.blogspot.com) is banned in China, and attempting to access it causes a 5-minute block to be placed on the user's IP.

It is my opinion that imposing these restrictive measures is a battle destined for failure. With new technology such as routing protocols, tunneling, and long-distance wireless communications, and with easier access to travel and trade, it is only a matter of time before the average Chinese citizen has access to free information. I suspect it becomes a much different problem at that point: Chinese citizens must willingly accept this information and choose to act on it for the betterment of society. Kudos to Google for standing up to this form of oppression and working to fulfill their corporate mission: to organize the world's information and make it universally accessible and useful.