Thursday, December 20, 2012

The Google F1 slides

Google F1 is a relational database query engine that works on top of Google Spanner, which is a distributed storage system that sits on top of Google File System. Got it? :)

Basically, it's a really big, distributed relational database, and Google is using F1 to replace MySQL for AdWords.

Thursday, November 1, 2012

Chomsky on Where AI Went Wrong

If one were to rank a list of civilization's greatest and most elusive intellectual challenges, the problem of "decoding" ourselves -- understanding the inner workings of our minds and our brains, and how the architecture of these elements is encoded in our genome -- would surely be at the top. Yet the diverse fields that took on this challenge, from philosophy and psychology to computer science and neuroscience, have been fraught with disagreement about the right approach.

The father of fractals

A nice little piece on Mandelbrot in The Economist:

Friday, September 7, 2012

The Google Dremel Paper

Here is the paper describing Google Dremel, which may replace Hive one day. There does not seem to be anyone working on an open-source version, though.

Link (PDF)

Update: Apache Drill is the open source version of Dremel (hat tip to Zoltan).

Also, Cloudera's Impala looks similar.

Self-driving cars: The next revolution

Here is a recent report from KPMG about self-driving cars:

Tuesday, August 7, 2012

Google’s Self-Driving Cars Are Going to Change Everything

Recent News:

Google’s Self-Driving Cars Complete 300K Miles Without Accident, Deemed Ready For Commuting

Here's what is going to happen in the next 5-10 years. It won't all happen right away.

  • The car insurance industry will cease to exist. These cars aren't going to crash. Even if there are hold-outs that drive themselves, insurance would be so expensive they couldn't afford it, as no one else would need it. 

  • If the cars don't crash, then the auto collision repair / auto body industry goes away. The car industry also shrinks† as people don't have to replace cars as often.

  • Long-haul truck driving will cease to exist. Think how much money trucking companies will save if they don't have to pay drivers or collision and liability insurance. That's about 3 million jobs in the States. Shipping of goods will be much cheaper.

  • On that note, no more bus drivers, taxi drivers, limo drivers.

  • Meter maids. Gone. Why spend $20 on parking when you can just send the car back home? There goes $40 million in parking revenue to the City of Vancouver by the way. 

  • Many people in cities will get rid of their cars altogether and simply use RoboTaxis. They will be much cheaper than current taxis due to no need for insurance (taxi insurance costs upwards of $20,000/year), no drivers, and no need for taxi medallions (which can cost half a million in Vancouver). You hit a button on your Android, and your car is there in a few minutes.

  • Think how devastating that would be to the car industry. People use their cars less than 10% of the time. Imagine if everyone in your city used a RoboTaxi instead, at say 60% utilization. That's 84% fewer cars required. Currently, at peak travel times, only 12% of cars are in use, so you don't need that many cars even at rush hour (which makes 80% a more realistic drop).

  • No more deaths or injuries from drinking and driving. MADD disappears. The judicial system, prisons, and hospital industry shrink due to the lack of car accidents†. 6 million crashes in 2010 (USA). 2.3 million hospital visits from car accidents (USA 2009). $300 billion annual cost from crashes (USA)

  • Car sharing companies like Zipcar, Modo, and Car2Go are all gone. Or, one of them morphs into a robo-taxi company.

  • Safety features in cars disappear (as they are no longer needed), and cars will become relatively cheaper. Cars become lighter too (more fibreglass), and thus more fuel efficient.

  • "People will want to drive their own car". Of course, and no one is stopping them. But they will have to pay very high car insurance premiums. 
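The fleet-size numbers in the list above come from a simple ratio: if the same amount of driving is served by cars that are busy a larger share of the time, the number of cars needed shrinks by one minus the ratio of the two utilization rates. Here is a rough sketch of that arithmetic in Python. The function name is made up for illustration, and the 10%, 12% and 60% inputs are the figures quoted above, not measured data. At exactly 10% utilization it gives about 83%; the 84% above corresponds to a figure slightly under 10%.

```python
def fleet_reduction(current_utilization, shared_utilization):
    """Fraction of cars no longer needed when the same driving demand is
    served by vehicles that are in use a larger share of the time."""
    return 1 - current_utilization / shared_utilization


# Average case: privately owned cars in use about 10% of the time.
print(f"{fleet_reduction(0.10, 0.60):.0%}")  # prints 83%

# Peak case: 12% of cars in use at rush hour.
print(f"{fleet_reduction(0.12, 0.60):.0%}")  # prints 80%
```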

Who is it good for?

  • Consumers. A family of four can probably get by with one car that shuttles everyone to and from work and school. Save $2000+ a year on car insurance. Read on your way to work.
  • Bars and restaurants. Everyone can now drink and drive safely! More alcohol is sold.
  • The elderly and disabled can get around on their own more easily.
  • Cyclists and pedestrians - fewer deaths from cars

† to the degree that they are affected by this

Saturday, February 11, 2012

Less Painful AJAX / Javascript Web Scraping

If you read my previous post, you'll see that scraping ajax pages can be a pain. So I wrote a little Java program to make it easier. It takes a list of URLs to scrape, and will render them in a browser, and save the (normal and ajax) rendered HTML and screenshots to a folder.
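For readers who want to see the general shape of such a tool, here is a rough sketch in Python. It is not the Java program from this post. It assumes the `selenium` Python package and a Firefox driver are installed, and the function names, the output folder and the file-naming scheme are all made up for illustration. It only saves the rendered HTML and a screenshot, after a crude fixed wait.

```python
import os
import re


def output_basename(url):
    """Turn a URL into a filesystem-safe base name (hypothetical naming scheme)."""
    name = re.sub(r"^[a-z]+://", "", url.strip(), flags=re.IGNORECASE)
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")
    return name or "page"


def scrape(urls, out_dir="scraped", wait_seconds=5):
    """Render each URL in Firefox, then save its HTML and a screenshot."""
    # Imported here so the helper above still works without Selenium installed.
    import time
    from selenium import webdriver

    os.makedirs(out_dir, exist_ok=True)
    driver = webdriver.Firefox()
    try:
        for url in urls:
            driver.get(url)
            time.sleep(wait_seconds)  # crude fixed wait for ajax content to load
            base = os.path.join(out_dir, output_basename(url))
            with open(base + ".html", "w", encoding="utf-8") as f:
                f.write(driver.page_source)  # HTML after scripts have run
            driver.save_screenshot(base + ".png")
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling `scrape(["http://example.com/"])` would open Firefox, load the page, and write `example.com.html` and `example.com.png` into the `scraped` folder.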

Here's the how-to video:

You need Firefox 3+ installed, as well as Java 1.6. This is a beta project, and no warranty is implied. You can get the file here:

Mad props to the Selenium team

Thursday, February 9, 2012

Web Scraping AJAX Pages

This is part four of a series of video tutorials on web scraping and web crawling.

You can probably skip this one, and go to the easy version.

This post explains how to capture HTML from Ajax / Javascript generated pages.

Here is the accompanying video.

The first thing you should know is that it is a major, major pain in the ass. Set aside half a day in your calendar, and get some hard liquor. The scraping itself is easy, but whoever wrote the installers for these programs has serious issues.

This method uses PHP. If you already know Java, driving Selenium from Java is likely simpler.

The main idea is to use the functional testing framework Selenium, which can automate web browsers, such as Chrome. Point it to a URL, have Chrome render the page (including ajax), wait a few seconds, get the HTML from the browser, and save it to a file. This is all done automatically.

I am going to gloss over most of the software installation steps, as they are lengthy, and explained (poorly) elsewhere. I will also not answer any comment questions about the software installation, but I encourage you to help one another. Try for help too.

install Java Runtime Environment if you do not already have it
install Selenium Server 2, and run it from the command line: java -jar selenium-server.jar
install PHP, make sure it's in your system path
install PEAR
install PHPUnit with all dependencies (using PEAR, read their site)
install PHPUnit_Selenium with all dependencies (using PEAR)

create a folder 'tests', add phpunit.xml (replace the square brackets with angle brackets):

[phpunit]
  [selenium]
    [browser name="Chrome" browser="*googlechrome" /]
    [!-- [browser name="Internet Explorer" browser="*iexplore" /] --]
  [/selenium]
[/phpunit]

create a file tests/functional/AjaxTest.php:
<?php
class AjaxTest extends PHPUnit_Extensions_SeleniumTestCase {

    public function setUp() {
        //you would set this to whatever website you want to scrape
        $this->setBrowserUrl('');
    }

    public function testA() {
        //this is an example of a page that you would want to scrape
        $this->open('');
        //give the ajax calls time to finish
        $this->waitForCondition('', 5000); //5 seconds
        //save the html output to a file
        file_put_contents("ajax_output.html", $this->getHtmlSource());
    }
}

and you're done.

then on the command line go to your tests folder and run phpunit functional
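
A note on the waiting step: Selenium's waitForCondition keeps re-evaluating a JavaScript snippet in the browser until it is true or the timeout runs out. Here is a minimal sketch of that polling idea in plain Python, to show the logic only. It is not Selenium's own implementation, and `wait_for` and `poll_ms` are made-up names.

```python
import time


def wait_for(condition, timeout_ms, poll_ms=100):
    """Call condition() repeatedly until it returns a truthy value or
    timeout_ms elapses. Returns True on success, False on timeout."""
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_ms / 1000.0)  # pause between checks
```

For scraping, the condition would be something like "the element filled in by the ajax call now exists", which is usually more reliable than always sleeping for a fixed number of seconds.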

Sunday, January 29, 2012

On Making Videos

Here is what I use to make my videos:

1. CamStudio. This is a nice free and open-source desktop video capture program. Make sure to use their Lossless Codec, and go with these settings:

Set Keyframes Every 30 frames
Capture Frames Every = 50 milliseconds
Playback Rate = 20 frames per second
Video codec: CamStudio Lossless Codec
Quality: 70%

2. HandBrake Video Transcoder. This will help you to shrink videos before uploading them to YouTube. It is free and open source. You can cut the size of a video 10x and it will still look & sound good.

3. A Blue Microphones Snowball USB Mic. About $65

4. An Acer Aspire 5742 Laptop (Core i3, 8GB RAM). I like this laptop for the following reasons:

  • got it at Wal*Mart for $369 (plus $70 for the extra RAM)
  • built-in number pad
  • built-in high-def web cam
  • gesture mouse pad
  • hdmi output
  • decent battery time
  • light

The only problem with it is the crappy built-in microphone, hence the Snowball.